You want to convert a JSON file that Unstructured produces into a separate JSON file that uses a different JSON schema than the one that Unstructured uses.
Use a Python package such as json-converter in your Python code project to transform your source JSON file into a target JSON file that conforms to your own schema.
The json-converter package is not owned or supported by Unstructured. For questions and requests, see the Issues tab of the json-converter repository in GitHub.
Install dependencies
In your local Python code project, install the json-converter package.
Identify the JSON file to transform
Find the local source JSON file that you want to transform.
Note the JSON field names and structures that you want to transform. For example, the JSON file might look like the following (the ellipses indicate content omitted for brevity):
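To help with noting the field names, the following minimal sketch lists every dotted field path found in a list of elements. The sample element is illustrative only: its values are made up, and its field names are the ones discussed in this walkthrough (type, element_id, text, and a metadata object containing page_number, filetype, languages, and filename). The helper name field_paths is hypothetical.

```python
def field_paths(value, prefix=""):
    """Return the sorted dotted field paths found in a parsed JSON value."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= set(field_paths(child, path))
    elif isinstance(value, list):
        # List items share the parent's path; no index is added.
        for item in value:
            paths |= set(field_paths(item, prefix))
    return sorted(paths)


# Illustrative element; the values are made up for this sketch.
sample_elements = [
    {
        "type": "NarrativeText",
        "element_id": "abc123",
        "text": "Example text.",
        "metadata": {
            "filetype": "application/pdf",
            "languages": ["eng"],
            "page_number": 1,
            "filename": "example.pdf",
        },
    }
]

if __name__ == "__main__":
    for path in field_paths(sample_elements):
        print(path)
```

To inspect a real file, parse it with json.load and pass the result to field_paths in place of sample_elements.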
Create the JSON field mappings file
Decide what you want the JSON schema in the transformed file to look like. For example, the transformed JSON file might look like the following (the ellipses indicate content omitted for brevity):
Create the JSON field mappings file, for example:
This file declares the following mappings:
The type field is renamed to content_type.
The element_id field is renamed to content_id.
The text field is renamed to content.
The page_number field nested inside metadata is renamed to page and is nested inside content_properties.
All other fields (such as filetype, languages, and filename) are dropped.
For more information about the format of this JSON field mappings file, see the Project Description in the json-converter page on PyPI or the README in the json-converter repository in GitHub.
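The effect of these mappings can be illustrated with a standard-library sketch. It does not use json-converter or its mappings file format; it only shows, in plain Python, what the described renames, nesting, and drops do to a single element. The function name transform_element and the sample values are hypothetical.

```python
def transform_element(element):
    """Apply the renames described above to one element dict.

    Fields that are not listed here (for example filetype, languages,
    and filename) are dropped because they are never copied.
    """
    metadata = element.get("metadata", {})
    return {
        "content_type": element.get("type"),
        "content_id": element.get("element_id"),
        "content": element.get("text"),
        # page_number moves out of metadata and becomes content_properties.page
        "content_properties": {"page": metadata.get("page_number")},
    }


# Illustrative element; the values are made up for this sketch.
source_element = {
    "type": "NarrativeText",
    "element_id": "abc123",
    "text": "Example text.",
    "metadata": {
        "filetype": "application/pdf",
        "languages": ["eng"],
        "page_number": 1,
        "filename": "example.pdf",
    },
}

if __name__ == "__main__":
    print(transform_element(source_element))
```

In this sketch a missing source field becomes None in the target. How json-converter itself treats missing fields is not covered here; see its README for that.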
Add and run the transform code
Set the following local environment variables:
LOCAL_FILE_INPUT_PATH to the local path to the source JSON file.
LOCAL_FILE_OUTPUT_PATH to the local path to the target JSON file.
LOCAL_FIELD_MAPPINGS_PATH to the local path to the JSON field mappings file.
Add the following Python code file to your project:
Run the Python code file.
Check the path specified by LOCAL_FILE_OUTPUT_PATH for the transformed JSON file.
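One way to check the result is to load the file and confirm that each element carries the target field names. The sketch below assumes the target file is a JSON array of objects that use the field names from the example mappings (content_type, content_id, content, and content_properties with a page field). The function name find_problems is hypothetical.

```python
import json
import os

EXPECTED_KEYS = {"content_type", "content_id", "content", "content_properties"}


def find_problems(elements):
    """Return a list of human-readable problems found in transformed elements."""
    problems = []
    for index, element in enumerate(elements):
        missing = EXPECTED_KEYS - set(element)
        if missing:
            problems.append(f"element {index}: missing {sorted(missing)}")
            continue
        properties = element["content_properties"]
        if not isinstance(properties, dict) or "page" not in properties:
            problems.append(f"element {index}: missing content_properties.page")
    return problems


if __name__ == "__main__":
    output_path = os.environ.get("LOCAL_FILE_OUTPUT_PATH")
    if output_path and os.path.isfile(output_path):
        with open(output_path, encoding="utf-8") as f:
            problems = find_problems(json.load(f))
        print("\n".join(problems) or "All elements have the expected fields.")
    else:
        print("Set LOCAL_FILE_OUTPUT_PATH to an existing file first.")
```

Treat a reported problem as a prompt to look rather than as a definite error. For example, a source element that has no page_number may legitimately produce no page.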
Issue: When you run your Python code file, the following error message appears: “ImportError: cannot import name ‘Mapping’ from ‘collections’”.
Cause: The json-converter package uses an outdated import of Mapping from the collections module. Python 3.10 and later removed that alias (Mapping now lives only in collections.abc), so the import fails on newer versions of Python such as 3.11.
Solution: Update the json-converter package’s source code to use a different import, as follows:
In your Python project, find the json-converter package’s source location by running the command pip show json-converter.
Note the path in the Location field.
Use your code editor to open the path to the json-converter package’s source code.
In the source code, open the file named json_mapper.py.
Change the line "from collections import Mapping" to "from collections.abc import Mapping" by adding .abc.
Save this source code file.
Run your Python code file again.
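As an alternative to reading the pip show output by hand, the following sketch locates json_mapper.py with the standard library and reports whether it still contains the old import. It assumes the installed module directory is named json_converter, which is not confirmed here. The function names find_json_mapper and uses_old_import are hypothetical.

```python
import re
from importlib import metadata
from pathlib import Path

# Matches "from collections import ... Mapping" but not "from collections.abc import ...".
OLD_IMPORT = re.compile(r"^\s*from collections import .*\bMapping\b", re.MULTILINE)


def find_json_mapper():
    """Return the expected path to json_mapper.py, or None if the package is not installed."""
    try:
        dist = metadata.distribution("json-converter")
    except metadata.PackageNotFoundError:
        return None
    # Assumption: the module directory is named json_converter.
    return Path(dist.locate_file("json_converter/json_mapper.py"))


def uses_old_import(source_text):
    """Return True if the text still imports Mapping directly from collections."""
    return bool(OLD_IMPORT.search(source_text))


if __name__ == "__main__":
    path = find_json_mapper()
    if path is None or not path.is_file():
        print("json_mapper.py not found; use pip show to locate it.")
    elif uses_old_import(path.read_text(encoding="utf-8")):
        print(f"{path} still needs the .abc change.")
    else:
        print(f"{path} does not need the .abc change.")
```

The sketch only reads the file. The edit itself is still made by hand, as described in the steps above.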