Task
You want to get, manipulate, and print or save the contents of the document elements and metadata from the processed data that Unstructured returns.
Approach
Each document element contains fields for that element’s type, its ID, the extracted text, and associated metadata. The programmatic approach you take to get these document elements depends on which tool, SDK, or library you use:
Ingest CLI
For the Unstructured Ingest CLI, you can use a tool such as jq to work with a JSON file that the CLI outputs after the processing is complete. For example, you can use jq to access and print each element’s ID, text, and originating file name.
Ingest Python library
For the Unstructured Ingest Python library, you can use the standard Python json.load function to load the contents of a JSON file that the Ingest Python library outputs after the processing is complete. The result is a Python list of dictionaries, one per element. For example, you can use standard Python to access and print each element’s ID, text, and originating file name.
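The loading step described above could look roughly like the following sketch. The sample data, the output file name report.pdf.json, and the helper names load_elements and describe are hypothetical and exist only so that the example runs on its own. The field names (element_id, text, metadata, and filename within metadata) are assumed to match the output file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample that mimics the element fields described above;
# a real output file would come from the Ingest Python library.
sample = [
    {"type": "Title", "element_id": "a1", "text": "Quarterly report",
     "metadata": {"filename": "report.pdf"}},
    {"type": "NarrativeText", "element_id": "b2", "text": "Revenue grew.",
     "metadata": {"filename": "report.pdf"}},
]


def load_elements(path):
    """Load the list of element dictionaries from a JSON output file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def describe(element):
    """Return one printable line with the element's ID, file name, and text."""
    # .get() guards against elements whose metadata lacks a file name.
    filename = element.get("metadata", {}).get("filename", "<unknown>")
    return f'{element["element_id"]} | {filename} | {element["text"]}'


# Write the sample to a temporary file so that the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "report.pdf.json"  # hypothetical output file name
    output_path.write_text(json.dumps(sample), encoding="utf-8")
    loaded = load_elements(output_path)

lines = [describe(el) for el in loaded]
for line in lines:
    print(line)
```

In practice, you would pass the path of an actual output file to load_elements instead of writing the sample first.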
Open-source library
For the Unstructured open-source library, calling the partition_via_api function returns a list of elements (list[Element]).
You can use standard Python list operations on this list.
You can also use standard Python looping techniques on this list to access each element in this list.
Each individual element has the following attributes and methods:
.text provides the element’s text field value as a str. See Element example.
.metadata provides the element’s metadata field as an ElementMetadata object. See Metadata.
.category provides the element’s type field value as a str. See Element type.
.id provides the element’s element_id value as a str. See Element ID.
.convert_coordinates_to_new_system() converts the element’s location coordinates, if any, to a new coordinate system. See Element’s coordinates.
.to_dict() gets the element’s content as a standard Python key-value dictionary (dict[str, Any]).
To serialize this list as a list of Python dictionaries, you can use the elements_to_dicts function.
To serialize this list as JSON, you can use the elements_to_json function to convert the list of elements (Iterable[Element]) into a JSON-formatted string and then print or save that string.
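The attribute access and serialization steps above can be illustrated without the library installed. In the sketch below, StandInElement, StandInMetadata, and serialize are hypothetical stand-ins: they only mimic the attribute names listed above (.id, .category, .text, .metadata, .to_dict()), and the JSON step uses the standard json module rather than elements_to_json. Treat it as an illustration of the data flow, not as the library's implementation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class StandInMetadata:
    """Hypothetical stand-in for ElementMetadata; only the file name is modeled."""
    filename: Optional[str] = None


@dataclass
class StandInElement:
    """Hypothetical stand-in exposing the attribute names listed above."""
    id: str
    category: str
    text: str
    metadata: StandInMetadata = field(default_factory=StandInMetadata)

    def to_dict(self) -> dict[str, Any]:
        # Key names follow the element fields described above.
        return {
            "type": self.category,
            "element_id": self.id,
            "text": self.text,
            "metadata": asdict(self.metadata),
        }


def serialize(elements: list[StandInElement], indent: int = 2) -> str:
    """Convert each element to a dict, then dump the list as a JSON string."""
    return json.dumps([el.to_dict() for el in elements], indent=indent)


elements = [
    StandInElement("a1", "Title", "Quarterly report", StandInMetadata("report.pdf")),
    StandInElement("b2", "NarrativeText", "Revenue grew.", StandInMetadata("report.pdf")),
]

# Standard list operations work on the list of elements.
titles = [el for el in elements if el.category == "Title"]

# Standard looping gives access to each element's attributes.
for el in elements:
    print(f"{el.id} [{el.category}] {el.metadata.filename}: {el.text}")

# Serialize the whole list, then print (or save) the JSON string.
json_text = serialize(elements)
print(json_text)
```

With the library installed, the list returned by the partitioning call would take the place of the stand-in list, and elements_to_dicts or elements_to_json would replace the serialize helper.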