You want to get, save, or show the contents of elements that are represented as HTML, such as tables that are embedded in a PDF document.
Extract the contents of an element’s text_as_html JSON object, which is nested inside of its parent metadata object.
You will need a document that is one of the document types that can output the text_as_html JSON object. For the list of applicable document types, see the entries in the table at the beginning of Partitioning where “Table Support” is “Yes.”
This example uses a PDF file with an embedded table.
With the Unstructured Ingest Python library, you can use the standard Python json.load function to load the contents of a JSON file that the library outputs after processing is complete into Python data structures (dictionaries and lists).
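The following is a minimal sketch of that approach, not the library's own code. It assumes that the output file holds a JSON array of element objects, each with an optional metadata object that may contain text_as_html. The function names load_elements and extract_html, and the sample data, are hypothetical and used only for illustration.

```python
import json


def load_elements(json_path):
    """Load a JSON output file (assumed to be a JSON array of elements)."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def extract_html(elements):
    """Collect the text_as_html value from every element that has one."""
    html_fragments = []
    for element in elements:
        # text_as_html is nested inside the element's metadata object.
        metadata = element.get("metadata", {})
        html = metadata.get("text_as_html")
        if html:
            html_fragments.append(html)
    return html_fragments


# Hypothetical in-memory sample standing in for a real output file.
sample_elements = [
    {"type": "Title", "text": "Quarterly results", "metadata": {}},
    {
        "type": "Table",
        "text": "Region Sales",
        "metadata": {
            "text_as_html": "<table><tr><td>Region</td><td>Sales</td></tr></table>"
        },
    },
]

for fragment in extract_html(sample_elements):
    print(fragment)
```

To run this against a real output file, pass the file's path to load_elements and hand the result to extract_html. Each returned string can then be saved to an .html file or rendered as needed.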