Partition Endpoint
Extract tables as HTML
Task
You want to get, save, or show the contents of elements that are represented as HTML, such as tables that are embedded in a PDF document.
Approach
Extract the contents of an element’s text_as_html field, a string of HTML that is nested inside of the element’s parent metadata object.
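As a minimal sketch of this approach, the snippet below assumes the partition output is a list of element dictionaries, each with an optional text_as_html entry inside its metadata. The sample values are invented for illustration, and tables_as_html is a hypothetical helper name, not part of any SDK.

```python
import json

# Illustrative sample of partition output; the values are made up for this sketch.
sample_output = json.loads("""
[
  {"type": "Title", "text": "Quarterly results",
   "metadata": {"page_number": 1}},
  {"type": "Table", "text": "Region Sales North 10",
   "metadata": {"page_number": 1,
                "text_as_html": "<table><tr><td>Region</td><td>Sales</td></tr><tr><td>North</td><td>10</td></tr></table>"}}
]
""")


def tables_as_html(elements):
    """Return the text_as_html value of every element that has one."""
    html_fragments = []
    for element in elements:
        # text_as_html lives inside the element's metadata object and is
        # absent for elements that are not represented as HTML.
        html = element.get("metadata", {}).get("text_as_html")
        if html is not None:
            html_fragments.append(html)
    return html_fragments


for fragment in tables_as_html(sample_output):
    print(fragment)
```

Using .get() with a default keeps the loop from failing on elements that have no metadata or no HTML representation, such as titles and narrative text.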
To run this example
You will need a document of a type that can output the text_as_html field. For the list of applicable document types, see the entries in the table at the beginning of Partitioning where “Table Support” is “Yes.”
This example uses a PDF file with an embedded table.
Code
For the Unstructured Python SDK, you’ll need these environment variables:
UNSTRUCTURED_API_KEY - Your Unstructured API key value.
UNSTRUCTURED_API_URL - Your Unstructured API URL.
Python SDK
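The original code sample for this section is not present here. What follows is a hedged sketch, not the official example. It assumes the unstructured-client package is installed and that its UnstructuredClient, operations.PartitionRequest, shared.PartitionParameters, and shared.Files classes behave as in recent SDK releases, so check the names against the version you have. The helper names partition_pdf_tables and write_tables are hypothetical, and the hi_res strategy is an assumption about what is needed for table HTML to be produced.

```python
import os


def partition_pdf_tables(path):
    """Send a local file to the API and return its table HTML strings."""
    # Imported here so this module still loads when the SDK is not installed.
    # Assumption: installed with `pip install unstructured-client`.
    from unstructured_client import UnstructuredClient
    from unstructured_client.models import operations, shared

    client = UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
        server_url=os.getenv("UNSTRUCTURED_API_URL"),
    )
    with open(path, "rb") as f:
        files = shared.Files(content=f.read(), file_name=os.path.basename(path))

    request = operations.PartitionRequest(
        partition_parameters=shared.PartitionParameters(
            files=files,
            # Assumption: table structure is only inferred with hi_res.
            strategy=shared.Strategy.HI_RES,
        )
    )
    response = client.general.partition(request=request)

    # Each element is a dictionary; keep only those carrying text_as_html.
    return [
        element["metadata"]["text_as_html"]
        for element in (response.elements or [])
        if "text_as_html" in element.get("metadata", {})
    ]


def write_tables(fragments, out_dir):
    """Save each HTML fragment as table_<n>.html and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, html in enumerate(fragments, start=1):
        path = os.path.join(out_dir, f"table_{index}.html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        paths.append(path)
    return paths
```

With both environment variables set, a call such as write_tables(partition_pdf_tables("example.pdf"), "tables") would save one HTML file per extracted table; example.pdf and tables are placeholder names. The saved files can then be opened in a browser to view the tables.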