After partitioning, you can have Unstructured generate representations of each detected table in HTML markup format.
This table-to-HTML output is generated by GPT-4o, provided through OpenAI.
Here is an example of the HTML markup output of a detected table using GPT-4o. Note specifically the text_as_html field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
The image_base64 field is generated only for documents or PDF pages that are partitioned by using the High Res strategy. This field is not generated for documents or PDF pages that are partitioned by using the Fast or VLM strategy.
- If a Table element must be chunked, the Table element is replaced by a set of related TableChunk elements.
- Each of these TableChunk elements will contain HTML table output for only its own element.
- None of these TableChunk elements will contain an image_base64 field.
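
The notes above can be sketched in code. The following minimal example collects table HTML from a list of output elements and records whether each one carries an image_base64 field. It assumes that each element is a JSON-style object with a type field and a metadata object that holds text_as_html and, where present, image_base64. That layout, the helper name collect_table_html, and the sample data are assumptions for illustration only, not actual output.

```python
from typing import Any

# Element types that may carry table HTML, per the notes above.
TABLE_TYPES = {"Table", "TableChunk"}


def collect_table_html(elements: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the HTML of each Table or TableChunk element.

    Assumption: text_as_html and image_base64 live under "metadata".
    """
    results = []
    for element in elements:
        if element.get("type") not in TABLE_TYPES:
            continue
        metadata = element.get("metadata") or {}
        html = metadata.get("text_as_html")
        if html is None:
            # No table-to-HTML output was generated for this element.
            continue
        results.append(
            {
                "type": element["type"],
                "html": html,
                "has_image": "image_base64" in metadata,
            }
        )
    return results


# Hypothetical sample data, invented for this sketch.
sample_elements = [
    {"type": "NarrativeText", "text": "Intro", "metadata": {}},
    {
        "type": "Table",
        "text": "A B",
        "metadata": {
            "text_as_html": "<table><tr><td>A</td><td>B</td></tr></table>",
            "image_base64": "PLACEHOLDER",
        },
    },
    {
        "type": "TableChunk",
        "text": "C D",
        "metadata": {
            "text_as_html": "<table><tr><td>C</td><td>D</td></tr></table>",
        },
    },
]

tables = collect_table_html(sample_elements)
```

Because a chunked table is replaced by several TableChunk elements, each carrying HTML for only its own portion, a consumer that needs the whole table would have to combine the chunks itself.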
Generate table-to-HTML output
To generate table-to-HTML output, in an Enrichment node in a workflow, for Model, select OpenAI (GPT-4o). After you choose this provider and model, make sure that Table to HTML is also selected.
You can change a workflow’s table description settings only through Custom workflow settings.
For workflows that use chunking, the Chunker node should be placed after all Enrichment nodes. Placing the Chunker node before a table-to-HTML output Enrichment node could cause incomplete or no table-to-HTML output to be generated.
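
The ordering rule for chunking can be checked mechanically. The sketch below assumes a workflow is described as an ordered list of node kinds written as plain strings. That representation, the kind names, and the function name chunker_follows_enrichments are hypothetical and do not reflect how workflows are actually stored or configured.

```python
def chunker_follows_enrichments(node_kinds: list[str]) -> bool:
    """Return True if no enrichment node appears after the first chunker node.

    node_kinds is a hypothetical, ordered list such as
    ["partitioner", "enrichment", "chunker"].
    """
    if "chunker" not in node_kinds:
        # Without chunking, the ordering rule does not apply.
        return True
    first_chunker = node_kinds.index("chunker")
    return "enrichment" not in node_kinds[first_chunker + 1:]


good_order = ["partitioner", "enrichment", "chunker", "embedder"]
bad_order = ["partitioner", "chunker", "enrichment"]

good_result = chunker_follows_enrichments(good_order)
bad_result = chunker_follows_enrichments(bad_order)
```

Here good_order satisfies the rule, while bad_order places an enrichment after the chunker, which is the arrangement that could cause incomplete or no table-to-HTML output.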

