text_as_html field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
The image_base64 field is generated only for documents or PDF pages that are partitioned by using the High Res strategy. This field is not generated for documents or PDF pages that are partitioned by using the Fast or VLM strategy.
- If a Table element must be chunked, the Table element is replaced by a set of related TableChunk elements.
- Each of these TableChunk elements will contain HTML table output for only its own element.
- None of these TableChunk elements will contain an image_base64 field.
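The following is a minimal sketch of how the fields described above might be read from a workflow's JSON output. It assumes that each element is a JSON object with a type key and a metadata object that holds text_as_html and image_base64; that layout, the element_id key, and the sample data are assumptions made for illustration, not taken from this page.

```python
import json

# Hypothetical sample output: one Table, one TableChunk, one non-table element.
SAMPLE_OUTPUT = """
[
  {"type": "Table", "element_id": "t1",
   "metadata": {"text_as_html": "<table><tr><td>A</td></tr></table>",
                "image_base64": "aGVsbG8="}},
  {"type": "TableChunk", "element_id": "c1",
   "metadata": {"text_as_html": "<table><tr><td>B</td></tr></table>"}},
  {"type": "NarrativeText", "element_id": "n1", "text": "Intro", "metadata": {}}
]
"""


def collect_table_html(elements):
    """Return (element_id, html) pairs for Table and TableChunk elements."""
    results = []
    for element in elements:
        if element.get("type") not in ("Table", "TableChunk"):
            continue
        # Assumed location of the HTML output: metadata.text_as_html.
        html = element.get("metadata", {}).get("text_as_html")
        if html:
            results.append((element.get("element_id"), html))
    return results


def has_table_image(element):
    """True when the element carries an image_base64 field in its metadata."""
    return "image_base64" in element.get("metadata", {})


elements = json.loads(SAMPLE_OUTPUT)
table_html = collect_table_html(elements)
```

Because TableChunk elements never carry image_base64, code that needs the table image should check for the field, as has_table_image does here, instead of assuming that it is present.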
Generate table-to-HTML output
To generate table-to-HTML output, in an Enrichment node in a workflow, for Model, select OpenAI (GPT-4o). After you choose this provider and model, make sure that Table to HTML is also selected.
You can change a workflow’s table description settings only through Custom workflow settings.
For workflows that use chunking, the Chunker node should be placed after all Enrichment nodes. Placing the Chunker node before a table-to-HTML output Enrichment node could cause incomplete or no table-to-HTML output to be generated.
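The ordering rule above can be checked mechanically. The sketch below assumes that a workflow is represented as an ordered list of node type labels; the labels and the function name are hypothetical and are not part of any Unstructured API.

```python
def chunker_follows_enrichments(node_types):
    """Return True when no Enrichment node appears after a Chunker node.

    node_types is an ordered list of hypothetical labels such as
    "Partitioner", "Enrichment", and "Chunker".
    """
    seen_chunker = False
    for node_type in node_types:
        if node_type == "Chunker":
            seen_chunker = True
        elif node_type == "Enrichment" and seen_chunker:
            # An Enrichment node placed after the Chunker violates the rule.
            return False
    return True


good_order = ["Partitioner", "Enrichment", "Enrichment", "Chunker"]
bad_order = ["Partitioner", "Chunker", "Enrichment"]
```

Here chunker_follows_enrichments(good_order) is True and chunker_follows_enrichments(bad_order) is False. A workflow with no Chunker node passes the check trivially.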
Unstructured can potentially generate table-to-HTML output only for workflows that are configured as follows:
- With a Partitioner node set to use the Auto or High Res partitioning strategy, and with a table-to-HTML output node added.
- With a Partitioner node set to use the VLM partitioning strategy. No table-to-HTML output node is needed (or allowed).
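Even with these configurations, Unstructured generates table-to-HTML output only for files that contain tables and that are processed with one of the following partitioning strategies:
- High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
- VLM or High Res, when the workflow’s Partitioner node is set to use VLM.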
Unstructured never generates table-to-HTML output in the following cases:
- With a Partitioner node set to use the Fast partitioning strategy.
- With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for all files that Unstructured encounters that do not contain tables.
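The configuration rules in this section can be summarized as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and it treats the Fast strategy and files without tables as cases that never produce table-to-HTML output. A True result means only that output is potentially generated, because the strategy that is actually applied to a given file can differ from the one that is configured.

```python
def may_generate_table_html(strategy, has_table_to_html_node, file_has_tables):
    """Return True when table-to-HTML output is potentially generated.

    strategy: "Auto", "High Res", "VLM", or "Fast" (case-insensitive).
    has_table_to_html_node: whether a table-to-HTML output node is added.
    file_has_tables: whether the file contains at least one table.
    """
    normalized = strategy.strip().lower().replace(" ", "_")
    if normalized not in ("auto", "high_res", "vlm", "fast"):
        raise ValueError(f"Unknown partitioning strategy: {strategy!r}")
    if not file_has_tables:
        return False  # Files without tables never produce output.
    if normalized == "fast":
        return False  # Fast never produces output.
    if normalized == "vlm":
        return True  # VLM needs no table-to-HTML output node.
    # Auto and High Res require a table-to-HTML output node.
    return has_table_to_html_node
```

For example, may_generate_table_html("High Res", True, True) returns True, while the same call with "Fast" returns False regardless of the other arguments.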