After partitioning, you can have Unstructured generate text-based summaries of detected tables.
This summarization is done by using models offered through various model providers.
Here is an example of the output for a detected table, with the summary generated by GPT-4o. Note in particular the text field, which contains the generated summary.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
{
"type": "Table",
"element_id": "5713c0e90194ac7f0f2c60dd614bd24d",
"text": "The table consists of 6 rows and 7 columns. The columns represent
inhibitor concentration (g), bc (V/dec), ba (V/dec), Ecorr (V), icorr
(A/cm\u00b2), polarization resistance (\u03a9), and corrosion rate
(mm/year). As the inhibitor concentration increases, the corrosion
rate generally decreases, indicating the effectiveness of the
inhibitor. Notably, the polarization resistance increases with higher
inhibitor concentrations, peaking at 6 grams before slightly
decreasing. This suggests that the inhibitor is most effective at
6 grams, significantly reducing the corrosion rate and increasing
polarization resistance. The data provides valuable insights into the
optimal concentration of the inhibitor for corrosion prevention.",
"metadata": {
"text_as_html": "<table>...<full results omitted for brevity>...</table>",
"filetype": "application/pdf",
"languages": [
"eng"
],
"page_number": 1,
"image_base64": "/9j...<full results omitted for brevity>...//Z",
"image_mime_type": "image/jpeg",
"filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
"data_source": {}
}
}
The image_base64 field is generated only for documents or PDF pages that are partitioned by using the High Res strategy. This field is not generated for
documents or PDF pages that are partitioned by using the Fast or VLM strategy.
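As a minimal sketch of how this output might be consumed, the following Python reads elements shaped like the example above and collects each table's summary along with its optional image. The function name `table_summaries` and the sample data are illustrative assumptions, not part of Unstructured; only the field names are taken from the example output.

```python
import base64


def table_summaries(elements):
    """Collect the summary, HTML, and optional image of each Table element."""
    records = []
    for element in elements:
        if element.get("type") != "Table":
            continue
        metadata = element.get("metadata", {})
        # image_base64 is present only for pages partitioned with High Res.
        encoded = metadata.get("image_base64")
        records.append(
            {
                "element_id": element.get("element_id"),
                "summary": element.get("text", ""),
                "html": metadata.get("text_as_html"),
                "image_bytes": base64.b64decode(encoded) if encoded else None,
                "image_mime_type": metadata.get("image_mime_type"),
            }
        )
    return records


# Hypothetical sample data; real output comes from a workflow run.
sample_elements = [
    {
        "type": "Table",
        "element_id": "table-1",
        "text": "The table consists of 6 rows and 7 columns.",
        "metadata": {
            "text_as_html": "<table></table>",
            "image_base64": base64.b64encode(b"placeholder image bytes").decode("ascii"),
            "image_mime_type": "image/jpeg",
        },
    },
    {"type": "NarrativeText", "element_id": "text-1", "text": "Intro.", "metadata": {}},
]

for record in table_summaries(sample_elements):
    print(record["element_id"], record["summary"])
```

Treating a missing image_base64 as `None` rather than an error keeps the same code usable for output from the Fast or VLM strategy.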
[Two example descriptions of detected tables, generated with GPT-4o by OpenAI, are omitted here.]
The generated table summary overwrites any text that Unstructured previously extracted from that table into the text field.
The table’s original content remains available in the image_base64 field.
For workflows that use chunking, note the following changes:
- If a Table element must be chunked, the Table element is replaced by a set of related TableChunk elements.
- Each of these TableChunk elements will contain a summary description only for its own element, as part of the element’s text field.
- These TableChunk elements will not contain an image_base64 field.
Any embeddings that are produced after these summaries are generated will be based on the new text field’s contents.
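The chunking behavior above can be sketched as follows, assuming that chunked output marks these elements with a `type` of `TableChunk` and otherwise follows the shape of the earlier example output. The helpers `texts_for_embedding` and `has_table_image` are hypothetical names used only for illustration.

```python
def texts_for_embedding(elements):
    """Return (element_id, text) pairs for Table and TableChunk elements."""
    pairs = []
    for element in elements:
        if element.get("type") not in ("Table", "TableChunk"):
            continue
        # After enrichment, text holds the generated summary, so this is
        # the string that a later embedding step would be based on.
        pairs.append((element.get("element_id"), element.get("text", "")))
    return pairs


def has_table_image(element):
    """True only when the element's metadata carries an image_base64 field."""
    return "image_base64" in element.get("metadata", {})


# Hypothetical chunked output: one Table replaced by two TableChunk elements.
chunked_elements = [
    {"type": "TableChunk", "element_id": "chunk-1", "text": "Summary of rows 1-3.", "metadata": {}},
    {"type": "TableChunk", "element_id": "chunk-2", "text": "Summary of rows 4-6.", "metadata": {}},
]

for element_id, text in texts_for_embedding(chunked_elements):
    print(element_id, text)
```

Code that needs the original table image should check for the field first, because TableChunk elements do not include it.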
Generate table descriptions
To generate table descriptions, add a Table Description node to your workflow by clicking + in the workflow editor, and then clicking Enrich > Table Description.
Be sure to also select one of the available provider (and model) combinations that are shown.
You can change a workflow’s table description settings only through Custom workflow settings.
For workflows that use chunking, the Chunker node should be placed after all enrichment nodes. Placing the Chunker node before a table description enrichment node could cause incomplete or no table descriptions to be generated.
The following models are no longer available as of the following dates:
- Amazon Bedrock Claude Sonnet 3.5: October 22, 2025
- Anthropic Claude Sonnet 3.5: October 22, 2025
Unstructured recommends the following actions:
- For new workflows, do not use any of these models.
- For any workflow that uses any of these models, update that workflow as soon as possible to use a different model.
Workflows that attempt to use any of these models on or after its associated date will return errors.
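A small check like the following could flag workflows that still reference a retired model. It is only a sketch: the dictionary mirrors the dates listed above, and the function name `is_retired` is a hypothetical helper, not part of any Unstructured API.

```python
from datetime import date

# Retirement dates copied from the list above.
RETIRED_MODELS = {
    "Amazon Bedrock Claude Sonnet 3.5": date(2025, 10, 22),
    "Anthropic Claude Sonnet 3.5": date(2025, 10, 22),
}


def is_retired(model_name, on_date):
    """True if the model is unavailable on the given date (on or after its retirement date)."""
    retirement = RETIRED_MODELS.get(model_name)
    return retirement is not None and on_date >= retirement


print(is_retired("Anthropic Claude Sonnet 3.5", date(2025, 10, 22)))
```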
Unstructured can potentially generate table summary descriptions only for workflows that are configured as follows:
- With a Partitioner node set to use the Auto or High Res partitioning strategy, and with a table summary description node added.
- With a Partitioner node set to use the VLM partitioning strategy. No table summary description node is needed (or allowed).
Even with these configurations, Unstructured actually generates table summary descriptions only for files that contain tables and are also eligible
for processing with the following partitioning strategies:
- High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
- VLM or High Res, when the workflow’s Partitioner node is set to use VLM.
Unstructured never generates table summary descriptions for workflows that are configured as follows:
- With a Partitioner node set to use the Fast partitioning strategy.
- With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for all files that Unstructured encounters that do not contain tables.
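The rules above can be condensed into a single predicate. This is a sketch under stated assumptions: the function, its parameters, and the strategy labels are hypothetical and simply reuse the names from this page. In particular, `file_strategy` stands for the strategy that a given file is actually eligible to be processed with.

```python
def may_generate_table_descriptions(
    partitioner, has_description_node, file_has_tables, file_strategy
):
    """Apply the configuration rules from this page to one file."""
    if not file_has_tables:
        # Files without tables never get table descriptions.
        return False
    if partitioner == "Fast":
        return False
    if partitioner in ("Auto", "High Res"):
        # A description node is required, and the file must be
        # eligible for High Res processing.
        return has_description_node and file_strategy == "High Res"
    if partitioner == "VLM":
        # No description node is needed (or allowed) with VLM, so the
        # has_description_node flag is ignored here.
        return file_strategy in ("VLM", "High Res")
    raise ValueError(f"Unknown partitioning strategy: {partitioner!r}")


print(may_generate_table_descriptions("Auto", True, True, "High Res"))
```

Raising on an unknown strategy, rather than returning `False`, makes a misspelled label visible instead of silently reporting that no descriptions will be generated.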