Table descriptions
After partitioning and chunking, you can have Unstructured generate text-based summaries of detected tables.
Summaries are generated by one of the following models:
- GPT-4o, provided through OpenAI.
- Claude 3.5 Sonnet, provided through Anthropic.
- Claude 3.5 Sonnet, provided through Amazon Bedrock.
Here is an example of the output for a detected table, generated by using GPT-4o. Note specifically the text field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
The generated table’s summary will overwrite any previous contents in the text field. The table’s original content is available in the image_base64 field.
Any embeddings that are produced after these summaries are generated will be based on the new text field’s contents.
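As a rough illustration of the two fields described above, the following sketch separates a table element's generated summary from its original content. It assumes the element has already been loaded as a Python dictionary. The helper name split_table_element and the sample element are hypothetical, and the lookup of image_base64 under a metadata key (with a top-level fallback) is an assumption about the output layout, not something this page confirms.

```python
import base64


def split_table_element(element: dict) -> tuple[str, bytes]:
    """Return (summary, original_table_bytes) for one table element.

    Assumption: image_base64 may sit under "metadata" or at the top level.
    """
    # After enrichment, the text field holds the generated summary.
    summary = element.get("text", "")
    # The original table content is kept as base64 in image_base64.
    encoded = (
        element.get("metadata", {}).get("image_base64")
        or element.get("image_base64", "")
    )
    return summary, base64.b64decode(encoded)


# Hypothetical element with placeholder values, for illustration only.
sample = {
    "type": "Table",
    "text": "Placeholder summary of the table.",
    "metadata": {
        "image_base64": base64.b64encode(b"fake-image-bytes").decode("ascii")
    },
}

summary, image_bytes = split_table_element(sample)
print(summary)
print(len(image_bytes))
```

Because the summary replaces the previous text, a downstream step that still needs the original table would read it from the decoded image_base64 bytes rather than from text.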
Generate table descriptions
To generate table descriptions, specify the following in an Enrichment node in a workflow:
You can change a workflow’s table description settings only through Custom workflow settings.
Table summary descriptions are generated only when the Partitioner node in a workflow is set to use the High Res partitioning strategy and the workflow also contains a table description enrichment node.
Setting the Partitioner node to use Auto, VLM, or Fast in a workflow that also contains a table description enrichment node will not produce any table summary descriptions, and it could also cause the workflow to stop running or produce unexpected results.
Select Table, and then choose one of the following provider (and model) combinations to use:
- OpenAI (GPT-4o).
- Anthropic (Claude 3.5 Sonnet).
- Amazon Bedrock (Claude 3.5 Sonnet).
After you choose the provider and model, make sure that Table Description is also displayed. If Table Description and Table to HTML are both displayed, be sure to select Table Description.
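The High Res requirement described above can be expressed as a simple pre-flight check. The sketch below is a minimal illustration that uses a hypothetical, simplified description of a workflow as a list of dictionaries. The keys kind, strategy, and provider, and the function name table_descriptions_will_run, are invented for this example and are not Unstructured's workflow schema or API.

```python
def table_descriptions_will_run(nodes: list[dict]) -> bool:
    """Check the rule stated on this page against a simplified node list.

    Summaries are produced only when the workflow contains a table
    description enrichment node and the Partitioner uses High Res.
    """
    # Hypothetical marker for a table description enrichment node.
    has_enrichment = any(n.get("kind") == "table_description" for n in nodes)
    # Collect the strategy of every partitioner node in the list.
    strategies = [
        n.get("strategy") for n in nodes if n.get("kind") == "partitioner"
    ]
    return (
        has_enrichment
        and bool(strategies)
        and all(s == "High Res" for s in strategies)
    )


# Hypothetical workflows, for illustration only.
high_res = [
    {"kind": "partitioner", "strategy": "High Res"},
    {"kind": "table_description", "provider": "OpenAI"},
]
auto = [
    {"kind": "partitioner", "strategy": "Auto"},
    {"kind": "table_description", "provider": "OpenAI"},
]

print(table_descriptions_will_run(high_res))
print(table_descriptions_will_run(auto))
```

A check like this only mirrors the rule in code. The actual settings are still changed through Custom workflow settings.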