After partitioning, the summarization step generates text-based summaries of images and tables.
These summaries are produced by models offered through various model providers.
Here is an example of the output of image summarization using GPT-4o. Note specifically the text field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
{
"type": "Image",
"element_id": "3303aa13098f5a26b9845bd18ee8c881",
"text": "{\n \"type\": \"graph\",\n \"description\": \"The graph shows
the relationship between Potential (V) and Current Density (A/cm2).
The x-axis is labeled 'Current Density (A/cm2)' and ranges from
0.0000001 to 0.1. The y-axis is labeled 'Potential (V)' and ranges
from -2.5 to 1.5. There are six different data series represented
by different colors: blue (10g), red (4g), green (6g), purple (2g),
orange (Control), and light blue (8g). The data points for each series
show how the potential changes with varying current density.\"\n}",
"metadata": {
"filetype": "application/pdf",
"languages": [
"eng"
],
"page_number": 1,
"image_base64": "/9j...<full results omitted for brevity>...Q==",
"image_mime_type": "image/jpeg",
"filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
"data_source": {}
}
}
Here is an example of the output of table summarization using GPT-4o. Note specifically the text field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
{
"type": "Table",
"element_id": "5713c0e90194ac7f0f2c60dd614bd24d",
"text": "The table consists of 6 rows and 7 columns. The columns represent
inhibitor concentration (g), bc (V/dec), ba (V/dec), Ecorr (V), icorr
(A/cm\u00b2), polarization resistance (\u03a9), and corrosion rate
(mm/year). As the inhibitor concentration increases, the corrosion
rate generally decreases, indicating the effectiveness of the
inhibitor. Notably, the polarization resistance increases with higher
inhibitor concentrations, peaking at 6 grams before slightly
decreasing. This suggests that the inhibitor is most effective at
6 grams, significantly reducing the corrosion rate and increasing
polarization resistance. The data provides valuable insights into the
optimal concentration of the inhibitor for corrosion prevention.",
"metadata": {
"text_as_html": "<table>...<full results omitted for brevity>...</table>",
"filetype": "application/pdf",
"languages": [
"eng"
],
"page_number": 1,
"image_base64": "/9j...<full results omitted for brevity>...//Z",
"image_mime_type": "image/jpeg",
"filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
"data_source": {}
}
}
The image_base64 field is generated only for documents or PDF pages that are partitioned by using the High Res strategy. This field is not generated for
documents or PDF pages that are partitioned by using the Fast or VLM strategy.
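The following is a minimal Python sketch of how the summaries shown above might be read back out of a workflow's output. It assumes the output has already been loaded as a list of element dictionaries shaped like the two examples. The function name collect_summaries and the has_image key are hypothetical and are not part of Unstructured. Note that in the image example the text field is itself a JSON string with a description key, while the table example's text field is plain prose, so the sketch tries JSON first and falls back to the raw text.

```python
import json


def collect_summaries(elements):
    """Return one record per Image or Table element found in `elements`.

    `elements` is assumed to be a list of dicts shaped like the
    examples above (hypothetical helper, not an Unstructured API).
    """
    records = []
    for element in elements:
        if element.get("type") not in ("Image", "Table"):
            continue
        text = element.get("text", "")
        # The image example stores a JSON object inside `text`;
        # the table example stores plain prose. Try JSON first.
        try:
            parsed = json.loads(text)
        except (TypeError, ValueError):
            parsed = None
        if isinstance(parsed, dict):
            summary = parsed.get("description", text)
        else:
            summary = text
        metadata = element.get("metadata", {})
        records.append({
            "element_id": element.get("element_id"),
            "type": element["type"],
            "summary": summary,
            # image_base64 is present only for High Res partitioning.
            "has_image": "image_base64" in metadata,
        })
    return records
```

Checking has_image before decoding image_base64 avoids a KeyError for elements that were partitioned with the Fast or VLM strategy.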
Summarize images or tables
To summarize images or tables, add a node to your workflow by clicking + in the workflow editor. Click Enrich, and then
click Image Description or Table Description, respectively.
You can change a workflow’s summarization settings only through
Custom workflow settings.
Unstructured can generate image summary descriptions, table summary descriptions, table-to-HTML output, and generative OCR optimizations only for workflows that are configured in one of the following ways:
- With a Partitioner node set to use the Auto or High Res partitioning strategy, and with an image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node added.
- With a Partitioner node set to use the VLM partitioning strategy. No image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is needed (or allowed).
Even with these configurations, Unstructured actually generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and are also eligible
for processing with the following partitioning strategies:
- High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
- VLM or High Res, when the workflow’s Partitioner node is set to use VLM.
Unstructured never generates image summary descriptions, table summary descriptions, or table-to-HTML output for workflows that are configured as follows:
- With a Partitioner node set to use the Fast partitioning strategy.
- With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for any file that does not contain images or tables.
Unstructured never produces generative OCR optimizations for workflows with a Partitioner node set to use the Fast partitioning strategy.
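The rules above for image and table summaries can be restated as a small decision function. This is only an illustrative Python sketch of the documented conditions, not Unstructured's implementation. The function name, its parameters, and the lowercase strategy labels are all assumptions made for this example. The set of strategies a file is eligible for is passed in, because this page does not describe how that eligibility is determined.

```python
def can_generate_summaries(partitioner_strategy, has_enrichment_node,
                           file_has_images_or_tables, file_eligible_strategies):
    """Return True if the rules above allow an image or table summary.

    All names here are hypothetical. `file_eligible_strategies` is the
    set of strategies the file is eligible for, supplied by the caller.
    """
    strategy = partitioner_strategy.strip().lower()
    eligible = {s.strip().lower() for s in file_eligible_strategies}

    # Fast never produces summaries; neither do files without images or tables.
    if strategy == "fast" or not file_has_images_or_tables:
        return False
    # Auto and High Res need an enrichment node and a High Res eligible file.
    if strategy in ("auto", "high res"):
        return bool(has_enrichment_node) and "high res" in eligible
    # VLM needs no enrichment node; the file must be VLM or High Res eligible.
    if strategy == "vlm":
        return bool(eligible & {"vlm", "high res"})
    raise ValueError(f"Unknown partitioning strategy: {partitioner_strategy!r}")
```

For example, can_generate_summaries("Auto", True, True, {"High Res"}) evaluates to True, while any call with the "Fast" strategy evaluates to False.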
For image summarization, select Image Description, and then choose one of the available provider (and model) combinations that are shown.
For table summarization, select Table Description, and then choose one of the available provider (and model) combinations that are shown. For a full list of the models available in Unstructured, see Available models.