After partitioning, summarizing generates text-based summaries of images and tables. This summarization is done by using models offered through these providers: Here is an example of the output of image summarization using GPT-4o. Note specifically the text field that is added. Line breaks have been inserted here for readability. The output will not contain these line breaks.
{
    "type": "Image",
    "element_id": "3303aa13098f5a26b9845bd18ee8c881",
    "text": "{\n  \"type\": \"graph\",\n  \"description\": \"The graph shows 
        the relationship between Potential (V) and Current Density (A/cm2). 
        The x-axis is labeled 'Current Density (A/cm2)' and ranges from 
        0.0000001 to 0.1. The y-axis is labeled 'Potential (V)' and ranges 
        from -2.5 to 1.5. There are six different data series represented 
        by different colors: blue (10g), red (4g), green (6g), purple (2g), 
        orange (Control), and light blue (8g). The data points for each series 
        show how the potential changes with varying current density.\"\n}",
    "metadata": {
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...Q==",
        "image_mime_type": "image/jpeg",
        "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
        "data_source": {}
    }
}
Here is an example of the output of table summarization using GPT-4o. Note specifically the text field that is added. Line breaks have been inserted here for readability. The output will not contain these line breaks.
{
    "type": "Table",
    "element_id": "5713c0e90194ac7f0f2c60dd614bd24d",
    "text": "The table consists of 6 rows and 7 columns. The columns represent 
        inhibitor concentration (g), bc (V/dec), ba (V/dec), Ecorr (V), icorr 
        (A/cm\u00b2), polarization resistance (\u03a9), and corrosion rate 
        (mm/year). As the inhibitor concentration increases, the corrosion 
        rate generally decreases, indicating the effectiveness of the 
        inhibitor. Notably, the polarization resistance increases with higher 
        inhibitor concentrations, peaking at 6 grams before slightly 
        decreasing. This suggests that the inhibitor is most effective at 
        6 grams, significantly reducing the corrosion rate and increasing 
        polarization resistance. The data provides valuable insights into the 
        optimal concentration of the inhibitor for corrosion prevention.",
    "metadata": {
        "text_as_html": "<table>...<full results omitted for brevity>...</table>",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...//Z",
        "image_mime_type": "image/jpeg",
        "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
        "data_source": {}
    }
}

Summarize images or tables

To summarize images or tables, in the Task drop-down list of an Enrichment node in a workflow, specify the following:
You can change a workflow’s summarization settings only through Custom workflow settings.
Unstructured can potentially generate image summary descriptions, table summary descriptions, and table-to-HTML output only for workflows that are configured as follows:
  • With a Partitioner node set to use the Auto or High Res partitioning strategy, and an image summary description node, table summary description node, or table-to-HTML output node is added.
  • With a Partitioner node set to use the VLM partitioning strategy. No image summary description node, table summary description node, or table-to-HTML output node is needed (or allowed).
Even with these configurations, Unstructured actually generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and are also eligible for processing with the following partitioning strategies:
  • High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
  • VLM or High Res, when the workflow’s Partitioner node is set to use VLM.
Unstructured never generates image summary descriptions, table summary descriptions, or table-to-HTML output for workflows that are configured as follows:
  • With a Partitioner node set to use the Fast partitioning strategy.
  • With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for all files that Unstructured encounters that do not contain images or tables.
For image summarization, select Image Description, and then choose one of the following provider (and model) combinations to use: For table summarization, select Table Description, and then choose one of the following provider (and model) combinations to use: