After partitioning and chunking, you can have Unstructured generate text-based summaries of detected images.

This summarization is done by using models offered through supported third-party providers, such as OpenAI, whose GPT-4o model is used in the following example.

Here is an example of the output for a detected image that was summarized by using GPT-4o. Note specifically the text field that is added. Line breaks have been inserted into the text field's value here for readability; the actual output will not contain these line breaks.

{
    "type": "Image",
    "element_id": "3303aa13098f5a26b9845bd18ee8c881",
    "text": "{\n  \"type\": \"graph\",\n  \"description\": \"The graph shows 
        the relationship between Potential (V) and Current Density (A/cm2). 
        The x-axis is labeled 'Current Density (A/cm2)' and ranges from 
        0.0000001 to 0.1. The y-axis is labeled 'Potential (V)' and ranges 
        from -2.5 to 1.5. There are six different data series represented 
        by different colors: blue (10g), red (4g), green (6g), purple (2g), 
        orange (Control), and light blue (8g). The data points for each series 
        show how the potential changes with varying current density.\"\n}",
    "metadata": {
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...Q==",
        "image_mime_type": "image/jpeg",
        "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
        "data_source": {}
    }
}
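
If you post-process the workflow's JSON output yourself, a short Python sketch like the following pulls the generated description back out of the text field. It assumes the output is saved as a JSON array of elements in a file named elements.json (a hypothetical name), and that the model returned a JSON string as in the example above; other models or prompts may return plain prose instead.

import json

# Hypothetical output file containing a JSON array of elements.
with open("elements.json", encoding="utf-8") as f:
    elements = json.load(f)

for element in elements:
    if element.get("type") != "Image":
        continue
    text = element.get("text", "")
    try:
        # In the example above, GPT-4o returned a JSON string, so the text
        # field's value can itself be parsed as JSON.
        summary = json.loads(text)
        print(summary.get("type"), "-", summary.get("description"))
    except json.JSONDecodeError:
        # Otherwise, treat the summary as plain text.
        print(text)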

Any embeddings that are produced after these summaries are generated will be based on the text field’s contents.
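
For example, if you assemble the inputs for an embedding step yourself, Image elements contribute their generated descriptions rather than raw image data. A minimal sketch, continuing from the elements list loaded above:

# Each Image element's text field now holds its generated description, so any
# vectors computed from these strings reflect the image summary rather than pixels.
texts_to_embed = [
    (element["element_id"], element["text"])
    for element in elements
    if element.get("text")
]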

Generate image descriptions

To generate image descriptions, use the Task drop-down list of an Enrichment node in a workflow.

You can change a workflow’s image description settings only through Custom workflow settings.

Image summaries are generated only when the Partitioner node in the same workflow is also set to use the High Res partitioning strategy. Learn more.

Select Image Description, and then choose one of the following provider (and model) combinations to use: