After partitioning and chunking, you can have Unstructured generate text-based summaries of detected images.

This summarization is done by using models offered through these providers:

The following example shows the output for a detected image that was summarized by using GPT-4o. Note in particular the added text field. Line breaks have been inserted here for readability; the actual output does not contain them.

{
    "type": "Image",
    "element_id": "3303aa13098f5a26b9845bd18ee8c881",
    "text": "{\n  \"type\": \"graph\",\n  \"description\": \"The graph shows 
        the relationship between Potential (V) and Current Density (A/cm2). 
        The x-axis is labeled 'Current Density (A/cm2)' and ranges from 
        0.0000001 to 0.1. The y-axis is labeled 'Potential (V)' and ranges 
        from -2.5 to 1.5. There are six different data series represented 
        by different colors: blue (10g), red (4g), green (6g), purple (2g), 
        orange (Control), and light blue (8g). The data points for each series 
        show how the potential changes with varying current density.\"\n}",
    "metadata": {
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...Q==",
        "image_mime_type": "image/jpeg",
        "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
        "data_source": {}
    }
}
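In the example above, the text field is itself a JSON-encoded string with type and description keys. The following minimal Python sketch shows one way to decode it. The helper name read_image_summary is hypothetical, the description is shortened, and the plain-text fallback is an assumption, because the output shape may differ by provider and model.

```python
import json

# Abbreviated copy of the example element above; the description is shortened.
element = {
    "type": "Image",
    "element_id": "3303aa13098f5a26b9845bd18ee8c881",
    "text": "{\n  \"type\": \"graph\",\n  \"description\": \"The graph shows the relationship between Potential (V) and Current Density (A/cm2).\"\n}",
    "metadata": {"page_number": 1, "image_mime_type": "image/jpeg"},
}


def read_image_summary(element):
    """Return the image summary as a dict, or None for non-image elements.

    Hypothetical helper for illustration; not part of any Unstructured library.
    """
    if element.get("type") != "Image":
        return None
    raw = element.get("text", "")
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: some models may return plain text instead of JSON.
        return {"type": None, "description": raw}
    if not isinstance(parsed, dict):
        # Valid JSON that is not an object (for example, a bare string or number).
        return {"type": None, "description": raw}
    return parsed


summary = read_image_summary(element)
print(summary["type"])
print(summary["description"])
```

Decoding the nested JSON is only needed when you want the type and description as separate values. The raw text field can be used as-is otherwise.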

Any embeddings that are produced after these summaries are generated will be based on the text field’s contents.

Generate image descriptions

To generate image descriptions, specify the following in a workflow's Enrichment node:

You can change a workflow’s image description settings only through Custom workflow settings.

Image descriptions are generated only when the workflow's Partitioner node is set to use the High Res partitioning strategy and the workflow also contains an image description enrichment node.

If the Partitioner node is set to use Auto, VLM, or Fast in a workflow that also contains an image description enrichment node, no image descriptions are produced, and the workflow might stop running or produce unexpected results.
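The two conditions above can be expressed as a simple pre-flight check. This is a sketch only: the function name and its arguments are hypothetical and do not reflect the Unstructured API or workflow schema. Only the strategy names and the Image selection come from this page.

```python
# Per the text above, only High Res produces image descriptions.
SUPPORTED_STRATEGIES = {"High Res"}


def image_descriptions_enabled(partition_strategy, enrichment_types):
    """Return True if a workflow would produce image descriptions.

    partition_strategy: the Partitioner node's strategy name, e.g. "High Res".
    enrichment_types: names of the enrichments configured in the workflow.
    Both arguments are hypothetical stand-ins for real workflow settings.
    """
    has_image_enrichment = "Image" in enrichment_types
    return has_image_enrichment and partition_strategy in SUPPORTED_STRATEGIES


for strategy in ("High Res", "Auto", "VLM", "Fast"):
    print(strategy, image_descriptions_enabled(strategy, ["Image"]))
```

A check like this can flag an unsupported combination before a workflow runs, rather than after it stops or returns unexpected results.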

Select Image, and then choose one of the following provider (and model) combinations to use: