After partitioning and chunking, you can have Unstructured generate text-based summaries of detected images.

This summarization is done by using models offered through supported third-party providers, such as OpenAI, whose GPT-4o model is used in the following example.

Here is an example of the output for a detected image that was summarized by using GPT-4o. Note specifically the text field that is added. Line breaks have been inserted into the text field's value here for readability; the actual output will not contain these line breaks.

{
    "type": "Image",
    "element_id": "3303aa13098f5a26b9845bd18ee8c881",
    "text": "{\n  \"type\": \"graph\",\n  \"description\": \"The graph shows 
        the relationship between Potential (V) and Current Density (A/cm2). 
        The x-axis is labeled 'Current Density (A/cm2)' and ranges from 
        0.0000001 to 0.1. The y-axis is labeled 'Potential (V)' and ranges 
        from -2.5 to 1.5. There are six different data series represented 
        by different colors: blue (10g), red (4g), green (6g), purple (2g), 
        orange (Control), and light blue (8g). The data points for each series 
        show how the potential changes with varying current density.\"\n}",
    "metadata": {
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...Q==",
        "image_mime_type": "image/jpeg",
        "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
        "data_source": {}
    }
}
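
If you post-process the workflow's JSON output yourself, a short Python sketch like the following pulls the generated description back out of the text field. It assumes the output is saved as a JSON array of elements in a file named elements.json (a hypothetical name), and that the model returned a JSON string as in the example above; other models or prompts may return plain prose instead.

import json

# Hypothetical output file containing a JSON array of elements.
with open("elements.json", encoding="utf-8") as f:
    elements = json.load(f)

for element in elements:
    if element.get("type") != "Image":
        continue
    text = element.get("text", "")
    try:
        # In the example above, GPT-4o returned a JSON string, so the text
        # field's value can itself be parsed as JSON.
        summary = json.loads(text)
        print(summary.get("type"), "-", summary.get("description"))
    except json.JSONDecodeError:
        # Otherwise, treat the summary as plain text.
        print(text)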

Any embeddings that are produced after these summaries are generated will be based on the text field’s contents.
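
For example, if you assemble the inputs for an embedding step yourself, Image elements contribute their generated descriptions rather than raw image data. A minimal sketch, continuing from the elements list loaded above:

# Each Image element's text field now holds its generated description, so any
# vectors computed from these strings reflect the image summary rather than pixels.
texts_to_embed = [
    (element["element_id"], element["text"])
    for element in elements
    if element.get("text")
]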

Generate image descriptions

To generate image descriptions, use the Task drop-down list of an Enrichment node in a workflow.

You can change a workflow’s image description settings only through Custom workflow settings.

Image summaries are generated only when the Partitioner node in the same workflow is also set to use the High Res partitioning strategy. Learn more.

Select Image Description, and then choose one of the following provider (and model) combinations to use: