After partitioning, you can have Unstructured generate text-based summaries of detected images.
The summaries are generated by models offered through various model providers.
Here is an example of the output for a detected image, summarized by using GPT-4o. Note specifically the text field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
The image_base64 field is generated only for documents or PDF pages that are partitioned by using the High Res strategy. This field is not generated for documents or PDF pages that are partitioned by using the Fast or VLM strategy.

For workflows that use chunking, note the following changes:

- Each Image element is replaced by a CompositeElement element.
- This CompositeElement element will contain the image’s summary description as part of the element’s text field.
- This CompositeElement element will not contain an image_base64 field.
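As a rough illustration of how the generated summaries might be read back from a workflow's JSON output, the following Python sketch collects the text field of each Image element and reports whether an image_base64 field is present. It assumes that the output is a JSON array of element objects with type, element_id, text, and metadata keys, and that the Image elements have not been replaced by CompositeElement elements. The sample data and the image_summaries helper are hypothetical and are not real Unstructured output.

```python
import json

# Hypothetical sample output. The values are illustrative only and were not
# produced by Unstructured.
sample_output = json.dumps([
    {"type": "Title", "element_id": "a1", "text": "Quarterly report", "metadata": {}},
    {
        "type": "Image",
        "element_id": "b2",
        "text": "A bar chart comparing revenue across four quarters.",
        "metadata": {"image_base64": "aGVsbG8="},
    },
    {"type": "Image", "element_id": "c3", "text": "A company logo.", "metadata": {}},
])


def image_summaries(elements):
    """Return (element_id, summary, has_image_base64) for each Image element."""
    results = []
    for element in elements:
        if element.get("type") != "Image":
            continue  # skip non-image elements
        metadata = element.get("metadata") or {}
        results.append(
            (
                element.get("element_id"),
                element.get("text", ""),
                "image_base64" in metadata,
            )
        )
    return results


elements = json.loads(sample_output)
for element_id, summary, has_base64 in image_summaries(elements):
    print(element_id, has_base64, summary)
```

In this sketch, the second Image element has no image_base64 field, which is what you might see for a page that was partitioned by using the Fast or VLM strategy.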



text field’s contents.
Generate image descriptions
To generate image descriptions, in an Enrichment node in a workflow, select Image, and then choose one of the available provider (and model) combinations that are shown.

You can change a workflow’s image description settings only through Custom workflow settings.

For workflows that use chunking, the Chunker node should be placed after all Enrichment nodes. Placing the Chunker node before an image descriptions Enrichment node could cause incomplete or no image descriptions to be generated.
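The node-ordering rule can be checked mechanically before a workflow is run. The following Python sketch is one possible way to do this. It assumes that the workflow is represented as an ordered list of node type labels. The labels "chunker" and "enrichment" and the chunker_after_enrichments helper are hypothetical names chosen for this example and are not part of any Unstructured API.

```python
def chunker_after_enrichments(node_types):
    """Return True if no enrichment node appears after the first chunker node.

    node_types is an ordered list of strings. The labels "chunker" and
    "enrichment" are hypothetical names used only for this sketch.
    """
    if "chunker" not in node_types:
        return True  # nothing to check when the workflow has no chunker
    first_chunker = node_types.index("chunker")
    return "enrichment" not in node_types[first_chunker + 1:]


good = ["partitioner", "enrichment", "enrichment", "chunker", "embedder"]
bad = ["partitioner", "chunker", "enrichment", "embedder"]

print(chunker_after_enrichments(good))  # True
print(chunker_after_enrichments(bad))   # False
```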

