Image descriptions
After partitioning and chunking, you can have Unstructured generate text-based summaries of detected images.
Unstructured generates these summaries by using models offered through the following providers:
- GPT-4o, provided through OpenAI.
- Claude 3.5 Sonnet, provided through Anthropic.
- Claude 3.5 Sonnet, provided through Amazon Bedrock.
Here is an example of the output for a detected image when GPT-4o is used. Note specifically the text field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
Any embeddings that are produced after these summaries are generated will be based on the text field's contents.
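Because the summary ends up in the text field, reading it back is a matter of filtering the output for image elements. The following is a minimal Python sketch, assuming the processed output is available as a JSON array of elements in Unstructured's element format, where each element is an object with "type", "element_id", and "text" keys. The helper names image_descriptions and load_elements are hypothetical and are not part of any Unstructured library.

```python
import json
from typing import Any


def image_descriptions(elements: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (element_id, text) pairs for every Image element.

    Assumes each element is a dict with "type", "element_id", and "text" keys.
    With image description enabled, an Image element's "text" holds the
    generated summary, which is the content any later embedding step embeds.
    """
    return [
        (el.get("element_id", ""), el.get("text", ""))
        for el in elements
        if el.get("type") == "Image"
    ]


def load_elements(path: str) -> list[dict[str, Any]]:
    # Hypothetical helper: read processed output saved locally as a JSON array.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, image_descriptions(load_elements("output.json")) would list each image's description alongside its element ID; the file name here is only a placeholder.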
Generate image descriptions
To generate image descriptions, in the Task drop-down list of an Enrichment node in a workflow, select Image Description, and then choose one of the following provider (and model) combinations:
- OpenAI (GPT-4o).
- Anthropic (Claude 3.5 Sonnet).
- Amazon Bedrock (Claude 3.5 Sonnet).
You can change a workflow's image description settings only through Custom workflow settings.
Image summaries are generated only when the Partitioner node in the same workflow is also set to use the High Res partitioning strategy.
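The requirements for image descriptions can be summarized as a small consistency check. This is a hypothetical Python sketch that mirrors the settings described on this page in a local WorkflowSettings object; it is not an Unstructured API, and the settings themselves are still changed only through Custom workflow settings. The string values (such as "High Res" and "Image Description") are taken from the UI labels on this page.

```python
from dataclasses import dataclass

# Provider/model combinations listed on this page for the Image Description task.
SUPPORTED_IMAGE_DESCRIPTION_MODELS = {
    ("OpenAI", "GPT-4o"),
    ("Anthropic", "Claude 3.5 Sonnet"),
    ("Amazon Bedrock", "Claude 3.5 Sonnet"),
}


@dataclass
class WorkflowSettings:
    # Hypothetical local mirror of the UI settings; not an Unstructured object.
    partition_strategy: str  # Partitioner node strategy, e.g. "High Res"
    enrichment_task: str     # Enrichment node task, e.g. "Image Description"
    provider: str
    model: str


def check_image_description_settings(s: WorkflowSettings) -> list[str]:
    """Return a list of problems; an empty list means the rules above are met."""
    problems = []
    if s.enrichment_task != "Image Description":
        problems.append("Enrichment task is not Image Description.")
    if s.partition_strategy != "High Res":
        problems.append("Partitioner must use the High Res strategy.")
    if (s.provider, s.model) not in SUPPORTED_IMAGE_DESCRIPTION_MODELS:
        problems.append(f"Unsupported provider/model: {s.provider} / {s.model}.")
    return problems
```

Keeping the supported combinations in one set makes it easy to see that the same model (Claude 3.5 Sonnet) is valid through two different providers.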