The Unstructured open source library does not support enriching.
Enriching adds enhancements to the processed data that Unstructured produces. These enrichments include:
- Providing a summarized description of the contents of a detected image. See Image descriptions.
- Providing a summarized description of the contents of a detected table. See Table descriptions.
- Providing a representation of a detected table in HTML markup format. See Tables to HTML.
- Providing a list of recognized entities and their types, through a process known as named entity recognition (NER). See Named entity recognition.
- Having a vision language model (VLM) use advanced optical character recognition (OCR) to improve the accuracy of initially processed text blocks. See Generative OCR.
You can change enrichment settings only through Custom workflow settings.
Unstructured can generate image summary descriptions, table summary descriptions, table-to-HTML output, and generative OCR optimizations only for workflows that are configured in one of the following ways:
- With a Partitioner node set to use the Auto or High Res partitioning strategy, and with an image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node added.
- With a Partitioner node set to use the VLM partitioning strategy. In this case, no image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is needed (or allowed).
Even with these configurations, Unstructured generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and that are also eligible for processing with the following partitioning strategies:
- High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
- VLM or High Res, when the workflow’s Partitioner node is set to use VLM.
Unstructured never generates image summary descriptions, table summary descriptions, or table-to-HTML output in the following cases:
- The workflow’s Partitioner node is set to use the Fast partitioning strategy.
- The workflow’s Partitioner node is set to use the Auto, High Res, or VLM partitioning strategy, but the file does not contain images or tables.
Unstructured never produces generative OCR optimizations for workflows with a Partitioner node set to use the Fast partitioning strategy.
To add an enrichment, in the workflow editor, click +, click Enrich, and then click one of the following enrichment node types:
- Image Description to provide a summarized description of the contents of each detected image. See Image descriptions.
- Table Description to provide a summarized description of the contents of each detected table. See Table descriptions.
- Table to HTML to provide a representation of each detected table in HTML markup format. See Tables to HTML.
- NER to provide a list of recognized entities and their types by using a technique called named entity recognition (NER). See Named entity recognition.
- Generative OCR to have a VLM use advanced OCR to improve the accuracy of initially processed text blocks. See Generative OCR.
To add multiple enrichments, create an additional enrichment node for each enrichment type that you want to add.
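The eligibility rules in the notes above can be sketched as a small decision function. This is only an illustrative model written in Python: the function `can_generate`, its parameters, and the lowercase strategy and enrichment names are hypothetical and are not part of any Unstructured API. The sketch also assumes that a workflow combining the VLM strategy with an enrichment node is rejected as invalid, and it applies no per-file condition to generative OCR because the notes do not state one.

```python
# Hypothetical model of the eligibility rules; not an Unstructured API.
CONTENT_ENRICHMENTS = {"image_description", "table_description", "table_to_html"}
COVERED_ENRICHMENTS = CONTENT_ENRICHMENTS | {"generative_ocr"}


def can_generate(enrichment, partitioner_strategy, node_added,
                 has_images_or_tables, file_strategy):
    """Return True if the rules permit the enrichment for one file.

    enrichment: one of COVERED_ENRICHMENTS (NER is not covered by these rules).
    partitioner_strategy: strategy set on the Partitioner node
        ("fast", "auto", "high_res", or "vlm").
    node_added: whether a matching enrichment node is in the workflow.
    has_images_or_tables: whether the file contains images or tables.
    file_strategy: strategy the file is eligible to be processed with.
    """
    if enrichment not in COVERED_ENRICHMENTS:
        raise ValueError(f"rules do not cover {enrichment!r}")

    # Fast never produces any of the covered enrichments.
    if partitioner_strategy == "fast":
        return False

    if partitioner_strategy == "vlm":
        # Assumption: treat a disallowed enrichment node as a configuration error.
        if node_added:
            raise ValueError("enrichment nodes are not allowed with VLM")
        allowed_file_strategies = {"vlm", "high_res"}
    elif partitioner_strategy in {"auto", "high_res"}:
        # Auto and High Res require an explicit enrichment node.
        if not node_added:
            return False
        allowed_file_strategies = {"high_res"}
    else:
        raise ValueError(f"unknown strategy {partitioner_strategy!r}")

    # No per-file condition is stated for generative OCR, so none is modeled.
    if enrichment == "generative_ocr":
        return True

    # Image and table enrichments also depend on the file itself.
    return has_images_or_tables and file_strategy in allowed_file_strategies
```

For example, `can_generate("image_description", "auto", True, True, "fast")` returns `False` under this model, because a file that Auto routes to Fast is not eligible for High Res processing.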