The Unstructured open source library does not support enriching.
Enriching adds enhancements to the processed data that Unstructured produces. These enrichments include:
- Providing a summarized description of the contents of a detected image. See Image descriptions.
- Providing a summarized description of the contents of a detected table. See Table descriptions.
- Providing a representation of a detected table in HTML markup format. See Tables to HTML.
- Providing a list of recognized entities and their types, through a process known as named entity recognition (NER). See Named entity recognition.
- Having a vision language model (VLM) use advanced optical character recognition (OCR) to improve the accuracy of initially-processed text blocks. See Generative OCR.
To add an enrichment, add one of the following enrichment node types to an Unstructured workflow:
- Image Description to provide a summarized description of the contents of each detected image.
- Table Description to provide a summarized description of the contents of each detected table.
- Table to HTML to provide a representation of each detected table in HTML markup format.
- NER to provide a list of recognized entities and their types by using a technique called named entity recognition (NER).
- Generative OCR to have a VLM use advanced OCR to improve the accuracy of initially-processed text blocks.
For multiple enrichments, add an enrichment node for each additional enrichment type to your workflow.
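The "one node per enrichment type" rule above can be sketched as a small planning helper. This is only an illustration: the short keys (`image_description`, `ner`, and so on) and the function name `plan_enrichment_nodes` are hypothetical and are not part of any Unstructured API; only the node type names are taken from the list above.

```python
# Node type names come from the list above; the short keys are hypothetical labels
# introduced for this sketch only.
ENRICHMENT_NODE_TYPES = {
    "image_description": "Image Description",
    "table_description": "Table Description",
    "table_to_html": "Table to HTML",
    "ner": "NER",
    "generative_ocr": "Generative OCR",
}


def plan_enrichment_nodes(wanted):
    """Return one enrichment node type per requested enrichment.

    Order of first request is preserved and duplicates are dropped, because a
    workflow needs one node for each enrichment type.
    """
    nodes = []
    for key in wanted:
        if key not in ENRICHMENT_NODE_TYPES:
            raise ValueError(f"unknown enrichment: {key!r}")
        node_type = ENRICHMENT_NODE_TYPES[key]
        if node_type not in nodes:
            nodes.append(node_type)
    return nodes
```

For example, `plan_enrichment_nodes(["ner", "table_to_html"])` yields the two node types to add to the workflow. The helper does not check the Partitioner strategy, which also affects whether these nodes are needed or allowed.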
Unstructured can generate image summary descriptions, table summary descriptions, table-to-HTML output, and generative OCR optimizations only for workflows that are configured in one of the following ways:
- With a Partitioner node set to use the Auto or High Res partitioning strategy, and with an image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node added.
- With a Partitioner node set to use the VLM partitioning strategy. No image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is needed (or allowed).
Even with these configurations, Unstructured actually generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and are also eligible for processing with the following partitioning strategies:
- High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
- VLM or High Res, when the workflow’s Partitioner node is set to use VLM.
Unstructured never generates image summary descriptions, table summary descriptions, or table-to-HTML output for workflows that are configured as follows:
- With a Partitioner node set to use the Fast partitioning strategy.
- With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for any file that does not contain images or tables.
Unstructured never produces generative OCR optimizations for workflows with a Partitioner node set to use the Fast partitioning strategy.
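The eligibility rules above can be summarized as a decision function. This is a minimal sketch of the stated rules, not Unstructured's implementation: the function name `may_generate`, its parameters, and the strategy labels `"fast"`, `"auto"`, `"hi_res"`, and `"vlm"` are all hypothetical. NER is left out because the rules above do not cover it.

```python
# Hypothetical labels for the enrichments covered by the rules above.
IMAGE_TABLE_ENRICHMENTS = {"image_description", "table_description", "table_to_html"}
COVERED_ENRICHMENTS = IMAGE_TABLE_ENRICHMENTS | {"generative_ocr"}


def may_generate(enrichment, partitioner, node_added, has_images_or_tables, file_strategy):
    """Return True if the rules above allow this enrichment to be generated.

    partitioner:   strategy set on the Partitioner node
                   ("fast", "auto", "hi_res", or "vlm"; hypothetical labels).
    node_added:    whether the matching enrichment node is in the workflow.
    file_strategy: strategy the individual file is eligible to be processed with.
    """
    if enrichment not in COVERED_ENRICHMENTS:
        raise ValueError(f"rules do not cover enrichment: {enrichment!r}")

    if partitioner == "fast":
        # Fast never produces any of the covered enrichments.
        return False
    if partitioner in ("auto", "hi_res"):
        if not node_added:
            return False
        eligible_file_strategies = {"hi_res"}
    elif partitioner == "vlm":
        if node_added:
            # Enrichment nodes are neither needed nor allowed with VLM.
            raise ValueError("enrichment nodes are not allowed with the VLM strategy")
        eligible_file_strategies = {"vlm", "hi_res"}
    else:
        raise ValueError(f"unknown partitioning strategy: {partitioner!r}")

    if enrichment == "generative_ocr":
        # The rules above state no per-file condition for generative OCR
        # beyond the workflow configuration, so none is assumed here.
        return True

    return has_images_or_tables and file_strategy in eligible_file_strategies
```

For instance, `may_generate("table_to_html", "auto", True, True, "fast")` is `False`: the workflow is configured correctly, but the file itself is not eligible for High Res processing.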