Type: prompter
Subtype: see Settings
Unstructured can produce generative OCR optimizations for workflows that are configured as follows:
- With a Partitioner node set to use the Auto or High Res partitioning strategy, plus a generative OCR optimizations node added to the workflow.
- With a Partitioner node set to use the VLM partitioning strategy. No generative OCR optimizations node is needed (or allowed).
Unstructured never produces generative OCR optimizations for workflows with a Partitioner node set to use the Fast partitioning strategy.
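The rules above can be summarized as a small decision function. This is a hypothetical sketch for illustration only. The function name `can_produce_generative_ocr` and the strategy labels `"auto"`, `"hi_res"`, `"vlm"`, and `"fast"` are assumptions, not part of the Unstructured API.

```python
# Hypothetical sketch: the strategy strings are illustrative labels, not SDK constants.
def can_produce_generative_ocr(strategy: str, has_generative_ocr_node: bool) -> bool:
    """Return True if a workflow set up this way can yield generative OCR optimizations."""
    strategy = strategy.lower()
    if strategy == "fast":
        # Fast never produces generative OCR optimizations.
        return False
    if strategy == "vlm":
        # VLM handles this on its own; a generative OCR node is not allowed.
        if has_generative_ocr_node:
            raise ValueError("A generative OCR node is not allowed with the VLM strategy.")
        return True
    if strategy in ("auto", "hi_res"):
        # Auto and High Res need an explicit generative OCR optimizations node.
        return has_generative_ocr_node
    raise ValueError(f"Unknown partitioning strategy: {strategy!r}")


print(can_produce_generative_ocr("hi_res", True))   # True
print(can_produce_generative_ocr("auto", False))    # False
print(can_produce_generative_ocr("vlm", False))     # True
print(can_produce_generative_ocr("fast", True))     # False
```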
Generative OCR does not process any text blocks by default. You must explicitly specify which document element types containing text you want generative OCR to process. To do this, in the workflow’s High Res partitioner node, add each document element type that you want generative OCR to process to the extract_image_block_types list within the partitioner node’s settings field definition.
Generative OCR does not process the text of any Image or Table elements if they have already been processed by image description or table description enrichments, respectively.
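As a rough illustration of this step, the following sketch treats the High Res partitioner node’s settings as a plain Python dictionary and adds element types to its extract_image_block_types list without duplicating entries. It also models the Image and Table rule, under the simplifying assumption that enabling a description enrichment means it has already processed those elements. The helper names, the `"strategy": "hi_res"` entry, and the element type names in the example are assumptions for illustration; check the partitioner node documentation for the exact settings.

```python
def add_generative_ocr_types(settings: dict, element_types: list) -> dict:
    """Append element types to extract_image_block_types, keeping order and skipping duplicates."""
    block_types = settings.setdefault("extract_image_block_types", [])
    for element_type in element_types:
        if element_type not in block_types:
            block_types.append(element_type)
    return settings


def generative_ocr_targets(
    settings: dict,
    image_description_enabled: bool = False,
    table_description_enabled: bool = False,
) -> list:
    """Return the element types whose text generative OCR would actually process."""
    skipped = set()
    if image_description_enabled:
        skipped.add("Image")  # Already handled by the image description enrichment.
    if table_description_enabled:
        skipped.add("Table")  # Already handled by the table description enrichment.
    # With no extract_image_block_types list, nothing is processed (the default).
    return [t for t in settings.get("extract_image_block_types", []) if t not in skipped]


# Hypothetical High Res partitioner settings.
partitioner_settings = {"strategy": "hi_res"}
add_generative_ocr_types(partitioner_settings, ["NarrativeText", "Table", "Image"])
add_generative_ocr_types(partitioner_settings, ["Table", "Title"])

print(partitioner_settings["extract_image_block_types"])
# ['NarrativeText', 'Table', 'Image', 'Title']
print(generative_ocr_targets(partitioner_settings, table_description_enabled=True))
# ['NarrativeText', 'Image', 'Title']
```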
Settings
subtype: The enrichment algorithm and provider. Set at the WorkflowNode level, outside of settings. Allowed values:
- anthropic_ocr
- bedrock_ocr
- openai_ocr
The preceding list applies only to Unstructured Let’s Go and Pay-As-You-Go accounts.
For Unstructured Business accounts, to get your current list of available values, contact your Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at support@unstructured.io.
provider_type: The provider that matches the prefix of subtype. Allowed values: anthropic, bedrock, openai. For example, a subtype of bedrock_ocr requires a provider_type of bedrock.
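Because provider_type must match the prefix of subtype, the pairing can be checked mechanically before a workflow is created. This is a hypothetical client-side helper, not part of the Unstructured SDK, and it only knows the three subtypes listed above for Let’s Go and Pay-As-You-Go accounts; Business accounts may have other values that it would reject.

```python
# Subtypes listed for Let's Go and Pay-As-You-Go accounts.
ALLOWED_SUBTYPES = {"anthropic_ocr", "bedrock_ocr", "openai_ocr"}


def provider_type_for(subtype: str) -> str:
    """Derive provider_type from a generative OCR subtype, for example 'bedrock_ocr' -> 'bedrock'."""
    if subtype not in ALLOWED_SUBTYPES:
        raise ValueError(f"Unsupported generative OCR subtype: {subtype!r}")
    # The provider is the prefix before the first underscore.
    return subtype.split("_", 1)[0]


def is_matching_pair(subtype: str, provider_type: str) -> bool:
    """Return True if provider_type matches the prefix of subtype."""
    return provider_type_for(subtype) == provider_type


print(provider_type_for("bedrock_ocr"))             # bedrock
print(is_matching_pair("openai_ocr", "openai"))     # True
print(is_matching_pair("openai_ocr", "anthropic"))  # False
```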
model: The model to use for generative OCR. For a full list of the models available in Unstructured, see Available models.
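from unstructured_client.models.shared import WorkflowNode

generative_ocr_enrichment_workflow_node = WorkflowNode(
    name="Enrichment",
    subtype="<subtype>",  # One of the allowed subtype values, such as anthropic_ocr.
    type="prompter",
    settings={
        "provider_type": "<provider-type>",  # Must match the prefix of subtype.
        "model": "<model>"  # See Available models.
    }
)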
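The placeholders in the preceding example must agree with each other. One way to keep them consistent is to derive provider_type from subtype when building the node’s arguments, as in the hypothetical helper below. It returns a plain dictionary with the same keys as the example, which could then be unpacked into WorkflowNode(**kwargs). The helper name is an assumption, and <model> stays a placeholder because valid model names depend on your account.

```python
def generative_ocr_node_kwargs(subtype: str, model: str, name: str = "Enrichment") -> dict:
    """Build keyword arguments for a generative OCR enrichment node (hypothetical helper)."""
    allowed = {"anthropic_ocr", "bedrock_ocr", "openai_ocr"}
    if subtype not in allowed:
        raise ValueError(f"Unsupported generative OCR subtype: {subtype!r}")
    return {
        "name": name,
        "subtype": subtype,  # Set at the node level, outside of settings.
        "type": "prompter",
        "settings": {
            "provider_type": subtype.split("_", 1)[0],  # Matches the subtype prefix.
            "model": model,
        },
    }


kwargs = generative_ocr_node_kwargs("anthropic_ocr", "<model>")
print(kwargs["settings"])
# {'provider_type': 'anthropic', 'model': '<model>'}
```

Deriving provider_type rather than passing it separately removes one way for the two values to drift apart.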