Type: prompter
Subtype: see Settings
Unstructured can produce generative OCR optimizations for workflows that are configured in either of the following ways:
  • With a Partitioner node set to use the Auto or High Res partitioning strategy, plus a generative OCR optimization node.
  • With a Partitioner node set to use the VLM partitioning strategy. In this case, a generative OCR optimization node is neither needed nor allowed.
Unstructured never produces generative OCR optimizations for workflows with a Partitioner node set to use the Fast partitioning strategy.
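The rules above can be summarized as a small check. This is a minimal sketch for illustration only and is not part of the Unstructured API: the function name, the return labels, and the lowercase strategy strings (`auto`, `hi_res`, `vlm`, `fast`) are assumptions.

```python
def generative_ocr_status(strategy: str, has_generative_ocr_node: bool) -> str:
    """Classify a workflow according to the rules described above.

    Returns "produced", "not_produced", or "invalid".
    The strategy strings are illustrative assumptions, not confirmed API values.
    """
    if strategy in ("auto", "hi_res"):
        # Auto and High Res need an explicit generative OCR optimization node.
        return "produced" if has_generative_ocr_node else "not_produced"
    if strategy == "vlm":
        # VLM covers this on its own; adding the node is not allowed.
        return "invalid" if has_generative_ocr_node else "produced"
    if strategy == "fast":
        # Fast never produces generative OCR optimizations.
        return "not_produced"
    raise ValueError(f"Unknown partitioning strategy: {strategy!r}")


print(generative_ocr_status("hi_res", True))
print(generative_ocr_status("vlm", True))
```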
Generative OCR does not process any text blocks by default. You must explicitly specify which text-containing document element types you want generative OCR to process.
To do this, in a workflow’s High Res partitioner node, add each document element type that you want generative OCR to process to the extract_image_block_types list within the partitioner node’s settings field definition.
Generative OCR does not process the text of any Image or Table elements if they have already been processed by image description or table description enrichments, respectively.
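As a rough sketch of the extract_image_block_types step, the helper below adds element types to a partitioner settings dictionary. The helper name is hypothetical, and the `"strategy": "hi_res"` entry and the `"Image"` and `"Table"` spellings are assumptions; check the Partitioner node reference for the exact accepted values.

```python
def with_generative_ocr_types(settings: dict, element_types: list) -> dict:
    """Return a copy of partitioner settings with element types added.

    Hypothetical helper: appends each type to extract_image_block_types,
    skipping any type that is already listed.
    """
    updated = dict(settings)
    block_types = list(updated.get("extract_image_block_types", []))
    for element_type in element_types:
        if element_type not in block_types:
            block_types.append(element_type)
    updated["extract_image_block_types"] = block_types
    return updated


# Assumed settings shape for a High Res partitioner node.
base_settings = {"strategy": "hi_res"}
partitioner_settings = with_generative_ocr_types(base_settings, ["Image", "Table"])
print(partitioner_settings)
```

The helper copies the input rather than mutating it, so the same base settings can be reused across several workflows.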

Settings

subtype
string
required
Enrichment algorithm and provider. Set at the WorkflowNode level, outside of settings. Allowed values:
  • anthropic_ocr
  • bedrock_ocr
  • openai_ocr
The preceding list applies only to Unstructured Let’s Go and Pay-As-You-Go accounts. For Unstructured Business accounts, to get your current list of available values, contact your Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at support@unstructured.io.
provider_type
string
required
Provider that matches the prefix of subtype. Allowed values: anthropic, bedrock, openai.
model
string
required
Model to use for generative OCR. For a full list of the models available in Unstructured, see Available models.
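# Assumes the Unstructured Python SDK (unstructured-client); the import path may vary by SDK version.
from unstructured_client.models.shared import WorkflowNode

generative_ocr_enrichment_workflow_node = WorkflowNode(
    name="Enrichment",
    # subtype is set at the WorkflowNode level, outside of settings.
    subtype="<subtype>",
    type="prompter",
    settings={
        # provider_type must match the prefix of subtype.
        "provider_type": "<provider-type>",
        "model": "<model>"
    }
)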
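Because provider_type must match the prefix of subtype, the two values can be derived together instead of typed separately. The following is a minimal sketch in plain Python, with no SDK dependency. The function name is hypothetical, and the allowed list reflects only the values given above for Let’s Go and Pay-As-You-Go accounts.

```python
# Subtypes listed above for Let's Go and Pay-As-You-Go accounts.
ALLOWED_SUBTYPES = ("anthropic_ocr", "bedrock_ocr", "openai_ocr")


def build_generative_ocr_node_args(subtype: str, model: str, name: str = "Enrichment") -> dict:
    """Build keyword arguments for a generative OCR enrichment node.

    Hypothetical helper: validates subtype and derives provider_type
    from it by dropping the "_ocr" suffix.
    """
    if subtype not in ALLOWED_SUBTYPES:
        raise ValueError(f"Unsupported subtype: {subtype!r}")
    provider_type = subtype[: -len("_ocr")]
    return {
        "name": name,
        "subtype": subtype,
        "type": "prompter",
        "settings": {"provider_type": provider_type, "model": model},
    }


# "<model>" is a placeholder; substitute a model name from Available models.
node_args = build_generative_ocr_node_args("openai_ocr", "<model>")
print(node_args["settings"]["provider_type"])
```

Assuming the SDK's WorkflowNode accepts these keyword arguments as in the snippet above, the result could be passed as WorkflowNode(**node_args).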