Generative OCR optimization

After partitioning, you can have a vision language model (VLM) improve the fidelity of the text blocks that Unstructured produced during its partitioning phase. The following examples show Unstructured’s initial output for a text block, followed by the more accurate version of that text block after optimization with Claude Sonnet 4. Irrelevant lines of output are omitted for brevity.

Example 1: Vertical watermarked text

Before (vertical watermarked text, represented incorrectly):

{
    "...": "...",
    "text": "3 2 0 2 t c O 9 2 ] V C . s c [ 2 v 9 0 8 6 1 . 0 1 3 2 : v i X r",
    "...": "..."
}

After (vertical watermarked text, now represented correctly from the original content):

{
    "...": "...",
    "text": "arXiv:2310.16809v2 [cs.CV] 29 Oct 2023",
    "...": "..."
}
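
The "before" text in this example is not arbitrary noise: it contains the characters of the watermark in reverse order, without the leading "a" and without word spacing. The following Python sketch only illustrates that relationship. It is not how Unstructured or the VLM performs the correction, and the variable names are made up for this illustration.

```python
# Text values copied from the "before" and "after" examples above.
before = "3 2 0 2 t c O 9 2 ] V C . s c [ 2 v 9 0 8 6 1 . 0 1 3 2 : v i X r"
after = "arXiv:2310.16809v2 [cs.CV] 29 Oct 2023"

# Reverse the order of the space-separated characters and join them.
recovered = "".join(reversed(before.split(" ")))

# Strip spaces from the corrected text so the two can be compared directly.
expected = after.replace(" ", "")

print(recovered)
# The initial output lost the leading "a" and all word spacing,
# so compare against the corrected text minus its first character.
print(recovered == expected[1:])
```

Simple reversal cannot restore the missing character or the spacing, which is why the correction is made from the original page content rather than from the garbled string.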

Example 2: Hyperlink

Before (hyperlink, represented incorrectly):

{
    "...": "...",
    "text": "con/Yuliang-Liu/MultinodalOCR|",
    "...": "..."
}

After (hyperlink, now represented correctly from the original content):

{
    "...": "...",
    "text": "https://github.com/Yuliang-Liu/MultimodalOCR",
    "...": "..."
}

Example 3: Chinese characters

Before (Chinese characters, represented incorrectly):

{
    "...": "...",
    "text": "GT SHE GPT4-V: EHES",
    "...": "..."
}

After (Chinese characters, now represented correctly from the original content, expressed as Unicode escape sequences):

{
    "...": "...",
    "text": "GT : \u91d1\u724c\u70e7\u814a GPT4-V: \u6587\u9759\u5019\u9e1f",
    "...": "..."
}
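
The \uXXXX sequences above are standard JSON escapes, so any JSON parser turns them back into the actual characters. As a minimal sketch, assuming the element has been reduced to just its "text" field, Python's built-in json module can decode the value and write it back out unescaped:

```python
import json

# The "after" element from Example 3, reduced to its "text" field.
# A raw string keeps the backslash escapes exactly as they appear in the JSON.
raw = r'{"text": "GT : \u91d1\u724c\u70e7\u814a GPT4-V: \u6587\u9759\u5019\u9e1f"}'

# json.loads decodes the \uXXXX escapes into real characters.
element = json.loads(raw)
text = element["text"]
print(text)

# ensure_ascii=False keeps the characters readable when re-serializing.
print(json.dumps(element, ensure_ascii=False))
```

By default, json.dumps escapes non-ASCII characters again, which is one common reason output files contain the escaped form.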

Improve text fidelity with generative OCR

To have Unstructured perform generative OCR optimization, do the following:

For Unstructured Pipelines users, add an Enrichment node of type Generative OCR to an Unstructured custom workflow.
For Unstructured API users, add a Generative OCR task. You add this task either as an object in a workflow_nodes array (for curl) or as a WorkflowNode in a WorkflowNodes collection (for Python). This object or collection applies whenever you create a workflow, update a workflow, or create a workflow job that processes local files.
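
As a rough sketch of the API option, the following Python snippet builds the kind of object that might go into a workflow_nodes array and prints it as JSON. Only the workflow_nodes name comes from this page. The keys name, type, subtype, and settings, and all of their values, are assumptions made for illustration, so confirm them against the Unstructured API reference before use. The snippet uses plain dictionaries and does not call any Unstructured library.

```python
import json

# Hypothetical node definition. Every key and value here is an assumption
# for illustration; check the Unstructured API reference for the real ones.
generative_ocr_node = {
    "name": "Generative OCR",    # display name (assumed)
    "type": "prompter",          # node type (assumed)
    "subtype": "anthropic_ocr",  # node subtype (assumed)
    "settings": {},              # no settings assumed
}

# Generative OCR runs after partitioning, so a real workflow would list
# a partitioner node before this one (omitted here).
workflow_nodes = [
    generative_ocr_node,
]

# Other request fields (workflow name, source, destination, and so on) are omitted.
payload = {"workflow_nodes": workflow_nodes}
print(json.dumps(payload, indent=4))
```

The printed JSON has the shape of the workflow_nodes array that a curl request body would carry. In the Python option, each object would instead be a WorkflowNode.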
