Fast strategy - Unstructured

from unstructured_client.models.shared import WorkflowNode

fast_partitioner_workflow_node = WorkflowNode(
    name="Partitioner",
    subtype="unstructured_api",
    type="partition",
    settings={
        "strategy": "fast",
        "include_page_breaks": <True|False>,
        "pdf_infer_table_structure": <True|False>,
        "exclude_elements": [
            "<element-name>",
            "<element-name>"
        ],
        "xml_keep_tags": <True|False>,
        "encoding": "<encoding>",
        "ocr_languages": [
            "<language-code>",
            "<language-code>"
        ],
        "extract_image_block_types": [
            "image",
            "table"
        ],
        "infer_table_structure": <True|False>,
        "coordinates": <True|False>
    }
)

{
    "name": "Partitioner",
    "type": "partition",
    "subtype": "unstructured_api",
    "settings": {
        "strategy": "fast",
        "include_page_breaks": <true|false>,
        "pdf_infer_table_structure": <true|false>,
        "exclude_elements": [
            "<element-name>",
            "<element-name>"
        ],
        "xml_keep_tags": <true|false>,
        "encoding": "<encoding>",
        "ocr_languages": [
            "<language-code>",
            "<language-code>"
        ],
        "extract_image_block_types": [
            "image",
            "table"
        ],
        "infer_table_structure": <true|false>,
        "coordinates": <true|false>
    }
}

Type: partition
Subtype: unstructured_api

For more information about how the various partitioning strategies balance output quality, speed, and cost, see Partitioning.
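
As a concrete illustration, the placeholders in the templates above might be filled in as follows. This is a minimal sketch: the chosen values are illustrative assumptions, not recommendations, and the node is built as a plain Python dict that the standard json module serializes into the JSON form.

```python
import json

# Illustrative values only; every setting name comes from the templates above.
fast_partitioner_node = {
    "name": "Partitioner",
    "type": "partition",
    "subtype": "unstructured_api",
    "settings": {
        "strategy": "fast",
        "include_page_breaks": True,
        "exclude_elements": ["Header", "Footer"],
        "xml_keep_tags": False,
        "encoding": "utf-8",
        "ocr_languages": ["eng"],
        "infer_table_structure": False,
        "coordinates": False,
    },
}

# json.dumps converts Python True/False into JSON true/false.
node_json = json.dumps(fast_partitioner_node, indent=4)
print(node_json)
```

The same settings dict could be passed as the settings argument of the Python template.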

Usage guidance

Use this strategy for documents that already contain an extractable text layer, such as digitally created PDFs, Office files, HTML, and email. It is the fastest and lowest-cost option, but it does not run OCR or detect layout structure, so it is unsuitable for scanned documents, images, or documents that require table detection or complex layout analysis. For those cases, use High Res instead.
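
The guidance above can be sketched as a small decision helper. The function name and its two boolean inputs are hypothetical and exist only for illustration; the strategy names fast and hi_res are the ones used on this page.

```python
def choose_strategy(has_text_layer: bool, needs_layout_or_tables: bool) -> str:
    """Pick a partitioning strategy from two facts about the document.

    Hypothetical helper: returns "fast" only when the text can be read
    directly and no layout or table analysis is required.
    """
    if has_text_layer and not needs_layout_or_tables:
        return "fast"
    # Scanned documents, images, and layout-heavy files need High Res.
    return "hi_res"


print(choose_strategy(has_text_layer=True, needs_layout_or_tables=False))
print(choose_strategy(has_text_layer=False, needs_layout_or_tables=False))
```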

Settings

strategy

string

required

Partitioning strategy to use. Must be set to fast.

include_page_breaks

boolean

If true, includes page breaks in the output where supported by the file type. Default: false.

pdf_infer_table_structure

boolean

Applies only to the hi_res strategy and has no effect here. Default: false.

exclude_elements

array

List of Unstructured element types to exclude from the output. Default: none. Allowed values:

Address
EmailAddress
FigureCaption
Footer
Formula
Header
Image
ListItem
NarrativeText
PageBreak
Table
Title
UncategorizedText
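
Because the allowed values are case-sensitive element names, it may help to check an exclude_elements list before submitting it. The sketch below is a hypothetical client-side check, not part of any Unstructured library; the set simply copies the allowed values listed above.

```python
# Allowed values copied from the exclude_elements list above.
ALLOWED_EXCLUDE_ELEMENTS = frozenset({
    "Address", "EmailAddress", "FigureCaption", "Footer", "Formula",
    "Header", "Image", "ListItem", "NarrativeText", "PageBreak",
    "Table", "Title", "UncategorizedText",
})


def invalid_exclude_elements(names):
    """Return the names that are not allowed values, in their original order."""
    return [name for name in names if name not in ALLOWED_EXCLUDE_ELEMENTS]


# "footer" is rejected because the comparison is case-sensitive.
print(invalid_exclude_elements(["Header", "footer", "PageBreak"]))
```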

xml_keep_tags

boolean

If true, retains XML tags in the output. If false, extracts only the text content from XML tags. Default: false.

encoding

string

Encoding method used to decode the text input. Default: utf-8.

ocr_languages

array

Languages present in the input, for use in partitioning, OCR, or both. Multiple values indicate the text could be in any of the specified languages. Default: ['eng']. See the language codes list.

extract_image_block_types

array

Unstructured element types for which image blocks are extracted as Base64-encoded data and stored in metadata fields. Default: none. Allowed values:

Abstract
BulletedText
Caption
CodeSnippet
CompositeElement
Figure
FigureCaption
Form
FormKeysValues
Formula
Header
Image
List
List-item
ListItem
NarrativeText
Paragraph
Picture
Table
Text
Threading
Title
UncategorizedText

infer_table_structure

boolean

If true, any table elements extracted from a PDF include an additional text_as_html metadata field containing an HTML <table> representation. Default: false.

coordinates

boolean

If true, each element extracted from a PDF includes position information relative to its page. Default: false.
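
The defaults documented above can be gathered into one place and merged with per-workflow overrides. This is a minimal sketch under stated assumptions: the helper name is hypothetical, and the two settings whose default is "none" are represented here as empty lists.

```python
import copy

# Defaults as documented above. Empty lists stand in for "Default: none",
# which is an assumption of this sketch.
FAST_DEFAULTS = {
    "strategy": "fast",
    "include_page_breaks": False,
    "pdf_infer_table_structure": False,
    "exclude_elements": [],
    "xml_keep_tags": False,
    "encoding": "utf-8",
    "ocr_languages": ["eng"],
    "extract_image_block_types": [],
    "infer_table_structure": False,
    "coordinates": False,
}


def build_fast_settings(**overrides):
    """Return a settings dict: documented defaults plus the given overrides."""
    unknown = set(overrides) - set(FAST_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown settings: {sorted(unknown)}")
    if overrides.get("strategy", "fast") != "fast":
        raise ValueError("strategy must be set to 'fast' for this node")
    # Deep copy so callers cannot mutate the shared default lists.
    settings = copy.deepcopy(FAST_DEFAULTS)
    settings.update(overrides)
    return settings


settings = build_fast_settings(coordinates=True, ocr_languages=["eng", "deu"])
print(settings["coordinates"], settings["ocr_languages"], settings["encoding"])
```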
