Type: partition
Subtype: unstructured_api

Settings

strategy
string
required
Partitioning strategy to use. Must be set to fast.
include_page_breaks
boolean
If true, includes page breaks in the output where supported by the file type. Default: false.
pdf_infer_table_structure
boolean
Applies only to the hi_res strategy and has no effect with the fast strategy. Default: false.
exclude_elements
array
List of Unstructured element types to exclude from the output. Default: none. Allowed values:
  • Address
  • EmailAddress
  • FigureCaption
  • Footer
  • Formula
  • Header
  • Image
  • ListItem
  • NarrativeText
  • PageBreak
  • Table
  • Title
  • UncategorizedText
xml_keep_tags
boolean
If true, retains XML tags in the output. If false, extracts only the text content from XML tags. Default: false.
encoding
string
Encoding method used to decode the text input. Default: utf-8.
ocr_languages
array
Languages present in the input, for use in partitioning, OCR, or both. Multiple values indicate the text could be in any of the specified languages. Default: ['eng']. See the language codes list.
extract_image_block_types
array
Unstructured element types for which image blocks are extracted as Base64-encoded data and stored in metadata fields. Default: none. Allowed values:
  • Abstract
  • BulletedText
  • Caption
  • CodeSnippet
  • CompositeElement
  • Figure
  • FigureCaption
  • Form
  • FormKeysValues
  • Formula
  • Header
  • Image
  • List
  • List-item
  • ListItem
  • NarrativeText
  • Paragraph
  • Picture
  • Table
  • Text
  • Threading
  • Title
  • UncategorizedText
infer_table_structure
boolean
If true, any table elements extracted from a PDF include an additional text_as_html metadata field containing an HTML <table> representation. Default: false.
coordinates
boolean
If true, each element extracted from a PDF includes position information relative to its page. Default: false.
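
# Assumes the unstructured-client Python SDK is installed.
# Replace every <...> placeholder with a real value before running.
from unstructured_client.models.shared import WorkflowNode

fast_partitioner_workflow_node = WorkflowNode(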
    name="Partitioner",
    subtype="unstructured_api",
    type="partition",
    settings={
        "strategy": "fast",
        "include_page_breaks": <True|False>,
        "pdf_infer_table_structure": <True|False>,
        "exclude_elements": [
            "<element-name>",
            "<element-name>"
        ],
        "xml_keep_tags": <True|False>,
        "encoding": "<encoding>",
        "ocr_languages": [
            "<language>",
            "<language>"
        ],
        "extract_image_block_types": [
            "image",
            "table"
        ],
        "infer_table_structure": <True|False>,
        "coordinates": <True|False>
    }
)
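
The settings dictionary can also be assembled and checked before it is handed to the node. The following is a minimal sketch, not part of the SDK: the helper name build_fast_partition_settings and the ALLOWED_EXCLUDE_ELEMENTS constant are hypothetical, and the defaults simply mirror the ones listed in the settings above. It assumes that omitting exclude_elements is equivalent to the documented default of none.

```python
from typing import Any, Dict, List, Optional

# Allowed values copied from the exclude_elements setting described above.
ALLOWED_EXCLUDE_ELEMENTS = {
    "Address", "EmailAddress", "FigureCaption", "Footer", "Formula",
    "Header", "Image", "ListItem", "NarrativeText", "PageBreak",
    "Table", "Title", "UncategorizedText",
}


def build_fast_partition_settings(
    exclude_elements: Optional[List[str]] = None,
    include_page_breaks: bool = False,
    xml_keep_tags: bool = False,
    encoding: str = "utf-8",
    ocr_languages: Optional[List[str]] = None,
    infer_table_structure: bool = False,
    coordinates: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build a settings dict for the fast strategy."""
    exclude = list(exclude_elements or [])

    # Reject element names that are not in the documented allowed list.
    unknown = sorted(set(exclude) - ALLOWED_EXCLUDE_ELEMENTS)
    if unknown:
        raise ValueError(f"Unsupported exclude_elements values: {unknown}")

    settings: Dict[str, Any] = {
        "strategy": "fast",  # this node type requires the fast strategy
        "include_page_breaks": include_page_breaks,
        "xml_keep_tags": xml_keep_tags,
        "encoding": encoding,
        "ocr_languages": list(ocr_languages or ["eng"]),
        "infer_table_structure": infer_table_structure,
        "coordinates": coordinates,
    }

    # Assumption: leaving the key out matches the documented default (none).
    if exclude:
        settings["exclude_elements"] = exclude
    return settings


if __name__ == "__main__":
    print(build_fast_partition_settings(exclude_elements=["Header", "Footer"]))
```

The returned dictionary has the same shape as the settings argument in the WorkflowNode example, so it could be passed there as settings=build_fast_partition_settings(...). Validating locally surfaces a misspelled element type before any request is made.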