Type: structured_data_extractor
Subtype: llm

Settings

schema_to_extract
object
required
You must specify exactly one of the following. Structured LLM extraction always runs against a schema: either you supply that schema directly (json_schema), or you supply plain-language instructions (extraction_guidance) and Unstructured derives the extraction schema from that text first.
schema_to_extract.json_schema
string
The extraction schema, in OpenAI Structured Outputs format, for the structured data that you want to extract, expressed as a single string.
schema_to_extract.extraction_guidance
string
Plain-language instructions describing what to extract, expressed as a single string. Unstructured derives an extraction schema from this text (OpenAI Structured Outputs format) before the LLM performs structured extraction.
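Because json_schema must be passed as a single string, one convenient approach is to write the schema as a Python dictionary and serialize it. The following is a minimal sketch, not an official example: the invoice fields are hypothetical, and it assumes that a bare JSON Schema object following the usual Structured Outputs constraints (every property listed in required, and additionalProperties set to false) is what the setting expects.

```python
import json

# Hypothetical invoice schema; the field names are illustrative only.
# Structured Outputs style: every property is listed in "required" and
# "additionalProperties" is false on every object.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "required": ["description", "quantity"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["invoice_number", "total_amount", "line_items"],
    "additionalProperties": False,
}

# The setting takes the schema as one string, so serialize the dictionary.
json_schema_string = json.dumps(invoice_schema)

# Only json_schema is set here; extraction_guidance is left out on purpose.
schema_to_extract = {"json_schema": json_schema_string}
```

Keeping the schema as a dictionary until the last step avoids hand-escaping quotes inside a long string literal.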
output_mode
string
The mode in which to output the extracted data. Allowed values:
  • elements_with_extracted_data: Output the extracted data as JSON into an extracted_data field inside of metadata within a parent DocumentData element, followed by other built-in Unstructured document elements.
  • extracted_data_only: Output only the extracted data as JSON, without any parent DocumentData element or any other built-in Unstructured document elements.
Default: elements_with_extracted_data.
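The two output modes differ in where the extracted JSON ends up, which affects how downstream code reads it. The sketch below is illustrative only: it assumes the workflow output has already been loaded into Python objects, that each element is a dictionary with "type" and "metadata" keys, and that in extracted_data_only mode the loaded output is the extracted data itself. The helper name get_extracted_data and the sample values are hypothetical.

```python
from typing import Any


def get_extracted_data(output: Any, output_mode: str = "elements_with_extracted_data") -> Any:
    """Return the extracted data from already-parsed workflow output."""
    if output_mode == "extracted_data_only":
        # Assumption: the output is the extracted JSON, with no wrapper element.
        return output
    if output_mode != "elements_with_extracted_data":
        raise ValueError(f"Unknown output_mode: {output_mode!r}")
    # Assumption: the output is a list of element dictionaries, one of which
    # is the parent DocumentData element carrying metadata.extracted_data.
    for element in output:
        if element.get("type") == "DocumentData":
            return element.get("metadata", {}).get("extracted_data")
    return None


# Made-up sample output in the default mode, for illustration only.
sample_output = [
    {"type": "DocumentData", "metadata": {"extracted_data": {"invoice_number": "A-1"}}},
    {"type": "Title", "text": "Invoice", "metadata": {}},
]

extracted = get_extracted_data(sample_output)
```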
provider
string
required
LLM provider to use. Allowed values: anthropic, azure_openai, bedrock, openai.
model
string
required
LLM model to use. For a full list of the models available in Unstructured, see Available models.
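# Assumes the unstructured-client Python SDK is installed.
from unstructured_client.models.shared import WorkflowNode

extract_workflow_node = WorkflowNode(
    name="Extractor",
    subtype="llm",
    type="structured_data_extractor",
    settings={
        # Specify exactly one of json_schema or extraction_guidance, not both.
        "schema_to_extract": {
            "json_schema": "<json-schema>"
            # Or, instead of json_schema:
            # "extraction_guidance": "<extraction-guidance>"
        },
        # Optional; defaults to elements_with_extracted_data.
        "output_mode": "<elements_with_extracted_data | extracted_data_only>",
        "provider": "<provider>",
        "model": "<model>"
    }
)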
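The constraints described in the settings above (exactly one of json_schema or extraction_guidance, a fixed set of providers, and two output modes) can be checked locally before a node is created. This is a minimal sketch of one way to do that. The helper build_extractor_settings is hypothetical and not part of any Unstructured SDK; it only returns a plain dictionary that could be passed as the node's settings.

```python
from typing import Optional

# Allowed values as listed in the settings reference above.
ALLOWED_PROVIDERS = {"anthropic", "azure_openai", "bedrock", "openai"}
ALLOWED_OUTPUT_MODES = {"elements_with_extracted_data", "extracted_data_only"}


def build_extractor_settings(
    provider: str,
    model: str,
    json_schema: Optional[str] = None,
    extraction_guidance: Optional[str] = None,
    output_mode: str = "elements_with_extracted_data",
) -> dict:
    """Build a settings dictionary, rejecting combinations the reference disallows."""
    # Exactly one of the two schema sources must be provided.
    if (json_schema is None) == (extraction_guidance is None):
        raise ValueError("Specify exactly one of json_schema or extraction_guidance.")
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    if output_mode not in ALLOWED_OUTPUT_MODES:
        raise ValueError(f"Unsupported output_mode: {output_mode!r}")
    if not model:
        raise ValueError("model is required.")

    if json_schema is not None:
        schema_to_extract = {"json_schema": json_schema}
    else:
        schema_to_extract = {"extraction_guidance": extraction_guidance}

    return {
        "schema_to_extract": schema_to_extract,
        "output_mode": output_mode,
        "provider": provider,
        "model": model,
    }


# Example with plain-language guidance; "<model>" stays a placeholder.
settings = build_extractor_settings(
    provider="openai",
    model="<model>",
    extraction_guidance="Extract the invoice number and the total amount.",
)
```

Validating up front surfaces a misconfigured node as a local ValueError rather than as a failed workflow run.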