Type: structured_data_extractor
Subtype: llm
The following 4-minute video provides an overview of the structured data extractor.
Settings
You must specify exactly one of the following. Structured LLM extraction always runs against a schema: either you supply that schema directly (json_schema), or you supply plain-language instructions (extraction_guidance) and Unstructured derives the extraction schema from that text first.
json_schema: The extraction schema, in OpenAI Structured Outputs format, for the structured data that you want to extract, expressed as a single string.
extraction_guidance: Plain-language instructions describing what to extract, expressed as a single string. Unstructured derives an extraction schema from this text (OpenAI Structured Outputs format) before the LLM performs structured extraction.
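The "exactly one of" rule and the "single string" requirement can be sketched as follows. This is a minimal illustration, not Unstructured's own code: the helper build_schema_setting, the invoice fields, and the shape of the returned dictionary are hypothetical, and the example schema assumes the usual strict-mode conventions of OpenAI Structured Outputs (every property listed in required, additionalProperties set to false). Only the names json_schema and extraction_guidance come from this page.

```python
import json


def build_schema_setting(json_schema=None, extraction_guidance=None):
    """Return a settings fragment containing exactly one of the two options.

    Hypothetical helper for illustration; not part of any Unstructured SDK.
    """
    # Reject the cases where both or neither option is supplied.
    if (json_schema is None) == (extraction_guidance is None):
        raise ValueError(
            "Specify exactly one of json_schema or extraction_guidance"
        )
    if json_schema is not None:
        # The schema must be expressed as a single string, so serialize it.
        return {"json_schema": json.dumps(json_schema)}
    return {"extraction_guidance": extraction_guidance}


# Hypothetical schema for an invoice, written as a plain JSON Schema object.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
    "additionalProperties": False,
}

# Option 1: supply the schema directly.
schema_setting = build_schema_setting(json_schema=invoice_schema)

# Option 2: supply plain-language instructions instead.
guidance_setting = build_schema_setting(
    extraction_guidance="Extract the invoice number and the total amount."
)
```

Serializing with json.dumps keeps the schema valid JSON inside the string, which avoids hand-escaping quotes when the setting is embedded in a larger request body.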
The mode in which to output the extracted data. Allowed values:
elements_with_extracted_data: Output the extracted data as JSON into an extracted_data field inside of metadata within a parent DocumentData element, followed by other built-in Unstructured document elements.
extracted_data_only: Output only the extracted data as JSON, without any parent DocumentData element or any other built-in Unstructured document elements.
The default is elements_with_extracted_data.
LLM provider to use. Allowed values: anthropic, azure_openai, bedrock, openai.
LLM model to use. For a full list of the models available in Unstructured, see Available models.
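For output produced in the elements_with_extracted_data mode, the extracted JSON sits in metadata.extracted_data of a DocumentData element. The sketch below shows one way to pull it out of a list of elements. It assumes each element is a dictionary with a type key and a metadata dictionary; the function name find_extracted_data and the sample values are hypothetical and are not real output.

```python
def find_extracted_data(elements):
    """Return extracted_data from the first DocumentData element, or None.

    Assumes each element is a dict with "type" and "metadata" keys.
    """
    for element in elements:
        if element.get("type") == "DocumentData":
            return element.get("metadata", {}).get("extracted_data")
    return None


# Hypothetical elements, shaped like elements_with_extracted_data output.
sample_elements = [
    {
        "type": "DocumentData",
        "metadata": {"extracted_data": {"invoice_number": "INV-001"}},
    },
    {"type": "Title", "text": "Invoice", "metadata": {}},
]

extracted = find_extracted_data(sample_elements)
```

In the extracted_data_only mode there is no parent DocumentData element, so a lookup like this would not apply; the output is the extracted JSON itself.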

