# LLM

> The LLM method uses a language model to extract structured data fields from partitioned documents based on a JSON schema or plain-language extraction guidance.

*Type*: `structured_data_extractor`

*Subtype*: `llm`

The following 4-minute video provides an overview of the structured data extractor.

<iframe width="560" height="315" src="https://www.youtube.com/embed/bbpAu-_cxK8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

## Settings

<ParamField body="schema_to_extract" type="object" required>
  You must specify **exactly one** of `json_schema` or `extraction_guidance`. Structured LLM extraction always runs against a schema: either you supply that schema directly (`json_schema`), or you supply plain-language instructions (`extraction_guidance`) and Unstructured first derives the extraction schema from that text.
</ParamField>

<ParamField body="schema_to_extract.json_schema" type="string">
  The extraction schema, in [OpenAI Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas) format, for the structured data that you want to extract, expressed as a single string. Per OpenAI's guidelines, the maximum supported JSON schema nesting depth is 10 levels.
</ParamField>
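
Because `json_schema` is a string rather than a nested object, the schema has to be serialized before it is placed in the settings. The following is a minimal sketch, assuming the bare schema object is what gets serialized. The invoice fields and the `nesting_depth` helper are hypothetical, and the helper only approximates how nesting levels might be counted.

```python
import json

# Hypothetical schema for illustration; the field names are not from Unstructured.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["invoice_number", "total", "line_items"],
    "additionalProperties": False,
}


def nesting_depth(schema: dict) -> int:
    """Roughly count nesting levels by following `properties` and `items`."""
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    return 1 + max((nesting_depth(child) for child in children), default=0)


# Rough client-side guard for the 10-level limit mentioned above.
if nesting_depth(invoice_schema) > 10:
    raise ValueError("Schema is nested more deeply than 10 levels.")

# Serialize the schema into the single string that `json_schema` expects.
json_schema_string = json.dumps(invoice_schema)
```

The resulting `json_schema_string` can then be used as the value of `schema_to_extract.json_schema`.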

<ParamField body="schema_to_extract.extraction_guidance" type="string">
  Plain-language instructions describing what to extract, expressed as a single string. Unstructured derives an extraction schema from this text (OpenAI Structured Outputs format) before the LLM performs structured extraction.
</ParamField>
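
The exactly-one rule for `schema_to_extract` can be checked on the client before a workflow node is created. The following is a minimal sketch; `build_schema_to_extract` is a hypothetical helper and not part of any Unstructured SDK.

```python
from typing import Optional


def build_schema_to_extract(
    json_schema: Optional[str] = None,
    extraction_guidance: Optional[str] = None,
) -> dict:
    """Return a `schema_to_extract` dict containing exactly one of the two keys."""
    candidates = {
        "json_schema": json_schema,
        "extraction_guidance": extraction_guidance,
    }
    # Keep only the options that the caller actually supplied.
    provided = {key: value for key, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "Specify exactly one of json_schema or extraction_guidance."
        )
    return provided


schema_to_extract = build_schema_to_extract(
    extraction_guidance="Extract the invoice number and the total amount due."
)
```

Passing both arguments, or neither, raises a `ValueError` instead of producing settings that break the rule.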

<ParamField body="output_mode" type="string">
  The mode in which to output the extracted data. Allowed values:

  * `elements_with_extracted_data`: Output the extracted data as JSON in an `extracted_data` field inside `metadata` within a parent `DocumentData` element, followed by the other built-in Unstructured document elements.
  * `extracted_data_only`: Output only the extracted data as JSON, without any parent `DocumentData` element or any other built-in Unstructured document elements.

  Default: `elements_with_extracted_data`.
</ParamField>
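
With `elements_with_extracted_data`, the extracted data has to be picked out of the element list. The following is a minimal sketch, assuming the output is a list of element dictionaries and that each element identifies itself through a `type` key. That key name, the `get_extracted_data` helper, and the sample elements are assumptions for illustration, not real output.

```python
from typing import Any, Optional


def get_extracted_data(elements: list[dict[str, Any]]) -> Optional[Any]:
    """Return `metadata.extracted_data` from the first `DocumentData` element."""
    for element in elements:
        # Assumption: elements carry their kind in a "type" key.
        if element.get("type") == "DocumentData":
            return element.get("metadata", {}).get("extracted_data")
    return None


# Made-up sample shaped like the description above; not real Unstructured output.
sample_elements = [
    {
        "type": "DocumentData",
        "text": "",
        "metadata": {"extracted_data": {"invoice_number": "INV-001"}},
    },
    {"type": "Title", "text": "Invoice", "metadata": {}},
]

extracted = get_extracted_data(sample_elements)
```

With `extracted_data_only`, no such lookup is needed because the output contains only the extracted data.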

<ParamField body="provider" type="string" required>
  The LLM provider to use. Allowed values: `anthropic`, `azure_openai`, `bedrock`, `openai`.
</ParamField>

<ParamField body="model" type="string" required>
  The model to use from the selected provider. For a full list of the models available in Unstructured, see [Available models](/api-reference/workflow/models).
</ParamField>

[Learn more](/concepts/structured-data-extractor/data-extractor).

<RequestExample>
  ```python Python SDK theme={null}
  extract_workflow_node = WorkflowNode(
      name="Extractor",
      subtype="llm",
      type="structured_data_extractor",
      settings={
          "schema_to_extract": {
              # Specify exactly one of these two keys, not both.
              "json_schema": "<json-schema>",
              "extraction_guidance": "<extraction-guidance>"
          },
          "output_mode": "<elements_with_extracted_data | extracted_data_only>",
          "provider": "<provider>",
          "model": "<model>"
      }
  )
  ```

  ```json cURL theme={null}
  {
      "name": "Extractor",
      "type": "structured_data_extractor",
      "subtype": "llm",
      "settings": {
          "schema_to_extract": {
              "json_schema": "<json-schema>",
              "extraction_guidance": "<extraction-guidance>"
          },
          "output_mode": "<elements_with_extracted_data | extracted_data_only>",
          "provider": "<provider>",
          "model": "<model>"
      }
  }
  ```
</RequestExample>
