# Structured Extraction with LLM

When the extraction method is **LLM**, a model reads meaning from your documents and populates schema-defined fields with inferred values. This page covers the options for this method: schema definition, model selection, schema prompts, and extraction guidance. To compare LLM and Regex before choosing, see [Choose an extraction method](/concepts/structured-data-extractor/choose-extraction-method).

* For **Unstructured UI** users, see [Unstructured UI settings for structured extraction with LLM](#unstructured-ui-settings-for-structured-extraction-with-llm).
* For **Unstructured API** users, see [Unstructured API settings for structured extraction with LLM](#unstructured-api-settings-for-structured-extraction-with-llm).

## Unstructured UI settings for structured extraction with LLM

The following sections describe how to use the [Unstructured user interface (UI)](/ui/overview) to specify settings for
structured extraction with LLM.

### Define your schema (UI only)

In the Unstructured UI, you can build your extraction schema directly in the visual schema builder, or generate a starting point from a plain-language prompt. Once generated, you can refine the schema in the builder and export it as JSON. Be aware that generating a new schema from the plain-language prompt will overwrite any existing builder content.

<Tip>If you already have a schema in the visual schema builder and want to try generating one from a plain-language prompt, export your current schema to a JSON file first. You can upload it again later if you prefer the original.</Tip>

<h3 id="upload-a-json-file">
  Visual schema builder and JSON upload/export (UI only)
</h3>

In the Unstructured UI, on the **Start** page or in the [workflow editor](/ui/workflows#create-a-custom-workflow), you can access the visual schema builder in the **Define Schema** view. From there you can:

* **Upload** a JSON file to the editor.
* **Edit** the fields in the schema directly in the editor.
* **Export** the schema you have defined to a JSON file for reuse.

An extraction schema is a JSON-formatted schema that defines the structure of the data that Unstructured extracts. If you already have an extraction schema defined in a JSON file, you can click **Upload JSON** to upload the file to Unstructured.

<Note>The schema must conform to the [OpenAI Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas) guidelines, which are a subset of the [JSON Schema](https://json-schema.org/docs) language.</Note>

The following shows the extraction schema for the sample real estate listing — first in the visual schema builder, then as a JSON schema file.

**The LLM visual schema builder:**

<img src="https://mintcdn.com/unstructured-53/7hVji782dj7Jt1Mr/img/ui/data-extractor/schema-editor-llm-export-schema.png?fit=max&auto=format&n=7hVji782dj7Jt1Mr&q=85&s=3ea22f4330b7d8088ee9fa031224f141" alt="LLM visual schema builder showing an extraction schema with the Export schema as JSON option" width="721" height="572" data-path="img/ui/data-extractor/schema-editor-llm-export-schema.png" />

**JSON schema file:**

```json theme={null}
{
  "type": "object",
  "properties": {
    "street_address": {
      "type": "string",
      "description": "The full street address of the property including street number, street name, city, state, and postal code"
    },
    "square_footage": {
      "type": "number",
      "description": "The total living space area of the property, in square feet"
    },
    "price": {
      "type": "number",
      "description": "The listed selling price of the property, in local currency"
    },
    "features": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "A list of property features and highlights"
    },
    "agent_contact": {
      "type": "object",
      "properties": {
        "phone": {
          "type": "string",
          "description": "The agent's contact phone number"
        }
      },
      "required": [
        "phone"
      ],
      "additionalProperties": false,
      "description": "Contact information for the real estate agent"
    }
  },
  "additionalProperties": false,
  "required": [
    "street_address",
    "square_footage",
    "price",
    "features",
    "agent_contact"
  ]
}
```
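Before uploading a schema, it can help to check it locally. The following Python function is a minimal sketch, not part of Unstructured, that checks two rules from the OpenAI Structured Outputs guidelines: every object sets `additionalProperties` to `false`, and every property is listed in `required`. The function name `find_strict_schema_problems` and the shortened `listing_schema` are illustrative assumptions, and the check does not cover the rest of the guidelines.

```python
from typing import Any


def find_strict_schema_problems(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return readable problems found in the schema; an empty list means none."""
    problems: list[str] = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        # Rule 1: objects must not allow undeclared properties.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        # Rule 2: every declared property must be listed in "required".
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: properties not listed in required: {missing}")
        # Check nested objects and arrays the same way.
        for name, subschema in properties.items():
            problems.extend(find_strict_schema_problems(subschema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(find_strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


# A shortened version of the real estate listing schema, for illustration only.
listing_schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "agent_contact": {
            "type": "object",
            "properties": {"phone": {"type": "string"}},
            "required": ["phone"],
            "additionalProperties": False,
        },
    },
    "required": ["price", "agent_contact"],
    "additionalProperties": False,
}

print(find_strict_schema_problems(listing_schema))  # prints []
```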

<h3 id="prompt-a-schema">
  Plain language in a schema prompt (UI only)
</h3>

In the Unstructured UI, you can specify your extraction schema with a schema prompt instead
of building it in the visual schema builder or uploading a JSON schema.

A **schema prompt** is plain-language instructions that describe what to extract from your documents, similar to a prompt you would give a chatbot or AI agent. Unstructured *generates* an extraction schema from those instructions: a structured definition (fields, types, and constraints) that guides extraction from the source documents.

<Note>This option is only available from the **Start** page.</Note>

From the **Start** page, click **Suggest**, enter your prompt in the **Prompt a Schema** dialog, and then click **Generate schema**. Unstructured generates a schema from your prompt and displays it in the visual schema builder.

<img src="https://mintcdn.com/unstructured-53/7hVji782dj7Jt1Mr/img/ui/data-extractor/prompt-schema.png?fit=max&auto=format&n=7hVji782dj7Jt1Mr&q=85&s=8040bd0d5a070130c7cddf6c7b92e7ec" alt="Prompt a Schema dialog showing a plain-language prompt for a real estate listing" width="619" height="296" data-path="img/ui/data-extractor/prompt-schema.png" />

<Warning>Selecting **Generate schema** overwrites the existing schema that's displayed in the **Define Schema** pane. To save the current schema before generating a new one, click the ellipsis (three dots) icon, and then click **Export schema as JSON**.</Warning>

After the schema is generated, you can continue to edit it in the visual schema builder.

For this real estate listing example, you might enter the following prompt:

```text theme={null}
Extract the following information from the listing, and present it in the following format:

- street_address: The full street address of the property including street number, street name, city, state, and postal code.
- square_footage: The total living space area of the property, in square feet.
- price: The listed selling price of the property, in local currency.
- features: A list of property features and highlights.
- agent_contact: Contact information for the real estate agent.
- phone: The agent's contact phone number.
```

The following image shows the generated schema that displays in the visual schema builder.

<img src="https://mintcdn.com/unstructured-53/7hVji782dj7Jt1Mr/img/ui/data-extractor/schema-editor-llm-from-prompt.png?fit=max&auto=format&n=7hVji782dj7Jt1Mr&q=85&s=4fb3cec88698693d2028632d189f9e22" alt="LLM visual schema builder displaying a schema generated from a plain-language prompt" width="694" height="572" data-path="img/ui/data-extractor/schema-editor-llm-from-prompt.png" />

<h3 id="select-llm-provider-and-model">
  Select your LLM provider and model (UI only)
</h3>

In the Unstructured UI, you can choose which provider and model the **LLM** extraction method uses. For **Model**, select a provider and model from the drop-down list.

<img src="https://mintcdn.com/unstructured-53/7hVji782dj7Jt1Mr/img/ui/data-extractor/select-model.png?fit=max&auto=format&n=7hVji782dj7Jt1Mr&q=85&s=a34c9382cd79fe52ebdb6a1c033f4da5" alt="Provider and model selection dropdown in the workflow editor Extract node" width="484" height="162" data-path="img/ui/data-extractor/select-model.png" />

<Note>This option is only available from the [workflow editor](/ui/workflows#create-a-custom-workflow).</Note>

### Configure your output (UI only)

In the Unstructured UI, your schema determines which fields to extract and what types they return, and two settings control what the output looks like. **Schema-only output** lets you strip away Unstructured's document elements and return just the extracted fields. **Extraction guidance** lets you tell the LLM how to format, normalize, or summarize values into the fields your schema defines.

<h3 id="schema-only-output-llm">
  Schema-only output (UI only)
</h3>

In the Unstructured UI, the **Schema-Only Output** setting controls whether
Unstructured strips away its document elements and returns just the extracted fields.

The **Schema-Only Output** setting applies to both the LLM and Regex extraction methods. In the **workflow editor**, select the workflow’s **Extract** node. Under **Output settings**, you can set **Schema-Only Output** to ON or OFF whenever you edit the workflow.

* When **Schema-Only Output** is ON, the **Extract** node returns only the JSON produced for your explicitly defined fields.

  In workflow JSON, that is the **extracted data only** layout from [Custom defined output](/concepts/structured-data-extractor/data-extractor#custom-defined-output) (no surrounding Unstructured element list).
* When **Schema-Only Output** is OFF (the default), Unstructured also emits the usual document elements and metadata alongside those extracted values.

  In workflow JSON, that is the **elements with extracted data** layout from the same [Custom defined output](/concepts/structured-data-extractor/data-extractor#custom-defined-output) section (structured fields under `DocumentData` plus the rest of the element list).

<img src="https://mintcdn.com/unstructured-53/7hVji782dj7Jt1Mr/img/ui/data-extractor/schema-only-output-toggle.png?fit=max&auto=format&n=7hVji782dj7Jt1Mr&q=85&s=42b9e7f1be1bcbbd1558cbb6b981eefa" alt="Schema-Only Output toggle in the Extract node Output settings" width="572" height="113" data-path="img/ui/data-extractor/schema-only-output-toggle.png" />

<Note>This option is only available from the **workflow editor**.</Note>

<h3 id="extraction-guidance-workflow-editor">
  Extraction guidance (UI only)
</h3>

In the Unstructured UI, in the [workflow editor](/ui/workflows#create-a-custom-workflow), use the **Extraction Guidance Prompt** to tell the LLM how to format, normalize, or present values *after* your schema defines which fields to extract.

<Note>This option is only available from the **workflow editor**.</Note>

The schema still defines *what* to extract (fields, types, and constraints). *Extraction guidance* adds plain-language direction for *how* to format, normalize, or summarize that output when JSON Schema alone is not enough. For example, you can ask the model to standardize addresses, return dates in a consistent format, or summarize long text into a predefined field. You can save this guidance in the **workflow editor** with the **Extract** node settings and with the workflow you're defining, so later runs, including API operations against that workflow, use the same guidance.

You can add or revise an **Extraction Guidance Prompt** in the **workflow editor** after you add or select the **Extract** node. From the **structured data extractor**, click **+ Add Prompt** to enter plain-language instructions for how the LLM should format or present values after your schema has defined the fields. Saving writes the prompt into the node's settings. Extracted values must still conform to the schema; the prompt only describes presentation and cleanup on top of that contract. You can edit and save the extraction guidance again as you iterate.

<img src="https://mintcdn.com/unstructured-53/7hVji782dj7Jt1Mr/img/ui/data-extractor/extraction-guidance-prompt.png?fit=max&auto=format&n=7hVji782dj7Jt1Mr&q=85&s=ce25acb9b77b8511e69e2965f8596bd6" alt="Extraction Guidance Prompt field in the workflow editor Extract node" width="619" height="296" data-path="img/ui/data-extractor/extraction-guidance-prompt.png" />

## Unstructured API settings for structured extraction with LLM

The following sections describe how to use the [Unstructured API](/api-reference/overview) to specify settings for
structured extraction with LLM.

<h3 id="api-settings">
  Define your schema (API only)
</h3>

An *extraction schema* is a JSON-formatted schema that defines the structure of the data that Unstructured extracts.

<Note>The schema must conform to the [OpenAI Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas) guidelines, which are a subset of the [JSON Schema](https://json-schema.org/docs) language.</Note>

To specify an extraction schema with the Unstructured API, use the [LLM method of an Extract node](/api-reference/workflow/nodes/extract/extract-llm).
In this node, set the `schema_to_extract.json_schema` key in the `settings` object.
Specify the node either as an object in a `workflow_nodes` array
(for curl) or as a `WorkflowNode` in a `WorkflowNodes` collection (for Python). This object or collection applies whenever you
[create a workflow](/api-reference/api/workflow/create-workflow),
[update a workflow](/api-reference/api/workflow/update-workflow), or
[create an on-demand workflow job](/api-reference/api/job/create-job).
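As a rough illustration, the following Python sketch loads a schema from JSON text and places it under `schema_to_extract.json_schema` in a node's `settings` object. It uses plain dictionaries rather than an SDK. It assumes that the dotted key maps to nested objects, the node `name` is a placeholder, and other node fields that the API might require are omitted. See the Extract node reference linked above for the exact request shape.

```python
import json

# A shortened extraction schema, as it might appear in a JSON file.
schema_json = """
{
  "type": "object",
  "properties": {
    "price": {
      "type": "number",
      "description": "The listed selling price of the property, in local currency"
    }
  },
  "required": ["price"],
  "additionalProperties": false
}
"""

extract_node = {
    "name": "Extractor",  # Placeholder name (assumption).
    "settings": {
        # Assumes the dotted key schema_to_extract.json_schema is nested like this.
        "schema_to_extract": {
            "json_schema": json.loads(schema_json),
        },
    },
}

# The node is one entry in the workflow_nodes array of the request body.
payload = {"workflow_nodes": [extract_node]}
print(json.dumps(payload, indent=2))
```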

### Specify your LLM provider and model (API only)

You must specify an LLM provider and model for Unstructured to perform the extraction. To do this with the Unstructured API,
use the [LLM method of an Extract node](/api-reference/workflow/nodes/extract/extract-llm).
In this node, set the `provider` and `model` keys in the `settings` object.
Specify the node either as an object in a `workflow_nodes` array
(for curl) or as a `WorkflowNode` in a `WorkflowNodes` collection (for Python). This object or collection applies whenever you
[create a workflow](/api-reference/api/workflow/create-workflow),
[update a workflow](/api-reference/api/workflow/update-workflow), or
[create an on-demand workflow job](/api-reference/api/job/create-job).
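A minimal sketch of setting these two keys on an existing `settings` dictionary might look like the following. The helper name `with_model` is hypothetical, and `example-provider` and `example-model` are placeholders rather than real values. Check the Extract node reference for the providers and models that are actually supported.

```python
from copy import deepcopy


def with_model(settings: dict, provider: str, model: str) -> dict:
    """Return a copy of an Extract node settings dict with provider and model set."""
    updated = deepcopy(settings)  # Leave the caller's dictionary unchanged.
    updated["provider"] = provider
    updated["model"] = model
    return updated


# Placeholder settings with a minimal schema, for illustration only.
base_settings = {"schema_to_extract": {"json_schema": {"type": "object"}}}

# "example-provider" and "example-model" are placeholders, not real identifiers.
settings = with_model(base_settings, "example-provider", "example-model")
print(settings["provider"], settings["model"])
```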

### Configure your output (API only)

Your schema determines which fields to extract and what types they return, and two settings control what the output looks like. **Schema-only output** lets you strip away Unstructured's document elements and return just the extracted fields. **Extraction guidance** lets you tell the LLM how to format, normalize, or summarize values into the fields your schema defines.

### Schema-only output (API only)

You can use the `output_mode` setting with the Unstructured API to control whether
Unstructured strips away its document elements and returns just the extracted fields:

* Set `output_mode` to `extracted_data_only` to output only the extracted data as JSON, without any parent `DocumentData` element or any other built-in Unstructured document elements.
* Set `output_mode` to `elements_with_extracted_data` to output the extracted data as JSON inside a parent `DocumentData` element. This element is
  included alongside the other built-in Unstructured document elements.

To specify this setting, use the [LLM method of an Extract node](/api-reference/workflow/nodes/extract/extract-llm).
In this node, set the `output_mode` key in the `settings` object.
Specify the node either as an object in a `workflow_nodes` array
(for curl) or as a `WorkflowNode` in a `WorkflowNodes` collection (for Python). This object or collection applies whenever you
[create a workflow](/api-reference/api/workflow/create-workflow),
[update a workflow](/api-reference/api/workflow/update-workflow), or
[create an on-demand workflow job](/api-reference/api/job/create-job).
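Because `output_mode` accepts only the two values listed above, a small guard can catch typos before a request is sent. The following Python sketch assumes that `settings` is a plain dictionary. The helper name `set_output_mode` is hypothetical and is not part of any Unstructured SDK.

```python
# The two values described in this section.
OUTPUT_MODES = {"extracted_data_only", "elements_with_extracted_data"}


def set_output_mode(settings: dict, mode: str) -> dict:
    """Return a copy of the settings dict with a validated output_mode."""
    if mode not in OUTPUT_MODES:
        raise ValueError(
            f"output_mode must be one of {sorted(OUTPUT_MODES)}, got {mode!r}"
        )
    return {**settings, "output_mode": mode}


settings = set_output_mode({}, "extracted_data_only")
print(settings)
```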

### Extraction guidance (API only)

You can use the **Extraction Guidance Prompt** setting with the Unstructured API to tell the LLM how to format, normalize, or present values *after* your schema defines which fields to extract.

To specify this setting, use the [LLM method of an Extract node](/api-reference/workflow/nodes/extract/extract-llm).
In this node, set the `schema_to_extract.extraction_guidance` key in the `settings` object.
Specify the node either as an object in a `workflow_nodes` array
(for curl) or as a `WorkflowNode` in a `WorkflowNodes` collection (for Python). This object or collection applies whenever you
[create a workflow](/api-reference/api/workflow/create-workflow),
[update a workflow](/api-reference/api/workflow/update-workflow), or
[create an on-demand workflow job](/api-reference/api/job/create-job).
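For example, guidance that standardizes dates and addresses could be attached as shown in the following Python sketch. It assumes that the dotted key `schema_to_extract.extraction_guidance` maps to a nested object alongside `json_schema`. The guidance text and the minimal schema are illustrative only.

```python
import json

# Plain-language guidance on how to format values. It does not change which fields exist.
guidance = (
    "Return dates in YYYY-MM-DD format. "
    "Standardize street addresses to a single line."
)

settings = {
    "schema_to_extract": {
        # Minimal placeholder schema. The schema still defines what to extract.
        "json_schema": {
            "type": "object",
            "properties": {"street_address": {"type": "string"}},
            "required": ["street_address"],
            "additionalProperties": False,
        },
        # Assumes extraction_guidance is nested next to json_schema.
        "extraction_guidance": guidance,
    },
}

print(json.dumps(settings, indent=2))
```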

### API limitations

The Unstructured API does not support the following options for structured extraction with LLM. To use either of these options, you must use the Unstructured user interface instead. To learn how, see the following links:

* [Visual schema builder and JSON upload/export (UI only)](#upload-a-json-file)
* [Plain language in a schema prompt (UI only)](#prompt-a-schema)
