The structured data extractor supports two extraction methods, LLM and Regex, each suited to different document types and use cases. The extraction method setting determines how each field is populated.
How each extraction method works
Use this table to compare the two methods at a high level: how each processes your documents, the schema format it expects, and what the output looks like.

| | LLM | Regex |
|---|---|---|
| Choose when | Values depend on context; you need nested or typed fields | Values follow a stable, recognizable pattern (for example: invoice numbers, dates, phone numbers) |
| How it works | A model reads meaning from text and populates schema-defined fields with inferred values | The extractor scans partitioned text for named patterns and returns matched strings |
| Schema format | JSON in OpenAI Structured Outputs format: named fields, types, descriptions, optional nesting | name / pattern pairs: a label and a regex for each capture field |
| Output structure | Typed fields — objects, arrays, numbers, booleans, and strings. See output examples. | An array of matched substrings per pattern name. See output examples. |
| Model selection | The provider and model are configurable | No model required: a regex engine matches patterns directly against partitioned text |
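
To make the two schema formats in the table concrete, here is a minimal Python sketch. It is an illustration only, not the extractor's implementation. The field names (`invoice_number`, `total`, `vendor`, `date`), the sample text, and the helper `extract_with_regex` are all hypothetical, and the exact envelope the extractor expects around each schema is not specified on this page. The LLM schema follows the general shape used by OpenAI Structured Outputs (typed, described fields with optional nesting). The Regex schema is a list of name / pattern pairs, and the helper returns an array of matched substrings per pattern name.

```python
import json
import re

# Hypothetical LLM schema: named fields with types, descriptions, and one
# nested object. Strict Structured Outputs schemas list every property in
# "required" and set "additionalProperties" to False.
llm_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Total amount due"},
        "vendor": {
            "type": "object",
            "description": "Issuing vendor",
            "properties": {
                "name": {"type": "string", "description": "Vendor name"},
            },
            "required": ["name"],
            "additionalProperties": False,
        },
    },
    "required": ["invoice_number", "total", "vendor"],
    "additionalProperties": False,
}

# Hypothetical Regex schema: one name / pattern pair per capture field.
regex_schema = [
    {"name": "invoice_number", "pattern": r"INV-\d{4,}"},
    {"name": "date", "pattern": r"\d{4}-\d{2}-\d{2}"},
]


def extract_with_regex(text, schema):
    """Return a list of matched substrings for each named pattern."""
    results = {}
    for field in schema:
        compiled = re.compile(field["pattern"])
        # group(0) is the whole match, regardless of capture groups.
        results[field["name"]] = [m.group(0) for m in compiled.finditer(text)]
    return results


if __name__ == "__main__":
    sample = "Invoice INV-00123 issued 2024-03-15, due 2024-04-14. Ref INV-00124."
    print(json.dumps(llm_schema, indent=2))
    print(extract_with_regex(sample, regex_schema))
```

The contrast matches the table: the LLM schema describes what each field means and leaves the model to infer values, while the Regex schema only names patterns and returns whatever text matches them, including an empty list when nothing matches.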
Available options
The following table shows which options are available for each method. Links go to the detail pages where each option is described.

| | LLM | Regex |
|---|---|---|
| Model selection — choose the LLM provider and model that powers your extraction | Yes | No |
| Visual schema builder and JSON upload / export — build your schema visually or import from a JSON file | Yes | Yes |
| Schema-only output toggle — return extracted fields only, without Unstructured document elements | Yes | Yes |
| Schema prompt — generate a schema from plain-language instructions | Yes | No |
| Extraction guidance — instruct the LLM how to format or normalize extracted values | Yes | No |
Next steps
To learn more about the options for each method:

- Structured extraction with LLM: schema definition, model selection, schema prompt, and extraction guidance
- Structured extraction with Regex: schema definition, validation behavior, output examples, pattern examples, and tools for testing your patterns

