The structured data extractor supports two extraction methods — LLM and Regex — each suited to different document types and use cases. The extraction method setting determines how each field is populated.
Use this table to compare the two methods at a high level — how each processes your documents, the schema format it expects, and what the output looks like.
| | LLM | Regex |
|---|---|---|
| Choose when | Values depend on context; you need nested or typed fields | Values follow a stable, recognizable pattern (for example: invoice numbers, dates, phone numbers) |
| How it works | A model reads meaning from text and populates schema-defined fields with inferred values | The extractor scans partitioned text for named patterns and returns matched strings |
| Schema format | JSON in OpenAI Structured Outputs format: named fields, types, descriptions, optional nesting | name / pattern pairs: a label and a regex for each capture field |
| Output structure | Typed fields — objects, arrays, numbers, booleans, and strings. See output examples. | An array of matched substrings per pattern name. See output examples. |
| Model selection | The provider and model are configurable | Not applicable. A regex engine matches patterns directly against partitioned text, so no language model is involved |
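
The Regex column can be made concrete with a minimal Python sketch. It assumes that name/pattern pairs are represented as dicts with `name` and `pattern` keys and that partitioned text arrives as a list of element strings. Both are hypothetical stand-ins, not the extractor's actual schema format or API. The function returns, for each pattern name, an array of matched substrings, which mirrors the output structure described in the table.

```python
import re

# Hypothetical name/pattern pairs; the extractor's real schema keys may differ.
REGEX_SCHEMA = [
    {"name": "invoice_number", "pattern": r"INV-\d{6}"},
    {"name": "date", "pattern": r"\d{4}-\d{2}-\d{2}"},
    {"name": "phone", "pattern": r"\(\d{3}\) \d{3}-\d{4}"},
]


def extract_with_regex(elements: list[str], schema: list[dict]) -> dict[str, list[str]]:
    """Return every full match of each named pattern across all text elements."""
    compiled = [(field["name"], re.compile(field["pattern"])) for field in schema]
    # Every pattern name appears in the result, even when nothing matches.
    results: dict[str, list[str]] = {name: [] for name, _ in compiled}
    for text in elements:
        for name, regex in compiled:
            # group(0) is the whole match, so capturing groups inside a pattern
            # do not change what is returned (unlike re.findall).
            results[name].extend(m.group(0) for m in regex.finditer(text))
    return results


if __name__ == "__main__":
    # Stand-in for partitioned text: one string per document element.
    elements = [
        "Invoice INV-004217 issued 2024-03-01",
        "Questions? Call (555) 010-7788 or reply by 2024-03-15.",
    ]
    print(extract_with_regex(elements, REGEX_SCHEMA))
```

Run as a script, this prints `{'invoice_number': ['INV-004217'], 'date': ['2024-03-01', '2024-03-15'], 'phone': ['(555) 010-7788']}`. Note that the `date` pattern returns both dates. A pattern cannot tell an issue date from a due date, which is the kind of context-dependent value the table steers toward the LLM method.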
If both methods could fit, run a small sample with each and compare quality and maintenance cost before you standardize on one.
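
One way to compare quality on a pilot sample is a per-field exact-match rate against values you label by hand. The Python sketch below assumes you have already reduced each method's output to one value per field per document (for Regex, for example, by choosing one entry from each match array). The field names and sample values are hypothetical and illustrative, not measurements. Maintenance cost still needs a separate judgment.

```python
def field_accuracy(predicted: list[dict], expected: list[dict]) -> dict[str, float]:
    """Per-field exact-match rate of predicted values against hand-labeled ones."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must cover the same documents")
    hits: dict[str, int] = {}
    counts: dict[str, int] = {}
    for pred, gold in zip(predicted, expected):
        for field, value in gold.items():
            counts[field] = counts.get(field, 0) + 1
            # A missing field in the prediction counts as a miss.
            hits[field] = hits.get(field, 0) + int(pred.get(field) == value)
    return {field: hits[field] / counts[field] for field in counts}


if __name__ == "__main__":
    # Illustrative labels and outputs only, not real results.
    expected = [
        {"invoice_number": "INV-004217", "due_date": "2024-03-15"},
        {"invoice_number": "INV-004218", "due_date": "2024-04-01"},
    ]
    llm_output = [
        {"invoice_number": "INV-004217", "due_date": "2024-03-15"},
        {"invoice_number": "INV-004218", "due_date": "2024-04-01"},
    ]
    regex_output = [
        {"invoice_number": "INV-004217", "due_date": "2024-03-01"},  # matched the issue date
        {"invoice_number": "INV-004218", "due_date": "2024-04-01"},
    ]
    for label, output in (("LLM", llm_output), ("Regex", regex_output)):
        print(label, field_accuracy(output, expected))
```

With these made-up values, the script reports a `due_date` rate of 0.5 for the Regex output and 1.0 everywhere else. Reporting per field rather than as one overall score shows which fields drive the difference.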
Available options
The following table shows which options are available for each method. Links go to the detail pages where each option is described.
| | LLM | Regex |
|---|---|---|
| Model selection — choose the LLM provider and model that powers your extraction | Yes | No |
| Visual schema builder and JSON upload / export — build your schema visually or import from a JSON file | Yes | Yes |
| Schema-only output toggle — return extracted fields only, without Unstructured document elements | Yes | Yes |
| Schema prompt — generate a schema from plain-language instructions | Yes | No |
| Extraction guidance — instruct the LLM how to format or normalize extracted values | Yes | No |
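
For the LLM method, a schema prepared for JSON upload follows the OpenAI Structured Outputs format. The Python sketch below builds a hypothetical invoice schema with typed and nested fields. It then checks the schema against two rules that OpenAI documents for strict mode: every object sets `additionalProperties` to `false`, and every property is listed in `required`. Two points are assumptions to confirm against the extractor's documentation: whether the extractor enforces these rules, and whether it expects the bare schema shown here rather than a wrapper with `name` and `strict` keys.

```python
import json

# Hypothetical invoice schema: strings, a number, a boolean, and a nested array.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Invoice total"},
        "paid": {"type": "boolean", "description": "Whether the invoice is marked paid"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["invoice_number", "total", "paid", "line_items"],
    "additionalProperties": False,
}


def strict_mode_problems(schema: dict, path: str = "$") -> list[str]:
    """List objects that allow extra properties or leave properties optional."""
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = set(props) - set(schema.get("required", []))
        if missing:
            problems.append(f"{path}: not required: {sorted(missing)}")
        # Recurse into each property so nested objects are checked too.
        for name, sub in props.items():
            problems.extend(strict_mode_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and "items" in schema:
        problems.extend(strict_mode_problems(schema["items"], f"{path}[]"))
    return problems


if __name__ == "__main__":
    issues = strict_mode_problems(INVOICE_SCHEMA)
    # Print the problems, or the JSON you would save for upload if there are none.
    print("\n".join(issues) if issues else json.dumps(INVOICE_SCHEMA, indent=2))
```

The check recurses through `properties` and array `items`, so a nested object that forgets `additionalProperties` is reported with its path (for example `$.line_items[]`).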
Next steps
To learn more about the options for each method:
To go straight to step-by-step procedures for using either method: