Choose an extraction method: LLM or Regex

How each extraction method works

Use this table to compare the two methods at a high level: when to choose each, how each processes your documents, the schema format it expects, what the output looks like, and whether it needs a model.

	LLM	Regex
Choose when	Values depend on context; you need nested or typed fields	Values follow a stable, recognizable pattern (for example: invoice numbers, dates, phone numbers)
How it works	A language model interprets the text and fills each field defined in your schema with an inferred value	The extractor scans partitioned text for each named pattern and returns the matched strings
Schema format	JSON in OpenAI Structured Outputs format: named fields, types, descriptions, optional nesting	`name` / `pattern` pairs: a label and a regex for each capture field
Output structure	Typed fields — objects, arrays, numbers, booleans, and strings. See output examples.	An array of matched substrings per pattern name. See output examples.
Model selection	The provider and model are configurable	Not applicable: a regex engine matches patterns directly against partitioned text, so no language model is involved
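To make the Regex column concrete, here is a minimal local sketch, assuming the partitioned text is available as a plain list of strings. The schema entries, field names, and the `extract_with_regex` helper are hypothetical, and the service's actual regex engine and output format may differ; the Regex detail page has the real output examples.

```python
import re

# Hypothetical regex schema: a list of name/pattern pairs, mirroring the
# `name` / `pattern` shape described above. Names and patterns are illustrative.
REGEX_SCHEMA = [
    {"name": "invoice_number", "pattern": r"INV-\d{6}"},
    {"name": "date", "pattern": r"\d{4}-\d{2}-\d{2}"},
]


def extract_with_regex(elements: list[str], schema: list[dict]) -> dict[str, list[str]]:
    """Scan each partitioned text element and collect every match per pattern name."""
    # Compile once up front; an invalid pattern raises re.error here.
    compiled = [(field["name"], re.compile(field["pattern"])) for field in schema]
    results: dict[str, list[str]] = {name: [] for name, _ in compiled}
    for text in elements:
        for name, regex in compiled:
            # finditer + group(0) keeps the whole match even when the pattern
            # has capture groups (findall would return only the groups).
            results[name].extend(m.group(0) for m in regex.finditer(text))
    return results


elements = [
    "Invoice INV-004217 issued 2024-03-01.",
    "Payment due 2024-03-31.",
]
print(extract_with_regex(elements, REGEX_SCHEMA))
# {'invoice_number': ['INV-004217'], 'date': ['2024-03-01', '2024-03-31']}
```

Each pattern name maps to a list because a pattern can match zero, one, or many times across the document, which is the main structural difference from the LLM method's typed fields.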

If both methods could fit, run a small sample with each and compare quality and maintenance cost before you standardize on one.
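One way to make the quality half of that comparison concrete is to hand-label a few documents and score each method's output against the labels. This sketch assumes you have already reduced each method's output to one value per field per document; the field names, sample values, and the `exact_match_rate` helper are hypothetical, and exact string matching is a deliberately strict choice.

```python
# Hypothetical comparison harness. Field names, sample values, and the
# exact-match scoring rule are illustrative assumptions, not an Unstructured API.


def exact_match_rate(expected: list[dict], extracted: list[dict]) -> float:
    """Return the fraction of hand-labeled field values a method reproduced exactly."""
    if len(expected) != len(extracted):
        raise ValueError("expected and extracted must cover the same documents")
    total = 0
    correct = 0
    for want, got in zip(expected, extracted):
        for field, value in want.items():
            total += 1
            # Strict string equality: "1250.00" does not match "1,250.00".
            if got.get(field) == value:
                correct += 1
    return correct / total if total else 0.0


# Hand-labeled values for two sample documents (hypothetical).
expected = [
    {"invoice_number": "INV-004217", "total": "1,250.00"},
    {"invoice_number": "INV-004218", "total": "980.50"},
]
# Each method's output, already reduced to one value per field (hypothetical).
llm_output = [
    {"invoice_number": "INV-004217", "total": "1250.00"},  # formatted differently
    {"invoice_number": "INV-004218", "total": "980.50"},
]
regex_output = [
    {"invoice_number": "INV-004217", "total": "1,250.00"},
    {"invoice_number": "INV-004218"},  # pattern missed this total
]

for method, output in [("LLM", llm_output), ("Regex", regex_output)]:
    print(f"{method}: {exact_match_rate(expected, output):.2f}")
```

A formatting mismatch like the LLM's `1250.00` is the kind of difference that extraction guidance is meant to address. Maintenance cost, such as how often patterns or schema descriptions need updating as your documents change, is not captured by this score and still needs a judgment call.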

Available options

The following table shows which options are available for each method. Links go to the detail pages where each option is described.

	LLM	Regex
Model selection — choose the LLM provider and model that powers your extraction	Yes	No
Visual schema builder and JSON upload / export — build your schema visually or import from a JSON file	Yes	Yes
Schema-only output toggle — return extracted fields only, without Unstructured document elements	Yes	Yes
Schema prompt — generate a schema from plain-language instructions	Yes	No
Extraction guidance — instruct the LLM how to format or normalize extracted values	Yes	No
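The LLM method's schema, which you can build visually or export and import as JSON, can be sketched as a Python dict serialized with the standard `json` module. The invoice fields are hypothetical, and `check_structured_outputs_rules` encodes OpenAI's strict-mode rules for Structured Outputs (every property listed in `required`, `additionalProperties` set to false on every object). Whether Unstructured enforces these same rules, and whether it expects the bare schema or a wrapper object around it, is an assumption to confirm on the LLM detail page.

```python
import json

# Hypothetical invoice schema in the JSON Schema style used by OpenAI
# Structured Outputs: an object root, typed and described fields, and one
# nested array of objects. Field names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Invoice total, without currency symbols"},
        "paid": {"type": "boolean", "description": "Whether the invoice is marked as paid"},
        "line_items": {
            "type": "array",
            "description": "One entry per billed item",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Item description"},
                    "amount": {"type": "number", "description": "Line amount"},
                },
                "required": ["description", "amount"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["invoice_number", "total", "paid", "line_items"],
    "additionalProperties": False,
}


def check_structured_outputs_rules(node: dict, path: str = "$") -> list[str]:
    """Recursively flag objects that break OpenAI strict-mode rules."""
    problems: list[str] = []
    if node.get("type") == "object":
        props = node.get("properties", {})
        # Strict mode requires every property to appear in "required".
        if set(node.get("required", [])) != set(props):
            problems.append(f"{path}: 'required' must list every property")
        # Strict mode requires additionalProperties to be false on every object.
        if node.get("additionalProperties") is not False:
            problems.append(f"{path}: 'additionalProperties' must be false")
        for name, child in props.items():
            problems.extend(check_structured_outputs_rules(child, f"{path}.{name}"))
    elif node.get("type") == "array" and "items" in node:
        problems.extend(check_structured_outputs_rules(node["items"], f"{path}[]"))
    return problems


# Serialize to JSON text, as you might before saving a file to upload,
# then parse it back to confirm the round trip is lossless.
schema_json = json.dumps(invoice_schema, indent=2)
assert json.loads(schema_json) == invoice_schema
print(check_structured_outputs_rules(invoice_schema))  # prints []
```

The checker walks only plain `properties` and `items` nesting; it does not follow `anyOf`, `$defs`, or other composition keywords.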

Next steps

To learn more about the options for each method:

Structured extraction with LLM — schema definition, model selection, schema prompt, and extraction guidance

Structured extraction with Regex — schema definition, validation behavior, output examples, pattern examples, and tools for testing your patterns

To go straight to step-by-step procedures for using either method:

Unstructured UI