The structured data extractor works from two places in the Unstructured UI:
- From the Start page of your Unstructured account. This approach works only with a single file that is stored on your local machine. Follow the Start page procedure below.
- From the Unstructured workflow editor. This approach works with a single file that is stored on your local machine, or with any number of files that are stored in remote locations. Follow the workflow editor procedure below.
Use the structured data extractor from the Start page
To have Unstructured extract the data in a custom-defined format for a single file that is stored on your local machine, do the following from the Start page:
- Sign in to your Unstructured account, if you are not already signed in.
- On the sidebar, click Start, if the Start page is not already showing.
- In the Welcome, get started right away! tile, do one of the following:
- To use a file on your local machine, click Browse files and then select the file, or drag and drop the file onto Drop file to test. The file must be 10 MB or less in size.
- To use a sample file provided by Unstructured, click one of the sample files that are shown, such as realestate.pdf.
- After Unstructured partitions the selected file into Unstructured’s document element format, click Update results to have Unstructured apply generative enrichments, such as image descriptions and generative OCR, to those document elements.
- In the title bar, next to Transform, click Extract.
- In the Define Schema pane, choose an extraction Method: LLM (the default) or Regex. For a comparison of the two methods, see Choose an extraction method: LLM or Regex.
- LLM only: Do one of the following to define your schema:
- To use a schema that Unstructured suggests for the selected file, click Run Schema.
- To use a custom schema that conforms to the OpenAI Structured Outputs guidelines, click Upload JSON; enter your own custom schema or upload a JSON file that contains your custom schema; click Use this Schema; and then click Run Schema. Learn about the OpenAI Structured Outputs format.
- To use a visual editor to define the schema, click the ellipsis (three dots) icon; click Reset form; enter your own custom schema objects and their properties; and then click Run Schema. Learn about OpenAI Structured Outputs data types.
- To use a plain language prompt, click Suggest; enter your prompt in the Prompt a Schema dialog; click Generate schema; make any changes as needed; and then click Run Schema. For more information, see Plain language in a schema prompt.
- Regex only: Do one of the following to define your schema:
- To use the visual schema builder, for each field enter the values below. Add additional fields as needed.
- Pattern name – a descriptive label for the field.
- Regular expression – a regex pattern.
- To import an existing schema, click Upload JSON.
- The extracted data appears in the Extract Results pane. You can do one of the following:
- To see a formatted view of the extracted data, click Formatted.
- To view the JSON representation of the extracted data, click JSON.
- To download the JSON representation of the extracted data as a local JSON file, click the download icon next to Formatted and JSON.
- To change the schema and then re-run the extraction, click the back arrow next to Extract Results, and then return to the LLM only or Regex only schema step in this procedure (step 7 or 8).
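For the Upload JSON option of the LLM method, a custom schema can be drafted and sanity-checked locally before you paste or upload it. The following is a minimal sketch in Python, assuming a real estate listing as the input file. The field names (`address`, `price_usd`, `bedrooms`) and the helpers `build_schema` and `check_schema` are hypothetical, and the exact top-level wrapper that Unstructured accepts is not confirmed here. The sketch follows the general OpenAI Structured Outputs conventions: the root is an object, every property is listed in `required`, and `additionalProperties` is `false`.

```python
import json


def build_schema() -> dict:
    """Return a hypothetical JSON schema for a real estate listing."""
    properties = {
        "address": {"type": "string", "description": "Street address of the property"},
        "price_usd": {"type": "number", "description": "Listing price in US dollars"},
        "bedrooms": {"type": "integer", "description": "Number of bedrooms"},
    }
    return {
        "type": "object",
        "properties": properties,
        # Structured Outputs expects every declared property to be required.
        "required": list(properties),
        # No properties beyond the declared ones are allowed.
        "additionalProperties": False,
    }


def check_schema(schema: dict) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks pass."""
    problems = []
    if schema.get("type") != "object":
        problems.append("root must be an object")
    declared = set(schema.get("properties", {}))
    if set(schema.get("required", [])) != declared:
        problems.append("every property must be listed in 'required'")
    if schema.get("additionalProperties") is not False:
        problems.append("'additionalProperties' must be false")
    return problems


schema = build_schema()
problems = check_schema(schema)
if problems:
    print("Schema problems:", problems)
else:
    # Copy this output into the Upload JSON dialog, or save it as a .json file.
    print(json.dumps(schema, indent=2))
```

These checks cover only the three conventions named above. They do not replace the validation that Unstructured or the model provider performs when you click Run Schema.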
Use the structured data extractor from the workflow editor
To have Unstructured extract the data in a custom-defined format for a single file that is stored on your local machine, or for any number of files that are stored in remote locations, do the following from the workflow editor:
- If you already have an Unstructured workflow that you want to use, open it in the workflow editor. Otherwise, create a new workflow as follows:
a. Sign in to your Unstructured account, if you are not already signed in.
b. On the sidebar, click Workflows.
c. Click New Workflow +.
d. With Build it Myself already selected, click Continue. The workflow editor appears.
- Add an Extract node to your Unstructured workflow. This node must be added right before the workflow’s Destination node. To add this node, in the workflow designer, click the + (add node) button immediately before the Destination node, and then click Enrich > Extract.
- Click the newly added Extract node to select it.
- In the structured data extractor settings pane, under Schema, choose an extraction method: LLM (the default) or Regex. For a comparison of the two methods, see Choose an extraction method: LLM or Regex.
- Under Output settings, switch Schema-Only Output on to return only the extracted fields, or off (the default) to also include Unstructured’s document elements and metadata.
- LLM only: For Model, select your provider and model.
- LLM only: To specify the custom schema, do one of the following:
- To use a custom schema that conforms to the OpenAI Structured Outputs guidelines, click Upload JSON; enter your own custom schema or upload a JSON file that contains your custom schema; and then click Use this Schema. Learn about the OpenAI Structured Outputs format.
- To use a visual editor to define the schema, enter your own custom schema objects and their properties. To clear the current schema and start over, click the ellipsis (three dots) icon, and then click Reset form. Learn about OpenAI Structured Outputs data types.
- LLM only: Optionally, add an Extraction Guidance Prompt. Click + Add Prompt to open the dialog, and then enter plain-language instructions that tell the LLM how to format, normalize, or summarize extracted values. When you are ready to save, click Save Prompt. For more information, see Extraction guidance.
- Regex only: To specify the custom schema, do one of the following:
- To use the visual schema builder, for each field enter the values below. Add additional fields as needed.
- Pattern name – a descriptive label for the field.
- Regular expression – a regex pattern.
- To import an existing schema, click Upload JSON.
- To see the results of the structured data extractor, do one of the following:
- If you have already selected a local file as input to your workflow, click Test immediately above the Source node. Results appear in the Test output pane on-screen.
- If you are using source and destination connectors for your workflow, run the workflow as a job, monitor the job, and then examine the job’s results in your destination location.
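Before you enter Pattern name and Regular expression pairs for the Regex method, it can help to check the patterns against sample text locally. The sketch below uses Python's `re` module. The pattern names, the expressions, the sample text, and the `extract_all` helper are hypothetical, and the dictionary is only a local test structure, not the format that Upload JSON expects. Unstructured's regex engine may also differ from Python's in some syntax details, so treat this as a rough pre-check rather than a guarantee of identical matches.

```python
import re

# Hypothetical pattern name -> regular expression pairs, mirroring the two
# values entered per field in the visual schema builder.
PATTERNS = {
    "invoice_number": r"INV-\d{6}",
    "iso_date": r"\d{4}-\d{2}-\d{2}",
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
}


def extract_all(text: str, patterns: dict[str, str]) -> dict[str, list[str]]:
    """Return every match of each named pattern found in the text."""
    results = {}
    for name, expr in patterns.items():
        compiled = re.compile(expr)  # raises re.error if the pattern is invalid
        results[name] = [m.group(0) for m in compiled.finditer(text)]
    return results


sample = "Invoice INV-004217 issued 2024-03-15. Questions: billing@example.com"
matches = extract_all(sample, PATTERNS)
for name, found in matches.items():
    print(f"{name}: {found}")
```

A pattern that returns an empty list on representative text is worth revising before you run the workflow as a job.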

