
# Using the structured data extractor

<Note>
  The following information applies only to the [Unstructured user interface (UI)](/ui/overview).

  To use the Unstructured API with the structured data extractor, see the API-related sections in:

  * [Structured Extraction with LLM](/concepts/structured-data-extractor/llm-options)
  * [Structured Extraction with Regex](/concepts/structured-data-extractor/regex-options)
</Note>

The [structured data extractor](/concepts/structured-data-extractor/data-extractor) works from two places in the Unstructured UI:

* From the **Start** page of your Unstructured account. This approach works with one file at a time: either a file that is stored on your local machine or a sample file that Unstructured provides. Follow the [Start page procedure](#use-the-structured-data-extractor-from-the-start-page) below.
* From the Unstructured **workflow editor**. This approach works with a single file that is stored on your local machine, or with any number of files that are stored in remote locations. Follow the [workflow editor procedure](#use-the-structured-data-extractor-from-the-workflow-editor) below.

<h2 id="use-the-structured-data-extractor-from-the-start-page">
  Use the structured data extractor from the Start page
</h2>

To have Unstructured [extract the data in a custom-defined format](/concepts/structured-data-extractor/data-extractor#custom-defined-output) for a single file, either one that is stored on your local machine or a sample file that Unstructured provides, do the following from the **Start** page:

1. Sign in to your Unstructured account, if you are not already signed in.

2. On the sidebar, click **Start**, if the **Start** page is not already showing.

3. In the **Welcome, get started right away!** tile, do one of the following:

   * To use a file on your local machine, click **Browse files** and then select the file, or drag and drop the file onto **Drop file to test**.

     <Note>
       If you use a local file, the file must be 10 MB or less in size.
     </Note>

   * To use a sample file provided by Unstructured, click one of the sample files that are shown, such as **realestate.pdf**.

4. After Unstructured partitions the selected file into Unstructured's document element format, click **Update results** to
   have Unstructured apply generative enrichments, such as [image descriptions](/concepts/enriching/image-descriptions) and
   [generative OCR](/concepts/enriching/generative-ocr), to those document elements.

5. In the title bar, next to **Transform**, click **Extract**.

6. In the **Define Schema** pane, choose an extraction **Method**: **LLM** (the default) or **Regex**. For a comparison of the two methods, see [Choose an extraction method: LLM or Regex](/concepts/structured-data-extractor/choose-extraction-method).

7. **LLM only:** Do one of the following to define your schema:

   * To use a schema that Unstructured suggests for the selected file, click **Run Schema**.
   * To use a custom schema that conforms to the [OpenAI Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas) guidelines,
     click **Upload JSON**; enter your own custom schema or upload a JSON file that contains your custom schema; click **Use this Schema**; and then click **Run Schema**.
     [Learn about the OpenAI Structured Outputs format](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas).
   * To use a visual editor to define the schema, click the ellipsis (three dots) icon; click **Reset form**; enter your own custom schema objects and their properties;
     and then click **Run Schema**. [Learn about OpenAI Structured Outputs data types](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas).
   * To use a plain language prompt, click **Suggest**; enter your prompt in the **Prompt a Schema** dialog; click **Generate schema**; make any changes as needed; and then click **Run Schema**. For more information, see [Plain language in a schema prompt](/concepts/structured-data-extractor/llm-options#prompt-a-schema).

8. **Regex only:** Do one of the following to define your schema:

   * To use the visual schema builder, for each field enter the values below. Add additional fields as needed.

     * **Pattern name** – a descriptive label for the field.
     * **Regular expression** – a regex pattern.

   * To import an existing schema, click **Upload JSON**.

   To learn more about Regex specifications, example formats, and validation tools, see [Regex-based extraction](/concepts/structured-data-extractor/regex-options).

9. The extracted data appears in the **Extract Results** pane. You can do any of the following:

   * To see the extracted data in a formatted view, click **Formatted**.
   * To see the JSON representation of the extracted data, click **JSON**.
   * To download the JSON representation of the extracted data as a local JSON file, click the download icon next to **Formatted** and **JSON**.
   * To change the schema and then re-run the extraction, click the back arrow next to **Extract Results**, and then return to step 7 or 8 in this procedure.
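
As a rough illustration of the custom schema option in the **LLM only** step above, the following Python sketch builds a small JSON schema and checks it against two of the OpenAI Structured Outputs rules: every object sets `additionalProperties` to `false`, and every property is listed in `required`. The field names and the `check_strict` helper are hypothetical, and the check covers only those two rules. It does not confirm which schema features Unstructured itself accepts.

```python
import json

# Hypothetical fields for a real estate listing; replace with your own.
schema = {
    "type": "object",
    "properties": {
        "address": {"type": "string", "description": "Street address of the property"},
        "price_usd": {"type": "number", "description": "Listing price in US dollars"},
        "bedrooms": {"type": "integer"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["address", "price_usd", "bedrooms", "features"],
    "additionalProperties": False,
}


def check_strict(node, path="$"):
    """Return a list of problems against two Structured Outputs rules:
    each object sets additionalProperties to false and lists all of its
    properties as required. Other rules are not checked."""
    problems = []
    if isinstance(node, dict):
        if node.get("type") == "object":
            props = node.get("properties", {})
            if node.get("additionalProperties") is not False:
                problems.append(f"{path}: additionalProperties must be false")
            missing = sorted(set(props) - set(node.get("required", [])))
            if missing:
                problems.append(f"{path}: not listed as required: {missing}")
            # Recurse into nested properties.
            for name, child in props.items():
                problems.extend(check_strict(child, f"{path}.{name}"))
        elif node.get("type") == "array":
            # Recurse into the array's item schema, if any.
            problems.extend(check_strict(node.get("items"), f"{path}[]"))
    return problems


problems = check_strict(schema)
schema_json = json.dumps(schema, indent=2)

# Print the problems if there are any; otherwise print the schema as JSON.
print(problems or schema_json)
```

If no problems are reported, you can paste the printed JSON into the **Upload JSON** dialog or save it to a file and upload that file.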

<h2 id="use-the-structured-data-extractor-from-the-workflow-editor">
  Use the structured data extractor from the workflow editor
</h2>

To have Unstructured [extract the data in a custom-defined format](/concepts/structured-data-extractor/data-extractor#custom-defined-output) for a single file that is stored on your local machine, or for any number of files that are stored in remote locations, do the following from the workflow editor:

1. If you already have an Unstructured workflow that you want to use, open it in the workflow editor. Otherwise, create a new
   workflow as follows:

   a. Sign in to your Unstructured account, if you are not already signed in.<br />
   b. On the sidebar, click **Workflows**.<br />
   c. Click **New Workflow +**.<br />
   d. With **Build it Myself** already selected, click **Continue**. The workflow editor appears.<br />

2. Add an **Extract** node to your Unstructured workflow. This node must be added right before the workflow's **Destination** node.
   To add this node, in the workflow editor, click the **+** (add node) button immediately before the **Destination** node, and then click **Enrich > Extract**.

3. Click the newly added **Extract** node to select it.

4. In the **structured data extractor** settings pane, under **Schema**, choose an extraction method: **LLM** (the default) or **Regex**. For a comparison of the two methods, see [Choose an extraction method: LLM or Regex](/concepts/structured-data-extractor/choose-extraction-method).

5. Under **Output settings**, switch **Schema-Only Output** on to return only the extracted fields, or off (the default) to also include Unstructured's document elements and metadata.

6. **LLM only:** For **Model**, select your provider and model. For a full list of the models available in Unstructured, see [Available models](/api-reference/workflow/models).

7. **LLM only:** To specify the custom schema, do one of the following:

   * To use a custom schema that conforms to the [OpenAI Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas) guidelines,
     click **Upload JSON**; enter your own custom schema or upload a JSON file that contains your custom schema; and then click **Use this Schema**.
     [Learn about the OpenAI Structured Outputs format](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas).
   * To use a visual editor to define the schema, enter your own custom schema objects and their properties. To clear the current schema and start over,
     click the ellipsis (three dots) icon, and then click **Reset form**.
     [Learn about OpenAI Structured Outputs data types](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas).

8. **LLM only:** Optionally, add an **Extraction Guidance Prompt**. Click **+ Add Prompt** to open a dialog where you can enter plain-language instructions that tell the LLM how to format, normalize, or summarize extracted values, and then click **Save Prompt**. For more information, see [Extraction guidance](/concepts/structured-data-extractor/llm-options#extraction-guidance-workflow-editor).

9. **Regex only:** To specify the custom schema, do one of the following:

   * To use the visual schema builder, for each field enter the values below. Add additional fields as needed.

     * **Pattern name** – a descriptive label for the field.
     * **Regular expression** – a regex pattern.

   * To import an existing schema, click **Upload JSON**.

   To learn more about Regex specifications, example formats, and validation tools, see [Regex-based extraction](/concepts/structured-data-extractor/regex-options).

10. To see the results of the structured data extractor, do one of the following:

    * If you have already selected a local file as input to your workflow, click **Test** immediately above the **Source** node. Results appear in the **Test output** pane on-screen.
    * If you are using source and destination connectors for your workflow, [run the workflow as a job](/ui/jobs#run-a-job),
      [monitor the job](/ui/jobs#monitor-a-job), and then examine the job's results in your destination location.
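
Before entering values for **Pattern name** and **Regular expression** in the **Regex only** step above, it can help to try each pattern against sample text locally. The following Python sketch does that with the standard `re` module. The pattern names, the expressions, and the sample text are all hypothetical. Python's regex flavor may also differ from the one Unstructured uses, so treat a local match as a first check only.

```python
import re

# Hypothetical pattern names and expressions; replace with your own.
patterns = {
    "invoice_number": r"INV-\d{6}",
    "iso_date": r"\d{4}-\d{2}-\d{2}",
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
}

# Made-up sample text standing in for the content of one of your files.
sample_text = (
    "Invoice INV-004217 was issued on 2024-03-15 and is due 2024-04-14. "
    "Send questions to billing@example.com."
)


def find_matches(patterns, text):
    """Map each pattern name to the list of substrings it matches in text."""
    return {name: re.findall(rx, text) for name, rx in patterns.items()}


matches = find_matches(patterns, sample_text)
for name, found in matches.items():
    print(f"{name}: {found}")
```

A pattern that returns an empty list, or that picks up more text than intended, is worth adjusting before you add it to the schema.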
