Unstructured UI quickstarts

This page contains two quickstarts to help you get started with the Unstructured user interface (UI).

The local quickstart is an ideal starting point and requires no prior setup. It shows you how to use the Unstructured UI to process a single file that is stored on your local machine. This approach is ideal for rapid testing and prototyping of Unstructured ETL+ workflows, across the full range of Unstructured features, against a single representative file.
The remote quickstart takes a bit longer, but it shows you how to use the UI to set up and run Unstructured ETL+ workflows against files and data stored in remote source file and object stores, websites, databases, and vector stores. Unstructured delivers the resulting processed data to remote destination file and object stores, databases, and vector stores. This approach is ideal for production use cases, where you want to process large volumes of files and data in a scalable and efficient manner.

If you’re not sure which quickstart to use, we recommend starting with the local quickstart.The local quickstart is a fast and easy way to learn about Unstructured. When you’re happy with the results from the workflow that you create there, it is straightforward to turn it into a production-ready workflow with all of the settings that you want already in place!

Local quickstart

This quickstart uses a no-code, point-and-click user interface (UI) in your web browser to have Unstructured process a single file that is stored on your local machine. The file is first processed on Unstructured-hosted compute resources. The UI then shows the processed data that Unstructured generates for that file. You can download that processed data as a .json file to your local machine. This approach enables rapid, local, run-adjust-repeat prototyping of end-to-end Unstructured ETL+ workflows with a full range of Unstructured features. After you get the results you want, you can then attach remote source and destination connectors to both ends of your existing workflow to begin processing remote files and data at scale in production. To run this quickstart, you will need a local file with a size of 10 MB or less and one of the following file types:

File type
`.bmp`
`.csv`
`.doc`
`.docx`
`.email`
`.epub`
`.heic`
`.html`
`.jpg`
`.md`
`.odt`
`.org`
`.pdf`
`.pot`
`.potm`
`.ppt`
`.pptm`
`.pptx`
`.rst`
`.rtf`
`.sgl`
`.tiff`
`.txt`
`.tsv`
`.xls`
`.xlsx`
`.xml`

For processing remote files at scale in production, Unstructured supports many more files types than these. See the list of supported file types.Unstructured also supports processing files from remote object stores, and data from remote sources in websites, web apps, databases, and vector stores. For more information, see the source connector overview and the remote quickstart for how to set up and run production-ready Unstructured ETL+ workflows at scale.

If you do not have any files available, you can use one of the sample files that Unstructured offers in the UI. Or, you can download one or more sample files from the example-docs folder in the Unstructured repo on GitHub.

If you do not already have an Unstructured account, go to https://unstructured.io/contact and fill out the online form to indicate your interest.
If you already have an Unstructured account, sign in by using the URL of the sign in page that Unstructured provided to you when your Unstructured account was created. After you sign in, the Unstructured user interface (UI) then appears, and you can start using it right away. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.

Create a workflow

In the Unstructured UI, on the sidebar, click Workflows.
Click New Workflow.
Select Build it Myself, if it is not already selected.
Click Continue. The visual workflow editor appears. The workflow is represented visually as a series of directed acyclic graph (DAG) nodes. Each node represents a step in the workflow. The workflow proceeds end to end from left to right. By default, the workflow starts with three nodes:
- Source: This node represents the location where you have your files or data for Unstructured to process. For this quickstart, this node represents a single file on your local machine. After you get the results you want, you can update this node to represent files or data in a remote location at scale in production.
- Partitioner: This node represents the partitioning step, which extracts content from unstructured files and data and outputs it as structured document elements for consistent representation across varying kinds of file and data types. For this quickstart, this node extracts the contents of a single file on your local machine and outputs it as a series of structured document elements in JSON format.
- Destination: This node represents the location where you want Unstructured to put the processed files or data. After you get the results you want, you can update this node to have Unstructured put the processed files or data into a remote location at scale in production.

Process a local file

Drag the file that you want Unstructured to process from your local machine’s file browser app and drop it into the Source node’s Drop file to test area. The file must have a size of 10 MB or less and one of the file types listed at the beginning of this quickstart. If you are not able to drag and drop the file, you can click Drop file to test and then browse to and select the file instead. Alternatively, you can use a sample file that Unstructured offers. To do this, click the Source node, and then in the Source pane, with Details selected, on the Local file tab, click one of the files under Or use a provided sample file. To view the file’s contents before you select it, click the eyes button next to the file.
Above the Source node, click Test. Unstructured displays a visual representation of the file and begins processing its contents, sending it through each of the workflow’s nodes in sequence. Depending on the file’s size and the workflow’s complexity, this processing could take several minutes. After Unstructured has finished its processing, the processed data appears in the Test output pane, as a series of structured elements in JSON format.
In the Test output pane, you can:
- Search through the processed, JSON-formatted representation of the file by using the Search JSON box.
- Download the full JSON as a .json file to your local machine by clicking Download full JSON.
When you are done, click the Close button in the Test output pane.

Add more nodes to the workflow

You can now add more nodes to the workflow to do further testing of various Unstructured features and with the option of eventually moving the workflow into production. For example, you can:
- Add an Enrichment node after the Partitioner node, to apply enrichments to the partitioned data such as image summaries, table summaries, table-to-HTML transforms, and named entity recognition (NER). To do this, click the add (+) button to the right of the Partitioner node, and then click Enrich > Enrichment. Click the new Enrichment node and specify its settings. For help, click the FAQ button in the Enrichment node’s pane. Learn more about enrichments and enrichment settings.
  Unstructured can potentially generate image summary descriptions, table summary descriptions, and table-to-HTML output only for workflows that are configured as follows:
  - With a Partitioner node set to use the Auto or High Res partitioning strategy, and an image summary description node, table summary description node, or table-to-HTML output node is added.
  - With a Partitioner node set to use the VLM partitioning strategy. No image summary description node, table summary description node, or table-to-HTML output node is needed (or allowed).
  Even with these configurations, Unstructured actually generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and are also eligible for processing with the following partitioning strategies:
  - High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
  - VLM or High Res, when the workflow’s Partitioner node is set to use VLM.
  Unstructured never generates image summary descriptions, table summary descriptions, or table-to-HTML output for workflows that are configured as follows:
  - With a Partitioner node set to use the Fast partitioning strategy.
  - With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for all files that Unstructured encounters that do not contain images or tables.
- Add a Chunker node after the Enrichment node, to chunk the enriched data into smaller pieces for your retrieval-augmented generation (RAG) applications. To do this, click the add (+) button to the right of the Enrichment node, and then click Enrich > Chunker. Click the new Chunker node and specify its settings. For help, click the FAQ button in the Chunker node’s pane. Learn more about chunking and chunker settings.
- Add an Embedder node after the Chunker node, to generate vector embeddings for performing vector-based searches. To do this, click the add (+) button to the right of the Chunker node, and then click Transform > Embedder. Click the new Embedder node and specify its settings. For help, click the FAQ button in the Embedder node’s pane. Learn more about embedding and embedding settings.
Each time you add a node or change its settings, you can click Test above the Source node again to test the current workflow end to end and see the results of the changes, if any.
Keep repeating this step as many times as you want, until you get the results you want.

Next steps

After you get the results you want, you have the option of moving your workflow into production. To do this, complete the following instructions.

The following instructions have you create a new workflow that is suitable for production. This behavior is planned to be fixed in a future release, allowing you to update the workflow that you just created, rather than needing to create a new one.

With your workflow remaining open in the visual workflow editor, open a new tab in your web browser, and in this new tab, sign in to your Unstructured account:
- If you do not already have an Unstructured account, go to https://unstructured.io/contact and fill out the online form to indicate your interest.
- If you already have an Unstructured account, sign in by using the URL of the sign in page that Unstructured provided to you when your Unstructured account was created. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.
In this new tab, create a source connector for your remote source location. This is the location in production where you have files or data in a file or object store, website, database, or vector store that you want Unstructured to process.
Create a destination connector for your remote destination location. This is the location in production where you want Unstructured to put the processed data as .json files in a file or object store, or as records in a database or vector store.
Create a workflow: on the sidebar, click Workflows, and then click New Workflow. Select Build it Myself, and then click Continue to open the visual workflow editor.
In the visual workflow editor, click Source.
In the Source pane, with Details selected, on the Connectors tab, select the source connector that you just created.
Click the Destination node.
In the Destination pane, with Details selected, select the destination connector that you just created.
Using your original workflow on the other tab as a guide, add any additional nodes to this new workflow as needed, and configure those new nodes’ settings to match the other ones.
Click Save.
To run the workflow: a. Make sure to click Save first.
b. Click the Close button next to the workflow’s name in the top navigation bar.
c. On the sidebar, click Workflows.
d. In the list of available workflows, click the Run button for the workflow that you just saved.
e. On the sidebar, click Jobs.
f. In the list of available jobs, click the job that you just ran.
g. After the job status shows Finished, go to the your destination location to see the processed files or data that Unstructured put there.

See also the remote quickstart for more coverage about how to set up and run production-ready Unstructured ETL+ workflows at scale.

To learn more, read Skip the Setup. Get Straight to the Results with Our Redesigned Interactive Workflow Builder.

Remote quickstart

This quickstart uses a no-code, point-and-click user interface in your web browser to get all of your data RAG-ready. Data is processed on Unstructured-hosted compute resources. The requirements are as follows.

A compatible source (input) location that contains your data for Unstructured to process. See the list of supported source types.
For document-based source locations, compatible files in that location. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
A compatible destination (output) location for Unstructured to put the processed data. See the list of supported destination types.

If you do not already have an Unstructured account, go to https://unstructured.io/contact and fill out the online form to indicate your interest.
If you already have an Unstructured account, sign in by using the URL of the sign in page that Unstructured provided to you when your Unstructured account was created. After you sign in, the Unstructured user interface (UI) then appears, and you can start using it right away. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.

Set the source (input) location

From your Unstructured dashboard, in the sidebar, click Connectors.
Click Sources.
Cick New or Create Connector.
For Name, enter some unique name for this connector.
In the Provider area, click the source location type that matches yours.
Click Continue.
Fill in the fields with the appropriate settings. Learn more.
If a Continue button appears, click it, and fill in any additional settings fields.
Click Save and Test.

Set the destination (output) location

In the sidebar, click Connectors.
Click Destinations.
Cick New or Create Connector.
For Name, enter some unique name for this connector.
In the Provider area, click the destination location type that matches yours.
Click Continue.
Fill in the fields with the appropriate settings. Learn more.
If a Continue button appears, click it, and fill in any additional settings fields.
Click Save and Test.

Define the workflow

In the sidebar, click Workflows.
Click New Workflow.
Next to Build it for Me, click Create Workflow.
If a radio button appears instead of Build it for Me, select it, and then click Continue.
For Workflow Name, enter some unique name for this workflow.
In the Sources dropdown list, select your source location from Step 3.
In the Destinations dropdown list, select your destination location from Step 4.
You can select multiple source and destination locations. Files will be ingested from all of the selected source locations, and the processed data will be delivered to all of the selected destination locations.
Click Continue.
The Reprocess All box applies only to blob storage connectors such as the Amazon S3, Azure Blob Storage, and Google Cloud Storage connectors:
- Checking this box reprocesses all documents in the source location on every workflow run.
- Unchecking this box causes only new documents that are added to the source location, or existing documents that are updated in the source location, since the last workflow run to be processed on future runs. Previously processed documents are not processed again. However:
  - Even if this box is unchecked, a renamed file is always treated as a new file, regardless of whether the file’s original contents have changed.
  - Even if this box is unchecked, a file that is removed but is added back later with the same file name is processed on future runs only if the file’s contents have changed since the file was originally processed.
Click Continue.
If you want this workflow to run on a schedule, in the Repeat Run dropdown list, select one of the scheduling options, and fill in the scheduling settings. Otherwise, select Don’t repeat.
Click Complete.

Process the documents

If you did not choose to run this workflow on a schedule in Step 5, you can run the workflow now: on the sidebar, click Workflows.
Next to your workflow from Step 5, click Run.

Monitor the processing job

In the sidebar, click Jobs.
In the list of jobs, wait for the job’s Status to change to Finished.
Click the row for the job.
After Overview displays Finished, go to the next Step.

View the processed data

Go to your destination location to view the processed data.

Unstructured UI

Getting started with the UI

Using the UI

Concepts

Unstructured UI quickstarts

Local quickstart

Remote quickstart

Unstructured UI

Getting started with the UI

Using the UI

Concepts

​Local quickstart

​Remote quickstart

Local quickstart

Remote quickstart