
Local file quickstart

This quickstart shows how you can use the Unstructured user interface (UI) to see, within a few minutes, Unstructured’s transformation results for a single file that is stored on your local computer.
This quickstart focuses on a single, local file for ease of demonstration. To use Unstructured later for large-scale batch processing of multiple files and semi-structured data that are stored in remote locations, skip over to the remote quickstart after you finish this one.
If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account, at https://platform.unstructured.io. Do the following:
  1. After you are signed in, the Start page appears.
  2. In the Welcome area, do one of the following:
    • Click one of the sample files, such as realestate.pdf, to have Unstructured parse and transform that sample file.
    • Click Browse files, and then browse to and select one of your own files, to have Unstructured parse and transform it.
      If you choose to use your own file, the file must be 10 MB or less in size.
  3. After Unstructured has finished parsing and transforming the file (a process known as partitioning), you will see the file’s contents in the Preview pane in the center and Unstructured’s results in the Result pane on the right.
  4. The Result pane shows a formatted view of Unstructured’s results by default. This formatted view is designed for human readability. To see the underlying JSON view of the results, which is designed for RAG and agentic AI, click JSON at the top of the Result pane. Learn about what’s in the JSON view.
  5. Unstructured’s initial results are based on its High Res partitioning strategy, which processes the file’s contents and converts them into a series of Unstructured document elements and metadata. This partitioning strategy provides good results overall, depending on the complexity of the file’s contents. It also generates a bounding box for each detected object in the file. A bounding box is an imaginary rectangle drawn around the object to show its location and extent within the file.
    After the High Res partitioning results are shown, Unstructured begins improving these initial results by using vision language models (VLMs) to apply a series of generative refinements known as enrichments. These enrichments include:
    • An image description enrichment, which uses a VLM to provide a text-based summary of the contents of each detected image.
    • A generative OCR enrichment, which uses a VLM to improve the accuracy of each block of initially processed text.
    • A table to HTML enrichment, which uses a VLM to provide an HTML-structured representation of each detected table.
    While these enrichments are being applied, a banner appears at the top of the Result pane. To see these enrichments applied to the initial results, click Update results in the banner as soon as this button appears, which might take a minute or more.
    Each page that Unstructured processes by using this approach is counted as two pages for usage and billing purposes. This is because Unstructured processes each page once with its High Res partitioning strategy and then reprocesses each page with a VLM to improve the quality, accuracy, and relevance of the initial partitioning results. This two-pass process happens regardless of whether you click Update results in the banner. This two-page usage and billing behavior is a known issue and will be addressed in a future release.
  6. To synchronize the scrolling of the Preview pane’s selected contents with the Result pane’s Formatted results, rest your mouse pointer anywhere inside the contents of the Preview pane until a bounding box appears. Then click the bounding box. Unstructured automatically scrolls the Result pane’s Formatted results to match the selected bounding box. (You cannot synchronize the scrolling of the JSON results.)
    To show all of the bounding boxes in the Preview pane at once, turn on the Show all bounding boxes toggle at the top of the Preview pane. You can then click any bounding box without first resting your mouse pointer on it.
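The 10 MB limit on your own files (step 2) and the two-pages-per-page accounting (step 5) can both be checked before you upload anything. The following is a minimal sketch, not part of Unstructured: the helper names are hypothetical, and it assumes that "10 MB" means 10 × 1024 × 1024 bytes, which this quickstart does not specify.

```python
import os

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def within_upload_limit(path: str, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the local file is at or under the size limit."""
    return os.path.getsize(path) <= limit_bytes


def billed_pages(page_count: int) -> int:
    """Each page is processed twice (High Res pass, then VLM pass),
    so it is counted as two pages for usage and billing."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return 2 * page_count


if __name__ == "__main__":
    # A 12-page file is counted as 24 pages.
    print(billed_pages(12))
```

If the real limit turns out to be measured in decimal megabytes, pass a different `limit_bytes` value rather than changing the function.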
You can also do the following:
  • To download the JSON view of the results as a local JSON file, click the download icon to the left of the Formatted and JSON buttons in the Result pane. (You cannot download the formatted view of the results.)
  • To have Unstructured partition a different file, click Add new file in the Files pane on the left, and then browse to and select the target file.
  • To view the results for a file that was previously partitioned during this session, click the file’s name in the Recent files list in the Files pane.
  • To return to the Start page, click the X (close) button at the left on the title bar, next to Transform.
  • To have Unstructured do more (such as chunking, embedding, applying additional kinds of enrichments, and processing larger files and semi-structured data in batches at scale), click Edit in Workflow Editor at the right on the title bar, and then skip over to the walkthrough.
  Learn how to add chunking, embeddings, and additional enrichments to your results.   Learn more about the Unstructured user interface.
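After you download the JSON view of the results, you can inspect the file locally. The sketch below is illustrative only: it assumes the downloaded file is a JSON array of element objects that each carry a `type` and a `text` field, which this quickstart does not spell out, so check the field names against your own file. The function names and the sample data are hypothetical.

```python
import json
from collections import Counter


def load_elements(path: str) -> list[dict]:
    """Load a downloaded results file, assumed to be a JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of elements")
    return data


def summarize_elements(elements: list[dict]) -> Counter:
    """Count elements by their "type" field (an assumed field name)."""
    return Counter(el.get("type", "Unknown") for el in elements)


if __name__ == "__main__":
    # Hypothetical sample shaped like the assumed format.
    sample = [
        {"type": "Title", "text": "Listing overview"},
        {"type": "NarrativeText", "text": "A three-bedroom home."},
        {"type": "NarrativeText", "text": "Close to transit."},
    ]
    for element_type, count in summarize_elements(sample).most_common():
        print(f"{element_type}: {count}")
```

To run this against a real download, replace the sample with `load_elements("path/to/your-download.json")`, where the path is whatever name you saved the file under.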

Remote quickstart

The following quickstart shows you how to use the Unstructured UI to process remote files (or data). The requirements are as follows.
  • A compatible source (input) location that contains your data for Unstructured to process. See the list of supported source types. If your source (input) location is not in this list, or if you do not yet have any source locations for Unstructured to process, stop here and skip over to the Dropbox source connector quickstart instead. This quickstart guides you through the process of creating a free Dropbox account, uploading your files to Dropbox, and creating a source connector to connect Unstructured to those files.
  • For document-based source locations, compatible files in that location. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
  • A compatible destination (output) location for Unstructured to put the processed data. See the list of supported destination types. If your destination (output) location is not in this list, or if you do not yet have any destination locations for Unstructured to send its processed data, stop here and skip over to the Pinecone destination connector quickstart instead. This quickstart guides you through the process of creating a free Pinecone account and creating a destination connector to connect Unstructured to a Pinecone dense serverless index within your Pinecone account.
1

Sign up and sign in

  1. If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account, at https://platform.unstructured.io.
    To sign up for a Team or Enterprise account instead, contact Unstructured Sales, or learn more.
  2. If you have an Unstructured Starter or Team account and are not already signed in, sign in to your account at https://platform.unstructured.io.
    For an Enterprise account, see your Unstructured account administrator for instructions, or email Unstructured Support at support@unstructured.io.
2

Set the source (input) location

  1. From your Unstructured dashboard, in the sidebar, click Connectors.
  2. Click Sources.
  3. Click New or Create Connector.
  4. For Name, enter a unique name for this connector.
  5. In the Provider area, click the source location type that matches yours.
  6. Click Continue.
  7. Fill in the fields with the appropriate settings. Learn more.
  8. If a Continue button appears, click it, and fill in any additional settings fields.
  9. Click Save and Test.
3

Set the destination (output) location

  1. In the sidebar, click Connectors.
  2. Click Destinations.
  3. Click New or Create Connector.
  4. For Name, enter a unique name for this connector.
  5. In the Provider area, click the destination location type that matches yours.
  6. Click Continue.
  7. Fill in the fields with the appropriate settings. Learn more.
  8. If a Continue button appears, click it, and fill in any additional settings fields.
  9. Click Save and Test.
4

Define the workflow

  1. In the sidebar, click Workflows.
  2. Click New Workflow.
  3. Next to Build it for Me, click Create Workflow.
    If a radio button appears instead of Build it for Me, select it, and then click Continue.
  4. For Workflow Name, enter a unique name for this workflow.
  5. In the Sources dropdown list, select your source location from Step 2.
  6. In the Destinations dropdown list, select your destination location from Step 3.
    You can select multiple source and destination locations. Files will be ingested from all of the selected source locations, and the processed data will be delivered to all of the selected destination locations.
  7. Click Continue.
  8. The Reprocess All box applies only to blob storage connectors such as the Amazon S3, Azure Blob Storage, and Google Cloud Storage connectors:
    • Checking this box reprocesses all documents in the source location on every workflow run.
    • Unchecking this box causes future runs to process only documents that have been added to, or updated in, the source location since the last workflow run. Previously processed documents are not processed again. However:
      • Even if this box is unchecked, a renamed file is always treated as a new file, regardless of whether the file’s original contents have changed.
      • Even if this box is unchecked, a file that is removed but is added back later with the same file name is processed on future runs only if the file’s contents have changed since the file was originally processed.
  9. Click Continue.
  10. If you want this workflow to run on a schedule, in the Repeat Run dropdown list, select one of the scheduling options, and fill in the scheduling settings. Otherwise, select Don’t repeat.
  11. Click Complete.
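The Reprocess All rules in step 8 can be modeled as a lookup keyed by file name. The following sketch is only a way to reason about the described behavior, not Unstructured's implementation: the function name, the use of a SHA-256 fingerprint, and the in-memory history dictionary are all assumptions made for illustration.

```python
import hashlib


def files_to_process(source_files: dict[str, bytes],
                     processed: dict[str, str],
                     reprocess_all: bool) -> list[str]:
    """Return the names of files that a run would process, then record them.

    source_files maps file name -> current contents.
    processed maps file name -> fingerprint of the contents when last processed.
    Entries in processed are kept even if a file is later removed from the source.
    """
    selected = []
    for name, contents in sorted(source_files.items()):
        fingerprint = hashlib.sha256(contents).hexdigest()
        # A file is selected if the box is checked, if its name has never
        # been seen, or if its contents differ from what was last processed.
        if reprocess_all or processed.get(name) != fingerprint:
            selected.append(name)
        processed[name] = fingerprint
    return selected


if __name__ == "__main__":
    history: dict[str, str] = {}
    # First run: the file is new.
    print(files_to_process({"a.pdf": b"v1"}, history, reprocess_all=False))
    # Same contents under a new name: treated as a new file.
    print(files_to_process({"b.pdf": b"v1"}, history, reprocess_all=False))
    # Original name added back with unchanged contents: skipped.
    print(files_to_process({"a.pdf": b"v1"}, history, reprocess_all=False))
```

Because the history is keyed by name, a rename always looks like a new file, and a file that is removed and later restored under the same name is selected only when its contents differ from the recorded fingerprint.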
5

Process the documents

  1. If you did not choose to run this workflow on a schedule in Step 4, you can run the workflow now: in the sidebar, click Workflows.
  2. Next to your workflow from Step 4, click Run.
6

Monitor the processing job

  1. In the sidebar, click Jobs.
  2. In the list of jobs, wait for the job’s Status to change to Finished.
  3. Click the row for the job.
  4. After Overview displays Finished, go to the next step.
7

View the processed data

Go to your destination location to view the processed data.
Learn more about Unstructured source connectors, destination connectors, workflows, jobs, and managing your account.