Local file quickstart

The following quickstart shows you how to use the Unstructured UI to process a single file that is stored on your local machine. This approach is ideal for rapid testing and prototyping of Unstructured ETL+ workflows, across the full range of Unstructured features, against a single representative file. To process files (or data) in remote locations instead, skip ahead to the remote quickstart.
  1. If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account, at https://platform.unstructured.io.
    To sign up for a Team or Enterprise account instead, contact Unstructured Sales, or learn more.
  2. If you have an Unstructured Starter or Team account and are not already signed in, sign in to your account at https://platform.unstructured.io.
    For an Enterprise account, see your Unstructured account administrator for instructions, or email Unstructured Support at support@unstructured.io.
  3. After you sign in, watch the following 2-minute video:
Learn more about Unstructured source connectors, destination connectors, workflows, jobs, and managing your account.

Remote quickstart

The following quickstart shows you how to use the Unstructured UI to process remote files (or data). The steps are as follows.
Step 1: Sign up and sign in

  1. If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account, at https://platform.unstructured.io.
    To sign up for a Team or Enterprise account instead, contact Unstructured Sales, or learn more.
  2. If you have an Unstructured Starter or Team account and are not already signed in, sign in to your account at https://platform.unstructured.io.
    For an Enterprise account, see your Unstructured account administrator for instructions, or email Unstructured Support at support@unstructured.io.
Step 2: Set the source (input) location

  1. From your Unstructured dashboard, in the sidebar, click Connectors.
  2. Click Sources.
  3. Click New or Create Connector.
  4. For Name, enter a unique name for this connector.
  5. In the Provider area, click the source location type that matches yours.
  6. Click Continue.
  7. Fill in the fields with the appropriate settings. Learn more.
  8. If a Continue button appears, click it, and fill in any additional settings fields.
  9. Click Save and Test.
Step 3: Set the destination (output) location

  1. In the sidebar, click Connectors.
  2. Click Destinations.
  3. Click New or Create Connector.
  4. For Name, enter a unique name for this connector.
  5. In the Provider area, click the destination location type that matches yours.
  6. Click Continue.
  7. Fill in the fields with the appropriate settings. Learn more.
  8. If a Continue button appears, click it, and fill in any additional settings fields.
  9. Click Save and Test.
Step 4: Define the workflow

  1. In the sidebar, click Workflows.
  2. Click New Workflow.
  3. Next to Build it for Me, click Create Workflow.
    If a radio button appears instead of Build it for Me, select it, and then click Continue.
  4. For Workflow Name, enter a unique name for this workflow.
  5. In the Sources dropdown list, select your source location from Step 2.
  6. In the Destinations dropdown list, select your destination location from Step 3.
    You can select multiple source and destination locations. Files will be ingested from all of the selected source locations, and the processed data will be delivered to all of the selected destination locations.
  7. Click Continue.
  8. The Reprocess All box applies only to blob storage connectors such as the Amazon S3, Azure Blob Storage, and Google Cloud Storage connectors:
    • Checking this box reprocesses all documents in the source location on every workflow run.
    • Unchecking this box processes only documents that were added to, or updated in, the source location since the last workflow run. Previously processed documents are not processed again. However:
      • Even if this box is unchecked, a renamed file is always treated as a new file, regardless of whether the file’s original contents have changed.
      • Even if this box is unchecked, a file that is removed but is added back later with the same file name is processed on future runs only if the file’s contents have changed since the file was originally processed.
  9. Click Continue.
  10. If you want this workflow to run on a schedule, in the Repeat Run dropdown list, select one of the scheduling options, and fill in the scheduling settings. Otherwise, select Don’t repeat.
  11. Click Complete.
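
The Reprocess All rules described in item 8 above can be modeled as a small selection function. The following is a minimal sketch of those rules, not Unstructured's implementation: it assumes that each processed file is remembered by its name together with a fingerprint of its contents, and the names `fingerprint`, `select_files_to_process`, `processed_state`, and `reprocess_all` are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a content hash used to detect changed files (assumed mechanism)."""
    return hashlib.sha256(content).hexdigest()


def select_files_to_process(source_files, processed_state, reprocess_all):
    """Pick the file names to process on this run.

    source_files: dict of file name -> current bytes in the source location.
    processed_state: dict of file name -> fingerprint recorded on earlier runs
        (hypothetical bookkeeping; entries are kept even if a file is removed).
    reprocess_all: True models a checked Reprocess All box.
    """
    selected = []
    for name, content in sorted(source_files.items()):
        digest = fingerprint(content)
        # A file is processed if the box is checked, if its name has never
        # been seen (new or renamed), or if its contents have changed.
        if reprocess_all or processed_state.get(name) != digest:
            selected.append(name)
        processed_state[name] = digest
    return selected


if __name__ == "__main__":
    state = {}
    files = {"a.pdf": b"alpha", "b.pdf": b"beta"}
    print(select_files_to_process(files, state, reprocess_all=False))  # first run
    print(select_files_to_process(files, state, reprocess_all=False))  # unchanged
    renamed = {"c.pdf": b"alpha", "b.pdf": b"beta"}
    print(select_files_to_process(renamed, state, reprocess_all=False))  # rename
```

Because the model keys its records by file name, a renamed file looks new even though its contents are unchanged, and a file that is removed and later restored under the same name is selected only if its fingerprint differs from the recorded one.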
Step 5: Process the documents

  1. If you did not choose to run this workflow on a schedule in Step 4, you can run the workflow now. In the sidebar, click Workflows.
  2. Next to your workflow from Step 4, click Run.
Step 6: Monitor the processing job

  1. In the sidebar, click Jobs.
  2. In the list of jobs, wait for the job’s Status to change to Finished.
  3. Click the row for the job.
  4. After Overview displays Finished, go to the next step.
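
Waiting for a job's status to reach Finished, as in the steps above, is a poll-until-done pattern. The sketch below illustrates that pattern only; it does not call any Unstructured API. `get_status` is a hypothetical callable that you would supply, the polling interval and timeout are arbitrary defaults, and the only status value taken from this quickstart is Finished.

```python
import time


def wait_until_finished(get_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_status() until it returns "Finished" or the timeout elapses.

    get_status is a hypothetical zero-argument callable that returns the
    job's current status string. Returns True if the job finished in time,
    False if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if get_status() == "Finished":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A timeout keeps the loop from waiting forever if the job never reaches Finished, for example because it stopped with an error.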
Step 7: View the processed data

Go to your destination location to view the processed data.
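
If your destination stores the processed data as JSON files, one quick way to get a feel for the results is to count the elements by type. The sketch below is an illustration under assumptions: it assumes that each output file is a JSON array of element objects with `type` and `text` fields, the sample data is made up, and `summarize_elements` is a hypothetical helper. The actual shape of the data depends on your destination connector.

```python
import json
from collections import Counter


def summarize_elements(raw_json: str) -> Counter:
    """Count elements by their "type" field in a JSON array of elements."""
    elements = json.loads(raw_json)
    return Counter(element.get("type", "Unknown") for element in elements)


# Made-up sample standing in for one processed output file.
sample = json.dumps([
    {"type": "Title", "text": "Quarterly report"},
    {"type": "NarrativeText", "text": "Revenue grew."},
    {"type": "NarrativeText", "text": "Costs fell."},
])

print(summarize_elements(sample))
```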
Learn more about Unstructured source connectors, destination connectors, workflows, jobs, and managing your account.