This quickstart uses a no-code, point-and-click user interface in your web browser to get all of your data RAG-ready. Data is processed on Unstructured-hosted compute resources.

The requirements are as follows.

1

Sign up

To sign up for the Unstructured Platform, go to the For Developers page and enter your information.

When you sign up through the For Developers page, your Unstructured account runs within the context of the Unstructured Platform on Unstructured’s own hosted cloud resources. After your first 14 days of usage, or after you process more than 1000 pages in a day, whichever comes first, your account is billed at Unstructured’s standard service usage rates. You can start a prepaid subscription at any time in exchange for usage rate discounts. To switch your account from a pay-as-you-go plan to a prepaid subscription, contact Unstructured Sales at sales@unstructured.io.

If you would rather run the Unstructured Platform within the context of your own virtual private cloud (VPC), or you want to make a long-term billing commitment in exchange for deeply discounted service usage rates, stop here and sign up through the For Enterprise page instead.
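
The "whichever comes first" billing rule for the For Developers sign-up can be read as a simple either/or check. The following is a minimal sketch of that reading, not Unstructured's actual billing logic. The function name, the parameter names, and the choice to count the 14 days from the sign-up date are assumptions made for illustration. The text does not say whether exceeding the daily page limit ends the free period permanently, so this sketch only evaluates a single day.

```python
from datetime import date

FREE_DAYS = 14             # from the text: first 14 days of usage
FREE_PAGES_PER_DAY = 1000  # from the text: up to 1000 processed pages per day


def is_billable(signup: date, today: date, pages_today: int) -> bool:
    """Return True once either free-usage limit has been passed.

    Hypothetical helper: assumes the 14 days are counted from the
    sign-up date and that pages are counted per calendar day.
    """
    days_used = (today - signup).days
    past_free_window = days_used >= FREE_DAYS
    over_daily_pages = pages_today > FREE_PAGES_PER_DAY
    # "Whichever comes first": either condition alone is enough.
    return past_free_window or over_daily_pages
```

For example, under these assumptions an account that signed up on January 1 and processes 1001 pages on January 10 is billable on that day, while the same account processing 500 pages on that day is not.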

2

Sign in

  1. After you have signed up through the For Developers page, the Unstructured Platform sign-in page appears.

    If you signed up through the For Enterprise page instead, your sign-in process will be different. For enterprise sign-in guidance, contact Unstructured Sales at sales@unstructured.io.

  2. Click Google or GitHub to sign in with the Google or GitHub account that you signed up with through the For Developers page. Or, enter the email address that you signed up with, and then click Sign In.

  3. If you entered your email address, check your email inbox for a message from Unstructured. In that email, click the Sign In link.

  4. The first time you sign in, read the terms and conditions, and then click Accept.

After you have signed in through the For Developers page for the first time, you can sign in on later visits by going to the Unstructured home page at https://unstructured.io and clicking Login.


3

Set the source (input) location

  1. From your Unstructured Platform dashboard, in the sidebar, click Connectors.
  2. Click Sources.
  3. Click New or Create Connector.
  4. For Name, enter a unique name for this connector.
  5. In the Provider area, click the source location type that matches yours.
  6. Click Continue.
  7. Fill in the fields with the appropriate settings.
  8. If a Continue button appears, click it, and fill in any additional settings fields.
  9. Click Save and Test.
4

Set the destination (output) location

  1. In the sidebar, click Connectors.
  2. Click Destinations.
  3. Click New or Create Connector.
  4. For Name, enter a unique name for this connector.
  5. In the Provider area, click the destination location type that matches yours.
  6. Click Continue.
  7. Fill in the fields with the appropriate settings.
  8. If a Continue button appears, click it, and fill in any additional settings fields.
  9. Click Save and Test.
5

Define the workflow

  1. In the sidebar, click Workflows.

  2. Click New Workflow.

  3. Next to Build it with me, click Create Workflow.

    If a radio button appears instead of Build it with me, select it, and then click Continue.
  4. For Workflow Name, enter a unique name for this workflow.

  5. In the Sources dropdown list, select your source location from Step 3.

  6. In the Destinations dropdown list, select your destination location from Step 4.

    You can select multiple source and destination locations. Files will be ingested from all of the selected source locations, and the processed data will be delivered to all of the selected destination locations.
  7. Click Continue.

  8. In the Optimize for section, select one of these preconfigured groups of workflow settings:

    • Basic: Ideal for simple, text-only documents.

    • Advanced: Best for PDFs, images, and complex file types.

    • Platinum: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs). During processing, files that are not PDFs or images are processed by using the Advanced strategy and are charged at the Advanced rate instead.

      When you use the Platinum strategy for PDF files of 200 or more pages, you might see processing errors. These errors typically occur when such large PDF files contain many tables and high-resolution images.

  9. The Reprocess all box applies only to the Amazon S3 and Azure Blob Storage source connectors:

    • Checking this box reprocesses all documents in the source location on every workflow run.
    • Unchecking this box causes future runs to process only documents that were added to the source location since the last workflow run. Previously processed documents are not processed again, even if their contents change.
  10. If you want to retry processing any documents that failed to process, check the Retry Failed Documents box.

  11. Click Continue.

  12. If you want this workflow to run on a schedule, in the Repeat Run dropdown list, select one of the scheduling options, and fill in the scheduling settings. Otherwise, select Don’t repeat.

  13. Click Complete.
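
Two of the workflow settings above behave like small decision rules: the Platinum option falls back to Advanced for files that are not PDFs or images, and the Reprocess all box controls which documents a run picks up. The following is a minimal sketch of those two rules as described in this step, not the platform's implementation. The function names, the set of file extensions treated as "PDFs or images", and the idea of tracking processed documents by name are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Assumed extensions for "PDFs or images"; the real list is not given in the text.
PDF_OR_IMAGE = {".pdf", ".png", ".jpg", ".jpeg"}


def effective_strategy(selected: str, filename: str) -> str:
    """Return the strategy (and billing rate) applied to one file.

    Per the text, Platinum applies only to PDFs and images; other
    files are processed and charged as Advanced instead.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    if selected == "Platinum" and suffix not in PDF_OR_IMAGE:
        return "Advanced"
    return selected


def documents_to_process(source_docs, already_processed, reprocess_all):
    """Pick the documents for one workflow run.

    Hypothetical model: documents are identified by name only, so a
    changed document with the same name is not picked up again,
    which matches the behavior described for the unchecked box.
    """
    if reprocess_all:
        return list(source_docs)
    return [doc for doc in source_docs if doc not in already_processed]
```

With Platinum selected, `effective_strategy("Platinum", "report.docx")` returns `"Advanced"` under these assumptions, while a file named `scan.pdf` stays on `"Platinum"`.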

6

Process the documents

  1. If you did not choose to run this workflow on a schedule in Step 5, you can run the workflow now: on the sidebar, click Workflows.
  2. Next to your workflow from Step 5, click Run.
7

Monitor the processing job

  1. In the sidebar, click Jobs.
  2. In the list of jobs, wait for the job’s Status to change to Finished.
  3. Click the row for the job.
  4. If Overview displays Success, go to the next step.
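
Waiting for the job's Status to change to Finished is a polling pattern. This quickstart does that by eye in the Jobs list. If you ever script a similar wait, a minimal sketch could look like the following. It does not call any Unstructured API: `get_status` is a hypothetical callable that you would supply, and the interval and timeout values are arbitrary assumptions. Only the status value "Finished" comes from this page.

```python
import time
from typing import Callable


def wait_until_finished(get_status: Callable[[], str],
                        poll_seconds: float = 5.0,
                        timeout_seconds: float = 600.0) -> bool:
    """Poll until the reported status is "Finished".

    get_status is a hypothetical callable returning the job's current
    status as a string. Returns True when "Finished" is seen, or
    False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if get_status() == "Finished":
            return True
        time.sleep(poll_seconds)  # wait before checking again
    return False
```

A finished job is not necessarily a successful one, so the Overview check for Success in the list above would still apply after this function returns.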
8

View the processed data

Go to your destination location to view the processed data.