Unstructured
Unstructured provides a platform and tools to ingest and process unstructured documents for Retrieval Augmented Generation (RAG) and model fine-tuning.
Product offerings
Unstructured offers three products:
- Unstructured Platform: No-code UI. Production-ready. Pay as you go.
- Unstructured Serverless API Services: Use scripts or code. Production-ready. Pay as you go. (There is also a non-production, free edition with limits.)
- Unstructured open source library: Use scripts or code. Not production-ready. Limited.
Learn more about these products:
Unstructured Platform
No-code user interface, pay-as-you-go platform to get all of your data RAG-ready.
Data is processed on Unstructured-hosted compute resources.
Try the quickstart.
Learn more.
Unstructured Serverless API Services
Use scripts or code to call the Unstructured CLI, SDKs, or REST API to get all of your data RAG-ready.
Unstructured Serverless API Services have a Serverless pay-as-you-go edition and a Free limited edition that process data on Unstructured-hosted compute resources.
If you need to use compute resources that you host instead, there are also Azure pay-as-you-go and AWS pay-as-you-go editions; these editions process data by using the Unstructured API installed on compute resources hosted in your own Azure or AWS account.
Try the quickstart.
Learn more.
Unstructured open source library
Recommended only for rapid local script or code prototyping or simple proofs-of-concept. It is not designed for production scenarios.
Data processing is done only on your local machine and only with local compute resources, and there are no charges. However, features and performance are limited compared to the Platform and API service products.
Try the quickstart.
Learn more.
Quickstart: Unstructured Platform
This quickstart uses a no-code, point-and-click user interface in your web browser to get all of your data RAG-ready. Data is processed on Unstructured-hosted compute resources.
You will need:
- A compatible source (input) location in cloud storage that contains your documents for Unstructured to process. See the list of supported source types.
- Compatible files in your source location. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
- A compatible destination (output) location in cloud storage for Unstructured to put the processed data. See the list of supported destination types.
Sign up
Sign in
Use the sign-in URL, username, and temporary password from the welcome email that Unstructured sends you.
Set the source (input) location
- In the sidebar, click Sources.
- Click New Source.
- In the Type dropdown list, select the source location type that matches yours.
- Fill in the rest of the fields with the appropriate settings. Learn more.
- Click Test connection.
- Click Submit.
Set the destination (output) location
- In the sidebar, click Destinations.
- Click New Destination.
- In the Type dropdown list, select the destination location type that matches yours.
- Fill in the rest of the fields with the appropriate settings. Learn more.
- Click Test connection.
- Click Submit.
Process the documents
- In the sidebar, click Jobs.
- Click Run Job.
- In the Select a Workflow or create a new one dropdown list, select New.
- In the Sources dropdown list, select your source location from Step 3.
- In the Destination dropdown list, select your destination location from Step 4.
- Click Run.
Monitor the processing job
- In the list of Jobs, click the Workflow link for your New job.
- When the Status shows JOB FINISHED, go to the next step.
View the processed data
Go to your destination location to view the processed data.
Learn more about the Unstructured Platform.
Quickstart: Unstructured API service
This quickstart uses your local machine for the source (input) and destination (output) locations, and the Free Unstructured API edition. Data is processed on Unstructured-hosted compute resources.
You will need:
- Python installed on your local machine.
- Compatible files on your local machine to be processed. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
Sign up
Get your API key and API URL
- Get your Unstructured API key from the welcome email that Unstructured sends you. Store your API key in a secure location. Do not share it with others.
- For this quickstart, your Unstructured API URL is an empty string.
Set environment variables
- Set an environment variable named UNSTRUCTURED_API_KEY to the value of your Unstructured API key.
- Set another environment variable named UNSTRUCTURED_API_URL to an empty string.
To learn how to set environment variables, see your operating system’s documentation.
Setting the environment variable named UNSTRUCTURED_API_URL to an empty string makes your code forward-compatible if you later upgrade to the Unstructured Serverless API, which requires an API URL instead of an empty string.
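Before running anything, it can help to confirm that both variables are visible to Python. The following is a minimal sketch, not part of any Unstructured library: the function name load_unstructured_settings is hypothetical, and treating a missing UNSTRUCTURED_API_URL the same as an empty one is an assumption made for convenience.

```python
import os


def load_unstructured_settings(env=None):
    """Return (api_key, api_url) read from environment-style variables.

    Hypothetical helper for illustration; it is not an Unstructured API.
    """
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env

    api_key = env.get("UNSTRUCTURED_API_KEY", "")
    if not api_key:
        raise RuntimeError("UNSTRUCTURED_API_KEY is not set or is empty")

    # This quickstart uses an empty string for the URL, so an empty value is valid.
    # Assumption: a missing variable is treated the same as an empty one.
    api_url = env.get("UNSTRUCTURED_API_URL", "")
    return api_key, api_url
```

Calling load_unstructured_settings() with no arguments reads the current process environment and raises RuntimeError if the API key is absent.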
Install the API library
Run the following command:
pip install "unstructured[all-docs]"
Run the code
Run the following command, replacing:
- <path/to/input> with the source (input) path on your local machine that contains the compatible files for Unstructured to process on its hosted compute resources.
- <path/to/output> with the destination (output) path on your local machine that will contain the processed data that Unstructured returns from its hosted compute resources.
unstructured-ingest \
  local \
  --input-path <path/to/input> \
  --output-dir <path/to/output> \
  --partition-by-api \
  --api-key "$UNSTRUCTURED_API_KEY" \
  --partition-endpoint "$UNSTRUCTURED_API_URL"
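If you prefer to launch the command from Python, the same arguments can be assembled as a list. This is a sketch only: build_ingest_command is a hypothetical helper, and it uses just the subcommand and flags shown in the command above. Passing arguments as a list keeps the empty API URL as an explicit empty argument, so no shell quoting is needed.

```python
def build_ingest_command(input_path, output_dir, api_key, api_url=""):
    """Build the argument list for the quickstart's unstructured-ingest call.

    Hypothetical helper for illustration; the flags mirror the command above.
    """
    return [
        "unstructured-ingest",
        "local",
        "--input-path", input_path,
        "--output-dir", output_dir,
        "--partition-by-api",
        "--api-key", api_key,
        # An empty string is passed through as its own (empty) argument.
        "--partition-endpoint", api_url,
    ]
```

The resulting list can be handed to the standard library, for example subprocess.run(command, check=True), assuming unstructured-ingest is installed and on your PATH.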
View the processed data
Go to your destination location to view the processed data.
Learn more about the Unstructured Serverless API.
Quickstart: Unstructured open source library
This quickstart uses your local machine for the source (input) and destination (output) locations and for local data processing. It does not call Unstructured Serverless API Services.
You will need:
- Python installed on your local machine.
- Compatible files on your local machine to be processed. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
Install the open source library
Run the following command:
pip install "unstructured[all-docs]"
Run the code
Run the following command, replacing:
- <path/to/input> with the source (input) path on your local machine that contains the compatible files to process.
- <path/to/output> with the destination (output) path on your local machine that will contain the processed data.
unstructured-ingest \
  local \
  --input-path <path/to/input> \
  --output-dir <path/to/output>
View the processed data
Go to your destination location to view the processed data.
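For a quick look at the results without opening each file, a short script can tally what was produced. This sketch rests on an assumption not stated on this page: that the output directory contains JSON files, each holding a list of element objects with a "type" field. The function name summarize_output is hypothetical. Files that do not match that shape are skipped.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_output(output_dir):
    """Count element types across all JSON files under output_dir.

    Assumption: each JSON file is a list of objects with a "type" key.
    """
    counts = Counter()
    for json_path in sorted(Path(output_dir).rglob("*.json")):
        with json_path.open(encoding="utf-8") as handle:
            elements = json.load(handle)
        # Skip files that are not a list of elements.
        if not isinstance(elements, list):
            continue
        for element in elements:
            if isinstance(element, dict):
                counts[element.get("type", "unknown")] += 1
    return counts
```

Printing summarize_output("<path/to/output>") shows how many elements of each type were found, which is a fast way to check that processing produced something.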
Learn more about the Unstructured open source library.
Get in touch
If you don’t find the information you’re looking for in the documentation, or require assistance, get in touch with our Support team at support@unstructured.io, or join our Slack where our team and community can help you.