Overview
The Unstructured Platform UI features a no-code user interface for transforming your unstructured data into data that is ready for Retrieval Augmented Generation (RAG).
The Unstructured Platform Workflow Endpoint, part of the Unstructured Platform API, enables a full range of partitioning, chunking, embedding, and enrichment options for your files and data. It is designed to batch-process files and data in remote locations; send processed results to various storage locations, databases, and vector stores; and use the latest and highest-performing models on the market today. It has built-in logic to deliver the highest quality results at the lowest cost.
This page provides an overview of the Unstructured Platform Workflow Endpoint. This endpoint enables you to automate Unstructured Platform UI usage scenarios, and it also supports documentation, reporting, and recovery needs.
Getting started
Choose one of the following options to get started with the Unstructured Platform Workflow Endpoint:
- Follow the quickstart, which uses the Unstructured Python SDK from a remotely hosted Google Colab notebook.
- Start using the Unstructured Python SDK.
- Start using a REST client, such as curl or Postman.
Quickstart
This quickstart uses the Unstructured Python SDK to call the Unstructured Platform Workflow Endpoint to get your data RAG-ready. The Python code for this quickstart is in a remotely hosted Google Colab notebook. Data is processed on Unstructured-hosted compute resources.
The requirements are as follows:
- A compatible source (input) location that contains your data for Unstructured to process. See the list of supported source types. This quickstart uses an Amazon S3 bucket as the source location. If you use a different source type, you will need to modify the quickstart notebook accordingly.
- For document-based source locations, compatible files in that location. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured-IO/unstructured-ingest repository in GitHub.
- A compatible destination (output) location for Unstructured to put the processed data. See the list of supported destination types. For this quickstart’s destination location, a different folder in the same Amazon S3 bucket as the source location is used. If you use a different destination S3 bucket or a different destination type, you will need to modify the quickstart notebook accordingly.
Sign up
To sign up for the Unstructured Platform, go to the For Developers page and choose one of the following plans:
- Sign up for a pay-per-page plan.
- Save money by signing up for a subscribe-and-save plan instead.
If you’re not sure which plan to sign up for, start with a pay-per-page plan. You can always switch plans later.
If you choose a pay-per-page plan, your account is billed at Unstructured’s standard service usage rates after your first 14 days of usage or after you exceed 1000 processed pages per day, whichever comes first. To keep using the service, you must provide Unstructured with your payment details.
To save money by switching from a pay-per-page to a subscribe-and-save plan, go to the Unstructured Subscribe & Save page and complete the on-screen instructions. To save even more money by making a long-term billing commitment, stop here and sign up through the For Enterprise page instead.
By signing up for a pay-per-page or subscribe-and-save plan, your Unstructured account will run within the context of the Unstructured Platform on Unstructured’s own hosted cloud resources. If you would rather run the Unstructured Platform within the context of your own virtual private cloud (VPC), stop here and sign up through the For Enterprise page instead.
Sign in
If you initially signed up for a subscribe-and-save plan instead of a pay-per-page plan, wait to complete this step until after you receive confirmation from Unstructured that your plan is activated. Then go to the Unstructured home page at https://unstructured.io and click Login.
If you signed up through the For Enterprise page instead, your sign-in process will be different. For enterprise sign-in guidance, contact Unstructured Sales at sales@unstructured.io.
- After you have signed up for a pay-per-page plan, the Unstructured Platform sign-in page appears.
- Click Google or GitHub to sign in with the Google or GitHub account that you signed up with. Or, enter the email address that you signed up with, and then click Sign In.
- If you entered your email address, check your email inbox for a message from Unstructured. In that email, click the Sign In link.
- The first time you sign in, read the terms and conditions, and then click Accept.
After you have signed in for the first time, you can sign in again at any time by going to the Unstructured home page at https://unstructured.io and clicking Login.
Get your API key
- Sign in to your Unstructured account, at https://platform.unstructured.io.
- At the bottom of the sidebar, click your user icon, and then click Account Settings.
- On the API Keys tab, click Generate New Key.
- Enter some descriptive name for the API key, and then click Save.
- Click the Copy icon for your new API key. The API key’s value is copied to your system’s clipboard.
Create and set up the S3 bucket
This quickstart uses an Amazon S3 bucket as both the source location and the destination location. (You can use other source and destination types that are supported by Unstructured. If you use a different source or destination type, or if you use a different S3 bucket for the destination location, you will need to modify the quickstart notebook accordingly.)
Inside of the S3 bucket, a folder named input represents the source location. This is where your files to be processed will be stored. The S3 URI to the source location will be s3://<your-bucket-name>/input.
Inside of the same S3 bucket, a folder named output represents the destination location. This is where Unstructured will put the processed data. The S3 URI to the destination location will be s3://<your-bucket-name>/output.
Learn how to create an S3 bucket and set it up for Unstructured. (Do not run the Python SDK code or REST commands at the end of those setup instructions.)
Run the quickstart notebook
After your S3 bucket is created and set up, follow the instructions in this quickstart notebook.
View the processed data
After you run the quickstart notebook, go to your destination location to view the processed data.
Unstructured Python SDK
The Unstructured Python SDK, beginning with version 0.30.6, allows you to call the Unstructured Platform Workflow Endpoint through standard Python code.
To install the Unstructured Python SDK, run the following command from within your Python virtual environment:
If you already have the Unstructured Python SDK installed, upgrade to at least version 0.30.6 by running the following command instead:
The Unstructured Python SDK code examples, shown later on this page and on related pages, use the following environment variable, which you can set as follows:
This environment variable makes it easier to run the following Unstructured Python SDK examples and helps prevent you from storing scripts that contain sensitive API keys in public source code repositories.
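As a minimal sketch of reading that key from Python, the helper below assumes the variable is named UNSTRUCTURED_API_KEY, matching the name used in the Postman settings on this page. The get_api_key function is hypothetical and is not part of the SDK.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Unstructured API key from the environment, or fail loudly."""
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("UNSTRUCTURED_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the UNSTRUCTURED_API_KEY environment variable first.")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by a later API call.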
To get your Unstructured API key, do the following:
- Sign in to your Unstructured account, at https://platform.unstructured.io.
- At the bottom of the sidebar, click your user icon, and then click Account Settings.
- On the API Keys tab, click Generate New Key.
- Enter some descriptive name for the API key, and then click Save.
- Click the Copy icon for your new API key. The API key’s value is copied to your system’s clipboard.
Calls made by the Unstructured Python SDK’s unstructured_client functions for creating, listing, updating, and deleting connectors, workflows, and jobs in the Unstructured Platform UI all use the Unstructured Platform Workflow Endpoint URL (https://platform.unstructuredapp.io/api/v1) by default. You do not need to use the server_url parameter to specify this API URL in your Python code for these particular functions.
If you signed up through the For Enterprise page, your API URL and API key creation guidance might be different. For guidance, email Unstructured Sales at sales@unstructured.io.
If your API URL is different, be sure to substitute your API URL for https://platform.unstructuredapp.io/api/v1 throughout the following examples.
To specify an API URL in your code, set the server_url parameter in the UnstructuredClient constructor to the target API URL.
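The following sketch shows one way to pass server_url only when your API URL differs from the default. It assumes that the constructor accepts the API key through a parameter named api_key_auth; check the SDK reference for your installed version. The client_kwargs and make_client helpers are hypothetical.

```python
from typing import Any, Dict, Optional

DEFAULT_API_URL = "https://platform.unstructuredapp.io/api/v1"


def client_kwargs(api_key: str, server_url: Optional[str] = None) -> Dict[str, Any]:
    """Build constructor arguments; server_url is included only when overridden."""
    kwargs: Dict[str, Any] = {"api_key_auth": api_key}  # assumed parameter name
    if server_url and server_url != DEFAULT_API_URL:
        kwargs["server_url"] = server_url
    return kwargs


def make_client(api_key: str, server_url: Optional[str] = None):
    """Create an UnstructuredClient; requires the Unstructured Python SDK."""
    # Imported lazily so that this module loads even without the SDK installed.
    from unstructured_client import UnstructuredClient

    return UnstructuredClient(**client_kwargs(api_key, server_url))
```

With the default URL, make_client("<your-unstructured-api-key>") is enough; pass a second argument only for a different API URL.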
The Unstructured Platform Workflow Endpoint enables you to work with connectors, workflows, and jobs in the Unstructured Platform UI.
- A source connector ingests files or data into Unstructured from a source location.
- A destination connector sends the processed data from Unstructured to a destination location.
- A workflow defines how Unstructured will process the data.
- A job runs a workflow at a specific point in time.
For general information about these objects, see Connectors, Workflows, and Jobs.
Skip ahead to start learning about how to use the Unstructured Python SDK to work with connectors, workflows, and jobs programmatically.
REST endpoints
The Unstructured Platform Workflow Endpoint is exposed as a set of Representational State Transfer (REST) endpoints, which you can call through standard REST-enabled utilities, tools, programming languages, packages, and libraries. The examples shown later on this page and on related pages describe how to call the Unstructured Platform Workflow Endpoint with curl and Postman. You can adapt this information as needed for your preferred programming languages and libraries, for example by using the requests library with Python.
You can also use the Unstructured Platform Workflow Endpoint - Swagger UI to call the REST endpoints that are available through https://platform.unstructuredapp.io. To use the Swagger UI, you must provide your Unstructured API key with each call. To get this API key, see the quickstart, earlier on this page.
curl and Postman
The following curl examples use the following environment variables, which you can set as follows:
These environment variables enable you to more easily run the following curl examples and help prevent you from storing scripts that contain sensitive URLs and API keys in public source code repositories.
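If you adapt the REST calls to Python, the same two variables can drive a small request builder. This sketch uses only the standard library and assumes the variable names UNSTRUCTURED_API_URL and UNSTRUCTURED_API_KEY (as in the Postman settings on this page) and an API key header named unstructured-api-key. The build_request helper is hypothetical.

```python
import os
import urllib.request
from typing import Mapping, Optional

DEFAULT_API_URL = "https://platform.unstructuredapp.io/api/v1"


def build_request(path: str, method: str = "GET",
                  env: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build, but do not send, an authenticated request to the endpoint."""
    env = os.environ if env is None else env
    base_url = env.get("UNSTRUCTURED_API_URL", DEFAULT_API_URL).rstrip("/")
    headers = {
        "accept": "application/json",
        "unstructured-api-key": env["UNSTRUCTURED_API_KEY"],  # assumed header name
    }
    return urllib.request.Request(base_url + path, headers=headers, method=method)
```

To send a request built this way, pass it to urllib.request.urlopen and read the JSON response body.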
To get your Unstructured API key, do the following:
- Sign in to your Unstructured account, at https://platform.unstructured.io.
- At the bottom of the sidebar, click your user icon, and then click Account Settings.
- On the API Keys tab, click Generate New Key.
- Enter some descriptive name for the API key, and then click Save.
- Click the Copy icon for your new API key. The API key’s value is copied to your system’s clipboard.
If you signed up through the For Enterprise page, your API URL and API key creation guidance might be different. For guidance, email Unstructured Sales at sales@unstructured.io.
If your API URL is different, be sure to substitute your API URL for https://platform.unstructuredapp.io/api/v1 throughout the following examples.
The following Postman examples use variables, which you can set as follows:
- In Postman, on your workspace’s sidebar, click Environments.
- Click Globals.
- Create two global variables with the following settings:
  - Variable: UNSTRUCTURED_API_URL
    - Type: default
    - Initial value: https://platform.unstructuredapp.io/api/v1
    - Current value: https://platform.unstructuredapp.io/api/v1
  - Variable: UNSTRUCTURED_API_KEY
    - Type: secret
    - Initial value: <your-unstructured-api-key>
    - Current value: <your-unstructured-api-key>
- Click Save.
These variables enable you to more easily run the following examples in Postman and help prevent you from storing Postman collections that contain sensitive URLs and API keys in public source code repositories.
The Unstructured Platform Workflow Endpoint enables you to work with connectors, workflows, and jobs in the Unstructured Platform UI.
- A source connector ingests files or data into Unstructured from a source location.
- A destination connector sends the processed data from Unstructured to a destination location.
- A workflow defines how Unstructured will process the data.
- A job runs a workflow at a specific point in time.
For general information about these objects, see Connectors, Workflows, and Jobs.
Skip ahead to start learning about how to use the REST endpoints to work with connectors, workflows, and jobs programmatically.
Restrictions
The following Unstructured SDKs, tools, and libraries do not work with the Unstructured Platform Workflow Endpoint:
- The Unstructured JavaScript/TypeScript SDK
- Local single-file POST requests to the Unstructured Platform Partition Endpoint
- The Unstructured open source Python library
- The Unstructured Ingest CLI
- The Unstructured Ingest Python library
The following Unstructured API URL is also not supported: https://api.unstructuredapp.io/general/v0/general (the Unstructured Platform Partition Endpoint URL).
Connectors
You can list, get, create, update, and delete source connectors. You can also list, get, create, update, and delete destination connectors.
For general information, see Connectors.
List source connectors
To list source connectors, use the UnstructuredClient object’s sources.list_sources function (for the Python SDK) or the GET method to call the /sources endpoint (for curl or Postman).
To filter the list of source connectors, use the ListSourcesRequest object’s source_type parameter (for the Python SDK) or the query parameter source_type=<type> (for curl or Postman), replacing <type> with the source connector type’s unique ID (for example, s3 for the Amazon S3 source connector type). To get this ID, see Sources.
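The REST form of this call can be sketched in Python with the standard library. The example only builds the GET request and assumes that the API key travels in a header named unstructured-api-key. The list_sources_request helper is hypothetical.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://platform.unstructuredapp.io/api/v1"


def list_sources_request(api_key: str,
                         source_type: Optional[str] = None) -> urllib.request.Request:
    """Build a GET /sources request, optionally filtered by connector type."""
    url = f"{API_URL}/sources"
    if source_type:
        # urlencode escapes the value so that it is safe inside a query string.
        url += "?" + urlencode({"source_type": source_type})
    headers = {"accept": "application/json", "unstructured-api-key": api_key}
    return urllib.request.Request(url, headers=headers, method="GET")
```

For example, list_sources_request(key, "s3") targets only Amazon S3 source connectors, while omitting the second argument lists all source connectors.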
Get a source connector
To get information about a source connector, use the UnstructuredClient object’s sources.get_source function (for the Python SDK) or the GET method to call the /sources/<connector-id> endpoint (for curl or Postman), replacing <connector-id> with the source connector’s unique ID. To get this ID, see List source connectors.
Create a source connector
To create a source connector, use the UnstructuredClient object’s sources.create_source function (for the Python SDK) or the POST method to call the /sources endpoint (for curl or Postman).
In the CreateSourceConnector object (for the Python SDK) or the request body (for curl or Postman), specify the settings for the connector. For the specific settings to include, which differ by connector, see Sources.
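As a rough sketch of the REST variant, the following builds a POST request with a JSON body. The body shape shown here (name, type, and a nested config with remote_url) is a placeholder for illustration only; the real settings differ by connector and are listed in Sources. The header name unstructured-api-key and the create_source_request helper are also assumptions.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://platform.unstructuredapp.io/api/v1"


def create_source_request(api_key: str,
                          settings: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST /sources request whose JSON body carries the settings."""
    body = json.dumps(settings).encode("utf-8")
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "unstructured-api-key": api_key,  # assumed header name
    }
    return urllib.request.Request(f"{API_URL}/sources", data=body,
                                  headers=headers, method="POST")


# Hypothetical settings for illustration; consult Sources for the real fields.
example_settings = {
    "name": "my-s3-source",
    "type": "s3",
    "config": {"remote_url": "s3://<your-bucket-name>/input"},
}
```

Keeping the settings in a plain dictionary makes it straightforward to load them from a file instead of hard-coding credentials in the script.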
Update a source connector
To update information about a source connector, use the UnstructuredClient object’s sources.update_source function (for the Python SDK) or the PUT method to call the /sources/<connector-id> endpoint (for curl or Postman), replacing <connector-id> with the source connector’s unique ID. To get this ID, see List source connectors.
In the UpdateSourceConnector object (for the Python SDK) or the request body (for curl or Postman), specify the settings for the connector. For the specific settings to include, which differ by connector, see Sources.
You must specify all of the settings for the connector, even for settings that are not changing.
You can change any of the connector’s settings except for its name and type.
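Because an update must carry every setting, one approach is to start from the connector's current settings and overlay only the changes. The sketch below illustrates that idea with a flat, hypothetical settings dictionary; the real settings shape differs by connector, and the full_update_settings helper is not part of the SDK.

```python
from typing import Any, Dict

# Settings that cannot change after a connector is created.
IMMUTABLE_SETTINGS = ("name", "type")


def full_update_settings(current: Dict[str, Any],
                         changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return the complete settings to send, rejecting name or type changes."""
    for key in IMMUTABLE_SETTINGS:
        if key in changes and changes[key] != current.get(key):
            raise ValueError(f"The connector's {key} cannot be changed.")
    merged = dict(current)  # copy so that the caller's dictionary is untouched
    merged.update(changes)
    return merged
```

In practice, the current settings would come from a get call for the connector, so unchanged values are carried over rather than silently dropped.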
Delete a source connector
To delete a source connector, use the UnstructuredClient object’s sources.delete_source function (for the Python SDK) or the DELETE method to call the /sources/<connector-id> endpoint (for curl or Postman), replacing <connector-id> with the source connector’s unique ID. To get this ID, see List source connectors.
List destination connectors
To list destination connectors, use the UnstructuredClient object’s destinations.list_destinations function (for the Python SDK) or the GET method to call the /destinations endpoint (for curl or Postman).
To filter the list of destination connectors, use the ListDestinationsRequest object’s destination_type parameter (for the Python SDK) or the query parameter destination_type=<type> (for curl or Postman), replacing <type> with the destination connector type’s unique ID (for example, s3 for the Amazon S3 destination connector type). To get this ID, see Destinations.
Get a destination connector
To get information about a destination connector, use the UnstructuredClient object’s destinations.get_destination function (for the Python SDK) or the GET method to call the /destinations/<connector-id> endpoint (for curl or Postman), replacing <connector-id> with the destination connector’s unique ID. To get this ID, see List destination connectors.
Create a destination connector
To create a destination connector, use the UnstructuredClient object’s destinations.create_destination function (for the Python SDK) or the POST method to call the /destinations endpoint (for curl or Postman).
In the CreateDestinationConnector object (for the Python SDK) or the request body (for curl or Postman), specify the settings for the connector. For the specific settings to include, which differ by connector, see Destinations.
Update a destination connector
To update information about a destination connector, use the UnstructuredClient object’s destinations.update_destination function (for the Python SDK) or the PUT method to call the /destinations/<connector-id> endpoint (for curl or Postman), replacing <connector-id> with the destination connector’s unique ID. To get this ID, see List destination connectors.
In the UpdateDestinationConnector object (for the Python SDK) or the request body (for curl or Postman), specify the settings for the connector. For the specific settings to include, which differ by connector, see Destinations.
You must specify all of the settings for the connector, even for settings that are not changing.
You can change any of the connector’s settings except for its name and type.
Delete a destination connector
To delete a destination connector, use the UnstructuredClient object’s destinations.delete_destination function (for the Python SDK) or the DELETE method to call the /destinations/<connector-id> endpoint (for curl or Postman), replacing <connector-id> with the destination connector’s unique ID. To get this ID, see List destination connectors.
Workflows
You can list, get, create, run, update, and delete workflows.
For general information, see Workflows.
List workflows
To list workflows, use the UnstructuredClient object’s workflows.list_workflows function (for the Python SDK) or the GET method to call the /workflows endpoint (for curl or Postman).
To filter the list of workflows, use one or more of the following ListWorkflowsRequest parameters (for the Python SDK) or query parameters (for curl or Postman):
- source_id=<connector-id>, replacing <connector-id> with the source connector’s unique ID. To get this ID, see List source connectors.
- destination_id=<connector-id>, replacing <connector-id> with the destination connector’s unique ID. To get this ID, see List destination connectors.
- status=<status>, replacing <status> with one of the following workflow statuses: active or inactive.
You can specify multiple query parameters, for example ?source_id=<connector-id>&status=<status>.
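Combining several optional filters into one query string can be sketched as follows. The parameter names and status values come from the list above; the workflows_query helper itself is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

WORKFLOW_STATUSES = ("active", "inactive")


def workflows_query(source_id: Optional[str] = None,
                    destination_id: Optional[str] = None,
                    status: Optional[str] = None) -> str:
    """Return the /workflows path with only the filters that were supplied."""
    if status is not None and status not in WORKFLOW_STATUSES:
        raise ValueError(f"status must be one of {WORKFLOW_STATUSES}")
    params = {"source_id": source_id,
              "destination_id": destination_id,
              "status": status}
    # Drop unset filters so that they do not appear as empty query parameters.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return "/workflows" + ("?" + query if query else "")
```

Append the returned path to your API URL to form the full request URL.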
Get a workflow
To get information about a workflow, use the UnstructuredClient object’s workflows.get_workflow function (for the Python SDK) or the GET method to call the /workflows/<workflow-id> endpoint (for curl or Postman), replacing <workflow-id> with the workflow’s unique ID. To get this ID, see List workflows.
Create a workflow
To create a workflow, use the UnstructuredClient object’s workflows.create_workflow function (for the Python SDK) or the POST method to call the /workflows endpoint (for curl or Postman).
In the CreateWorkflow object (for the Python SDK) or the request body (for curl or Postman), specify the settings for the workflow. For the specific settings to include, see Create a workflow.
Run a workflow
To run a workflow manually, use the UnstructuredClient object’s workflows.run_workflow function (for the Python SDK) or the POST method to call the /workflows/<workflow-id>/run endpoint (for curl or Postman), replacing <workflow-id> with the workflow’s unique ID. To get this ID, see List workflows.
To run a workflow on a schedule instead, specify the schedule setting in the request body when you create or update a workflow. See Create a workflow or Update a workflow.
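A manual run over REST is a POST with no request body. The sketch below only builds that request; it assumes the API key header is named unstructured-api-key, and the run_workflow_request helper is hypothetical.

```python
import urllib.request
from urllib.parse import quote

API_URL = "https://platform.unstructuredapp.io/api/v1"


def run_workflow_request(api_key: str, workflow_id: str) -> urllib.request.Request:
    """Build a POST /workflows/<workflow-id>/run request."""
    # quote() guards against characters that are not valid in a URL path segment.
    url = f"{API_URL}/workflows/{quote(workflow_id, safe='')}/run"
    headers = {"accept": "application/json", "unstructured-api-key": api_key}
    return urllib.request.Request(url, headers=headers, method="POST")
```

Sending the request with urllib.request.urlopen starts the run, which in turn creates a job for the workflow.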
Update a workflow
To update information about a workflow, use the UnstructuredClient object’s workflows.update_workflow function (for the Python SDK) or the PUT method to call the /workflows/<workflow-id> endpoint (for curl or Postman), replacing <workflow-id> with the workflow’s unique ID. To get this ID, see List workflows.
In the UpdateWorkflow object (for the Python SDK) or the request body (for curl or Postman), specify the settings for the workflow. For the specific settings to include, see Update a workflow.
Delete a workflow
To delete a workflow, use the UnstructuredClient object’s workflows.delete_workflow function (for the Python SDK) or the DELETE method to call the /workflows/<workflow-id> endpoint (for curl or Postman), replacing <workflow-id> with the workflow’s unique ID. To get this ID, see List workflows.
Jobs
You can list, get, and cancel jobs.
A job is created automatically whenever a workflow runs on a schedule; see Create a workflow. A job is also created whenever you run a workflow; see Run a workflow.
For general information, see Jobs.
List jobs
To list jobs, use the UnstructuredClient object’s jobs.list_jobs function (for the Python SDK) or the GET method to call the /jobs endpoint (for curl or Postman).
To filter the list of jobs, use one or both of the following ListJobsRequest parameters (for the Python SDK) or query parameters (for curl or Postman):
- workflow_id=<workflow-id>, replacing <workflow-id> with the workflow’s unique ID. To get this ID, see List workflows.
- status=<status>, replacing <status> with one of the following job statuses: failed, finished, or running.
For curl or Postman, you can specify multiple query parameters as ?workflow_id=<workflow-id>&status=<status>.
Get a job
To get information about a job, use the UnstructuredClient object’s jobs.get_job function (for the Python SDK) or the GET method to call the /jobs/<job-id> endpoint (for curl or Postman), replacing <job-id> with the job’s unique ID. To get this ID, see List jobs.
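A common use of this call is to poll a job until it stops running. The following sketch assumes that the job information includes a status field whose values match the job statuses listed on this page (failed, finished, running). The wait_for_job helper is hypothetical and takes the actual fetch call as a function, so it works with either the SDK or a REST client.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = ("failed", "finished")


def wait_for_job(fetch_job: Callable[[], Dict[str, Any]],
                 poll_seconds: float = 10.0,
                 max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch_job repeatedly until the job has failed or finished."""
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:  # assumed field name
            return job
        sleep(poll_seconds)
    raise TimeoutError("The job had not completed after the last poll.")
```

Bounding the number of polls keeps a stuck job from blocking a script indefinitely; adjust poll_seconds and max_polls to suit the size of your data.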
Cancel a job
To cancel a running job, use the UnstructuredClient object’s jobs.cancel_job function (for the Python SDK) or the POST method to call the /jobs/<job-id>/cancel endpoint (for curl or Postman), replacing <job-id> with the job’s unique ID. To get this ID, see List jobs.