The Unstructured Partition Endpoint, part of the Unstructured API, is intended for rapid prototyping of Unstructured’s various partitioning strategies, with limited support for chunking. It processes local files only, one file at a time. Use the Unstructured Workflow Endpoint for production-level scenarios, batch file processing, files and data in remote locations, generating embeddings, applying post-transform enrichments, using the latest and highest-performing models, and getting the highest quality results at the lowest cost.

Get started

To call the Unstructured Partition Endpoint, you need an Unstructured account and an Unstructured API key:
  1. If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account, at https://platform.unstructured.io.
    To sign up for a Team or Enterprise account instead, contact Unstructured Sales, or learn more.
  2. If you have an Unstructured Starter or Team account and are not already signed in, sign in to your account at https://platform.unstructured.io.
    For an Enterprise account, see your Unstructured account administrator for instructions, or email Unstructured Support at support@unstructured.io.
  3. Get your Unstructured API key:
    a. After you sign in to your Unstructured Starter account, click API Keys on the sidebar.
    For a Team or Enterprise account, before you click API Keys, make sure you have selected the organizational workspace you want to create an API key for. Each API key works with one and only one organizational workspace. Learn more.
    b. Click Generate API Key.
    c. Follow the on-screen instructions to finish generating the key.
    d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
Try the quickstart.

Pricing

Unstructured offers several account types with different pricing plans:
  •   Starter - A single user, with a single workspace, hosted alongside other accounts on Unstructured’s cloud infrastructure.
  •   Team - Multiple users and workspaces, hosted alongside other accounts on Unstructured’s cloud infrastructure.
  •   Enterprise - Multiple users and workspaces, isolated from all other accounts, with two hosting options for additional security and control:
    •   Dedicated instance - Hosted within a virtual private cloud (VPC) running inside Unstructured’s cloud infrastructure.
    •   In-VPC - Hosted within your own VPC on your own cloud infrastructure.
    Enterprise accounts also allow for robust customization of Unstructured’s features for your unique needs.
For more details, see the Unstructured Pricing page. To upgrade your account from Starter to Team, or from Team to Enterprise, email Unstructured Sales at sales@unstructured.io.
Some of these plans have billing details that are determined on a per-page basis. Unstructured calculates a page as follows:
  • For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
  • For .docx files that have page metadata, Unstructured calculates the number of pages based on that metadata.
  • For all other file types, Unstructured calculates the number of pages as the file’s size divided by 100 KB.
  • For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.
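The page rules above can be sketched as a small estimator. This is an illustration only, not Unstructured's billing code: the function name `estimate_pages` is hypothetical, and it assumes that 100 KB means 100 * 1024 bytes, that partial pages round up, and that a .docx file without page metadata falls under the size rule. The source text states none of these details.

```python
import math
from pathlib import Path

# Assumption: "100 KB" is 100 * 1024 bytes. The text does not say whether
# KB means 1000 or 1024 bytes, or how partial pages are rounded.
BYTES_PER_PAGE = 100 * 1024

# File types that the text says are counted by real pages, slides, or images.
NATIVE_PAGE_TYPES = {".pdf", ".pptx", ".tiff"}


def estimate_pages(filename, size_bytes, native_page_count=None):
    """Estimate the page count for one file under the rules listed above.

    native_page_count is the real page/slide/image count (or the .docx page
    metadata) when the caller knows it. This sketch does not open the file.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in NATIVE_PAGE_TYPES:
        if native_page_count is None:
            raise ValueError(f"{suffix} files need a real page count")
        return native_page_count
    if suffix == ".docx" and native_page_count is not None:
        return native_page_count
    # All other file types (and, by assumption, .docx without page metadata):
    # file size divided by 100 KB, rounded up.
    return math.ceil(size_bytes / BYTES_PER_PAGE)


def estimate_pages_for_data(size_bytes):
    """Non-file data: one page per 100 KB of incoming data (rounded up)."""
    return math.ceil(size_bytes / BYTES_PER_PAGE)
```

For example, a 250 KB text file comes out as 3 pages under these assumptions, while a PDF is counted by its actual number of pages regardless of size.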

Quickstart

To use the Unstructured Python SDK instead of curl for this quickstart, skip ahead to the SDK version.
This example uses the curl utility on your local machine to call the Unstructured Partition Endpoint. It sends a source (input) file from your local machine to the Unstructured Partition Endpoint, which then delivers the processed data to a destination (output) location, also on your local machine. Data is processed on Unstructured-hosted compute resources. If you do not have a source file readily available, you can use, for example, a sample PDF file containing the text of the United States Constitution, available for download from https://constitutioncenter.org/media/files/constitution.pdf.
1. Set environment variables

From your terminal or Command Prompt, set the following two environment variables.
  • Replace <your-unstructured-api-url> with the Unstructured Partition Endpoint URL. This URL was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at support@unstructured.io.
    The default URL for the Unstructured Partition Endpoint is https://api.unstructuredapp.io/general/v0/general. However, you should always use the URL that was provided to you when your Unstructured account was created.
  • Replace <your-unstructured-api-key> with your Unstructured API key, which you generated earlier on this page.
export UNSTRUCTURED_API_URL="<your-unstructured-api-url>"
export UNSTRUCTURED_API_KEY="<your-unstructured-api-key>"
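If you later script this quickstart in Python, it can help to confirm that both variables are set before you send a request. The following is a minimal sketch; the helper name `load_partition_settings` is hypothetical, and the HTTPS check assumes that your endpoint URL uses the https scheme, as the default URL does.

```python
import os
from urllib.parse import urlparse


def load_partition_settings(env=None):
    """Return (url, key) read from the environment, or raise ValueError."""
    env = os.environ if env is None else env
    settings = {}
    for name in ("UNSTRUCTURED_API_URL", "UNSTRUCTURED_API_KEY"):
        value = env.get(name, "").strip()
        if not value:
            raise ValueError(f"{name} is not set")
        # Catch the <your-...> placeholders being exported unchanged.
        if value.startswith("<") and value.endswith(">"):
            raise ValueError(f"{name} still contains the placeholder text")
        settings[name] = value
    # Assumption: the endpoint is served over HTTPS, like the default URL.
    if urlparse(settings["UNSTRUCTURED_API_URL"]).scheme != "https":
        raise ValueError("UNSTRUCTURED_API_URL should start with https://")
    return settings["UNSTRUCTURED_API_URL"], settings["UNSTRUCTURED_API_KEY"]
```

Calling `load_partition_settings()` with no arguments reads the variables exported above; passing a dictionary instead is convenient for testing.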
2. Run the curl command

Run the following curl command, replacing <path/to/file> with the path to the source file on your local machine. If the source file is not a PDF file, remove ;type=application/pdf from the final --form option in this command.
curl --request 'POST' \
  "$UNSTRUCTURED_API_URL" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'Content-Type: multipart/form-data' \
  --form 'content_type=string' \
  --form 'strategy=vlm' \
  --form 'vlm_model_provider=openai' \
  --form 'vlm_model=gpt-4o' \
  --form 'output_format=application/json' \
  --form 'files=@<path/to/file>;type=application/pdf'
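For comparison, the same request can be sketched in Python as a plain HTTP call. This is not the Unstructured Python SDK, which has its own interface. The sketch assumes that the third-party `requests` package is installed; the names `file_part` and `partition_file` and the 10-minute timeout are illustrative choices, not part of the API. The form fields are copied from the curl command above.

```python
import mimetypes
import os
from pathlib import Path

# Form fields copied from the curl command above.
FORM_FIELDS = {
    "content_type": "string",
    "strategy": "vlm",
    "vlm_model_provider": "openai",
    "vlm_model": "gpt-4o",
    "output_format": "application/json",
}


def file_part(path):
    """Return (filename, content_type) for the `files` form part.

    content_type is None when the type cannot be guessed from the file
    extension, which corresponds to omitting `;type=...` in the curl command.
    """
    content_type, _ = mimetypes.guess_type(path)
    return Path(path).name, content_type


def partition_file(path):
    """Send one local file to the endpoint and return the parsed JSON."""
    import requests  # third-party; assumed installed (pip install requests)

    name, content_type = file_part(path)
    headers = {
        "accept": "application/json",
        "unstructured-api-key": os.environ["UNSTRUCTURED_API_KEY"],
    }
    with open(path, "rb") as fh:
        # requests builds the multipart body and its Content-Type header.
        part = (name, fh, content_type) if content_type else (name, fh)
        response = requests.post(
            os.environ["UNSTRUCTURED_API_URL"],
            headers=headers,
            data=FORM_FIELDS,
            files={"files": part},
            timeout=600,  # illustrative; the call might take several minutes
        )
    response.raise_for_status()
    return response.json()
```

Deriving the content type from the file extension means the same function works for PDF and non-PDF files without editing the request by hand.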
3. Examine the results

After you run the curl command, the results are printed to your terminal or Command Prompt. The command might take several minutes to complete. By default, the JSON is printed without indenting or other whitespace. You can pretty-print the JSON output by using utilities such as jq in future command runs. You can also write the JSON output to a local file by using the curl option -o, --output <file> in future command runs.
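If you would rather inspect the results in Python than with jq, the following minimal sketch pretty-prints the raw JSON and counts elements by type. It assumes that the response is a JSON array of element objects, each with a `type` field; the function names and the file name in the usage note are hypothetical.

```python
import json
from collections import Counter


def pretty_json(raw_json):
    """Re-indent compact JSON text with two-space indentation."""
    return json.dumps(json.loads(raw_json), indent=2, ensure_ascii=False)


def count_element_types(raw_json):
    """Count elements by their `type` field (assumed to be present)."""
    elements = json.loads(raw_json)
    return Counter(element.get("type", "unknown") for element in elements)


def save_pretty(raw_json, out_path):
    """Write the re-indented JSON to a local file."""
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(pretty_json(raw_json))
```

For example, after saving the curl output with --output output.json, you could read that file's text and pass it to `pretty_json` or `count_element_types`.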
You can also call the Unstructured Partition Endpoint by using the Unstructured Python SDK or the Unstructured JavaScript/TypeScript SDK.