Overview
The Unstructured Partition Endpoint, part of the Unstructured API, is intended for rapid prototyping of Unstructured’s various partitioning strategies, with limited support for chunking. It processes local files only, one file at a time. Use the Unstructured Workflow Endpoint for production-level scenarios, batch file processing, files and data in remote locations, generating embeddings, applying post-transform enrichments, using the latest and highest-performing models, and for the highest quality results at the lowest cost.
Get started
To call the Unstructured Partition Endpoint, you need an Unstructured account and an Unstructured API key:
If you signed up for Unstructured through the For Enterprise page, or if you are using a self-hosted deployment of Unstructured, the following information about signing up, signing in, and getting your Unstructured API key might apply differently to you. For details, contact Unstructured Sales at sales@unstructured.io.
- Sign in to your Unstructured account:
- If you do not already have an Unstructured account, go to https://unstructured.io/contact and fill out the online form to indicate your interest.
- If you already have an Unstructured account, go to https://platform.unstructured.io and sign in by using the email address, Google account, or GitHub account that is associated with your Unstructured account. The Unstructured user interface (UI) then appears, and you can start using it right away.
- Get your Unstructured API key:
a. In the Unstructured UI, click API Keys on the sidebar.
b. Click Generate API Key.
c. Follow the on-screen instructions to finish generating the key.
d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
Pricing
Unstructured offers three account pricing plans:
- SaaS Cloud-hosted - Processing happens on Unstructured’s software-as-a-service (SaaS) cloud infrastructure in a multi-tenant environment.
- Hybrid SaaS - Processing also happens on Unstructured’s SaaS cloud infrastructure, but your data stays protected in a dedicated cloud environment, maintaining strict data privacy.
- VPC - Sometimes referred to as self-hosted, an instance of the Unstructured SaaS is deployed into your own virtual private cloud (VPC), providing complete data ownership and infrastructure control, full customization, and dedicated technical support.
For more details, see the Unstructured Pricing page.
Some of these plans are billed on a per-page basis.
Unstructured calculates a page as follows:
- For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
- For .docx files that have page metadata, Unstructured calculates the number of pages based on that metadata.
- For all other file types, Unstructured calculates the number of pages as the file’s size divided by 100 KB.
- For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.
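As a rough illustration of the size-based rule, the following sketch estimates the page count for a file that is billed by size. The function name estimate_pages is hypothetical, and two details are assumptions that the text above does not state: that 1 KB means 1,000 bytes, and that a partial page is rounded up to a whole page. Treat the result as an estimate only, not as a billing figure.

```shell
# Hypothetical helper: estimate size-based pages for a file.
# Assumptions (not confirmed by the pricing text):
#   - 100 KB is treated as 100,000 bytes
#   - any partial page is rounded up to a whole page
estimate_pages() {
  size_bytes=$1
  page_bytes=100000
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (size_bytes + page_bytes - 1) / page_bytes ))
}

# Example usage (notes.txt is a placeholder file name):
#   estimate_pages "$(wc -c < notes.txt)"
```

Under these assumptions, a 250,000-byte text file would count as three pages.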
Quickstart
This example uses the curl utility on your local machine to call the Unstructured Partition Endpoint. It sends a source (input) file from your local machine to the Unstructured Partition Endpoint, which then delivers the processed data to a destination (output) location, also on your local machine. Data is processed on Unstructured-hosted compute resources.
If you do not have a source file readily available, you can use, for example, a sample PDF file containing the text of the United States Constitution, available for download from https://constitutioncenter.org/media/files/constitution.pdf.
Set environment variables
From your terminal or Command Prompt, set the following two environment variables.
- Replace <your-unstructured-api-url> with the Unstructured Partition Endpoint URL, which is https://api.unstructuredapp.io/general/v0/general
- Replace <your-unstructured-api-key> with your Unstructured API key, which you generated earlier on this page.
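A minimal sketch of setting the two variables in a macOS or Linux shell follows. The variable names UNSTRUCTURED_API_URL and UNSTRUCTURED_API_KEY are assumptions chosen for this sketch; any names work as long as later commands use the same ones. In Windows Command Prompt, use set instead of export.

```shell
# Variable names are assumptions for this sketch, not required names.
# Replace both placeholders as described in the list above.
export UNSTRUCTURED_API_URL="<your-unstructured-api-url>"
export UNSTRUCTURED_API_KEY="<your-unstructured-api-key>"
```

These variables last only for the current terminal session, so set them again if you open a new one.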
Run the curl command
Run the following curl command, replacing <path/to/file> with the path to the source file on your local machine.
If the source file is not a PDF file, then remove ;type=application/pdf from the final --form option in this command.
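One way such a request might look is sketched below, wrapped in a hypothetical shell function named partition_file so that the file path is passed as an argument in place of <path/to/file>. The sketch assumes that two environment variables named UNSTRUCTURED_API_URL and UNSTRUCTURED_API_KEY hold the endpoint URL and the API key (the names are assumptions). The unstructured-api-key header and the strategy and files form fields reflect a general understanding of the API, so check them against the API reference before relying on them.

```shell
# Hypothetical helper: send one local PDF file to the Partition Endpoint.
# Assumes UNSTRUCTURED_API_URL and UNSTRUCTURED_API_KEY are already set
# (variable names are assumptions for this sketch).
# The header and form-field names should be verified against the API
# reference. For a non-PDF file, drop ";type=application/pdf" from the
# final --form option.
partition_file() {
  curl --request POST "$UNSTRUCTURED_API_URL" \
    --header "accept: application/json" \
    --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
    --form "strategy=auto" \
    --form "files=@$1;type=application/pdf"
}

# Example usage (constitution.pdf is a placeholder file name):
#   partition_file constitution.pdf
```

Defining the request as a function keeps the API key out of the command text itself, because the key is read from the environment when the function runs.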
Examine the results
After you run the curl command, the results are printed to your terminal or Command Prompt. The command might take several minutes to complete.
By default, the JSON is printed without indenting or other whitespace. You can pretty-print the JSON output by using utilities such as jq in future command runs.
You can also write the JSON output to a local file by using the curl option -o, --output <file> in future command runs.
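The following sketch shows one way to pretty-print compact JSON. The function name pretty_json is hypothetical, the fallback to python3 -m json.tool when jq is not installed is an assumption added for convenience, and the sample JSON is made up for illustration rather than taken from real endpoint output.

```shell
# Hypothetical helper: pretty-print JSON read from standard input.
# Uses jq when it is installed; otherwise falls back to Python's
# built-in json.tool module (an assumption of this sketch).
pretty_json() {
  if command -v jq >/dev/null 2>&1; then
    jq .
  else
    python3 -m json.tool
  fi
}

# Made-up sample input, not real endpoint output.
printf '%s' '[{"type":"Title","text":"Example"}]' | pretty_json

# Example usage with a saved response (output.json is a placeholder,
# written earlier by adding "--output output.json" to the curl command):
#   pretty_json < output.json
```

Saving the response to a file first means you can reformat or inspect it repeatedly without calling the endpoint again.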
You can also call the Unstructured Partition Endpoint by using the Unstructured Python SDK or the Unstructured JavaScript/TypeScript SDK.