To start using the Unstructured API right away, skip ahead to the quickstart now!
- The Unstructured Workflow Endpoint enables a full range of partitioning, chunking, embedding, and enrichment options for your files and data. It is designed to batch-process files and data in remote locations; send processed results to various storage locations, databases, and vector stores; and use the latest and highest-performing models available. It has built-in logic to deliver the highest quality results at the lowest cost.
- The Unstructured Partition Endpoint is intended for rapid prototyping of Unstructured’s various partitioning strategies, with limited support for chunking. It works only with local files, one file at a time. Use the Unstructured Workflow Endpoint instead for production-level scenarios, batch file processing, files and data in remote locations, generating embeddings, applying post-transform enrichments, using the latest and highest-performing models, and getting the highest quality results at the lowest cost.
Benefits over open source
The Unstructured API provides the following benefits beyond the Unstructured open source library offering:
- Designed for production scenarios.
- Significantly increased performance on document and table extraction.
- Access to newer and more sophisticated vision transformer models.
- Access to Unstructured’s fine-tuned OCR models.
- Access to Unstructured’s by-page and by-similarity chunking strategies.
- Adherence to security and SOC2 Type 1, SOC2 Type 2, HIPAA, GDPR, and ISO 27001 compliance standards. For details, see the Unstructured Trust Portal.
- Authentication and identity management.
- Incremental data loading.
- Image extraction from documents.
- More sophisticated document hierarchy detection.
- Unstructured manages code dependencies, for instance for libraries such as Tesseract.
- Unstructured manages its own infrastructure, including parallelization and other performance optimizations.
Pricing
To call the Unstructured API, you must have an Unstructured account. Unstructured offers several account types with different pricing plans:
- Starter - A single user, with a single workspace, hosted alongside other accounts on Unstructured’s cloud infrastructure.
- Team - Multiple users and workspaces, hosted alongside other accounts on Unstructured’s cloud infrastructure.
- Enterprise - Multiple users and workspaces, isolated from all other accounts, with two hosting options for additional security and control:
  - Dedicated instance - Hosted within a virtual private cloud (VPC) running inside Unstructured’s cloud infrastructure.
  - In-VPC - Hosted within your own VPC on your own cloud infrastructure.
Unstructured calculates pages as follows:
- For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
- For .docx files that have page metadata, Unstructured calculates the number of pages based on that metadata.
- For all other file types, Unstructured calculates the number of pages as the file’s size divided by 100 KB.
- For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.
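The page-counting rules above can be sketched as a small Python helper for rough estimates. This is an illustrative sketch, not Unstructured's actual metering code: the function name `estimate_pages` and its parameters are hypothetical, and the text above does not say whether a KB is 1,000 or 1,024 bytes or how fractional pages are rounded. The sketch therefore exposes the KB size as a parameter and returns the unrounded quotient.

```python
from typing import Optional

# File types where a "page" is an actual page, slide, or image (per the rules above).
NATIVE_PAGE_TYPES = {".pdf", ".pptx", ".tiff"}


def estimate_pages(
    extension: str,
    size_bytes: int,
    native_pages: Optional[int] = None,
    bytes_per_kb: int = 1000,  # assumption: the source does not define KB precisely
) -> float:
    """Hypothetical helper that estimates the page count for one file or data blob.

    extension    -- file extension such as ".pdf"; pass "" for non-file data
    size_bytes   -- size of the file or incoming data in bytes
    native_pages -- known page/slide/image count, or .docx page metadata, if any
    """
    ext = extension.lower()

    if ext in NATIVE_PAGE_TYPES:
        # A page is a real page, slide, or image; the caller must supply the count.
        if native_pages is None:
            raise ValueError(f"native_pages is required for {ext} files")
        return float(native_pages)

    if ext == ".docx" and native_pages is not None:
        # .docx files with page metadata use that metadata.
        return float(native_pages)

    # All other file types, and non-file data: size divided by 100 KB.
    # Rounding is not specified in the source, so the raw quotient is returned.
    return size_bytes / (100 * bytes_per_kb)
```

For example, under the 1,000-bytes-per-KB assumption, `estimate_pages(".txt", 1_000_000)` evaluates to `10.0`, while `estimate_pages(".pdf", 5_000_000, native_pages=12)` returns `12.0` regardless of file size.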