The Unstructured API consists of two parts:
  • The Unstructured Workflow Endpoint enables a full range of partitioning, chunking, embedding, and enrichment options for your files and data. It is designed to batch-process files and data in remote locations; send processed results to various storage systems, databases, and vector stores; and use the latest and highest-performing models available. It has built-in logic to deliver the highest quality results at the lowest cost.
  • The Unstructured Partition Endpoint is intended for rapid prototyping of Unstructured’s various partitioning strategies, with limited support for chunking. It processes local files only, one file at a time. For production-level scenarios, batch processing, files and data in remote locations, embedding generation, post-transform enrichments, access to the latest and highest-performing models, and the highest quality results at the lowest cost, use the Unstructured Workflow Endpoint instead.

Benefits over open source

The Unstructured API provides the following benefits beyond the Unstructured open source library offering:
  • Designed for production scenarios.
  • Significantly increased performance on document and table extraction.
  • Access to newer and more sophisticated vision transformer models.
  • Access to Unstructured’s fine-tuned OCR models.
  • Access to Unstructured’s by-page and by-similarity chunking strategies.
  • Adherence to the following security and compliance standards: SOC 2 Type 1, SOC 2 Type 2, HIPAA, GDPR, and ISO 27001. For details, see the Unstructured Trust Portal.
  • Authentication and identity management.
  • Incremental data loading.
  • Image extraction from documents.
  • More sophisticated document hierarchy detection.
  • Unstructured manages code dependencies, including libraries such as Tesseract.
  • Unstructured manages its own infrastructure, including parallelization and other performance optimizations.

Pricing

To call the Unstructured API, you must have an Unstructured account. Unstructured offers several account types with different pricing plans:
  • Starter - A single user, with a single workspace, hosted alongside other accounts on Unstructured’s cloud infrastructure.
  • Team - Multiple users and workspaces, hosted alongside other accounts on Unstructured’s cloud infrastructure.
  • Enterprise - Multiple users and workspaces, isolated from all other accounts, with two hosting options for additional security and control:
    • Dedicated instance - Hosted within a virtual private cloud (VPC) running inside Unstructured’s cloud infrastructure.
    • In-VPC - Hosted within your own VPC on your own cloud infrastructure.
    Enterprise accounts also allow for robust customization of Unstructured’s features for your unique needs.
For more details, see the Unstructured Pricing page. To upgrade your account from Starter to Team, or from Team to Enterprise, email Unstructured Sales at sales@unstructured.io.

Some of these plans determine billing on a per-page basis. Unstructured calculates a page as follows:
  • For .pdf, .pptx, and .tiff files, a page is one page, slide, or image, respectively.
  • For .docx files that have page metadata, Unstructured calculates the number of pages based on that metadata.
  • For all other file types, Unstructured calculates the number of pages as the file’s size divided by 100 KB.
  • For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.
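
As a rough illustration of the page rules above, the following Python sketch estimates a page count for a single file or data payload. It is not Unstructured's implementation. The function name `estimate_pages` and its parameters are hypothetical, and three details are assumptions because the text does not specify them: that 1 KB means 1,000 bytes, that partial pages round up, and that a .docx file without page metadata falls under the size-based rule.

```python
from typing import Optional

# Assumption: "100 KB" is taken as 100,000 bytes. The text does not say
# whether 1 KB means 1,000 or 1,024 bytes.
PAGE_SIZE_BYTES = 100 * 1000

# File types for which a page is a page, slide, or image.
NATIVE_PAGE_TYPES = {".pdf", ".pptx", ".tiff"}


def estimate_pages(
    extension: Optional[str],
    size_bytes: int,
    native_units: Optional[int] = None,
    docx_page_metadata: Optional[int] = None,
) -> int:
    """Estimate a page count (hypothetical helper, not an Unstructured API).

    extension: file extension such as ".pdf", or None for non-file data.
    size_bytes: size of the file or incoming data, in bytes.
    native_units: number of pages, slides, or images for .pdf/.pptx/.tiff.
    docx_page_metadata: page count read from .docx metadata, if present.
    """
    ext = extension.lower() if extension else None

    # .pdf, .pptx, .tiff: count the pages, slides, or images directly.
    if ext in NATIVE_PAGE_TYPES:
        if native_units is None:
            raise ValueError("native_units is required for " + ext)
        return native_units

    # .docx with page metadata: use the metadata value.
    if ext == ".docx" and docx_page_metadata is not None:
        return docx_page_metadata

    # Everything else, including non-file data: size divided by 100 KB.
    # Rounding up to a whole page is an assumption of this sketch.
    return (size_bytes + PAGE_SIZE_BYTES - 1) // PAGE_SIZE_BYTES


print(estimate_pages(".pdf", 5_000_000, native_units=12))       # 12
print(estimate_pages(".docx", 250_000, docx_page_metadata=3))   # 3
print(estimate_pages(".txt", 250_000))                          # 3
print(estimate_pages(None, 100_000))                            # 1
```

For actual billing figures, rely on the Unstructured Pricing page and your account's usage data rather than an estimate like this one.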

Get support

For technical support for Unstructured accounts, email Unstructured Support at support@unstructured.io.

Learn more