Unstructured Serverless API services
Unstructured Serverless API services include these offerings:
Unstructured Serverless API
Serverless API hosted by Unstructured. Scalable and secure. Your data remains private. Start with a 14-day free trial, then pay as you go.
Try the quickstart.
Learn more.
Free Unstructured API
Hosted by Unstructured. Free to use, but data processing is limited to 1000 pages per month. Documents submitted to the Free Unstructured API may be utilized for Unstructured’s proprietary model training and evaluation purposes.
Try the quickstart.
Learn more.
Unstructured API on Azure
The functionality and security of the Unstructured Serverless API, but deployed through the Azure Marketplace, in your preferred infrastructure. Pay as you go.
Learn more.
Unstructured API on AWS
The functionality and security of the Unstructured Serverless API, but deployed through the AWS Marketplace, in your preferred infrastructure. Pay as you go.
Learn more.
Benefits over open source
Unstructured Serverless API services provide the following benefits beyond the Unstructured open source library offering:
- Designed for production scenarios.
- Significantly increased performance on document and table extraction.
- Access to newer and more sophisticated vision transformer models.
- Access to Unstructured’s fine-tuned OCR models.
- Access to Unstructured’s by-page and by-similarity chunking strategies.
- Adherence to security and compliance standards, including SOC 2 Type 1, SOC 2 Type 2, and HIPAA.
- Authentication and identity management.
- Incremental data loading.
- Image extraction from documents.
- More sophisticated document hierarchy detection.
- Managed code dependencies, for instance for libraries such as Tesseract.
- Managed infrastructure, including parallelization and other performance optimizations.
Supported file types
Unstructured supports processing of the following file types:
By file extension:
File extension |
---|
.bmp |
.csv |
.doc |
.docx |
.eml |
.epub |
.heic |
.html |
.jpeg |
.md |
.msg |
.odt |
.org |
.p7s |
.pdf |
.png |
.ppt |
.pptx |
.rst |
.rtf |
.tiff |
.txt |
.tsv |
.xls |
.xlsx |
.xml |
By file type:
Category | File types |
---|---|
CSV | .csv |
Email | .eml , .msg , .p7s |
EPUB | .epub |
Excel | .xls , .xlsx |
HTML | .html |
Image | .bmp , .heic , .jpeg , .png , .tiff |
Markdown | .md |
Org Mode | .org |
Open Office | .odt |
PDF | .pdf |
Plain text | .txt |
PowerPoint | .ppt , .pptx |
reStructured Text | .rst |
Rich Text | .rtf |
TSV | .tsv |
Word | .doc , .docx |
XML | .xml |
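As a rough illustration of how the table above might be used on the client side, the following sketch checks a filename against the listed extensions before a file is submitted. The function name `category_for` and the dictionary are hypothetical helpers written for this example, not part of any Unstructured library, and the `Email` and `PDF` labels are this sketch's own naming for those rows.

```python
from pathlib import Path

# Extensions copied from the "By file type" table above.
# The dictionary and its category labels are illustrative only.
SUPPORTED_BY_CATEGORY = {
    "CSV": {".csv"},
    "Email": {".eml", ".msg", ".p7s"},
    "EPUB": {".epub"},
    "Excel": {".xls", ".xlsx"},
    "HTML": {".html"},
    "Image": {".bmp", ".heic", ".jpeg", ".png", ".tiff"},
    "Markdown": {".md"},
    "Org Mode": {".org"},
    "Open Office": {".odt"},
    "PDF": {".pdf"},
    "Plain text": {".txt"},
    "PowerPoint": {".ppt", ".pptx"},
    "reStructured Text": {".rst"},
    "Rich Text": {".rtf"},
    "TSV": {".tsv"},
    "Word": {".doc", ".docx"},
    "XML": {".xml"},
}


def category_for(filename):
    """Return the table category for a filename, or None if unsupported."""
    # Compare on the lowercased final suffix, so "REPORT.PDF" matches ".pdf".
    ext = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_BY_CATEGORY.items():
        if ext in extensions:
            return category
    return None
```

A check like this only inspects the file name. It assumes the extension reflects the actual file contents, which may not hold for renamed files.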
Data ingestion
Unstructured Serverless API services support ingesting data from various sources. Learn how.
Billing
We calculate a page as follows:
- For .pdf, .pptx, and .tiff files, each page, slide, or image counts as one page.
- For .docx files that have page metadata, we calculate the number of pages based on that metadata.
- For all other file types, we calculate the number of pages as the file’s size divided by 100 KB.
- For non-file data, we calculate a page as 100 KB of incoming data to be processed.
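The rules above can be sketched as a small estimator. This is an approximation under stated assumptions, not Unstructured's actual billing code: the function name and parameters are hypothetical, 100 KB is taken to mean 102,400 bytes (it could equally mean 100,000), and the size-based result is returned as an unrounded quotient because the rules do not say how fractions are handled.

```python
# Extensions for which a page, slide, or image counts directly as one page.
UNIT_COUNTED_EXTENSIONS = {".pdf", ".pptx", ".tiff"}

# Assumption: "100 KB" means 100 * 1024 bytes.
BYTES_PER_PAGE = 100 * 1024


def estimate_billable_pages(extension, size_bytes, unit_count=None, docx_pages=None):
    """Estimate the page count for one input, following the rules above.

    extension  -- file extension such as ".pdf", or None for non-file data
    size_bytes -- size of the file or incoming data in bytes
    unit_count -- number of pages, slides, or images (for .pdf, .pptx, .tiff)
    docx_pages -- page count from .docx page metadata, if the file has it
    """
    ext = (extension or "").lower()

    if ext in UNIT_COUNTED_EXTENSIONS:
        if unit_count is None:
            raise ValueError("unit_count is required for " + ext + " files")
        return unit_count

    # .docx files use page metadata when it is present.
    if ext == ".docx" and docx_pages is not None:
        return docx_pages

    # All other file types, and non-file data, are counted by size.
    return size_bytes / BYTES_PER_PAGE
```

For example, under these assumptions a 250 KB `.txt` file comes to 2.5 pages, while a 12-page `.pdf` comes to 12 pages regardless of its size.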
Get support
Should you require any assistance or have any questions regarding the Unstructured API, please contact our support team at support@unstructured.io.