Unstructured Serverless API services include these offerings:

  Read the launch announcement.

Benefits over open source

Unstructured Serverless API services provide the following benefits beyond the Unstructured open source library offering:

  • Designed for production scenarios.
  • Significantly increased performance on document and table extraction.
  • Access to newer and more sophisticated vision transformer models.
  • Access to Unstructured’s fine-tuned OCR models.
  • Access to Unstructured’s by-page and by-similarity chunking strategies.
  • Adherence to security standards, with SOC 2 Type 1, SOC 2 Type 2, and HIPAA compliance.
  • Authentication and identity management.
  • Incremental data loading.
  • Image extraction from documents.
  • More sophisticated document hierarchy detection.
  • Managed code dependencies, such as the Tesseract library.
  • Managed infrastructure, including parallelization and other performance optimizations.

Supported file types

Unstructured supports processing of the following file types:

By file extension:

File extension
.bmp
.csv
.doc
.docx
.eml
.epub
.heic
.html
.jpeg
.md
.msg
.odt
.org
.p7s
.pdf
.png
.ppt
.pptx
.rst
.rtf
.tiff
.txt
.tsv
.xls
.xlsx
.xml

By file type:

Category            File types
CSV                 .csv
E-mail              .eml, .msg, .p7s
EPUB                .epub
Excel               .xls, .xlsx
HTML                .html
Image               .bmp, .heic, .jpeg, .png, .tiff
Markdown            .md
Org Mode            .org
Open Office         .odt
PDF                 .pdf
Plain text          .txt
PowerPoint          .ppt, .pptx
reStructured Text   .rst
Rich Text           .rtf
TSV                 .tsv
Word                .doc, .docx
XML                 .xml
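
The category table above can be expressed as a simple lookup, which may be handy for filtering files before sending them for processing. The following is a minimal sketch only: the names CATEGORY_EXTENSIONS, EXTENSION_TO_CATEGORY, and file_category are hypothetical helpers written for this illustration and are not part of any Unstructured library or API.

```python
from pathlib import Path
from typing import Optional

# Categories and extensions copied from the table above.
CATEGORY_EXTENSIONS = {
    "CSV": [".csv"],
    "E-mail": [".eml", ".msg", ".p7s"],
    "EPUB": [".epub"],
    "Excel": [".xls", ".xlsx"],
    "HTML": [".html"],
    "Image": [".bmp", ".heic", ".jpeg", ".png", ".tiff"],
    "Markdown": [".md"],
    "Org Mode": [".org"],
    "Open Office": [".odt"],
    "PDF": [".pdf"],
    "Plain text": [".txt"],
    "PowerPoint": [".ppt", ".pptx"],
    "reStructured Text": [".rst"],
    "Rich Text": [".rtf"],
    "TSV": [".tsv"],
    "Word": [".doc", ".docx"],
    "XML": [".xml"],
}

# Invert the table so that each extension maps to its category.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in CATEGORY_EXTENSIONS.items()
    for ext in extensions
}


def file_category(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if the extension is not listed.

    Hypothetical helper: matching is by the final extension, case-insensitively.
    """
    return EXTENSION_TO_CATEGORY.get(Path(filename).suffix.lower())
```

For example, file_category("slides.PPTX") yields "PowerPoint", while a file such as "archive.zip" yields None because its extension does not appear in the table.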

Data ingestion

Unstructured Serverless API services support ingesting data from various sources. Learn how.

Billing

For billing purposes, we count pages as follows:

  • For .pdf, .pptx, and .tiff files, each page, slide, or image counts as one page.
  • For .docx files that have page metadata, we calculate the number of pages based on that metadata.
  • For all other file types, we calculate the number of pages as the file’s size divided by 100 KB.
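
The page rules above can be sketched as a small estimator. This is an illustration under stated assumptions, not the actual billing logic: the function name estimate_billable_pages is hypothetical, 100 KB is taken to mean 100,000 bytes, the size-based result is rounded up with a minimum of one page, and the page, slide, or image count is assumed to be supplied by the caller rather than read from the file. The source does not specify the rounding rule or the exact byte definition of a KB.

```python
import math
from pathlib import Path
from typing import Optional

# Assumption: "100 KB" means 100,000 bytes (the source does not say 1000 vs. 1024).
BYTES_PER_PAGE = 100 * 1000

# File types where a page is a literal page, slide, or image.
PAGED_EXTENSIONS = {".pdf", ".pptx", ".tiff"}


def estimate_billable_pages(
    filename: str, size_bytes: int, page_count: Optional[int] = None
) -> int:
    """Estimate the page count for one file (hypothetical helper).

    page_count is the number of pages, slides, or images, or the page count
    taken from .docx metadata, when the caller knows it.
    """
    ext = Path(filename).suffix.lower()

    if ext in PAGED_EXTENSIONS:
        if page_count is None:
            raise ValueError(f"page_count is required for {ext} files")
        return page_count

    # .docx files use their page metadata when it is available.
    if ext == ".docx" and page_count is not None:
        return page_count

    # All other cases: file size divided by 100 KB.
    # Assumption: round up, and count at least one page.
    return max(1, math.ceil(size_bytes / BYTES_PER_PAGE))
```

Under these assumptions, a 250,000-byte .txt file is estimated at 3 pages, and a 12-page .pdf is counted as 12 pages regardless of its size.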

Get support

Should you require any assistance or have any questions regarding the Unstructured API, please contact our support team at support@unstructured.io.