Frequently Asked Questions

Get in touch

If you don’t find the information you’re looking for in the documentation, or require assistance, get in touch with our Support team at support@unstructured.io, or join our Slack where our team and community can help you.

Re-launch of Serverless API

What is the Unstructured Serverless API, and why did you relaunch it?

The Unstructured Serverless API is a powerful tool that simplifies extracting and analyzing valuable information from unstructured data like documents and images. We’ve relaunched our Serverless API with a new serverless architecture, per-page pricing, and enhanced processing capabilities to deliver faster insights, a smoother user experience, and more predictable costs.

What are the new Serverless API features?

Here’s a quick overview of what to expect:

Simplified Onboarding and User Dashboard: Using an intuitive dashboard, you can easily manage your keys and billing options and monitor usage.
New Per-Page Pricing: Enjoy reduced costs with a transparent and predictable pricing model.
Improved Processing Throughput and Latency: Document processing ramp-up times have been slashed from thirty minutes to under three seconds, with horizontal scaling across multiple regions.
Enhanced Extraction Performance: Our new document transformation models deliver superior extraction performance for over 25 file types.

How is my data secured when using the Serverless API?

For the Serverless API users, your data is kept entirely private. Unstructured does not persist it, use any of it for training, or have the ability to see it. This is in contrast to the Free API, where data may be used to refine and enhance Unstructured’s service offerings. Please refer to our Terms and Conditions for more details.

You can sign up using your email or the Google and GitHub Single Sign-On (SSO) service. Visit our Serverless API website and get your API Key.

Can I still use the old API keys?

Yes, you can still use your old API keys. We will migrate all the user keys to the new Serverless API.

How can I generate and use a new API Key to process my documents?

When you log in to the Serverless API dashboard, you can access your API keys by clicking the API Keys link in the side navigation. Under the Actions column, click the Copy icon to copy the key or the boilerplate codes to process the documents using the Unstructured REST API POST with curl, or the Unstructured CLI, or the Unstructured Python SDK or Unstructured JavaScript/Typescript SDK.

What is the new Unstructured API pricing structure?

We offer a clear and straightforward pay-per-page pricing model, giving you full control and predictability over your document preprocessing costs. We have two different document processing strategies:

Fast Strategy: $1 per 1000 pages processed

Designed for low-latency use cases, Fast Strategy is for file types other than PDF or images, where we can capitalize on the structure of the document to exact and classify the text. This strategy uses rules-based parsers to deliver fast and cost-effective processing to render natural language in structured JSON.
Hi-Res Strategy: $10 per 1000 pages processed

Best for complex file types like PDF and JPEG or documents with images, forms, and tables. This strategy uses AI models to understand document layouts and render their contents in structured JSON.

We calculate a page as follows:

For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
For .docx files that have page metadata, we calculate the number of pages based on that metadata.
For all other file types, we calculate the number of pages as the file’s size divided by 100 KB.

How can I check my balance and make the payment?

Your usage information can be found on your API dashboard. Click the Usage link on the side navigation. You can view your usage by date, including detailed information, such as the number of requests, total pages processed by fast and hi-res strategy, and total cost.

How can I update my credit card information?

You can manage your payment method by going to the Billing page and clicking the Manage Payment Method button. You will be redirected to the Stripe page, where you can enter and save your credit card information.

How can I get the images out of my source documents and into my transformed JSON?

Unstructured will return Base64 encoded images in your transformed JSON if you set the parameter extract_image_block_types=["Table", "Image"]. For more information on what parameters you can set go here.

API Errors

What should be done when encountering a 401 or 404 error?

Encountering a 401 or 404 error necessitates reviewing the endpoint and API key usage. Ensuring that the endpoint alignment and API key authentication are correctly configured is crucial. This may involve verifying the inclusion or exclusion of specific path elements based on the use context, such as when employing the partition_via_api function or via the SDKs.

How can I address a 504 gateway timeout error during hi_res or ocr_only processing?

A 504 gateway timeout error typically arises during prolonged processing tasks, particularly when employing high-resolution strategies or extensive data sets. This situation suggests the operation exceeded the allotted processing window by about 30 minutes. Adjusting processing parameters or strategies to optimize performance may be beneficial. Our support team is available to assist in identifying and implementing suitable modifications to circumvent these timeouts, ensuring efficient data processing.

API Usage

How does the partition_by_api function work with the Unstructured Serverless API?

The partition_via_api function within the Unstructured platform provides a streamlined way for users to segment documents via the hosted Unstructured API. The partition_via_api function within the Unstructured platform provides a streamlined way for users to segment documents via the hosted Unstructured API. This automation tool is particularly valuable for those managing the API on their servers or in local containerized environments, simplifying the process of breaking down documents into more manageable components.

partition

import os 

from unstructured.partition.api import partition_via_api

filename = "example-docs/eml/fake-email.eml"

elements = partition_via_api(
    filename=filename,
    api_key=os.getenv("UNSTRUCTURED_API_KEY")
    api_url=os.getenv("UNSTRUCTURED_API_URL")
)

How can I specify the language for OCR document processing using Unstructured Serverless API?

To define the OCR language when processing documents, utilize the languages keyword argument in your API request. For example:

ocrlang

curl -X 'POST' \
  '$UNSTRUCTURED_API_URL' \
  -H 'unstructured-api-key: $UNSTRUCTURED_API_KEY' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@sample-docs/english-and-korean.png' \
  -F 'strategy=ocr_only' \
  -F 'languages=eng'  \
  -F 'languages=kor'  \
  | jq -C . | less -R

For comprehensive language support, refer to the Tesseract documentation, which provides a detailed list of supported languages and installation guidelines.

You can still use ocr_languages kwarg, but this parameter is being deprecated in favor of languages kwarg.

Can I employ the basic chunking strategy with the Unstructured API?

Yes, the Unstructured API supports the implementation of the Unstructured chunking strategies. This approach facilitates the efficient segmentation of content, enhancing manageability and processing. Our API Parameter documentation provides detailed instructions on employing the chunking_strategy and relevant kwargs, including multipage_sections, combine_under_n_chars, new_after_n_chars, and max_characters.

How can I extract images from PDFs via the API?

You can pass the extract_image_block_types and extract_image_block_to_payload kwargs in the API Parameters. For more information, please refer to our API Parameter documentation.

What is the difference between table_as_cells and table_as_html?

table_as_cells was designed for internal testing only. It has been replaced with the more descriptive name table_as_html.

Terms & Conditions

Do the Terms & Conditions allow Unstructured to use the data I send through the API?

On our paid products your data is kept entirely private. We do not persist it. We do not use any of it for training. We cannot see it.

If you decide to use our Free API, you’ll see Clause #9 in our Terms & Conditions. It applies solely to the Free API version. It grants us the option to use data coming through the Free API only, to refine and enhance our service offerings.

On our paid products your data is kept entirely private. We do not persist it. We do not use any of it for training. We cannot see it.

FAQ

​Get in touch

​Re-launch of Serverless API

​What is the Unstructured Serverless API, and why did you relaunch it?

​What are the new Serverless API features?

​How is my data secured when using the Serverless API?

​How can I sign up for the new Serverless API?

​Can I still use the old API keys?

​How can I generate and use a new API Key to process my documents?

​What is the new Unstructured API pricing structure?

​How can I check my balance and make the payment?

​How can I update my credit card information?

​How can I get the images out of my source documents and into my transformed JSON?

​API Errors

​What should be done when encountering a 401 or 404 error?

​How can I address a 504 gateway timeout error during hi_res or ocr_only processing?

​API Usage

​How does the partition_by_api function work with the Unstructured Serverless API?

​How can I specify the language for OCR document processing using Unstructured Serverless API?

​Can I employ the basic chunking strategy with the Unstructured API?

​How can I extract images from PDFs via the API?

​What is the difference between table_as_cells and table_as_html?

​Terms & Conditions

​Do the Terms & Conditions allow Unstructured to use the data I send through the API?