If you’re used to working with the Unstructured open source library but want the advanced capabilities of Unstructured API services, you can keep using the library you already know. Whether you’re using the Free Unstructured API, the Unstructured Serverless API, the Unstructured API on Azure/AWS, or your local deployment of the Unstructured API, you can call the open source library’s partition_via_api function to send an individual file to Unstructured API services for processing.

Unstructured recommends that you use the Unstructured Ingest CLI or the Unstructured Ingest Python library if any of the following apply to you:

  • You need to work with documents in cloud storage.
  • You want to cache the results of processing multiple files in batches.
  • You want more precise control over document-processing pipeline stages such as partitioning, chunking, filtering, staging, and embedding.

To use the open source library with the Unstructured API, you’ll also need these environment variables:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.

If you do not specify the API URL, your Unstructured Serverless API pay-as-you-go account will be used by default. You must always specify your Serverless API key.

To use the Free Unstructured API, you must always specify both your Free API key and the Free API URL: https://api.unstructured.io/general/v0/general

To use the pay-as-you-go Unstructured API on Azure or AWS with the open source library, you must always specify the corresponding API URL. See the Azure or AWS instructions.
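
The rules above can be sketched as a small helper that reads the two environment variables and builds the connection arguments. This is a minimal sketch, not part of the Unstructured library: the function name build_connection_kwargs is hypothetical, and it assumes that leaving out api_url lets partition_via_api fall back to its own default.

```python
import os


def build_connection_kwargs(env=os.environ):
    """Build keyword arguments for partition_via_api from environment values.

    Hypothetical helper for illustration only. The API key is mandatory;
    the API URL is included only when it is set and non-empty.
    """
    api_key = env.get("UNSTRUCTURED_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("UNSTRUCTURED_API_KEY is not set.")

    kwargs = {"api_key": api_key}

    # Only pass api_url when a value is present, so that None or an
    # empty string is never sent in place of a real URL.
    api_url = env.get("UNSTRUCTURED_API_URL", "").strip()
    if api_url:
        kwargs["api_url"] = api_url

    return kwargs
```

You could then unpack the result into the call, for example partition_via_api(filename="example-docs/pdf/DA-1p.pdf", **build_connection_kwargs()). Passing a plain dictionary instead of os.environ makes the helper easy to check without changing your shell.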

Installation

Make sure you have the Unstructured open source library installed. Refer to the Unstructured open source library quickstart guide for instructions.

Basics

Here’s a basic example that sends a PDF file to the Unstructured API for partitioning by using partition_via_api from the Unstructured open source library:

import json
import os

from unstructured.partition.api import partition_via_api

elements = partition_via_api(
    filename="example-docs/pdf/DA-1p.pdf",
    api_key=os.getenv("UNSTRUCTURED_API_KEY"),
    api_url=os.getenv("UNSTRUCTURED_API_URL"),
    strategy="hi_res",
    split_pdf_page=True,
    split_pdf_concurrency_level=15
)

element_dicts = [element.to_dict() for element in elements]

# Print the processed data's first element only.
print(element_dicts[0])

# Write the processed data to a local file.
json_elements = json.dumps(element_dicts, indent=2)

with open("example-docs/pdf/DA-1p.pdf.json", "w") as file:
    file.write(json_elements)
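
Once the elements are converted to dictionaries, they can be inspected with ordinary Python. The following is a minimal sketch that counts elements by type and collects the text of one type. The sample data is made up for illustration and only mimics the "type", "element_id", "text", and "metadata" keys that to_dict() produces; the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical sample shaped like the output of element.to_dict().
sample_dicts = [
    {"type": "Title", "element_id": "a1", "text": "Quarterly report",
     "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "element_id": "b2", "text": "Revenue grew.",
     "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "element_id": "c3", "text": "Costs fell.",
     "metadata": {"page_number": 2}},
]


def count_by_type(dicts):
    """Count how many elements of each type are present."""
    return Counter(d.get("type", "Unknown") for d in dicts)


def texts_of_type(dicts, element_type):
    """Return the non-empty text of every element with the given type."""
    return [
        d["text"]
        for d in dicts
        if d.get("type") == element_type and d.get("text")
    ]


type_counts = count_by_type(sample_dicts)
narrative_texts = texts_of_type(sample_dicts, "NarrativeText")
```

In real use you would pass element_dicts from the example above instead of sample_dicts.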

When sending a request to the Free Unstructured API, you only need to provide your individual API key. When sending requests to the Unstructured Serverless API, you need to supply your unique API URL in addition to your API key. For Unstructured API on Azure/AWS, use the API URL that you configured through those services.

Parameters & examples

The API parameters are the same across all methods of accessing the Unstructured API.

  • Refer to the API parameters page for the full list of available parameters.
  • Refer to the Examples page for some inspiration on using the parameters.