Unstructured API Quickstart

To run this quickstart without modifying your local machine, you can run it as a notebook on Google Colab instead. To just copy the sample code for use on your local machine, skip ahead to the Sample code section.

This quickstart uses the Unstructured Partition Endpoint and, for ease of demonstration, focuses on local files. It also covers only a limited subset of Unstructured's capabilities. To unlock the full feature set, including large-scale batch processing of multiple files and semi-structured data stored in remote locations, use the expanded, advanced version of this quickstart, which uses the Unstructured Workflow Endpoint instead.

The following code shows how to use the Unstructured Python SDK to have Unstructured process one or more local files by using the Unstructured Partition Endpoint. To run this code, you will need the following:

- An Unstructured account and an Unstructured API key for your account.
- Python 3.9 or higher installed on your local machine.
- (Recommended, but not required) A Python virtual environment, to isolate and version your project's code dependencies. You can create and activate one with a tool such as uv (recommended) or with venv, which is built into Python.
- The Unstructured Python SDK installed on your local machine, for example by running one of the following commands:
  - For uv, run uv add unstructured-client
  - For venv (or for no virtual environment), run pip install unstructured-client

Then add the following code to a Python file on your local machine, make the following changes, and run the file to see the results:

- Replace <unstructured-api-key> with your Unstructured API key.
- To process all files within a directory, change None for input_dir to a string that contains the path to that directory on your local machine. The path can be relative or absolute.
- To process specific files within one or more directories, change None for input_files to a string that contains a comma-separated list of file paths on your local machine, for example "./input/2507.13305v1.pdf,./input2/table-multi-row-column-cells.pdf". These paths can be relative or absolute.
  If input_dir and input_files are both set to something other than None, input_dir takes precedence and input_files is ignored.
- For output_dir, specify a string that contains the path to the local directory where you want the code to write its JSON output files. If that directory does not exist, the code creates it. The path can be relative or absolute.
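The input rules above can be sketched as a small standalone function. This is a hypothetical helper named resolve_input_files, not part of the Unstructured SDK. It mirrors the sample's behavior (input_dir wins, and .json files found in the directory are skipped) and, as an assumption of this sketch, also trims whitespace around comma-separated paths and fails early if neither setting is provided.

```python
import os
from typing import List, Optional


def resolve_input_files(input_dir: Optional[str], input_files: Optional[str]) -> List[str]:
    """Hypothetical helper that mirrors the sample's input rules.

    - If input_dir is set, it takes precedence: walk it recursively and
      skip .json files (for example, output from an earlier run).
    - Otherwise, split input_files on commas and trim stray whitespace.
    - If neither is set, raise an error instead of failing later.
    """
    if input_dir:
        filenames = []
        for root, _, files in os.walk(input_dir):
            for name in files:
                if not name.endswith(".json"):
                    filenames.append(os.path.join(root, name))
        # Sort so the order is stable across runs and platforms.
        return sorted(filenames)
    if input_files:
        return [path.strip() for path in input_files.split(",") if path.strip()]
    raise ValueError("Set input_dir or input_files before running.")
```

For example, resolve_input_files(None, "./input/a.pdf, ./input2/b.pdf") returns ["./input/a.pdf", "./input2/b.pdf"], while any non-None input_dir causes the second argument to be ignored.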

Sample code

Python SDK

import asyncio
import os
import json
import unstructured_client
from unstructured_client.models import shared, errors

# Replace <unstructured-api-key> with your Unstructured API key.
client = unstructured_client.UnstructuredClient(
    api_key_auth="<unstructured-api-key>"
)

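async def partition_file_via_api(filename):
    # Read the file into memory so the file handle is closed right away.
    with open(filename, "rb") as f:
        file_content = f.read()

    req = {
        "partition_parameters": {
            "files": {
                "content": file_content,
                "file_name": os.path.basename(filename),
            },
            "strategy": shared.Strategy.AUTO,
            # Vision language model settings, used when the VLM strategy is applied.
            "vlm_model": "gpt-4o",
            "vlm_model_provider": "openai",
            "languages": ['eng'],
            # Split PDFs into page batches and send the batches in parallel.
            "split_pdf_page": True,
            # Keep going even if some page batches fail.
            "split_pdf_allow_failed": True,
            "split_pdf_concurrency_level": 15
        }
    }

    try:
        res = await client.general.partition_async(request=req)
        return res.elements
    except errors.UnstructuredClientError as e:
        print(f"Error partitioning {filename}: {e.message}")
        return []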

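async def process_file_and_save_result(input_filename, output_dir):
    elements = await partition_file_via_api(input_filename)

    # Write one JSON file per input file, named after the input file.
    # Note: input files that share a base name overwrite each other's output.
    if elements:
        results_name = f"{os.path.basename(input_filename)}.json"
        output_filename = os.path.join(output_dir, results_name)

        with open(output_filename, "w", encoding="utf-8") as f:
            json.dump(elements, f)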

def load_filenames_in_directory(input_dir):
    # Recursively collect files, skipping .json files such as earlier output.
    filenames = []
    for root, _, files in os.walk(input_dir):
        for file in files:
            if not file.endswith('.json'):
                filenames.append(os.path.join(root, file))

    return filenames

async def process_files():
    # Initialize with either a directory name, to process everything in the dir,
    # or a comma-separated list of filepaths.
    input_dir = None   # "path/to/input/directory"
    input_files = None # "path/to/file,path/to/file,path/to/file"

    # Set to the directory for output json files. This dir
    # will be created if needed.
    output_dir = "./output/"

    # input_dir takes precedence over input_files if both are set.
    if input_dir:
        filenames = load_filenames_in_directory(input_dir)
    elif input_files:
        filenames = [path.strip() for path in input_files.split(",") if path.strip()]
    else:
        raise ValueError("Set input_dir or input_files before running.")

    os.makedirs(output_dir, exist_ok=True)

    # Partition all files concurrently and wait for every one to finish.
    tasks = [
        process_file_and_save_result(filename, output_dir)
        for filename in filenames
    ]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(process_files())
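After the code finishes, each successfully processed input file has a matching .json file in output_dir. The following minimal sketch summarizes those results by counting element types per output file. The function name summarize_output_dir is hypothetical, and the sketch assumes each output file contains a JSON list of element objects that each carry a "type" field (for example "Title" or "NarrativeText").

```python
import collections
import glob
import json
import os


def summarize_output_dir(output_dir: str) -> dict:
    """Count element types in every JSON file in output_dir.

    Assumes each file holds a JSON list of element dicts with a "type" key;
    elements without one are counted as "Unknown".
    """
    summary = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "*.json"))):
        with open(path, encoding="utf-8") as f:
            elements = json.load(f)
        counts = collections.Counter(el.get("type", "Unknown") for el in elements)
        summary[os.path.basename(path)] = dict(counts)
    return summary


if __name__ == "__main__":
    # Point this at the same output_dir that the sample code used.
    for name, counts in summarize_output_dir("./output/").items():
        print(name, counts)
```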

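The sample passes every file to asyncio.gather at once, so a large input_dir starts one API request per file at the same time. If you want to cap that, one common approach, not part of the original sample, is an asyncio.Semaphore. The sketch below uses a hypothetical fake_partition coroutine in place of partition_file_via_api so that it runs without an API key, plus a hypothetical max_concurrent_files parameter. As we understand it, this file-level limit is separate from split_pdf_concurrency_level, which governs the page-level requests the SDK makes for a single split PDF.

```python
import asyncio
import random


async def fake_partition(filename: str) -> str:
    # Hypothetical stand-in for partition_file_via_api: sleeps instead of calling the API.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return filename


async def process_with_limit(filenames, max_concurrent_files=4):
    """Run at most max_concurrent_files calls at once; return results and peak concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent_files)
    in_flight = 0
    peak = 0

    async def worker(filename):
        nonlocal in_flight, peak
        # Wait here until one of the max_concurrent_files slots is free.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_partition(filename)
            finally:
                in_flight -= 1

    # gather still returns results in the same order as filenames.
    results = await asyncio.gather(*(worker(name) for name in filenames))
    return results, peak


if __name__ == "__main__":
    files = [f"doc{i}.pdf" for i in range(10)]
    results, peak = asyncio.run(process_with_limit(files, max_concurrent_files=3))
    print("peak concurrent calls:", peak)
```

To apply this idea to the sample, you would acquire the semaphore around the call to process_file_and_save_result(filename, output_dir) instead of around fake_partition.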
Next steps

This quickstart shows how to use the Unstructured Partition Endpoint, which is intended for rapid prototyping of some of Unstructured's partitioning strategies and offers only limited support for chunking. It is designed to process local files only. Take your code to the next level by switching to the Unstructured Workflow Endpoint, which supports production-level scenarios, batch file processing, files and data in remote locations, full chunking support, embedding generation, post-transform enrichments, the latest and highest-performing models, and much more.
