The Unstructured JavaScript/TypeScript SDK client allows you to send an individual file for processing by Unstructured API services. Whether you’re using the Free Unstructured API, the Unstructured Serverless API, the Unstructured API on Azure/AWS, or your local deployment of the Unstructured API, you can access the API using the JavaScript/TypeScript SDK.

Unstructured recommends that you use the Unstructured Ingest CLI or the Unstructured Ingest Python library if any of the following apply to you:

  • You need to work with documents in cloud storage.
  • You want to cache the results of processing multiple files in batches.
  • You want more precise control over document-processing pipeline stages such as partitioning, chunking, filtering, staging, and embedding.

To use the JavaScript/TypeScript SDK, you’ll need these environment variables:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.

If you do not specify the API URL, your Unstructured Serverless API pay-as-you-go account will be used by default. You must always specify your Serverless API key.

To use the Free Unstructured API, you must always specify both your Free API key and the Free API URL, https://api.unstructured.io/general/v0/general.

To use the pay-as-you-go Unstructured API on Azure or AWS with the SDKs, you must always specify the corresponding API URL. See the Azure or AWS instructions.
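As a sketch of how these settings fit together, the following clientOptionsFromEnv function is a hypothetical helper (not part of the SDK). It builds the options object that the examples on this page pass to new UnstructuredClient(...). The API key is required. serverURL is set only when UNSTRUCTURED_API_URL is provided, so the SDK's default URL applies otherwise.

```typescript
// Options shape used by the UnstructuredClient examples on this page.
interface ClientOptions {
    security: { apiKeyAuth: string };
    serverURL?: string;
}

// Hypothetical helper (not part of the SDK): build client options from
// environment-style variables such as process.env.
function clientOptionsFromEnv(
    env: Record<string, string | undefined>
): ClientOptions {
    const apiKey = env.UNSTRUCTURED_API_KEY;
    if (!apiKey) {
        // An API key is required for every deployment.
        throw new Error("UNSTRUCTURED_API_KEY is not set.");
    }

    const options: ClientOptions = { security: { apiKeyAuth: apiKey } };

    // Leave serverURL unset when no URL is given, so the SDK falls back
    // to its default (the Serverless API).
    const url = env.UNSTRUCTURED_API_URL;
    if (url) {
        options.serverURL = url;
    }
    return options;
}

// Example: options for the Free Unstructured API (placeholder key).
const freeApiOptions = clientOptionsFromEnv({
    UNSTRUCTURED_API_KEY: "<your-free-api-key>",
    UNSTRUCTURED_API_URL: "https://api.unstructured.io/general/v0/general",
});
```

In an application, you would call clientOptionsFromEnv(process.env) and pass the result to new UnstructuredClient(...).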

Installation

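Before using the SDK to interact with Unstructured API services, install the unstructured-client package from npm, for example by running npm install unstructured-client.

The SDK uses semantic versioning, so a major version bump can introduce breaking changes. Unstructured recommends pinning the version you install.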

Basics

Let’s start with a simple example in which you send a PDF document to be partitioned with the free Unstructured API:

The JavaScript/TypeScript SDK has the following breaking changes in v0.11.0:

  • Imports under the dist path have moved up a level
  • Enums are now used for parameters that accept a fixed set of options
    • These include chunkingStrategy, outputFormat, and strategy
  • All parameters to partition have moved to a partitionParameters object

For a code example that works with an entire directory of files instead of just a single PDF, see the Processing multiple files section.

Page splitting

To speed up processing of large PDF files, the splitPdfPage parameter is true by default. This causes the client to split the PDF into small batches of pages and send a separate request for each batch. The client awaits all parallel requests and combines the responses into a single response object. Splitting applies only to PDF files; for other file types, this parameter is ignored.

The number of parallel requests is controlled by splitPdfConcurrencyLevel. The default is 8, and the maximum is 15 to limit resource usage and costs.
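To make the batching and parallelism concrete, here is a minimal sketch (not the SDK's implementation) that splits a document's pages into batches and processes them with a bounded number of requests in flight. The names splitIntoBatches, resolveConcurrency, and runWithConcurrency are hypothetical. Clamping out-of-range values into the range 1 to 15, rather than rejecting them, is an assumption, and the batch size is up to the caller because the text does not say how many pages the client puts in each batch.

```typescript
// Documented default and maximum for splitPdfConcurrencyLevel.
const DEFAULT_CONCURRENCY = 8;
const MAX_CONCURRENCY = 15;

// Hypothetical: clamp a requested level into the range 1..MAX_CONCURRENCY.
function resolveConcurrency(requested?: number): number {
    if (requested === undefined) {
        return DEFAULT_CONCURRENCY;
    }
    return Math.max(1, Math.min(Math.floor(requested), MAX_CONCURRENCY));
}

// Split pages 1..pageCount into consecutive [firstPage, lastPage] batches.
// batchSize must be at least 1.
function splitIntoBatches(pageCount: number, batchSize: number): Array<[number, number]> {
    const batches: Array<[number, number]> = [];
    for (let first = 1; first <= pageCount; first += batchSize) {
        batches.push([first, Math.min(first + batchSize - 1, pageCount)]);
    }
    return batches;
}

// Run task over every item with at most `limit` calls in flight,
// returning the results in the same order as the items.
async function runWithConcurrency<T, R>(
    items: T[],
    limit: number,
    task: (item: T) => Promise<R>
): Promise<R[]> {
    const results: R[] = new Array(items.length);
    let next = 0;

    // Each worker claims the next unprocessed index until none remain.
    // JavaScript is single-threaded, so next++ cannot race between workers.
    async function worker(): Promise<void> {
        while (next < items.length) {
            const index = next++;
            results[index] = await task(items[index]);
        }
    }

    const workerCount = Math.min(limit, items.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
}
```

In this sketch, runWithConcurrency(splitIntoBatches(40, 2), resolveConcurrency(10), sendBatch) would keep at most 10 requests from a hypothetical sendBatch function open at once.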

If at least one request succeeds, the responses are combined into a single response object. An error is returned only if all requests fail or an error occurs during splitting.
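The combining rule above can be sketched with Promise.allSettled. In this illustration, which is not the SDK's actual code, each batch request is a promise that resolves to that batch's elements. The hypothetical combineBatchResults helper keeps the elements from every successful batch in page order and throws only when every batch has failed.

```typescript
// Hypothetical element shape: one JSON object per partitioned element.
type PartitionElement = Record<string, unknown>;

// Combine per-batch results: keep elements from successful batches and
// fail only if no batch succeeded.
async function combineBatchResults(
    batches: Array<Promise<PartitionElement[]>>
): Promise<PartitionElement[]> {
    // allSettled waits for every batch, whether it succeeds or fails.
    const settled = await Promise.allSettled(batches);
    const succeeded = settled.filter(
        (r): r is PromiseFulfilledResult<PartitionElement[]> => r.status === "fulfilled"
    );
    if (succeeded.length === 0) {
        throw new Error("All split requests failed.");
    }
    // allSettled preserves input order, so elements stay in page order.
    return succeeded.flatMap((r) => r.value);
}
```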

This feature may lead to unexpected results when chunking because the server does not see the entire document context at once. If you’d like to chunk across the whole document and still get the speedup from parallel processing, you can:

  • Partition the PDF with splitPdfPage set to true, without any chunking parameters.
  • Store the returned elements in results.json.
  • Partition this JSON file with the desired chunking parameters.
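
The following example sets the page-splitting parameters explicitly:

TypeScript
import * as fs from "fs";
import { UnstructuredClient } from "unstructured-client";
import { Strategy } from "unstructured-client/sdk/models/shared/index.js";

const client = new UnstructuredClient({
    security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
    serverURL: process.env.UNSTRUCTURED_API_URL,
});

// Placeholder path: replace with the PDF file you want to process.
const filename = "example.pdf";
const data = fs.readFileSync(filename);

client.general.partition({
    partitionParameters: {
        files: {
            content: data,
            fileName: filename,
        },
        strategy: Strategy.HiRes,
        // Set to `false` to disable PDF splitting
        splitPdfPage: true,
        // Continue PDF splitting even if some earlier split operations fail.
        splitPdfAllowFailed: true,
        // Number of parallel requests (default 8, maximum 15)
        splitPdfConcurrencyLevel: 10,
    }
}).then((res) => console.log(res.elements?.length))
  .catch((e) => console.error(e));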
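The three-step workflow for chunking across the whole document can be sketched as follows. The partition argument is a hypothetical stand-in for a small wrapper around client.general.partition that returns the response's elements; wiring it to the real client and choosing real chunking parameters is left to you. Treat this as an outline of the workflow rather than a drop-in implementation.

```typescript
import * as fs from "fs";

// Hypothetical element shape: one JSON object per partitioned element.
type PartitionElement = Record<string, unknown>;

// Hypothetical stand-in for a call to client.general.partition(...): it sends
// file bytes plus extra parameters and resolves to the returned elements.
type PartitionFn = (
    content: Buffer,
    fileName: string,
    params: Record<string, unknown>
) => Promise<PartitionElement[]>;

// Partition a PDF with page splitting, save the elements as JSON, then
// partition that JSON file with chunking parameters so that chunking
// sees the whole document at once.
async function chunkWholeDocument(
    partition: PartitionFn,
    pdfContent: Buffer,
    pdfFileName: string,
    chunkingParams: Record<string, unknown>,
    resultsPath = "results.json"
): Promise<PartitionElement[]> {
    // Step 1: partition with splitPdfPage on and no chunking parameters.
    const elements = await partition(pdfContent, pdfFileName, { splitPdfPage: true });

    // Step 2: store the returned elements in a JSON file.
    fs.writeFileSync(resultsPath, JSON.stringify(elements, null, 2));

    // Step 3: partition the JSON file with the desired chunking parameters.
    return partition(fs.readFileSync(resultsPath), resultsPath, chunkingParams);
}
```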

Customizing the client

Retries

You can also change the retry defaults by passing a retryConfig object when initializing the client. If a request to the API fails, the client retries it with an exponential backoff strategy, up to a maximum interval of one minute between attempts. The client keeps retrying until the total elapsed time exceeds maxElapsedTime, which defaults to one hour. The following example lowers this limit to 15 minutes:

TypeScript
import { UnstructuredClient } from "unstructured-client";

const key = process.env.UNSTRUCTURED_API_KEY;
const url = process.env.UNSTRUCTURED_API_URL;

const client = new UnstructuredClient({
    security: { apiKeyAuth: key },
    serverURL: url,
    retryConfig: {
        strategy: "backoff",
        retryConnectionErrors: true,
        backoff: {
            initialInterval: 500,   // first wait, in milliseconds
            maxInterval: 60000,     // longest single wait: 1 minute
            exponent: 1.5,          // growth factor between waits
            maxElapsedTime: 900000, // 15min*60sec*1000ms = 15 minutes
        },
    },
});
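To see roughly how these settings play out, the following sketch lists the waits an exponential-backoff loop would use. It assumes each wait is initialInterval multiplied by exponent raised to the attempt number, capped at maxInterval, with no jitter, and it ignores the time the requests themselves take. The SDK's real schedule may differ (for example, by adding random jitter), and backoffSchedule is a hypothetical helper.

```typescript
// Field names mirror the backoff object in the retryConfig example above.
interface BackoffConfig {
    initialInterval: number; // first wait, in milliseconds
    maxInterval: number;     // cap on any single wait, in milliseconds
    exponent: number;        // growth factor between consecutive waits
    maxElapsedTime: number;  // total waiting budget, in milliseconds
}

// Sketch only: wait = initialInterval * exponent^attempt, capped at maxInterval.
// Retries stop once the next wait would push total waiting past maxElapsedTime.
function backoffSchedule(cfg: BackoffConfig): number[] {
    const waits: number[] = [];
    let elapsed = 0;
    for (let attempt = 0; ; attempt++) {
        const wait = Math.min(
            cfg.initialInterval * Math.pow(cfg.exponent, attempt),
            cfg.maxInterval
        );
        // Stop on a non-positive wait (avoids an infinite loop) or when the
        // waiting budget would be exceeded.
        if (wait <= 0 || elapsed + wait > cfg.maxElapsedTime) {
            break;
        }
        waits.push(wait);
        elapsed += wait;
    }
    return waits;
}

// Schedule for the settings in the example above.
const waits = backoffSchedule({
    initialInterval: 500,
    maxInterval: 60000,
    exponent: 1.5,
    maxElapsedTime: 900000,
});
console.log(waits.length, waits.slice(0, 4));
```

Under these assumptions, the waits grow from 500 ms until they reach the one-minute cap, after which each retry waits a full minute until the 15-minute budget runs out.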

Processing multiple files

The code example in the Basics section processes a single PDF file. But what if you want to process multiple files inside a directory with a mixture of subdirectories and files with different file types?

The following example takes an input directory path to read files from and an output directory path to write the processed data to, processing one file at a time.

TypeScript
import { UnstructuredClient } from "unstructured-client";
import * as fs from "fs";
import * as path from "path";
import { Strategy } from "unstructured-client/sdk/models/shared/index.js";

// Send all files in the source path to Unstructured for processing, one at
// a time, and write the processed data to the destination path.
async function processFiles(
    client: UnstructuredClient,
    sourcePath: string,
    destinationPath: string
): Promise<void> {

    // If an output directory does not exist for the corresponding input
    // directory, then create it.
    if (!fs.existsSync(destinationPath)) {
        fs.mkdirSync(destinationPath, { recursive: true });
    }

    // Get all folders and files at the current level of the input directory.
    const items = fs.readdirSync(sourcePath);

    // For each folder and file in the input directory...
    for (const item of items) {
        const inputPath = path.join(sourcePath, item);
        const outputPath = path.join(destinationPath, item);

        // If it's a folder, call this function recursively.
        if (fs.statSync(inputPath).isDirectory()) {
            await processFiles(client, inputPath, outputPath);
            continue;
        }

        // If it's a file, send it to Unstructured for processing and wait
        // for the response before moving on to the next file.
        const data = fs.readFileSync(inputPath);

        try {
            const res = await client.general.partition({
                partitionParameters: {
                    files: {
                        content: data,
                        fileName: inputPath
                    },
                    strategy: Strategy.HiRes,
                    splitPdfPage: true,
                    splitPdfConcurrencyLevel: 15,
                    splitPdfAllowFailed: true
                }
            });

            // If successfully processed, write the processed data to
            // the destination directory.
            if (res.statusCode === 200) {
                const jsonElements = JSON.stringify(res.elements, null, 2);
                fs.writeFileSync(outputPath + ".json", jsonElements);
            }
        } catch (e: any) {
            // API errors carry a status code and body; log anything else as-is.
            if (e.statusCode) {
                console.log(e.statusCode);
                console.log(e.body);
            } else {
                console.log(e);
            }
        }
    }
}

const inputDir = process.env.LOCAL_FILE_INPUT_DIR;
const outputDir = process.env.LOCAL_FILE_OUTPUT_DIR;

if (!inputDir || !outputDir) {
    throw new Error("Set LOCAL_FILE_INPUT_DIR and LOCAL_FILE_OUTPUT_DIR.");
}

const client = new UnstructuredClient({
    security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
    serverURL: process.env.UNSTRUCTURED_API_URL
});

processFiles(client, inputDir, outputDir).catch((e) => console.error(e));

Parameters & examples

The parameter names used in this document are for the JavaScript/TypeScript SDK, which follows the camelCase convention. The Python SDK uses the snake_case convention. Apart from this difference in naming convention, the parameter names are the same in both SDKs across all methods.
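Because the only difference is the naming convention, a Python parameter name can be translated mechanically. The following snakeToCamel function is a hypothetical helper for illustration, not part of either SDK; it assumes every underscore is followed by the start of the next word.

```typescript
// Hypothetical helper: convert a Python SDK parameter name (snake_case)
// to the corresponding JavaScript/TypeScript SDK name (camelCase).
function snakeToCamel(name: string): string {
    // Drop each underscore and upper-case the character that follows it.
    return name.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

const pythonNames = [
    "strategy",
    "split_pdf_page",
    "split_pdf_concurrency_level",
    "chunking_strategy",
];
const tsNames = pythonNames.map(snakeToCamel);
console.log(tsNames);
```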

  • Refer to the API parameters page for the full list of available parameters.
  • Refer to the Examples page for some inspiration on using the parameters.