The Unstructured Ingest CLI enables you to use command-line scripts to send files in batches to Unstructured for processing, and to tell Unstructured where to deliver the processed data. Learn more.

The Unstructured Ingest CLI does not work with the Unstructured API.

For information about the Unstructured API, see the Unstructured API Overview.

Getting started

You can use the Unstructured Ingest CLI either to process files locally or to send files in batches to Unstructured for processing.

Local processing does not use an Unstructured API key or API URL.

Sending files in batches to Unstructured for processing is more robust but requires an Unstructured API key and API URL, which you obtain as follows:

  1. Go to https://platform.unstructured.io and sign up for an Unstructured account (if you do not already have one) with your email address, Google account, or GitHub account. Signing up also signs you in to the account, and the Unstructured user interface (UI) appears.

  2. Get your Unstructured API key:

    a. In the Unstructured UI, click API Keys on the sidebar.
    b. Click Generate API Key.
    c. Follow the on-screen instructions to finish generating the key.
    d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.

By following the preceding instructions, you are signed up for a Developer pay-per-page account by default.

To save money, consider switching to a Subscribe & Save account instead. To save even more money, consider switching to an Enterprise account instead.

  3. The default Unstructured API URL for Unstructured Ingest is https://api.unstructuredapp.io/general/v0/general, which is the API URL for the Unstructured Partition Endpoint. You must specify this API URL in your scripts only if you are not using this default, for example, if you are calling a version of the Unstructured API that is hosted on your own compute infrastructure.

If the Unstructured API is hosted on your own compute infrastructure, the process for generating Unstructured API keys, and the Unstructured API URL that you use, are different. For details, contact Unstructured Sales at sales@unstructured.io.
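One way to keep the API key and API URL out of the script body is to read them from environment variables. The following is a minimal sketch, not a required setup: the variable names UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL are assumptions chosen for illustration, and the fallback value is the default URL stated above.

```shell
#!/usr/bin/env bash

# Assumed variable names for illustration; the CLI does not mandate them.
# Fall back to the documented default URL when the caller has not set one.
export UNSTRUCTURED_API_URL="${UNSTRUCTURED_API_URL:-https://api.unstructuredapp.io/general/v0/general}"

# Report whether a key is present without ever printing the key itself.
if [ -z "${UNSTRUCTURED_API_KEY:-}" ]; then
  key_status="missing"
  echo "UNSTRUCTURED_API_KEY is not set; paste the key copied from the UI." >&2
else
  key_status="set"
fi

echo "API URL: ${UNSTRUCTURED_API_URL}"
echo "API key: ${key_status}"
```

If your Unstructured API is hosted on your own infrastructure, you would set UNSTRUCTURED_API_URL to that URL before running the script, so the fallback is never used.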

Installation

One approach to get started quickly with the Unstructured Ingest CLI is to install Python and then run the following command:

pip install unstructured-ingest

This default installation option enables the ingestion of plain text files, HTML, XML, JSON, and emails, none of which require extra dependencies. It also enables you to specify local source and destination locations.

You might also need to install additional dependencies, depending on your needs. Learn more.

For additional installation options, see Unstructured Ingest CLI in the Ingest section.

To migrate from older, deprecated versions of the Ingest CLI that used pip install unstructured, see the migration guide.
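After installing, it can help to confirm that the unstructured-ingest command is reachable from your shell before writing longer scripts. The following sketch only checks the PATH; it does not install or run anything, and the variable name ingest_status is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Look for the CLI entry point on PATH; nothing is installed or executed here.
if command -v unstructured-ingest >/dev/null 2>&1; then
  ingest_status="installed"
else
  ingest_status="missing"
fi

echo "unstructured-ingest: ${ingest_status}"
```

If the result is "missing" right after a successful pip install, a likely cause is that the Python environment you installed into is not the one active in your current shell.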

Usage

To call the Unstructured Ingest CLI, follow this calling pattern, where:

  • <source> is the command name for one of the available source (input) connectors, such as local for a local source location, azure for an Azure Storage account source, s3 for an Amazon S3 bucket source, and so on.
  • <destination> is the command name for one of the available destination (output) connectors, such as local for a local destination, azure for an Azure Storage account destination, s3 for an Amazon S3 bucket destination, and so on.
  • <setting> is one or more command-line options for specifying how and where Unstructured will ingest the files from the <source>, or how and where to deliver the processed data to the <destination>.
#!/usr/bin/env bash

unstructured-ingest \
  <source> \
    --<setting1> <value1> \
    --<setting2> <value2> \
    --<settingN> <valueN> \
  <destination> \
    --<setting1> <value1> \
    --<setting2> <value2> \
    --<settingN> <valueN>

To learn how to use the Unstructured Ingest CLI to work with a specific source (input) and destination (output) location, see the CLI script examples for the source and destination connectors that are available for you to choose from.
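As a concrete instance of the calling pattern, the following sketch fills in local for both the source and the destination. It is a dry run: the hypothetical preview helper prints the command instead of executing it, so it works even without the CLI installed. The directories are placeholders, and the option names --input-path and --output-dir are taken from the local connector examples as I understand them; confirm them against the connector documentation for your installed version.

```shell
#!/usr/bin/env bash

# Placeholder directories for illustration only.
input_dir="./input"
output_dir="./output"

# Dry-run helper (hypothetical): prints the command it receives
# instead of executing it.
preview() { echo "$*"; }

# Source connector and its settings come first,
# followed by the destination connector and its settings.
preview unstructured-ingest \
  local \
    --input-path "$input_dir" \
  local \
    --output-dir "$output_dir"
```

Removing the word preview would run the command for real, provided that the CLI is installed and the option names match your version.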

See also the ingest configuration settings for command-line options that enable you to further control how batches are sent and processed.
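For example, settings of this kind are where you would switch from local processing to sending files to Unstructured. The following dry-run sketch assumes the option names --partition-by-api, --api-key, and --partition-endpoint, and the environment variable names UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL; treat all of them as assumptions to check against the ingest configuration settings. The preview helper is hypothetical and only prints the command.

```shell
#!/usr/bin/env bash

# Assumed variable name; falls back to the default Partition Endpoint URL.
api_url="${UNSTRUCTURED_API_URL:-https://api.unstructuredapp.io/general/v0/general}"

# Dry-run helper (hypothetical): prints the command instead of executing it.
preview() { echo "$*"; }

# The API key reference is single-quoted on purpose, so the preview shows
# the variable name rather than leaking the key's value into logs.
preview unstructured-ingest \
  local \
    --input-path "./input" \
    --partition-by-api \
    --api-key '$UNSTRUCTURED_API_KEY' \
    --partition-endpoint "$api_url" \
  local \
    --output-dir "./output"
```

In a real run you would drop preview and use double quotes around $UNSTRUCTURED_API_KEY so that the shell expands it.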