The Unstructured Ingest Python library enables you to use Python code to send files in batches to Unstructured API services for processing, and to tell Unstructured API services where to deliver the processed data.

The following 3-minute video shows how to use the Unstructured Ingest Python library to send multiple PDFs from a local directory in batches to be ingested by Unstructured API services for processing:


Installation

One approach to get started quickly with the Unstructured Ingest Python library is to install Python and then run the following command:

pip install unstructured-ingest

This default installation supports ingesting plain text, HTML, XML, JSON, and email files, none of which require extra dependencies. It also enables you to specify local source and destination locations.
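To confirm that the installation succeeded, one option is to query the package metadata from Python. The following is a minimal sketch that uses only the standard library's importlib.metadata module (Python 3.8 or later); the helper name installed_version is hypothetical and not part of the Unstructured Ingest Python library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report whether the distribution installed by the pip command is present.
ingest_version = installed_version("unstructured-ingest")
if ingest_version is None:
    print("unstructured-ingest is not installed; run: pip install unstructured-ingest")
else:
    print(f"unstructured-ingest {ingest_version} is installed")
```

The same helper can be reused to check for any additional dependencies you install later.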

Depending on your needs, you might also need to install additional dependencies.

For additional installation options, and information about v2 and v1 implementations in this library, see the Unstructured Ingest Python library in the Ingest section.

To migrate from older, deprecated versions of the Ingest Python library that used pip install unstructured, see the migration guide.

Usage

For example, to use the Unstructured Ingest Python library to ingest files from a local source (input) location and to deliver the processed data to an Azure Storage account destination (output) location:
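A minimal sketch of such a pipeline might look like the following. It assumes the library's v2 module layout (unstructured_ingest.v2...) and the class names shown in the imports; verify these against the version you have installed, because they can differ between releases. The environment variable names and the helper functions read_settings and run_local_to_azure are hypothetical, chosen only for illustration. The Azure connector likely also requires additional dependencies beyond the default installation.

```python
import os

# Environment variable names below are illustrative assumptions.
REQUIRED_SETTINGS = [
    "LOCAL_FILE_INPUT_DIR",        # local source (input) directory
    "UNSTRUCTURED_API_KEY",        # Unstructured API key
    "UNSTRUCTURED_API_URL",        # Unstructured API endpoint URL
    "AZURE_STORAGE_ACCOUNT_NAME",  # Azure Storage account name
    "AZURE_STORAGE_ACCOUNT_KEY",   # Azure Storage account key
    "AZURE_STORAGE_REMOTE_URL",    # destination (output) container or folder URL
]


def read_settings(env=None):
    """Collect the settings this sketch needs, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}


def run_local_to_azure(settings):
    """Send local files to Unstructured and deliver the results to Azure Storage."""
    # Imported lazily so this module still loads if the library is not installed.
    # Assumed v2 module paths and class names; check your installed version.
    from unstructured_ingest.v2.interfaces import ProcessorConfig
    from unstructured_ingest.v2.pipeline.pipeline import Pipeline
    from unstructured_ingest.v2.processes.connectors.fsspec.azure import (
        AzureAccessConfig,
        AzureConnectionConfig,
        AzureUploaderConfig,
    )
    from unstructured_ingest.v2.processes.connectors.local import (
        LocalConnectionConfig,
        LocalDownloaderConfig,
        LocalIndexerConfig,
    )
    from unstructured_ingest.v2.processes.partitioner import PartitionerConfig

    Pipeline.from_configs(
        context=ProcessorConfig(),
        # Source (input): files in a local directory.
        indexer_config=LocalIndexerConfig(input_path=settings["LOCAL_FILE_INPUT_DIR"]),
        downloader_config=LocalDownloaderConfig(),
        source_connection_config=LocalConnectionConfig(),
        # Processing: partition the files by calling Unstructured API services.
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=settings["UNSTRUCTURED_API_KEY"],
            partition_endpoint=settings["UNSTRUCTURED_API_URL"],
        ),
        # Destination (output): an Azure Storage account.
        destination_connection_config=AzureConnectionConfig(
            access_config=AzureAccessConfig(
                account_name=settings["AZURE_STORAGE_ACCOUNT_NAME"],
                account_key=settings["AZURE_STORAGE_ACCOUNT_KEY"],
            )
        ),
        uploader_config=AzureUploaderConfig(
            remote_url=settings["AZURE_STORAGE_REMOTE_URL"]
        ),
    ).run()
```

With the environment variables set, calling run_local_to_azure(read_settings()) would start the pipeline. Reading the settings in a separate step keeps credentials out of the source code and reports missing values before any files are sent.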

To learn how to use the Unstructured Ingest Python library to work with a specific source (input) and destination (output) location, see the Python code examples for the source and destination connectors that are available for you to choose from.

See also the ingest configuration settings that enable you to further control how batches are sent and processed.