
# Overview

<Note>
  Unstructured recommends that you use [Unstructured Pipelines](/pipelines/overview) or the [Unstructured API](/api-reference/overview) instead of the
  Unstructured Ingest CLI or the Unstructured Ingest Python library.

  Unstructured Pipelines and the API provide a full range of partitioning, chunking, embedding, and enrichment options for your files and data.
  They also use the latest and highest-performing models on the market today, and they have built-in logic to deliver the highest-quality results
  at the lowest cost.

  The Unstructured Ingest CLI and the Unstructured Ingest Python library are not being actively updated to include these and other Unstructured API features.
</Note>

You can send multiple files in batches to be ingested by Unstructured for processing.
*Ingestion* is the term that Unstructured uses to refer to the set of activities that happen when files are input for processing. [Learn more](/open-source/ingestion/overview).

You can send batches to Unstructured by using the following tools:

* The [Unstructured Ingest CLI](/open-source/ingestion/ingest-cli), which is part of and builds upon the [Unstructured open source library](/open-source/introduction/overview).
* The [Unstructured Ingest Python](/open-source/ingestion/python-ingest) library, which is also part of and builds upon the [Unstructured open source library](/open-source/introduction/overview).

The following 3-minute video shows how to use the Unstructured Ingest Python library to send multiple PDFs from a local directory in batches to be ingested by Unstructured for processing:

<iframe width="560" height="315" src="https://www.youtube.com/embed/tSKHFXsBQ-c" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

The following 5-minute video goes into more detail about the various components of the Unstructured Ingest Python library:

<iframe width="560" height="315" src="https://www.youtube.com/embed/A_kqK2KHTdg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

## Ingest flow

The Unstructured ingest flow is similar to an extract, transform and load (ETL) data pipeline.
Because of this, a customer-defined implementation of the Unstructured ingest flow is sometimes referred to as an *ingest pipeline* or simply a *pipeline*.
An Unstructured ingest pipeline contains the following logical steps:

<Steps>
  <Step title="Index">
    Reaches out to the source location and pulls in metadata for each document.

    For example, this could include information such as the path to the files to be analyzed.

    * For the Unstructured CLI, you can control this behavior, where available for a connector, through its `--input-path` command option.
    * For the Unstructured Ingest Python library, you can control this behavior, where available for a connector, through its `<Prefix>IndexerConfig` class (where `<Prefix>` represents the connector provider's name, such as `Azure` for Azure).
  </Step>

  <Step title="Post-Index Filter">
    After indexing, you might not want to download everything that was indexed.

    For example, you might want to download only files that match specific types, file names, paths, or sizes.

    For the Unstructured Ingest Python library, you can control this behavior through the `FiltererConfig` class.
  </Step>

  <Step title="Download">
    Using the information generated from the indexer and the filter, downloads the content as files on the local file system for processing. This may require manipulation of the data to prepare it for partitioning.

    For example, this could include information such as the path to a local directory to download files to.

    * For the Unstructured CLI, you can control this behavior through a connector's `--download-dir` command option.
    * For the Unstructured Ingest Python library, you can control this behavior through a connector's `<Prefix>DownloaderConfig` class.
  </Step>

  <Step title="Post-Download Filter">
    After downloading, if uncompression is enabled, you might not want to uncompress everything that was downloaded. The filter that was defined at the beginning is repeated here.
  </Step>

  <Step title="Uncompress">
    If enabled, searches for any compressed files (Unstructured supports TAR and ZIP) and uncompresses them.

    * For the Unstructured CLI, you can control this behavior through the `--uncompress` command option.
    * For the Unstructured Ingest Python library, you can control this behavior through the `UncompressConfig` class.
  </Step>

  <Step title="Post-Uncompress Filter">
    After downloading (and uncompressing, if enabled), you might not want to partition every resulting file. The filter that was defined at the beginning is applied once more here.
  </Step>

  <Step title="Partition">
    Generates the structured enriched content from the local files that have been downloaded, uncompressed if enabled, and filtered. Both local-based partitioning and Unstructured-based partitioning are supported, with API services-based partitioning set to run asynchronously and local-based partitioning set to run through multiprocessing.

    For the Unstructured Ingest Python library, you can control this behavior through the `PartitionerConfig` class.
  </Step>

  <Step title="Chunk">
    Optionally, chunks the partitioned content. Chunking can also be run locally or through Unstructured, with asynchronous or multiprocessing behavior set in the same way as for the partitioner.

    For the Unstructured Ingest Python library, you can control this behavior through the `ChunkerConfig` class.
  </Step>

  <Step title="Embed">
    Optionally, generates vector embeddings for each element in the structured output. Most of the time, this is done through API calls to a third-party embedding vendor and therefore runs asynchronously. But it can also use a locally available Hugging Face model, which will run through multiprocessing.

    For the Unstructured Ingest Python library, you can control this behavior through the `EmbedderConfig` class.
  </Step>

  <Step title="Stage">
    This is an optional step that does not apply most of the time. However, sometimes the data needs to be modified from the existing structure to better support the upload step, such as converting it to a CSV file for tabular-based destinations.

    For the Unstructured Ingest Python library, you can control this behavior, where available for a connector, through its `<Prefix>UploadStagerConfig` class.
  </Step>

  <Step title="Upload">
    Writes the local content to the destination. If no destination is provided, the local one will be used, which writes the final result to a location on the local filesystem. If batch uploads are needed, this will run in a single process with access to all documents. If batch uploads are not supported, all documents can be uploaded concurrently by using the asynchronous approach.

    For the Unstructured Ingest Python library, you can control this behavior through a connector's `<Prefix>UploaderConfig` class.
  </Step>
</Steps>
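
To make the ordering of the early steps more concrete, the following is a minimal sketch that uses only the Python standard library. It does not use the Unstructured Ingest Python library and is not how that library is implemented. The function names (`index`, `keep`, `uncompress`, `run_front_half`) and the file-name glob filter are hypothetical and chosen only for illustration. The sketch shows one filter being applied after indexing and then repeated after uncompressing, and it handles only ZIP archives.

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path


def index(source_dir: Path) -> list[Path]:
    """Index: list every file under the source location."""
    return sorted(p for p in source_dir.rglob("*") if p.is_file())


def keep(paths: list[Path], patterns: list[str]) -> list[Path]:
    """Filter: keep only paths whose file name matches one of the glob patterns."""
    return [p for p in paths if any(fnmatch.fnmatch(p.name, pat) for pat in patterns)]


def uncompress(paths: list[Path], work_dir: Path) -> list[Path]:
    """Uncompress: expand ZIP archives into work_dir and pass other files through."""
    result: list[Path] = []
    for path in paths:
        if zipfile.is_zipfile(path):
            target = work_dir / path.stem
            with zipfile.ZipFile(path) as archive:
                archive.extractall(target)
            result.extend(index(target))
        else:
            result.append(path)
    return result


def run_front_half(source_dir: Path, work_dir: Path, patterns: list[str]) -> list[Path]:
    """Index, filter, uncompress, then repeat the same filter before partitioning."""
    selected = keep(index(source_dir), patterns)
    expanded = uncompress(selected, work_dir)
    return keep(expanded, patterns)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source, work = Path(tmp) / "source", Path(tmp) / "work"
        source.mkdir()
        work.mkdir()
        (source / "a.pdf").write_bytes(b"%PDF-")
        (source / "b.txt").write_text("skip me")
        with zipfile.ZipFile(source / "c.zip", "w") as archive:
            archive.writestr("d.pdf", "%PDF-")
            archive.writestr("e.txt", "skip me too")
        # The pattern list includes "*.zip" so that archives survive the first filter.
        kept = run_front_half(source, work, ["*.pdf", "*.zip"])
        print([p.name for p in kept])
```

In this sketch the same pattern list is used for every filter pass, which is why it includes `*.zip`: an archive has to pass the first filter before its contents can be expanded and filtered again.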

## Generate Python code examples

You can connect any available source connector to any available destination connector. However, the source connector code examples in the
documentation show connecting only to the local destination connector. Similarly, the destination connector code examples in the
documentation show connecting only to the local source connector.

To quickly generate an Unstructured Ingest Python library code example that connects *any* available source connector to *any* available destination connector,
do the following:

1. Open the [Unstructured Ingest Code Generator](https://huggingface.co/spaces/MariaK/unstructured-pipeline-builder) webpage.

2. Select your input (source) location type from the **Get unstructured documents from** drop-down list.

3. Select your output (destination) location type from the **Upload RAG-ready documents to** drop-down list.

4. Select your chunking strategy from the **Chunking strategy** drop-down list:

   * **None** - Do not chunk the data elements' content.
   * **basic** - Combine sequential data elements to maximally fill each chunk. However, do not mix `Table` and non-`Table` elements in the same chunk.
   * **by\_title** - Use the `basic` strategy and also preserve section boundaries. Optionally preserve page boundaries as well.
   * **by\_page** - Use the `basic` strategy and also preserve page boundaries.
   * **by\_similarity** - Use the `sentence-transformers/multi-qa-mpnet-base-dot-v1` embedding model to identify topically similar sequential elements and combine them into chunks. This strategy is available only when calling Unstructured.

   To learn more, see [Chunking strategies](/api-reference/legacy-api/partition/chunking) and [Chunking configuration](/open-source/ingestion/ingest-configuration/chunking-configuration).

5. For any chunking strategy other than **None**:

   * Enter your chunk size in the **Chunk size (characters)** box, or leave the default of **1000** characters.
   * If you need to apply overlapping to the chunks, enter the chunk overlap size in the **Chunk overlap (characters)** box, or leave the default of **20** characters.

   To learn more, see [Chunking configuration](/open-source/ingestion/ingest-configuration/chunking-configuration).

6. To generate vector embeddings, select the provider in the **Embedding provider** drop-down list.

   To learn more, see [Embedding configuration](/open-source/ingestion/ingest-configuration/embedding-configuration).

7. Click **Generate code**.

8. Copy the example code from the **Generated Code** pane into your code project.

9. The code example will contain one or more environment variables that you must set for the code to run correctly. To learn what to
   set these variables to, click the documentation links that are below the **Generated Code** pane.
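
To give a feel for what the **basic** chunking strategy in step 4 does with the chunk size from step 5, here is a simplified, standard-library-only sketch. It is not Unstructured's implementation. The `Element` class, the `basic_chunks` function, and the blank-line separator are hypothetical. The sketch also assumes that each `Table` element becomes its own chunk, leaves an oversized element as a single chunk instead of splitting it, and omits chunk overlap for brevity.

```python
from dataclasses import dataclass

SEPARATOR = "\n\n"  # assumed separator between combined element texts


@dataclass
class Element:
    """Hypothetical stand-in for one partitioned element."""

    type: str  # for example "Title", "NarrativeText", or "Table"
    text: str


def basic_chunks(elements: list[Element], max_characters: int = 1000) -> list[str]:
    """Combine sequential elements to fill each chunk, never mixing Table and non-Table text."""
    chunks: list[str] = []
    buffer: list[str] = []  # non-Table texts waiting to be emitted as one chunk
    buffer_len = 0  # length of SEPARATOR.join(buffer)

    def flush() -> None:
        nonlocal buffer, buffer_len
        if buffer:
            chunks.append(SEPARATOR.join(buffer))
            buffer, buffer_len = [], 0

    for element in elements:
        if element.type == "Table":
            # A table closes the current chunk and is emitted on its own.
            flush()
            chunks.append(element.text)
            continue
        extra = len(element.text) + (len(SEPARATOR) if buffer else 0)
        if buffer and buffer_len + extra > max_characters:
            # The element does not fit, so start a new chunk with it.
            flush()
            extra = len(element.text)
        buffer.append(element.text)
        buffer_len += extra

    flush()
    return chunks
```

For example, calling `basic_chunks` on a title, a paragraph, a table, and two more paragraphs yields the title and first paragraph combined (if they fit within `max_characters`), then the table alone, then the remaining paragraphs combined or split according to the size limit.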

## Pricing

Calls to the Unstructured CLI or Unstructured Ingest Python library that are routed to Unstructured's software-as-a-service (SaaS)
for processing (for example, by specifying an
Unstructured API key and an Unstructured SaaS URL) require an Unstructured account for billing purposes.

Unstructured offers different account types with different pricing plans:

* <Icon icon="person" />  **Let's Go** and **Pay-As-You-Go** - A single user, with a single workspace, hosted alongside other accounts on Unstructured's cloud infrastructure.
* <Icon icon="building" />  **Business** - Multiple users and workspaces, with three options:

  * <Icon icon="people-group" />  **Business SaaS** - Hosted alongside other accounts on Unstructured's cloud infrastructure.
  * <Icon icon="shield-halved" />  **Dedicated instance** - Hosted within a virtual private cloud (VPC) running inside Unstructured's cloud infrastructure. Dedicated instances are isolated from all other accounts, for additional security and control.
  * <Icon icon="shield" />  **In-VPC** - Hosted within your own VPC on your own cloud infrastructure.

  **Business** accounts also allow for robust customization of Unstructured's features for your unique needs.

For more details, see the [Unstructured Pricing](https://unstructured.io/pricing) page.

To upgrade your account from **Let's Go** or **Pay-As-You-Go** to **Business**,
email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

Some of these plans have billing details that are determined on a per-page basis.

Unstructured calculates a page as follows:

* For these file types, a page is a page, slide, or image: `.pdf`, `.pptx`, and `.tiff`.
* For `.docx` files that have page metadata, Unstructured calculates the number of pages based on that metadata.
* For all other file types, Unstructured calculates the number of pages as the file's size divided by 100 KB.
* For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.
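
The page rules above can be sketched as a small function for rough estimates. This is an illustration only, not Unstructured's billing code. The function name `billable_pages` and its parameters are hypothetical, and the sketch makes three assumptions that the rules above do not state: the result is not rounded, 1 KB is treated as 1,000 bytes (adjustable through `bytes_per_kb`), and a `.docx` file without page metadata falls back to the size-based rule.

```python
from typing import Optional

# File types for which a page is a literal page, slide, or image.
PAGE_BASED_EXTENSIONS = {".pdf", ".pptx", ".tiff"}


def billable_pages(
    extension: Optional[str],
    size_bytes: int,
    page_count: Optional[int] = None,
    bytes_per_kb: int = 1000,  # assumption: the source does not say 1000 or 1024
) -> float:
    """Estimate the page count for one file, or for non-file data when extension is None."""
    ext = extension.lower() if extension else None

    if ext in PAGE_BASED_EXTENSIONS:
        if page_count is None:
            raise ValueError(f"{ext} files need a page, slide, or image count")
        return float(page_count)

    if ext == ".docx" and page_count is not None:
        # .docx files with page metadata use that metadata.
        return float(page_count)

    # All other file types, and non-file data: size divided by 100 KB.
    return size_bytes / (100 * bytes_per_kb)
```

Under these assumptions, a 12-page PDF counts as 12 pages regardless of its size, and a 250,000-byte text file counts as 2.5 pages.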

## Learn more

* [Ingest configuration](/open-source/ingestion/ingest-configuration/overview) settings enable you to control how batches are sent and processed.
* [Source connectors](/open-source/ingestion/source-connectors/overview) enable you to send batches from local or remote locations to be ingested by Unstructured for processing.
* [Destination connectors](/open-source/ingestion/destination-connectors/overview) enable Unstructured to send the processed data to local or remote locations.

## See also

* [Unstructured Pipelines](/pipelines/overview) enable you to send batches to Unstructured from remote locations, and to have Unstructured send the processed data to remote locations, all without using code or a CLI.