> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Process a subset of files

<Note>
  The following information applies only to the [Unstructured Ingest CLI](/open-source/ingestion/overview#unstructured-ingest-cli) and the [Unstructured Ingest Python library](/open-source/ingestion/python-ingest).

  The Unstructured SDKs for Python and JavaScript/TypeScript and the Unstructured open-source library do not support this functionality.
</Note>

## Task

You want to process only files with specified extensions, only files at or below a specified size, or both.

## Approach

For the Ingest CLI, use the following command options. For the Ingest Python library, use the following parameters for the `FiltererConfig` object.

* Use `--file-glob` (CLI) or `file_glob` (Python) to specify the list of file extensions to process.
* Use `--max-file-size` (CLI) or `max_file_size` (Python) to specify the maximum size of files to process, in bytes.

## To run this example

The following example processes only `.pdf` and `.eml` files that have a file size of 100 KB or less. To run this example, you should have a directory
with a mixture of files, including at least one `.pdf` file and one `.eml` file, and with at least one of these files having a file size of 100 KB or less.

## Code

<CodeGroup>
  ```bash CLI Ingest v2 theme={null}
  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --file-glob "*.pdf,*.eml" \
      --max-file-size 100000 \
      --partition-by-api \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --api-key $UNSTRUCTURED_API_KEY \
      --strategy hi_res
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig,
      LocalUploaderConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.filter import FiltererConfig

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          filterer_config=FiltererConfig(
              file_glob=["*.pdf","*.eml"],
              max_file_size=100000
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "unique_element_ids": True,
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()
  ```
</CodeGroup>
