To install the Unstructured open source library on a local development machine, run one or more of the following commands.

These commands assume that you are using the Python package and project manager uv, running within an activated venv virtual environment that was created with uv. However, uv and venv are not required.

To work with all supported file types, run:

uv add "unstructured[all-docs]"

To conserve disk space and reduce code dependencies, you can run the following command instead to work with a default set of supported file types:

uv add unstructured

By default, the preceding command supports plain text files (.txt), HTML files (.html), XML files (.xml), and emails (.eml, .msg, and .p7s).

To further conserve disk space and reduce code dependencies, you can run the following command instead, replacing <extra> with the appropriate extra for the target file type:

uv add "unstructured[<extra>]"

The following file type extras are available:

  • all-docs (for all supported file types in this list)
  • csv (for .csv files only)
  • docx (for .doc and .docx files only)
  • epub (for .epub files only)
  • image (for all supported image file types: .bmp, .heic, .jpeg, .png, and .tiff)
  • md (for .md files only)
  • odt (for .odt files only)
  • org (for .org files only)
  • pdf (for .pdf files only)
  • pptx (for .ppt and .pptx files only)
  • rst (for .rst files only)
  • rtf (for .rtf files only)
  • tsv (for .tsv files only)
  • xlsx (for .xls and .xlsx files only)

Note that you can install multiple extras at the same time by separating them with commas, for example:

uv add "unstructured[pdf,docx]"
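When a project handles several file types, it can help to derive the extras from the files themselves rather than picking them by hand. The following Python sketch maps file extensions to the extras listed above and builds one comma-separated requirement for uv add. The mapping is transcribed from this page only. The function name requirement_for is hypothetical and is not part of the Unstructured library. Extensions that the default install already covers (.txt, .html, .xml, .eml, .msg, .p7s) add no extra.

```python
# Hypothetical helper (not part of Unstructured): map file extensions to the
# extras listed on this page and build a single `uv add` requirement string.
from pathlib import Path

# Extensions supported by a plain `uv add unstructured` (per this page).
DEFAULT_EXTENSIONS = {".txt", ".html", ".xml", ".eml", ".msg", ".p7s"}

# Extension -> extra name, transcribed from the extras list on this page.
EXTRA_BY_EXTENSION = {
    ".csv": "csv",
    ".doc": "docx", ".docx": "docx",
    ".epub": "epub",
    ".bmp": "image", ".heic": "image", ".jpeg": "image",
    ".png": "image", ".tiff": "image",
    ".md": "md",
    ".odt": "odt",
    ".org": "org",
    ".pdf": "pdf",
    ".ppt": "pptx", ".pptx": "pptx",
    ".rst": "rst",
    ".rtf": "rtf",
    ".tsv": "tsv",
    ".xls": "xlsx", ".xlsx": "xlsx",
}


def requirement_for(filenames: list[str]) -> str:
    """Return a requirement such as 'unstructured[docx,pdf]' for the given files."""
    extras: set[str] = set()
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in DEFAULT_EXTENSIONS:
            continue  # already covered by the default install
        if ext not in EXTRA_BY_EXTENSION:
            raise ValueError(f"no extra listed for {ext!r}; consider 'all-docs'")
        extras.add(EXTRA_BY_EXTENSION[ext])
    if not extras:
        return "unstructured"
    # Extras are separated by commas with no spaces; sort for a stable result.
    return f"unstructured[{','.join(sorted(extras))}]"


if __name__ == "__main__":
    spec = requirement_for(["report.PDF", "notes.docx", "page.html"])
    print(f'uv add "{spec}"')
```

Run as a script, this prints uv add "unstructured[docx,pdf]": the .html file needs no extra, and the two remaining extras are joined with a comma just as in the command above.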

For maximum compatibility, you should also install the following system dependencies:

  • libmagic-dev (for file type detection)
  • poppler-utils and tesseract-ocr (for images and PDFs), and tesseract-lang (for additional language support)
  • libreoffice (for Microsoft Office documents)
  • pandoc (for .epub, .odt, and .rtf files. For .rtf files, you must have version 2.14.2 or newer. Running this script will install the correct version for you.)
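To see which of these system dependencies are already present, you can run a quick check like the following Python sketch. It assumes that each package provides the executable named in the EXECUTABLES table (for example, poppler-utils provides pdftoppm and LibreOffice provides soffice). These names are assumptions that may differ on your platform. Because libmagic is a shared library rather than a command, the sketch looks it up with ctypes.util.find_library. It also parses the output of pandoc --version and compares it with the 2.14.2 minimum that this page gives for .rtf files. The helper names parse_version and pandoc_supports_rtf are hypothetical.

```python
# Sketch: report which of the system dependencies listed on this page appear
# to be installed. Executable names are assumptions; adjust for your platform.
import re
import shutil
import subprocess
from ctypes.util import find_library

MIN_PANDOC_FOR_RTF = (2, 14, 2)  # minimum pandoc version for .rtf, per this page

# Package -> an executable it is assumed to provide.
EXECUTABLES = {
    "poppler-utils": "pdftoppm",
    "tesseract-ocr": "tesseract",
    "libreoffice": "soffice",
    "pandoc": "pandoc",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'pandoc 3.1.3' -> (3, 1, 3)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def pandoc_supports_rtf(version: tuple[int, ...]) -> bool:
    # Tuples compare element by element, so (2, 9, 2, 1) < (2, 14, 2).
    return version >= MIN_PANDOC_FOR_RTF


def main() -> None:
    # libmagic is a shared library, not a command, so look it up via ctypes.
    print("libmagic:", "found" if find_library("magic") else "missing")
    for package, exe in EXECUTABLES.items():
        print(f"{package} ({exe}):", "found" if shutil.which(exe) else "missing")
    if shutil.which("pandoc"):
        result = subprocess.run(
            ["pandoc", "--version"], capture_output=True, text=True, check=True
        )
        version = parse_version(result.stdout.splitlines()[0])
        status = "OK for .rtf" if pandoc_supports_rtf(version) else "too old for .rtf"
        print("pandoc", ".".join(map(str, version)), status)


if __name__ == "__main__":
    main()
```

Comparing versions as integer tuples avoids the string-comparison trap in which "2.9" would sort after "2.14".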

Installation instructions for these system dependencies vary by operating system. For details, follow the preceding links or see your operating system’s documentation.