To install the Unstructured open source library on a local development machine, run one or more of the following commands. These commands assume that you are using uv, the Python package and project manager, from within an activated virtual environment (venv) that was created with
uv. However, uv and venv are not required.
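Because the commands assume an activated virtual environment, you may want to confirm that before installing anything. Inside a venv, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base Python installation, so the two differ. The following is a minimal sketch. The function name `in_virtual_env` is hypothetical and is not part of Unstructured.

```python
import sys


def in_virtual_env() -> bool:
    """Return True if the current interpreter is running inside a venv.

    In a venv-style environment (including ones created by uv), sys.prefix
    is the environment directory and sys.base_prefix is the base install.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_env():
        print(f"Active virtual environment: {sys.prefix}")
    else:
        print("No virtual environment appears to be active.")
```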
To work with all supported file types, run:
By default, the base package (installed without any extras) supports plain text files (.txt), HTML files (.html), XML files (.xml), and emails (.eml, .msg, and .p7s).
To further conserve disk space and reduce code dependencies, you can run the following command instead, replacing <extra> with the appropriate extra for the target file type:
- all-docs (for all supported file types in this list)
- csv (for .csv files only)
- docx (for .doc and .docx files only)
- epub (for .epub files only)
- image (for all supported image file types: .bmp, .heic, .jpeg, .png, and .tiff)
- md (for .md files only)
- odt (for .odt files only)
- org (for .org files only)
- pdf (for .pdf files only)
- pptx (for .ppt and .pptx files only)
- rst (for .rst files only)
- rtf (for .rtf files only)
- tsv (for .tsv files only)
- xlsx (for .xls and .xlsx files only)
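If a project handles several file types, you can work out the extras it needs from the file extensions it processes. Standard Python requirement syntax (PEP 508) lets you combine several extras with commas, for example `unstructured[docx,pdf]`. The following sketch maps each extension in the list above to its extra and skips types that the base package already covers. The helper names `extras_for` and `install_spec` are hypothetical. The extension tables are copied from this page, not read from the library, so they may fall out of date.

```python
from pathlib import Path

# Extensions the base package handles without any extra (from this page).
BASE_EXTENSIONS = {".txt", ".html", ".xml", ".eml", ".msg", ".p7s"}

# Extension -> extra name, transcribed from the list of extras above.
EXTRA_FOR_EXTENSION = {
    ".csv": "csv",
    ".doc": "docx", ".docx": "docx",
    ".epub": "epub",
    ".bmp": "image", ".heic": "image", ".jpeg": "image",
    ".png": "image", ".tiff": "image",
    ".md": "md",
    ".odt": "odt",
    ".org": "org",
    ".pdf": "pdf",
    ".ppt": "pptx", ".pptx": "pptx",
    ".rst": "rst",
    ".rtf": "rtf",
    ".tsv": "tsv",
    ".xls": "xlsx", ".xlsx": "xlsx",
}


def extras_for(filenames: list[str]) -> set[str]:
    """Return the extras needed for the given files.

    Raises ValueError for an extension found in neither table.
    """
    needed: set[str] = set()
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in BASE_EXTENSIONS:
            continue  # covered by the base install, no extra required
        if ext not in EXTRA_FOR_EXTENSION:
            raise ValueError(f"No known extra for {name!r}")
        needed.add(EXTRA_FOR_EXTENSION[ext])
    return needed


def install_spec(extras: set[str]) -> str:
    """Build a requirement string such as 'unstructured[pdf,pptx]'."""
    if not extras:
        return "unstructured"
    return f"unstructured[{','.join(sorted(extras))}]"


if __name__ == "__main__":
    files = ["report.pdf", "notes.txt", "deck.pptx"]
    print(install_spec(extras_for(files)))  # unstructured[pdf,pptx]
```

Use the resulting string wherever the install command expects `unstructured[<extra>]`. The extras are sorted so that the output does not depend on the order of the input files.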
Depending on the file types you process, you might also need to install the following system dependencies:
- libmagic-dev (for file type detection)
- poppler-utils and tesseract-ocr (for images and PDFs), and tesseract-lang (for additional language support)
- libreoffice (for Microsoft Office documents)
- pandoc (for .epub, .odt, and .rtf files; .rtf files require version 2.14.2 or newer. Running this script will install the correct version for you.)
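Each system dependency is only needed for certain file types, so it can help to check which ones are already present before you install anything. The following sketch assumes that each package can be detected by one representative executable: `pdftoppm` for poppler-utils, `tesseract` for tesseract-ocr, `soffice` for LibreOffice, and `pandoc` for pandoc. It also assumes that libmagic can be found as a shared library. These mappings and all function names are hypothetical, and the sketch does not check for tesseract-lang language packs. It also compares the installed pandoc version with the 2.14.2 minimum required for .rtf files.

```python
import ctypes.util
import re
import shutil
import subprocess
from typing import Optional

# Assumed representative executable for each system package (not an official check).
EXECUTABLES = {
    "poppler-utils": "pdftoppm",
    "tesseract-ocr": "tesseract",
    "libreoffice": "soffice",
    "pandoc": "pandoc",
}

MIN_PANDOC_FOR_RTF = (2, 14, 2)


def parse_pandoc_version(version_output: str) -> Optional[tuple[int, ...]]:
    """Extract a version tuple such as (3, 1, 9) from `pandoc --version` output."""
    match = re.search(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def pandoc_supports_rtf(version: Optional[tuple[int, ...]]) -> bool:
    """Return True if the version meets the 2.14.2 minimum for .rtf files."""
    return version is not None and version >= MIN_PANDOC_FOR_RTF


def missing_dependencies() -> list[str]:
    """Return the names of system dependencies that could not be found."""
    missing = [pkg for pkg, exe in EXECUTABLES.items() if shutil.which(exe) is None]
    # libmagic is a shared library rather than an executable on PATH.
    if ctypes.util.find_library("magic") is None:
        missing.insert(0, "libmagic")
    return missing


def installed_pandoc_version() -> Optional[tuple[int, ...]]:
    """Run `pandoc --version` and parse it; return None if pandoc is absent."""
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        ["pandoc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_pandoc_version(result.stdout)


if __name__ == "__main__":
    print("Missing:", missing_dependencies() or "none")
    print("pandoc OK for .rtf:", pandoc_supports_rtf(installed_pandoc_version()))
```

The sketch compares versions as Python tuples, so a release reported as `2.14` counts as older than `2.14.2`, and `3.1` counts as newer.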

