# Notebooks

> Notebooks contain complete working sample code for end-to-end solutions.

<CardGroup cols={2}>
  <Card title="Unstructured API: Convert Documents to Stylized HTML" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Convert_Documents_to_Stylized_HTML_using_the_Unstructured_API.ipynb">
    <br />

    Partition documents into structured elements using our VLM-based partitioning strategy, then convert the extracted HTML metadata into beautifully stylized web documents for RAG pipelines, knowledge bases, and AI-powered applications.

    <br />

    `Unstructured API` `On-demand jobs` `Local files`

    <br />
  </Card>

  <Card title="Unstructured API: On-Demand Jobs Quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_On_Demand_Jobs_Quickstart.ipynb">
    <br />

    Learn the basics of how to use the Unstructured API to run an on-demand job with a short-lived workflow that takes one or more local files as input.

    <br />

    `Unstructured API` `On-demand jobs` `Local files` `Workflow operations` `Workflow templates`

    <br />
  </Card>

  <Card title="Unstructured API: On-Demand Jobs Walkthrough" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_On_Demand_Jobs_Walkthrough.ipynb">
    <br />

    Learn how to use the Unstructured API to run an on-demand job with a short-lived workflow that takes one or more local files as input and applies basic partitioning and a few enrichments. Then build on that foundation by adding chunking, more enrichments, and embeddings to your on-demand jobs.

    <br />

    `Unstructured API` `On-demand jobs` `Local files` `Workflow operations` `Workflow templates`

    <br />
  </Card>

  <Card title="Adding a Memory Layer to RAG with Unstructured and Mem0" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/MemoryLayer_for_RAG.ipynb">
    <br />

    Learn how to build a personalized RAG system using Unstructured's document processing and Mem0's memory layer to create an AI assistant that remembers user preferences and adapts responses across sessions.

    <br />

    `Unstructured API` `Workflows` `RAG` `S3` `Weaviate` `Memory`

    <br />
  </Card>

  <Card title="Agentic Weekly AI News TL;DR" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Agentic-Weekly-AI-News-TLDR.ipynb">
    <br />

    Build an automated pipeline that scrapes the last 7 days of AI papers & posts (ArXiv, OpenAI, Anthropic, Hugging Face, DeepLearning.AI), processes them with Unstructured’s Hi Res partitioner for clean text, stores structured chunks in MongoDB, and generates both detailed summaries and an executive brief for a weekly newsletter.

    <br />

    `Unstructured API` `Workflows` `hi_res` `Scraping` `ArXiv` `OpenAI` `Anthropic` `Hugging Face` `DeepLearning.AI` `S3` `MongoDB` `Summarization`

    <br />
  </Card>

  <Card title="Graph RAG for Academic Papers" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/GraphRAG_for_Academic_Papers.ipynb">
    <br />

    Learn how to build a GraphRAG system for research papers using Unstructured's Named Entity Recognition to extract custom entities and relationships, then query them with Neo4j to answer complex questions that require understanding connections between models, datasets, and metrics.

    <br />

    `Unstructured API` `Workflows` `Graph RAG` `S3` `Neo4j`

    <br />
  </Card>

  <Card title="Unstructured API Walkthrough" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_Walkthrough.ipynb">
    <br />

    This walkthrough provides you with deep, hands-on experience with the Unstructured API. As you follow along, you will learn how to use many of the Unstructured API's features for partitioning, enriching, chunking, and embedding.

    <br />

    `Unstructured API` `Workflows` `Workflow operations` `Local files`

    <br />
  </Card>

  <Card title="Everything (from) Everywhere All at Once" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_From_Multiple_Data_Sources_FileTypes.ipynb">
    <br />

    Set up an AI assistant that answers questions by querying your company's scattered data. Retrieve context from contracts in Azure, sales decks in OneDrive, and emails in Outlook through a single RAG pipeline.

    <br />

    `Unstructured API` `Workflows` `RAG` `Azure Blob Storage` `Outlook` `OneDrive` `Astra DB`

    <br />
  </Card>

  <Card title="Agentic RAG with Visually Grounded Answers and Visual Citations" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Agentic-RAG-with-Visually-Grounded-Answers-and-Visual-Citations.ipynb">
    <br />

    Learn how to build an AI-powered document processing system that extracts both text and images from PDFs in S3, generates intelligent descriptions for visual elements, and enables a searchable knowledge base that can answer questions about charts, diagrams, and product visuals.

    <br />

    `Unstructured API` `Workflows` `S3` `Image Processing` `Visual RAG` `Enterprise AI`

    <br />
  </Card>

  <Card title="Building a Hybrid RAG System: From Fragmented Data to Unified Intelligence" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Hybrid_RAG_Pipeline_with_ElasticSearch.ipynb">
    <br />

    Learn how to build a comprehensive hybrid RAG system that processes multiple data sources simultaneously, combining S3 PDFs and Elasticsearch records into a unified knowledge base for enterprise AI applications.

    <br />

    `Unstructured API` `Workflows` `S3` `Elasticsearch` `Hybrid RAG` `Enterprise AI`

    <br />
  </Card>

  <Card title="Dropbox-to-Pinecone Connector API Quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb">
    <br />

    Learn how to set up and run a custom workflow that uses a free Dropbox storage location as a source and a free Pinecone serverless index as a destination, suitable for powering RAG applications.

    <br />

    `Unstructured API` `Workflows` `Dropbox` `Pinecone`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and PostgreSQL" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_PostgreSQL.ipynb">
    <br />

    Learn how to build data processing workflows using the Unstructured API and Python SDK to preprocess unstructured files from S3 and store the structured outputs in PostgreSQL for retrieval.

    <br />

    `Unstructured API` `Workflows` `S3` `PostgreSQL`

    <br />
  </Card>

  <Card title="Unstructured Partition Endpoint Quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb">
    <br />

    This notebook calls the Unstructured Python SDK to have Unstructured process a local file by using the legacy Unstructured Partition Endpoint.

    <br />

    `Unstructured API` `Partition Endpoint` `Local file`

    <br />
  </Card>

  <Card title="Preserving Table Structure for Better Retrieval" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Preserving_Table_Structure_for_Better_Retrieval.ipynb">
    <br />

    This notebook explores using the Unstructured API to process financial documents while preserving tabular structure in a way that's usable by downstream applications.

    <br />

    `Unstructured API` `Workflows` `S3` `Astra DB`

    <br />
  </Card>

  <Card title="RAG without Embeddings" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Rag_without_Embeddings.ipynb">
    <br />

    Learn how to build a RAG pipeline without any embedding models. Use Unstructured to preprocess documents, index them into Elasticsearch, and retrieve using classic BM25 scoring.

    <br />

    `Unstructured API` `Workflows` `Elasticsearch` `BM25`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and Redis" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Redis.ipynb">
    <br />

    Learn how to build data processing workflows using the Unstructured API and Python SDK to preprocess unstructured files from S3 and store the structured outputs in Redis Cloud for retrieval.

    <br />

    `Unstructured API` `Workflows` `S3` `Redis`

    <br />
  </Card>

  <Card title="Create an S3 to Qdrant Pipeline using the Unstructured API" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/S3_to_Qdrant_Workflow_using_Unstructured_API.ipynb">
    <br />

    This notebook walks through using the Unstructured API's workflow operations to set up a complete pipeline that pulls documents from S3, processes them using Unstructured, and stores the resulting embeddings in Qdrant for fast vector search and retrieval.

    <br />

    `Unstructured API` `Workflows` `S3` `Qdrant` `VLM` `Embeddings`

    <br />
  </Card>

  <Card title="Two-stage retrieval: similarity search + rerankers" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Rag_with_Reranking.ipynb">
    <br />

    Improve RAG precision with a two-stage retrieval pipeline: fast vector search followed by reranking using Cohere’s re-ranker models.

    <br />

    `Unstructured API` `Workflows` `Cohere` `Pinecone`

    <br />
  </Card>

  <Card title="Create an S3 to MongoDB Pipeline using the Unstructured API" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/S3_to_MongoDB_Workflow_using_Unstructured_API.ipynb">
    <br />

    Learn how to build an end-to-end document processing pipeline that processes PDFs from S3 and stores structured results in MongoDB. Features VLM-powered partitioning, semantic chunking, and vector embeddings using the Unstructured Workflows API.

    <br />

    `Unstructured API` `Workflows` `S3` `MongoDB` `VLM` `Embeddings`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and IBM watsonx.data" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Azure_to_IBM_WatsonX.ipynb">
    <br />

    Learn how to create data processing workflows with the Unstructured API and its Python SDK to preprocess unstructured data from Azure Blob Storage and load it into your IBM watsonx.data instance.

    <br />

    `Unstructured API` `Workflows` `Azure Blob Storage` `IBM watsonx.data`

    <br />
  </Card>

  <Card title="Using Unstructured with Snowflake Cortex Search for RAG" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Use_Unstructured_with_Snowflake_Cortex_for_RAG_Search.ipynb">
    <br />

    Use Snowflake Cortex and RAG to do natural-language searches across a Snowflake table that contains data provided by Unstructured. Additional Snowflake Cortex functions are also explored.

    <br />

    `Unstructured API` `Snowflake Cortex` `RAG Search` `Workflows` `S3`

    <br />
  </Card>

  <Card title="Agentic RAG with LangGraph and Together AI" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/AgenticRAG_with_LangGraph,TogetherAI.ipynb">
    <br />

    Build Agentic RAG with `LangGraph` and `Together AI`, and compare the results with Vanilla RAG in pure Python.

    <br />

    `Unstructured API` `Workflows` `Agents` `LangGraph` `Together AI` `Astra DB`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and Snowflake" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Snowflake.ipynb">
    <br />

    Learn how to create data processing workflows with the Unstructured API and its Python SDK to preprocess unstructured data from Azure Blob Storage and load it into a Snowflake table.

    <br />

    `Unstructured API` `Workflows` `Azure Blob Storage` `Snowflake`

    <br />
  </Card>

  <Card title="Building Graph-Based RAG Applications" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Building_Graph_Based_RAG_Applications_with_Unstructured_and_AstraDB.ipynb">
    <br />

    Learn how to use the Unstructured API to create a Graph RAG-based workflow that writes data with named entity recognition (NER) to your Astra DB.

    <br />

    `Unstructured API` `Workflows` `Graph RAG` `NER` `Astra DB`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and Delta Tables in Databricks" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Delta_Tables_in_Databricks.ipynb">
    <br />

    Learn how to create data processing workflows with the Unstructured API and its Python SDK to preprocess your unstructured data and load it into a Delta table.

    <br />

    `Unstructured API` `Workflows` `Databricks` `S3`

    <br />
  </Card>

  <Card title="RAG for Online Documentation" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_for_documentation.ipynb">
    <br />

    Crawl websites with Firecrawl and build a RAG workflow powered by Unstructured and MongoDB Atlas vector search.

    <br />

    `Unstructured API` `Workflows` `MongoDB`

    <br />
  </Card>

  <Card title="Unstructured API Workflow Operations Quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_Workflow_Operations_Quickstart.ipynb">
    <br />

    Build an end-to-end workflow in Unstructured programmatically by using the Unstructured API's workflow operations.

    <br />

    `Unstructured API` `Workflows` `S3`

    <br />
  </Card>

  <Card title="RAG with Databricks Vector Search with Context from Multiple Sources" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Delta_Tables_Databricks_Multiple_Sources.ipynb">
    <br />

    Build RAG with Databricks Vector Search with context preprocessed from multiple sources by Unstructured.

    <br />

    `Databricks` `Introductory notebook`

    <br />
  </Card>

  <Card title="Agentic RAG with Hugging Face smolagents vs Vanilla RAG" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Agentic_RAG_with_HuggingFace_smolagents.ipynb">
    <br />

    Build Agentic RAG with the `smolagents` library, and compare the results with Vanilla RAG in pure Python.

    <br />

    `GPT-4o` `smolagents` `Agents` `DataStax` `S3` `Advanced notebook`

    <br />
  </Card>

  <Card title="Llama 3.2 RAG evaluation on unstructured text" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Llama3_2_RAG_evaluation_on_Unstructured_Text_via_VLM.ipynb">
    <br />

    Evaluate Llama 3.2 for your RAG system with Unstructured, GPT-4o, Ragas, and LangChain.

    <br />

    `GPT-4o` `Ragas` `LangChain` `Llama3.2` `Pinecone` `S3` `Advanced notebook`

    <br />
  </Card>

  <Card title="Multimodal RAG: Enhancing RAG outputs with image results" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Multimodal_RAG_with_image_results.ipynb">
    <br />

    Process a file in S3 with Unstructured and return images in your RAG output.

    <br />

    `S3` `FAISS` `GPT-4o-mini` `Advanced notebook`

    <br />
  </Card>

  <Card title="Quantitative Reasoning with tables inside PDFs" href="https://app.hex.tech/5e6b6e24-dead-4d3b-b9da-a9a7ad587b96/hex/cb595d8a-6eac-4e19-96ed-b1540e5c031c/draft/logic">
    <br />

    From Pixels to Insights: Seamlessly Extracting and Visualizing Table Data with Unstructured and Hex

    <br />

    `Unstructured API` `Hex` `Advanced notebook`

    <br />
  </Card>

  <Card title="PII removal with GLiNER in unstructured data ETL" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/PII_removal_in_unstructured_data_ETL.ipynb">
    <br />

    Remove Personally Identifiable Information (PII) as a part of unstructured data preprocessing.

    <br />

    `Unstructured API` `PII` `GLiNER` `Advanced notebook`

    <br />
  </Card>

  <Card title="Custom metadata extraction and self-querying retrieval" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/custom_metadata_self_querying_rag_mongodb_unstructured_langgraph.ipynb">
    <br />

    Extract custom metadata and enable metadata pre-filtering in your RAG pipeline.

    <br />

    `Unstructured API` `MongoDB` `Metadata` `Advanced notebook`

    <br />
  </Card>

  <Card title="Selecting an embedding model for custom data" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Selecting_an_embedding_model_for_custom_data.ipynb">
    <br />

    An end-to-end data processing pipeline that uses the Unstructured Serverless API.

    <br />

    `Unstructured API` `Hugging Face` `Advanced notebook`

    <br />
  </Card>

  <Card title="RAG with PDFs, LangChain and Llama 3" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_Llama3_Unstructured_LangChain.ipynb">
    <br />

    A RAG system with the Llama 3 model from Hugging Face.

    <br />

    `Unstructured API` `🤗 Hugging Face` `LangChain` `Llama 3` `Introductory notebook`

    <br />
  </Card>

  <Card title="Unstructured data ETL from S3 to SingleStore DB" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_data_ETL_from_S3_to_SingleStore.ipynb">
    <br />

    Learn to ingest, partition, chunk, embed, and load data from an S3 bucket into SingleStore DB.

    <br />

    `Unstructured API` `SingleStoreDB` `AWS S3` `Introductory notebook`

    <br />
  </Card>

  <Card title="Google Drive to DataStax Astra DB" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Google_Docs_to_Astra.ipynb">
    <br />

    Embed your Google Drive Docs in an Astra Vector Database with the Unstructured Serverless API.

    <br />

    `Unstructured API` `Google` `DataStax` `Introductory notebook`

    <br />
  </Card>

  <Card title="Weaviate RAG quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Weaviate_Quickstart_OpenAI.ipynb">
    <br />

    Embed your local documents in a Weaviate Vector Database with the Unstructured Serverless API.

    <br />

    `Unstructured API` `OpenAI` `Weaviate` `Introductory notebook`

    <br />
  </Card>

  <Card title="Preprocess PDFs in AWS S3, load into Elasticsearch" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/S3_to_Elasticsearch_with_Unstructured.ipynb">
    <br />

    Ingest PDF documents from an S3 bucket, transform them into normalized JSON with the Unstructured Serverless API, then chunk, embed, and load them into Elasticsearch.

    <br />

    `Unstructured API` `AWS S3` `Elasticsearch` `Introductory notebook`

    <br />
  </Card>

  <Card title="Preprocess documents in Google Drive, load into Databricks Volume" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/GoogleDrive_to_Databricks_Connector.ipynb">
    <br />

    Preprocess documents from Google Drive with the Unstructured Serverless API and load them into a Databricks Volume.

    <br />

    `Unstructured API` `Google Drive` `Databricks` `Introductory notebook`

    <br />
  </Card>

  <Card title="Source references in RAG responses" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_on_arXiv_papers_with_source_references.ipynb">
    <br />

    Add document source references to RAG responses based on document metadata.

    <br />

    `Unstructured API` `RAG` `LangChain` `Intermediate notebook`

    <br />
  </Card>

  <Card title="Query processed PDF with HuggingChat" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/PDF_with_Unstructured_and_HuggingChat.ipynb">
    <br />

    Send a PDF to Unstructured for processing, and send a subset of the returned PDF's processed text to [HuggingChat](https://huggingface.co/chat/) for chatbot-style querying.

    <br />

    `Unstructured API` `🤗 Hugging Face` `🤗 HuggingChat` `Introductory notebook`

    <br />
  </Card>

  <Card title="Llama 3 Local RAG with emails" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Local_RAG_with_emails.ipynb">
    <br />

    Build a local RAG app for your emails with Unstructured, LangChain and Ollama.

    <br />

    `Unstructured API` `LangChain` `Ollama` `Llama 3` `Introductory notebook`

    <br />
  </Card>

  <Card title="Building RAG With PowerPoint presentations" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Building_RAG_with_Powerpoint_presentations.ipynb">
    <br />

    A RAG solution that is based on PowerPoint files.

    <br />

    `Unstructured API` `🤗 Hugging Face` `LangChain` `Llama 3` `Introductory notebook`

    <br />
  </Card>

  <Card title="Synthetic test dataset generation" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_synthetic_test_data_with_Unstructured_GPT_4o_and_Ragas.ipynb">
    <br />

    Build a synthetic test dataset for your RAG system in 5 easy steps.

    <br />

    `Unstructured API` `GPT-4o` `Ragas` `LangChain` `Advanced notebook`

    <br />
  </Card>
</CardGroup>
