Historical research about MLK with the Unstructured API


This notebook explores how you can use Unstructured to gather and process declassified historical records surrounding the assassination of Dr. Martin Luther King, Jr. These processed documents can then be analyzed by using Elasticsearch and RAG.

Unstructured API Workflows S3 VLM NER Elasticsearch MLK National Archives

Create an S3 to Qdrant Pipeline using the Unstructured API


This notebook walks through using the Unstructured Workflow Endpoint to set up a complete pipeline that pulls documents from S3, processes them using Unstructured, and stores the resulting embeddings in Qdrant for fast vector search and retrieval.
Unstructured API Workflows S3 Qdrant VLM Embeddings

Create an S3 to MongoDB Pipeline using the Unstructured API


Learn how to build an end-to-end document processing pipeline that processes PDFs from S3 and stores structured results in MongoDB. Features VLM-powered partitioning, semantic chunking, and vector embeddings using the Unstructured Workflows API.
Unstructured API Workflows S3 MongoDB VLM Embeddings

Getting Started with Unstructured API and IBM watsonx.data


Learn how to create data processing workflows with the Unstructured API and its Python SDK that preprocess your unstructured data from Azure Blob Storage and load it into your IBM watsonx.data instance.
Unstructured API Workflows Azure Blob Storage IBM watsonx.data

Using Unstructured with Snowflake Cortex Search for RAG


Use Snowflake Cortex and RAG to do natural-language searches across a Snowflake table that contains data provided by Unstructured. Additional Snowflake Cortex functions are also explored.
Unstructured API Snowflake Cortex RAG Search Workflows S3

Agentic RAG with LangGraph and Together AI


Build Agentic RAG with LangGraph and Together AI and compare the results with Vanilla RAG in pure Python
Unstructured API Workflows Agents LangGraph Together AI Astra DB

Getting Started with Unstructured API and Snowflake


Learn how to create data processing workflows with the Unstructured API and its Python SDK that preprocess your unstructured data from Azure Blob Storage and load it into your Snowflake table.
Unstructured API Workflows Azure Blob Storage Snowflake

Building Graph-Based RAG Applications


Learn how to use the Unstructured API to create a Graph RAG-based workflow that writes data with named entity recognition (NER) to your Astra DB.
Unstructured API Workflows Graph RAG NER Astra DB

Getting Started with Unstructured API and Delta Tables in Databricks


Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data into your Delta Table.
Unstructured API Workflows Databricks S3

RAG for Online Documentation


Crawl websites with Firecrawl and build a RAG workflow powered by Unstructured and MongoDB Atlas vector search.
Unstructured API Workflows MongoDB

Unstructured Workflow Endpoint Quickstart


Build an end-to-end workflow in Unstructured programmatically by using the Unstructured Workflow Endpoint.
Unstructured API Workflows S3

RAG with Databricks Vector Search with Context from Multiple Sources


Build RAG with Databricks Vector Search with context preprocessed from multiple sources by Unstructured.
Databricks Introductory notebook

Agentic RAG with Hugging Face smolagents vs Vanilla RAG


Build Agentic RAG with smolagents library and compare the results with Vanilla RAG in pure Python
GPT-4o smolagents Agents DataStax S3 Advanced notebook

Llama3.2 RAG evaluation on unstructured text


Evaluate Llama3.2 for your RAG system with Unstructured, GPT-4o, Ragas, and LangChain
GPT-4o Ragas LangChain Llama3.2 Pinecone S3 Advanced notebook

Multimodal RAG: Enhancing RAG outputs with image results


Process a file in S3 with Unstructured and return images in your RAG output
S3 FAISS GPT-4o-mini Advanced notebook

Quantitative Reasoning with tables inside PDFs


From Pixels to Insights: Seamlessly Extracting and Visualizing Table Data with Unstructured and Hex
Unstructured API Hex Advanced notebook

PII removal with GLiNER in unstructured data ETL


Remove Personally Identifiable Information (PII) as a part of unstructured data preprocessing.
Unstructured API PII GLiNER Advanced notebook

Custom metadata extraction and self-querying retrieval


Extract custom metadata, and enable metadata pre-filtering in your RAG.
Unstructured API MongoDB Metadata Advanced notebook

Selecting an embedding model for custom data


End-to-end data processing pipeline using Unstructured Serverless API.
Unstructured API Hugging Face Advanced notebook

RAG with PDFs, LangChain and Llama 3


A RAG system with the Llama 3 model from Hugging Face.
Unstructured API 🤗 Hugging Face LangChain Llama 3 Introductory notebook

Unstructured data ETL from S3 to SingleStore DB


Learn to ingest, partition, chunk, embed and load data from an S3 bucket into SingleStore DB.
Unstructured API SingleStoreDB AWS S3 Introductory notebook

Google Drive to DataStax Astra DB


Embed your Google Drive Docs in an Astra Vector Database with Unstructured Serverless API
Unstructured API Google DataStax Introductory notebook

Weaviate RAG quickstart


Embed your local documents in a Weaviate Vector Database with Unstructured Serverless API
Unstructured API OpenAI Weaviate Introductory notebook

Preprocess PDFs in AWS S3, load into Elasticsearch


Ingest PDF documents from an S3 bucket, transform them into normalized JSON with the Unstructured Serverless API, then chunk, embed, and load them into Elasticsearch.
Unstructured API AWS S3 Elasticsearch Introductory notebook

Preprocess documents in Google Drive, load into Databricks Volume


Preprocess documents from Google Drive with the Unstructured Serverless API and load them into a Databricks Volume.
Unstructured API Google Drive Databricks Introductory notebook

Source references in RAG responses


Add document source references to RAG responses based on document metadata.
Unstructured API RAG LangChain Intermediate notebook

Query processed PDF with HuggingChat


Send a PDF to Unstructured for processing, and send a subset of the returned PDF’s processed text to HuggingChat for chatbot-style querying.
Unstructured API 🤗 Hugging Face 🤗 HuggingChat Introductory notebook

Llama 3 Local RAG with emails


Build a local RAG app for your emails with Unstructured, LangChain and Ollama.
Unstructured API LangChain Ollama Llama 3 Introductory notebook

Building RAG With PowerPoint presentations


A RAG solution that is based on PowerPoint files.
Unstructured API 🤗 Hugging Face LangChain Llama 3 Introductory notebook

Synthetic test dataset generation


Build a Synthetic Test Dataset for your RAG system in 5 easy steps
Unstructured API GPT-4o Ragas LangChain Advanced notebook