Unstructured Foundation is an Early Access product. Unstructured must accept you into the Early Access program before you can begin using Foundation. Add your name to the Early Access interest list.
Unstructured is currently accepting a limited number of participants into the Early Access program at its discretion, with plans to expand access to a more general audience in the future.
The following information is being provided to give you an advance preview of possible future functionality. Unstructured makes no express claims as to what extent this information will reflect the state of the product upon initial general availability.
What problems does Foundation solve?
AI tools such as Claude, Codex, Cursor, and Microsoft Copilot can reason over public information, but they cannot see your organization’s private knowledge: your Google Drive folders, Microsoft SharePoint sites, Confluence spaces, Slack conversations, and Amazon S3 buckets. Every session requires the same workaround: upload files, paste excerpts, and re-attach context. The assistant then forgets everything when the session ends. That cost compounds: knowledge never accumulates, and context windows degrade as the volume of attached files grows.
Foundation eliminates those workarounds. It processes your files once, maintains a persistent enrichment index, and exposes that index through every AI tool your team uses.
How Foundation works
Foundation connects to your data through sources. A source is a configured connection to a data source such as Google Drive, SharePoint, S3, Confluence, or Slack. Once a source is set up, Foundation indexes the files it finds there and keeps that index current as files change.
Foundation calls this index the enrichment index: the structured, persistent representation of your files. AI tools query it without reprocessing raw files on every request, which keeps retrieval fast and cost-efficient at enterprise scale.
The enrichment index is exposed to AI tools through a global catalog, a unified view across all connected sources. Any MCP-compatible tool can query the global catalog directly, with no extra configuration.
Key benefits of Foundation
- Your files stay where they are. Foundation never copies or moves your source files. Instead, it stores a full-text representation of each file alongside named entities, topics, and short summaries.
- Process once, query forever. Each file is processed at ingest time using Unstructured’s full extraction pipeline: layout-aware parsing, table extraction, and structured output. Every query returns the same quality without reprocessing.
- Persistent across sessions and tools. The enrichment index does not reset when a session ends. The same catalog is available from any MCP-compatible AI tool (such as Claude, Codex, Cursor, and Copilot) without reconfiguration.
- Stays current automatically. New files added to connected sources are discovered and indexed in the background. No action is required from you.
- Scales to enterprise volumes. Foundation delivers summaries and enriched metadata to your AI tool rather than raw file dumps. This avoids the context window degradation that direct file-attachment approaches suffer at large scale.
- Permissions are respected. Files you cannot access in a source system do not surface in your results. Foundation checks source permissions at query time.
- Save on token costs. Because your AI tool receives summaries and enriched metadata instead of whole files, each query can use fewer tokens.
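
To make the indexing model concrete, here is a minimal Python sketch of the ideas described above: each file is enriched once at ingest, records from every source live in one catalog, and permissions are checked when a query runs. It is an illustration only, not Foundation's implementation or API. `EnrichedRecord`, `Catalog`, `enrich`, and `can_read` are all hypothetical names, and the toy `enrich` function merely stands in for a real extraction pipeline.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class EnrichedRecord:
    """Hypothetical shape of one enrichment-index entry."""
    source: str        # e.g. "google_drive" or "s3"
    path: str          # where the file lives in the source system
    text: str          # full-text representation
    summary: str
    topics: list[str]


def enrich(source: str, path: str, raw_text: str) -> EnrichedRecord:
    """Toy stand-in for an extraction pipeline; runs once per file version."""
    words = [w.strip(".,:;").lower() for w in raw_text.split()]
    summary = " ".join(raw_text.split()[:8])
    topics = sorted({w for w in words if len(w) > 7})
    return EnrichedRecord(source, path, raw_text, summary, topics)


@dataclass
class Catalog:
    """One queryable view over records from every connected source."""
    # Hypothetical callback that asks the source system about access.
    can_read: Callable[[str, str, str], bool]
    records: dict[tuple[str, str], EnrichedRecord] = field(default_factory=dict)
    enrich_calls: int = 0

    def ingest(self, source: str, path: str, raw_text: str) -> None:
        key = (source, path)
        existing = self.records.get(key)
        if existing is not None and existing.text == raw_text:
            return  # unchanged file: reuse the stored record
        self.enrich_calls += 1
        self.records[key] = enrich(source, path, raw_text)

    def query(self, user: str, term: str) -> list[dict]:
        """Return summaries and metadata (not raw files) the user may see."""
        needle = term.lower()
        hits = []
        for rec in self.records.values():
            if not self.can_read(user, rec.source, rec.path):
                continue  # permission is evaluated at query time
            if needle in rec.text.lower():
                hits.append({"source": rec.source, "path": rec.path,
                             "summary": rec.summary, "topics": rec.topics})
        return hits


# Illustrative access rules and files (all made up for this sketch).
acl = {
    ("google_drive", "plans/roadmap.txt"): {"ana"},
    ("s3", "reports/q3.txt"): {"ana", "ben"},
}


def can_read(user: str, source: str, path: str) -> bool:
    return user in acl.get((source, path), set())


catalog = Catalog(can_read=can_read)
catalog.ingest("google_drive", "plans/roadmap.txt",
               "Roadmap: launch the quarterly billing dashboard.")
catalog.ingest("s3", "reports/q3.txt",
               "Quarterly revenue report with billing totals.")
# Re-ingesting an unchanged file does not trigger enrichment again.
catalog.ingest("s3", "reports/q3.txt",
               "Quarterly revenue report with billing totals.")

ana_hits = catalog.query("ana", "billing")
ben_hits = catalog.query("ben", "billing")
```

In this sketch, `ana` sees matches from both sources while `ben` sees only the S3 report, because `can_read` is consulted on every query rather than baked into the stored records.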
Foundation versus built-in chatbot features
Most AI tools let you upload files to a conversation. In Claude Desktop, this is + > Add files or photos. That pattern works for quick questions but breaks down at enterprise scale.

| | AI tool conversation uploads | Foundation |
|---|---|---|
| Persistence | Resets when the session ends | Permanent enrichment index that never resets |
| File count | 10–20 files per session or message | Unlimited (entire file libraries) |
| Supported formats | ~10 common types | 50+ formats |
| Context pressure | Uploaded files consume context tokens; oldest content gets dropped | Summaries and metadata keep context usage low |
| Freshness | Snapshot at upload time | Auto-syncs as source files change |
| Cross-tool availability | Manual re-upload per tool | One index, any MCP-compatible tool |
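
The context pressure row can be illustrated with some back-of-the-envelope arithmetic. The sketch below counts how many items fit into a fixed token budget when the items are whole files versus short summaries. Everything in it is an assumption made for illustration: the four-characters-per-token heuristic, the file and summary sizes, and the budget are not Foundation figures or measurements.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def fit_into_budget(items: list[str], budget_tokens: int) -> int:
    """Count how many items fit, in order, before the budget is exhausted."""
    used = 0
    fitted = 0
    for item in items:
        cost = estimate_tokens(item)
        if used + cost > budget_tokens:
            break
        used += cost
        fitted += 1
    return fitted


# Illustrative inputs only: 50 "files" of 40,000 characters each,
# and a 400-character summary standing in for each file.
full_files = ["x" * 40_000 for _ in range(50)]
summaries = ["x" * 400 for _ in range(50)]

budget = 100_000  # hypothetical context budget, in tokens

files_that_fit = fit_into_budget(full_files, budget)
summaries_that_fit = fit_into_budget(summaries, budget)
print(files_that_fit, summaries_that_fit)
```

Under these made-up numbers, only a fraction of the full files fit in the budget, while all of the summaries fit with room to spare. Real ratios depend on the files, the tokenizer, and the model.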
The Foundation MCP Server versus built-in source connectors
Many AI tools offer built-in source connectors. In Claude Desktop, this is Customize > Connectors > Add connector. These include integrations with sources such as Google Drive or Microsoft SharePoint. These connectors pass a snapshot of your files to the AI on demand. The approach has structural limits.

| | AI tool source connectors | Foundation MCP server |
|---|---|---|
| Data sources | One source per connector | All sources through one catalog |
| Tool compatibility | Reconfigured per AI tool | One setup, any MCP-compatible tool |
| File processing | Raw files parsed at inference time | VLM parsing, table extraction, and layout-aware chunking; stored as structured JSON permanently |
| Corpus operations | Per-file retrieval only | Topics, entities, and keywords indexed across the entire corpus; results available before full processing completes |
| Context window usage | Full files fill the window | Summaries and metadata keep context lean |
| Freshness | Snapshot at connection time | Auto-syncs as source files change |
| File format support | Formats the AI tool supports out of the box | 50+ formats via Unstructured |
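
For readers unfamiliar with MCP, the sketch below shows the general shape of the message an MCP client sends when it invokes a tool on a server: a JSON-RPC 2.0 request whose method is `tools/call`. The envelope follows the MCP specification, but the tool name `search_catalog` and its arguments are hypothetical placeholders. This page does not list the tools that the Foundation MCP server exposes; a client would normally discover them with a `tools/list` request, and an MCP client library would build these messages for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_catalog", "query", and "limit" are hypothetical placeholders,
# not documented Foundation tool or parameter names.
request = build_tool_call("search_catalog",
                          {"query": "Q3 billing totals", "limit": 5})
wire_message = json.dumps(request)
print(wire_message)
```

Because every MCP-compatible tool speaks this same protocol, a single server can serve many AI tools without per-tool integration code.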
Foundation versus Unstructured Pipelines and its API
Unstructured Pipelines and its API are pipeline tools. You configure a workflow: sources, destinations, and transformation steps. Jobs then run on a schedule or on demand and move processed files to a vector database, a file storage location, or another destination. Your application then queries that destination. Foundation is different in purpose and design.

| | Unstructured Pipelines and its API | Foundation |
|---|---|---|
| Goal | Transform files and write results to a destination | Index files once and expose them to AI tools persistently |
| Setup | Configure sources, destinations, steps, and schedules | Connect sources (no workflow to design) |
| Output | Data written to your chosen destination system | Enrichment index queried via the Foundation MCP Server |
| Processing model | On-demand (jobs run per schedule or request) | Process once; queryable forever |
| Query interface | Your application queries the destination | AI tools query Foundation directly via MCP |
| Best for | Engineers building RAG pipelines and data products | Teams who want AI tools with persistent file access |
| Underlying engine | Unstructured’s full extraction pipeline | Same engine (identical parsing quality) |
Security and data privacy
Foundation is built on data minimization. Your source files never leave their original location. Foundation stores only the enriched representations it needs to answer queries, not the files themselves.
What Foundation stores
Foundation stores the enriched representations it needs to answer queries: named entities, topics, summaries, and complete file text. Your source files remain at rest in their original connected locations.
Data deletion
When you remove a source, you can delete the associated indexed data. Your source files are not affected.
User isolation
Each user’s data is stored in an isolated database. The MCP server verifies your identity on every tool call. Your indexed data is never accessible to other users, and theirs is never accessible to you.
Source authorization
Foundation connects to your data sources using industry-standard OAuth. You authorize each source through that service’s standard OAuth flow, the same process you use for any other app. Source-level permissions are enforced at query time: files you cannot access in the source do not appear in your results.
Next steps
- Get started with Foundation: Install Foundation and connect your first source.
- Foundation sources: Learn about available file sources and how to manage connections.
- The Foundation MCP server: Understand how AI tools query your enrichment index.
Questions? Need help?
- For general questions about Unstructured products and pricing, email Unstructured Sales at sales@unstructured.io.
- For technical support, see request support.

