The Unstructured Platform is currently in private beta. Click here to join the waitlist.

What We Do

The Unstructured Platform is a no-code platform for transforming unstructured data into RAG-ready data.

To get your data RAG-ready, our platform moves it through the following process:

1

Connect

We offer multiple Source Connectors, so we can connect to your data in its existing location.

2

Route

Routing determines which strategy we employ to transform your document into our canonical JSON schema. There are three partitioning strategies for document transformation: fast, hi_res, and ocr_only. fast works well when extractable text is available, as in HTML files or Microsoft Office documents. hi_res is best for PDFs and tables, and wherever accurate classification of document elements is critical. ocr_only is useful for image-based files or PDFs that have no extractable text. If you’re unsure, select auto and we’ll handle the decision for you.
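To make the routing decision concrete, here is a minimal sketch of how a strategy could be picked from a file name. It is an illustration only, not the platform's actual routing logic: the function choose_strategy, the extension groups, and the has_extractable_text flag are all hypothetical, and the high-resolution label is written hi_res, the spelling used by the open-source unstructured library.

```python
from pathlib import Path

# Hypothetical extension groups, chosen for illustration only.
TEXT_EXTRACTABLE = {".html", ".htm", ".docx", ".pptx", ".xlsx"}
IMAGE_BASED = {".png", ".jpg", ".jpeg", ".tiff"}


def choose_strategy(filename: str, has_extractable_text: bool = True) -> str:
    """Return a partitioning strategy label for a file (illustrative rules)."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_BASED:
        # Image-based files have no text layer, so OCR is the only option.
        return "ocr_only"
    if suffix == ".pdf":
        # PDFs with a text layer get the high-resolution strategy;
        # scanned PDFs without one fall back to OCR.
        return "hi_res" if has_extractable_text else "ocr_only"
    if suffix in TEXT_EXTRACTABLE:
        # Text can be read directly from these formats.
        return "fast"
    # Anything else is left to the automatic choice.
    return "auto"
```

In practice, selecting auto delegates this kind of decision to the platform, so a sketch like this is only useful for reasoning about which strategy a given file is likely to receive.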

3

Transform

Your source document is transformed into our canonical JSON schema. Irrespective of the input document, the JSON schema we provide gives you a standardized output to code against. It contains 20+ element types, such as Header, Footer, Title, NarrativeText, Table, Image, and more. Each document is wrapped in extensive metadata, so you can inspect languages, file_type, source, hierarchy, and much more.
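As a rough sketch of what coding against a standardized output can look like, the snippet below parses a small hand-written sample and groups text by element type. The sample is an assumption made for illustration: the top-level keys type, text, and metadata, and the metadata values shown, are modelled on the element and metadata names mentioned above and are not a copy of the real schema. The helper texts_by_type is hypothetical.

```python
import json
from collections import defaultdict

# Hand-written sample; field names and values are assumptions for illustration.
SAMPLE_OUTPUT = """
[
  {"type": "Title", "text": "Quarterly Report",
   "metadata": {"file_type": "pdf", "languages": ["eng"], "source": "report.pdf"}},
  {"type": "NarrativeText", "text": "Revenue grew in the third quarter.",
   "metadata": {"file_type": "pdf", "languages": ["eng"], "source": "report.pdf"}},
  {"type": "Footer", "text": "Page 1",
   "metadata": {"file_type": "pdf", "languages": ["eng"], "source": "report.pdf"}}
]
"""


def texts_by_type(elements):
    """Group element text by element type, preserving document order."""
    grouped = defaultdict(list)
    for element in elements:
        grouped[element["type"]].append(element["text"])
    return dict(grouped)


elements = json.loads(SAMPLE_OUTPUT)
grouped = texts_by_type(elements)

# Drop page furniture (headers and footers) before downstream processing.
body = [e["text"] for e in elements if e["type"] not in {"Header", "Footer"}]
```

Because every input format yields the same element shape, the same filtering code would apply whether the source was a PDF, an HTML page, or an Office document.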

4

Chunk

Initially, the platform comes with two chunking strategies.

Basic: Combines sequential elements up to specified size limits. Oversized elements are split, while tables are isolated and divided if necessary. Overlap between chunks is optional.

By Title: A semantic chunking strategy that uses the layout of the document to make intelligent splits.
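The Basic strategy can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the platform's implementation: elements are plain dictionaries with type and text keys, sizes are measured in characters, overlap is applied only when an oversized element is split, and the names split_text and chunk_basic are hypothetical. The By Title strategy is not covered here.

```python
from typing import Dict, List


def split_text(text: str, max_chars: int, overlap: int = 0) -> List[str]:
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters.
    """
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    step = max_chars - overlap
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += step
    return pieces


def chunk_basic(elements: List[Dict[str, str]], max_chars: int = 500,
                overlap: int = 0) -> List[str]:
    """Combine sequential elements into chunks of at most max_chars characters."""
    chunks: List[str] = []
    buffer: List[str] = []

    def flush() -> None:
        # Emit the buffered elements as one chunk, joined by blank lines.
        if buffer:
            chunks.append("\n\n".join(buffer))
            buffer.clear()

    for element in elements:
        text = element["text"]
        if element["type"] == "Table" or len(text) > max_chars:
            # Tables are isolated; tables and oversized elements are split if needed.
            flush()
            chunks.extend(split_text(text, max_chars, overlap))
            continue
        # Length of the chunk if this element were appended (with separators).
        candidate_len = sum(len(t) for t in buffer) + 2 * len(buffer) + len(text)
        if buffer and candidate_len > max_chars:
            flush()
        buffer.append(text)
    flush()
    return chunks
```

For example, with max_chars=20, a short title and a short paragraph are combined into one chunk, while a 25-character table that follows them is emitted separately as two pieces.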

5

Embed

We call out to third-party embedding providers: OpenAI, AWS Bedrock, and OctoML.

6

Persist

We offer multiple Destination Connectors, including all major vector databases.

How We Do It

To simplify this process and provide it as a no-code solution, the platform is built around four key concepts:

  1. Source Connectors ingest your data.
  2. Destination Connectors tell our system where to write your transformed data.
  3. Workflows connect sources to destinations and provide chunking, embedding, and scheduling options.
  4. Jobs allow you to monitor data transformation progress.

Compliance

The platform is designed for global reach, with SOC 2 Type 2 compliance and support for over 50 languages.

Sign-Up

You can sign up here for our private beta.