VectorShift is an integrated framework of no-code, low-code, and out-of-the-box generative AI solutions for building AI search engines, assistants, chatbots, and automations.

VectorShift’s platform allows you to design, prototype, build, deploy, and manage generative AI workflows and automation across two interfaces: no-code and code SDK. This hands-on demonstration uses the no-code interface to walk you through creating a VectorShift pipeline project. In this project, you use GPT-4o-mini to chat in real time with a PDF document that Unstructured processes and whose processed data is stored in a Pinecone vector database.

Prerequisites

  • A Pinecone account. Get an account.

  • A Pinecone API key. Get an API key.

  • A Pinecone serverless index. Create a serverless index.

    An existing index is not required. At runtime, the index behavior is as follows:

    For the Unstructured UI and Unstructured API:

    • If an existing index name is specified, and Unstructured generates embeddings, but the number of dimensions that are generated does not match the existing index’s embedding settings, the run will fail. You must change your Unstructured embedding settings or your existing index’s embedding settings to match, and try the run again.
    • If an index name is not specified, Unstructured creates a new index in your Pinecone account. If Unstructured generates embeddings, the new index’s name will be u<short-workflow-id>-<short-embedding-model-name>-<number-of-dimensions>. If Unstructured does not generate embeddings, the new index’s name will be u<short-workflow-id>.

    For Unstructured Ingest:

    • If an existing index name is specified, and Unstructured generates embeddings, but the number of dimensions that are generated does not match the existing index’s embedding settings, the run will fail. You must change your Unstructured embedding settings or your existing index’s embedding settings to match, and try the run again.
    • If an index name is not specified, Unstructured creates a new index in your Pinecone account. The new index’s name will be unstructuredautocreated.

    Whether you create a new index or use an existing one, Unstructured recommends that all records in the target index have a field named record_id with a string data type. Unstructured can use this field to do intelligent document overwrites. Without this field, duplicate documents might be written to the index or, in some cases, the operation could fail altogether.
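
    If you prefer to create the serverless index programmatically instead of in the Pinecone console, a minimal sketch with the pinecone Python package follows. The API key, index name, cloud, and region are placeholders; the dimension shown (3072) matches openai/text-embedding-3-large, which the VectorShift Pinecone node in this demonstration uses, and it must also match whatever embedding model your Unstructured workflow uses to generate embeddings.

      from pinecone import Pinecone, ServerlessSpec

      # Placeholder API key, index name, cloud, and region.
      pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
      pc.create_index(
          name="vectorshift-demo",
          dimension=3072,  # Must match your embedding model's dimensions.
          metric="cosine",
          spec=ServerlessSpec(cloud="aws", region="us-east-1"),
      )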

Also:

  • Sign up for an OpenAI account, and get your OpenAI API key.

  • Sign up for a VectorShift Starter account.

  • Sign up for an Unstructured account:

    If you signed up for Unstructured through the For Enterprise page, or if you are using a self-hosted deployment of Unstructured, the following information about signing up, signing in, and getting your Unstructured API key might apply differently to you. For details, contact Unstructured Sales at sales@unstructured.io.

    1. Go to https://platform.unstructured.io and use your email address, Google account, or GitHub account to sign up for an Unstructured account (if you do not already have one) and sign in to the account at the same time. The Unstructured user interface (UI) appears.

    2. Get your Unstructured API key:

      a. In the Unstructured UI, click API Keys on the sidebar.
      b. Click Generate API Key.
      c. Follow the on-screen instructions to finish generating the key.
      d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, return to the API Keys page and click the Copy icon again.

    By following the preceding instructions, you are signed up for a Developer pay-per-page account by default.

    To save money, consider switching to a Subscribe & Save account. To save even more, consider switching to an Enterprise account.

Create and run the demonstration project

1

Get source data into Pinecone

Although you can use any supported file type, or data from any supported source type, as the input into Pinecone, this demonstration uses the text of the United States Constitution in PDF format.

  1. Sign in to your Unstructured account.
  2. Create a source connector, if you do not already have one, to connect Unstructured to the source location where the PDF file is stored.
  3. Create a Pinecone destination connector, if you do not already have one, to connect Unstructured to your Pinecone serverless index.
  4. Create a workflow that references this source connector and destination connector.
  5. Run the workflow.
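
After the workflow finishes running, you can optionally confirm that records landed in your index. A minimal check with the pinecone Python package, assuming a placeholder API key and index name, might look like this:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
    index = pc.Index("vectorshift-demo")  # Placeholder index name.

    # Shows the total record count and per-namespace statistics.
    print(index.describe_index_stats())
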
2

Create the VectorShift project

  1. Sign in to your VectorShift account dashboard.
  2. On the sidebar, click Pipelines.
  3. Click New.
  4. Click Create Pipeline from Scratch.

3

Add the Input node

In this step, you add a node to the pipeline. This node takes user-supplied chat messages and sends them as input to Pinecone, and as input to a text-based LLM, for contextual searching.

In the top pipeline node chooser bar, on the General tab, click Input.

4

Add the Pinecone node

In this step, you add a node that connects to the Pinecone serverless index.

  1. In the top pipeline node chooser bar, on the Integrations tab, click Pinecone.

  2. In the Pinecone node, for Embedding Model, select openai/text-embedding-3-large.

  3. Click Connected Account.

  4. In the Select Pinecone Account dialog, click Connect New.

  5. Enter the API Key and Region for your Pinecone serverless index, and then click Save.

  6. For Index, select the name of your Pinecone serverless index.

  7. Connect the input_1 output from the Input node to the query input in the Pinecone node.

    To make the connection, click and hold your mouse pointer inside the circle next to input_1 in the Input node, drag the pointer into the circle next to query in the Pinecone node, and then release. A line appears between the two circles.
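
Under the hood, this node configuration is roughly equivalent to embedding the incoming chat message and running a similarity search against the index. The following sketch is not part of the tutorial; it only illustrates the idea with the openai and pinecone Python packages. The API keys and index name are placeholders, and the fields available on each match depend on how your Unstructured workflow wrote the records.

    from openai import OpenAI
    from pinecone import Pinecone

    openai_client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
    index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("vectorshift-demo")

    question = "What rights does the Fifth Amendment guarantee?"

    # Embed the chat message with the same model selected in the Pinecone node.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-large",
        input=question,
    ).data[0].embedding

    # Retrieve the most similar records to use as context for the LLM.
    results = index.query(vector=embedding, top_k=5, include_metadata=True)
    for match in results.matches:
        print(match.id, match.score)  # match.metadata holds the stored fields.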

5

Add the OpenAI LLM node

In this step, you add a node that builds a prompt and then sends it to a text-based LLM.

  1. In the top pipeline node chooser bar, on the LLMs tab, click OpenAI.

  2. In the OpenAI LLM node, for System, enter the following text:

    Answer the Question based on Context. Use Memory when relevant.
    

    To answer the question, the preceding prompt uses the context along with general information that the text-based LLM is trained on. To use only the context to answer the question, you can change the prompt, for example to something like this:

    Answer the Question based only on the Context. Do not use any other sources of
    information. If the context does not provide enough information to answer the 
    question, reply with 'I do not have enough context to answer the question.' 
    Use Memory when relevant.
    
  3. For Prompt, enter the following text:

    Question: {{Question}}
    Context: {{Context}}
    Memory: {{Memory}}
    
  4. For Model, select gpt-4o-mini.

  5. Check the box titled Use Personal API Key.

  6. For API Key, enter your OpenAI API key.

  7. Connect the input_1 output from the Input node to the Question input in the OpenAI LLM node.

  8. Connect the output output from the Pinecone node to the Context input in the OpenAI LLM node.
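
For reference, the System and Prompt settings above correspond roughly to the following chat completion call, shown only as an illustration. The API key is a placeholder, and the context and memory values stand in for what the Pinecone node and the Chat Memory node (added in the next step) supply at runtime.

    from openai import OpenAI

    client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

    question = "What rights does the Fifth Amendment guarantee?"
    context = "...text returned by the Pinecone node..."    # Placeholder.
    memory = "...prior turns from the Chat Memory node..."  # Placeholder.

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer the Question based on Context. Use Memory when relevant."},
            {"role": "user",
             "content": f"Question: {question}\nContext: {context}\nMemory: {memory}"},
        ],
    )
    print(response.choices[0].message.content)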

6

Add the Chat Memory node

In this step, you add a node that adds chat memory to the session.

  1. In the top pipeline node chooser bar, on the Chat tab, click Chat Memory.
  2. Connect the output from the Chat Memory node to the Memory input in the OpenAI LLM node.
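
VectorShift manages the chat history for you. Conceptually, the Chat Memory node supplies something like the following simplified sketch, where earlier turns are accumulated and passed to the LLM as the Memory value with each new question:

    # A simplified illustration of chat memory, not VectorShift's implementation.
    memory_turns = []

    def remember(question, answer):
        memory_turns.append(f"User: {question}\nAssistant: {answer}")

    def memory_text():
        return "\n".join(memory_turns)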

7

Add the Output node

In this step, you add a node that displays the chat output.

  1. In the top pipeline node chooser bar, on the General tab, click Output.
  2. Connect the response output from the OpenAI LLM node to the input in the Output node.

8

Run the project

  1. In the upper corner of the pipeline designer, click the play (Run Pipeline) button.

  2. In the chat pane, on the Chatbot tab, enter a question into the Message Assistant box, for example, What rights does the Fifth Amendment guarantee? Then click the send button.

  3. Wait until the answer appears.

  4. Ask as many additional questions as you want to.

Learn more

See the VectorShift documentation.