Prerequisites
- A Pinecone account. Get an account.
- A Pinecone API key. Get an API key.
- A Pinecone serverless index. Create a serverless index. (For a programmatic way to create the index, see the sketch after this list.) An existing index is not required. At runtime, the index behavior is as follows:
For the Unstructured UI and Unstructured API:
- If an existing index name is specified, and Unstructured generates embeddings, but the number of dimensions that are generated does not match the existing index’s embedding settings, the run will fail. You must change your Unstructured embedding settings or your existing index’s embedding settings to match, and try the run again.
- If an index name is not specified, Unstructured creates a new index in your Pinecone account. If Unstructured generates embeddings, the new index’s name will be u<short-workflow-id>-<short-embedding-model-name>-<number-of-dimensions>. If Unstructured does not generate embeddings, the new index’s name will be u<short-workflow-id>.
For Unstructured Ingest:
- If an existing index name is specified, and Unstructured generates embeddings, but the number of dimensions that are generated does not match the existing index’s embedding settings, the run will fail. You must change your Unstructured embedding settings or your existing index’s embedding settings to match, and try the run again.
- If an index name is not specified, Unstructured creates a new index in your Pinecone account. The new index’s name will be unstructuredautocreated.
Whether you create a new index or use an existing one, Unstructured recommends that all records in the target index have a field named record_id with a string data type. Unstructured can use this field to do intelligent document overwrites. Without this field, duplicate documents might be written to the index or, in some cases, the operation could fail altogether.
- Within a Pinecone serverless index, custom namespaces are supported but are not required.
- Sign up for an OpenAI account, and get your OpenAI API key.
- Sign up for a VectorShift Starter account.
- If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account at https://platform.unstructured.io. To sign up for a Team or Enterprise account instead, contact Unstructured Sales, or learn more.
- If you have an Unstructured Starter or Team account and are not already signed in, sign in to your account at https://platform.unstructured.io. For an Enterprise account, see your Unstructured account administrator for instructions, or email Unstructured Support at support@unstructured.io.
- Get your Unstructured API key:
a. After you sign in to your Unstructured Starter account, click API Keys on the sidebar.
b. Click Generate API Key. For a Team or Enterprise account, before you click API Keys, make sure you have selected the organizational workspace you want to create an API key for. Each API key works with one and only one organizational workspace. Learn more.
c. Follow the on-screen instructions to finish generating the key.
d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
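If you prefer to create the serverless index programmatically instead of through the Pinecone console, the following minimal sketch uses the Pinecone Python client. The API key, index name, cloud, and region values are placeholders, and the dimension of 3072 assumes the openai/text-embedding-3-large embedding model used later in this demonstration; adjust these to match your own setup.

```python
from pinecone import Pinecone, ServerlessSpec

# Placeholder values; replace with your own API key, index name, and region.
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

pc.create_index(
    name="my-demo-index",              # hypothetical index name
    dimension=3072,                    # matches openai/text-embedding-3-large
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```

Pinecone metadata is schemaless, so no extra index configuration is needed for the record_id field mentioned earlier; it is simply written as metadata on each record.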
Create and run the demonstration project
1
Get source data into Pinecone
Although you can use any supported file type or data in any supported source type for the input into Pinecone, this demonstration uses the text of the United States Constitution in PDF format.
- If you are not already signed in, sign in to your Unstructured account.
- Create a source connector, if you do not already have one, to connect Unstructured to the source location where the PDF file is stored.
- Create a Pinecone destination connector, if you do not already have one, to connect Unstructured to your Pinecone serverless index.
- Create a workflow that references this source connector and destination connector.
- Run the workflow.
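After the workflow run finishes, one optional way to spot-check that records reached the index is to ask Pinecone for its index statistics. This sketch uses the Pinecone Python client; the API key and index name are placeholders.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("my-demo-index")  # hypothetical index name

# Report how many vectors the workflow wrote, per namespace and in total.
stats = index.describe_index_stats()
print(stats)
```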
2
Create the VectorShift project
- Sign in to your VectorShift account dashboard.
- On the sidebar, click Pipelines.
- Click New.
- Click Create Pipeline from Scratch.
3
Add the Input node
In this step, you add a node to the pipeline. This node takes user-supplied chat messages and sends them as input to Pinecone, and as input to a text-based LLM, for contextual searching.
In the top pipeline node chooser bar, on the General tab, click Input.

4
Add the Pinecone node
In this step, you add a node that connects to the Pinecone serverless index.
- In the top pipeline node chooser bar, on the Integrations tab, click Pinecone.
- In the Pinecone node, for Embedding Model, select openai/text-embedding-3-large.
- Click Connected Account.
- In the Select Pinecone Account dialog, click Connect New.
- Enter the API Key and Region for your Pinecone serverless index, and then click Save.
- For Index, select the name of your Pinecone serverless index.
- Connect the input_1 output from the Input node to the query input in the Pinecone node. To make the connection, click and hold your mouse pointer inside the circle next to input_1 in the Input node. While holding, drag the pointer over to the circle next to query in the Pinecone node, and then release. A line appears between these two circles.
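Under the hood, the Pinecone node embeds the incoming message with the selected embedding model and runs a similarity search against your index. The sketch below approximates that behavior outside of VectorShift with the OpenAI and Pinecone Python clients; the index name, top_k value, and the text metadata key are assumptions, not VectorShift settings.

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()            # reads OPENAI_API_KEY
pc = Pinecone()                     # reads PINECONE_API_KEY
index = pc.Index("my-demo-index")   # hypothetical index name

question = "What rights does the fifth amendment guarantee?"

# Embed the query with the same model selected in the Pinecone node.
embedding = openai_client.embeddings.create(
    model="text-embedding-3-large",
    input=question,
).data[0].embedding

# Retrieve the closest chunks to use as context for the LLM.
results = index.query(vector=embedding, top_k=5, include_metadata=True)
for match in results.matches:
    # The metadata key that holds the chunk text depends on how the
    # records were written; "text" is an assumption here.
    print(match.metadata.get("text", ""))
```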
5
Add the OpenAI LLM node
In this step, you add a node that builds a prompt and then sends it to a text-based LLM.
- In the top pipeline node chooser bar, on the LLMs tab, click OpenAI.
- In the OpenAI LLM node, for System, enter the following text:
To answer the question, the preceding prompt uses the context along with general information that the text-based LLM is trained on. To use only the context to answer the question, you can change the prompt, for example to something like this:
- For Prompt, enter the following text:
- For Model, select gpt-4o-mini.
- Check the box titled Use Personal API Key.
- For API Key, enter your OpenAI API key.
- Connect the input_1 output from the Input node to the Question input in the OpenAI LLM node.
- Connect the output output from the Pinecone node to the Context input in the OpenAI LLM node.
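For readers who want to see the equivalent API call, the OpenAI LLM node essentially fills the System and Prompt templates with the question and the retrieved context, then sends the result to gpt-4o-mini. The following sketch uses the OpenAI Python client; the system and user templates here are placeholders, not the exact text from this tutorial.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY


def answer(question: str, context: str) -> str:
    # Placeholder templates; substitute whatever you entered in the
    # System and Prompt fields of the OpenAI LLM node.
    system = "Answer the question using the provided context."
    user = f"Question: {question}\n\nContext: {context}"

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content
```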
6
Add the Chat Memory node
In this step, you add a node that adds chat memory to the session.
- In the top pipeline node chooser bar, on the Chat tab, click Chat Memory.
- Connect the output from the Chat Memory node to the Memory input in the OpenAI LLM node.
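The Chat Memory node replays earlier turns of the conversation to the LLM so that follow-up questions can refer back to previous answers. The sketch below illustrates the idea with a plain message list; it models the concept only, not VectorShift's internal implementation.

```python
# Accumulated conversation turns, oldest first.
history: list[dict[str, str]] = []

def remember(question: str, answer: str) -> None:
    # Store each turn so it can be replayed as context on the next request.
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})

def messages_with_memory(system: str, user: str) -> list[dict[str, str]]:
    # Prior turns are inserted between the system prompt and the new question.
    return [{"role": "system", "content": system}] + history + [{"role": "user", "content": user}]
```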
7
Add the Output node
In this step, you add a node that displays the chat output.
- In the top pipeline node chooser bar, on the General tab, click Output.
- Connect the response output from the OpenAI LLM node to the input in the Output node.
8
Run the project
- In the upper corner of the pipeline designer, click the play (Run Pipeline) button.
- In the chat pane, on the Chatbot tab, enter a question into the Message Assistant box, for example: What rights does the fifth amendment guarantee? Then press the send button.
- Wait until the answer appears.
- Ask as many additional questions as you want to.