Streamlit is an open-source Python framework for data scientists and AI/ML engineers to deliver dynamic data apps with only a few lines of code.

Streamlit in Snowflake enables data scientists and Python developers to combine Streamlit’s component-rich, open-source Python library with the scale, performance and security of the Snowflake platform. Streamlit Python scripts can define user interface (UI) components such as filters, graphs, sliders, and more to interact with your data.

In this example, you use Snowsight in your Snowflake account to create a simple Streamlit app that uses Snowflake Cortex Search for retrieval-augmented generation (RAG) to answer natural-language questions about an existing table in your Snowflake account. This table contains data that was generated by Unstructured. Answers are returned in a natural-language, chatbot-style format.

Prerequisites

  • A table in Snowflake that contains data that was generated by Unstructured. The target Snowflake table must have a column named EMBEDDINGS that contains vector embeddings for the text in the table’s TEXT column. The following Streamlit example app assumes that each value in the EMBEDDINGS column is a 1,024-dimensional vector embedding and that the column has a data type of VECTOR(FLOAT, 1024).

    To create this table, you can create a custom Unstructured workflow that uses any supported source connector along with the Snowflake destination connector. Then run the workflow to generate the data and insert it into the target Snowflake table.

    After the data is inserted into the target Snowflake table, you can run the following Snowflake SQL statement to generate a 1,024-dimensional vector embedding for the text in each row’s TEXT column and store it in that row’s EMBEDDINGS column. The statement assumes that the table is named ELEMENTS, as the Streamlit example app does. The model specified here for generating the vector embeddings is the same one that is used by the Streamlit example app:

    UPDATE ELEMENTS
    SET EMBEDDINGS = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
        'snowflake-arctic-embed-l-v2.0', 
        TEXT
    );
    

    To learn how to run Snowflake SQL statements, see, for example, Querying data using worksheets.

  • You must have the appropriate privileges to create and use a Streamlit app in your Snowflake account. These privileges include ones for the target table’s parent database and schema as well as the Snowflake warehouse that runs the Streamlit app. For details, see Getting started with Streamlit in Snowflake.

Create and run the example app

1

Create the Streamlit app

  1. In Snowsight for your Snowflake account, on the sidebar, click Projects > Streamlit.
  2. Click + Streamlit App.
  3. For App title, enter a name for your app, such as Unstructured Demo Streamlit App.
  4. For App location, choose the target database and schema to store the app in.
  5. For App warehouse, choose the warehouse that you want to use to run your app and execute its queries.
  6. Click Create.
2

Add code to the Streamlit app

In this step, you add Python code to the Streamlit app that you created in the previous step.

This step explains each part of the code as you add it. If you want to skip past these explanations, add the code in the complete code example all at once, and then skip ahead to the next step, “Run the Streamlit app.”

  1. Import the Python dependencies: get_active_session, which gets the current connection to the Snowflake database and schema, and the Streamlit library, which provides the UI functions and features.

    from snowflake.snowpark.context import get_active_session
    import streamlit as st
    
  2. Get the current connection to the Snowflake database and schema.

    session = get_active_session() 
    
  3. Display the title of the app in the Streamlit UI, and get the user’s search query from the Streamlit UI.

    st.title("Snowflake Cortex Search for RAG with Data from Unstructured")
    
    query = st.text_input("Enter your search query:")
    
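  4. Check whether the user has entered a search query and, if so, display a progress indicator in the UI while the app works. The code in the remaining steps is indented so that it runs inside this block.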

    if query:
        with st.spinner("Embedding and retrieving..."):
    
  5. Use the user’s search query to get the top result from the ELEMENTS table, which contains the data that was generated by Unstructured. The code uses the SNOWFLAKE.CORTEX.EMBED_TEXT_1024 function to generate a vector embedding for the user’s search query, and the VECTOR_COSINE_SIMILARITY function to compute the similarity between that embedding and the stored embedding of the TEXT column for each row in the ELEMENTS table. The code then orders the rows by similarity and keeps only the row with the greatest similarity to the search query. The search query is passed as a bind parameter (?) instead of being interpolated into the SQL string, so that a query that contains a single quote does not break the statement.

            top_result_df = session.sql("""
                WITH query_embedding AS (
                    SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
                        'snowflake-arctic-embed-l-v2.0', ?
                    ) AS EMBED
                )
                SELECT
                    e.TEXT,
                    VECTOR_COSINE_SIMILARITY(e.EMBEDDINGS, q.EMBED) AS similarity
                FROM ELEMENTS e, query_embedding q
                ORDER BY similarity DESC
                LIMIT 1
            """, params=[query]).to_pandas()
    
  6. Get the TEXT column from the top result and use it as context for the user’s search query.

            context = top_result_df["TEXT"][0]
    
  7. Use the user’s search query and the context from the top result to get a response from Snowflake Cortex. The code uses the SNOWFLAKE.CORTEX.COMPLETE function to generate a response to the user’s search query based on the context from the top result. The context and the search query are passed as bind parameters (?), so that quotes or dollar signs in either value do not break the statement.

            completion_df = session.sql("""
                SELECT SNOWFLAKE.CORTEX.COMPLETE(
                    'snowflake-arctic',
                    CONCAT('Context: ', ?, ' \\n\\nQuestion: ', ?, '\\nAnswer:')
                ) AS RESPONSE
            """, params=[context, query]).to_pandas()
    
  8. Display the generated response in the Streamlit UI.

            st.write("Answer:")
            st.write(completion_df["RESPONSE"][0])
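
To make the ranking in step 5 more concrete, the following is a minimal sketch in plain Python of what ordering by cosine similarity and keeping the top row amounts to. It is an illustration only and not how Snowflake implements VECTOR_COSINE_SIMILARITY. The helper names cosine_similarity and top_match are hypothetical, and the three-dimensional vectors and sample texts are made-up stand-ins for real 1,024-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_match(query_embedding, rows):
    """Return the (text, similarity) pair with the highest similarity.

    This mirrors ORDER BY similarity DESC LIMIT 1 in the SQL query.
    `rows` is a list of (text, embedding) pairs.
    """
    scored = [(text, cosine_similarity(emb, query_embedding)) for text, emb in rows]
    return max(scored, key=lambda pair: pair[1])


# Toy data (hypothetical): real embeddings have 1,024 dimensions.
rows = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on holidays.", [0.0, 0.2, 0.9]),
]
query_embedding = [1.0, 0.0, 0.0]

best_text, best_score = top_match(query_embedding, rows)
print(best_text)
```

In this toy data the first row points in nearly the same direction as the query vector, so it is the one that would be used as context in step 6.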
    
3

Run the Streamlit app

  1. In the upper right corner, click Run.
  2. For Enter your search query, enter a natural-language question about the text in the table’s TEXT column.
  3. Press Enter.

Snowflake Cortex Search for RAG returns its answer to your question in a natural-language, chatbot-style format.

Complete code example

The full code example for the Streamlit app is as follows:

from snowflake.snowpark.context import get_active_session
import streamlit as st

# Get the current connection to the Snowflake database and schema.
session = get_active_session()

st.title("Snowflake Cortex Search for RAG with Data from Unstructured")

query = st.text_input("Enter your search query:")

if query:
    with st.spinner("Embedding and retrieving..."):

        # Embed the search query and find the most similar row in ELEMENTS.
        # The query is passed as a bind parameter (?) so that quotes in the
        # user's input cannot break the SQL statement.
        top_result_df = session.sql("""
            WITH query_embedding AS (
                SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
                    'snowflake-arctic-embed-l-v2.0', ?
                ) AS EMBED
            )
            SELECT
                e.TEXT,
                VECTOR_COSINE_SIMILARITY(e.EMBEDDINGS, q.EMBED) AS similarity
            FROM ELEMENTS e, query_embedding q
            ORDER BY similarity DESC
            LIMIT 1
        """, params=[query]).to_pandas()

        # Use the text of the best-matching row as context.
        context = top_result_df["TEXT"][0]

        # Ask the model to answer the question based on that context.
        completion_df = session.sql("""
            SELECT SNOWFLAKE.CORTEX.COMPLETE(
                'snowflake-arctic',
                CONCAT('Context: ', ?, ' \\n\\nQuestion: ', ?, '\\nAnswer:')
            ) AS RESPONSE
        """, params=[context, query]).to_pandas()

        st.write("Answer:")
        st.write(completion_df["RESPONSE"][0])
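
As a possible variation on the example above, the prompt can be assembled in Python instead of with CONCAT in SQL, which makes it easy to inspect or unit-test before it is sent to the model. The following is a minimal sketch under stated assumptions: the helper names build_prompt and complete_request are hypothetical, the sample context and question are made up, and the approach assumes that the Snowpark session.sql method accepts a params list for ? placeholders.

```python
# Hypothetical constant: one bind placeholder (?) carries the whole prompt.
COMPLETE_SQL = "SELECT SNOWFLAKE.CORTEX.COMPLETE('snowflake-arctic', ?) AS RESPONSE"


def build_prompt(context: str, question: str) -> str:
    """Assemble a Context/Question/Answer prompt shaped like the one in the example app."""
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"


def complete_request(context: str, question: str):
    """Return (sql, params), intended to be passed as session.sql(sql, params=params)."""
    return COMPLETE_SQL, [build_prompt(context, question)]


# Made-up sample values; note that the apostrophe needs no escaping here,
# because the prompt travels as a bind parameter and not as SQL text.
sql, params = complete_request(
    "Invoices are due within 30 days.",
    "When's the invoice due?",
)
print(params[0])
```

Inside the app, the returned pair would be used as session.sql(sql, params=params).to_pandas(), in place of the CONCAT-based statement.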

Additional resources