# Streamlit in Snowflake

[Streamlit](https://streamlit.io/) is an open-source Python framework for data scientists and AI/ML engineers to
deliver dynamic data apps with only a few lines of code.

[Streamlit in Snowflake](https://www.snowflake.com/en/product/features/streamlit-in-snowflake/) enables data scientists
and Python developers to combine Streamlit's component-rich, open-source Python library with the scale, performance and
security of the Snowflake platform. Streamlit Python scripts can define user interface (UI) components such as
filters, graphs, sliders, and more to interact with your data.

In this example, you use Snowsight in your Snowflake account to create a simple Streamlit app that uses
[Snowflake Cortex Search for RAG](https://docs.snowflake.com/user-guide/snowflake-cortex/cortex-search/cortex-search-overview)
to ask natural-language questions about an existing table in your Snowflake account. This table
contains data that was generated by Unstructured. Answers are returned in a natural-language,
chatbot-style format.

## Prerequisites

* A table in Snowflake that contains data that was generated by Unstructured. The
  target Snowflake table must have a column named `EMBEDDINGS` that contains vector embeddings for the text in the table's `TEXT` column.
  The following Streamlit example app assumes that the `EMBEDDINGS` column contains 1,024-dimensional vector embeddings and has a data type of `VECTOR(FLOAT, 1024)`.

  To create this table, you can [create a custom Unstructured workflow](/pipelines/workflows#create-a-custom-workflow)
  that uses any supported [source connector](/pipelines/sources/overview) along with the
  [Snowflake destination connector](/pipelines/destinations/snowflake). Then
  [run](/pipelines/workflows#edit%2C-delete%2C-or-run-a-workflow) the workflow to
  generate the data and then insert that generated data into the target Snowflake table.

  After the data is inserted into the target Snowflake table, you can run the following Snowflake SQL statement to
  generate 1,024-dimensional vector embeddings for the text in the table's `TEXT` column and then insert those generated vector
  embeddings into the table's `EMBEDDINGS` column. The model specified here for generating the vector embeddings is the
  same one that is used by the Streamlit example app:

  ```sql theme={null}
  UPDATE ELEMENTS
  SET EMBEDDINGS = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
      'snowflake-arctic-embed-l-v2.0', 
      TEXT
  );
  ```

  To learn how to run Snowflake SQL statements, see, for example,
  [Querying data using worksheets](https://docs.snowflake.com/user-guide/ui-snowsight-query).

* You must have the appropriate privileges to create and use a Streamlit app in your Snowflake account. These
  privileges include ones for the target table's parent database and schema as well as the Snowflake warehouse that
  runs the Streamlit app. For details, see
  [Getting started with Streamlit in Snowflake](https://docs.snowflake.com/developer-guide/streamlit/getting-started).

## Create and run the example app

<Steps>
  <Step title="Create the Streamlit app">
    1. In Snowsight for your Snowflake account, on the sidebar, click **Projects > Streamlit**.
    2. Click **+ Streamlit App**.
    3. For **App title**, enter a name for your app, such as `Unstructured Demo Streamlit App`.
    4. For **App location**, choose the target database and schema to store the app in.
    5. For **App warehouse**, choose the warehouse that you want to use to run your app and execute its queries.
    6. Click **Create**.
  </Step>

  <Step title="Add code to the Streamlit app">
    In this step, you add Python code to the Streamlit app that you created in the previous step.

    This step explains each part of the code as you add it. If you want to skip past these explanations, add the
    code in the [complete code example](#complete-code-example) all at once, and then skip ahead to
    the next step, "Run the Streamlit app."

    1. Import Python dependencies that get the current connection to the Snowflake database and schema and get Streamlit functions and features.

       ```python theme={null}
       from snowflake.snowpark.context import get_active_session
       import streamlit as st
       ```

    2. Get the current connection to the Snowflake database and schema.

       ```python theme={null}
       session = get_active_session() 
       ```

    3. Display the title of the app in the Streamlit UI, and get the user's search query from the Streamlit UI.

       ```python theme={null}
       st.title("Snowflake Cortex Search for RAG with Data from Unstructured")

       query = st.text_input("Enter your search query:")
       ```

    4. Check whether the user has entered a search query and, if so, display a progress indicator in the UI while the query is processed.

       ```python theme={null}
       if query:
           with st.spinner("Embedding and retrieving..."):
       ```

    5. Use the user's search query to get the top result from the `ELEMENTS` table.
       The `ELEMENTS` table contains the data that was generated by Unstructured. The code uses the
       `SNOWFLAKE.CORTEX.EMBED_TEXT_1024` function to generate vector embeddings for the user's search query and the `VECTOR_COSINE_SIMILARITY`
       function to get the similarity between the vector embeddings for the user's search query and the vector embeddings in the `EMBEDDINGS` column
       for each row in the `ELEMENTS` table. The code then orders the results by similarity, highest first, and limits the results to the single row with the greatest similarity
       between the search query and the target text.

       ```python theme={null}
               top_result_df = session.sql(f"""
                   WITH query_embedding AS (
                       SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
                           'snowflake-arctic-embed-l-v2.0', '{query}'
                       ) AS EMBED
                   )
                   SELECT 
                       e.TEXT,
                       VECTOR_COSINE_SIMILARITY(e.EMBEDDINGS, q.EMBED) AS similarity
                   FROM ELEMENTS e, query_embedding q
                   ORDER BY similarity DESC
                   LIMIT 1
               """).to_pandas()
       ```

    6. Get the `TEXT` column from the top result and use it as context for the user's search query.

       ```python theme={null}
               context = top_result_df["TEXT"][0]
       ```

    7. Use the user's search query and the context from the top result to get a response from Snowflake Cortex Search for RAG.
       The code uses the `SNOWFLAKE.CORTEX.COMPLETE` function to generate a response to the user's search query based on the context from the top result.

       ```python theme={null}
               completion_df = session.sql(f"""
                   SELECT SNOWFLAKE.CORTEX.COMPLETE(
                       'snowflake-arctic',
                       CONCAT('Context: ', $$ {context} $$, ' \\n\\nQuestion: {query}\\nAnswer:')
                   ) AS RESPONSE
               """).to_pandas()
       ```

    8. Display the generated response in the Streamlit UI.

       ```python theme={null}
               st.write("Answer:")
               st.write(completion_df["RESPONSE"][0])
       ```
  </Step>

  <Step title="Run the Streamlit app">
    1. In the upper right corner, click **Run**.
    2. For **Enter your search query**, enter a natural-language question about the `TEXT` column in the table.
    3. Press **Enter**.

    Snowflake Cortex Search for RAG returns its answer to your question in a natural-language, chatbot-style format.
  </Step>
</Steps>
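
To make the retrieval step more concrete, the following is a minimal pure-Python sketch of what ordering by `VECTOR_COSINE_SIMILARITY` and applying `LIMIT 1` computes. It is an illustration only and not how Snowflake implements the function. The helper names `cosine_similarity` and `top_match`, the sample rows, and the 3-dimensional vectors are all hypothetical; the real `EMBEDDINGS` column holds 1,024-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_match(query_vec, rows):
    """Return (text, similarity) for the row most similar to query_vec.

    This mirrors ORDER BY similarity DESC LIMIT 1 in the SQL query.
    """
    scored = [(text, cosine_similarity(vec, query_vec)) for text, vec in rows]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical rows: (TEXT, EMBEDDINGS) pairs with toy 3-dimensional vectors.
rows = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office closes at 6 PM.", [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of the user's search query.
query_vec = [1.0, 0.0, 0.0]

best_text, best_score = top_match(query_vec, rows)
print(best_text, round(best_score, 3))
```

Because cosine similarity compares direction rather than magnitude, the row whose vector points most nearly the same way as the query vector is chosen, regardless of the lengths of the vectors.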

## Complete code example

The full code example for the Streamlit app is as follows:

```python theme={null}
from snowflake.snowpark.context import get_active_session
import streamlit as st

session = get_active_session()

st.title("Snowflake Cortex Search for RAG with Data from Unstructured")

query = st.text_input("Enter your search query:")

if query:
    with st.spinner("Embedding and retrieving..."):

        top_result_df = session.sql(f"""
            WITH query_embedding AS (
                SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
                    'snowflake-arctic-embed-l-v2.0', '{query}'
                ) AS EMBED
            )
            SELECT 
                e.TEXT,
                VECTOR_COSINE_SIMILARITY(e.EMBEDDINGS, q.EMBED) AS similarity
            FROM ELEMENTS e, query_embedding q
            ORDER BY similarity DESC
            LIMIT 1
        """).to_pandas()

        context = top_result_df["TEXT"][0]

        completion_df = session.sql(f"""
            SELECT SNOWFLAKE.CORTEX.COMPLETE(
                'snowflake-arctic',
                CONCAT('Context: ', $$ {context} $$, ' \\n\\nQuestion: {query}\\nAnswer:')
            ) AS RESPONSE
        """).to_pandas()

        st.write("Answer:")
        st.write(completion_df["RESPONSE"][0])
```
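
The example interpolates the user's search query directly into a single-quoted SQL string, so a question that contains an apostrophe (for example, `What's the refund policy?`) would end the string early and cause the statement to fail. The following is a minimal sketch of one way to guard against that, assuming Snowflake's escaping rules for single-quoted string constants (a doubled single quote for a quote, and a doubled backslash for a backslash). The helper names `escape_sql_literal` and `build_embedding_sql` are hypothetical and are not part of Snowpark or Streamlit.

```python
def escape_sql_literal(value: str) -> str:
    """Escape a string for use inside a single-quoted SQL string constant.

    Backslashes are doubled first, and then single quotes are doubled.
    """
    return value.replace("\\", "\\\\").replace("'", "''")


def build_embedding_sql(query: str) -> str:
    """Build the query-embedding SELECT with the user's text safely quoted."""
    safe_query = escape_sql_literal(query)
    return f"""
        SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
            'snowflake-arctic-embed-l-v2.0', '{safe_query}'
        ) AS EMBED
    """


# Example: the apostrophe in the question is doubled in the generated SQL.
print(build_embedding_sql("What's the refund policy?"))
```

In the app, the same escaping could be applied to `query` before it is placed into either SQL statement. Recent Snowpark versions may also accept bind variables through a `params` argument on `session.sql`; check the Snowpark version available in your Streamlit in Snowflake environment before relying on it. Note also that the retrieved `context` is wrapped in `$$` delimiters, so text that itself contains `$$` would need separate handling.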

## Additional resources

* [Streamlit in Snowflake documentation](https://docs.snowflake.com/developer-guide/streamlit/about-streamlit)
* [Create and deploy Streamlit apps using Snowsight](https://docs.snowflake.com/developer-guide/streamlit/create-streamlit-ui)
* [Snowflake Solutions Developer Center for Streamlit](https://www.snowflake.com/en/developers/solutions-center/?tags=technology%2Fstreamlit)
* [Streamlit documentation](https://docs.streamlit.io/)
