If you’re new to Unstructured, read this note first. Before you can create a destination connector, you must first sign in to your Unstructured account. After you sign in, the Unstructured user interface (UI) appears, which you use to get your Unstructured API key.
  1. After you sign in to your Unstructured Starter account, click API Keys on the sidebar.
    For a Team or Enterprise account, before you click API Keys, make sure you have selected the organizational workspace you want to create an API key for. Each API key works with one and only one organizational workspace. Learn more.
  2. Click Generate API Key.
  3. Follow the on-screen instructions to finish generating the key.
  4. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
After you create the destination connector, add it along with a source connector to a workflow. Then run the workflow as a job. To learn how, try out the hands-on Workflow Endpoint quickstart, go directly to the quickstart notebook, or watch the two 4-minute video tutorials for the Unstructured Python SDK. You can also create destination connectors with the Unstructured user interface (UI). Learn how. If you need help, email Unstructured Support at support@unstructured.io. You are now ready to start creating a destination connector! Keep reading to learn how.
Send processed data from Unstructured to Snowflake. The requirements are as follows.
  • A Snowflake account and its account identifier. To get the identifier for the current Snowflake account:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click your username, and then click Account > View account details.
    3. On the Account tab, note the value of the Account Identifier field.
    Alternatively, the following Snowflake query returns the current account’s identifier:
    SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() AS "Account Identifier"
    
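The identifier that this query returns is simply the organization name and account name joined by a hyphen. For reference, the same formatting expressed in Python (the function and sample names are illustrative, not part of any Snowflake or Unstructured API):

```python
def account_identifier(org_name: str, account_name: str) -> str:
    """Build a Snowflake account identifier in <org>-<account> form,
    mirroring CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME()."""
    return f"{org_name}-{account_name}"

print(account_identifier("MYORG", "MYACCOUNT"))  # MYORG-MYACCOUNT
```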
  • A Snowflake user, which can be a service user (recommended) or a human user. To create a service user (recommended):
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Projects > Worksheets.
    3. Click the + button to create a SQL worksheet.
    4. In the worksheet, enter the following Snowflake query to create a service user, replacing the following placeholders:
      • Replace <service-user-name> with some name for the service user.
      • Replace <default-role-name> with the name of any default role for the service user to use.
      CREATE USER <service-user-name>
      DEFAULT_ROLE = "<default-role-name>"
      TYPE = SERVICE
      
    5. Click the arrow icon to run the worksheet, which creates the service user.
    To create a human user:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Users & roles.
    3. Click the Users tab.
    4. Click + User.
    5. Follow the on-screen guidance to specify the user’s settings.
    6. Click Create User.
  • The Snowflake user’s login name (not username) in the account, and a programmatic access token (PAT) for the Snowflake user. To view the login name for a user:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Users & Roles.
    3. On the Users tab, in the list of available users, click the name of the target user.
    4. In the About tile, note the Login Name for the user.
    Alternatively, the following Snowflake query returns information about the user with the username of <my-user>, including their login_name value representing their login name:
    SHOW USERS LIKE '<my-user>';
    
    To create a programmatic access token (PAT) for a user:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Users & Roles.
    3. On the Users tab, in the list of available users, click the name of the target user.
    4. In the Programmatic access tokens tile, click the Generate new token button.
    5. Follow the on-screen guidance to specify the PAT’s settings.
      You must set an expiration date for the PAT. This date can be as soon as one day after the PAT is created, or as far out as a year or more. Once the PAT expires, the connector stops working. To keep your connector working, follow this procedure again to generate a new PAT before the current one expires, and then update your connector’s settings with the new PAT’s value. Unstructured does not notify you when a PAT is about to expire or has already expired. You are responsible for tracking your PATs’ expiration dates and taking corrective action before they expire.
    6. Click Generate.
    7. Copy the generated PAT’s value to a secure location, as you will not be able to access it again. If you lose this PAT’s value, you will need to repeat this procedure to generate a new, replacement one.
    The PAT will not work unless the Snowflake account also has a valid network rule along with a valid network policy attached to that rule. The network policy must also be activated on the Snowflake account to take effect. To create a valid network rule:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Security > Network Rules.
    3. Click + Network Rule.
    4. Enter some name for the network rule.
    5. For Type, select IPv4.
    6. For Mode, select Ingress.
    7. For Identifiers, next to the magnifying glass icon, enter 0.0.0.0/0, and then press Enter.
      The 0.0.0.0/0 value allows all IP addresses to access the Snowflake account. You can specify a narrower IP address range if you prefer; however, that range will apply to all users, including the user for which you created the PAT.
    8. Click Create Network Rule.
    To create a valid network policy and attach the preceding network rule to it at the same time:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Security > Network Policies.
    3. Click + Network Policy.
    4. Enter some name for the network policy.
    5. Make sure Allowed is selected.
    6. In the Select rule drop-down list, select the preceding network rule to attach to this network policy.
    7. Click Create Network Policy.
    To activate the network policy on the account:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Security > Network Policies.
    3. Click the name of the preceding network policy to activate.
    4. In the policy’s side panel, click the ellipsis (three dots) icon, and then click Activate On Account.
    5. Click Activate policy.
  • (No longer recommended, as passwords are being deprecated by Snowflake—use PATs instead) The Snowflake user’s login name (not username) and the user’s password in the account. This user must be a human user. Passwords are not supported for service users. To view the login name for a user:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Users & Roles.
    3. On the Users tab, in the list of available users, click the name of the target user.
    4. In the About tile, note the Login Name for the user.
    Alternatively, the following Snowflake query returns information about the user with the username of <my-user>, including their login_name value representing their login name:
    SHOW USERS LIKE '<my-user>';
    
  • The name of the Snowflake role that the user belongs to and that also has sufficient access to the Snowflake database, schema, table, and host.
    • To create a database in Snowflake, the role needs to be granted CREATE DATABASE privilege at the current account level; and USAGE privilege on the warehouse that is used to create the database.
    • To create a schema in a database in Snowflake, the role needs to be granted USAGE privilege on the database and the warehouse that is used to create the schema; and CREATE SCHEMA on the database.
    • To create a table in a schema in Snowflake, the role needs to be granted USAGE privilege on the database and schema and the warehouse that is used to create the table; and CREATE TABLE on the schema.
    • To write to a table in Snowflake, the role needs to be granted USAGE privilege on the database and schema and the warehouse that is used to write to the table; and INSERT on the table.
    • To read from a table in Snowflake, the role needs to be granted USAGE privilege on the database and schema and the warehouse that is used to read from the table; and SELECT on the table.
    To view a list of available roles in the current Snowflake account:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Users & Roles.
    3. Click the Roles tab.
    Alternatively, the following Snowflake query returns a list of available roles in the current account:
    SHOW ROLES;
    
    Grant privileges to a role. Learn more.
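The privilege requirements above translate directly into GRANT statements. The following sketch generates the statements for the write-to-table case described earlier; the helper function and the role and object names are illustrative:

```python
def write_grants(role: str, database: str, schema: str,
                 table: str, warehouse: str) -> list[str]:
    """Build the GRANT statements a role needs to write to a table:
    USAGE on the warehouse, database, and schema, plus INSERT on the table."""
    fq_table = f"{database}.{schema}.{table}"
    return [
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role}",
        f"GRANT INSERT ON TABLE {fq_table} TO ROLE {role}",
    ]

for stmt in write_grants("MY_ROLE", "MY_DB", "MY_SCHEMA", "ELEMENTS", "MY_WH"):
    print(stmt + ";")
```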
  • The Snowflake warehouse’s hostname and its port number in the account. To view a list of available warehouses in the current Snowflake account:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Admin > Warehouses.
    This view does not provide the warehouses’ hostnames or port numbers. To get this information, you must run a Snowflake query.
    The following Snowflake query returns a list of available warehouse types, hostnames, and port numbers in the current account. Look for the row with a type of SNOWFLAKE_DEPLOYMENT:
    SELECT t.VALUE:type::VARCHAR as type,
           t.VALUE:host::VARCHAR as host,
           t.VALUE:port as port
    FROM TABLE(FLATTEN(input => PARSE_JSON(SYSTEM$ALLOWLIST()))) AS t;
    
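If you export the output of SYSTEM$ALLOWLIST() as JSON, you can also extract the SNOWFLAKE_DEPLOYMENT host and port programmatically. A minimal sketch (the sample JSON string is illustrative, abbreviated from real allowlist output):

```python
import json

def deployment_host_port(allowlist_json: str) -> tuple[str, int]:
    """Return (host, port) for the SNOWFLAKE_DEPLOYMENT entry
    in SYSTEM$ALLOWLIST() output."""
    for entry in json.loads(allowlist_json):
        if entry.get("type") == "SNOWFLAKE_DEPLOYMENT":
            return entry["host"], int(entry["port"])
    raise ValueError("No SNOWFLAKE_DEPLOYMENT entry found")

sample = ('[{"type": "SNOWFLAKE_DEPLOYMENT", '
          '"host": "myorg-myaccount.snowflakecomputing.com", "port": 443}]')
print(deployment_host_port(sample))
# ('myorg-myaccount.snowflakecomputing.com', 443)
```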
  • The name of the Snowflake database in the account. To view a list of available databases in the current Snowflake account:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Data > Databases.
    Alternatively, the following Snowflake query returns a list of available databases in the current account:
    SHOW DATABASES;
    
  • The name of the schema in the database. To view a list of available schemas for a database in the current Snowflake account:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Data > Databases.
    3. Expand the name of the target database.
    Alternatively, the following Snowflake query returns a list of available schemas in the current account:
    SHOW SCHEMAS;
    
    The following Snowflake query returns a list of available schemas for the database named <database_name> in the current account:
    SHOW SCHEMAS IN DATABASE <database_name>;
    
  • The name of the table in the schema. To view a list of available tables for a schema in a database in the current Snowflake account:
    1. Log in to Snowsight with your Snowflake account.
    2. In Snowsight, on the navigation menu, click Data > Databases.
    3. Expand the name of the database that contains the target schema.
    4. Expand the name of the target schema.
    5. Expand Tables.
    Alternatively, the following Snowflake query returns a list of available tables for the schema named <schema_name> in the database named <database_name> in the current account:
    SHOW TABLES IN SCHEMA <database_name>.<schema_name>;
    
    Snowflake requires the target table to have a defined schema before Unstructured can write to the table. The recommended table schema for Unstructured is as follows. In the following CREATE TABLE statement, replace the following placeholders with the appropriate values:
    • <database_name>: The name of the target database in the Snowflake account.
    • <schema_name>: The name of the target schema in the database.
    • <number-of-dimensions>: The number of dimensions for any embeddings that you plan to use. This value must match the number of dimensions for any embeddings that are specified in your related Unstructured workflows or pipelines. If you plan to use Snowflake vector embedding generation or Snowflake vector search, this value must match the number of dimensions that you plan to have Snowflake generate or search against.
    CREATE TABLE <database_name>.<schema_name>.ELEMENTS (
      ID VARCHAR(36) PRIMARY KEY NOT NULL DEFAULT UUID_STRING(),
      RECORD_ID VARCHAR,
      ELEMENT_ID VARCHAR,
      TEXT VARCHAR,
      EMBEDDINGS VECTOR(FLOAT, <number-of-dimensions>),
      TYPE VARCHAR,
      SYSTEM VARCHAR,
      LAYOUT_WIDTH DECIMAL,
      LAYOUT_HEIGHT DECIMAL,
      POINTS VARCHAR,
      URL VARCHAR,
      VERSION VARCHAR,
      DATE_CREATED TIMESTAMP_TZ,
      DATE_MODIFIED TIMESTAMP_TZ,
      DATE_PROCESSED TIMESTAMP_TZ,
      PERMISSIONS_DATA VARCHAR,
      RECORD_LOCATOR VARCHAR,
      CATEGORY_DEPTH INTEGER,
      PARENT_ID VARCHAR,
      ATTACHED_FILENAME VARCHAR,
      FILETYPE VARCHAR,
      LAST_MODIFIED TIMESTAMP_TZ,
      FILE_DIRECTORY VARCHAR,
      FILENAME VARCHAR,
      LANGUAGES ARRAY,
      PAGE_NUMBER VARCHAR,
      LINKS VARCHAR,
      PAGE_NAME VARCHAR,
      LINK_URLS ARRAY,
      LINK_TEXTS ARRAY,
      SENT_FROM ARRAY,
      SENT_TO ARRAY,
      SUBJECT VARCHAR,
      SECTION VARCHAR,
      HEADER_FOOTER_TYPE VARCHAR,
      EMPHASIZED_TEXT_CONTENTS ARRAY,
      EMPHASIZED_TEXT_TAGS ARRAY,
      TEXT_AS_HTML VARCHAR,
      REGEX_METADATA VARCHAR,
      DETECTION_CLASS_PROB DECIMAL,
      IMAGE_BASE64 VARCHAR,
      IMAGE_MIME_TYPE VARCHAR,
      ORIG_ELEMENTS VARCHAR,
      IS_CONTINUATION BOOLEAN
    );
    
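When scripting table setup, you can fill in the statement’s placeholders programmatically. A minimal sketch, abbreviated to a few columns (the template constant and helper function are illustrative, not part of the Unstructured SDK):

```python
# Abbreviated template of the recommended CREATE TABLE statement shown above.
DDL_TEMPLATE = (
    "CREATE TABLE {database}.{schema}.ELEMENTS (\n"
    "  ID VARCHAR(36) PRIMARY KEY NOT NULL DEFAULT UUID_STRING(),\n"
    "  RECORD_ID VARCHAR,\n"
    "  TEXT VARCHAR,\n"
    "  EMBEDDINGS VECTOR(FLOAT, {dims})\n"
    "  -- ... remaining columns as in the full statement above ...\n"
    ");"
)

def render_elements_ddl(database: str, schema: str, dims: int) -> str:
    """Substitute the database, schema, and embedding dimension count
    into the CREATE TABLE template."""
    if dims <= 0:
        raise ValueError("The embedding dimension count must be positive")
    return DDL_TEMPLATE.format(database=database, schema=schema, dims=dims)

print(render_elements_ddl("MY_DB", "MY_SCHEMA", 1536))
```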
  • The name of the column in the table that uniquely identifies each record (for example, RECORD_ID).
To create a Snowflake destination connector, see the following examples.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateDestinationRequest
from unstructured_client.models.shared import CreateDestinationConnector

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.destinations.create_destination(
        request=CreateDestinationRequest(
            create_destination_connector=CreateDestinationConnector(
                name="<name>",
                type="snowflake",
                config={
                    "account": "<account>",
                    "user": "<user>",
                    "host": "<host>",
                    "port": <port>,
                    "database": "<database>",
                    "schema": "<schema>",
                    "role": "<role>",
                    "password": "<programmatic-access-token>",
                    "record_id_key": "<record-id-key>",
                    "table_name": "<table_name>",
                    "batch_size": <batch-size>
                }
            )
        )
    )

    print(response.destination_connector_information)
Replace the preceding placeholders as follows:
  • <name> (required): A unique name for this connector.
  • <account> (required): The target Snowflake account’s identifier.
  • <role> (required): The name of the Snowflake role that the user belongs to. This role must have the appropriate access to the target Snowflake warehouse, database, schema, and table.
  • <user> (required): The target Snowflake user’s login name (not their username).
  • <programmatic-access-token> (required): The user’s programmatic access token (PAT).
    Specifying a password is no longer recommended, as passwords are being deprecated by Snowflake. Use a PAT instead.
  • <host> (required): The hostname of the target Snowflake warehouse.
  • <port> (required): The warehouse’s port number. The default is 443 if not otherwise specified.
  • <database> (required): The name of the target Snowflake database.
  • <schema> (required): The name of the target Snowflake schema within the database.
  • <table_name>: The name of the target Snowflake table within the database’s schema. For the destination connector, the default is elements if not otherwise specified.
  • <columns> (source connector only): A comma-separated list of columns to fetch from the table. By default, all columns are fetched unless otherwise specified.
  • <id-column> (required, source connector only): The name of the column that uniquely identifies each record in the table.
  • <record-id-key> (destination connector only): The name of the column that uniquely identifies each record in the table. The default is record_id if not otherwise specified.
  • <batch-size> (required): The maximum number of rows to fetch for each batch. The default is 50 if not otherwise specified.
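Before calling the API, it can be useful to confirm that your config dictionary contains every key marked required in the list above. A minimal pre-flight sketch (the helper function is illustrative, not part of the Unstructured SDK; the key set mirrors the placeholder list):

```python
REQUIRED_CONFIG_KEYS = {
    "account", "user", "host", "port", "database",
    "schema", "role", "password", "batch_size",
}

def missing_config_keys(config: dict) -> list[str]:
    """Return a sorted list of required connector keys absent from config."""
    return sorted(REQUIRED_CONFIG_KEYS - config.keys())

config = {
    "account": "MYORG-MYACCOUNT",
    "user": "SVC_UNSTRUCTURED",
    "host": "myorg-myaccount.snowflakecomputing.com",
    "port": 443,
    "database": "MY_DB",
    "schema": "MY_SCHEMA",
    "role": "MY_ROLE",
    "password": "<programmatic-access-token>",
    "batch_size": 50,
}
print(missing_config_keys(config))  # []
```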

Learn more