# Databricks Volumes

<Tip>
  This article covers connecting Unstructured to Databricks Volumes.

  For information about connecting Unstructured to Delta Tables in Databricks instead, see
  [Delta Tables in Databricks](/api-reference/workflow/destinations/databricks-delta-table).
</Tip>

<Note>
  First time creating a connector? [Read this first](/api-reference/workflow/connector-first-time-reqs).
</Note>

Send processed data from Unstructured to Databricks Volumes.

## Requirements

You will need:

* A Databricks account on [AWS](https://docs.databricks.com/getting-started/free-trial.html),
  [Azure](https://learn.microsoft.com/azure/databricks/getting-started/), or
  [GCP](https://docs.gcp.databricks.com/getting-started/index.html).

* A workspace within the Databricks account for [AWS](https://docs.databricks.com/admin/workspace/index.html),
  [Azure](https://learn.microsoft.com/azure/databricks/admin/workspace/), or
  [GCP](https://docs.gcp.databricks.com/admin/workspace/index.html).

* The workspace's URL. Get the workspace URL for
  [AWS](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids),
  [Azure](https://learn.microsoft.com/azure/databricks/workspace/workspace-details#workspace-instance-names-urls-and-ids),
  or [GCP](https://docs.gcp.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids).

  Examples:

  * AWS: `https://<workspace-id>.cloud.databricks.com`
  * Azure: `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`
  * GCP: `https://<workspace-id>.<random-number>.gcp.databricks.com`

  <Note>
    Do not add a trailing slash (`/`) to the workspace URL.
  </Note>

* The Databricks authentication details. For more information, see the documentation for
  [AWS](https://docs.databricks.com/dev-tools/auth/index.html),
  [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/),
  or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/index.html).

  For the [Unstructured Pipelines](/pipelines/overview) or the [Unstructured API](/api-reference/overview), the following Databricks authentication types are supported:

  * Databricks OAuth machine-to-machine (M2M) authentication for
    [AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-m2m), or
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html).

    You will need the **Client ID** (or **UUID** or **Application ID**) and OAuth **Secret** (client secret) values for the corresponding service principal.
    Note that for Azure, only Databricks managed service principals are supported. Microsoft Entra ID managed service principals are not supported.

    The following video shows how to create a Databricks managed service principal:

    <iframe width="560" height="315" src="https://www.youtube.com/embed/wBmqv5DaA1E" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * Databricks personal access token authentication for
    [AWS](https://docs.databricks.com/dev-tools/auth/pat.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/pat), or
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/pat.html).

    You will need the personal access token's value.

    The following video shows how to create a Databricks personal access token:

    <iframe width="560" height="315" src="https://www.youtube.com/embed/OzEU2miAS6I" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  For [Unstructured Ingest](/open-source/ingestion/overview), the following Databricks authentication types are supported:

  * For Databricks personal access token authentication ([AWS](https://docs.databricks.com/dev-tools/auth/pat.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/pat), and
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/pat.html)): The personal access token's value.

    The following video shows how to create a Databricks personal access token:

    <iframe width="560" height="315" src="https://www.youtube.com/embed/OzEU2miAS6I" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * For username and password (basic) authentication ([AWS](https://docs.databricks.com/archive/dev-tools/basic.html) only): The user's name and password values.

  * For OAuth machine-to-machine (M2M) authentication ([AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-m2m), and
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html)): The client ID and OAuth secret values for the corresponding service principal.

  * For OAuth user-to-machine (U2M) authentication ([AWS](https://docs.databricks.com/dev-tools/auth/oauth-u2m.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-u2m), and
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-u2m.html)): No additional values.

  * For Azure managed identities (formerly Managed Service Identities (MSI)) authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-mi) only): The client ID value for the corresponding managed identity.

  * For Microsoft Entra ID service principal authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-sp) only): The tenant ID, client ID, and client secret values for the corresponding service principal.

  * For Azure CLI authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-cli) only): No additional values.

  * For Microsoft Entra ID user authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/user-aad-token) only): The Entra ID token for the corresponding Entra ID user.

  * For Google Cloud Platform credentials authentication ([GCP](https://docs.gcp.databricks.com/dev-tools/auth/gcp-creds.html) only): The local path to the corresponding Google Cloud service account's credentials file.

  * For Google Cloud Platform ID authentication ([GCP](https://docs.gcp.databricks.com/dev-tools/auth/gcp-id.html) only): The Google Cloud service account's email address.

* The name of the volume's parent catalog in Unity Catalog for
  [AWS](https://docs.databricks.com/catalogs/create-catalog.html),
  [Azure](https://learn.microsoft.com/azure/databricks/catalogs/create-catalog), or
  [GCP](https://docs.gcp.databricks.com/catalogs/create-catalog.html).

* The name of the volume's parent schema (formerly known as a database) in Unity Catalog for
  [AWS](https://docs.databricks.com/schemas/create-schema.html),
  [Azure](https://learn.microsoft.com/azure/databricks/schemas/create-schema), or
  [GCP](https://docs.gcp.databricks.com/schemas/create-schema.html).

* The name of the volume in Unity Catalog for [AWS](https://docs.databricks.com/tables/managed.html),
  [Azure](https://learn.microsoft.com/azure/databricks/tables/managed), or
  [GCP](https://docs.gcp.databricks.com/tables/managed.html), and optionally any path in that volume that you want to access directly, beginning with the volume's root.

* The Databricks workspace user or service principal must have the following *minimum* set of privileges to read from or write to the
  existing volume in Unity Catalog:

  * `USE CATALOG` on the volume's parent catalog in Unity Catalog.
  * `USE SCHEMA` on the volume's parent schema (formerly known as a database) in Unity Catalog.
  * `READ VOLUME` and `WRITE VOLUME` on the volume.

  The following video shows how to create and set privileges for a catalog, schema (formerly known as a database), and volume in Unity Catalog.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/yF9DJphhQQc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  Learn more about how to check and set Unity Catalog privileges for
  [AWS](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges),
  [Azure](https://learn.microsoft.com/azure/databricks/data-governance/unity-catalog/manage-privileges/#grant), or
  [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges).
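The minimum privileges listed above can be granted with SQL `GRANT` statements. The following is a minimal sketch that only builds the statement text. The helper name `build_volume_grants` and the example catalog, schema, volume, and principal names are hypothetical. You would run the resulting statements yourself, for example in a Databricks SQL editor or notebook, as a user who is allowed to grant these privileges.

```python
def build_volume_grants(catalog: str, schema: str, volume: str, principal: str) -> list[str]:
    """Build GRANT statements for the minimum privileges on a Unity Catalog volume.

    This helper is illustrative only. It returns SQL text and does not
    connect to Databricks or run anything.
    """

    def quote(name: str) -> str:
        # Wrap the identifier in backticks, doubling any embedded backtick.
        return "`" + name.replace("`", "``") + "`"

    c, s, v, p = quote(catalog), quote(schema), quote(volume), quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {c} TO {p}",
        f"GRANT USE SCHEMA ON SCHEMA {c}.{s} TO {p}",
        f"GRANT READ VOLUME, WRITE VOLUME ON VOLUME {c}.{s}.{v} TO {p}",
    ]


# Hypothetical names, for illustration only.
for statement in build_volume_grants("main", "default", "docs", "someone@example.com"):
    print(statement + ";")
```

For a service principal, the principal is typically identified by its application ID rather than an email address.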

## Examples

To create a Databricks Volumes destination connector, see the following examples.

For more information on working with destination connectors using the Unstructured API, see [Destination endpoints](/api-reference/api/destination/destination-apis).

<CodeGroup>
  ```python Python SDK
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="databricks_volumes",
                  config={
                      "host": "<host>",
                      "catalog": "<catalog>",
                      "schema": "<schema>",
                      "volume": "<volume>",
                      "volume_path": "<volume_path>",

                      # For Databricks OAuth machine-to-machine (M2M) authentication:
                      "client_secret": "<client-secret>",
                      "client_id": "<client-id>",

                      # For Databricks personal access token authentication, remove the
                      # two preceding settings and uncomment the following setting instead:
                      # "token": "<token>",
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "databricks_volumes",
      "config": {
          "host": "<host>",
          "catalog": "<catalog>",
          "schema": "<schema>",
          "volume": "<volume>",
          "volume_path": "<volume_path>",

          "client_secret": "<client-secret>",
          "client_id": "<client-id>"
      }
  }'
  # The preceding request uses Databricks OAuth machine-to-machine (M2M) authentication.
  # For Databricks personal access token authentication, replace the "client_secret"
  # and "client_id" settings with a single "token": "<token>" setting.
  ```
</CodeGroup>
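The examples above show two alternative sets of authentication settings in the same `config`. As a minimal sketch of keeping them separate, the following helper builds a `config` dictionary with exactly one of them and rejects a host URL that ends with a slash. The helper name `build_databricks_volumes_config` is hypothetical, and the either-or check is an assumption based on the comments in the examples, not documented API behavior.

```python
from typing import Optional


def build_databricks_volumes_config(
    host: str,
    catalog: str,
    volume: str,
    schema: str = "default",
    volume_path: Optional[str] = None,
    client_id: Optional[str] = None,
    client_secret: Optional[str] = None,
    token: Optional[str] = None,
) -> dict:
    """Build a connector config with exactly one authentication method (illustrative)."""
    if host.endswith("/"):
        raise ValueError("host must not end with a trailing slash")

    uses_oauth = client_id is not None or client_secret is not None
    if uses_oauth and token is not None:
        raise ValueError("specify either client_id and client_secret, or token, not both")

    if uses_oauth:
        if not (client_id and client_secret):
            raise ValueError("OAuth M2M needs both client_id and client_secret")
        auth = {"client_id": client_id, "client_secret": client_secret}
    elif token:
        auth = {"token": token}
    else:
        raise ValueError("no authentication settings were provided")

    config = {"host": host, "catalog": catalog, "schema": schema, "volume": volume}
    if volume_path:
        config["volume_path"] = volume_path
    config.update(auth)
    return config


# Hypothetical values, for illustration only.
print(
    build_databricks_volumes_config(
        host="https://example.cloud.databricks.com",
        catalog="main",
        volume="docs",
        token="<token>",
    )
)
```

The returned dictionary can be passed as the `config` argument in the Python SDK example.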

## Configuration settings

Replace the preceding placeholders as follows:

<ParamField body="name" type="string" required>
  A unique name for this connector.
</ParamField>

<ParamField body="host" type="string" required>
  The Databricks workspace host URL.

  <Note>
    Do not add a trailing slash (`/`) to the workspace host URL.
  </Note>
</ParamField>

<ParamField body="client_id" type="string" required>
  For Databricks OAuth machine-to-machine (M2M) authentication, the **Client ID** (or **UUID** or **Application ID**) value for the Databricks managed service principal that has the appropriate privileges to the volume.
</ParamField>

<ParamField body="client_secret" type="string" required>
  For Databricks OAuth M2M authentication, the associated OAuth **Secret** value for the Databricks managed service principal that has the appropriate privileges to the volume.
</ParamField>

<ParamField body="token" type="string" required>
  For Databricks personal access token authentication, the personal access token's value.
</ParamField>

<ParamField body="catalog" type="string" required>
  The name of the catalog to use.
</ParamField>

<ParamField body="schema" type="string" default="default">
  The name of the associated schema.
</ParamField>

<ParamField body="volume" type="string" required>
  The name of the associated volume.
</ParamField>

<ParamField body="volume_path" type="string">
  Optionally, a path within the volume to access, beginning with the volume's root.
</ParamField>
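To see how the `catalog`, `schema`, `volume`, and `volume_path` settings relate to one another, it can help to compose them into the `/Volumes/<catalog>/<schema>/<volume>/<path>` form that Databricks uses for files in Unity Catalog volumes. The following is a minimal sketch. The helper name `volume_target_path` is hypothetical, and it is an assumption that the connector resolves its target location this way.

```python
import posixpath
from typing import Optional


def volume_target_path(
    catalog: str,
    volume: str,
    schema: str = "default",
    volume_path: Optional[str] = None,
) -> str:
    """Compose a Unity Catalog volume path from the connector settings (illustrative)."""
    root = posixpath.join("/Volumes", catalog, schema, volume)
    if not volume_path:
        return root
    # Strip surrounding slashes so the optional path stays under the volume root.
    return posixpath.join(root, volume_path.strip("/"))


# Hypothetical names, for illustration only.
print(volume_target_path("main", "docs"))
print(volume_target_path("main", "docs", schema="raw", volume_path="/out/2024/"))
```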
