https://<workspace-id>.cloud.databricks.com
https://adb-<workspace-id>.<random-number>.azuredatabricks.net
https://<workspace-id>.<random-number>.gcp.databricks.com
Do not add a trailing slash (/) to the workspace URL.
USE CATALOG on the volume’s parent catalog in Unity Catalog.
USE SCHEMA on the volume’s parent schema (formerly known as a database) in Unity Catalog.
READ VOLUME and WRITE VOLUME on the volume.
DATABRICKS_HOST - The Databricks host URL, represented by --host (CLI) or host (Python). Do not add a trailing slash (/) to the host URL.
DATABRICKS_CATALOG - The Databricks catalog name for the Volume, represented by --catalog (CLI) or catalog (Python).
DATABRICKS_SCHEMA - The Databricks schema name for the Volume, represented by --schema (CLI) or schema (Python). If not specified, default is used.
DATABRICKS_VOLUME - The Databricks Volume name, represented by --volume (CLI) or volume (Python).
DATABRICKS_VOLUME_PATH - An optional path to access within the volume, specified by --volume-path (CLI) or volume_path (Python).
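The connection settings above can be gathered in one place before they are passed to the connector. The following is a minimal sketch, not the connector's own code: the names VolumeSettings and load_volume_settings are hypothetical, the trailing-slash check assumes the host URL must not end with /, and the /Volumes/<catalog>/<schema>/<volume> layout is the standard Unity Catalog volume path.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeSettings:
    """Hypothetical container for the connection settings described above."""
    host: str
    catalog: str
    schema: str
    volume: str
    volume_path: str

    @property
    def full_path(self) -> str:
        # Unity Catalog volumes are addressed as /Volumes/<catalog>/<schema>/<volume>/...
        parts = ["/Volumes", self.catalog, self.schema, self.volume]
        sub_path = self.volume_path.strip("/")
        if sub_path:
            parts.append(sub_path)
        return "/".join(parts)


def load_volume_settings(env=None) -> VolumeSettings:
    """Read the DATABRICKS_* variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    host = env["DATABRICKS_HOST"]
    if host.endswith("/"):
        # Assumption: the host URL must be given without a trailing slash.
        raise ValueError("DATABRICKS_HOST must not end with a trailing slash")
    return VolumeSettings(
        host=host,
        catalog=env["DATABRICKS_CATALOG"],
        # The schema falls back to "default" when it is not specified.
        schema=env.get("DATABRICKS_SCHEMA") or "default",
        volume=env["DATABRICKS_VOLUME"],
        volume_path=env.get("DATABRICKS_VOLUME_PATH", ""),
    )
```

Calling load_volume_settings() with no arguments reads from the process environment; passing a plain dictionary instead makes the function easy to exercise without touching real variables.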
DATABRICKS_TOKEN - The personal access token, represented by --token (CLI) or token (Python).
DATABRICKS_USERNAME - The user’s name, represented by --username (CLI) or username (Python).
DATABRICKS_PASSWORD - The user’s password, represented by --password (CLI) or password (Python).
DATABRICKS_CLIENT_ID - The client ID value for the corresponding service principal, represented by --client-id (CLI) or client_id (Python).
DATABRICKS_CLIENT_SECRET - The OAuth secret value for the corresponding service principal, represented by --client-secret (CLI) or client_secret (Python).
ARM_CLIENT_ID - The client ID value for the corresponding managed identity, represented by --azure-client-id (CLI) or azure_client_id (Python).
DATABRICKS_AZURE_RESOURCE_ID, represented by --azure-workspace-resource-id (CLI) or azure_workspace_resource_id (Python).
ARM_TENANT_ID - The tenant ID value for the corresponding service principal, represented by --azure-tenant-id (CLI) or azure_tenant_id (Python).
ARM_CLIENT_ID - The client ID value for the corresponding service principal, represented by --azure-client-id (CLI) or azure_client_id (Python).
ARM_CLIENT_SECRET - The client secret value for the corresponding service principal, represented by --azure-client-secret (CLI) or azure_client_secret (Python).
DATABRICKS_AZURE_RESOURCE_ID, represented by --azure-workspace-resource-id (CLI) or azure_workspace_resource_id (Python).
DATABRICKS_TOKEN - The Entra ID token for the corresponding Entra ID user, represented by --token (CLI) or token (Python).
GOOGLE_CREDENTIALS - The local path to the corresponding Google Cloud service account’s credentials file, represented by --google-credentials (CLI) or google_credentials (Python).
GOOGLE_SERVICE_ACCOUNT - The Google Cloud service account’s email address, represented by --google-service-account (CLI) or google_service_account (Python).
DATABRICKS_PROFILE - The name of the Databricks configuration profile, represented by --profile (CLI) or profile (Python).
Use the --partition-by-api option (CLI) or partition_by_api (Python) parameter to specify where files are processed:
For local file processing, omit --partition-by-api (CLI) or partition_by_api (Python), or explicitly specify partition_by_api=False (Python).
Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:
--api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
--partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL
To process files through the Unstructured API, specify --partition-by-api (CLI) or partition_by_api=True (Python).
Unstructured also requires an Unstructured API key and API URL, which you provide by adding the following:
--api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
--partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL, representing your API key and API URL, respectively.
The API URL for the Unstructured Partition Endpoint is https://api.unstructuredapp.io/general/v0/general. However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.
If you do not have an API key, get one now.
If the Unstructured API is self-hosted, the process for generating Unstructured API keys, and the Unstructured API URL that you use, are different. For details, contact Unstructured Sales at sales@unstructured.io.
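The choice between local and API-based processing described above can be sketched as a small helper. This is an illustration under stated assumptions rather than library code: partition_settings is a hypothetical function, and the dictionary keys simply mirror the Python parameter names mentioned in the text (partition_by_api, api_key, partition_endpoint). How the resulting values are handed to the ingest pipeline is left out on purpose.

```python
import os


def partition_settings(use_api: bool, env=None) -> dict:
    """Return partitioning keyword arguments for local or API-based processing."""
    env = os.environ if env is None else env

    if not use_api:
        # Local processing needs neither an API key nor an API URL.
        return {"partition_by_api": False}

    api_key = env.get("UNSTRUCTURED_API_KEY")
    api_url = env.get("UNSTRUCTURED_API_URL")
    required = {
        "UNSTRUCTURED_API_KEY": api_key,
        "UNSTRUCTURED_API_URL": api_url,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        # Fail early instead of sending requests without credentials.
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    return {
        "partition_by_api": True,
        "api_key": api_key,
        "partition_endpoint": api_url,
    }
```

With use_api=False the helper ignores UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL entirely, which matches the note that local processing does not use them.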