Ingest your files into Unstructured from Azure Blob Storage.

The requirements are as follows.

If you generate an SAS token to meet these requirements, be sure to set the following permissions:

  • Read and List for reading from the container only.
  • Write and List for writing to the container only.
  • Read, Write, and List for both reading from and writing to the container.
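
As a quick sanity check before configuring the connector, the permission sets listed above can be compared against a token. The sketch below assumes an ad hoc SAS token that carries its permissions in the `sp` query field, where `r`, `w`, and `l` stand for Read, Write, and List. A token issued from a stored access policy may omit `sp`, in which case this check does not apply. The helper name `missing_sas_permissions` and the example token are hypothetical.

```python
from urllib.parse import parse_qs

# SAS permission letters for the three permissions named above.
PERMISSION_LETTERS = {"read": "r", "write": "w", "list": "l"}


def missing_sas_permissions(sas_token: str, required: set[str]) -> set[str]:
    """Return the required permission names that the token's `sp` field lacks."""
    # A SAS token is a URL query string; tolerate a leading "?".
    fields = parse_qs(sas_token.lstrip("?"))
    granted = fields.get("sp", [""])[0]
    return {name for name in required if PERMISSION_LETTERS[name] not in granted}


# Made-up token for illustration only; the signature is not valid.
example_token = "sv=2022-11-02&sr=c&sp=rl&se=2030-01-01T00:00:00Z&sig=FAKE"

print(missing_sas_permissions(example_token, {"read", "list"}))           # set()
print(missing_sas_permissions(example_token, {"read", "write", "list"}))  # {'write'}
```

An empty result means the token grants everything the chosen use case needs. The check reads only the token text and does not contact Azure.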

Here are some more details about these requirements:

  • An Azure account. To create one, learn how.

  • An Azure Storage account, and a container within that account. Create a storage account. Create a container.

  • The Azure Storage remote URL, using the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>

    For example, if your container is named my-container, and there is a folder in the container named my-folder, the Azure Storage remote URL would be az://my-container/my-folder/.

  • An SAS token (recommended), access key, or connection string for the Azure Storage account. Create an SAS token (recommended). Get an access key. Get a connection string.
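
The az:// remote URL format described above can be assembled from its parts. This is a minimal sketch; the helper name `azure_remote_url` is hypothetical and is not part of any Unstructured or Azure library. Using a trailing slash alone for the container root is an assumption based on the stated format.

```python
def azure_remote_url(container: str, path: str = "") -> str:
    """Build an az:// remote URL from a container name and an optional path."""
    if not container or "/" in container:
        raise ValueError("container must be a non-empty name without slashes")
    # Avoid a doubled slash when the path is written with a leading "/".
    return f"az://{container}/{path.lstrip('/')}"


print(azure_remote_url("my-container", "my-folder/"))  # az://my-container/my-folder/
```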

To create an Azure Blob Storage source connector, see the following examples.

import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import (
    CreateSourceConnector,
    SourceConnectorType,
    AzureSourceConnectorConfigInput
)

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type=SourceConnectorType.AZURE,
                config=AzureSourceConnectorConfigInput(
                    remote_url="az://<container-name>/<path/to/file/or/folder>",
                    recursive=False,  # Set to True to also read files in subfolders.

                    # For anonymous authentication, do not set any of the
                    # following fields.

                    # For SAS token authentication:
                    account_name="<account-name>",
                    sas_token="<sas-token>",

                    # For account key authentication, use these instead:
                    # account_name="<account-name>",
                    # account_key="<account-key>",

                    # For connection string authentication, use this instead:
                    # connection_string="<connection-string>",
                )
            )
        )
    )

    print(response.source_connector_information)

Replace the preceding placeholders as follows:

  • <name> (required) - A unique name for this connector.

  • az://<container-name>/<path/to/file/or/folder> (required) - The Azure Storage remote URL, with the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>

    For example, if your container is named my-container, and there is a folder in the container named my-folder, the Azure Storage remote URL would be az://my-container/my-folder/.

  • <account-name> (required for SAS token authentication and account key authentication) - The Azure Storage account name.

  • <sas-token> (required for SAS token authentication) - The SAS token for the Azure Storage account.

  • <account-key> (required for account key authentication) - The key for the Azure Storage account.

  • <connection-string> (required for connection string authentication) - The connection string for the Azure Storage account.

  • recursive (source connector only) - Set to true (True in Python) to recursively access files from subfolders within the container. The default is false if not otherwise specified.
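
Because the authentication fields above are mutually exclusive, it can help to build them in one place. The following is a minimal sketch: `azure_auth_fields` is a hypothetical helper, not part of the Unstructured SDK, and it only encodes the rules stated in the placeholder list (an account name plus either an SAS token or an account key, a connection string alone, or no fields for anonymous access).

```python
from typing import Optional


def azure_auth_fields(
    account_name: Optional[str] = None,
    sas_token: Optional[str] = None,
    account_key: Optional[str] = None,
    connection_string: Optional[str] = None,
) -> dict:
    """Return the config fields for exactly one authentication method."""
    secrets = [v for v in (sas_token, account_key, connection_string) if v]
    if len(secrets) > 1:
        raise ValueError("set only one of sas_token, account_key, connection_string")
    if connection_string:
        return {"connection_string": connection_string}
    if sas_token or account_key:
        if not account_name:
            raise ValueError("account_name is required with a SAS token or account key")
        key = "sas_token" if sas_token else "account_key"
        return {"account_name": account_name, key: sas_token or account_key}
    # Anonymous authentication: no credential fields at all.
    return {}


print(azure_auth_fields(account_name="<account-name>", sas_token="<sas-token>"))
# {'account_name': '<account-name>', 'sas_token': '<sas-token>'}
```

The returned dictionary is meant to be unpacked into the connector config alongside remote_url and recursive, for example with `**azure_auth_fields(...)`. Whether the config class accepts the fields this way should be confirmed against the SDK version in use.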