If you’re new to Unstructured, read this note first.

Before you can create a source connector, you must first sign up for Unstructured and get your Unstructured API key. After you sign up, the Unstructured user interface (UI) appears, which you use to get the key. To learn how, watch this 40-second how-to video.

After you create the source connector, add it along with a destination connector to a workflow. Then run the workflow as a job. To learn how, try out the hands-on Workflow Endpoint quickstart, go directly to the quickstart notebook, or watch the two 4-minute video tutorials for the Unstructured Python SDK.

You can also create source connectors with the Unstructured user interface (UI). Learn how.

If you need help, reach out to the community on Slack, or contact us directly.

You are now ready to start creating a source connector! Keep reading to learn how.

Ingest your files into Unstructured from Azure Blob Storage.

The requirements are as follows.

The following video shows how to fulfill the minimum set of Azure Storage account requirements:

If you are generating an SAS token as shown in the preceding video, be sure to set the following permissions:

  • Read and List for reading from the container only.
  • Write and List for writing to the container only.
  • Read, Write, and List for both reading from and writing to the container.
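
As a rough sketch, the permission sets listed above can be captured in a small lookup. The helper name `sas_permission_flags` and the mode labels `source`, `destination`, and `both` are hypothetical and chosen only for illustration; only the Read, Write, and List permissions come from the list above.

```python
# Hypothetical lookup: maps how the container is used to the SAS
# permissions listed above. The mode labels are illustrative only.
SAS_PERMISSIONS = {
    "source": {"read", "list"},            # reading from the container only
    "destination": {"write", "list"},      # writing to the container only
    "both": {"read", "write", "list"},     # reading from and writing to it
}


def sas_permission_flags(mode: str) -> dict:
    """Return read/write/list booleans for the given usage mode."""
    if mode not in SAS_PERMISSIONS:
        raise ValueError(f"mode must be one of {sorted(SAS_PERMISSIONS)}")
    granted = SAS_PERMISSIONS[mode]
    return {name: name in granted for name in ("read", "write", "list")}


print(sas_permission_flags("source"))
```

For a source connector, which only reads from the container, the `source` entry is the relevant one. Check the resulting flags against the options offered by whichever Azure tool you use to generate the token.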

Here are some more details about these requirements:

  • An Azure account. To create one, learn how.

  • An Azure Storage account, and a container within that account. Create a storage account. Create a container.

  • The Azure Storage remote URL, using the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>

    For example, if your container is named my-container, and there is a folder in the container named my-folder, the Azure Storage remote URL would be az://my-container/my-folder/.

  • An SAS token (recommended), access key, or connection string for the Azure Storage account. Create an SAS token (recommended). Get an access key. Get a connection string.


To create an Azure Blob Storage source connector, see the following examples.

import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import (
    CreateSourceConnector,
    SourceConnectorType,
    AzureSourceConnectorConfigInput
)

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type=SourceConnectorType.AZURE,
                config=AzureSourceConnectorConfigInput(
                    remote_url="az://<container-name>/<path/to/file/or/folder>",
                    recursive=<True|False>,

                    # Set the fields for only one of the following
                    # authentication methods, and remove the others.

                    # For anonymous authentication, do not set any of the
                    # following fields.

                    # For SAS token authentication:
                    account_name="<account-name>",
                    sas_token="<sas-token>",

                    # For account key authentication:
                    account_name="<account-name>",
                    account_key="<account-key>",

                    # For connection string authentication:
                    connection_string="<connection-string>",
                )
            )
        )
    )

    print(response.source_connector_information)

Replace the preceding placeholders as follows:

  • <name> (required) - A unique name for this connector.

  • az://<container-name>/<path/to/file/or/folder> (required) - The Azure Storage remote URL, with the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>

    For example, if your container is named my-container, and there is a folder in the container named my-folder, the Azure Storage remote URL would be az://my-container/my-folder/.

  • <account-name> (required for SAS token authentication and account key authentication) - The Azure Storage account name.

  • <sas-token> (required for SAS token authentication) - The SAS token for the Azure Storage account.

  • <account-key> (required for account key authentication) - The key for the Azure Storage account.

  • <connection-string> (required for connection string authentication) - The connection string for the Azure Storage account.

  • recursive (source connector only) - Set to true (True in Python) to recursively access files from subfolders within the container. If you do not set it, the default is false.
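
As a sketch of how the placeholder rules above fit together, the following helper checks the remote URL format and collects the fields for one authentication method at a time. The function name `build_azure_config_kwargs` and its validation rules are assumptions made for illustration and are not part of the Unstructured Python SDK; only the field names are taken from the preceding example.

```python
from typing import Optional


def build_azure_config_kwargs(
    remote_url: str,
    recursive: bool = False,
    account_name: Optional[str] = None,
    sas_token: Optional[str] = None,
    account_key: Optional[str] = None,
    connection_string: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate and collect connector config fields."""
    # The remote URL must name a container: az://<container-name>/<optional path>
    prefix = "az://"
    container = remote_url[len(prefix):].split("/")[0]
    if not remote_url.startswith(prefix) or not container:
        raise ValueError("remote_url must look like az://<container-name>/<path>")

    # At most one credential may be supplied; none means anonymous access.
    credentials = {
        "sas_token": sas_token,
        "account_key": account_key,
        "connection_string": connection_string,
    }
    provided = {key: value for key, value in credentials.items() if value}
    if len(provided) > 1:
        raise ValueError("set only one of sas_token, account_key, connection_string")

    # SAS token and account key authentication also need the account name.
    needs_account = "sas_token" in provided or "account_key" in provided
    if needs_account and not account_name:
        raise ValueError("account_name is required with sas_token or account_key")

    kwargs = {"remote_url": remote_url, "recursive": recursive, **provided}
    if needs_account:
        # account_name is only included when the chosen method requires it.
        kwargs["account_name"] = account_name
    return kwargs


print(build_azure_config_kwargs(
    "az://my-container/my-folder/",
    account_name="my-account",
    sas_token="my-sas-token",
))
```

Assuming the field names shown in the preceding example, the returned dictionary could then be unpacked into the connector configuration, for example `AzureSourceConnectorConfigInput(**kwargs)`.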