To proceed with a self-hosted deployment, your organization must first sign a self-hosting agreement with Unstructured.

If you have not yet signed this agreement, stop here, and begin the self-hosting agreement process by contacting your Unstructured sales representative, emailing Unstructured Sales at sales@unstructured.io, or filling out the contact form on the Unstructured website.

After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you must first set up your Azure account as follows.

Questions? Need help?

If you have questions or need help as you go, contact your Unstructured sales representative or technical enablement contact. If you do not know who they are, email Unstructured Sales at sales@unstructured.io, or fill out the contact form on the Unstructured website, and a member of the Unstructured sales or technical enablement teams will get back to you as soon as possible.

Onboarding checklist

Set up the following infrastructure in your Azure account. Unstructured will deploy the Unstructured UI and API into this infrastructure.

Azure subscription and resource group

  • Subscription

    • Ensure you have access to a valid Azure subscription
    • You will need the subscription_id if deploying via CLI or Pulumi
  • Resource Group

    • Name: u10d-{env}-rg
    • Region: e.g., eastus2
    • All resources (VNet, AKS, PostgreSQL, Storage, etc.) will be created inside this group
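If you deploy with Pulumi, the resource group above could be declared roughly as follows. This is only a sketch in Pulumi YAML using the Azure Native provider. The project name and the env value "dev" are assumptions made for illustration, not values required by Unstructured.

```yaml
# Pulumi.yaml (sketch). The project name and the "dev" environment are hypothetical.
name: u10d-infra
runtime: yaml
resources:
  resourceGroup:
    type: azure-native:resources:ResourceGroup
    properties:
      # Follows the u10d-{env}-rg naming pattern, with env assumed to be "dev".
      resourceGroupName: u10d-dev-rg
      # Example region from the checklist; substitute your own.
      location: eastus2
```

The subscription ID is typically supplied through the provider configuration rather than in this file.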

VNet and networking

  • Virtual Network (VNet)

    • Address space: 10.0.0.0/16
    • DNS Hostnames: Enabled
    • DNS Support: Enabled
  • Internet Access

    • Handled via Azure’s default gateway and public IPs
  • Public Subnet

    • Address: 10.0.0.0/24
    • Assign Public IP: true
    • Availability Zone: ${region}a
  • NAT Gateway + Public IP

    • NAT Gateway in the public subnet
    • Public IP resource attached
  • Private Subnets (x2)

    • Addresses: 10.0.1.0/24, 10.0.2.0/24
    • AZs: ${region}a and ${region}b
  • Route Tables

    • Public: route 0.0.0.0/0 via internet
    • Private: route 0.0.0.0/0 via NAT Gateway

Managed identities and RBAC

  • AKS Cluster Managed Identity

    • Assign roles:

      • Contributor or more scoped role
      • Network Contributor
  • Node Pool Managed Identity

    • Assign roles:

      • Monitoring Metrics Publisher
      • AcrPull (if using ACR)
      • Storage Blob Data Reader
  • Workload Identity Bindings (x3)

    • Namespaces: recommender, etl-operator, data-broker
    • Use Azure AD Workload Identity Federation
    • Assign Storage Blob Data Contributor to required containers
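As a rough illustration of one workload identity binding, a Kubernetes ServiceAccount in the recommender namespace can be annotated with the client ID of the managed identity it should use. The ServiceAccount name and the client ID placeholder below are hypothetical. The names that Unstructured actually expects may differ, so confirm them with your technical enablement contact.

```yaml
# Sketch of a ServiceAccount bound to a managed identity through
# Azure AD Workload Identity. The name "recommender-sa" is a placeholder.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: recommender-sa
  namespace: recommender
  annotations:
    # Client ID of the user-assigned managed identity that holds
    # the Storage Blob Data Contributor role assignment.
    azure.workload.identity/client-id: "<managed-identity-client-id>"
```

Pods that use this ServiceAccount also need the label azure.workload.identity/use: "true" so that the Workload Identity Webhook injects the federated token. The federated credential on the managed identity must use the subject system:serviceaccount:recommender:recommender-sa, adjusted to the names you choose.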

AKS Cluster

  • Control Plane

    • Version: 1.31 or higher
    • API authorized IPs: optional
    • Private cluster networking recommended
  • Node Pool

    • VM Size: Standard_D16s_v5
    • Disk Size: 100 GB
    • Desired Size: 2 (min: 2, max: 5)
    • SSH: Enabled via key pair
    • SSH key exported in PEM format
  • NSGs (Network Security Groups)

    • Allow intra-cluster traffic (10.0.0.0/16)
    • Allow all egress

Kubernetes Add-ons

Install via Helm or YAML:

  • Workload Identity Webhook

  • Metrics Server v0.7.2

  • Azure Disk CSI Driver

    • Provisioner: disk.csi.azure.com

Storage class

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-disk-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  kind: Managed
volumeBindingMode: WaitForFirstConsumer
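To check that the storage class works, you could create a small test claim against it. The following PersistentVolumeClaim is a sketch only. Its name, namespace, and size are placeholders and are not part of the Unstructured deployment.

```yaml
# Hypothetical test claim; the name, namespace, and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce          # Azure managed disks attach to a single node
  storageClassName: azure-disk-sc
  resources:
    requests:
      storage: 10Gi
```

Because the storage class uses volumeBindingMode: WaitForFirstConsumer, the claim stays in the Pending state until a pod that mounts it is scheduled.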

Secrets and ConfigMaps

After your infrastructure is set up, but before Unstructured can deploy the Unstructured UI and API into your infrastructure, Unstructured needs the values of the following Secrets and ConfigMaps. Provide these values to Unstructured as a set of YAML files in Kubernetes Secret and ConfigMap format.

Capture these during setup

  • DB host, username, password
  • Container names
  • SSH private key
  • Auth secrets

The Secrets are as follows.

Blob storage credentials (Azure)

  • BLOB_STORAGE_ADAPTER_ACCOUNT_NAME
  • BLOB_STORAGE_ADAPTER_ACCOUNT_KEY
  • BLOB_STORAGE_ADAPTER_CONTAINER_REGION (optional)

Database credentials

  • DB_USERNAME
  • DB_PASSWORD
  • DB_HOST
  • DB_NAME
  • DB_DATABASE

Authentication

  • JWT_SECRET_KEY
  • AUTH_STRATEGY
  • SESSION_SECRET
  • SHARED_SECRET
  • KEYCLOAK_CLIENT_SECRET
  • KEYCLOAK_ADMIN_SECRET
  • KEYCLOAK_ADMIN
  • KEYCLOAK_ADMIN_PASSWORD
  • API_BEARER_TOKEN

The ConfigMaps are as follows.

Blob storage settings

  • BLOB_STORAGE_ADAPTER_TYPE: azure
  • BLOB_STORAGE_ADAPTER_BUCKET
  • ETL_BLOB_CACHE_BUCKET_NAME
  • ETL_API_BLOB_STORAGE_ADAPTER_BUCKET
  • ETL_API_BLOB_STORAGE_ADAPTER_TYPE: azure
  • ETL_API_DB_REMOTE_BUCKET_NAME
  • ETL_API_JOB_STATUS_DEST_BUCKET_NAME
  • JOB_STATUS_BUCKET_NAME
  • JOB_DB_BUCKET_NAME

Environment

  • ENV, ENVIRONMENT
  • JOB_ENV, JOB_ENVIRONMENT

Observability and OpenTelemetry (OTel)

  • JOB_OTEL_EXPORTER_OTLP_ENDPOINT
  • JOB_OTEL_METRICS_EXPORTER
  • JOB_OTEL_TRACES_EXPORTER
  • OTEL_EXPORTER_OTLP_ENDPOINT
  • OTEL_METRICS_EXPORTER
  • OTEL_TRACES_EXPORTER

Unstructured API and authentication

  • UNSTRUCTURED_API_URL
  • JWKS_URL
  • JWT_ISSUER
  • JWT_AUDIENCE
  • SINGLE_PLANE_DEPLOYMENT

Front end and dashboard

  • API_BASE_URL
  • API_CLIENT_BASE_URL
  • API_URL
  • APM_SERVICE_NAME
  • APM_SERVICE_NAME_CLIENT
  • AUTH_STRATEGY
  • FRONTEND_BASE_URL
  • KEYCLOAK_CALLBACK_URL
  • KEYCLOAK_CLIENT_ID
  • KEYCLOAK_DOMAIN
  • KEYCLOAK_REALM
  • KEYCLOAK_SSL_ENABLED
  • KEYCLOAK_TRUST_ISSUER
  • PUBLIC_BASE_URL
  • PUBLIC_RELEASE_CHANNEL

Redis

  • REDIS_DSN

Other

  • IMAGE_PULL_SECRETS
  • PRIVATE_KEY_SECRETS_ADAPTER_TYPE: azure
  • PRIVATE_KEY_SECRETS_ADAPTER_AZURE_REGION
  • SECRETS_ADAPTER_TYPE: azure
  • SECRETS_ADAPTER_AZURE_REGION

The preceding Secrets and ConfigMaps must be added to the following files:

| File name | Type | Resource name | Namespace | Data keys |
|---|---|---|---|---|
| data-broker-env-cm.yaml | ConfigMap | data-broker-env | api | JOB_STATUS_BUCKET_NAME, JOB_DB_BUCKET_NAME, BLOB_STORAGE_ADAPTER_TYPE |
| data-broker-env-secret.yaml | Secret | data-broker-env | api | BLOB_STORAGE_ADAPTER_ACCOUNT_NAME, BLOB_STORAGE_ADAPTER_ACCOUNT_KEY, BLOB_STORAGE_ADAPTER_CONTAINER_REGION |
| dataplane-api-env-cm.yaml | Secret | dataplane-api-env | api | DB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME |
| etl-operator-env-cm.yaml | ConfigMap | etl-operator-env | etl-operator | BLOB_STORAGE_ADAPTER_BUCKET, JOB_STATUS_BUCKET_NAME, JOB_DB_BUCKET_NAME, BLOB_STORAGE_ADAPTER_TYPE, ENV, ENVIRONMENT, REDIS_DSN, ETL_API_BLOB_STORAGE_ADAPTER_BUCKET, ETL_API_BLOB_STORAGE_ADAPTER_TYPE, ETL_API_DB_REMOTE_BUCKET_NAME, ETL_API_JOB_STATUS_DEST_BUCKET_NAME (x2), ETL_BLOB_CACHE_BUCKET_NAME, IMAGE_PULL_SECRETS, JOB_ENV, JOB_ENVIRONMENT, JOB_OTEL_EXPORTER_OTLP_ENDPOINT, JOB_OTEL_METRICS_EXPORTER, JOB_OTEL_TRACES_EXPORTER, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_METRICS_EXPORTER, OTEL_TRACES_EXPORTER, UNSTRUCTURED_API_URL |
| etl-operator-env-secret.yaml | Secret | etl-operator-env | etl-operator | BLOB_STORAGE_ADAPTER_ACCOUNT_NAME, BLOB_STORAGE_ADAPTER_ACCOUNT_KEY, BLOB_STORAGE_ADAPTER_CONTAINER_REGION |
| frontend-env-cm.yaml | ConfigMap | frontend-env | www | API_BASE_URL, API_CLIENT_BASE_URL, API_URL, APM_SERVICE_NAME, APM_SERVICE_NAME_CLIENT, AUTH_STRATEGY, ENV, FRONTEND_BASE_URL, KEYCLOAK_CALLBACK_URL, KEYCLOAK_CLIENT_ID, KEYCLOAK_DOMAIN, KEYCLOAK_REALM, KEYCLOAK_SSL_ENABLED, KEYCLOAK_TRUST_ISSUER, PUBLIC_BASE_URL, PUBLIC_RELEASE_CHANNEL, SENTRY_DSN, SENTRY_SAMPLE_RATE, WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM, CUSTOM_WORKFLOW_FF_REQUEST_FORM |
| frontend-env-secret.yaml | Secret | frontend-env | www | API_BEARER_TOKEN, KEYCLOAK_ADMIN_SECRET, KEYCLOAK_CLIENT_SECRET, SESSION_SECRET, SHARED_SECRET |
| keycloak-secret.yaml | Secret | phasetwo-keycloak-env | www | KEYCLOAK_ADMIN, KEYCLOAK_ADMIN_PASSWORD |
| platform-api-env-cm.yaml | ConfigMap | platform-api-env | api | JWKS_URL, JWT_ISSUER, JWT_AUDIENCE, SINGLE_PLANE_DEPLOYMENT |
| platform-api-env-secret.yaml | Secret | platform-api-env | api | DB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME, DB_DATABASE, JWT_SECRET_KEY, AUTH_STRATEGY |
| recommender-env-cm.yaml | ConfigMap | recommender-env | recommender | BLOB_STORAGE_ADAPTER_TYPE, ETL_BLOB_CACHE_BUCKET_NAME |
| recommender-env-secret.yaml | Secret | recommender-env | recommender | BLOB_STORAGE_ADAPTER_ACCOUNT_NAME, BLOB_STORAGE_ADAPTER_ACCOUNT_KEY, BLOB_STORAGE_ADAPTER_CONTAINER_REGION |
| secret-provider-api-env-cm.yaml | ConfigMap | secrets-provider-api-env | secrets | ENV, ENVIRONMENT, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_METRICS_EXPORTER, OTEL_TRACES_EXPORTER, PRIVATE_KEY_SECRETS_ADAPTER_AZURE_REGION, PRIVATE_KEY_SECRETS_ADAPTER_TYPE, SECRETS_ADAPTER_AZURE_REGION, SECRETS_ADAPTER_TYPE |
| secret-provider-api-env-secret.yaml | Secret | secrets-provider-api-env | secrets | BLOB_STORAGE_ADAPTER_ACCOUNT_NAME, BLOB_STORAGE_ADAPTER_ACCOUNT_KEY, BLOB_STORAGE_ADAPTER_CONTAINER_REGION |
| usage-collector-env-secret.yaml | Secret | usage-collector-env | api | DB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME, BLOB_STORAGE_ADAPTER_TYPE |

For example, for the data-broker-env-cm.yaml ConfigMap file, the contents would look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: data-broker-env
  namespace: api
data:
  JOB_STATUS_BUCKET_NAME: "<your-value>"
  JOB_DB_BUCKET_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_TYPE: "<your-value>"

The data-broker-env-secret.yaml Secret file would look like this:

apiVersion: v1
kind: Secret
metadata:
  name: data-broker-env
  namespace: api
type: Opaque
stringData:
  BLOB_STORAGE_ADAPTER_ACCOUNT_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_ACCOUNT_KEY: "<your-value>"
  BLOB_STORAGE_ADAPTER_CONTAINER_REGION: "<your-value>"
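
The remaining files follow the same pattern, with only the resource name, namespace, and data keys changing. As one more sketch, the keycloak-secret.yaml file, which the table lists as a Secret named phasetwo-keycloak-env in the www namespace, might look like the following. The type: Opaque field is assumed to match the preceding Secret example.

```yaml
# Sketch of keycloak-secret.yaml, built from the table row for this file.
apiVersion: v1
kind: Secret
metadata:
  name: phasetwo-keycloak-env
  namespace: www
type: Opaque
stringData:
  # stringData takes plain-text values; Kubernetes base64-encodes them on write.
  KEYCLOAK_ADMIN: "<your-value>"
  KEYCLOAK_ADMIN_PASSWORD: "<your-value>"
```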