To proceed with a self-hosted deployment, your organization must first sign a self-hosting agreement with Unstructured.

If you have not yet signed this agreement, stop here, and begin the self-hosting agreement process by contacting your Unstructured sales representative, emailing Unstructured Sales at sales@unstructured.io, or filling out the contact form on the Unstructured website.

After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you must first set up your GCP account as follows.

Questions? Need help?

If you have questions or need help as you go, contact your Unstructured sales representative or technical enablement contact. If you do not know who they are, email Unstructured Sales at sales@unstructured.io, or fill out the contact form on the Unstructured website, and a member of the Unstructured sales or technical enablement teams will get back to you as soon as possible.

Onboarding checklist

Set up the following infrastructure within your GCP account so that Unstructured can deploy the Unstructured UI and API into it.

VPC and networking (GCP equivalent)

  • VPC Network

    • Name: u10d-platform
    • Subnet Mode: Custom
    • CIDR: 10.0.0.0/16
    • DNS: Internal DNS supported by default
  • Internet Gateway

    • GCP provides implicit internet access through the default internet gateway (no need to create one explicitly)

  • Public Subnet

    • Subnet: public-subnet, 10.0.0.0/24
    • Region: ${region}
    • Enable external IPs on VM instances for internet access
  • NAT Gateway

    • Use Cloud NAT attached to a Cloud Router in public subnet
    • Needed to provide egress internet access to private subnet instances
  • Private Subnets (x2)

    • private-subnet-a: 10.0.1.0/24, region ${region}-a
    • private-subnet-b: 10.0.2.0/24, region ${region}-b
  • Routes

    • Public subnet: default route 0.0.0.0/0 to Internet Gateway (via external IPs)
    • Private subnets: route 0.0.0.0/0 via Cloud NAT

IAM roles and policies

  • GKE Cluster IAM Service Account

    • Grant roles:

      • roles/container.clusterAdmin
      • roles/compute.networkAdmin
  • GKE Node IAM Service Account

    • Grant roles:

      • roles/container.nodeServiceAccount
      • roles/compute.viewer
      • roles/storage.objectViewer
  • Workload Identity IAM Bindings (x3)

    • Namespaces: recommender, etl-operator, data-broker
    • Use Workload Identity Federation
    • Bind GCP IAM Service Accounts to Kubernetes service accounts
    • Grant roles/storage.objectAdmin for access to GCS buckets
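
On the Kubernetes side, each of these bindings is usually expressed as an annotation on the Kubernetes service account. The following is a minimal sketch for the recommender namespace only. The service account names and the <your-project-id> placeholder are hypothetical, not values confirmed by Unstructured. The etl-operator and data-broker namespaces would follow the same pattern.

```yaml
# Hypothetical example: a Kubernetes service account linked to a GCP IAM service account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: recommender            # assumed name; use the one your deployment expects
  namespace: recommender
  annotations:
    # GKE reads this annotation to map the Kubernetes service account to the IAM service account.
    iam.gke.io/gcp-service-account: "u10d-recommender@<your-project-id>.iam.gserviceaccount.com"
```

For the mapping to take effect, the IAM service account also needs roles/iam.workloadIdentityUser granted to the member serviceAccount:<your-project-id>.svc.id.goog[recommender/recommender], where the bracketed part is the namespace and the Kubernetes service account name.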

GKE cluster

  • Control Plane

    • Version: 1.31 or higher
    • Private Cluster: Enabled
    • Master Authorized Networks: your IP(s)
    • Enable Public Endpoint Access: Yes
  • Node Pool

    • Machine type: n2-standard-16
    • Disk: 100GB, SSD (default boot disk)
    • Node count: min 2, max 5, autoscaling enabled
    • SSH access: via OS Login + SSH keys
    • SSH key: Add public key to instance metadata
  • Firewall Rules

    • Allow:

      • Internal: 10.0.0.0/16
      • Egress: all
    • Kubernetes master access to nodes

Kubernetes add-ons (installed via kubectl or Helm)

  • Workload Identity Config

  • Metrics Server

    • Deployed manually (version: v0.7.2)
  • GCP CSI Driver

    • Provisioner: pd.csi.storage.gke.io
    • Role binding needed for controller SA

Storage class

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
 name: pd-ssd
 annotations:
  storageclass.kubernetes.io/is-default-class: "true"
provisioner: pd.csi.storage.gke.io
parameters:
 type: pd-ssd
 encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
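
Workloads then request disks from this class through a PersistentVolumeClaim. The following is a minimal sketch in which the claim name, namespace, and size are placeholders chosen for illustration, not values required by Unstructured.

```yaml
# Hypothetical claim that provisions a disk through the pd-ssd StorageClass above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data           # placeholder name
  namespace: default           # placeholder namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: pd-ssd     # may be omitted because pd-ssd is marked as the default class
  resources:
    requests:
      storage: 10Gi            # placeholder size
```

Because the class uses volumeBindingMode: WaitForFirstConsumer, a claim like this stays Pending until a pod that mounts it is scheduled.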

Cloud SQL (Postgres)

  • Private IP-enabled Cloud SQL instance

    • Engine: Postgres 16
    • Size: db-f1-micro (or db-custom-1-3840)
    • Storage: 20GB
    • Credentials: Username/password
    • Private network: Use the private VPC
  • Cloud SQL Auth Proxy or private VPC peering to connect from GKE

GCS buckets

  • Buckets:

    • u10d-{stack_name}-etl-blob-cache
    • u10d-{stack_name}-etl-job-db
    • u10d-{stack_name}-etl-job-status
    • u10d-{stack_name}-job-files
  • Config:

    • Versioning: Enabled
    • Encryption: Default (Google-managed key or CMEK if needed)
    • Lifecycle rule: Auto-delete / force destroy if needed (optional)

Keys

  • SSH Key Pair

    • Generate manually (ssh-keygen -t rsa -b 4096)
    • Upload public key to project metadata or OS Login
    • Export private key as PEM for automation

Secrets and ConfigMaps

After your infrastructure is set up, but before Unstructured can deploy the Unstructured UI and API into your infrastructure, Unstructured will need to know the values of the following Secrets and configuration mappings (also known as ConfigMaps).

The Secrets are as follows.

Blob storage credentials

  • BLOB_STORAGE_ADAPTER_GCP_SERVICE_ACCOUNT_KEY_JSON
  • BLOB_STORAGE_ADAPTER_REGION_NAME

Database credentials

  • DB_USERNAME
  • DB_PASSWORD
  • DB_HOST
  • DB_NAME
  • DB_DATABASE (used in platform-api only)

Authentication

  • JWT_SECRET_KEY
  • AUTH_STRATEGY (sometimes encoded, sometimes not)
  • SESSION_SECRET
  • SHARED_SECRET
  • KEYCLOAK_CLIENT_SECRET
  • KEYCLOAK_ADMIN_SECRET
  • KEYCLOAK_ADMIN
  • KEYCLOAK_ADMIN_PASSWORD
  • API_BEARER_TOKEN

The ConfigMaps are as follows.

Blob storage settings

  • BLOB_STORAGE_ADAPTER_TYPE (always gcp for GCP)
  • BLOB_STORAGE_ADAPTER_BUCKET
  • ETL_BLOB_CACHE_BUCKET_NAME
  • ETL_API_BLOB_STORAGE_ADAPTER_BUCKET
  • ETL_API_BLOB_STORAGE_ADAPTER_TYPE
  • ETL_API_DB_REMOTE_BUCKET_NAME
  • ETL_API_JOB_STATUS_DEST_BUCKET_NAME
  • JOB_STATUS_BUCKET_NAME
  • JOB_DB_BUCKET_NAME

Environment

  • ENV
  • ENVIRONMENT
  • JOB_ENV
  • JOB_ENVIRONMENT

Observability and OpenTelemetry (OTel)

  • JOB_OTEL_EXPORTER_OTLP_ENDPOINT
  • JOB_OTEL_METRICS_EXPORTER
  • JOB_OTEL_TRACES_EXPORTER
  • OTEL_EXPORTER_OTLP_ENDPOINT
  • OTEL_METRICS_EXPORTER
  • OTEL_TRACES_EXPORTER

Unstructured API and authentication

  • UNSTRUCTURED_API_URL
  • JWKS_URL
  • JWT_ISSUER
  • JWT_AUDIENCE
  • SINGLE_PLANE_DEPLOYMENT

Front end and dashboard

  • API_BASE_URL
  • API_CLIENT_BASE_URL
  • API_URL
  • APM_SERVICE_NAME
  • APM_SERVICE_NAME_CLIENT
  • AUTH_STRATEGY
  • FRONTEND_BASE_URL
  • KEYCLOAK_CALLBACK_URL
  • KEYCLOAK_CLIENT_ID
  • KEYCLOAK_DOMAIN
  • KEYCLOAK_REALM
  • KEYCLOAK_SSL_ENABLED
  • KEYCLOAK_TRUST_ISSUER
  • PUBLIC_BASE_URL
  • PUBLIC_RELEASE_CHANNEL

Sentry and feature flags

  • SENTRY_DSN
  • SENTRY_SAMPLE_RATE
  • WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM
  • CUSTOM_WORKFLOW_FF_REQUEST_FORM

Redis

  • REDIS_DSN

Other

  • IMAGE_PULL_SECRETS
  • PRIVATE_KEY_SECRETS_ADAPTER_TYPE
  • PRIVATE_KEY_SECRETS_ADAPTER_GCP_REGION
  • SECRETS_ADAPTER_TYPE
  • SECRETS_ADAPTER_GCP_REGION

The preceding Secrets and ConfigMaps must be added to the following files:

| File name | Type | Resource name | Namespace | Data keys |
| --- | --- | --- | --- | --- |
| data-broker-env-cm.yaml | ConfigMap | data-broker-env | api | JOB_STATUS_BUCKET_NAME, JOB_DB_BUCKET_NAME, BLOB_STORAGE_ADAPTER_TYPE |
| data-broker-env-secret.yaml | Secret | data-broker-env | api | BLOB_STORAGE_ADAPTER_GCP_SERVICE_ACCOUNT_KEY_JSON, BLOB_STORAGE_ADAPTER_REGION_NAME |
| dataplane-api-env-cm.yaml | Secret | dataplane-api-env | api | DB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME |
| etl-operator-env-cm.yaml | ConfigMap | etl-operator-env | etl-operator | BLOB_STORAGE_ADAPTER_BUCKET, JOB_STATUS_BUCKET_NAME, JOB_DB_BUCKET_NAME, BLOB_STORAGE_ADAPTER_TYPE, ENV, ENVIRONMENT, REDIS_DSN, ETL_API_BLOB_STORAGE_ADAPTER_BUCKET, ETL_API_BLOB_STORAGE_ADAPTER_TYPE, ETL_API_DB_REMOTE_BUCKET_NAME, ETL_API_JOB_STATUS_DEST_BUCKET_NAME (x2), ETL_BLOB_CACHE_BUCKET_NAME, IMAGE_PULL_SECRETS, JOB_ENV, JOB_ENVIRONMENT, JOB_OTEL_EXPORTER_OTLP_ENDPOINT, JOB_OTEL_METRICS_EXPORTER, JOB_OTEL_TRACES_EXPORTER, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_METRICS_EXPORTER, OTEL_TRACES_EXPORTER, UNSTRUCTURED_API_URL |
| etl-operator-env-secret.yaml | Secret | etl-operator-env | etl-operator | BLOB_STORAGE_ADAPTER_GCP_SERVICE_ACCOUNT_KEY_JSON, BLOB_STORAGE_ADAPTER_REGION_NAME |
| frontend-env-cm.yaml | ConfigMap | frontend-env | www | API_BASE_URL, API_CLIENT_BASE_URL, API_URL, APM_SERVICE_NAME, APM_SERVICE_NAME_CLIENT, AUTH_STRATEGY, ENV, FRONTEND_BASE_URL, KEYCLOAK_CALLBACK_URL, KEYCLOAK_CLIENT_ID, KEYCLOAK_DOMAIN, KEYCLOAK_REALM, KEYCLOAK_SSL_ENABLED, KEYCLOAK_TRUST_ISSUER, PUBLIC_BASE_URL, PUBLIC_RELEASE_CHANNEL, SENTRY_DSN, SENTRY_SAMPLE_RATE, WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM, CUSTOM_WORKFLOW_FF_REQUEST_FORM |
| frontend-env-secret.yaml | Secret | frontend-env | www | API_BEARER_TOKEN, KEYCLOAK_ADMIN_SECRET, KEYCLOAK_CLIENT_SECRET, SESSION_SECRET, SHARED_SECRET |
| keycloak-secret.yaml | Secret | phasetwo-keycloak-env | www | KEYCLOAK_ADMIN, KEYCLOAK_ADMIN_PASSWORD |
| platform-api-env-cm.yaml | ConfigMap | platform-api-env | api | JWKS_URL, JWT_ISSUER, JWT_AUDIENCE, SINGLE_PLANE_DEPLOYMENT |
| platform-api-env-secret.yaml | Secret | platform-api-env | api | DB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME, DB_DATABASE, JWT_SECRET_KEY, AUTH_STRATEGY |
| recommender-env-cm.yaml | ConfigMap | recommender-env | recommender | BLOB_STORAGE_ADAPTER_TYPE, ETL_BLOB_CACHE_BUCKET_NAME |
| recommender-env-secret.yaml | Secret | recommender-env | recommender | BLOB_STORAGE_ADAPTER_GCP_SERVICE_ACCOUNT_KEY_JSON, BLOB_STORAGE_ADAPTER_REGION_NAME |
| secret-provider-api-env-cm.yaml | ConfigMap | secrets-provider-api-env | secrets | ENV, ENVIRONMENT, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_METRICS_EXPORTER, OTEL_TRACES_EXPORTER, PRIVATE_KEY_SECRETS_ADAPTER_GCP_REGION, PRIVATE_KEY_SECRETS_ADAPTER_TYPE, SECRETS_ADAPTER_GCP_REGION, SECRETS_ADAPTER_TYPE |
| secret-provider-api-env-secret.yaml | Secret | secrets-provider-api-env | secrets | BLOB_STORAGE_ADAPTER_GCP_SERVICE_ACCOUNT_KEY_JSON, BLOB_STORAGE_ADAPTER_REGION_NAME |
| usage-collector-env-secret.yaml | Secret | usage-collector-env | api | DB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME, BLOB_STORAGE_ADAPTER_TYPE |

For example, for the data-broker-env-cm.yaml ConfigMap file, the contents would look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: data-broker-env
  namespace: api
data:
  JOB_STATUS_BUCKET_NAME: "<your-value>"
  JOB_DB_BUCKET_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_TYPE: "<your-value>"

For the data-broker-env-secret.yaml Secret file, the contents would look like this:

apiVersion: v1
kind: Secret
metadata:
  name: data-broker-env
  namespace: api
type: Opaque
stringData:
  BLOB_STORAGE_ADAPTER_GCP_SERVICE_ACCOUNT_KEY_JSON: "<your-value>"
  BLOB_STORAGE_ADAPTER_REGION_NAME: "<your-value>"
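
The other files in the table follow the same two patterns. As a further sketch, here is what the recommender pair might look like. The value gcp comes from the note above that BLOB_STORAGE_ADAPTER_TYPE is always gcp for GCP. Mapping ETL_BLOB_CACHE_BUCKET_NAME to the u10d-{stack_name}-etl-blob-cache bucket is an assumption based on the bucket names in the checklist, so confirm it with your technical enablement contact.

```yaml
# recommender-env-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: recommender-env
  namespace: recommender
data:
  BLOB_STORAGE_ADAPTER_TYPE: "gcp"
  # Assumed mapping to the blob cache bucket; replace <stack_name> with your stack name.
  ETL_BLOB_CACHE_BUCKET_NAME: "u10d-<stack_name>-etl-blob-cache"
---
# recommender-env-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: recommender-env
  namespace: recommender
type: Opaque
stringData:
  BLOB_STORAGE_ADAPTER_GCP_SERVICE_ACCOUNT_KEY_JSON: "<your-value>"
  BLOB_STORAGE_ADAPTER_REGION_NAME: "<your-value>"
```

Using stringData lets you supply the values as plain text. Kubernetes base64-encodes them into the data field when the Secret is created.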