To proceed with a self-hosted deployment, your organization must first sign a self-hosting agreement with Unstructured.

If you have not yet signed this agreement, stop here, and begin the self-hosting agreement process by contacting your Unstructured sales representative, emailing Unstructured Sales at sales@unstructured.io, or filling out the contact form on the Unstructured website.

After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you must first set up your AWS account as follows.

Questions? Need help?

If you have questions or need help as you go, contact your Unstructured sales representative or technical enablement contact. If you do not know who they are, email Unstructured Sales at sales@unstructured.io, or fill out the contact form on the Unstructured website, and a member of the Unstructured sales or technical enablement teams will get back to you as soon as possible.

Onboarding checklist

Set up the following infrastructure within your AWS account for Unstructured to deploy the Unstructured UI and API into.

VPC and networking

  • VPC

    • CIDR: 10.0.0.0/16 - Any CIDR should work, but make sure it has enough space.
    • DNS Hostnames: Enabled
    • DNS Support: Enabled
  • Internet Gateway

    • Attached to the VPC
  • Public Subnet

    • CIDR: 10.0.0.0/24
    • Public IP on launch: true
    • Availability Zone: ${region}a
  • NAT Gateway + Elastic IP

    • Lives in the public subnet
  • Private Subnets (x2)

    • CIDRs: 10.0.1.0/24, 10.0.2.0/24
    • AZs: ${region}a and ${region}b
  • Route Tables

    • Public: default route (0.0.0.0/0) via IGW
    • Private (x2): default route via NAT Gateway

IAM roles and policies

  • EKS Cluster Role

    • Trusts: eks.amazonaws.com

    • Attached policies:

      • AmazonEKSClusterPolicy
      • AmazonEKSVPCResourceController
  • EKS Node Group Role

    • Trusts: ec2.amazonaws.com, eks.amazonaws.com

    • Attached policies:

      • AmazonEKSWorkerNodePolicy
      • AmazonEKS_CNI_Policy
      • AmazonEC2ContainerRegistryReadOnly
  • OIDC Service Account IAM Roles (x3)

    • Namespaces: recommender, etl-operator, data-broker
    • Each role assumes via sts:AssumeRoleWithWebIdentity with OIDC provider
    • Each has an S3 policy allowing access to specific buckets (see below)

EKS cluster

  • EKS Control Plane

    • Version: 1.31 or greater
    • Subnet: Private subnets only
    • Public endpoint access: Enabled
    • Private endpoint access: Disabled
  • Node Group

    • Instance type: c5.4xlarge (or larger, depending on cost factors)
    • Disk size: 100 GB
    • Desired size: 2 (min 2, max 5)
    • Remote SSH access: Enabled (with generated SSH key)
    • SSH key: Key pair created and exported
  • Security Groups

    • EKS Cluster SG (implicitly created by AWS)
    • Node SG: Allows all traffic within cluster CIDR (10.0.0.0/16), self, and metadata IP
    • Egress: Allows all

Kubernetes add-ons

Installed via aws.eks.Addon:

  • EKS Pod Identity Agent

    • Version: v1.3.4-eksbuild.1
  • Metrics Server

    • Version: v0.7.2-eksbuild.1
  • EBS CSI Driver

    • Version: v1.38.1-eksbuild.2

    • Configured with:

      • Service account annotation: eks.amazonaws.com/role-arn
      • Pod identity access annotation

Storage class

  • Name: ebs-sc
  • Default: Yes
  • Provisioner: ebs.csi.aws.com
  • Parameters: type=gp3, encrypted=true
  • Volume Binding Mode: WaitForFirstConsumer

RDS

  • RDS Subnet Group

    • Uses the private subnets
  • RDS Instance

    • Engine: Postgres 16
    • Size: db.t3.micro
    • Allocated storage: 20 GB
    • Auth: Setup a Username and Password, keep secure.
    • Security group: Allows all traffic from 10.0.0.0/16 (keep in mind your CIDR group from the VPC)
    • DB name: postgres

S3 buckets

  • u10d-{stack_name}-etl-blob-cache
  • u10d-{stack_name}-etl-job-db
  • u10d-{stack_name}-etl-job-status
  • u10d-{stack_name}-job-files

All created with:

  • Versioning enabled
  • Server-side encryption (AES256)
  • Force destroy: true

Keys

  • SSH Key Pair (RSA 4096-bit)

    • Key exported as private_key (PEM)

Secrets and ConfigMaps

After your infrastructure is set up, but before Unstructured can deploy the Unstructured UI and API into your insfrastructure, Unstructured will need to know the values of the following Secrets and ConfigMaps. These must be provided to Unstructured as a set of YAML files in Kubernetes Secret and ConfigMap format.

The Secrets are as follows.

Blob storage credentials

  • BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID
  • BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY
  • BLOB_STORAGE_ADAPTER_REGION_NAME

Database credentials

  • DB_USERNAME
  • DB_PASSWORD
  • DB_HOST
  • DB_NAME
  • DB_DATABASE (used in platform-api only)

Authentication

  • JWT_SECRET_KEY
  • AUTH_STRATEGY (sometimes encoded, sometimes not)
  • SESSION_SECRET
  • SHARED_SECRET
  • KEYCLOAK_CLIENT_SECRET
  • KEYCLOAK_ADMIN_SECRET
  • KEYCLOAK_ADMIN
  • KEYCLOAK_ADMIN_PASSWORD
  • API_BEARER_TOKEN

The ConfigMaps are as follows.

Blob storage settings

  • BLOB_STORAGE_ADAPTER_TYPE (always s3 for AWS)
  • BLOB_STORAGE_ADAPTER_BUCKET
  • ETL_BLOB_CACHE_BUCKET_NAME
  • ETL_API_BLOB_STORAGE_ADAPTER_BUCKET
  • ETL_API_BLOB_STORAGE_ADAPTER_TYPE
  • ETL_API_DB_REMOTE_BUCKET_NAME
  • ETL_API_JOB_STATUS_DEST_BUCKET_NAME
  • JOB_STATUS_BUCKET_NAME
  • JOB_DB_BUCKET_NAME

Environment

  • ENV
  • ENVIRONMENT
  • JOB_ENV
  • JOB_ENVIRONMENT

Observability and OpenTelemetry (OTel)

  • JOB_OTEL_EXPORTER_OTLP_ENDPOINT
  • JOB_OTEL_METRICS_EXPORTER
  • JOB_OTEL_TRACES_EXPORTER
  • OTEL_EXPORTER_OTLP_ENDPOINT
  • OTEL_METRICS_EXPORTER
  • OTEL_TRACES_EXPORTER

Unstructured API and authentication

  • UNSTRUCTURED_API_URL
  • JWKS_URL
  • JWT_ISSUER
  • JWT_AUDIENCE
  • SINGLE_PLANE_DEPLOYMENT

Front end and dashboard

  • API_BASE_URL
  • API_CLIENT_BASE_URL
  • API_URL
  • APM_SERVICE_NAME
  • APM_SERVICE_NAME_CLIENT
  • AUTH_STRATEGY
  • FRONTEND_BASE_URL
  • KEYCLOAK_CALLBACK_URL
  • KEYCLOAK_CLIENT_ID
  • KEYCLOAK_DOMAIN
  • KEYCLOAK_REALM
  • KEYCLOAK_SSL_ENABLED
  • KEYCLOAK_TRUST_ISSUER
  • PUBLIC_BASE_URL
  • PUBLIC_RELEASE_CHANNEL

Sentry and feature flags

  • SENTRY_DSN
  • SENTRY_SAMPLE_RATE
  • WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM
  • CUSTOM_WORKFLOW_FF_REQUEST_FORM

Redis

  • REDIS_DSN

Other

  • IMAGE_PULL_SECRETS
  • PRIVATE_KEY_SECRETS_ADAPTER_TYPE
  • PRIVATE_KEY_SECRETS_ADAPTER_AWS_REGION
  • SECRETS_ADAPTER_TYPE
  • SECRETS_ADAPTER_AWS_REGION

The preceding Secrets and ConfigMaps must be added to the following files:

File nameTypeResource nameNamespaceData keys
data-broker-env-cm.yamlConfigMapdata-broker-envapiJOB_STATUS_BUCKET_NAME, JOB_DB_BUCKET_NAME, BLOB_STORAGE_ADAPTER_TYPE
data-broker-env-secret.yamlSecretdata-broker-envapi BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID, BLOB_STORAGE_ADAPTER_REGION_NAME, BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY
dataplane-api-env-cm.yamlSecretdataplane-api-envapi DB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME
etl-operator-env-cm.yamlConfigMapetl-operator-envetl-operator BLOB_STORAGE_ADAPTER_BUCKET, JOB_STATUS_BUCKET_NAME, JOB_DB_BUCKET_NAME, BLOB_STORAGE_ADAPTER_TYPE, ENV, ENVIRONMENT, REDIS_DSN, ETL_API_BLOB_STORAGE_ADAPTER_BUCKET, ETL_API_BLOB_STORAGE_ADAPTER_TYPE, ETL_API_DB_REMOTE_BUCKET_NAME, ETL_API_JOB_STATUS_DEST_BUCKET_NAME (x2), ETL_BLOB_CACHE_BUCKET_NAME, IMAGE_PULL_SECRETS, JOB_ENV, JOB_ENVIRONMENT, JOB_OTEL_EXPORTER_OTLP_ENDPOINT, JOB_OTEL_METRICS_EXPORTER, JOB_OTEL_TRACES_EXPORTER, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_METRICS_EXPORTER, OTEL_TRACES_EXPORTER, UNSTRUCTURED_API_URL
etl-operator-env-secret.yamlSecretetl-operator-envetl-operator BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID, BLOB_STORAGE_ADAPTER_REGION_NAME, BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY
frontend-env-cm.yamlConfigMapfrontend-envwwwAPI_BASE_URL, API_CLIENT_BASE_URL, API_URL, APM_SERVICE_NAME, APM_SERVICE_NAME_CLIENT, AUTH_STRATEGY, ENV, FRONTEND_BASE_URL, KEYCLOAK_CALLBACK_URL, KEYCLOAK_CLIENT_ID, KEYCLOAK_DOMAIN, KEYCLOAK_REALM, KEYCLOAK_SSL_ENABLED, KEYCLOAK_TRUST_ISSUER, PUBLIC_BASE_URL, PUBLIC_RELEASE_CHANNEL, SENTRY_DSN, SENTRY_SAMPLE_RATE, WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM, CUSTOM_WORKFLOW_FF_REQUEST_FORM
frontend-env-secret.yamlSecretfrontend-envwwwAPI_BEARER_TOKEN, KEYCLOAK_ADMIN_SECRET, KEYCLOAK_CLIENT_SECRET, SESSION_SECRET, SHARED_SECRET
keycloak-secret.yamlSecretphasetwo-keycloak-envwwwKEYCLOAK_ADMIN, KEYCLOAK_ADMIN_PASSWORD
platform-api-env-cm.yamlConfigMapplatform-api-envapiJWKS_URL, JWT_ISSUER, JWT_AUDIENCE, SINGLE_PLANE_DEPLOYMENT
platform-api-env-secret.yamlSecretplatform-api-envapiDB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME, DB_DATABASE, JWT_SECRET_KEY, AUTH_STRATEGY
recommender-env-cm.yamlConfigMaprecommender-envrecommenderBLOB_STORAGE_ADAPTER_TYPE, ETL_BLOB_CACHE_BUCKET_NAME
recommender-env-secret.yamlSecretrecommender-envrecommenderBLOB_STORAGE_ADAPTER_ACCESS_KEY_ID, BLOB_STORAGE_ADAPTER_REGION_NAME, BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY
secret-provider-api-env-cm.yamlConfigMapsecrets-provider-api-envsecretsENV, ENVIRONMENT, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_METRICS_EXPORTER, OTEL_TRACES_EXPORTER, PRIVATE_KEY_SECRETS_ADAPTER_AWS_REGION, PRIVATE_KEY_SECRETS_ADAPTER_TYPE, SECRETS_ADAPTER_AWS_REGION, SECRETS_ADAPTER_TYPE
secret-provider-api-env-secret.yamlSecretsecrets-provider-api-envsecretsBLOB_STORAGE_ADAPTER_ACCESS_KEY_ID, BLOB_STORAGE_ADAPTER_REGION_NAME, BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY
usage-collector-env-secret.yamlSecretusage-collector-envapiDB_PASSWORD, DB_USERNAME, DB_HOST, DB_NAME, BLOB_STORAGE_ADAPTER_TYPE

For example, for the etl-operator-env-cm.yaml ConfigMap file, the contents would look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: data-broker-env
  namespace: api
data:
  JOB_STATUS_BUCKET_NAME: "<your-value>"
  JOB_DB_BUCKET_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_TYPE: "<your-value>"

For the etl-operator-env-secret.yaml Secret file, the contents would look like this:

apiVersion: v1
kind: Secret
metadata:
  name: data-broker-env
  namespace: api
type: Opaque
stringData:
  BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID: "<your-value>"
  BLOB_STORAGE_ADAPTER_REGION_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY: "<your-value>"