AWS self-hosted onboarding checklist
To proceed with a self-hosted deployment, your organization must first sign a self-hosting agreement with Unstructured.
If you have not yet signed this agreement, stop here, and begin the self-hosting agreement process by contacting your Unstructured sales representative, emailing Unstructured Sales at sales@unstructured.io, or filling out the contact form on the Unstructured website.
After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you must first set up your AWS account as follows.
Questions? Need help?
If you have questions or need help as you go, contact your Unstructured sales representative or technical enablement contact. If you do not know who they are, email Unstructured Sales at sales@unstructured.io, or fill out the contact form on the Unstructured website, and a member of the Unstructured sales or technical enablement teams will get back to you as soon as possible.
Onboarding checklist
Set up the following infrastructure within your AWS account for Unstructured to deploy the Unstructured UI and API into.
VPC and networking
-
VPC
- CIDR:
10.0.0.0/16
- Any CIDR should work, but make sure it has enough space. - DNS Hostnames: Enabled
- DNS Support: Enabled
- CIDR:
-
Internet Gateway
- Attached to the VPC
-
Public Subnet
- CIDR:
10.0.0.0/24
- Public IP on launch: true
- Availability Zone:
${region}a
- CIDR:
-
NAT Gateway + Elastic IP
- Lives in the public subnet
-
Private Subnets (x2)
- CIDRs:
10.0.1.0/24
,10.0.2.0/24
- AZs:
${region}a
and${region}b
- CIDRs:
-
Route Tables
- Public: default route (
0.0.0.0/0
) via IGW - Private (x2): default route via NAT Gateway
- Public: default route (
IAM roles and policies
-
EKS Cluster Role
-
Trusts:
eks.amazonaws.com
-
Attached policies:
AmazonEKSClusterPolicy
AmazonEKSVPCResourceController
-
-
EKS Node Group Role
-
Trusts:
ec2.amazonaws.com
,eks.amazonaws.com
-
Attached policies:
AmazonEKSWorkerNodePolicy
AmazonEKS_CNI_Policy
AmazonEC2ContainerRegistryReadOnly
-
-
OIDC Service Account IAM Roles (x3)
- Namespaces:
recommender
,etl-operator
,data-broker
- Each role assumes via
sts:AssumeRoleWithWebIdentity
with OIDC provider - Each has an S3 policy allowing access to specific buckets (see below)
- Namespaces:
EKS cluster
-
EKS Control Plane
- Version:
1.31
or greater - Subnet: Private subnets only
- Public endpoint access: Enabled
- Private endpoint access: Disabled
- Version:
-
Node Group
- Instance type:
c5.4xlarge
(or larger, depending on cost factors) - Disk size: 100 GB
- Desired size: 2 (min 2, max 5)
- Remote SSH access: Enabled (with generated SSH key)
- SSH key: Key pair created and exported
- Instance type:
-
Security Groups
- EKS Cluster SG (implicitly created by AWS)
- Node SG: Allows all traffic within cluster CIDR (
10.0.0.0/16
), self, and metadata IP - Egress: Allows all
Kubernetes add-ons
Installed via aws.eks.Addon
:
-
EKS Pod Identity Agent
- Version:
v1.3.4-eksbuild.1
- Version:
-
Metrics Server
- Version:
v0.7.2-eksbuild.1
- Version:
-
EBS CSI Driver
-
Version:
v1.38.1-eksbuild.2
-
Configured with:
- Service account annotation:
eks.amazonaws.com/role-arn
- Pod identity access annotation
- Service account annotation:
-
Storage class
- Name:
ebs-sc
- Default: Yes
- Provisioner:
ebs.csi.aws.com
- Parameters:
type=gp3
,encrypted=true
- Volume Binding Mode:
WaitForFirstConsumer
RDS
-
RDS Subnet Group
- Uses the private subnets
-
RDS Instance
- Engine: Postgres 16
- Size:
db.t3.micro
- Allocated storage: 20 GB
- Auth: Setup a Username and Password, keep secure.
- Security group: Allows all traffic from
10.0.0.0/16
(keep in mind your CIDR group from the VPC) - DB name:
postgres
S3 buckets
u10d-{stack_name}-etl-blob-cache
u10d-{stack_name}-etl-job-db
u10d-{stack_name}-etl-job-status
u10d-{stack_name}-job-files
All created with:
- Versioning enabled
- Server-side encryption (AES256)
- Force destroy: true
Keys
-
SSH Key Pair (RSA 4096-bit)
- Key exported as
private_key
(PEM)
- Key exported as
Secrets and ConfigMaps
After your infrastructure is set up, but before Unstructured can deploy the Unstructured UI and API into your insfrastructure, Unstructured will need to know the values of the following Secrets and ConfigMaps. These must be provided to Unstructured as a set of YAML files in Kubernetes Secret and ConfigMap format.
The Secrets are as follows.
Blob storage credentials
BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID
BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY
BLOB_STORAGE_ADAPTER_REGION_NAME
Database credentials
DB_USERNAME
DB_PASSWORD
DB_HOST
DB_NAME
DB_DATABASE
(used inplatform-api
only)
Authentication
JWT_SECRET_KEY
AUTH_STRATEGY
(sometimes encoded, sometimes not)SESSION_SECRET
SHARED_SECRET
KEYCLOAK_CLIENT_SECRET
KEYCLOAK_ADMIN_SECRET
KEYCLOAK_ADMIN
KEYCLOAK_ADMIN_PASSWORD
API_BEARER_TOKEN
The ConfigMaps are as follows.
Blob storage settings
BLOB_STORAGE_ADAPTER_TYPE
(alwayss3
for AWS)BLOB_STORAGE_ADAPTER_BUCKET
ETL_BLOB_CACHE_BUCKET_NAME
ETL_API_BLOB_STORAGE_ADAPTER_BUCKET
ETL_API_BLOB_STORAGE_ADAPTER_TYPE
ETL_API_DB_REMOTE_BUCKET_NAME
ETL_API_JOB_STATUS_DEST_BUCKET_NAME
JOB_STATUS_BUCKET_NAME
JOB_DB_BUCKET_NAME
Environment
ENV
ENVIRONMENT
JOB_ENV
JOB_ENVIRONMENT
Observability and OpenTelemetry (OTel)
JOB_OTEL_EXPORTER_OTLP_ENDPOINT
JOB_OTEL_METRICS_EXPORTER
JOB_OTEL_TRACES_EXPORTER
OTEL_EXPORTER_OTLP_ENDPOINT
OTEL_METRICS_EXPORTER
OTEL_TRACES_EXPORTER
Unstructured API and authentication
UNSTRUCTURED_API_URL
JWKS_URL
JWT_ISSUER
JWT_AUDIENCE
SINGLE_PLANE_DEPLOYMENT
Front end and dashboard
API_BASE_URL
API_CLIENT_BASE_URL
API_URL
APM_SERVICE_NAME
APM_SERVICE_NAME_CLIENT
AUTH_STRATEGY
FRONTEND_BASE_URL
KEYCLOAK_CALLBACK_URL
KEYCLOAK_CLIENT_ID
KEYCLOAK_DOMAIN
KEYCLOAK_REALM
KEYCLOAK_SSL_ENABLED
KEYCLOAK_TRUST_ISSUER
PUBLIC_BASE_URL
PUBLIC_RELEASE_CHANNEL
Sentry and feature flags
SENTRY_DSN
SENTRY_SAMPLE_RATE
WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM
CUSTOM_WORKFLOW_FF_REQUEST_FORM
Redis
REDIS_DSN
Other
IMAGE_PULL_SECRETS
PRIVATE_KEY_SECRETS_ADAPTER_TYPE
PRIVATE_KEY_SECRETS_ADAPTER_AWS_REGION
SECRETS_ADAPTER_TYPE
SECRETS_ADAPTER_AWS_REGION
The preceding Secrets and ConfigMaps must be added to the following files:
File name | Type | Resource name | Namespace | Data keys |
---|---|---|---|---|
data-broker-env-cm.yaml | ConfigMap | data-broker-env | api | JOB_STATUS_BUCKET_NAME , JOB_DB_BUCKET_NAME , BLOB_STORAGE_ADAPTER_TYPE |
data-broker-env-secret.yaml | Secret | data-broker-env | api | BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID , BLOB_STORAGE_ADAPTER_REGION_NAME , BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY |
dataplane-api-env-cm.yaml | Secret | dataplane-api-env | api | DB_PASSWORD , DB_USERNAME , DB_HOST , DB_NAME |
etl-operator-env-cm.yaml | ConfigMap | etl-operator-env | etl-operator | BLOB_STORAGE_ADAPTER_BUCKET , JOB_STATUS_BUCKET_NAME , JOB_DB_BUCKET_NAME , BLOB_STORAGE_ADAPTER_TYPE , ENV , ENVIRONMENT , REDIS_DSN , ETL_API_BLOB_STORAGE_ADAPTER_BUCKET , ETL_API_BLOB_STORAGE_ADAPTER_TYPE , ETL_API_DB_REMOTE_BUCKET_NAME , ETL_API_JOB_STATUS_DEST_BUCKET_NAME (x2) , ETL_BLOB_CACHE_BUCKET_NAME , IMAGE_PULL_SECRETS , JOB_ENV , JOB_ENVIRONMENT , JOB_OTEL_EXPORTER_OTLP_ENDPOINT , JOB_OTEL_METRICS_EXPORTER , JOB_OTEL_TRACES_EXPORTER , OTEL_EXPORTER_OTLP_ENDPOINT , OTEL_METRICS_EXPORTER , OTEL_TRACES_EXPORTER , UNSTRUCTURED_API_URL |
etl-operator-env-secret.yaml | Secret | etl-operator-env | etl-operator | BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID , BLOB_STORAGE_ADAPTER_REGION_NAME , BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY |
frontend-env-cm.yaml | ConfigMap | frontend-env | www | API_BASE_URL , API_CLIENT_BASE_URL , API_URL , APM_SERVICE_NAME , APM_SERVICE_NAME_CLIENT , AUTH_STRATEGY , ENV , FRONTEND_BASE_URL , KEYCLOAK_CALLBACK_URL , KEYCLOAK_CLIENT_ID , KEYCLOAK_DOMAIN , KEYCLOAK_REALM , KEYCLOAK_SSL_ENABLED , KEYCLOAK_TRUST_ISSUER , PUBLIC_BASE_URL , PUBLIC_RELEASE_CHANNEL , SENTRY_DSN , SENTRY_SAMPLE_RATE , WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM , CUSTOM_WORKFLOW_FF_REQUEST_FORM |
frontend-env-secret.yaml | Secret | frontend-env | www | API_BEARER_TOKEN , KEYCLOAK_ADMIN_SECRET , KEYCLOAK_CLIENT_SECRET , SESSION_SECRET , SHARED_SECRET |
keycloak-secret.yaml | Secret | phasetwo-keycloak-env | www | KEYCLOAK_ADMIN , KEYCLOAK_ADMIN_PASSWORD |
platform-api-env-cm.yaml | ConfigMap | platform-api-env | api | JWKS_URL , JWT_ISSUER , JWT_AUDIENCE , SINGLE_PLANE_DEPLOYMENT |
platform-api-env-secret.yaml | Secret | platform-api-env | api | DB_PASSWORD , DB_USERNAME , DB_HOST , DB_NAME , DB_DATABASE , JWT_SECRET_KEY , AUTH_STRATEGY |
recommender-env-cm.yaml | ConfigMap | recommender-env | recommender | BLOB_STORAGE_ADAPTER_TYPE , ETL_BLOB_CACHE_BUCKET_NAME |
recommender-env-secret.yaml | Secret | recommender-env | recommender | BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID , BLOB_STORAGE_ADAPTER_REGION_NAME , BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY |
secret-provider-api-env-cm.yaml | ConfigMap | secrets-provider-api-env | secrets | ENV , ENVIRONMENT , OTEL_EXPORTER_OTLP_ENDPOINT , OTEL_METRICS_EXPORTER , OTEL_TRACES_EXPORTER , PRIVATE_KEY_SECRETS_ADAPTER_AWS_REGION , PRIVATE_KEY_SECRETS_ADAPTER_TYPE , SECRETS_ADAPTER_AWS_REGION , SECRETS_ADAPTER_TYPE |
secret-provider-api-env-secret.yaml | Secret | secrets-provider-api-env | secrets | BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID , BLOB_STORAGE_ADAPTER_REGION_NAME , BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY |
usage-collector-env-secret.yaml | Secret | usage-collector-env | api | DB_PASSWORD , DB_USERNAME , DB_HOST , DB_NAME , BLOB_STORAGE_ADAPTER_TYPE |
For example, for the etl-operator-env-cm.yaml
ConfigMap file, the contents would look like this:
For the etl-operator-env-secret.yaml
Secret file, the contents would look like this:
Was this page helpful?