> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unstructured Business in-VPC on Amazon Web Services (AWS) - onboarding checklist

<Note>
  The following information applies only to in-VPC deployments of [Unstructured Business](/business/overview).

  For dedicated instance deployments of Unstructured Business, contact your Unstructured sales representative,
  or email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

  For dedicated instance deployments of Unstructured Business that also use PrivateLink, see
  the [onboarding process for AWS PrivateLink](/business/aws/dedicated-instance-privatelink).
</Note>

After your organization has signed the **Business** account agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:

* [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure.
* [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your AWS account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.

## Questions? Need help?

If you have questions or need help as you go, contact your Unstructured sales representative or technical enablement contact. If you do not know who they are,
email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io), and a member of the Unstructured sales or technical enablement teams
will get back to you as soon as possible.

## Do it all for me

If you want Unstructured to set up the required infrastructure for you in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with
the access credentials for an IAM user or service principal in your AWS account that has the following required permissions.

### Core networking permissions

For VPC and subnet management:

* `ec2:CreateVpc`
* `ec2:CreateSubnet`
* `ec2:CreateRouteTable`
* `ec2:CreateInternetGateway`
* `ec2:CreateNatGateway`
* `ec2:ModifyVpcAttribute` (for DNS settings)
* `ec2:AssociateRouteTable`, `ec2:CreateRoute` (for public and private route tables)
* `ec2:AllocateAddress` (for Elastic IP assignment to the NAT Gateway)

For security group rules:

* `ec2:AuthorizeSecurityGroupIngress/Egress` (to configure cluster and node security groups to allow VPC CIDR traffic)

### EKS permissions

For the cluster role:

* Attach the managed policies `AmazonEKSClusterPolicy` and `AmazonEKSVPCResourceController` to a role with `sts:AssumeRole` trust for `eks.amazonaws.com`

For the node group role:

Attach these managed policies:

* `AmazonEKSWorkerNodePolicy` (for node operations)
* `AmazonEKS_CNI_Policy` (for networking)
* `AmazonEC2ContainerRegistryReadOnly` (for ECR access)

For OIDC integration:

* `iam:CreateOpenIDConnectProvider` (to associate the EKS cluster with IAM OIDC)
* `iam:CreateRole` + `iam:AttachRolePolicy` (for service accounts in the `recommender`, `etl-operator`, and `data-broker` namespaces)

### Storage and database

These permissions:

* `s3:CreateBucket`
* `s3:PutBucketVersioning`
* `s3:PutBucketEncryption`

For these S3 buckets:

* `u10d-*-etl-blob-cache`
* `u10d-*-etl-job-db`
* `u10d-*-etl-job-status`
* `u10d-*-job-files`

For RDS:

* `rds:CreateDBInstance`
* `rds:CreateDBSubnetGroup`
* `rds:CreateDBSecurityGroup` + `ec2:AuthorizeSecurityGroupIngress` (to allow VPC CIDR access)

### Add-ons and utilities

For the EBS CSI Driver:

* `eks:CreateAddon` with IAM role attachment permissions for the `ebs.csi.aws.com` service account

For the SSH Key:

* `ec2:CreateKeyPair` + `ec2:ExportKeyPair` (for node group remote access)

### Cross-service requirements

* For IAM: `iam:PassRole` (to assign roles to EKS, RDS, and S3)
* For KMS: `kms:CreateKey` (if using CMK for S3 and RDS encryption)
* For CloudFormation: `cloudformation:*`

For least privilege, scope resource ARNs in policies (for example, restrict S3 bucket names with wildcards such as `u10d-*-etl*`).
The EKS Pod Identity Agent requires `eks-auth:AssumeRoleForPodIdentity` permission on node roles when used with IRSA.

## Bring my own infrastructure

If you want to set up the required infrastructure yourself, set things up as follows within your AWS account for Unstructured to deploy the Unstructured UI and API into.

You must also provide your Unstructured sales representative or technical enablement contact with
the access credentials for an IAM user or service principal in your AWS account that has access to the target Amazon Elastic Kubernetes Service (EKS) cluster to deploy the
Unstructured UI and API into.

### VPC and networking

* **VPC**

  * CIDR: `10.0.0.0/16` - Any CIDR should work, but make sure it has enough space.
  * DNS Hostnames: Enabled
  * DNS Support: Enabled

* **Internet Gateway**

  * Attached to the VPC

* **Public Subnet**

  * CIDR: `10.0.0.0/24`
  * Public IP on launch: true
  * Availability Zone: `${region}a`

* **NAT Gateway + Elastic IP**

  * Lives in the public subnet

* **Private Subnets (x2)**

  * CIDRs: `10.0.1.0/24`, `10.0.2.0/24`
  * AZs: `${region}a` and `${region}b`

* **Route Tables**

  * Public: default route (`0.0.0.0/0`) via IGW
  * Private (x2): default route via NAT Gateway

### **IAM roles and policies**

* **EKS Cluster Role**

  * Trusts: `eks.amazonaws.com`
  * Attached policies:

    * `AmazonEKSClusterPolicy`
    * `AmazonEKSVPCResourceController`

* **EKS Node Group Role**

  * Trusts: `ec2.amazonaws.com`, `eks.amazonaws.com`
  * Attached policies:

    * `AmazonEKSWorkerNodePolicy`
    * `AmazonEKS_CNI_Policy`
    * `AmazonEC2ContainerRegistryReadOnly`

* **OIDC Service Account IAM Roles (x3)**

  * Namespaces: `recommender`, `etl-operator`, `data-broker`
  * Each role assumes via `sts:AssumeRoleWithWebIdentity` with OIDC provider
  * Each has an S3 policy allowing access to specific buckets (see below)

### **EKS cluster**

* **EKS Control Plane**

  * Version: `1.31` or greater
  * Subnet: Private subnets only
  * Public endpoint access: Enabled
  * Private endpoint access: Disabled

* **Node Group**

  * Instance type: `c5.4xlarge` (or larger, depending on cost factors)
  * Disk size: 100 GB
  * Desired size: 2 (min 2, max 5)
  * Remote SSH access: Enabled (with generated SSH key)
  * SSH key: Key pair created and exported

* **Security Groups**

  * EKS Cluster SG (implicitly created by AWS)
  * Node SG: Allows all traffic within cluster CIDR (`10.0.0.0/16`), self, and metadata IP
  * Egress: Allows all

#### **Kubernetes add-ons**

Installed via `aws.eks.Addon`:

* **EKS Pod Identity Agent**

  * Version: `v1.3.4-eksbuild.1`

* **Metrics Server**

  * Version: `v0.7.2-eksbuild.1`

* **EBS CSI Driver**

  * Version: `v1.38.1-eksbuild.2`
  * Configured with:

    * Service account annotation: `eks.amazonaws.com/role-arn`
    * Pod identity access annotation

#### **Storage class**

* Name: `ebs-sc`
* Default: Yes
* Provisioner: `ebs.csi.aws.com`
* Parameters: `type=gp3`, `encrypted=true`
* Volume Binding Mode: `WaitForFirstConsumer`

### **RDS**

* **RDS Subnet Group**

  * Uses the private subnets

* **RDS Instance**

  * Engine: Postgres 16
  * Size: `db.t3.micro`
  * Allocated storage: 20 GB
  * Auth: Setup a Username and Password, keep secure.
  * Security group: Allows all traffic from `10.0.0.0/16` (keep in mind your CIDR group from the VPC)
  * DB name: `postgres`

### **S3 buckets**

* `u10d-{stack_name}-etl-blob-cache`
* `u10d-{stack_name}-etl-job-db`
* `u10d-{stack_name}-etl-job-status`
* `u10d-{stack_name}-job-files`

All created with:

* Versioning enabled
* Server-side encryption (AES256)
* Force destroy: true

### **Keys**

* **SSH Key Pair** (RSA 4096-bit)

  * Key exported as `private_key` (PEM)

### Secrets and ConfigMaps

After your infrastructure is set up, but before Unstructured can deploy the Unstructured UI and API into your insfrastructure,
Unstructured will need to know the values of the following Secrets and ConfigMaps. These must be provided to Unstructured as a
set of YAML files in Kubernetes [Secret](https://kubernetes.io/docs/concepts/configuration/secret/) and
[ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) format.

The Secrets are as follows.

#### **Blob storage credentials**

* `BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`
* `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`
* `BLOB_STORAGE_ADAPTER_REGION_NAME`

#### **Database credentials**

* `DB_USERNAME`
* `DB_PASSWORD`
* `DB_HOST`
* `DB_NAME`
* `DB_DATABASE` (used in `platform-api` only)

#### **Authentication**

* `JWT_SECRET_KEY`
* `AUTH_STRATEGY` (sometimes encoded, sometimes not)
* `SESSION_SECRET`
* `SHARED_SECRET`
* `KEYCLOAK_CLIENT_SECRET`
* `KEYCLOAK_ADMIN_SECRET`
* `KEYCLOAK_ADMIN`
* `KEYCLOAK_ADMIN_PASSWORD`
* `API_BEARER_TOKEN`

The ConfigMaps are as follows.

#### **Blob storage settings**

* `BLOB_STORAGE_ADAPTER_TYPE` (always `s3` for AWS)
* `BLOB_STORAGE_ADAPTER_BUCKET`
* `ETL_BLOB_CACHE_BUCKET_NAME`
* `ETL_API_BLOB_STORAGE_ADAPTER_BUCKET`
* `ETL_API_BLOB_STORAGE_ADAPTER_TYPE`
* `ETL_API_DB_REMOTE_BUCKET_NAME`
* `ETL_API_JOB_STATUS_DEST_BUCKET_NAME`
* `JOB_STATUS_BUCKET_NAME`
* `JOB_DB_BUCKET_NAME`

#### **Environment**

* `ENV`
* `ENVIRONMENT`
* `JOB_ENV`
* `JOB_ENVIRONMENT`

#### **Observability and OpenTelemetry (OTel)**

* `JOB_OTEL_EXPORTER_OTLP_ENDPOINT`
* `JOB_OTEL_METRICS_EXPORTER`
* `JOB_OTEL_TRACES_EXPORTER`
* `OTEL_EXPORTER_OTLP_ENDPOINT`
* `OTEL_METRICS_EXPORTER`
* `OTEL_TRACES_EXPORTER`

#### **Unstructured API and authentication**

* `UNSTRUCTURED_API_URL`
* `JWKS_URL`
* `JWT_ISSUER`
* `JWT_AUDIENCE`
* `SINGLE_PLANE_DEPLOYMENT`

#### **Front end and dashboard**

* `API_BASE_URL`
* `API_CLIENT_BASE_URL`
* `API_URL`
* `APM_SERVICE_NAME`
* `APM_SERVICE_NAME_CLIENT`
* `AUTH_STRATEGY`
* `FRONTEND_BASE_URL`
* `KEYCLOAK_CALLBACK_URL`
* `KEYCLOAK_CLIENT_ID`
* `KEYCLOAK_DOMAIN`
* `KEYCLOAK_REALM`
* `KEYCLOAK_SSL_ENABLED`
* `KEYCLOAK_TRUST_ISSUER`
* `PUBLIC_BASE_URL`
* `PUBLIC_RELEASE_CHANNEL`

#### **Sentry and feature flags**

* `SENTRY_DSN`
* `SENTRY_SAMPLE_RATE`
* `WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM`
* `CUSTOM_WORKFLOW_FF_REQUEST_FORM`

#### **Redis**

* `REDIS_DSN`

#### **Other**

* `IMAGE_PULL_SECRETS`
* `PRIVATE_KEY_SECRETS_ADAPTER_TYPE`
* `PRIVATE_KEY_SECRETS_ADAPTER_AWS_REGION`
* `SECRETS_ADAPTER_TYPE`
* `SECRETS_ADAPTER_AWS_REGION`

The preceding Secrets and ConfigMaps must be added to the following files:

| File name                             | Type      | Resource name              | Namespace      | Data keys                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| ------------------------------------- | --------- | -------------------------- | -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `data-broker-env-cm.yaml`             | ConfigMap | `data-broker-env`          | `api`          | `JOB_STATUS_BUCKET_NAME`, `JOB_DB_BUCKET_NAME`, `BLOB_STORAGE_ADAPTER_TYPE`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `data-broker-env-secret.yaml`         | Secret    | `data-broker-env`          | `api`          | ` BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`, `BLOB_STORAGE_ADAPTER_REGION_NAME`, `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `dataplane-api-env-cm.yaml`           | Secret    | `dataplane-api-env`        | `api`          | ` DB_PASSWORD`, `DB_USERNAME`, `DB_HOST`, `DB_NAME`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `etl-operator-env-cm.yaml`            | ConfigMap | `etl-operator-env`         | `etl-operator` | ` BLOB_STORAGE_ADAPTER_BUCKET`, `JOB_STATUS_BUCKET_NAME`, `JOB_DB_BUCKET_NAME`, `BLOB_STORAGE_ADAPTER_TYPE`, `ENV`, `ENVIRONMENT`, `REDIS_DSN`, `ETL_API_BLOB_STORAGE_ADAPTER_BUCKET`, `ETL_API_BLOB_STORAGE_ADAPTER_TYPE`, `ETL_API_DB_REMOTE_BUCKET_NAME`, `ETL_API_JOB_STATUS_DEST_BUCKET_NAME (x2)`, `ETL_BLOB_CACHE_BUCKET_NAME`, `IMAGE_PULL_SECRETS`, `JOB_ENV`, `JOB_ENVIRONMENT`, `JOB_OTEL_EXPORTER_OTLP_ENDPOINT`, `JOB_OTEL_METRICS_EXPORTER`, `JOB_OTEL_TRACES_EXPORTER`, `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_METRICS_EXPORTER`, `OTEL_TRACES_EXPORTER`, `UNSTRUCTURED_API_URL` |
| `etl-operator-env-secret.yaml`        | Secret    | `etl-operator-env`         | `etl-operator` | ` BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`, `BLOB_STORAGE_ADAPTER_REGION_NAME`, `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `frontend-env-cm.yaml`                | ConfigMap | `frontend-env`             | `www`          | `API_BASE_URL`, `API_CLIENT_BASE_URL`, `API_URL`, `APM_SERVICE_NAME`, `APM_SERVICE_NAME_CLIENT`, `AUTH_STRATEGY`, `ENV`, `FRONTEND_BASE_URL`, `KEYCLOAK_CALLBACK_URL`, `KEYCLOAK_CLIENT_ID`, `KEYCLOAK_DOMAIN`, `KEYCLOAK_REALM`, `KEYCLOAK_SSL_ENABLED`, `KEYCLOAK_TRUST_ISSUER`, `PUBLIC_BASE_URL`, `PUBLIC_RELEASE_CHANNEL`, `SENTRY_DSN`, `SENTRY_SAMPLE_RATE`, `WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM`, `CUSTOM_WORKFLOW_FF_REQUEST_FORM`                                                                                                                                                 |
| `frontend-env-secret.yaml`            | Secret    | `frontend-env`             | `www`          | `API_BEARER_TOKEN`, `KEYCLOAK_ADMIN_SECRET`, `KEYCLOAK_CLIENT_SECRET`, `SESSION_SECRET`, `SHARED_SECRET`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `keycloak-secret.yaml`                | Secret    | `phasetwo-keycloak-env`    | `www`          | `KEYCLOAK_ADMIN`, `KEYCLOAK_ADMIN_PASSWORD`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `platform-api-env-cm.yaml`            | ConfigMap | `platform-api-env`         | `api`          | `JWKS_URL`, `JWT_ISSUER`, `JWT_AUDIENCE`, `SINGLE_PLANE_DEPLOYMENT`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `platform-api-env-secret.yaml`        | Secret    | `platform-api-env`         | `api`          | `DB_PASSWORD`, `DB_USERNAME`, `DB_HOST`, `DB_NAME`, `DB_DATABASE`, `JWT_SECRET_KEY`, `AUTH_STRATEGY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `recommender-env-cm.yaml`             | ConfigMap | `recommender-env`          | `recommender`  | `BLOB_STORAGE_ADAPTER_TYPE`, `ETL_BLOB_CACHE_BUCKET_NAME`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `recommender-env-secret.yaml`         | Secret    | `recommender-env`          | `recommender`  | `BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`, `BLOB_STORAGE_ADAPTER_REGION_NAME`, `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| `secret-provider-api-env-cm.yaml`     | ConfigMap | `secrets-provider-api-env` | `secrets`      | `ENV`, `ENVIRONMENT`, `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_METRICS_EXPORTER`, `OTEL_TRACES_EXPORTER`, `PRIVATE_KEY_SECRETS_ADAPTER_AWS_REGION`, `PRIVATE_KEY_SECRETS_ADAPTER_TYPE`, `SECRETS_ADAPTER_AWS_REGION`, `SECRETS_ADAPTER_TYPE`                                                                                                                                                                                                                                                                                                                                                      |
| `secret-provider-api-env-secret.yaml` | Secret    | `secrets-provider-api-env` | `secrets`      | `BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`, `BLOB_STORAGE_ADAPTER_REGION_NAME`, `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| `usage-collector-env-secret.yaml`     | Secret    | `usage-collector-env`      | `api`          | `DB_PASSWORD`, `DB_USERNAME`, `DB_HOST`, `DB_NAME`, `BLOB_STORAGE_ADAPTER_TYPE`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |

For example, for the `etl-operator-env-cm.yaml` [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) file, the contents would look like this:

```yaml  theme={null}
apiVersion: v1
kind: ConfigMap
metadata:
  name: data-broker-env
  namespace: api
data:
  JOB_STATUS_BUCKET_NAME: "<your-value>"
  JOB_DB_BUCKET_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_TYPE: "<your-value>"
```

For the `etl-operator-env-secret.yaml` [Secret](https://kubernetes.io/docs/concepts/configuration/secret/) file, the contents would look like this:

```yaml  theme={null}
apiVersion: v1
kind: Secret
metadata:
  name: data-broker-env
  namespace: api
type: Opaque
stringData:
  BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID: "<your-value>"
  BLOB_STORAGE_ADAPTER_REGION_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY: "<your-value>"
```
