You will need:

  1. An AWS account.

  2. IAM permissions: In a later step, AWS CloudFormation creates the required infrastructure in your account. To learn how to create the associated role, see Creating IAM Roles with AWS CloudFormation.

  3. SSH key pair: For secure access to the Amazon EC2 instance that CloudFormation creates in a later step, create an SSH key pair. To learn how, see Create a key pair for your Amazon EC2 instance.
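If you prefer the AWS CLI to the console, the key pair can also be created from a terminal. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials and a default Region for your account. The helper name `create_key_pair` is hypothetical. EC2 key pairs are Region-specific, so create the key pair in the same Region where you will deploy the stack.

```shell
# Hypothetical helper (not part of any AWS tooling): create an EC2 key pair
# and save its private key locally as <key-name>.pem.
# Assumes the AWS CLI is installed and configured for your target Region.
create_key_pair() {
  local key_name=$1
  # The private key material is returned only once, so write it to disk now.
  aws ec2 create-key-pair \
    --key-name "$key_name" \
    --query 'KeyMaterial' \
    --output text > "${key_name}.pem" || { rm -f "${key_name}.pem"; return 1; }
  # SSH refuses to use private keys that other users can read.
  chmod 400 "${key_name}.pem"
}

# Example: create_key_pair unstructured-api-key
```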

Part I: Setting up the Virtual Private Cloud (VPC)

Note: If you have already configured a Virtual Private Cloud (VPC) for your organization that meets the requirements for deploying the Unstructured API, you may skip this part and proceed to Part II. Ensure that your existing VPC setup includes the necessary subnets, internet gateway, and route tables as outlined in this guide.

In Part I, you will set up a Virtual Private Cloud (VPC) in AWS to provide a resilient and secure network for the deployment. Your VPC uses a two-tier subnet model, with both public and private subnets spread across multiple Availability Zones (AZs).

You will establish the foundational network structure for deploying the Unstructured API by creating two public subnets and one private subnet within your VPC. The public subnets host resources that need direct access to the internet, such as a load balancer, so that they can communicate with external users. The private subnet is for resources that should not be directly accessible from the internet, such as Amazon EC2 compute instances.
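Before creating anything, it can help to sanity-check the address plan. The following sketch uses only shell arithmetic (no AWS calls) to confirm that the example subnet CIDR blocks used in this guide (10.0.1.0/24, 10.0.2.0/24, and 10.0.3.0/24) fall inside the example VPC CIDR block 10.0.0.0/16 and do not overlap one another. The helper names `ip_to_int`, `cidr_range`, and `check_subnet_plan` are hypothetical.

```shell
# Hypothetical helpers for checking a subnet plan before creating it.
# Shell arithmetic only; no AWS calls are made.

# Convert a dotted IPv4 address (for example, 10.0.1.0) to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last addresses of a CIDR block as integers.
cidr_range() {
  local ip=${1%/*} prefix=${1#*/} base size
  base=$(ip_to_int "$ip")
  size=$(( 1 << (32 - prefix) ))
  base=$(( base / size * size ))  # round down to the network boundary
  echo "$base $(( base + size - 1 ))"
}

# Return 0 if every subnet CIDR lies inside the VPC CIDR and no two overlap.
# Usage: check_subnet_plan VPC_CIDR SUBNET_CIDR...
check_subnet_plan() {
  local vpc_range vpc_lo vpc_hi seen="" subnet range lo hi prev plo phi
  vpc_range=$(cidr_range "$1")
  shift
  vpc_lo=${vpc_range% *}
  vpc_hi=${vpc_range#* }
  for subnet in "$@"; do
    range=$(cidr_range "$subnet")
    lo=${range% *}
    hi=${range#* }
    if [ "$lo" -lt "$vpc_lo" ] || [ "$hi" -gt "$vpc_hi" ]; then
      echo "$subnet is outside the VPC CIDR block" >&2
      return 1
    fi
    # Two ranges overlap when each one starts before the other ends.
    for prev in $seen; do
      plo=${prev%-*}
      phi=${prev#*-}
      if [ "$lo" -le "$phi" ] && [ "$plo" -le "$hi" ]; then
        echo "$subnet overlaps another subnet" >&2
        return 1
      fi
    done
    seen="$seen $lo-$hi"
  done
}

# Example: check_subnet_plan 10.0.0.0/16 10.0.1.0/24 10.0.2.0/24 10.0.3.0/24 && echo "plan OK"
```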


  1. Access the VPC dashboard:

    a. In the AWS Management Console, in the top menu bar, click Services > Networking & Content Delivery > VPC.

    b. In the sidebar, click Your VPCs, and then click Create VPC.

  2. Create the VPC:

    a. Select VPC only.

    b. Enter a Name tag for your VPC.

    c. Specify the IPv4 CIDR block (for example, 10.0.0.0/16).

    d. You may leave IPv6 CIDR block, Tenancy, and Tags settings at their defaults.

    e. Click Create VPC.
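The same VPC can be created with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured for the Region you chose; the helper name `create_vpc` and the example name and CIDR block are illustrative.

```shell
# Hypothetical helper: create a VPC with a Name tag and print the new VPC ID.
# Assumes the AWS CLI is installed and configured for your target Region.
create_vpc() {
  local name=$1 cidr=$2
  aws ec2 create-vpc \
    --cidr-block "$cidr" \
    --tag-specifications "ResourceType=vpc,Tags=[{Key=Name,Value=$name}]" \
    --query 'Vpc.VpcId' \
    --output text
}

# Example: VPC_ID=$(create_vpc unstructured-vpc 10.0.0.0/16)
```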


  3. Create the subnets:

    a. After creating the VPC, in the sidebar, click Subnets.

    b. Click Create subnet.

    c. In the VPC ID dropdown menu, select the VPC that you just created.

    d. For the first public subnet:

    • Enter a Subnet name.

    • Select an Availability Zone.

    • Confirm the IPv4 VPC CIDR block (for example, 10.0.0.0/16).

    • Specify the IPv4 subnet CIDR block (for example, 10.0.1.0/24).

    • You may leave the Tags setting at its default.

    • Click Add new subnet. (Do not click Create subnet yet.)

    e. Repeat the process for the second public subnet with a different Availability Zone and IPv4 subnet CIDR block (for example, 10.0.2.0/24).

    • Note: Each subnet must reside entirely within one Availability Zone and cannot span zones. If you specify the same Availability Zone or IPv4 subnet CIDR block as the first public subnet, AWS CloudFormation might fail in a later step.

    • To learn more, see Subnet basics.

    • Click Add new subnet. (Do not click Create subnet yet.)

    f. Repeat the process for the private subnet with a different Availability Zone and IPv4 subnet CIDR block (for example, 10.0.3.0/24).

    • Note: Each subnet must reside entirely within one Availability Zone and cannot span zones. If you specify the same Availability Zone or IPv4 subnet CIDR block as the first or second public subnets, AWS CloudFormation might fail in a later step.

    g. Click Create subnet.
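With the AWS CLI, each subnet is created with a separate call. The following is a minimal sketch, assuming the AWS CLI is configured for your Region. The helper name `create_subnet` is hypothetical, and the Availability Zone names in the example comments (us-east-1a, and so on) are placeholders; use zones from your own Region.

```shell
# Hypothetical helper: create one subnet in a specific Availability Zone
# and print the new subnet ID.
# Assumes the AWS CLI is installed and configured for your target Region.
create_subnet() {
  local vpc_id=$1 name=$2 cidr=$3 az=$4
  aws ec2 create-subnet \
    --vpc-id "$vpc_id" \
    --cidr-block "$cidr" \
    --availability-zone "$az" \
    --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$name}]" \
    --query 'Subnet.SubnetId' \
    --output text
}

# Example, where VPC_ID holds the ID of the VPC from Step 2 - Create the VPC:
#   PUBLIC_SUBNET_1=$(create_subnet "$VPC_ID" public-1 10.0.1.0/24 us-east-1a)
#   PUBLIC_SUBNET_2=$(create_subnet "$VPC_ID" public-2 10.0.2.0/24 us-east-1b)
#   PRIVATE_SUBNET=$(create_subnet "$VPC_ID" private-1 10.0.3.0/24 us-east-1c)
```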


  4. Create the internet gateway (for the public subnets):

    a. In the sidebar, click Internet gateways.

    b. Click Create internet gateway, enter a Name tag, and click Create internet gateway.

    c. In the sidebar, click Internet gateways again.

    d. Click the Internet gateway ID for the internet gateway that you just created.

    e. Click Actions > Attach to VPC.

    f. In the Available VPCs dropdown list, select the VPC from Step 2 - Create the VPC.

    g. Click Attach internet gateway.
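Creating and attaching the internet gateway takes two AWS CLI calls. This is a minimal sketch, assuming the AWS CLI is configured for your Region; the helper name `create_and_attach_igw` is hypothetical.

```shell
# Hypothetical helper: create an internet gateway, attach it to a VPC,
# and print the internet gateway ID.
# Assumes the AWS CLI is installed and configured for your target Region.
create_and_attach_igw() {
  local vpc_id=$1 name=$2 igw_id
  igw_id=$(aws ec2 create-internet-gateway \
    --tag-specifications "ResourceType=internet-gateway,Tags=[{Key=Name,Value=$name}]" \
    --query 'InternetGateway.InternetGatewayId' \
    --output text) || return 1
  # A new internet gateway is detached; attach it so the VPC can use it.
  aws ec2 attach-internet-gateway \
    --internet-gateway-id "$igw_id" \
    --vpc-id "$vpc_id" || return 1
  echo "$igw_id"
}

# Example: IGW_ID=$(create_and_attach_igw "$VPC_ID" unstructured-igw)
```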


  5. Set up route tables (for the public subnets):

    AWS automatically created a main route table when you created the VPC in Step 2 - Create the VPC, and the subnets from Step 3 - Create the subnets are implicitly associated with it. To tailor your network architecture, you will create a new route table specifically for your public subnets, which will include a route to the internet gateway from Step 4 - Create the internet gateway (for the public subnets).

    a. In the sidebar, click Route tables.

    b. Click Create route table.

    c. Enter a Name.

    d. Select the VPC from Step 2 - Create the VPC.

    e. Click Create route table.
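The equivalent AWS CLI call is shown below as a minimal sketch, assuming the AWS CLI is configured for your Region. The helper name `create_route_table` is hypothetical.

```shell
# Hypothetical helper: create a custom route table in a VPC and print its ID.
# Assumes the AWS CLI is installed and configured for your target Region.
create_route_table() {
  local vpc_id=$1 name=$2
  aws ec2 create-route-table \
    --vpc-id "$vpc_id" \
    --tag-specifications "ResourceType=route-table,Tags=[{Key=Name,Value=$name}]" \
    --query 'RouteTable.RouteTableId' \
    --output text
}

# Example: RTB_ID=$(create_route_table "$VPC_ID" unstructured-public-rt)
```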


  6. Associate the public subnets with the route table and internet gateway:

    a. Connect the public subnets to the route table from Step 5 - Set up route tables (for the public subnets):

    • In the sidebar, click Subnets.

    • Select the first public subnet from Step 3 - Create the subnets.

    • Click Actions > Edit route table association.

    • In the Route table ID dropdown list, select the route table from Step 5 - Set up route tables (for the public subnets), and then click Save.

    • Repeat the process for the second public subnet.

    b. Now, you’ll ensure that the two public subnets can access the internet by connecting the route table to the internet gateway:

    • In the sidebar, click Route tables.

    • Select the route table from Step 5 - Set up route tables (for the public subnets).

    • Click Actions > Edit routes.

    • Click Add route. In the Destination box, enter 0.0.0.0/0, which represents all IPv4 addresses. In the Target dropdown list, select Internet Gateway, and then select the internet gateway from Step 4 - Create the internet gateway (for the public subnets).

    • Click Save changes to establish the route, granting internet access to the first and second public subnets at the same time.

    c. For the private subnet:

    • In the sidebar, click Subnets.

    • Select the private subnet from Step 3 - Create the subnets.

    • Click Actions > Edit route table association.

    • In the Route table ID dropdown list, select the main route table, or create and then select a new route table without a route to the internet gateway.

    • Click Save.
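The public-subnet wiring above can be sketched with the AWS CLI as follows, assuming the AWS CLI is configured for your Region. The helper name `make_public_route_table` is hypothetical. The sketch does not touch the private subnet: a subnet with no explicit association keeps using the VPC's main route table, which has no route to the internet gateway.

```shell
# Hypothetical helper: add a default route to the internet gateway and
# associate the route table with each given (public) subnet.
# Assumes the AWS CLI is installed and configured for your target Region.
make_public_route_table() {
  local rtb_id=$1 igw_id=$2 subnet_id
  shift 2
  # 0.0.0.0/0 matches all IPv4 traffic that the more specific local
  # VPC route does not, and sends it to the internet gateway.
  aws ec2 create-route \
    --route-table-id "$rtb_id" \
    --destination-cidr-block 0.0.0.0/0 \
    --gateway-id "$igw_id" > /dev/null || return 1
  for subnet_id in "$@"; do
    aws ec2 associate-route-table \
      --route-table-id "$rtb_id" \
      --subnet-id "$subnet_id" > /dev/null || return 1
  done
}

# Example: make_public_route_table "$RTB_ID" "$IGW_ID" "$PUBLIC_SUBNET_1" "$PUBLIC_SUBNET_2"
```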


  7. Inspect the VPC resource map:

    You can check the configuration in the VPC resource map: in the sidebar, click Your VPCs, click the VPC ID for your VPC, and then click the Resource map tab.
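From a terminal, a rough text equivalent of the resource map is to list the VPC's subnets and route tables. This is a minimal sketch, assuming the AWS CLI is configured for your Region; the helper name `show_vpc_layout` is hypothetical.

```shell
# Hypothetical helper: print the subnets and route tables of one VPC.
# Assumes the AWS CLI is installed and configured for your target Region.
show_vpc_layout() {
  local vpc_id=$1
  # One row per subnet: ID, Availability Zone, and CIDR block.
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$vpc_id" \
    --query 'Subnets[].[SubnetId,AvailabilityZone,CidrBlock]' \
    --output table || return 1
  # Per route table: explicitly associated subnets and (destination, gateway) routes.
  aws ec2 describe-route-tables \
    --filters "Name=vpc-id,Values=$vpc_id" \
    --query 'RouteTables[].{RouteTable:RouteTableId,Subnets:Associations[].SubnetId,Routes:Routes[].[DestinationCidrBlock,GatewayId]}' \
    --output json
}

# Example: show_vpc_layout "$VPC_ID"
```

Subnets that rely on an implicit association with the main route table (such as the private subnet) do not appear in any route table's Subnets list.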


Part II: Deploying the Unstructured API from the AWS Marketplace

  8. Go to the Unstructured API page on AWS Marketplace:

    a. Go to the Unstructured API product page in the AWS Marketplace.

    b. Click Continue to Subscribe.

    c. Review the terms and conditions.

    d. Click Continue to Configuration.


  9. Configure the CloudFormation template:

    a. In the Fulfillment option dropdown list, select CloudFormation Template.

    b. Under Fulfillment option, leave the default UnstructuredAPI template selected, and leave the default Software version.

    c. In the Region dropdown list, select the Region that corresponds to the VPC from Part I.

    • Note: You must select the same Region where you set up the VPC in Part I.

    d. Click Continue to Launch.

    e. In the Choose Action dropdown list, select Launch CloudFormation.

    f. Click Launch.


  10. Create the CloudFormation stack:

After you click Launch, the Create stack page appears in CloudFormation.

Step 1: Create the stack

a. Leave Choose an existing template selected.

b. Leave Amazon S3 URL selected and the default Amazon S3 URL value unchanged.

c. Click Next.
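If you want to see which parameters the template expects before filling in the next page, you can ask CloudFormation to validate it. This is a minimal sketch, assuming the AWS CLI is configured and your credentials can read the template; the template URL is whatever appears in the Amazon S3 URL field, and the helper name `list_template_parameters` is hypothetical.

```shell
# Hypothetical helper: list parameter names and defaults declared by a template.
# Assumes the AWS CLI is installed and configured, and that the template URL
# (copied from the Amazon S3 URL field) is readable with your credentials.
list_template_parameters() {
  local template_url=$1
  aws cloudformation validate-template \
    --template-url "$template_url" \
    --query 'Parameters[].[ParameterKey,DefaultValue]' \
    --output table
}

# Example: list_template_parameters "$TEMPLATE_URL"
```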


Step 2: Specify the stack’s details

a. Enter a unique Stack name.

b. In the Parameters section, for KeyName, select the name of the SSH key pair from the beginning of this article.

c. In the LoadBalancerScheme dropdown list, select internet-facing.

d. For SSHLocation, enter 0.0.0.0/0 only if you want to allow SSH access from any IP address on the internet.

  • Note: For better security, limit SSH access to a specific IP range by setting SSHLocation to the IP address or CIDR range associated with your organization. Consult your IT department or VPN vendor to obtain the correct IP information for this setting.

  • AWS provides AWS Client VPN, a managed client-based VPN service that enables you to securely access AWS resources and resources in your on-premises network. To learn more, see Getting started with AWS Client VPN.

e. In the Subnets multiselect list, select the two public subnets and the private subnet from Part I.

f. In the VPC dropdown list, select the VPC from Part I.

g. You can leave the default values for all of the other Parameters fields.

h. Click Next.
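If you restrict SSH access and connect directly from your own machine (not through a corporate proxy or VPN), one way to find a value for SSHLocation is to look up your current public IPv4 address and use it as a single-address /32 range. This is a minimal sketch; it assumes curl is installed and uses the AWS checkip.amazonaws.com service. The helper name `my_ssh_location` is hypothetical. If you are behind a VPN or corporate network, confirm the correct range with your IT department instead.

```shell
# Hypothetical helper: print your current public IPv4 address as a /32 CIDR
# range, for use as the SSHLocation parameter value.
my_ssh_location() {
  local ip
  # checkip.amazonaws.com responds with the caller's public IP address.
  ip=$(curl -fsS https://checkip.amazonaws.com) || return 1
  # $(...) strips the trailing newline from the response.
  echo "${ip}/32"
}

# Example: my_ssh_location
```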


Step 3: Configure the stack’s options

a. You can leave the default values, or specify any non-default stack options.

b. Click Next.


Step 4: Review

a. Review the stack’s settings.

b. Click Submit.
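Steps 1 through 4 can also be approximated with a single `aws cloudformation create-stack` call. The following is a minimal sketch under several assumptions. The AWS CLI is configured for the Region from Part I. The template URL is the one shown in the Amazon S3 URL field. The parameter keys (KeyName, LoadBalancerScheme, SSHLocation, Subnets, VPC) are taken from the field labels above, so confirm the exact keys in the template before relying on them. Subnets is passed as a comma-separated list, which assumes it is a list-type parameter. The helper names are hypothetical. If CloudFormation rejects the request because the template requires a capability such as CAPABILITY_IAM, add the `--capabilities` option.

```shell
# Hypothetical helper: write a CloudFormation parameters file (JSON).
# Parameter keys are assumed from the console field labels in Step 2.
write_stack_params() {
  local out=$1 key_name=$2 ssh_location=$3 vpc_id=$4 subnets=$5
  cat > "$out" <<EOF
[
  {"ParameterKey": "KeyName", "ParameterValue": "$key_name"},
  {"ParameterKey": "LoadBalancerScheme", "ParameterValue": "internet-facing"},
  {"ParameterKey": "SSHLocation", "ParameterValue": "$ssh_location"},
  {"ParameterKey": "VPC", "ParameterValue": "$vpc_id"},
  {"ParameterKey": "Subnets", "ParameterValue": "$subnets"}
]
EOF
}

# Hypothetical helper: create the stack and print its stack ID.
create_unstructured_stack() {
  local stack_name=$1 template_url=$2 params_file=$3
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-url "$template_url" \
    --parameters "file://$params_file" \
    --query 'StackId' \
    --output text
}

# Example:
#   write_stack_params params.json my-key 203.0.113.7/32 "$VPC_ID" "subnet-a,subnet-b,subnet-c"
#   create_unstructured_stack unstructured-api "$TEMPLATE_URL" params.json
```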

  11. Get the Unstructured API endpoint:

a. Check the status of the CloudFormation stack. A successful deployment shows a CREATE_COMPLETE status on the Stack info tab. The deployment can take several minutes.

b. Click the Resources tab, and then click the ApplicationLoadBalancer link.

c. On the EC2 > Load balancers > (Load balancer ID) page, copy the DNS name value, which is labeled (A Record) and ends with .elb.amazonaws.com.

  • Note: Use this DNS name in place of <application-load-balancer-dns-name> in the following health check and data processing steps.
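The same lookup can be scripted: wait for the stack to finish, find the load balancer resource, and read its DNS name. This is a minimal sketch, assuming the AWS CLI is configured for the stack's Region and that the load balancer's logical ID in the template is ApplicationLoadBalancer (the name shown on the Resources tab). The helper name `get_api_dns_name` is hypothetical.

```shell
# Hypothetical helper: wait for stack creation, then print the DNS name of the
# stack's Application Load Balancer.
# Assumes the AWS CLI is configured and the load balancer's logical ID is
# ApplicationLoadBalancer.
get_api_dns_name() {
  local stack_name=$1 alb_arn
  # Polls until CREATE_COMPLETE; returns non-zero if creation fails or the wait times out.
  aws cloudformation wait stack-create-complete --stack-name "$stack_name" || return 1
  # For a load balancer resource, the physical resource ID is its ARN.
  alb_arn=$(aws cloudformation describe-stack-resource \
    --stack-name "$stack_name" \
    --logical-resource-id ApplicationLoadBalancer \
    --query 'StackResourceDetail.PhysicalResourceId' \
    --output text) || return 1
  aws elbv2 describe-load-balancers \
    --load-balancer-arns "$alb_arn" \
    --query 'LoadBalancers[0].DNSName' \
    --output text
}

# Example: ALB_DNS=$(get_api_dns_name unstructured-api)
```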


Healthcheck

Perform a health check by running this curl command, replacing <application-load-balancer-dns-name> with your application load balancer’s DNS name:

curl http://<application-load-balancer-dns-name>/healthcheck
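Right after the stack reaches CREATE_COMPLETE, the load balancer may still be waiting for its targets to pass health checks, so a single request can fail. The following sketch retries the same health check until it returns HTTP 200. It checks only the status code and makes no assumption about the response body. The helper name `wait_for_healthy` and its default attempt count and delay are hypothetical choices.

```shell
# Hypothetical helper: poll /healthcheck until it returns HTTP 200.
# Usage: wait_for_healthy DNS_NAME [ATTEMPTS] [DELAY_SECONDS]
wait_for_healthy() {
  local host=$1 attempts=${2:-30} delay=${3:-10} i=1 code
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the HTTP status code; it is 000 if the connection fails.
    code=$(curl -s -o /dev/null -w '%{http_code}' "http://$host/healthcheck")
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    echo "attempt $i: HTTP $code, retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: wait_for_healthy <application-load-balancer-dns-name>
```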


Data processing

You can process data by sending requests with curl. For example, run the following command, replacing:

  • <application-load-balancer-dns-name> with your application load balancer’s DNS name.
  • <path/to/input-file> with the path on your local machine to a file to process. If you do not have any input files available, you can download any of the files from the example-docs folder on GitHub.
  • <path/to/output-file> with the path on your local machine where the processed output should be saved in JSON format (the command appends the .json extension).

curl -q -X 'POST' http://<application-load-balancer-dns-name>/general/v0/general \
     -H 'accept: application/json' \
     -H 'Content-Type: multipart/form-data' \
     -F files=@<path/to/input-file> \
     -o <path/to/output-file>.json
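If you run this command often, you can wrap it in a small function that sends one file per request, saves the JSON next to the input file, and fails loudly on HTTP errors. The following is a minimal sketch. The helper name `process_file` and the `<input-file>.json` naming are hypothetical choices. The sketch omits the explicit Content-Type header because curl sets multipart/form-data (with its boundary) automatically when you use -F.

```shell
# Hypothetical helper: POST one file to the Unstructured API and save the
# JSON response as <input-file>.json. Prints the output path on success.
# Usage: process_file DNS_NAME INPUT_FILE
process_file() {
  local host=$1 input=$2 output="$2.json"
  # --fail makes curl exit non-zero on HTTP errors instead of saving an error page.
  curl -sS --fail -X POST "http://$host/general/v0/general" \
    -H 'accept: application/json' \
    -F "files=@$input" \
    -o "$output" || return 1
  echo "$output"
}

# Example: process_file <application-load-balancer-dns-name> example-docs/fake-text.txt
```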


Unstructured does not recommend using POST requests like this one to process multiple files at a time. Instead, use the Unstructured CLI or the Unstructured Python SDK with their provided source connectors and destination connectors.