
Batch process all your records to store structured outputs in an S3 bucket.

You will need:

The Amazon S3 prerequisites:

  • An AWS account. Create an AWS account.

  • An S3 bucket. Create an S3 bucket.

  • Anonymous or authenticated access to the bucket.

  • For authenticated bucket read access, the authenticated AWS IAM user must have at least the s3:ListBucket and s3:GetObject permissions for that bucket. Learn how.

  • For bucket write access, authenticated access to the bucket must be enabled (anonymous access must not be enabled), and the authenticated AWS IAM user must have at least the s3:PutObject permission for that bucket. Learn how.

  • For authenticated access, an AWS access key and secret access key for the authenticated AWS IAM user in the account. Create an AWS access key and secret access key.

  • For authenticated access in untrusted environments or enhanced security scenarios, an AWS STS session token for temporary access, in addition to an AWS access key and secret access key. Create a session token.

  • If the target files are in the root of the bucket, the path to the bucket, formatted as protocol://bucket/ (for example, s3://my-bucket/). If the target files are in a folder, the path to the target folder in the S3 bucket, formatted as protocol://bucket/path/to/folder/ (for example, s3://my-bucket/my-folder/).

  • If the target files are in a folder, and authenticated bucket access is enabled, make sure the authenticated AWS IAM user has authenticated access to the folder as well. Enable authenticated folder access.
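
The permissions listed above can be expressed as an IAM policy document. The following is a minimal sketch, not an official policy: the helper name build_bucket_policy and the bucket name my-bucket are hypothetical, and your organization might require tighter scoping (for example, limiting the object resource to one folder).

```python
import json


def build_bucket_policy(bucket: str, allow_write: bool = False) -> dict:
    """Return a minimal IAM policy document for one S3 bucket.

    Hypothetical helper for illustration; not part of any Unstructured library.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"

    # s3:GetObject covers reads; s3:PutObject is added only for write access.
    object_actions = ["s3:GetObject"]
    if allow_write:
        object_actions.append("s3:PutObject")

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions target the objects inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-bucket" is a placeholder bucket name.
    print(json.dumps(build_bucket_policy("my-bucket", allow_write=True), indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the authenticated IAM user. Set allow_write=True when the bucket is used as a destination.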

The S3 connector dependencies:

For both the CLI and Python:
pip install "unstructured-ingest[s3]"

You might also need to install additional dependencies, depending on your needs. Learn more.

The following environment variables:

  • AWS_S3_URL - The path to the S3 bucket or folder, formatted as s3://my-bucket/ (if the files are in the bucket’s root) or s3://my-bucket/my-folder/.

  • If the bucket does not have anonymous access enabled, provide the AWS credentials:

    • AWS_ACCESS_KEY_ID - The AWS access key ID for the authenticated AWS IAM user, represented by --key (CLI) or key (Python).
    • AWS_SECRET_ACCESS_KEY - The corresponding AWS secret access key, represented by --secret (CLI) or secret (Python).
    • AWS_TOKEN - If required, the AWS STS session token for temporary access, represented by --token (CLI) or token (Python).
  • If the bucket has anonymous access enabled for reading from the bucket, set --anonymous (CLI) or anonymous=True (Python) instead.
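
The choice between authenticated and anonymous access described above can be sketched in Python as follows. This is an illustration only: the function name s3_settings and the "url" dictionary key are hypothetical, and the sketch does not call the Unstructured Ingest library. The key, secret, token, and anonymous names mirror the Python parameters listed above.

```python
import os
from typing import Mapping, Optional


def s3_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect S3 settings from environment variables.

    Hypothetical helper for illustration; not part of any Unstructured library.
    """
    env = os.environ if env is None else env

    url = env.get("AWS_S3_URL", "")
    if not url.startswith("s3://") or len(url) <= len("s3://"):
        raise ValueError("AWS_S3_URL must look like s3://my-bucket/ or s3://my-bucket/my-folder/")
    if not url.endswith("/"):
        # Match the documented format, which ends with a trailing slash.
        url += "/"

    settings = {"url": url}  # "url" is a placeholder key name
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")

    if key and secret:
        settings["key"] = key
        settings["secret"] = secret
        token = env.get("AWS_TOKEN")
        if token:
            # The STS session token is only needed for temporary credentials.
            settings["token"] = token
    else:
        # No credentials: fall back to anonymous access, which covers reads only.
        settings["anonymous"] = True

    return settings
```

Because bucket write access requires authenticated access, the anonymous branch is only suitable when reading from the bucket.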

The Unstructured API environment variables:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.
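
Before running a batch, it can help to confirm that these variables are set. The following is a minimal sketch of such a preflight check; the function name missing_variables is hypothetical and is not part of the Unstructured Ingest library.

```python
import os
from typing import List, Mapping, Optional

# The two variables named in the list above.
REQUIRED_VARIABLES = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")


def missing_variables(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        raise SystemExit("Set these environment variables first: " + ", ".join(missing))
    print("All required Unstructured variables are set.")
```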

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. You can use any supported source connector. This example uses the local source connector: