If you’re new to Unstructured, read this note first.
Before you can create a source connector, you must first sign in to your Unstructured account:
- If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Let’s Go account, at https://platform.unstructured.io. To sign up for a Business account instead, contact Unstructured Sales, or learn more.
- If you already have an Unstructured Let’s Go, Pay-As-You-Go, or Business SaaS account and are not already signed in, sign in to your account at https://platform.unstructured.io. For other types of Business accounts, see your Unstructured account administrator for sign-in instructions, or email Unstructured Support at support@unstructured.io.
- After you sign in to your Unstructured Let’s Go, Pay-As-You-Go, or Business account, click API Keys on the sidebar.
  For a Business account, before you click API Keys, make sure you have selected the organizational workspace you want to create an API key for. Each API key works with one and only one organizational workspace. Learn more.
- Click Generate API Key.
- Follow the on-screen instructions to finish generating the key.
- Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
Requirements
You will need:
Set up OpenSearch
Supported OpenSearch installations vary by product:
- Unstructured UI and Unstructured API: Non-local OpenSearch only. Local OpenSearch instances are not supported.
- Unstructured Ingest: Both local and non-local OpenSearch instances are supported.
Set up an AWS OpenSearch Service domain
To set up an AWS OpenSearch Service domain, complete steps similar to the following:
- Sign in to your AWS account, and then open your AWS Management Console.
- Open your Amazon OpenSearch Service console.
- On the sidebar, expand Managed clusters, and then click Dashboard.
- Click Create domain.
- In the Name tile, for Domain name, enter some unique domain name for your new OpenSearch domain.
- In the Domain creation method tile, select Easy create. This option provides faster setup using default configurations and enables fine-grained access control (FGAC) by default. (With Standard create, you must enable FGAC manually.) Standard create method documentation.
- In the Engine options tile, for Version, AWS recommends that you select the latest version.
- In the Network tile, for Network, select a network access method. For faster setup, this example uses the Public access method. VPC access method documentation.
- For IP address type, select Dual-stack mode.
- In the Fine-grained access control (FGAC) tile, do one of the following:
  - If you want to use an existing AWS IAM user in the AWS account as the domain’s master user, then for Master user, select Set IAM ARN as master user. Then enter the IAM ARN for the master user in the IAM ARN box.
  - If you want to create a master user and password as the domain’s master user instead, then for Master user, select Create master user. Then specify some username and password for this new master user by filling in the Master username, Master password, and Confirm master password fields. Make sure to save the master user’s password in a secure location.
- Click Create.
- After the domain is created, you must allow Unstructured to access the domain, as follows:
  a. If the new domain’s settings page is not already showing, open it as follows: in your Amazon OpenSearch Service console, on the sidebar, expand Managed clusters, and then click Domains. Then, in the list of available domains, click the name of the newly created domain.
  b. On the Security configuration tab, click Edit.
  c. In the Access policy tile, for Domain access policy, select Only use fine-grained access control.
  d. Click Clear policy. This removes any existing resource-based access policy from the domain. With no domain access policy, access control relies entirely on fine-grained access control (FGAC).
  e. Click Save changes.
Set up an Amazon OpenSearch Serverless collection
To set up an Amazon OpenSearch Serverless collection, complete steps similar to the following:
- Sign in to your AWS account, and then open your AWS Management Console.
- Open your Amazon OpenSearch Service console.
- On the sidebar, expand Serverless, and then click Dashboard.
- Click Create collection.
- In the Collection details tile, for Collection name, enter some unique name for your new OpenSearch Serverless collection. Optionally, for Description, enter some meaningful description for your new collection.
- For Collection type, select Search.
  Unstructured does not support the Vector search collection type. If you need vector search support, you can either continue with these steps to use the Search collection type, or follow the preceding steps to set up an Amazon OpenSearch Service managed cluster instead. Note, however, that the Search collection type is less optimized for vector search than the Vector search collection type.
- In the Collection creation method tile, select Standard create.
- For Encryption, choose an AWS KMS key type.
- For Network access settings, choose an Access type.
- For Resource type, select both Enable access to OpenSearch endpoint and Enable access to OpenSearch Dashboards.
- Click Next.
- In the Definition method tile, select JSON.
- In the JSON editor box, enter the following JSON, replacing the following placeholders:
  - Replace <collection-name> with the name of the new OpenSearch Serverless collection.
  - Replace <account-id> with the target AWS account ID.
  - Replace <user-id> with the ID of the target AWS IAM user.
- Click Next.
- For Data access policy settings, select Create as a new data access policy.
- In the Name and description tile, enter some unique name and an optional description for the new data access policy.
- Click Next.
- Enter any desired index details, and click Next again. For example:
  a. For Index name, enter the name of the new index in the collection.
  b. For Automatic Semantic Enrichment fields, click Add, enter embeddings for Automatic Semantic Enrichment field name, click Add, and click Confirm.
  c. For Lexical search fields, click Add, enter text for Field name and select Text for Data type, click Add, and click Confirm.
- Click Submit.
Set up a local OpenSearch instance
The following video shows how to set up a local OpenSearch instance.
Get the host URL
Find your host URL in the AWS console using the steps for your OpenSearch type.
Amazon OpenSearch Service domain
- Sign in to your AWS account, and then open your AWS Management Console.
- Open your Amazon OpenSearch Service console.
- On the sidebar, expand Managed clusters, and then click Dashboard.
- In the list of available domains, click the name of your domain.
- In the General information tile, copy the value of Domain endpoint v2 (dual stack).
Amazon OpenSearch Serverless collection
- Sign in to your AWS account, and then open your AWS Management Console.
- Open your Amazon OpenSearch Service console.
- On the sidebar, expand Serverless, and then click Dashboard.
- In the list of available collections, click the name of your collection.
- On the Overview tab, in the Endpoint tile, copy the value of OpenSearch endpoint.
Local OpenSearch instance
Your local instance URL depends on how OpenSearch is installed and configured. For guidance, see Communicate with OpenSearch in the OpenSearch documentation.
Create a search index
The name of the search index on the instance is required. For the destination connector, if you need to create an index and you’re using a master user and password as the domain’s master user, you can use, for example, the following curl command. Replace the following placeholders:
- Replace <host> with the instance’s host URL.
- Replace <port> with the instance’s port number, which is typically 443 (for encrypted connections) and less commonly 9200 (for unencrypted connections).
- Replace <master-username> with the master user’s name, and replace <master-password> with the master user’s password.
- Replace <index-name> with the name of the new search index on the instance.
- Replace <index-schema> with the schema for the new search index on the instance. A schema is optional; see the explanation following this curl command for more information.
curl command. To learn how, see create-index in the AWS CLI Command Reference.
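The index-creation request described above can also be sketched in Python using only the standard library. This is a minimal illustration, not the command from this page: it assumes that an index is created with a PUT to https://<host>:<port>/<index-name> using basic authentication, and the function name, host, index name, and credentials shown are hypothetical. The request is built but not sent.

```python
import base64
import json
import urllib.request


def build_create_index_request(host, port, index_name, username, password, schema=None):
    """Build (but do not send) a PUT request that creates an OpenSearch index."""
    url = f"https://{host}:{port}/{index_name}"
    # With no schema, the request has no body and the index is created without one.
    body = json.dumps(schema).encode("utf-8") if schema else None
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only.
request = build_create_index_request(
    "search-example.us-east-1.es.amazonaws.com", 443, "my-index", "admin", "example-password"
)
print(request.get_method(), request.full_url)
# To actually create the index: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before anything touches the instance.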
For the destination connector, the index does not need to contain a schema beforehand. If Unstructured encounters an index without a schema,
Unstructured will automatically create a compatible schema for you before inserting items into the index. Nonetheless,
to reduce possible schema compatibility issues, Unstructured recommends that you create a schema that is compatible with Unstructured’s schema.
Unstructured cannot provide a schema that is guaranteed to work in all
circumstances. This is because these schemas will vary based on your source files’ types; how you
want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.
For objects in the metadata field that Unstructured produces and that you want to store in an OpenSearch index, you must create fields in your index’s schema that
follow Unstructured’s metadata field naming convention. For example, if Unstructured produces a metadata field with the following
child objects:
metadata, the following fields are
required by Unstructured whenever you create your own index schema:
- element_id
- record_id, which is required by Unstructured for intelligent record updates.
- type, which is not required, but highly recommended.
- text
- embeddings if embeddings are generated; make sure to set dimension to the same number of dimensions as the embedding model generates.
- Create an index
- Mappings and field types
- Explicit mapping
- Dynamic mapping
- Unstructured document elements and metadata
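As a rough illustration of the required fields listed above, the following sketch builds an index body containing element_id, record_id, type, text, and an optional embeddings field. The field types chosen here (keyword, text, knn_vector) and the 1536-dimension value are assumptions for illustration, not a schema published by Unstructured; adjust them to your data and embedding model.

```python
import json


def build_index_schema(embedding_dimension=None):
    """Return an index body with the fields Unstructured expects.

    Field types are illustrative assumptions, not an official schema.
    """
    properties = {
        "element_id": {"type": "keyword"},
        "record_id": {"type": "keyword"},
        "type": {"type": "keyword"},
        "text": {"type": "text"},
    }
    schema = {"mappings": {"properties": properties}}
    if embedding_dimension is not None:
        # knn_vector fields need k-NN enabled in the index settings.
        schema["settings"] = {"index": {"knn": True}}
        properties["embeddings"] = {
            "type": "knn_vector",
            # Must match the number of dimensions the embedding model generates.
            "dimension": embedding_dimension,
        }
    return schema


# 1536 is a hypothetical dimension; use your embedding model's actual value.
schema = build_index_schema(embedding_dimension=1536)
print(json.dumps(schema, indent=2))
```

Leaving embedding_dimension unset omits the embeddings field entirely, which matches the case where no embeddings are generated.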
Set up master user authentication
If you are using Enterprise Connect on a dedicated instance, you do not need master user credentials. Skip to the Enterprise Connect section on this page.
- The path to the Certificate Authority (CA) bundle, if you use intermediate CAs with your root CA.
- The path to the combined private key and certificate file, or
- The paths to the separate private key and certificate files.
Examples
To create an OpenSearch source connector, see the following examples. For more information on working with source connectors using the Unstructured API, see Source endpoints.
Configuration settings
Replace the preceding placeholders as follows:
- <name> (required) - A unique name for this connector.
- https://<host>:<port> (required) - The OpenSearch instance’s host URL, which typically takes the form of https://<host>:<port>.
- <index-name> (required) - The name of the search index on the instance.
- <username> - If you’re using basic authentication to the instance, the domain’s master user’s name.
- <password> - If you’re using basic authentication to the instance, the domain’s master user’s password.
- <aws-access-key-id> - If you’re using an existing AWS IAM user as the domain’s master user, the AWS access key ID for the AWS IAM user. If you’re also using AWS STS for authentication, this will be a temporary AWS access key ID.
- <aws-secret-access-key> - If you’re using an existing AWS IAM user as the domain’s master user, the AWS secret access key for the AWS IAM user. If you’re also using AWS STS for authentication, this will be a temporary AWS secret access key.
- <aws-session-token> - If you’re using AWS STS for authentication, the temporary AWS STS session token.
- <field-name> (source connectors only) - Any specific fields to be accessed in the index.
- <use-ssl> (required) - True if the OpenSearch instance requires an SSL connection; otherwise, false.
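To make the relationship between these settings concrete, here is a minimal sketch that assembles them into a connector definition. The key names (hosts, index_name, fields, and so on), the "opensearch" type value, and the rule that exactly one credential style is supplied are assumptions inferred from the placeholder descriptions above, so check them against the API reference before relying on them.

```python
def build_source_connector_payload(
    name, host_url, index_name, use_ssl,
    username=None, password=None,
    aws_access_key_id=None, aws_secret_access_key=None,
    aws_session_token=None, fields=None,
):
    """Assemble a source connector definition (key names are assumptions)."""
    basic = username is not None and password is not None
    iam = aws_access_key_id is not None and aws_secret_access_key is not None
    if basic == iam:
        raise ValueError(
            "Provide exactly one of: username and password, or AWS access keys."
        )
    config = {"hosts": [host_url], "index_name": index_name, "use_ssl": use_ssl}
    if basic:
        config.update(username=username, password=password)
    else:
        config.update(
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
        )
        # Only present when temporary AWS STS credentials are used.
        if aws_session_token is not None:
            config["aws_session_token"] = aws_session_token
    if fields:
        config["fields"] = list(fields)
    return {"name": name, "type": "opensearch", "config": config}


# Hypothetical values for illustration only.
payload = build_source_connector_payload(
    "my-opensearch-source",
    "https://search-example.us-east-1.es.amazonaws.com:443",
    "my-index",
    True,
    username="admin",
    password="example-password",
    fields=["text", "type"],
)
print(payload["config"]["hosts"])
```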
Set up Enterprise Connect authentication
Enterprise Connect is available for dedicated instance customers only, and must be enabled on your instance before use. Contact your Unstructured account team or Unstructured Support to request access and have it enabled.
Create the IAM role
- Choose an External ID, a unique value that prevents unauthorized parties from assuming your IAM role. You will add this value to your AWS trust policy and enter it in the Unstructured connector. Use upper and lower case alphanumeric characters, underscores, or any of +=,.@:/- with no spaces; the value must be 2–1224 characters long.
- In your AWS account, create an IAM role that Unstructured will assume to access your OpenSearch resources (for example, unstructured-connector-role), or use an existing one. For more information, see Create a role using custom trust policies in the AWS IAM User Guide. Attach the following trust policy, replacing the placeholder values:

  | Placeholder | Value |
  |---|---|
  | <unstructured-service-role-arn> | The ARN of the Unstructured service role for your dedicated instance. Get this value from your Unstructured account team. |
  | <external-id> | The unique value you chose as your External ID. |

  The sts:ExternalId condition prevents the confused deputy problem. It ensures only your Unstructured workspace can use this role, even if another party knows the Unstructured service role ARN.
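The two steps above can be sketched as a small helper that validates an External ID and produces a trust policy document. This is an illustration under stated assumptions rather than the exact policy from this page: it uses the standard IAM trust policy shape (sts:AssumeRole with a StringEquals condition on sts:ExternalId), and the function name and example ARN are hypothetical.

```python
import json
import re

# Alphanumerics, underscores, and + = , . @ : / - with no spaces, 2 to 1224 characters.
EXTERNAL_ID_PATTERN = re.compile(r"[A-Za-z0-9_+=,.@:/-]{2,1224}")


def build_trust_policy(service_role_arn, external_id):
    """Return an IAM trust policy that lets one principal assume the role."""
    if not EXTERNAL_ID_PATTERN.fullmatch(external_id):
        raise ValueError("External ID has invalid characters or length.")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": service_role_arn},
                "Action": "sts:AssumeRole",
                # Only callers presenting this External ID may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical ARN and External ID for illustration only.
policy = build_trust_policy(
    "arn:aws:iam::123456789012:role/example-service-role", "my-External_ID.1"
)
print(json.dumps(policy, indent=2))
```

Validating the External ID before building the policy catches formatting mistakes locally, before they surface as a failed role assumption.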
Attach a permissions policy
To allow Unstructured to access your OpenSearch installation, you may need to attach a permissions policy to the IAM role. Whether it is required depends on your OpenSearch type and how your domain access policy is configured.
Amazon OpenSearch Service (managed)
A permissions policy is typically not required for a managed OpenSearch Service domain that has the following:
- The domain’s access policy is permissive (for example, "Principal": "*")
- Fine-grained access control (FGAC) is the primary access control

| Placeholder | Value |
|---|---|
| <region> | The AWS region of the domain, for example us-east-1. |
| <account-id> | The 12-digit ID of your AWS account. |
| <domain-name> | The name of the OpenSearch domain. |
Amazon OpenSearch Serverless
The OpenSearch Serverless connector requires a permissions policy. Attach the following policy, replacing the placeholder values:

| Placeholder | Value |
|---|---|
| <region> | The AWS region of the collection, for example us-east-1. |
| <account-id> | The 12-digit ID of your AWS account. |
| <collection-id> | The ID of the OpenSearch Serverless collection. |
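The placeholders in the two tables above map onto resource ARNs, which the following sketch makes explicit for both OpenSearch types. The ARN formats are the standard AWS ones; the specific action lists (read-oriented es:ESHttp* actions for a managed domain, aoss:APIAccessAll for Serverless) are assumptions chosen for illustration and may differ from the policies this page intends.

```python
def build_permissions_policy(resource_arn, actions):
    """Return a single-statement IAM permissions policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": resource_arn}
        ],
    }


def managed_domain_policy(region, account_id, domain_name):
    # Covers every HTTP path under the domain.
    arn = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
    # Assumed action set for reading from a domain; widen it if you also write.
    return build_permissions_policy(arn, ["es:ESHttpGet", "es:ESHttpHead", "es:ESHttpPost"])


def serverless_collection_policy(region, account_id, collection_id):
    arn = f"arn:aws:aoss:{region}:{account_id}:collection/{collection_id}"
    # aoss:APIAccessAll grants data-plane API access; data access policies still apply.
    return build_permissions_policy(arn, ["aoss:APIAccessAll"])


# Hypothetical identifiers for illustration only.
domain_policy = managed_domain_policy("us-east-1", "123456789012", "my-domain")
collection_policy = serverless_collection_policy("us-east-1", "123456789012", "abc123example")
print(domain_policy["Statement"][0]["Resource"])
print(collection_policy["Statement"][0]["Resource"])
```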
Grant access to OpenSearch resources
After creating the IAM role and configuring AWS permissions, you must also authorize the role within OpenSearch itself. AWS authorization and OpenSearch authorization are separate systems. The steps in this section configure the OpenSearch authorization for your IAM role. The process you follow depends on your OpenSearch type.
Amazon OpenSearch Service (managed): Map the role inside OpenSearch
You only need to map the IAM role inside OpenSearch if fine-grained access control (FGAC) is enabled on your domain:
- If you selected Easy create in step 6 of Set up an AWS OpenSearch Service domain, FGAC was enabled automatically and these steps are required.
- If your domain has FGAC disabled, skip this section. Your IAM role setup is complete. Proceed to the next section to create your connector.
- For a quick proof-of-concept, use the built-in all_access OpenSearch role, already present in OpenSearch Dashboards under Security → Roles.
- For production, first create a least-privilege OpenSearch role in OpenSearch Dashboards under Security → Roles, scoped to the specific indices your connector will access.
- OpenSearch Dashboards: Go to Security → Roles → select the OpenSearch role → Mapped users tab → add the IAM role ARN under Backend roles. For more information, see Fine-grained access control in Amazon OpenSearch Service in the AWS OpenSearch Service Developer Guide.
- Using the API:
  Replace <opensearch-role-name> with the name of the OpenSearch role that you selected or created in this section. Replace <assumed-role-arn> with the ARN of the IAM role that you created in step 2 of Create the IAM role.
  PUT replaces the entire role mapping. To preserve existing backend roles, first GET the current mapping and include all existing entries in your PUT request.
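Because PUT replaces the whole mapping, the GET-then-PUT step is easiest to get right as a merge. The sketch below assumes the current mapping was fetched from the security plugin's role mapping endpoint (_plugins/_security/api/rolesmapping/<opensearch-role-name>) and parsed into a dictionary keyed by role name; the function name and sample data are hypothetical.

```python
def merge_backend_role(current_mapping, role_name, assumed_role_arn):
    """Return a PUT body that keeps existing entries and adds one backend role.

    current_mapping is the parsed JSON of a GET on the role mapping,
    assumed to be keyed by the OpenSearch role name.
    """
    existing = current_mapping.get(role_name, {})
    backend_roles = list(existing.get("backend_roles", []))
    if assumed_role_arn not in backend_roles:
        backend_roles.append(assumed_role_arn)
    return {
        "backend_roles": backend_roles,
        # Carry over hosts and users so the PUT does not drop them.
        "hosts": list(existing.get("hosts", [])),
        "users": list(existing.get("users", [])),
    }


# Hypothetical GET response for illustration only.
current = {
    "all_access": {
        "backend_roles": ["arn:aws:iam::123456789012:role/existing-role"],
        "hosts": [],
        "users": ["admin"],
    }
}
body = merge_backend_role(
    current, "all_access", "arn:aws:iam::123456789012:role/unstructured-connector-role"
)
print(body["backend_roles"])
```

The merge is idempotent: running it again with the same ARN leaves the list unchanged, so retries do not create duplicates.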
Amazon OpenSearch Serverless: Add the role to a data access policy
OpenSearch Serverless does not use FGAC. Authorization is managed through data access policies attached to the collection.
OpenSearch Serverless limits each AWS account and region to 10 data access policies. Where possible, add the IAM role ARN to an existing policy rather than creating a new one.
The --policy-version value comes from the get-access-policy command above (step 2) and enables optimistic concurrency. The --policy document replaces the entire policy. Include all existing principals and rules, not just your addition.
Example policy.json, replacing the placeholder values:
| Placeholder | Value |
|---|---|
| <collection-name> | The name of the OpenSearch Serverless collection. |
| <account-id> | The 12-digit ID of your AWS account. |
| <role-name> | The name of the IAM role that you created in step 2 of Create the IAM role. |
aoss:WriteDocument, aoss:UpdateIndex, and aoss:CreateIndex from the Permission array.
Policy changes propagate within seconds, but may occasionally take up to a minute. If the connector returns a 403 immediately after updating the policy, wait a moment and retry before treating it as a configuration error.
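Since the policy document replaces the entire policy, one way to add the IAM role safely is to copy the existing document and append a statement for the role. The sketch below assumes the usual data access policy shape (a list of statements with Rules and Principal) and treats the three write permissions named above as the ones to omit for read-only access; that interpretation, the function name, and the sample values are assumptions.

```python
import copy

READ_PERMISSIONS = ["aoss:ReadDocument", "aoss:DescribeIndex"]
WRITE_PERMISSIONS = ["aoss:WriteDocument", "aoss:UpdateIndex", "aoss:CreateIndex"]


def add_role_to_policy(policy, collection_name, account_id, role_name, read_only=True):
    """Return a copy of the policy with one extra statement for the IAM role."""
    permissions = READ_PERMISSIONS + ([] if read_only else WRITE_PERMISSIONS)
    statement = {
        "Rules": [
            {
                "ResourceType": "index",
                "Resource": [f"index/{collection_name}/*"],
                "Permission": permissions,
            }
        ],
        "Principal": [f"arn:aws:iam::{account_id}:role/{role_name}"],
    }
    # Deep copy so existing principals and rules are preserved untouched.
    return copy.deepcopy(policy) + [statement]


# Hypothetical existing policy for illustration only.
existing_policy = [
    {
        "Rules": [
            {
                "ResourceType": "index",
                "Resource": ["index/my-collection/*"],
                "Permission": ["aoss:*"],
            }
        ],
        "Principal": ["arn:aws:iam::123456789012:user/existing-user"],
    }
]
updated = add_role_to_policy(
    existing_policy, "my-collection", "123456789012", "unstructured-connector-role"
)
print(len(updated), updated[-1]["Principal"])
```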
Create the source connector with Enterprise Connect
To create a source connector, see the following examples.

| Placeholder | Value |
|---|---|
| <name> | A unique name for this connector. |
| https://<host>:<port> | The OpenSearch instance’s host URL. |
| <index-name> | The name of the search index on the instance. |
| <role-arn> | The ARN of the IAM role Unstructured will assume via AWS STS. For example, arn:aws:iam::123456789012:role/MyRole. |
| <external-id> | The unique value you chose as your External ID. Must match the sts:ExternalId condition in the IAM role’s trust policy. |
| use_ssl | True (Python SDK) or true (curl) if the OpenSearch instance requires an SSL connection; otherwise False or false. |
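As a rough sketch of how the values in this table fit together, the helper below checks the role ARN format and assembles a connector definition. The key names (hosts, index_name, role_arn, external_id) and the "opensearch" type value are assumptions inferred from the placeholders, and the ARN check only covers the standard aws partition.

```python
import re

# Matches ARNs such as arn:aws:iam::123456789012:role/MyRole (aws partition only).
ROLE_ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:role/.+")


def build_enterprise_connect_payload(
    name, host_url, index_name, role_arn, external_id, use_ssl=True
):
    """Assemble an Enterprise Connect source connector definition (key names assumed)."""
    if not ROLE_ARN_PATTERN.fullmatch(role_arn):
        raise ValueError(f"Not an IAM role ARN: {role_arn!r}")
    return {
        "name": name,
        "type": "opensearch",
        "config": {
            "hosts": [host_url],
            "index_name": index_name,
            "role_arn": role_arn,
            # Must match the sts:ExternalId condition in the role's trust policy.
            "external_id": external_id,
            "use_ssl": use_ssl,
        },
    }


# Hypothetical values for illustration only.
ec_payload = build_enterprise_connect_payload(
    "my-opensearch-source",
    "https://search-example.us-east-1.es.amazonaws.com:443",
    "my-index",
    "arn:aws:iam::123456789012:role/MyRole",
    "my-External_ID.1",
)
print(ec_payload["config"]["role_arn"])
```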

