> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# OpenSearch

<Note>
  First time creating a connector? [Read this first](/api-reference/workflow/connector-first-time-reqs).
</Note>

Ingest your files into Unstructured from OpenSearch. This page covers configuration for both Amazon OpenSearch Service (managed domains) and Amazon OpenSearch Serverless collections.

## Requirements

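You will need an OpenSearch instance, its host URL, the name of a search index on the instance, and authentication credentials. The following sections describe how to set up or find each one.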

## Set up OpenSearch

Supported OpenSearch installations vary by product:

* **[Unstructured Pipelines](/pipelines/overview) and [Unstructured API](/api-reference/overview):** Non-local OpenSearch only. Local OpenSearch instances are not supported.
* **[Unstructured Ingest](/open-source/ingestion/overview):** Both local and non-local OpenSearch instances are supported.

### Set up an AWS OpenSearch Service domain

To set up an [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html) domain, complete steps similar to the following:

1. Sign in to your AWS account, and then open your AWS Management Console.

2. Open your Amazon OpenSearch Service console.

3. On the sidebar, expand **Managed clusters**, and then click **Dashboard**.

4. Click **Create domain**.

5. In the **Name** tile, for **Domain name**, enter a unique name for your new OpenSearch domain.

6. In the **Domain creation method** tile, select **Easy create**. This option provides faster setup using default configurations and enables fine-grained access control (FGAC) by default. (With **Standard create**, you must enable FGAC manually.) For details, see the [Standard create method documentation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html).

7. In the **Engine options** tile, for **Version**, AWS recommends that you select the latest version.

8. In the **Network** tile, for **Network**, select a network access method.
   For faster setup, this example uses the **Public access** method.
   For the alternative, see the [VPC access method documentation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/vpc.html#prerequisites-vpc-endpoints).

9. For **IP address type**, select **Dual-stack mode**.

10. In the **Fine-grained access control** (FGAC) tile, do one of the following:

    * If you want to use an existing AWS IAM user in the AWS account as the domain's master user, then for **Master user**, select **Set IAM ARN as master user**. Then enter the IAM ARN for the master user in the **IAM ARN** box.
    * If you want to create a master user and password as the domain's master user instead, then for **Master user**, select **Create master user**. Then specify some username and password for this
      new master user by filling in the **Master username**, **Master password**, and **Confirm master password** fields. Make
      sure to save the master user's password in a secure location.

11. Click **Create**.

12. After the domain is created, you must allow Unstructured to access the domain, as follows:

    a. If the new domain's settings page is not already showing, open it as follows:
    in your Amazon OpenSearch Service console, on the sidebar, expand **Managed clusters**, and then click **Domains**. Then,
    in the list of available domains, click the name of the newly created domain.<br />
    b. On the **Security configuration** tab, click **Edit**.<br />
    c. In the **Access policy** tile, for **Domain access policy**, select **Only use fine-grained access control**.<br />
    d. Click **Clear policy**. This removes any existing resource-based access policy from the domain. With no domain access policy, access control relies entirely on fine-grained access control (FGAC).<br />
    e. Click **Save changes**.

### Set up an Amazon OpenSearch Serverless collection

To set up an [Amazon OpenSearch Serverless collection](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-create-console.html), complete steps similar to the following:

1. Sign in to your AWS account, and then open your AWS Management Console.

2. Open your Amazon OpenSearch Service console.

3. On the sidebar, expand **Serverless**, and then click **Dashboard**.

4. Click **Create collection**.

5. In the **Collection details** tile, for **Collection name**, enter some unique name for your new OpenSearch Serverless collection.
   Optionally, for **Description**, enter some meaningful description for your new collection.

6. For **Collection type**, select **Search**.

   <Note>
     Unstructured does not support the **Vector search** collection type. If you need vector search support, you can either continue
     with these steps to use the **Search** collection type, or you can follow the preceding steps to set up an Amazon OpenSearch Service managed cluster instead.
     However, note that the Amazon OpenSearch Serverless **Search** collection type is not as well optimized for vector search as the **Vector search** collection type.
   </Note>

7. In the **Collection creation method** tile, select **Standard create**.

8. For **Encryption**, choose an AWS KMS key type.

9. For **Network access settings**, choose an **Access type**.

10. For **Resource type**, select both **Enable access to OpenSearch endpoint** and **Enable access to OpenSearch Dashboards**.

11. Click **Next**.

12. In the **Definition method** tile, select **JSON**.

13. In the **JSON editor** box, enter the following JSON, replacing the following placeholders:

    * Replace `<collection-name>` with the name of the new OpenSearch Serverless collection.
    * Replace `<account-id>` with the target AWS account ID.
    * Replace `<user-id>` with the ID of the target AWS IAM user.

    ```json theme={null}
    [
        {
            "Rules": [
                {
                    "Resource": ["collection/<collection-name>"],
                    "Permission": [
                        "aoss:CreateCollectionItems",
                        "aoss:UpdateCollectionItems",
                        "aoss:DescribeCollectionItems"
                    ],
                    "ResourceType": "collection"
                },
                {
                    "Resource": ["index/<collection-name>/*"],
                    "Permission": [
                        "aoss:CreateIndex",
                        "aoss:DescribeIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument",
                        "aoss:UpdateIndex",
                        "aoss:DeleteIndex"
                    ],
                    "ResourceType": "index"
                },
                {
                    "Resource": ["model/<collection-name>/*"],
                    "Permission": [
                        "aoss:DescribeMLResource",
                        "aoss:CreateMLResource",
                        "aoss:UpdateMLResource",
                        "aoss:DeleteMLResource",
                        "aoss:ExecuteMLResource"
                    ],
                    "ResourceType": "model"
                }
            ],
            "Principal": ["arn:aws:iam::<account-id>:user/<user-id>"]
        }
    ]
    ```

14. Click **Next**.

15. For **Data access policy settings**, select **Create as a new data access policy**.

16. In the **Name and description** tile, enter some unique name and an optional description for the new data access policy.

17. Click **Next**.

18. Enter any desired index details, and click **Next** again. For example:

    a. For **Index name**, enter the name of the new index in the collection.<br />
    b. For **Automatic Semantic Enrichment fields**, click **Add**, enter `embeddings` for **Automatic Semantic Enrichment field name**, click **Add**, and click **Confirm**.<br />
    c. For **Lexical search fields**, click **Add**, enter `text` for **Field name** and select **Text** for **Data type**, click **Add**, and click **Confirm**.<br />

19. Click **Submit**.
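
The data access policy from step 13 can also be generated rather than edited by hand. The following is a minimal sketch that fills in the three placeholders; the function name `build_data_access_policy` is hypothetical and is not part of any AWS or Unstructured library.

```python
import json
from typing import Any, Dict, List


def build_data_access_policy(
    collection_name: str, account_id: str, user_id: str
) -> List[Dict[str, Any]]:
    """Return the step 13 data access policy with its placeholders filled in."""
    return [
        {
            "Rules": [
                {
                    "Resource": [f"collection/{collection_name}"],
                    "Permission": [
                        "aoss:CreateCollectionItems",
                        "aoss:UpdateCollectionItems",
                        "aoss:DescribeCollectionItems",
                    ],
                    "ResourceType": "collection",
                },
                {
                    "Resource": [f"index/{collection_name}/*"],
                    "Permission": [
                        "aoss:CreateIndex",
                        "aoss:DescribeIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument",
                        "aoss:UpdateIndex",
                        "aoss:DeleteIndex",
                    ],
                    "ResourceType": "index",
                },
                {
                    "Resource": [f"model/{collection_name}/*"],
                    "Permission": [
                        "aoss:DescribeMLResource",
                        "aoss:CreateMLResource",
                        "aoss:UpdateMLResource",
                        "aoss:DeleteMLResource",
                        "aoss:ExecuteMLResource",
                    ],
                    "ResourceType": "model",
                },
            ],
            "Principal": [f"arn:aws:iam::{account_id}:user/{user_id}"],
        }
    ]


if __name__ == "__main__":
    # Example values only; substitute your own collection, account, and user.
    policy = build_data_access_policy("my-collection", "123456789012", "my-user")
    print(json.dumps(policy, indent=4))
```

Paste the printed JSON into the **JSON editor** box in step 13.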

### Set up a local OpenSearch instance

The following video shows how to set up a [local OpenSearch](https://opensearch.org/downloads.html) instance.

<iframe width="560" height="315" src="https://www.youtube.com/embed/Rew3_pNnYIs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

### Get the host URL

Find your host URL in the AWS console using the steps for your OpenSearch type.

#### Amazon OpenSearch Service domain

1. Sign in to your AWS account, and then open your AWS Management Console.
2. Open your Amazon OpenSearch Service console.
3. On the sidebar, expand **Managed clusters**, and then click **Dashboard**.
4. In the list of available domains, click the name of your domain.
5. In the **General information** tile, copy the value of **Domain endpoint v2 (dual stack)**.

#### Amazon OpenSearch Serverless collection

1. Sign in to your AWS account, and then open your AWS Management Console.
2. Open your Amazon OpenSearch Service console.
3. On the sidebar, expand **Serverless**, and then click **Dashboard**.
4. In the list of available collections, click the name of your collection.
5. On the **Overview** tab, in the **Endpoint** tile, copy the value of **OpenSearch endpoint**.

#### Local OpenSearch instance

Your local instance URL depends on how OpenSearch is installed and configured. For guidance, see [Communicate with OpenSearch](https://opensearch.org/docs/latest/getting-started/communicate/) in the OpenSearch documentation.

### Create a search index

The name of the search index on the instance is required.

For the destination connector, if you need to create an index and you're using a master user and password as the domain's master user, you can use a `curl` command such as the following. Replace the following placeholders:

* Replace `<host>` with the instance's host URL.
* Replace `<port>` with the instance's port number, which is typically `443` for encrypted connections and, less commonly, `9200` for unencrypted connections.
* Replace `<master-username>` with the master user's name, and replace `<master-password>` with the master user's password.
* Replace `<index-name>` with the name of the new search index on the instance.
* Replace `<index-schema>` with the schema for the new search index on the instance. A schema is optional; see the explanation
  following this `curl` command for more information.

```bash theme={null}
curl --request PUT "<host>:<port>/<index-name>" \
--user "<master-username>:<master-password>" \
[--header "Content-Type: application/json" \
--data '<index-schema>']
```

If you're using an existing AWS IAM user as the domain's master user, use the AWS Command Line Interface (CLI) to create the index rather than the preceding `curl` command. To learn how, see [create-index](https://docs.aws.amazon.com/cli/latest/reference/opensearch/create-index.html) in the AWS CLI Command Reference.
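
For the master user and password case, the `curl` command above can be mirrored in Python with only the standard library. This is a sketch under that assumption; `build_create_index_request` is a hypothetical helper name, and the schema is optional exactly as in the `curl` form.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional


def build_create_index_request(
    host: str,
    port: int,
    index_name: str,
    username: str,
    password: str,
    schema: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index."""
    url = f"{host}:{port}/{index_name}"
    # HTTP basic authentication: base64("<username>:<password>").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if schema is not None:
        # Only send a body and content type when a schema is supplied.
        headers["Content-Type"] = "application/json"
        data = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

To send the request, pass the returned object to `urllib.request.urlopen`. Building the request separately from sending it makes the URL and headers easy to inspect first.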

For the destination connector, the index does not need to contain a schema beforehand. If Unstructured encounters an index without a schema,
Unstructured will automatically create a compatible schema for you before inserting items into the index. Nonetheless,
to reduce possible schema compatibility issues, Unstructured recommends that you create a schema that is compatible with Unstructured's schema.
Unstructured cannot provide a schema that is guaranteed to work in all
circumstances. This is because these schemas will vary based on your source files' types; how you
want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

For objects in the `metadata` field that Unstructured produces and that you want to store in an OpenSearch index, you must create fields in your index's schema that
follow Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
child objects:

```json theme={null}
"metadata": {
  "is_extracted": "true",
  "coordinates": {
    "points": [
      [
        134.20055555555555,
        241.36027777777795
      ],
      [
        134.20055555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        241.36027777777795
      ]
    ],
    "system": "PixelSpace",
    "layout_width": 1654,
    "layout_height": 2339
  },
  "filetype": "application/pdf",
  "languages": [
    "eng"
  ],
  "page_number": 1,
  "image_mime_type": "image/jpeg",
  "filename": "realestate.pdf",
  "data_source": {
    "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
    "record_locator": {
      "protocol": "file",
      "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
    }
  },
  "entities": {
    "items": [
      {
        "entity": "HOME FOR FUTURE",
        "type": "ORGANIZATION"
      },
      {
        "entity": "221 Queen Street, Melbourne VIC 3000",
        "type": "LOCATION"
      }
    ],
    "relationships": [
      {
        "from": "HOME FOR FUTURE",
        "relationship": "based_in",
        "to": "221 Queen Street, Melbourne VIC 3000"
      }
    ]
  }
}
```

You can adapt the following index schema example for your own needs. Note that outside of `metadata`, Unstructured requires the following
fields, except where noted, whenever you create your own index schema:

* `element_id`
* `record_id`, which is required by Unstructured for intelligent record updates.
* `type`, which is not required, but highly recommended.
* `text`
* `embeddings` if embeddings are generated; make sure to set `dimension` to the same number of dimensions as the embedding model generates.

```json theme={null}
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "element_id": {
        "type": "keyword"
      },
      "record_id": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "type": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "embeddings": {
        "type": "knn_vector",
        "dimension": 1536
      },
      "metadata": {
        "properties": {
          "is_extracted": {
            "type": "boolean"
          },
          "coordinates-points": {
            "type": "float"
          },
          "coordinates-system": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "coordinates-layout_width": {
            "type": "long"
          },
          "coordinates-layout_height": {
            "type": "long"
          },
          "filetype": {
            "type": "keyword"
          },
          "languages": {
            "type": "keyword"
          },
          "page_number": {
            "type": "integer"
          },
          "image_mime_type": {
            "type": "keyword"
          },
          "filename": {
            "type": "keyword"
          },
          "data_source-url": {
            "type": "keyword"
          },
          "data_source-record_locator-protocol": {
            "type": "keyword"
          },
          "data_source-record_locator-remote_file_path": {
            "type": "keyword"
          },
          "entities-items": {
            "properties": {
              "entity": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "type": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "entities-relationships": {
            "properties": {
              "from": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "relationship": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "to": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
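
Comparing the sample `metadata` object with the schema above suggests that nested object keys are joined with hyphens (for example, `data_source-record_locator-protocol`), while arrays keep a single field. The following sketch lists the field names that such a convention would produce. The function name `metadata_field_names` is hypothetical, and the hyphen-joining rule is inferred from the example rather than taken from a formal specification.

```python
from typing import Any, Dict, List


def metadata_field_names(metadata: Dict[str, Any], prefix: str = "") -> List[str]:
    """List schema field names, joining nested object keys with hyphens."""
    names: List[str] = []
    for key, value in metadata.items():
        name = f"{prefix}-{key}" if prefix else key
        if isinstance(value, dict):
            # Nested objects are flattened into "<parent>-<child>" names.
            names.extend(metadata_field_names(value, name))
        else:
            # Scalars and arrays (including arrays of objects) stay as one field.
            names.append(name)
    return names


if __name__ == "__main__":
    sample = {
        "filetype": "application/pdf",
        "coordinates": {"system": "PixelSpace", "layout_width": 1654},
        "data_source": {"record_locator": {"protocol": "file"}},
    }
    for field_name in metadata_field_names(sample):
        print(field_name)
```

Running this against your own sample output gives a checklist of field names to compare with the `metadata` properties in your schema.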

See also:

* [Create an index](https://opensearch.org/docs/latest/api-reference/index-apis/create-index/)
* [Mappings and field types](https://opensearch.org/docs/latest/field-types/)
* [Explicit mapping](https://opensearch.org/docs/latest/field-types/#explicit-mapping)
* [Dynamic mapping](https://opensearch.org/docs/latest/field-types/#dynamic-mapping)
* [Unstructured document elements and metadata](/concepts/document-elements)

### Set up master user authentication

<Note>
  If you are using Enterprise Connect on a [dedicated instance](/business/dedicated-instances/overview), you do not need **master user credentials**. Skip to the **Enterprise Connect** section on this page.
</Note>

For non-local OpenSearch instances, or if you're using basic authentication to a local OpenSearch instance, you will need the master user's name and password.

For local OpenSearch instances, if you're using certificates for authentication instead of basic authentication, you will need:

* The path to the Certificate Authority (CA) bundle, if you use intermediate CAs with your root CA.
* The path to the combined private key and certificate file, or
* The paths to the separate private key and certificate files.

To learn more, see [Authentication backends](https://opensearch.org/docs/latest/security/authentication-backends/authc-index/), [HTTP basic authentication](https://opensearch.org/docs/latest/security/authentication-backends/basic-authc/), and [Client certificate authentication](https://opensearch.org/docs/latest/security/authentication-backends/client-auth/).

## Examples

To create an OpenSearch source connector, see the following examples.

For more information on working with source connectors using the Unstructured API, see [Source endpoints](/api-reference/api/source/source-apis).

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="opensearch",
                  config={
                      "hosts": ["https://<host>:<port>"],
                      "index_name": "<index-name>",
                      "username": "<username>",
                      "password": "<password>",
                      "aws_access_key_id": "<aws-access-key-id>",
                      "aws_secret_access_key": "<aws-secret-access-key>",
                      "aws_session_token": "<aws-session-token>",
                      "fields": ["<field-name>", "<field-name>"],
                      "use_ssl": <True|False>
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "opensearch",
      "config": {
          "hosts": ["https://<host>:<port>"],
          "index_name": "<index-name>",
          "username": "<username>",
          "password": "<password>",
          "aws_access_key_id": "<aws-access-key-id>",
          "aws_secret_access_key": "<aws-secret-access-key>",
          "aws_session_token": "<aws-session-token>",
          "fields": ["<field-name>", "<field-name>"],
          "use_ssl": <true|false>
      }
  }'
  ```
</CodeGroup>

## Configuration settings

Replace the preceding placeholders as follows:

<ParamField body="name" type="string" required>
  A unique name for this connector.
</ParamField>

<ParamField body="hosts" type="string[]" required>
  The OpenSearch instance's host URL, in the format `https://<host>:<port>`.
</ParamField>

<ParamField body="index_name" type="string" required>
  The name of the search index on the instance.
</ParamField>

<ParamField body="username" type="string">
  For basic authentication, the domain's master user's name.
</ParamField>

<ParamField body="password" type="string">
  For basic authentication, the domain's master user's password.
</ParamField>

<ParamField body="aws_access_key_id" type="string">
  For AWS IAM user authentication, the AWS access key ID. For AWS STS authentication, a temporary AWS access key ID.
</ParamField>

<ParamField body="aws_secret_access_key" type="string">
  For AWS IAM user authentication, the AWS secret access key. For AWS STS authentication, a temporary AWS secret access key.
</ParamField>

<ParamField body="aws_session_token" type="string">
  For AWS STS authentication, the temporary AWS STS session token.
</ParamField>

<Warning>
  If you are using the **IAM + Session Token** authentication method: AWS STS credentials (consisting of a temporary AWS access key, temporary AWS secret access key, and temporary AWS STS session token) can be valid for as little as 15 minutes or as long as 36 hours, depending on how the credentials were initially generated. After the expiry time, the credentials are no longer valid and will no longer work with the corresponding OpenSearch connector. You must get a new set of AWS STS credentials to replace the expired ones, which produces a new, refreshed temporary AWS access key, temporary AWS secret access key, and temporary AWS STS session token. For more information, see [Request temporary security credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).

  After you generate refreshed temporary AWS STS credentials, you must update the OpenSearch connector's settings with the new, refreshed AWS STS credentials.
</Warning>

<ParamField body="fields" type="string[]">
  Source connector only. Any specific fields to be accessed in the index.
</ParamField>

<ParamField body="use_ssl" type="boolean" required>
  Set to `true` if the OpenSearch instance requires an SSL connection; otherwise `false`.
</ParamField>
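
The settings above can be assembled into the `config` object programmatically. The following sketch assumes that optional settings you do not use can simply be left out of the request, which the absence of `required` on those fields suggests but does not guarantee. The helper name `build_opensearch_source_config` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_opensearch_source_config(
    hosts: List[str],
    index_name: str,
    use_ssl: bool,
    username: Optional[str] = None,
    password: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_session_token: Optional[str] = None,
    fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble the connector config, omitting optional settings left unset."""
    # Credentials come in pairs; reject half-specified pairs early.
    if (username is None) != (password is None):
        raise ValueError("username and password must be provided together")
    if (aws_access_key_id is None) != (aws_secret_access_key is None):
        raise ValueError(
            "aws_access_key_id and aws_secret_access_key must be provided together"
        )

    config: Dict[str, Any] = {
        "hosts": hosts,
        "index_name": index_name,
        "use_ssl": use_ssl,
    }
    optional = {
        "username": username,
        "password": password,
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
        "aws_session_token": aws_session_token,
        "fields": fields,
    }
    config.update({key: value for key, value in optional.items() if value is not None})
    return config
```

The returned dictionary can be passed as the `config` argument of `CreateSourceConnector` in the Python SDK example, or serialized with `json.dumps` for the `curl` example.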

## Set up Enterprise Connect authentication

<Note>
  Enterprise Connect is available for [dedicated instance](/business/dedicated-instances/overview) customers only, and must be enabled on your instance before use. Contact your Unstructured account team or [Unstructured Support](https://support.unstructured.io/) to request access and have it enabled.
</Note>

Enterprise Connect is an authentication method for AWS connectors. During a workflow run, Unstructured assumes an IAM role in your AWS account and uses short-lived credentials scoped to that operation. Credentials are never stored and expire automatically. For an overview, see [Enterprise Connect for AWS](/business/aws/enterprise-connect).

To configure an OpenSearch connector to use Enterprise Connect via the Unstructured API, first set up your AWS IAM role:

### Create the IAM role

1. Choose an **External ID** — a unique value that prevents unauthorized parties from assuming your IAM role. You will add this value to your AWS trust policy and enter it in the Unstructured connector. Use upper and lower case alphanumeric characters, underscores, or any of `+=,.@:\/-`; no spaces; 2–1224 characters.

2. In your AWS account, create an IAM role that Unstructured will assume to access your OpenSearch resources (for example, `unstructured-connector-role`), or use an existing one. For more information, see [Create a role using custom trust policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-custom.html) in the *AWS IAM User Guide*.

   Attach the following trust policy, replacing the placeholder values:

   ```json theme={null}
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "<unstructured-service-role-arn>"
         },
         "Action": "sts:AssumeRole",
         "Condition": {
           "StringEquals": {
             "sts:ExternalId": "<external-id>"
           }
         }
       }
     ]
   }
   ```

   | Placeholder                       | Value                                                                                                                     |
   | --------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
   | `<unstructured-service-role-arn>` | The ARN of the Unstructured service role for your dedicated instance. Get this value from your Unstructured account team. |
   | `<external-id>`                   | The unique value you chose as your External ID.                                                                           |

   The `sts:ExternalId` condition prevents the [confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html). It ensures only your Unstructured workspace can use this role, even if another party knows the Unstructured service role ARN.
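
The trust policy and the External ID rules from step 1 can be combined in a small helper. This is a sketch only: `build_trust_policy` is a hypothetical name, and the pattern reads the `\/` in the allowed-character list as an escaped forward slash, which is an assumption.

```python
import json
import re
from typing import Any, Dict

# Alphanumerics, underscore, and + = , . @ : / - with no spaces, 2 to 1224 characters.
EXTERNAL_ID_PATTERN = re.compile(r"[A-Za-z0-9_+=,.@:/-]{2,1224}")


def build_trust_policy(service_role_arn: str, external_id: str) -> Dict[str, Any]:
    """Return the trust policy, rejecting External IDs that break the stated rules."""
    if not EXTERNAL_ID_PATTERN.fullmatch(external_id):
        raise ValueError("External ID does not match the allowed characters or length")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": service_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; use the ARN supplied by your Unstructured account team.
    example = build_trust_policy("<unstructured-service-role-arn>", "my-External_ID.1")
    print(json.dumps(example, indent=2))
```

Validating the External ID before building the policy catches formatting mistakes before the value is entered in both AWS and the Unstructured connector.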

### Attach a permissions policy

To allow Unstructured to access your OpenSearch installation, you may need to attach a permissions policy to the IAM role. Whether it is required depends on your OpenSearch type and how your domain access policy is configured.

#### Amazon OpenSearch Service (managed)

A permissions policy is typically not required for an Amazon OpenSearch Service managed domain that meets both of the following conditions:

* The domain's access policy is permissive (for example, `"Principal": "*"`)
* Fine-grained access control (FGAC) is the primary access control

Both conditions above are satisfied if you selected **Easy create** in step 6 (which enables FGAC by default) and followed step 12(d) (for the domain access policy) in the [Set up an AWS OpenSearch Service domain](#set-up-an-aws-opensearch-service-domain) section. No identity-based permissions policy is needed, and the trust policy you attached in [Create the IAM role](#create-the-iam-role) is sufficient for Enterprise Connect to function. Skip to [Grant access to OpenSearch resources](#grant-access-to-opensearch-resources).

If either condition is not met and your domain's access policy explicitly restricts access to specific principals or actions, attach the following policy, replacing the placeholder values:

```json theme={null}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "es:ESHttp*",
      "Resource": "arn:aws:es:<region>:<account-id>:domain/<domain-name>/*"
    }
  ]
}
```

| Placeholder     | Value                                                  |
| --------------- | ------------------------------------------------------ |
| `<region>`      | The AWS region of the domain, for example `us-east-1`. |
| `<account-id>`  | The 12-digit ID of your AWS account.                   |
| `<domain-name>` | The name of the OpenSearch domain.                     |

<Tip>
  You can attach the permissions policy immediately after creating the role, without leaving the AWS IAM console. Continue following the [Create a role using custom trust policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-custom.html) guide in the *AWS IAM User Guide* referenced in the previous step.
</Tip>

#### Amazon OpenSearch Serverless

The OpenSearch Serverless connector requires a permissions policy. Attach the following policy, replacing the placeholder values:

```json theme={null}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "aoss:APIAccessAll",
      "Resource": "arn:aws:aoss:<region>:<account-id>:collection/<collection-id>"
    }
  ]
}
```

| Placeholder       | Value                                                      |
| ----------------- | ---------------------------------------------------------- |
| `<region>`        | The AWS region of the collection, for example `us-east-1`. |
| `<account-id>`    | The 12-digit ID of your AWS account.                       |
| `<collection-id>` | The ID of the OpenSearch Serverless collection.            |
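
The managed-domain and Serverless permissions policies in this section differ only in their action and resource ARN. The following sketch builds either one from the placeholder values in the tables above. The function names are hypothetical.

```python
from typing import Any, Dict


def _single_statement_policy(action: str, resource: str) -> Dict[str, Any]:
    """Wrap one Allow statement in an IAM policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": action, "Resource": resource}],
    }


def managed_domain_policy(region: str, account_id: str, domain_name: str) -> Dict[str, Any]:
    """Permissions policy for an Amazon OpenSearch Service managed domain."""
    resource = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
    return _single_statement_policy("es:ESHttp*", resource)


def serverless_collection_policy(
    region: str, account_id: str, collection_id: str
) -> Dict[str, Any]:
    """Permissions policy for an Amazon OpenSearch Serverless collection."""
    resource = f"arn:aws:aoss:{region}:{account_id}:collection/{collection_id}"
    return _single_statement_policy("aoss:APIAccessAll", resource)
```

Note that the Serverless policy takes the collection ID, not the collection name, so the two helpers are not interchangeable.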

<Tip>
  You can attach the permissions policy immediately after creating the role, without leaving the AWS IAM console. Continue following the [Create a role using custom trust policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-custom.html) guide in the *AWS IAM User Guide* referenced in the previous step.
</Tip>

### Grant access to OpenSearch resources

After creating the IAM role and configuring AWS permissions, you must also authorize the role within OpenSearch itself. AWS authorization and OpenSearch authorization are separate systems.

The steps in this section configure the OpenSearch authorization for your IAM role. The process you follow depends on your OpenSearch type.

#### Amazon OpenSearch Service (managed) — Map the role inside OpenSearch

You only need to map the IAM role inside OpenSearch if fine-grained access control (FGAC) is enabled on your domain:

* If you selected **Easy create** in step 6 of [Set up an AWS OpenSearch Service domain](#set-up-an-aws-opensearch-service-domain), FGAC was enabled automatically and these steps are required.
* If your domain has FGAC disabled, skip this section. Your IAM role setup is complete. Proceed to the next section to create your connector.

**Select or create an OpenSearch role**

An *OpenSearch role* is an internal security construct, distinct from an AWS IAM role. The IAM role ARN is mapped to an internal *OpenSearch role* by registering it as a **backend role**. This is how OpenSearch links an external AWS identity to an internal permission set. First, select or create the *OpenSearch role* that has permission to perform the actions your connector needs:

* For a quick proof-of-concept, use the built-in `all_access` *OpenSearch role*, already present in OpenSearch Dashboards under **Security** → **Roles**.
* For production, first [create a least-privilege *OpenSearch role*](https://opensearch.org/docs/latest/security/access-control/users-roles/) in OpenSearch Dashboards under **Security** → **Roles**, scoped to the specific indices your connector will access.

Once you have your *OpenSearch role*, complete the steps below to map your IAM role ARN to it. Skipping this mapping causes the connector to fail with a 403 error, even if the IAM role passes the AWS authorization checks:

```
security_exception ... no permissions for [indices:admin/aliases/get] and User [name=arn:aws:iam::...:role/...]
```

For more information about managing OpenSearch roles, see [Users and roles](https://opensearch.org/docs/latest/security/access-control/users-roles/) in the OpenSearch documentation.

**Map your IAM role ARN to the OpenSearch role**

You can add the mapping using OpenSearch Dashboards or the API:

* **OpenSearch Dashboards:** Go to **Security** → **Roles**, select the *OpenSearch role*, open the **Mapped users** tab, and then add the IAM role ARN under **Backend roles**. For more information, see [Fine-grained access control in Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html) in the *Amazon OpenSearch Service Developer Guide*.

* **API:** Send a `PUT` request to the role mapping endpoint:

  ```bash theme={null}
  curl -XPUT "$DOMAIN/_plugins/_security/api/rolesmapping/<opensearch-role-name>" \
    -H "Content-Type: application/json" \
    -u "$ADMIN_USER:$ADMIN_PASS" \
    -d '{"backend_roles":["<assumed-role-arn>"]}'
  ```

  Replace `<opensearch-role-name>` with the name of the *OpenSearch role* that you selected or created in this section. Replace `<assumed-role-arn>` with the ARN of the IAM role that you created in step 2 of [Create the IAM role](#create-the-iam-role).

  <Note>
    `PUT` replaces the entire role mapping. To preserve existing backend roles, first `GET` the current mapping and include all existing entries in your `PUT` request.
  </Note>
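The read-merge-write pattern from the note can be sketched in Python as follows. This is a minimal sketch, not part of the Unstructured or OpenSearch tooling. `merge_backend_roles` is a hypothetical helper, and the sketch assumes that the `GET` response is a JSON object keyed by the role name, with `backend_roles`, `hosts`, and `users` lists in its value. The ARNs and the sample mapping are made up for illustration.

```python
import json


def merge_backend_roles(current_mapping: dict, role_name: str, iam_role_arn: str) -> dict:
    """Build a PUT body that keeps existing entries and adds one backend role.

    Assumption: current_mapping is the parsed JSON returned by
    GET _plugins/_security/api/rolesmapping/<role_name>, keyed by role name.
    An empty dict (no mapping yet) is also accepted.
    """
    existing = current_mapping.get(role_name, {})
    backend_roles = list(existing.get("backend_roles", []))
    if iam_role_arn not in backend_roles:
        backend_roles.append(iam_role_arn)
    # Carry over only the fields this sketch assumes are writable.
    return {
        "backend_roles": backend_roles,
        "hosts": list(existing.get("hosts", [])),
        "users": list(existing.get("users", [])),
    }


# Hypothetical existing mapping for the all_access role.
current = {
    "all_access": {
        "backend_roles": ["arn:aws:iam::111122223333:role/ExistingRole"],
        "hosts": [],
        "users": ["admin"],
        "reserved": False,
    }
}
body = merge_backend_roles(
    current, "all_access", "arn:aws:iam::111122223333:role/UnstructuredRole"
)
print(json.dumps(body, indent=2))
```

The printed JSON can then be sent as the body of the `PUT` request, in place of the single-entry `backend_roles` array.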

#### Amazon OpenSearch Serverless — Add the role to a data access policy

OpenSearch Serverless does not use FGAC. Authorization is managed through data access policies attached to the collection.

<Note>
  OpenSearch Serverless limits each AWS account and region to 10 data access policies. Where possible, add the IAM role ARN to an existing policy rather than creating a new one.
</Note>

Use the AWS CLI to update the collection's data access policy:

```bash theme={null}
# 1. Find the data access policy for your collection
aws opensearchserverless list-access-policies \
  --type data \
  --region <region> \
  --resource "collection/<collection-name>"

# 2. Get the current policy and note the policyVersion value in the response
aws opensearchserverless get-access-policy \
  --type data \
  --name <policy-name> \
  --region <region>

# 3. Update the policy, adding the IAM role ARN to the Principal array
aws opensearchserverless update-access-policy \
  --type data \
  --name <policy-name> \
  --policy-version <version-from-step-2> \
  --region <region> \
  --policy file://policy.json
```

The `--policy-version` value is the `policyVersion` returned by the `get-access-policy` command in step 2. It enables optimistic concurrency: the update is rejected if the policy has changed since you read it. The `--policy` document replaces the entire policy, so include all existing principals and rules, not just your addition.
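Because the update replaces the whole document, one way to prepare `policy.json` is to start from the policy returned by `get-access-policy` and append the new principal to it. The following Python sketch illustrates that merge under stated assumptions: `add_principal` is a hypothetical helper, the statement to modify is chosen by index, and the sample policy and ARNs are made up for illustration.

```python
import json


def add_principal(policy: list, role_arn: str, statement_index: int = 0) -> list:
    """Return a copy of a data access policy with role_arn added to one statement.

    Every other statement, rule, and principal is carried over unchanged,
    so the result is safe to submit as a full replacement document.
    """
    updated = [dict(statement) for statement in policy]  # copy each statement
    target = updated[statement_index]
    principals = list(target.get("Principal", []))
    if role_arn not in principals:
        principals.append(role_arn)
    target["Principal"] = principals
    return updated


# Hypothetical policy, shaped like the output of get-access-policy.
existing_policy = [
    {
        "Rules": [
            {
                "Resource": ["index/my-collection/*"],
                "Permission": ["aoss:DescribeIndex", "aoss:ReadDocument"],
                "ResourceType": "index",
            }
        ],
        "Principal": ["arn:aws:iam::111122223333:role/ExistingRole"],
    }
]
new_policy = add_principal(
    existing_policy, "arn:aws:iam::111122223333:role/UnstructuredRole"
)
policy_json = json.dumps(new_policy, indent=2)
print(policy_json)
```

Saving the printed JSON as `policy.json` produces a document that still contains the original principals and rules.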

Example `policy.json`, replacing the placeholder values:

```json theme={null}
[
  {
    "Rules": [
      {
        "Resource": ["collection/<collection-name>"],
        "Permission": ["aoss:DescribeCollectionItems"],
        "ResourceType": "collection"
      },
      {
        "Resource": ["index/<collection-name>/*"],
        "Permission": [
          "aoss:CreateIndex",
          "aoss:DescribeIndex",
          "aoss:ReadDocument",
          "aoss:WriteDocument",
          "aoss:UpdateIndex"
        ],
        "ResourceType": "index"
      }
    ],
    "Principal": ["arn:aws:iam::<account-id>:role/<role-name>"],
    "Description": "Unstructured connector access"
  }
]
```

| Placeholder         | Value                                                                                               |
| ------------------- | --------------------------------------------------------------------------------------------------- |
| `<collection-name>` | The name of the OpenSearch Serverless collection.                                                   |
| `<account-id>`      | The 12-digit ID of your AWS account.                                                                |
| `<role-name>`       | The name of the IAM role that you created in step 2 of [Create the IAM role](#create-the-iam-role). |

**For source-only connectors:** remove `aoss:WriteDocument`, `aoss:UpdateIndex`, and `aoss:CreateIndex` from the `Permission` array.
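If you generate the policy programmatically, the source-only variant can be derived from the full policy. The following is a minimal Python sketch; `to_source_only` is a hypothetical helper, and the collection name and ARN are illustrative values.

```python
WRITE_PERMISSIONS = {"aoss:WriteDocument", "aoss:UpdateIndex", "aoss:CreateIndex"}


def to_source_only(policy: list) -> list:
    """Return a copy of the policy with the three write permissions removed."""
    result = []
    for statement in policy:
        rules = [
            {
                **rule,
                "Permission": [
                    p for p in rule["Permission"] if p not in WRITE_PERMISSIONS
                ],
            }
            for rule in statement["Rules"]
        ]
        result.append({**statement, "Rules": rules})
    return result


# Policy shaped like the example policy.json, with illustrative values.
full_policy = [
    {
        "Rules": [
            {
                "Resource": ["collection/my-collection"],
                "Permission": ["aoss:DescribeCollectionItems"],
                "ResourceType": "collection",
            },
            {
                "Resource": ["index/my-collection/*"],
                "Permission": [
                    "aoss:CreateIndex",
                    "aoss:DescribeIndex",
                    "aoss:ReadDocument",
                    "aoss:WriteDocument",
                    "aoss:UpdateIndex",
                ],
                "ResourceType": "index",
            },
        ],
        "Principal": ["arn:aws:iam::111122223333:role/UnstructuredRole"],
        "Description": "Unstructured connector access",
    }
]
source_policy = to_source_only(full_policy)
print(source_policy[0]["Rules"][1]["Permission"])
```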

<Note>
  Policy changes propagate within seconds, but may occasionally take up to a minute. If the connector returns a 403 immediately after updating the policy, wait a moment and retry before treating it as a configuration error.
</Note>
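The wait-and-retry advice in the note can be sketched as a small Python helper. This is an illustration under stated assumptions, not connector behavior: `retry_on_403` is a hypothetical function, `request_fn` stands in for whatever call you use to test access and is assumed to return an HTTP status code, and the default of five attempts spaced 15 seconds apart is an arbitrary choice that covers about one minute.

```python
import time


def retry_on_403(request_fn, attempts: int = 5, delay_seconds: float = 15.0, sleep=time.sleep):
    """Call request_fn until it returns a status other than 403 or attempts run out.

    request_fn: zero-argument callable returning an HTTP status code (assumed).
    Returns the last status code observed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status = None
    for attempt in range(attempts):
        status = request_fn()
        if status != 403:
            return status
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give the policy change time to propagate
    return status


# Simulated responses: two 403s while the policy propagates, then success.
responses = iter([403, 403, 200])
waits = []  # records the requested delays instead of actually sleeping
final_status = retry_on_403(lambda: next(responses), sleep=waits.append)
print(final_status, waits)
```

If the status is still 403 after the final attempt, treat it as a configuration error and recheck the policy's principals and permissions.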

### Create the source connector with Enterprise Connect

To create a source connector, see the following examples.

For more information on working with source connectors using the Unstructured API, see [Source endpoints](/api-reference/api/source/source-apis).

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="opensearch",
                  config={
                      "hosts": ["https://<host>:<port>"],
                      "index_name": "<index-name>",
                      "role_arn": "<role-arn>",
                      "external_id": "<external-id>",
                      "use_ssl": <True|False>
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "opensearch",
      "config": {
          "hosts": ["https://<host>:<port>"],
          "index_name": "<index-name>",
          "role_arn": "<role-arn>",
          "external_id": "<external-id>",
          "use_ssl": <true|false>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows.

<ParamField body="name" type="string" required>
  A unique name for this connector.
</ParamField>

<ParamField body="hosts" type="string[]" required>
  A list containing the OpenSearch instance's host URL, for example `["https://<host>:<port>"]`.
</ParamField>

<ParamField body="index_name" type="string" required>
  The name of the search index on the instance.
</ParamField>

<ParamField body="role_arn" type="string" required>
  The ARN of the IAM role Unstructured will assume via AWS STS. For example, `arn:aws:iam::123456789012:role/MyRole`.
</ParamField>

<ParamField body="external_id" type="string" required>
  The unique value you chose as your External ID. Must match the `sts:ExternalId` condition in the IAM role's trust policy.
</ParamField>

<ParamField body="use_ssl" type="boolean">
  Set to `true` if the OpenSearch instance requires an SSL connection; otherwise `false`.
</ParamField>
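Before sending the request, it can help to sanity-check the `config` values locally. The following Python sketch is one way to do so. `check_opensearch_config` is a hypothetical helper, and its rules are assumptions drawn from the field descriptions above rather than checks that the Unstructured API is known to perform.

```python
import re
from urllib.parse import urlparse

# Assumed shape of an IAM role ARN: arn:<partition>:iam::<12-digit-account>:role/<name>
IAM_ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def check_opensearch_config(config: dict) -> list:
    """Return a list of problems found in a connector config (empty if none)."""
    problems = []
    hosts = config.get("hosts")
    if not isinstance(hosts, list) or not hosts:
        problems.append("hosts must be a non-empty list of URLs")
    else:
        for host in hosts:
            parsed = urlparse(str(host))
            if parsed.scheme not in ("http", "https") or not parsed.hostname:
                problems.append(f"host is not a valid URL: {host}")
    for key in ("index_name", "role_arn", "external_id"):
        value = config.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} must be a non-empty string")
    role_arn = config.get("role_arn")
    if isinstance(role_arn, str) and role_arn and not IAM_ROLE_ARN.match(role_arn):
        problems.append("role_arn does not look like an IAM role ARN")
    if "use_ssl" in config and not isinstance(config["use_ssl"], bool):
        problems.append("use_ssl must be a boolean")
    return problems


# Illustrative values only.
good_config = {
    "hosts": ["https://search.example.com:443"],
    "index_name": "my-index",
    "role_arn": "arn:aws:iam::123456789012:role/MyRole",
    "external_id": "my-external-id",
    "use_ssl": True,
}
bad_config = {**good_config, "role_arn": "MyRole", "use_ssl": "true"}

print(check_opensearch_config(good_config))
print(check_opensearch_config(bad_config))
```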
