# Azure AI Search

<Note>
  First time creating a connector? [Read this first](/api-reference/workflow/connector-first-time-reqs).
</Note>

Send processed data from Unstructured to Azure AI Search.

## Requirements

The following video shows how to fulfill the minimum set of Azure AI Search requirements:

<iframe width="560" height="315" src="https://www.youtube.com/embed/6ZjU5OupWE8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

Here are some more details about these requirements:

* The endpoint and API key for Azure AI Search. [Create an endpoint and API key](https://learn.microsoft.com/azure/search/search-create-service-portal).
* The name of the index in Azure AI Search. [Create an index](https://learn.microsoft.com/rest/api/searchservice/create-index).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/WY8h8Gtyo7o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  The Azure AI Search index that you use must have an index schema that is compatible with the schema of the documents
  that Unstructured produces for you. Unstructured cannot provide a schema that is guaranteed to work in all
  circumstances. This is because these schemas will vary based on your source files' types; how you
  want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

  You can adapt the following index schema example for your own needs. Be sure to replace `<number-of-dimensions>`
  (in three locations in the following example) with the number of dimensions of the embedding model you are using:

  ```json theme={null}
  {
    "name": "elements-index",
    "fields": [
      {
        "name": "id",
        "type": "Edm.String",
        "key": true
      },
      {
        "name": "record_id",
        "type": "Edm.String",
        "filterable": true
      },
      {
        "name": "element_id",
        "type": "Edm.String"
      },
      {
        "name": "text",
        "type": "Edm.String",
        "searchable": true
      },
      {
        "name": "type",
        "type": "Edm.String"
      },
      {
        "name": "metadata",
        "type": "Edm.ComplexType",
        "fields": [
          {
            "name": "orig_elements",
            "type": "Edm.String"
          },
          {
            "name": "category_depth",
            "type": "Edm.Int32"
          },
          {
            "name": "parent_id",
            "type": "Edm.String"
          },
          {
            "name": "attached_to_filename",
            "type": "Edm.String"
          },
          {
            "name": "filetype",
            "type": "Edm.String"
          },
          {
            "name": "last_modified",
            "type": "Edm.DateTimeOffset"
          },
          {
            "name": "is_continuation",
            "type": "Edm.Boolean"
          },
          {
            "name": "file_directory",
            "type": "Edm.String"
          },
          {
            "name": "filename",
            "type": "Edm.String"
          },
          {
            "name": "data_source",
            "type": "Edm.ComplexType",
            "fields": [
              {
                "name": "url",
                "type": "Edm.String"
              },
              {
                "name": "version",
                "type": "Edm.String"
              },
              {
                "name": "date_created",
                "type": "Edm.DateTimeOffset"
              },
              {
                "name": "date_modified",
                "type": "Edm.DateTimeOffset"
              },
              {
                "name": "date_processed",
                "type": "Edm.DateTimeOffset"
              },
              {
                "name": "permissions_data",
                "type": "Edm.String"
              },
              {
                "name": "record_locator",
                "type": "Edm.String"
              }
            ]
          },
          {
            "name": "coordinates",
            "type": "Edm.ComplexType",
            "fields": [
              {
                "name": "system",
                "type": "Edm.String"
              },
              {
                "name": "layout_width",
                "type": "Edm.Double"
              },
              {
                "name": "layout_height",
                "type": "Edm.Double"
              },
              {
                "name": "points",
                "type": "Edm.String"
              }
            ]
          },
          {
            "name": "languages",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "page_number",
            "type": "Edm.String"
          },
          {
            "name": "links",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "page_name",
            "type": "Edm.String"
          },
          {
            "name": "link_urls",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "link_texts",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "sent_from",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "sent_to",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "subject",
            "type": "Edm.String"
          },
          {
            "name": "section",
            "type": "Edm.String"
          },
          {
            "name": "header_footer_type",
            "type": "Edm.String"
          },
          {
            "name": "emphasized_text_contents",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "emphasized_text_tags",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "text_as_html",
            "type": "Edm.String"
          },
          {
            "name": "regex_metadata",
            "type": "Edm.String"
          },
          {
            "name": "detection_class_prob",
            "type": "Edm.Double"
          }
        ]
      },
      {
        "name": "embeddings",
        "type": "Collection(Edm.Single)",
        "dimensions": <number-of-dimensions>,
        "vectorSearchProfile": "embeddings-config-profile"
      }
    ],
    "vectorSearch": {
      "algorithms": [
        {
          "name": "hnsw-<number-of-dimensions>",
          "kind": "hnsw",
          "hnswParameters": {
            "m": 4,
            "efConstruction": 400,
            "efSearch": 500,
            "metric": "cosine"
          }
        }
      ],
      "profiles": [
        {
          "name": "embeddings-config-profile",
          "algorithm": "hnsw-<number-of-dimensions>"
        }
      ]
    },
    "semantic": {
      "configurations": [
        {
          "name": "default-semantic-config",
          "prioritizedFields": {
            "titleField": null,
            "prioritizedContentFields": [
              { "fieldName": "text" }
            ],
            "prioritizedKeywordsFields": []
          }
        }
      ]
    }
  }
  ```

  <Info>
    The `record_id`, `element_id`, and `id` fields are closely related, but each has a distinct purpose. For more information, see [How connectors use record IDs, element IDs, and IDs](/api-reference/record-element-id).
  </Info>

  See also:

  * [Search indexes in Azure AI Search](https://learn.microsoft.com/azure/search/search-what-is-an-index)
  * [Schema of a search index](https://learn.microsoft.com/azure/search/search-what-is-an-index#schema-of-a-search-index)
  * [Example index schema](https://learn.microsoft.com/rest/api/searchservice/create-index#examples)
  * [Unstructured document elements and metadata](/api-reference/legacy-api/partition/document-elements)
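
As a minimal sketch of the placeholder substitution described above, the following Python helper fills in `<number-of-dimensions>` everywhere it appears in a schema template and then parses the result as JSON. The helper name `render_index_schema`, the shortened `SCHEMA_TEMPLATE`, and the value `1536` are illustrative assumptions only. Substitute your full schema and the actual number of dimensions of your embedding model.

```python
import json

PLACEHOLDER = "<number-of-dimensions>"

# Shortened, illustrative version of the index schema shown earlier.
SCHEMA_TEMPLATE = """
{
  "name": "elements-index",
  "fields": [
    {"name": "id", "type": "Edm.String", "key": true},
    {
      "name": "embeddings",
      "type": "Collection(Edm.Single)",
      "dimensions": <number-of-dimensions>,
      "vectorSearchProfile": "embeddings-config-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [
      {"name": "hnsw-<number-of-dimensions>", "kind": "hnsw"}
    ],
    "profiles": [
      {
        "name": "embeddings-config-profile",
        "algorithm": "hnsw-<number-of-dimensions>"
      }
    ]
  }
}
"""


def render_index_schema(template: str, dimensions: int) -> dict:
    """Replace every placeholder with the dimension count and parse the JSON."""
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    rendered = template.replace(PLACEHOLDER, str(dimensions))
    # json.loads raises json.JSONDecodeError if the template is malformed.
    return json.loads(rendered)


# 1536 is only an example value; use your embedding model's dimension count.
schema = render_index_schema(SCHEMA_TEMPLATE, 1536)
print(json.dumps(schema, indent=2))
```

Parsing the rendered text catches a common mistake early: if any placeholder is left unquoted or unreplaced in a numeric position, the JSON fails to parse before you try to create the index.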

## Examples

The following examples create an Azure AI Search destination connector using the Unstructured API.

For more information on working with destination connectors using the Unstructured API, see [Destination endpoints](/api-reference/api/destination/destination-apis).

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="azure_ai_search",
                  config={
                      "endpoint": "<endpoint>",
                      "index": "<index>",
                      "key": "<key>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "azure_ai_search",
      "config": {
          "endpoint": "<endpoint>",
          "index": "<index>",
          "key": "<key>"
      }
  }'
  ```
</CodeGroup>

## Configuration settings

Replace the preceding placeholders as follows:

<ParamField body="name" type="string" required>
  A unique name for this connector.
</ParamField>

<ParamField body="endpoint" type="string" required>
  The endpoint URL for your Azure AI Search service, in the format `https://<service-name>.search.windows.net`.
</ParamField>

<ParamField body="index" type="string" required>
  The name of the index in your Azure AI Search service.
</ParamField>

<ParamField body="key" type="string">
  The admin API key for your Azure AI Search service. Required if not using Enterprise Connect authentication.
</ParamField>
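
The settings above can be sanity-checked before you send them to the API. The following is a minimal sketch, assuming the endpoint follows the documented `https://<service-name>.search.windows.net` format. The function name `build_api_key_config` and the sample values are hypothetical and are not part of the Unstructured SDK. Relax the host check if your service uses a different domain suffix.

```python
from urllib.parse import urlparse


def build_api_key_config(endpoint: str, index: str, key: str) -> dict:
    """Return a connector config dict after basic sanity checks."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("endpoint must be an https:// URL")
    # Assumption: the documented host suffix; adjust if yours differs.
    if not parsed.hostname.endswith(".search.windows.net"):
        raise ValueError("endpoint host should end with .search.windows.net")
    if not index or not key:
        raise ValueError("index and key must be non-empty")
    return {"endpoint": endpoint.rstrip("/"), "index": index, "key": key}


# Sample values for illustration only.
config = build_api_key_config(
    "https://my-service.search.windows.net", "elements-index", "example-key"
)
print(config)
```

The returned dictionary has the same shape as the `config` value in the preceding Python SDK example, so it can be passed there directly.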

<h2 id="set-up-enterprise-connect-authentication">
  Set up Enterprise Connect authentication
</h2>

<Note>
  Enterprise Connect is available for [dedicated instance](/business/dedicated-instances/overview) customers only, and must be enabled on your instance before use. Contact your Unstructured account team or [Unstructured Support](https://support.unstructured.io/) to request access and have it enabled.
</Note>

Enterprise Connect is an authentication method for Azure connectors. It uses a federated identity credential to authenticate Unstructured as a customer-configured App Registration. During a workflow run, Unstructured uses this credential to receive a short-lived access token. Tokens expire automatically and no secrets are stored. For an overview, see [Enterprise Connect for Azure](/business/azure/enterprise-connect).

To configure an Azure AI Search connector to use Enterprise Connect, first complete the following setup in your Azure subscription:

1. Enable role-based access on your Azure AI Search service.

   <Warning>
     Azure AI Search defaults to API key authentication only. Complete this step to configure the service to accept the Microsoft Entra ID tokens that Enterprise Connect uses.
   </Warning>

   Follow the instructions in [Enable role-based access control for Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-security-enable-roles) in the Azure AI Search documentation. In the Azure portal, navigate to your search service, select **Settings** > **Keys**, and select **Both** to allow both API key and role-based authentication. If you want to use role-based authentication exclusively, select **Role-based access control** instead.

2. Create an App Registration for Unstructured in Microsoft Entra ID.

   In your Azure subscription, follow the instructions in [How to register an app in Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app) in the Microsoft Entra documentation. Enter a meaningful name for your App Registration (for example, `unstructured-connector`). For **Supported account types**, select **Single tenant only**.

   You are registering this app for a third-party service (Unstructured) accessing resources in your own tenant. This is the [single-tenant scenario](https://learn.microsoft.com/en-us/entra/identity-platform/single-and-multi-tenant-apps) as defined by Microsoft.

3. Add a federated identity credential to the App Registration.

   Follow the instructions in [Configure an app to trust an external identity provider](https://learn.microsoft.com/en-us/entra/workload-id/workload-identity-federation-create-trust) in the Microsoft Entra documentation. Navigate to your App Registration, select **Certificates & secrets** in the left navigation pane, select the **Federated credentials** tab, and select **Add credential**.

   For **Federated credential scenario**, select **Other issuer**.

   Set the following values:

   | Field        | Value                                                                                                                                                                                                                                                                                                            |
   | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
   | **Name**     | A unique name for this credential (for example, `unstructured-federated-credential`). This cannot be changed after creation.                                                                                                                                                                                     |
   | **Issuer**   | The OIDC issuer URL for your Unstructured instance. Get this value from your Unstructured account team. (Example: `https://oidc.prod-aks.example.com/...`)                                                                                                                                                       |
   | **Subject**  | The service account identity for your Unstructured instance. Get this value from your Unstructured account team. (Example: `system:serviceaccount:etl:etl-job-runner`) This value must exactly match what Unstructured provides. If it does not match, the token exchange will fail without displaying an error. |
   | **Audience** | Set this to `api://AzureADTokenExchange`.                                                                                                                                                                                                                                                                        |

   Your Unstructured instance may require more than one federated identity credential. The platform uses separate identities for different operations, such as connection testing and running workflows. If your account team provides more than one Subject value, repeat these steps for each one.

4. Add a role assignment to grant your App Registration access to your Azure AI Search service.

   See [Assign Azure roles using the Azure portal](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) in the Azure documentation. Use the following values:

   * **Scope**: the Azure AI Search service that contains the index you want the connector to access.
   * **Role**: assign both **Search Index Data Contributor** (required for document indexing) and **Search Service Contributor** (required for connection testing). Repeat the role assignment steps for each role.
   * **Members**: select **User, group, or service principal**, then search for and select the App Registration you created in Step 2.

   When you reach the **Review + assign** tab, click **Review + assign** to complete the assignment.

5. Note the following values from your App Registration. You will need them when configuring the connector in Unstructured. Both values are available on the **Overview** page of your App Registration in the [Microsoft Entra admin center](https://entra.microsoft.com).

   * The **Tenant ID** (shown as **Directory (tenant) ID**) of the Microsoft Entra tenant that contains your App Registration.
   * The **Client ID** (shown as **Application (client) ID**) of your App Registration.
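
If your account team provides more than one Subject value, it can help to write out the federated credential definitions from step 3 before entering them in the portal. The following sketch builds one definition per Subject. The field names (`name`, `issuer`, `subject`, `audiences`) are assumed to follow the Microsoft Graph federated identity credential resource, so verify them against the Microsoft documentation before using them with any tooling. The function name, the numbered naming scheme, and the angle-bracket placeholders are hypothetical.

```python
import json

AUDIENCE = "api://AzureADTokenExchange"


def federated_credentials(issuer: str, subjects: list[str]) -> list[dict]:
    """Build one credential definition per Subject value."""
    credentials = []
    for number, subject in enumerate(subjects, start=1):
        credentials.append(
            {
                # Hypothetical naming scheme; names cannot be changed later.
                "name": f"unstructured-federated-credential-{number}",
                "issuer": issuer,
                # Must match the value from Unstructured exactly.
                "subject": subject,
                "audiences": [AUDIENCE],
            }
        )
    return credentials


creds = federated_credentials(
    "<issuer-url-from-unstructured>",  # placeholder; get the real value from your account team
    [
        "system:serviceaccount:etl:etl-job-runner",  # example Subject from the table above
        "<second-subject-from-unstructured>",  # placeholder for an additional Subject
    ],
)
print(json.dumps(creds, indent=2))
```

Keeping the Subject strings in one place and copying them from there reduces the risk of a typo, which matters because a mismatched Subject makes the token exchange fail without displaying an error.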

Next, see the **Create the destination connector with Enterprise Connect** section below for examples.

### Create the destination connector with Enterprise Connect

The following examples create an Azure AI Search destination connector using Enterprise Connect authentication.

For more information on working with destination connectors using the Unstructured API, see [Destination endpoints](/api-reference/api/destination/destination-apis).

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="azure_ai_search",
                  config={
                      "endpoint": "<endpoint>",
                      "index": "<index>",
                      "tenant_id": "<tenant-id>",
                      "client_id": "<client-id>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "azure_ai_search",
      "config": {
          "endpoint": "<endpoint>",
          "index": "<index>",
          "tenant_id": "<tenant-id>",
          "client_id": "<client-id>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

<ParamField body="name" type="string" required>
  A unique name for this connector.
</ParamField>

<ParamField body="endpoint" type="string" required>
  The endpoint URL for your Azure AI Search service, in the format `https://<service-name>.search.windows.net`.
</ParamField>

<ParamField body="index" type="string" required>
  The name of the index in your Azure AI Search service.
</ParamField>

<ParamField body="tenant_id" type="string" required>
  The Tenant ID (also called Directory ID) of the Microsoft Entra tenant that contains your App Registration.
</ParamField>

<ParamField body="client_id" type="string" required>
  The Client ID of your App Registration.
</ParamField>
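
Tenant IDs and Client IDs are GUIDs, so a quick format check can catch copy-and-paste mistakes before the connector is created. The following is a minimal sketch. The function name `build_enterprise_connect_config` and the sample GUIDs are hypothetical and are not part of the Unstructured SDK. Note that Python's `uuid.UUID` is lenient and also accepts GUIDs written without hyphens or wrapped in braces.

```python
import uuid


def build_enterprise_connect_config(
    endpoint: str, index: str, tenant_id: str, client_id: str
) -> dict:
    """Return an Enterprise Connect config dict after checking the GUIDs."""
    for label, value in (("tenant_id", tenant_id), ("client_id", client_id)):
        try:
            uuid.UUID(value)  # raises ValueError for malformed GUIDs
        except ValueError as error:
            raise ValueError(f"{label} is not a valid GUID: {value!r}") from error
    # No "key" entry: Enterprise Connect replaces API key authentication.
    return {
        "endpoint": endpoint,
        "index": index,
        "tenant_id": tenant_id,
        "client_id": client_id,
    }


# Sample values for illustration only.
config = build_enterprise_connect_config(
    "https://my-service.search.windows.net",
    "elements-index",
    "00000000-0000-0000-0000-000000000000",
    "11111111-1111-1111-1111-111111111111",
)
print(config)
```

The returned dictionary matches the shape of the `config` value in the Enterprise Connect examples above.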
