Send processed data from Unstructured to Azure AI Search (formerly Azure Cognitive Search).

You’ll need:

The following Azure AI Search prerequisites:

  • The endpoint and API key for Azure AI Search. Create an endpoint and API key.

  • The name of the index in Azure AI Search. Create an index.

    The Azure AI Search index that you use must have an index schema that is compatible with the schema of the documents that Unstructured produces for you. Unstructured cannot provide a schema that is guaranteed to work in all circumstances. This is because these schemas will vary based on your source files’ types; how you want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

    You can adapt the following index schema example for your own needs:

    {
        "name": "<your-index-name>",
        "fields": [
            {
                "name": "id",
                "type": "Edm.String",
                "key": true,
                "retrievable": true
            },
            {
                "name": "element_id",
                "type": "Edm.String",
                "searchable": false,
                "filterable": true,
                "sortable": true,
                "facetable": false
            },
            {
                "name": "type",
                "type": "Edm.String",
                "searchable": true,
                "filterable": true,
                "sortable": true,
                "facetable": true
            },
            {
                "name": "text",
                "type": "Edm.String",
                "searchable": true,
                "filterable": false,
                "sortable": false,
                "facetable": false
            },
            {
                "name": "embeddings",
                "type": "Collection(Edm.Single)",
                "searchable": true,
                "filterable": false,
                "sortable": false,
                "facetable": false,
                "dimensions": 384,
                "vectorSearchProfile": "embeddings-config"
            },
            {
                "name": "metadata",
                "type": "Edm.ComplexType",
                "fields": [
                    {
                        "name": "parent_id",
                        "type": "Edm.String",
                        "searchable": false,
                        "filterable": true,
                        "sortable": true,
                        "facetable": false
                    },
                    {
                        "name": "page_number",
                        "type": "Edm.Int32",
                        "searchable": false,
                        "filterable": true,
                        "sortable": true,
                        "facetable": true
                    },
                    {
                        "name": "is_continuation",
                        "type": "Edm.Boolean",
                        "searchable": false,
                        "filterable": true,
                        "sortable": true,
                        "facetable": true
                    },
                    {
                        "name": "orig_elements",
                        "type": "Edm.String",
                        "searchable": true,
                        "filterable": false,
                        "sortable": false,
                        "facetable": false
                    }
                ]
            }
        ],
        "vectorSearch": {
            "compressions": [
                {
                    "name": "scalar-quantization",
                    "kind": "scalarQuantization",
                    "rerankWithOriginalVectors": true,
                    "defaultOversampling": 10.0,
                    "scalarQuantizationParameters": {
                        "quantizedDataType": "int8"
                    }
                }
            ],
            "algorithms": [
                {
                    "name": "hnsw-1",
                    "kind": "hnsw",
                    "hnswParameters": {
                        "metric": "cosine",
                        "m": 4,
                        "efConstruction": 400,
                        "efSearch": 500
                    }
                }
            ],
            "profiles": [
                {
                    "name": "embeddings-config",
                    "algorithm": "hnsw-1",
                    "compression": "scalar-quantization"
                }
            ]
        }
    }
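Because schema compatibility depends on your own pipeline, it can help to compare one output document against the index schema before you send data. The following is a minimal sketch, not part of Unstructured or the Azure SDK: the function name `find_schema_problems`, the trimmed `INDEX_SCHEMA`, and the sample document are all hypothetical. It assumes a document is a plain Python dictionary and only checks field names and vector length, not data types.

```python
from typing import Any

# Trimmed copy of the example schema above: field names, types, and the
# vector size declared for "embeddings". Attributes such as "searchable"
# are omitted because this check does not use them.
INDEX_SCHEMA: dict[str, Any] = {
    "name": "<your-index-name>",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "element_id", "type": "Edm.String"},
        {"name": "type", "type": "Edm.String"},
        {"name": "text", "type": "Edm.String"},
        {"name": "embeddings", "type": "Collection(Edm.Single)", "dimensions": 384},
        {
            "name": "metadata",
            "type": "Edm.ComplexType",
            "fields": [
                {"name": "parent_id", "type": "Edm.String"},
                {"name": "page_number", "type": "Edm.Int32"},
                {"name": "is_continuation", "type": "Edm.Boolean"},
                {"name": "orig_elements", "type": "Edm.String"},
            ],
        },
    ],
}


def find_schema_problems(
    doc: dict[str, Any], fields: list[dict[str, Any]], prefix: str = ""
) -> list[str]:
    """Return a list of human-readable mismatches between doc and fields."""
    problems: list[str] = []
    by_name = {field["name"]: field for field in fields}
    for key, value in doc.items():
        path = prefix + key
        field = by_name.get(key)
        if field is None:
            problems.append(f"{path}: not defined in the index schema")
        elif field["type"] == "Edm.ComplexType" and isinstance(value, dict):
            # Recurse into nested fields such as "metadata".
            problems.extend(
                find_schema_problems(value, field["fields"], path + "/")
            )
        elif "dimensions" in field and isinstance(value, (list, tuple)):
            if len(value) != field["dimensions"]:
                problems.append(
                    f"{path}: expected {field['dimensions']} values, got {len(value)}"
                )
    return problems


# Hypothetical document: a 3-value vector and a metadata field ("filename")
# that the example schema does not define.
sample_doc = {
    "id": "1",
    "element_id": "abc",
    "type": "NarrativeText",
    "text": "Hello",
    "embeddings": [0.1, 0.2, 0.3],
    "metadata": {"page_number": 1, "filename": "a.pdf"},
}

for problem in find_schema_problems(sample_doc, INDEX_SCHEMA["fields"]):
    print(problem)
```

Each reported problem points to a change you might need to make, such as adding a field to the index schema or setting `dimensions` to match the size of the vectors that your embedding model produces.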
    


To create the destination connector:

  1. On the sidebar, click Destinations.
  2. Click New Destination.
  3. In the Type drop-down list, select Azure Cognitive Search.
  4. Fill in the fields as described later on this page.
  5. Click Save and Test.
  6. Click Close.

Fill in the following fields:

  • Name (required): A unique name for this connector.
  • Endpoint (required): The endpoint URL for Azure AI Search (formerly Azure Cognitive Search).
  • API Key (required): The API key for Azure AI Search.
  • Index Name (required): The name of the index for Azure AI Search.
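Before you enter these values, you may want to confirm that the endpoint, API key, and index name work together. The following sketch builds requests for the Azure AI Search REST API using only the Python standard library. It is an illustration under assumptions: the helper name `index_request`, the example host, and the key are hypothetical, and the `api-version` value is an assumption that you should replace with a version your service supports. The sketch only constructs the requests; the line that would send one is left commented out.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: a stable REST API version; adjust to one your service supports.
API_VERSION = "2024-07-01"


def index_request(
    endpoint: str,
    api_key: str,
    index_name: str,
    schema: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build a request that reads the index (no schema) or creates/updates it."""
    url = f"{endpoint.rstrip('/')}/indexes/{index_name}?api-version={API_VERSION}"
    headers = {"api-key": api_key}
    if schema is None:
        # GET returns the index definition if the endpoint, key, and name are valid.
        return urllib.request.Request(url, headers=headers, method="GET")
    # PUT creates or updates the index; the body name must match the URL name.
    body = json.dumps({**schema, "name": index_name}).encode("utf-8")
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Placeholder values for illustration only.
check = index_request("https://example.search.windows.net", "<your-api-key>", "my-index")
create = index_request(
    "https://example.search.windows.net",
    "<your-api-key>",
    "my-index",
    schema={"fields": [{"name": "id", "type": "Edm.String", "key": True}]},
)
print(check.get_method(), check.full_url)
print(create.get_method(), create.full_url)

# To actually send a request (requires a real service and key):
# with urllib.request.urlopen(check) as response:
#     print(response.status)
```

Creating or updating an index generally requires an admin API key rather than a query key, so a request that fails with an authorization error may indicate the wrong kind of key.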