# Milvus

<Note>
  First time creating a connector? [Read this first](/api-reference/workflow/connector-first-time-reqs).
</Note>

Send processed data from Unstructured to Milvus.

## Requirements

You will need:

* For the [Unstructured Pipelines](/pipelines/overview) or the [Unstructured API](/api-reference/overview), only Milvus cloud-based instances (such as Milvus on IBM watsonx.data, or Zilliz Cloud) are supported.

* For [Unstructured Ingest](/open-source/ingestion/overview), Milvus local and cloud-based instances are supported.

* For Milvus on IBM watsonx.data, you will need:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/hLCwoe2fCnc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * An [IBM Cloud account](https://cloud.ibm.com/registration).

  * An IBM watsonx.data [Lite plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_prov_lite_1)
    or [Enterprise plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started_1) within your IBM Cloud account.

    * If you are provisioning a Lite plan, be sure to choose the **Generative AI** use case when prompted, as this is the only use case offered that includes Milvus.

  * A [Milvus service instance in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service).

    * If you are creating a Milvus service instance within a watsonx.data Lite plan, when you are prompted to choose a Milvus instance size, you can only select **Lite**. Because the Lite
      Milvus instance size is recommended only for 384 dimensions, you should also use an embedding model that uses 384 dimensions only.
    * If you are creating a Milvus service instance within a watsonx.data Enterprise plan, you can choose any available Milvus instance size. However, all Milvus instance sizes other than
      **Custom** are recommended only for 384 dimensions, which means you should use an embedding model that uses 384 dimensions only.
      The **Custom** Milvus instance size is recommended for any number of dimensions.

  * The URI of the instance, which takes the format `https://<host>:<port>`, where `<host>` is the instance's **GRPC host** and
    `<port>` is the instance's **GRPC port**. To get this information, do the following:

    a. Sign in to your IBM Cloud account.<br />
    b. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.<br />
    c. Expand **Databases**, and then click the name of the target **watsonx.data** plan.<br />
    d. Click **Open web console**.<br />
    e. On the sidebar, click **Infrastructure manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the title bar.<br />
    f. Click the target Milvus service instance.<br />
    g. On the **Details** tab, under **Type**, click **View connect details**.<br />
    h. Under **Service details**, expand **GRPC**, and note the value of **GRPC host** and **GRPC port**.<br />

  * The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.

  * The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.

  * The username and password to access the instance.

    * The username for Milvus on IBM watsonx.data is typically `ibmlhapikey`.

      <Note>
        More recent versions of Milvus on IBM watsonx.data require `ibmlhapikey_<your-IBMid>` instead, where `<your-IBMid>` is
        your IBMid, for example `me@example.com`. To get your IBMid, do the following:

        1. Sign in to your IBM Cloud account.
        2. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.
        3. In the sidebar, expand **Manage identities**, and then click **Users**.
        4. In the list of users, click your user name.
        5. On the **User details** tab, in the **Details** tile, note the value of **IBMid**.
      </Note>

    * The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key. To create an IBM Cloud user API key:

      a. Sign in to your IBM Cloud account.<br />
      b. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.<br />
      c. On the sidebar, under **Manage identities**, click **API keys**. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.<br />
      d. Click **Create**.<br />
      e. Enter some **Name** for the API key.<br />
      f. Optionally, enter some **Description** for the API key.<br />
      g. For **Leaked action**, leave **Disable the leaked key** selected.<br />
      h. For **Session management**, leave **No** selected.<br />
      i. Click **Create**.<br />
      j. Click **Download** (or **Copy**), and then download the API key to a secure location (or paste the copied API key into a secure location). You won't be able to access this API key from this dialog again. If you lose this API key, you can create a new one (and you should then delete the old one).<br />

* For Zilliz Cloud, you will need:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/vwWudGvBEKQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * A [Zilliz Cloud account](https://cloud.zilliz.com/signup).

  * A [Zilliz Cloud cluster](https://docs.zilliz.com/docs/create-cluster).

  * The URI of the cluster, also known as the cluster's *public endpoint*, which takes a format such as
    `https://<cluster-id>.<cluster-type>.<cloud-provider>-<region>.cloud.zilliz.com`. To get this public endpoint value, do the following:

    1. After you sign in to your Zilliz Cloud account, on the sidebar, in the list of available projects, select the project that contains the cluster.
    2. On the sidebar, click **Clusters**.
    3. Click the tile for the cluster.
    4. On the **Cluster Details** tab, on the **Connect** subtab, copy the **Public Endpoint** value.

  * The username and password to access the cluster, as follows:

    1. After you sign in to your Zilliz Cloud account, on the sidebar, in the list of available projects, select the project that contains the cluster.
    2. On the sidebar, click **Clusters**.
    3. Click the tile for the cluster.
    4. On the **Users** tab, copy the name of the user.
    5. Next to the user's name, under **Actions**, click the ellipsis (three dots) icon, and then click **Reset Password**.
    6. Enter a new password for the user, and then click **Confirm**. Copy this new password.

  * The name of the [database](https://docs.zilliz.com/docs/database#create-database) in the instance.

  * The name of the [collection](https://docs.zilliz.com/docs/manage-collections-console#create-collection) in the database.

    The collection must have a defined schema before Unstructured can write to the collection. The minimum viable
    schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as shown in the following table.
    `type` is an optional field, but highly recommended. For settings for additional Unstructured-produced fields,
    such as the ones within `metadata`, see the usage notes toward the end of this section and adapt them to your specific needs.

    | Field Name                       | Field Type        | Max Length | Dimension |
    | -------------------------------- | ----------------- | ---------- | --------- |
    | `element_id` (primary key field) | **VARCHAR**       | `200`      | --        |
    | `embeddings` (vector field)      | **FLOAT\_VECTOR** | --         | `384`     |
    | `record_id`                      | **VARCHAR**       | `200`      | --        |
    | `text`                           | **VARCHAR**       | `65535`    | --        |
    | `type`                           | **VARCHAR**       | `200`      | --        |

    In the **Create Index** area for the collection, next to **Vector Fields**, click **Edit Index**. Make sure that for the
    `embeddings` field, the **Field Type** is set to **FLOAT\_VECTOR** and the **Metric Type** is set to **Cosine**.

    <Warning>
      The number of dimensions for the `embeddings` field must match the number of dimensions for the embedding model that you plan to use.
    </Warning>

    <Note>
      Fields with a **VARCHAR** data type are limited to a maximum length of 65,535 characters. Exceeding this limit
      causes Unstructured to throw errors when it writes to the Milvus collection, and the associated Unstructured job could fail.
      For example, `metadata` fields that typically exceed this limit include `image_base64` and `orig_elements`.
    </Note>

    <Info>
      The `record_id`, `element_id`, and `id` fields are closely related, but each has a distinct purpose. For more information, see [How connectors use record IDs, element IDs, and IDs](/api-reference/record-element-id).
    </Info>

* For Milvus local, you will need:

  * A [Milvus instance](https://milvus.io/docs/install-overview.md).
  * The [URI](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Client/MilvusClient.md) of the instance.
  * The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
  * The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database.
    Note the collection requirements at the end of this section.
  * The [username and password, or token](https://milvus.io/docs/authenticate.md) to access the instance.
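
The URI formats described in the preceding list can be assembled and sanity-checked before you create the connector. The following is a minimal sketch that uses only the Python standard library. The helper names `build_watsonx_milvus_uri` and `looks_like_network_uri`, and the example host and port, are hypothetical and are not part of any Unstructured or Milvus SDK:

```python
from urllib.parse import urlparse


def build_watsonx_milvus_uri(grpc_host: str, grpc_port: int) -> str:
    """Join a GRPC host and GRPC port into the https://<host>:<port> format."""
    host = grpc_host.strip()
    if not host:
        raise ValueError("GRPC host must not be empty")
    port = int(grpc_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"GRPC port out of range: {grpc_port}")
    return f"https://{host}:{port}"


def looks_like_network_uri(uri: str) -> bool:
    """Return True if the URI has an http(s) scheme and a host name."""
    parsed = urlparse(uri)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


# Hypothetical values; substitute the GRPC host and port that you noted.
uri = build_watsonx_milvus_uri("milvus.example.com", 30000)
print(uri, looks_like_network_uri(uri))
```

This check only confirms the shape of the URI. It does not verify that the instance is reachable or that the credentials are valid.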

All Milvus instances require the target collection to have a defined schema before Unstructured can write to the collection. The minimum viable
schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows.
`type` is an optional field, but highly recommended.

This example code demonstrates the use of the
[Python SDK for Milvus](https://pypi.org/project/pymilvus/) to create a collection with this schema,
targeting Milvus on IBM watsonx.data. For the `MilvusClient` arguments to connect to other types of Milvus deployments, see your Milvus provider's documentation:

```python Python theme={null}
import os

from pymilvus import (
    MilvusClient,
    FieldSchema,
    DataType,
    CollectionSchema
)

DATABASE_NAME   = "default"
COLLECTION_NAME = "my_collection"

client = MilvusClient(
    uri="https://" +
        os.getenv("MILVUS_USER") + 
        ":" + 
        os.getenv("MILVUS_PASSWORD") + 
        "@" + 
        os.getenv("MILVUS_GRPC_HOST") + 
        ":" + 
        os.getenv("MILVUS_GRPC_PORT"),
    db_name=DATABASE_NAME
)

# IMPORTANT: The number of dimensions for the "embeddings" field that
# follows must match the number of dimensions for the embedding model 
# that you plan to use.
fields = [
    FieldSchema(name="element_id", dtype=DataType.VARCHAR, is_primary=True, max_length=200),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=384),
    FieldSchema(name="record_id", dtype=DataType.VARCHAR, max_length=200),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="type", dtype=DataType.VARCHAR, max_length=200, nullable=True)  # Optional, but highly recommended.
]

schema = CollectionSchema(fields=fields)

client.create_collection(
    collection_name=COLLECTION_NAME,
    schema=schema,
    using=DATABASE_NAME
)

index_params = client.prepare_index_params()

index_params.add_index(
    field_name="embeddings",
    metric_type="COSINE",
    index_type="IVF_FLAT",
    params={"nlist": 1024}
)

client.create_index(
    collection_name=COLLECTION_NAME,
    index_params=index_params
)

client.load_collection(collection_name=COLLECTION_NAME)
```
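
Because the `dim` value in the schema must match the output of your embedding model, it can help to check a sample of Unstructured output before you write to the collection. This sketch assumes that each element is a dictionary with an `element_id` string and an `embeddings` list, as in Unstructured's JSON output. The function name `find_dimension_mismatches` and the sample data are hypothetical:

```python
from typing import Any


def find_dimension_mismatches(
    elements: list[dict[str, Any]], expected_dim: int
) -> list[str]:
    """Return the element_id of each element whose embedding length differs from expected_dim."""
    mismatched = []
    for element in elements:
        vector = element.get("embeddings")
        # Elements without embeddings are reported too, because the
        # vector field in the example schema is not nullable.
        if vector is None or len(vector) != expected_dim:
            mismatched.append(element.get("element_id", "<unknown>"))
    return mismatched


sample = [
    {"element_id": "a1", "embeddings": [0.0] * 384},
    {"element_id": "b2", "embeddings": [0.0] * 1536},
    {"element_id": "c3"},
]
print(find_dimension_mismatches(sample, expected_dim=384))
```

An empty result suggests that the sample is consistent with the collection's `dim` setting. Any listed IDs point to elements that the collection would likely reject.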

For objects in the `metadata` field that Unstructured produces and that you want to store in Milvus, you must create fields in your Milvus collection schema that
follow Unstructured's `metadata` field naming convention. As the example that follows shows, the keys of nested objects are joined with underscores, so that
`system` within `coordinates` becomes `coordinates_system`. Suppose that Unstructured produces a `metadata` field with the following
child objects:

```json theme={null}
"metadata": {
  "is_extracted": "true",
  "coordinates": {
    "points": [
      [
        134.20055555555555,
        241.36027777777795
      ],
      [
        134.20055555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        241.36027777777795
      ]
    ],
    "system": "PixelSpace",
    "layout_width": 1654,
    "layout_height": 2339
  },
  "filetype": "application/pdf",
  "languages": [
    "eng"
  ],
  "page_number": 1,
  "image_mime_type": "image/jpeg",
  "filename": "realestate.pdf",
  "data_source": {
    "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
    "record_locator": {
      "protocol": "file",
      "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
    }
  },
  "entities": {
    "items": [
      {
        "entity": "HOME FOR FUTURE",
        "type": "ORGANIZATION"
      },
      {
        "entity": "221 Queen Street, Melbourne VIC 3000",
        "type": "LOCATION"
      }
    ],
    "relationships": [
      {
        "from": "HOME FOR FUTURE",
        "relationship": "based_in",
        "to": "221 Queen Street, Melbourne VIC 3000"
      }
    ]
  }
}
```

You could create corresponding fields in your Milvus collection schema by using the following field names, data types, and related settings for the
Python SDK for Milvus:

```python theme={null}
# ...
fields = [
    # Define minimum viable required fields: "element_id", "embeddings", "record_id", and "text".
    # "type" is an optional field, but highly recommended.
    FieldSchema(name="element_id", dtype=DataType.VARCHAR, is_primary=True, max_length=200),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="record_id", dtype=DataType.VARCHAR, max_length=200),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="type", dtype=DataType.VARCHAR, max_length=200, nullable=True),
    # Define optional fields, such as these ones related to the "metadata" field.
    FieldSchema(name="is_extracted", dtype=DataType.VARCHAR, max_length=5, nullable=True),
    FieldSchema(name="coordinates_points", dtype=DataType.JSON, nullable=True),
    FieldSchema(name="coordinates_system", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="coordinates_layout_width", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="coordinates_layout_height", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="filetype", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="languages", dtype=DataType.ARRAY, element_type=DataType.VARCHAR, max_length=64, max_capacity=10, nullable=True),
    FieldSchema(name="page_number", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="image_mime_type", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="filename", dtype=DataType.VARCHAR, max_length=256, nullable=True),
    FieldSchema(name="data_source_url", dtype=DataType.VARCHAR, max_length=1024, nullable=True),
    FieldSchema(name="data_source_record_locator_protocol", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="data_source_record_locator_remote_file_path", dtype=DataType.VARCHAR, max_length=1024, nullable=True),
    FieldSchema(name="entities_items", dtype=DataType.JSON, nullable=True),
    FieldSchema(name="entities_relationships", dtype=DataType.JSON, nullable=True)
]
# ...
```
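
The field names in the preceding schema can be derived from the `metadata` object by joining nested keys with underscores. The following sketch illustrates that naming pattern only. It is not Unstructured's actual implementation, and the function name `flatten_metadata` is hypothetical. It leaves lists untouched, on the assumption that they map to **ARRAY** or **JSON** fields as in the schema above:

```python
from typing import Any


def flatten_metadata(metadata: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dictionaries into underscore-joined field names.

    Lists are kept as-is so that they can be stored in ARRAY or JSON fields.
    """
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_metadata(value, name))
        else:
            flat[name] = value
    return flat


# A trimmed-down version of the metadata example shown earlier.
metadata = {
    "page_number": 1,
    "coordinates": {"system": "PixelSpace", "layout_width": 1654},
    "data_source": {"record_locator": {"protocol": "file"}},
    "languages": ["eng"],
}
for field_name in sorted(flatten_metadata(metadata)):
    print(field_name)
```

Running a helper like this against a sample of your own output is one way to list the candidate field names before you define the collection schema.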

<Note>
  Fields with a `DataType.VARCHAR` data type are limited to a maximum length of 65,535 characters. Exceeding this limit
  causes Unstructured to throw errors when it writes to the Milvus collection, and the associated Unstructured job could fail.
  For example, `metadata` fields that typically exceed this limit include `image_base64` and `orig_elements`.
</Note>
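
One way to catch oversized values early is to compare each string against its field's `max_length` before a job runs. The sketch below is an illustration under stated assumptions: the helper `find_oversized_fields` and the `limits` mapping are hypothetical, and the check measures UTF-8 byte length as a conservative choice, on the assumption that the limit may be enforced in bytes rather than characters:

```python
VARCHAR_MAX = 65535  # Upper bound stated in the preceding note.


def find_oversized_fields(record: dict, limits: dict[str, int]) -> dict[str, int]:
    """Map each over-limit string field to its UTF-8 byte length."""
    oversized = {}
    for name, max_length in limits.items():
        value = record.get(name)
        if isinstance(value, str):
            size = len(value.encode("utf-8"))
            # A field can never hold more than VARCHAR_MAX, whatever its max_length.
            if size > min(max_length, VARCHAR_MAX):
                oversized[name] = size
    return oversized


# Hypothetical limits that mirror the max_length values in the example schema.
limits = {"text": 65535, "filename": 256}
record = {"text": "x" * 70000, "filename": "realestate.pdf"}
print(find_oversized_fields(record, limits))
```

Fields that this check reports could be truncated, stored elsewhere, or left out of the collection schema, depending on your needs.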

## Examples

To create a Milvus destination connector, see the following examples.

For more information on working with destination connectors using the Unstructured API, see [Destination endpoints](/api-reference/api/destination/destination-apis).

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="milvus",
                  config={
                      "user": "<user>",
                      "uri": "<uri>",
                      "db_name": "<db-name>",
                      "password": "<password>",
                      "collection_name": "<collection-name>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "milvus",
      "config": {
          "user": "<user>",
          "uri": "<uri>",
          "db_name": "<db-name>",
          "password": "<password>",
          "collection_name": "<collection-name>"
      }
  }'
  ```
</CodeGroup>

## Configuration settings

Replace the preceding placeholders as follows:

<ParamField body="name" type="string" required>
  A unique name for this connector.
</ParamField>

<ParamField body="user" type="string" required>
  The username to access the Milvus instance.
</ParamField>

<ParamField body="uri" type="string" required>
  The URI of the instance, for example: `https://12345.serverless.gcp-us-west1.cloud.zilliz.com`.
</ParamField>

<ParamField body="db_name" type="string" required>
  The name of the database in the instance.
</ParamField>

<ParamField body="password" type="string" required>
  The password corresponding to the username to access the instance.
</ParamField>

<ParamField body="collection_name" type="string" required>
  The name of the collection in the database.
</ParamField>
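
Because every setting above is required, it can be useful to validate the values before you send the request. The following is a minimal sketch that builds a request body with the same shape as the preceding curl example. The helper name `build_milvus_destination` and the connector name used here are hypothetical, and the sketch does not call the Unstructured API:

```python
import json

REQUIRED_CONFIG_KEYS = ("user", "uri", "db_name", "password", "collection_name")


def build_milvus_destination(name: str, config: dict[str, str]) -> dict:
    """Return a request body for a Milvus destination, or raise if a setting is missing."""
    missing = [key for key in REQUIRED_CONFIG_KEYS if not config.get(key)]
    if not name:
        missing.insert(0, "name")
    if missing:
        raise ValueError(f"Missing required settings: {missing}")
    return {
        "name": name,
        "type": "milvus",
        # Copy only the documented settings, ignoring any extra keys.
        "config": {key: config[key] for key in REQUIRED_CONFIG_KEYS},
    }


body = build_milvus_destination(
    "my-milvus-destination",
    {
        "user": "<user>",
        "uri": "<uri>",
        "db_name": "<db-name>",
        "password": "<password>",
        "collection_name": "<collection-name>",
    },
)
print(json.dumps(body, indent=2))
```

The resulting dictionary could be serialized as the `--data` payload in the curl example, or its `config` value passed to `CreateDestinationConnector` in the Python SDK example.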