# Milvus

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free). After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/ui/overview#how-am-i-billed%3F).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to create your destination connector.

  After you create the destination connector, add it along with a
  [source connector](/ui/sources/overview) to a [workflow](/ui/workflows). Then run the workflow as a
  [job](/ui/jobs). To learn how, try out the [hands-on UI quickstart](/ui/quickstart#remote-quickstart) or watch the 4-minute
  [video tutorial](https://www.youtube.com/watch?v=Wn2FfHT6H-o).

  You can also create destination connectors with the Unstructured API.
  [Learn how](/api-reference/workflow/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Milvus.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Milvus cloud-based instances (such as Milvus on IBM watsonx.data, or Zilliz Cloud) are supported.

* For [Unstructured Ingest](/open-source/ingestion/overview), Milvus local and cloud-based instances are supported.

* For Milvus on IBM watsonx.data, you will need:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/hLCwoe2fCnc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * An [IBM Cloud account](https://cloud.ibm.com/registration).

  * An IBM watsonx.data [Lite plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_prov_lite_1)
    or [Enterprise plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started_1) within your IBM Cloud account.

    * If you are provisioning a Lite plan, be sure to choose the **Generative AI** use case when prompted, as this is the only use case offered that includes Milvus.

  * A [Milvus service instance in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service).

    * If you are creating a Milvus service instance within a watsonx.data Lite plan, when you are prompted to choose a Milvus instance size, you can only select **Lite**. Because the Lite
      Milvus instance size is recommended only for 384 dimensions, you should also use an embedding model that uses 384 dimensions only.
    * If you are creating a Milvus service instance within a watsonx.data Enterprise plan, you can choose any available Milvus instance size. However, all Milvus instance sizes other than
      **Custom** are recommended only for 384 dimensions, which means you should use an embedding model that uses 384 dimensions only.
      The **Custom** Milvus instance size is recommended for any number of dimensions.

  * The URI of the instance, which takes the format `https://<host>:<port>`, where `<host>` is the instance's **GRPC host** and `<port>` is its **GRPC port**.
    To get this information, do the following:

    a. Sign in to your IBM Cloud account.<br />
    b. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.<br />
    c. Expand **Databases**, and then click the name of the target **watsonx.data** plan.<br />
    d. Click **Open web console**.<br />
    e. On the sidebar, click **Infrastructure manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the title bar.<br />
    f. Click the target Milvus service instance.<br />
    g. On the **Details** tab, under **Type**, click **View connect details**.<br />
    h. Under **Service details**, expand **GRPC**, and note the value of **GRPC host** and **GRPC port**.<br />

  * The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.

  * The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.

  * The username and password to access the instance.

    * The username for Milvus on IBM watsonx.data is typically `ibmlhapikey`.

      <Note>
        More recent versions of Milvus on IBM watsonx.data require `ibmlhapikey_<your-IBMid>` instead, where `<your-IBMid>` is
        your IBMid, for example `me@example.com`. To get your IBMid, do the following:

        1. Sign in to your IBM Cloud account.
        2. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.
        3. In the sidebar, expand **Manage identities**, and then click **Users**.
        4. In the list of users, click your user name.
        5. On the **User details** tab, in the **Details** tile, note the value of **IBMid**.
      </Note>

    * The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key. To create an IBM Cloud user API key:

      a. Sign in to your IBM Cloud account.<br />
      b. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.<br />
      c. On the sidebar, under **Manage identities**, click **API keys**. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.<br />
      d. Click **Create**.<br />
      e. Enter some **Name** for the API key.<br />
      f. Optionally, enter some **Description** for the API key.<br />
      g. For **Leaked action**, leave **Disable the leaked key** selected.<br />
      h. For **Session management**, leave **No** selected.<br />
      i. Click **Create**.<br />
      j. Click **Download** (or **Copy**), and then download the API key to a secure location (or paste the copied API key into a secure location). You won't be able to access this API key from this dialog again. If you lose this API key, you can create a new one (and you should then delete the old one).<br />

* For Zilliz Cloud, you will need:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/vwWudGvBEKQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * A [Zilliz Cloud account](https://cloud.zilliz.com/signup).

  * A [Zilliz Cloud cluster](https://docs.zilliz.com/docs/create-cluster).

  * The URI of the cluster, also known as the cluster's *public endpoint*, which takes a format such as
    `https://<cluster-id>.<cluster-type>.<cloud-provider>-<region>.cloud.zilliz.com`. To get this public endpoint value, do the following:

    1. After you sign in to your Zilliz Cloud account, on the sidebar, in the list of available projects, select the project that contains the cluster.
    2. On the sidebar, click **Clusters**.
    3. Click the tile for the cluster.
    4. On the **Cluster Details** tab, on the **Connect** subtab, copy the **Public Endpoint** value.

  * The username and password to access the cluster, as follows:

    1. After you sign in to your Zilliz Cloud account, on the sidebar, in the list of available projects, select the project that contains the cluster.
    2. On the sidebar, click **Clusters**.
    3. Click the tile for the cluster.
    4. On the **Users** tab, copy the name of the user.
    5. Next to the user's name, under **Actions**, click the ellipsis (three dots) icon, and then click **Reset Password**.
    6. Enter a new password for the user, and then click **Confirm**. Copy this new password.

  * The name of the [database](https://docs.zilliz.com/docs/database#create-database) in the instance.

  * The name of the [collection](https://docs.zilliz.com/docs/manage-collections-console#create-collection) in the database.

    The collection must have a defined schema before Unstructured can write to the collection. The minimum viable
    schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows.
    `type` is an optional field, but highly recommended. For settings for additional Unstructured-produced fields,
    such as the ones within `metadata`, see the usage notes toward the end of this section and adapt them to your specific needs.

    | Field Name                       | Field Type        | Max Length | Dimension |
    | -------------------------------- | ----------------- | ---------- | --------- |
    | `element_id` (primary key field) | **VARCHAR**       | `200`      | --        |
    | `embeddings` (vector field)      | **FLOAT\_VECTOR** | --         | `384`     |
    | `record_id`                      | **VARCHAR**       | `200`      | --        |
    | `text`                           | **VARCHAR**       | `65535`    | --        |
    | `type`                           | **VARCHAR**       | `200`      | --        |

    In the **Create Index** area for the collection, next to **Vector Fields**, click **Edit Index**. Make sure that for the
    `embeddings` field, the **Field Type** is set to **FLOAT\_VECTOR** and the **Metric Type** is set to **Cosine**.

    <Warning>
      The number of dimensions for the `embeddings` field must match the number of dimensions for the embedding model that you plan to use.
    </Warning>

    <Note>
      Fields with a **VARCHAR** data type are limited to a maximum length of 65,535 characters. Attempting to exceed this character count
      will cause Unstructured to throw errors when attempting to write to a Milvus collection, and the associated Unstructured job could fail.
      For example, `metadata` fields that typically exceed this character count include `image_base64` and `orig_elements`.
    </Note>

* For Milvus local, you will need:

  * A [Milvus instance](https://milvus.io/docs/install-overview.md).
  * The [URI](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Client/MilvusClient.md) of the instance.
  * The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
  * The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database.
    Note the collection requirements at the end of this section.
  * The [username and password, or token](https://milvus.io/docs/authenticate.md) to access the instance.
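
The watsonx.data URI format described above (`https://`, then the **GRPC host**, a colon, and the **GRPC port**) can be assembled as in the following minimal sketch. The function name `build_watsonx_milvus_uri` and the sample host and port are hypothetical placeholders for illustration only; they are not part of any Unstructured or Milvus SDK.

```python
from typing import Union


def build_watsonx_milvus_uri(grpc_host: str, grpc_port: Union[int, str]) -> str:
    """Return https://<host>:<port> from the GRPC host and GRPC port values."""
    host = grpc_host.strip()
    if not host or "://" in host:
        raise ValueError("grpc_host must be a bare host name without a scheme")
    port = int(grpc_port)
    if not 1 <= port <= 65535:
        raise ValueError("grpc_port must be between 1 and 65535")
    return f"https://{host}:{port}"


# Hypothetical host and port values, for illustration only.
print(build_watsonx_milvus_uri("milvus.example.com", 30439))
```

Rejecting a host that already contains a scheme catches the common mistake of pasting a full URL into the host value.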

All Milvus instances require the target collection to have a defined schema before Unstructured can write to the collection. The minimum viable
schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows.
`type` is an optional field, but highly recommended.

This example code demonstrates the use of the
[Python SDK for Milvus](https://pypi.org/project/pymilvus/) to create a collection with this schema,
targeting Milvus on IBM watsonx.data. For the `MilvusClient` arguments to connect to other types of Milvus deployments, see your Milvus provider's documentation:

```python Python theme={null}
import os

from pymilvus import (
    MilvusClient,
    FieldSchema,
    DataType,
    CollectionSchema
)

DATABASE_NAME   = "default"
COLLECTION_NAME = "my_collection"

client = MilvusClient(
    uri="https://" +
        os.getenv("MILVUS_USER") + 
        ":" + 
        os.getenv("MILVUS_PASSWORD") + 
        "@" + 
        os.getenv("MILVUS_GRPC_HOST") + 
        ":" + 
        os.getenv("MILVUS_GRPC_PORT"),
    db_name=DATABASE_NAME
)

# IMPORTANT: The number of dimensions for the "embeddings" field that
# follows must match the number of dimensions for the embedding model 
# that you plan to use.
fields = [
    FieldSchema(name="element_id", dtype=DataType.VARCHAR, is_primary=True, max_length=200),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=384),
    FieldSchema(name="record_id", dtype=DataType.VARCHAR, max_length=200),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="type", dtype=DataType.VARCHAR, max_length=200, nullable=True)  # Optional, but highly recommended.
]

schema = CollectionSchema(fields=fields)

client.create_collection(
    collection_name=COLLECTION_NAME,
    schema=schema,
    using=DATABASE_NAME
)

index_params = client.prepare_index_params()

index_params.add_index(
    field_name="embeddings",
    metric_type="COSINE",
    index_type="IVF_FLAT",
    params={"nlist": 1024}
)

client.create_index(
    collection_name=COLLECTION_NAME,
    index_params=index_params
)

client.load_collection(collection_name=COLLECTION_NAME)
```
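
Before writing to the collection, it can help to confirm that a record carries the minimum viable fields and that its vector length matches the `dim` value in the schema. The following is a minimal sketch that uses only the Python standard library. The `check_record` helper and the sample record are hypothetical; they are not part of Unstructured or the Python SDK for Milvus.

```python
REQUIRED_FIELDS = ("element_id", "embeddings", "record_id", "text")


def check_record(record: dict, dim: int) -> list:
    """Return a list of problems found in a record; an empty list means none."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    embeddings = record.get("embeddings")
    if embeddings is not None and len(embeddings) != dim:
        problems.append(f"embeddings has {len(embeddings)} dimensions, expected {dim}")
    return problems


# Hypothetical record, for illustration only.
sample = {
    "element_id": "abc123",
    "embeddings": [0.0] * 384,
    "record_id": "doc-1",
    "text": "Hello, Milvus.",
}
print(check_record(sample, dim=384))   # No problems for a 384-dimension schema.
print(check_record(sample, dim=1536))  # Reports the dimension mismatch.
```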

For objects in the `metadata` field that Unstructured produces and that you want to store in Milvus, you must create fields in your Milvus collection schema that
follow Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
child objects:

```json  theme={null}
"metadata": {
  "is_extracted": "true",
  "coordinates": {
    "points": [
      [
        134.20055555555555,
        241.36027777777795
      ],
      [
        134.20055555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        241.36027777777795
      ]
    ],
    "system": "PixelSpace",
    "layout_width": 1654,
    "layout_height": 2339
  },
  "filetype": "application/pdf",
  "languages": [
    "eng"
  ],
  "page_number": 1,
  "image_mime_type": "image/jpeg",
  "filename": "realestate.pdf",
  "data_source": {
    "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
    "record_locator": {
      "protocol": "file",
      "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
    }
  },
  "entities": {
    "items": [
      {
        "entity": "HOME FOR FUTURE",
        "type": "ORGANIZATION"
      },
      {
        "entity": "221 Queen Street, Melbourne VIC 3000",
        "type": "LOCATION"
      }
    ],
    "relationships": [
      {
        "from": "HOME FOR FUTURE",
        "relationship": "based_in",
        "to": "221 Queen Street, Melbourne VIC 3000"
      }
    ]
  }
}
```

You could create corresponding fields in your Milvus collection schema by using the following field names, data types, and related settings for the
Python SDK for Milvus:

```python  theme={null}
# ...
fields = [
    # Define minimum viable required fields: "element_id", "embeddings", "record_id", and "text".
    # "type" is an optional field, but highly recommended.
    FieldSchema(name="element_id", dtype=DataType.VARCHAR, is_primary=True, max_length=200),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="record_id", dtype=DataType.VARCHAR, max_length=200),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="type", dtype=DataType.VARCHAR, max_length=200, nullable=True),
    # Define optional fields, such as these ones related to the "metadata" field.
    FieldSchema(name="is_extracted", dtype=DataType.VARCHAR, max_length=5, nullable=True),
    FieldSchema(name="coordinates_points", dtype=DataType.JSON, nullable=True),
    FieldSchema(name="coordinates_system", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="coordinates_layout_width", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="coordinates_layout_height", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="filetype", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="languages", dtype=DataType.ARRAY, element_type=DataType.VARCHAR, max_length=64, max_capacity=10, nullable=True),
    FieldSchema(name="page_number", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="image_mime_type", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="filename", dtype=DataType.VARCHAR, max_length=256, nullable=True),
    FieldSchema(name="data_source_url", dtype=DataType.VARCHAR, max_length=1024, nullable=True),
    FieldSchema(name="data_source_record_locator_protocol", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="data_source_record_locator_remote_file_path", dtype=DataType.VARCHAR, max_length=1024, nullable=True),
    FieldSchema(name="entities_items", dtype=DataType.JSON, nullable=True),
    FieldSchema(name="entities_relationships", dtype=DataType.JSON, nullable=True)
]
# ...
```
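
The field names in the schema above appear to follow a simple pattern: nested object keys are joined with underscores, while lists are kept whole as a single field. The following minimal sketch illustrates that pattern as inferred from the example. It is not Unstructured's actual implementation, and `flatten_metadata` is a hypothetical helper name.

```python
def flatten_metadata(metadata: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into underscore-joined field names.

    Lists are kept as values, matching the ARRAY and JSON fields in the schema.
    """
    flat = {}
    for key, value in metadata.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into child objects, carrying the parent name as a prefix.
            flat.update(flatten_metadata(value, name))
        else:
            flat[name] = value
    return flat


# Abbreviated version of the example metadata shown earlier.
metadata = {
    "coordinates": {"system": "PixelSpace", "layout_width": 1654},
    "languages": ["eng"],
    "data_source": {"record_locator": {"protocol": "file"}},
}
for field_name in flatten_metadata(metadata):
    print(field_name)
```

You could use a helper like this to list the candidate field names for a sample of your own output before you define the collection schema.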

<Note>
  Fields with a `DataType.VARCHAR` data type are limited to a maximum length of 65,535 characters. Attempting to exceed this character count
  will cause Unstructured to throw errors when attempting to write to a Milvus collection, and the associated Unstructured job could fail.
  For example, `metadata` fields that typically exceed this character count include `image_base64` and `orig_elements`.
</Note>
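
To spot fields that are likely to exceed this limit before a job runs, you could scan a sample record as in the following minimal sketch. The `oversized_fields` helper and the sample record are hypothetical. The sketch counts characters with `len()`, as the note describes; adjust it if your deployment measures length differently.

```python
VARCHAR_MAX_LENGTH = 65_535  # Limit stated in the preceding note.


def oversized_fields(record: dict, limit: int = VARCHAR_MAX_LENGTH) -> list:
    """Return the names of string fields whose length exceeds the limit."""
    return [
        name
        for name, value in record.items()
        if isinstance(value, str) and len(value) > limit
    ]


# Hypothetical record, for illustration only.
record = {"text": "short text", "image_base64": "A" * 70_000}
print(oversized_fields(record))
```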

To create the destination connector:

1. On the sidebar, click **Connectors**.
2. Click **Destinations**.
3. Click **New** or **Create Connector**.
4. Give the connector a unique **Name**.
5. In the **Provider** area, click **Milvus**.
6. Click **Continue**.
7. Follow the on-screen instructions to fill in the fields as described later on this page.
8. Click **Save and Test**.

Fill in the following fields for Milvus on IBM watsonx.data:

* **Name** (*required*): A unique name for this connector.

* **GRPC Host** (*required*): The GRPC host name for the Milvus instance.

* **GRPC Port**: The GRPC port number for the instance.

* **DB Name**: The name of the database in the instance. The default is `default` if not otherwise specified.

* **Collection Name** (*required*): The name of the collection in the database.

* **Username**: The username to access the Milvus instance. The default is `ibmlhapikey` if not otherwise specified.

  <Note>
    More recent versions of Milvus on IBM watsonx.data require `ibmlhapikey_<your-IBMid>` instead, where `<your-IBMid>` is
    your IBMid, for example `me@example.com`. To get your IBMid, do the following:

    1. Sign in to your IBM Cloud account.
    2. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.
    3. In the sidebar, expand **Manage identities**, and then click **Users**.
    4. In the list of users, click your user name.
    5. On the **User details** tab, in the **Details** tile, note the value of **IBMid**.
  </Note>

* **API Key** (*required*): The IBM Cloud user API key.
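
The username rule described above can be expressed as the following minimal sketch. The function name `watsonx_milvus_username` is a hypothetical helper for illustration; it is not part of Unstructured or any IBM SDK.

```python
from typing import Optional


def watsonx_milvus_username(ibmid: Optional[str] = None) -> str:
    """Return the default username, or the IBMid-suffixed form if an IBMid is given."""
    if ibmid:
        return f"ibmlhapikey_{ibmid}"
    return "ibmlhapikey"


print(watsonx_milvus_username())                  # Default form.
print(watsonx_milvus_username("me@example.com"))  # Form for more recent versions.
```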

Fill in the following fields for Milvus on Zilliz Cloud:

* **Name** (*required*): A unique name for this connector.
* **URI** (*required*): The URI of the Milvus instance, for example: `https://12345.serverless.gcp-us-west1.cloud.zilliz.com`.
* **DB Name**: The name of the database in the instance. The default is `default` if not otherwise specified.
* **Collection Name** (*required*): The name of the collection in the database.
* **Username** (*required*): The username to access the Milvus instance.
* **Password** (*required*): The password corresponding to the username to access the instance.

Fill in the following fields for other Milvus deployments:

* **Name** (*required*): A unique name for this connector.
* **URI** (*required*): The URI of the Milvus instance, for example: `https://12345.serverless.gcp-us-west1.cloud.zilliz.com`.
* **DB Name**: The name of the database in the instance. The default is `default` if not otherwise specified.
* **Collection Name** (*required*): The name of the collection in the database.
* **Username** (*required*): The username to access the Milvus instance.
* **Password** (*required*): The password corresponding to the username to access the instance.
