
Batch process all your records to store structured outputs in a Chroma account.

The requirements are as follows.

  • A Chroma server. See Deployment.

  • The Chroma server’s hostname or IP address, and the server’s port number.

  • If you are not connecting to the server through HTTP, the path to the location where Chroma is persisted.

  • The name of the tenant that you want to access on the server.

  • The name of the database that you want to access in the tenant.

  • The name of the collection that you want to access in the database.

The Chroma connector dependencies:

For both the CLI and Python:
pip install "unstructured-ingest[chroma]"

You might also need to install additional dependencies, depending on your needs.

The following environment variables:

  • CHROMA_HOST - The Chroma server's hostname or IP address, represented by --host (CLI) or host (Python).
  • CHROMA_PORT - The Chroma server's port number, represented by --port (CLI) or port (Python).
  • CHROMA_TENANT - The name of the tenant that you want to access on the Chroma server, represented by --tenant (CLI) or tenant (Python).
  • CHROMA_DATABASE - The name of the database that you want to access in the tenant, represented by --database (CLI) or database (Python).
  • CHROMA_COLLECTION - The name of the collection that you want to access in the database, represented by --collection-name (CLI) or collection_name (Python).
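
The mapping between the environment variables and the connector options listed above can be sketched as a small Python helper. This is a minimal illustration, not part of unstructured-ingest: the names ENV_TO_OPTION, chroma_options_from_env, and to_cli_flags are hypothetical, and converting the port to an integer is an assumption about how you would pass it from Python.

```python
import os
from typing import Mapping

# Environment variable -> Python option name, as listed above.
# The CLI flag is the same name prefixed with "--", with hyphens instead of underscores.
ENV_TO_OPTION = {
    "CHROMA_HOST": "host",
    "CHROMA_PORT": "port",
    "CHROMA_TENANT": "tenant",
    "CHROMA_DATABASE": "database",
    "CHROMA_COLLECTION": "collection_name",
}


def chroma_options_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect the Chroma connector options from the environment.

    Raises KeyError if any of the variables is not set.
    """
    options = {option: env[name] for name, option in ENV_TO_OPTION.items()}
    # Assumption: the port is numeric, so convert it to an int for Python callers.
    options["port"] = int(options["port"])
    return options


def to_cli_flags(options: dict) -> list[str]:
    """Hypothetical helper: render options as CLI flags (collection_name -> --collection-name)."""
    flags: list[str] = []
    for option, value in options.items():
        flags += ["--" + option.replace("_", "-"), str(value)]
    return flags
```

For example, with CHROMA_COLLECTION set to docs and the other variables set, to_cli_flags(chroma_options_from_env()) includes the pair --collection-name docs.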

Additional settings include:

  • --path (CLI) or path (Python): The location where Chroma is persisted if you are not connecting through HTTP.
  • --settings (CLI) or settings (Python): A dictionary of settings to communicate with the Chroma server, for example: '{"persist_directory":"./chroma-persist"}'.
  • --headers (CLI) or headers (Python): A dictionary of headers to send to the Chroma server, for example: '{"Authorization":"Basic()"}'.
  • --ssl (CLI) or ssl (Python): True to use SSL for the connection.
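
On the CLI, the --settings and --headers examples above are dictionaries written as JSON strings. The following minimal Python sketch produces such strings. It assumes your Chroma server is configured for HTTP Basic authentication (RFC 7617), which depends on your deployment; the helper names and the admin/secret credentials are hypothetical placeholders. From Python, the settings and headers options are described as dictionaries, so you would presumably pass the dicts themselves rather than JSON strings.

```python
import base64
import json


def basic_auth_headers(username: str, password: str) -> dict:
    """Hypothetical helper: build an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def as_cli_dict(value: dict) -> str:
    """Serialize a dict into a compact JSON string for --settings or --headers."""
    return json.dumps(value, separators=(",", ":"))


# Placeholder credentials; replace them with your server's actual values.
headers = basic_auth_headers("admin", "secret")
settings = {"persist_directory": "./chroma-persist"}

print(as_cli_dict(settings))  # {"persist_directory":"./chroma-persist"}
```

On a shell command line, wrap the resulting string in single quotes, as in the examples above, so that the double quotes inside the JSON survive.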

The following Unstructured API environment variables:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.
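
Because a run depends on every environment variable named on this page, it can help to check them before calling the CLI or Python SDK, so that a missing value fails early with a readable message. This is a minimal sketch: REQUIRED_VARS, missing_vars, and check_env are hypothetical names, and the list simply mirrors the variables named above.

```python
import os
from typing import Iterable, Mapping

# Mirrors the environment variables listed on this page.
REQUIRED_VARS = (
    "CHROMA_HOST",
    "CHROMA_PORT",
    "CHROMA_TENANT",
    "CHROMA_DATABASE",
    "CHROMA_COLLECTION",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
)


def missing_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the required variable names that are unset or empty, in order."""
    return [name for name in required if not env.get(name)]


def check_env(env: Mapping[str, str] = os.environ) -> None:
    """Exit with a readable message if any required variable is missing."""
    missing = missing_vars(env=env)
    if missing:
        raise SystemExit("Set these environment variables first: " + ", ".join(missing))
```

Call check_env() at the top of your script, before starting the ingest run.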

Now call the Unstructured CLI or Python SDK. The source connector can be any of the supported source connectors. This example uses the local source connector: