The embedding configuration selects the embedder, and its associated parameters, used to create vectors from data. You can choose among several embedding providers and models and supply any provider-specific credentials, so that the embedding step fits your data and downstream applications.

Configs

  •   api_key: The API key to use, if one is required to generate the embeddings through an API service, such as OpenAI.

  •   aws_access_key_id: The AWS access key ID to be used for AWS-based embedders, such as Amazon Bedrock.

  •   aws_region: The AWS Region ID to be used for AWS-based embedders, such as Amazon Bedrock.

  •   aws_secret_access_key: The AWS secret access key to be used for AWS-based embedders, such as Amazon Bedrock.

  •   embedding_provider: The embedding provider to use when generating embeddings. Available values include langchain-openai, langchain-huggingface, langchain-aws-bedrock, langchain-vertexai, langchain-voyageai, and octoai.

  •   embedding_api_key: The API key to use, if one is required to generate the embeddings through an API service, such as OpenAI.

  •   embedding_aws_access_key_id: The AWS access key ID to be used for AWS-based embedders, such as Amazon Bedrock.

  •   embedding_aws_region: The AWS Region ID to be used for AWS-based embedders, such as Amazon Bedrock.

  •   embedding_aws_secret_access_key: The AWS secret access key to be used for AWS-based embedders, such as Amazon Bedrock.

  •   embedding_model_name: The specific model to use for the embedding provider, if necessary.

  •   model_name: The specific model to use for the embedding provider, if necessary.

  •   provider: The embedding provider to use when generating embeddings. Available values include langchain-openai, langchain-huggingface, langchain-aws-bedrock, langchain-vertexai, langchain-voyageai, and octoai.
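
As a rough illustration of how the unprefixed settings above fit together, the following Python sketch groups them in a plain dataclass. The class name EmbeddingSettings, the helper missing_aws_fields, and the placeholder values are hypothetical and are not part of any library API. The sketch also assumes that an AWS-based provider needs all three AWS fields, which the descriptions above suggest but do not state.

```python
from dataclasses import dataclass
from typing import List, Optional

# Provider names copied from the list above. The list says "include",
# so other values may exist.
KNOWN_PROVIDERS = {
    "langchain-openai",
    "langchain-huggingface",
    "langchain-aws-bedrock",
    "langchain-vertexai",
    "langchain-voyageai",
    "octoai",
}


@dataclass
class EmbeddingSettings:
    """Hypothetical container for the settings listed above (not a library class)."""

    provider: str
    model_name: Optional[str] = None
    api_key: Optional[str] = None
    aws_access_key_id: Optional[str] = None
    aws_secret_access_key: Optional[str] = None
    aws_region: Optional[str] = None

    def missing_aws_fields(self) -> List[str]:
        """Return the AWS fields left unset when the provider is AWS-based.

        Assumption: langchain-aws-bedrock needs all three AWS fields.
        """
        if self.provider != "langchain-aws-bedrock":
            return []
        names = ["aws_access_key_id", "aws_secret_access_key", "aws_region"]
        return [name for name in names if getattr(self, name) is None]


# An API-key-based provider: only provider and api_key are set.
openai_settings = EmbeddingSettings(
    provider="langchain-openai",
    api_key="<your-api-key>",  # placeholder, not a real key
)

# An AWS-based provider with two of the three AWS fields still missing.
bedrock_settings = EmbeddingSettings(
    provider="langchain-aws-bedrock",
    aws_region="us-east-1",  # example Region ID
)
```

Checking for unset fields before running a job is one way to surface a missing credential early, rather than at the first call to the embedding service.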

  Unless otherwise specified, the default model_name values are:

  • langchain-openai: text-embedding-ada-002

  • langchain-huggingface: sentence-transformers/all-MiniLM-L6-v2

  • langchain-aws-bedrock: None

  • langchain-vertexai: textembedding-gecko@001

  • langchain-voyageai: None

  • mixedbread-ai: mixedbread-ai/mxbai-embed-large-v1

  • octoai: thenlper/gte-large
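
The defaults above can be read as a lookup table that applies only when no model_name is given. The following is a minimal Python sketch of that reading. The names DEFAULT_MODEL_NAMES and resolve_model_name are hypothetical, and the real resolution logic may differ.

```python
from typing import Dict, Optional

# Defaults copied from the list above. None means no default is listed,
# so a model_name presumably has to be supplied explicitly.
DEFAULT_MODEL_NAMES: Dict[str, Optional[str]] = {
    "langchain-openai": "text-embedding-ada-002",
    "langchain-huggingface": "sentence-transformers/all-MiniLM-L6-v2",
    "langchain-aws-bedrock": None,
    "langchain-vertexai": "textembedding-gecko@001",
    "langchain-voyageai": None,
    "mixedbread-ai": "mixedbread-ai/mxbai-embed-large-v1",
    "octoai": "thenlper/gte-large",
}


def resolve_model_name(provider: str, model_name: Optional[str] = None) -> Optional[str]:
    """Return the explicit model_name if given, else the provider's default.

    Hypothetical helper: returns None when neither an explicit name
    nor a listed default is available.
    """
    if model_name is not None:
        return model_name
    return DEFAULT_MODEL_NAMES.get(provider)
```

Under this sketch, resolve_model_name("langchain-openai") falls back to text-embedding-ada-002, while resolve_model_name("langchain-voyageai") returns None, which signals that a model_name should be set explicitly for that provider.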