# Couchbase

[Couchbase](https://couchbase.com) is a distributed NoSQL cloud database. It offers AI coding assistance for developers and vector search for their applications.

Batch process all your records to store structured outputs in a Couchbase database.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Couchbase Capella clusters are supported.
* For [Unstructured Ingest](/open-source/ingestion/overview), Couchbase Capella clusters and local Couchbase server deployments are supported.

<iframe width="560" height="315" src="https://www.youtube.com/embed/9-RIBmIdi70" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

For Couchbase Capella, you will need:

* A [Couchbase Capella account](https://docs.couchbase.com/cloud/get-started/create-account.html#sign-up-free-tier).
* A [Couchbase Capella cluster](https://docs.couchbase.com/cloud/get-started/create-account.html#getting-started).
* A [bucket](https://docs.couchbase.com/cloud/clusters/data-service/manage-buckets.html#add-bucket),
  [scope](https://docs.couchbase.com/cloud/clusters/data-service/scopes-collections.html#create-scope),
  and [collection](https://docs.couchbase.com/cloud/clusters/data-service/scopes-collections.html#create-collection)
  on the cluster.
* The cluster's [public connection string](https://docs.couchbase.com/cloud/get-started/connect.html#connect-from-sdk-cbsh-cli-or-ide).
* The [cluster access name (username) and secret (password)](https://docs.couchbase.com/cloud/clusters/manage-database-users.html#create-database-credentials).
* [Incoming IP address allowance](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) for the cluster.

  To get Unstructured's IP address ranges, go to
  [https://assets.p6m.u10d.net/publicitems/ip-prefixes.json](https://assets.p6m.u10d.net/publicitems/ip-prefixes.json)
  and allow all of the `ip_prefix` fields' values that are listed.

  <Note>These IP address ranges are subject to change. You can always find the latest ones in the preceding file.</Note>
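The allow-list step above can be scripted. The following is a minimal sketch, not an official tool. It assumes only that the file is JSON and that the ranges appear as string values under keys named `ip_prefix`. The surrounding layout is not described on this page, so the hypothetical helper `collect_ip_prefixes` walks the whole document. The sample payload is made up for illustration.

```python
import json
from typing import Any, List


def collect_ip_prefixes(node: Any) -> List[str]:
    """Recursively gather every string stored under an `ip_prefix` key."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "ip_prefix" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_ip_prefixes(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_ip_prefixes(item))
    return found


# Hypothetical payload; the real file's layout may differ.
sample = json.loads(
    '{"prefixes": [{"ip_prefix": "192.0.2.0/24"}, {"ip_prefix": "198.51.100.7/32"}]}'
)
sample_prefixes = collect_ip_prefixes(sample)
print(sample_prefixes)
```

To use it against the live file, download the JSON with a tool of your choice, parse it with `json.loads`, and pass the result to the helper. Then add each returned range to the cluster's allowed IP list.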

For a local Couchbase server, you will need:

* [Installation of a local Couchbase server](https://docs.couchbase.com/server/current/getting-started/start-here.html).
* [Connection details](https://docs.couchbase.com/server/current/guides/connect.html) to the local Couchbase server.

To learn more about how to set up a Couchbase cluster and play with data, refer to this [tutorial](https://developer.couchbase.com/tutorial-quickstart-flask-python).

Install the Couchbase connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[couchbase]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

These environment variables are required for the Couchbase Connector:

* `CB_CONN_STR` - The connection string for the Couchbase server, represented by `--connection-string` (CLI) or `connection_string` (Python).
* `CB_USERNAME` - The username for the Couchbase server, represented by `--username` (CLI) or `username` (Python).
* `CB_PASSWORD` - The password for the Couchbase server, represented by `--password` (CLI) or `password` (Python).
* `CB_BUCKET` - The name of the bucket in the Couchbase server, represented by `--bucket` (CLI) or `bucket` (Python).
* `CB_SCOPE` - The name of the scope in the bucket, represented by `--scope` (CLI) or `scope` (Python).
* `CB_COLLECTION` - The name of the collection in the scope, represented by `--collection` (CLI) or `collection` (Python).
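
Before running the connector, it can help to confirm that all six variables are set. The following is a minimal sketch under that assumption. The helper name `missing_couchbase_vars` is hypothetical and is not part of Unstructured Ingest; it only inspects the environment.

```python
import os
from typing import List, Mapping

# The environment variable names listed above.
REQUIRED_VARS = (
    "CB_CONN_STR",
    "CB_USERNAME",
    "CB_PASSWORD",
    "CB_BUCKET",
    "CB_SCOPE",
    "CB_COLLECTION",
)


def missing_couchbase_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_couchbase_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching your real environment.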

Additional available settings include:

* `--collection-id` (CLI) or `collection_id` in `CouchbaseDownloaderConfig` (Python) - Optional for the source connector. The
  unique key of the ID field in the collection. The default is `id` if not otherwise specified.
  [Learn more](https://docs.couchbase.com/server/current/learn/services-and-indexes/indexes/indexing-and-query-perf.html#introduction-document-keys).

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any supported source connector. This example uses the local source connector.

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --strategy hi_res \
      --chunk-elements \
      --embedding-provider huggingface \
      --num-processes 2 \
      --verbose \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    couchbase \
      --connection-string $CB_CONN_STR \
      --username $CB_USERNAME \
      --password $CB_PASSWORD \
      --bucket $CB_BUCKET \
      --scope $CB_SCOPE \
      --collection $CB_COLLECTION \
      --num-processes 2 \
      --batch-size 80
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.couchbase import (
      CouchbaseAccessConfig,
      CouchbaseConnectionConfig,
      CouchbaseUploadStagerConfig,
      CouchbaseUploaderConfig
  )
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalConnectionConfig,
      LocalDownloaderConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=CouchbaseConnectionConfig(
              access_config=CouchbaseAccessConfig(
                  password=os.getenv("CB_PASSWORD"),
              ),
              connection_string=os.getenv("CB_CONN_STR"),
              username=os.getenv("CB_USERNAME"),
              bucket=os.getenv("CB_BUCKET"),
              scope=os.getenv("CB_SCOPE"),
              collection=os.getenv("CB_COLLECTION")
          ),
          stager_config=CouchbaseUploadStagerConfig(),
          uploader_config=CouchbaseUploaderConfig(batch_size=100)
      ).run()
  ```
</CodeGroup>

To understand how [vector search](https://www.couchbase.com/products/vector-search/) works in Couchbase, refer to this [tutorial](https://developer.couchbase.com/tutorial-python-langchain-pdf-chat) and the [Couchbase docs](https://docs.couchbase.com/cloud/vector-search/vector-search.html).

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, which you provide by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>
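
The choice between local and API-based processing can be captured in one small helper. This is a minimal sketch rather than part of Unstructured Ingest: the function name `partitioner_kwargs` is hypothetical, and it assumes the parameter names `partition_by_api`, `api_key`, and `partition_endpoint` shown earlier on this page. It omits the API URL when `UNSTRUCTURED_API_URL` is not set, so that the default applies.

```python
import os
from typing import Any, Dict, Mapping


def partitioner_kwargs(
    use_api: bool, env: Mapping[str, str] = os.environ
) -> Dict[str, Any]:
    """Build partitioner keyword arguments for local or API-based processing."""
    if not use_api:
        # Local processing: no API key or API URL is needed.
        return {"partition_by_api": False}

    api_key = env.get("UNSTRUCTURED_API_KEY")
    if not api_key:
        raise ValueError("UNSTRUCTURED_API_KEY must be set when use_api is True")

    kwargs: Dict[str, Any] = {"partition_by_api": True, "api_key": api_key}
    api_url = env.get("UNSTRUCTURED_API_URL")
    if api_url:
        # Pass the URL only when one is provided; otherwise the default applies.
        kwargs["partition_endpoint"] = api_url
    return kwargs


print(partitioner_kwargs(use_api=False))
```

In the Python example earlier on this page, the result could be unpacked into the partitioner configuration, for example `PartitionerConfig(strategy="hi_res", **partitioner_kwargs(use_api=True))`.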
