> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Teradata

Batch process all your records to store structured outputs in Teradata.

The requirements are as follows.

* A Teradata Vantage system that can be accessed by its host name or IP address.

  For example, a Teradata Vantage system in Teradata ClearScape Analytics Experience includes:

  * A Teradata ClearScape Analytics Experience account.
  * An environment in the account.
  * A Teradata Vantage database in the environment.
  * The name and password for a Teradata user who has the appropriate access to the database.

  [Learn how to create these in Teradata ClearScape Analytics Experience](https://developers.teradata.com/quickstarts/get-access-to-vantage/clearscape-analytics-experience/getting-started-with-csae/).

* The system's corresponding host name or IP address.

  For example, you can get these values from Teradata ClearScape Analytics Experience as follows:

  1. Sign in to your Teradata ClearScape Analytics Experience account.<br />
  2. On the sidebar, under **Environments**, click the name of the database's corresponding environment.<br />
  3. Under **Connection details for Vantage database**, use the **Host** value.<br />

* The name of the target database in the system. To get a list of available databases in the system, you can run a Teradata SQL query such as the following:

  ```sql theme={null}
  SELECT DatabaseName 
  FROM DBC.DatabasesV 
  ORDER BY DatabaseName;
  ```

* The name of the target table in the database. To get a list of available tables in a database, you can run a Teradata SQL query such as the following, replacing `<database-name>` with the name of the target database:

  ```sql theme={null}
  SELECT TableName
  FROM DBC.TablesV
  WHERE DatabaseName = '<database-name>' AND TableKind = 'T'
  ORDER BY TableName;
  ```

  When Unstructured writes rows to a table, the table's columns must have a schema that is compatible with Unstructured.
  Unstructured cannot provide a schema that is guaranteed to work for everyone in all circumstances.
  This is because these schemas will vary based on
  your source files' types; how you want Unstructured to partition, chunk, and generate embeddings;
  any custom post-processing code that you run; and other factors.

  In any case, note the following about table schemas:

  * The following columns are always required by Unstructured: `record_id` and `element_id`.
  * The following columns are optional for Unstructured, but highly recommended: `text` and `type`.
  * The rest of the columns are optional and typically will be output by Unstructured as part of the `metadata` field.
  * If Unstructured is generating vector embeddings, the `embeddings` column is also required.

    <Warning>
      The destination connector outputs Unstructured-generated [embeddings](/concepts/embedding) that are not directly compatible
      with Teradata Enterprise Vector Store. To use embeddings with Teradata Enterprise Vector Store, Unstructured
      recommends that you choose from among the following options:

      * Define a column in your target table named `embeddings` that is of type `VARCHAR(64000)`, to store the
        Unstructured-generated embeddings. After Unstructured adds its embeddings to your `embeddings` column,
        choose from among Teradata's options to convert the `embeddings` column's `VARCHAR` values to the
        Teradata [VECTOR Data Type](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Vector-Store-User-Guide/Vector-Store-Components-and-Features/VECTOR-Data-Type)
        yourself.
      * Omit any columns named `embedding`, `message`, or `num_tokens` from your target table. Then choose from
        among Teradata's options (such as [AI\_TextEmbeddings](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Vector-Store-User-Guide/Vector-Store-Components-and-Features/In-Database-Analytic-Functions-for-Vector-Store))
        to have Teradata generate the embeddings for you, instead of having Unstructured generate them.
    </Warning>

  Here is an example table schema that is compatible with Unstructured. It includes all of the required and recommended columns, as
  well as a few additional columns that are typically output by Unstructured as part of the `metadata` field. Be sure to replace
  `<database-name>` with the name of the target database and `<table-name>` with the name of the target table (by Unstructured convention,
  the table name is typically `elements`, but this is not a requirement).

  ```sql theme={null}
  CREATE SET TABLE "<database-name>"."<table-name>" (
    "id" VARCHAR(64) NOT NULL,
    PRIMARY KEY ("id"),
    "record_id" VARCHAR(64),
    "element_id" VARCHAR(64),
    "text" VARCHAR(32000) CHARACTER SET UNICODE,
    "type" VARCHAR(50),
    "embeddings" VARCHAR(64000), -- Add this column only if Unstructured is generating vector embeddings.
    "last_modified" VARCHAR(50),
    "languages" VARCHAR(200),
    "file_directory" VARCHAR(500),
    "filename" VARCHAR(255),
    "filetype" VARCHAR(50),
    "record_locator" VARCHAR(1000),
    "date_created" VARCHAR(50),
    "date_modified" VARCHAR(50),
    "date_processed" VARCHAR(50),
    "permissions_data" VARCHAR(1000),
    "filesize_bytes" INTEGER,
    "parent_id" VARCHAR(64)
  )
  ```

* For the source connector, the name of the primary key column in the table (for example, a column named `id`, typically defined as `"id" VARCHAR(64) NOT NULL, PRIMARY KEY ("id")`).

* For the source connector, the names of any specific columns to fetch from the table. By default, all columns are fetched unless otherwise specified.

* For the destination connector, the name of the column in the table that uniquely identifies each record for Unstructured to perform any necessary record updates. By default convention, Unstructured expects this field to be named `record_id`.

* The name of the Teradata user who has the appropriate access to the target database.

  For example, you can get this from Teradata ClearScape Analytics Experience as follows:

  1. Sign in to your Teradata ClearScape Analytics account.<br />
  2. On the sidebar, under **Environments**, click the name of the database's corresponding environment.<br />
  3. Under **Connection details for Vantage database**, use the **Username** value.<br />

  The Teradata SQL command to get a list of available users is as follows:

  ```sql theme={null}
  SELECT UserName 
  FROM DBC.UsersV 
  ORDER BY UserName;
  ```

* The password for the user, which was set up when the user was created.

  If the user has forgotten their password, the Teradata SQL command to change a user's password is as follows, replacing `<user-name>` with the name of the user and `<new-password>` with the new password:

  ```sql theme={null}
  MODIFY USER <user-name> SET PASSWORD = '<new-password>';
  ```

  To change a user's password, you must be an administrator (such as the `DBC` user or another user with `DROP USER` privileges).

<Info>
  The `record_id`, `element_id`, and `id` fields are closely related, but each has a distinct purpose. For more information, see [How connectors use record IDs, element IDs, and IDs](/api-reference/record-element-id).
</Info>

The Teradata connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[teradata]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

These environment variables:

* `TERADATA_HOST` - The host name, represented by `--host` (CLI) or `host` (Python).
* `TERADATA_PORT` - The port number, represented by `--dbs-port` (CLI) or `dbs_port` (Python). This is optional, and the default is `1025` if not otherwise specified.
* `TERADATA_USERNAME` - The name of the user who has access to the database, represented by `--user` (CLI) or `user` (Python).
* `TERADATA_PASSWORD` - The user's password, represented by `--password` (CLI) or `password` (Python).
* `TERADATA_DATABASE` - The name of the database, represented by `--database` (CLI) or `database` (Python). If not otherwise specified, the default database name is used. To get the name of the default database, you can run the Teradata SQL command `SELECT DATABASE;`.
* `TERADATA_TABLE` - The name of the table, represented by `--table-name` (CLI) or `table_name` (Python).
* `TERADATA_ID_COLUMN` - For the source connector, the name of the column that uniquely identifies each record in the table, represented by `--id-column` (CLI) or `id_column` (Python).
* `TERADATA_RECORD_ID_KEY` - For the destination connector, the name of the column in the table that uniquely identifies each record, represented by `--record-id-key` (CLI) or `record_id_key` (Python).

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --num-processes 2 \
      --verbose \
      --strategy fast \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --metadata-include "$metadata_includes" \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    teradata \
      --host $TERADATA_HOST \
      # --dbs-port $TERADATA_PORT \
      --database $TERADATA_DATABASE \
      --user $TERADATA_USERNAME \
      --password $TERADATA_PASSWORD \
      --table-name $TERADATA_TABLE \
      --record-id-key $TERADATA_RECORD_ID_KEY \
      --batch-size 50
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.sql.teradata import(
      TeradataConnectionConfig,
      TeradataAccessConfig,
      TeradataUploaderConfig,
      TeradataUploadStagerConfig
  )
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=TeradataConnectionConfig(
              access_config=TeradataAccessConfig(
                  password=os.getenv("TERADATA_PASSWORD")
              ),
              host=os.getenv("TERADATA_HOST"),
              # dbs_port=os.getenv("TERADATA_PORT"),
              database=os.getenv("TERADATA_DATABASE"),
              user=os.getenv("TERADATA_USERNAME")
          ),
          stager_config=TeradataUploadStagerConfig(),
          uploader_config=TeradataUploaderConfig(
              batch_size=50,
              table_name=os.getenv("TERADATA_TABLE"),
              record_id_key=os.getenv("TERADATA_RECORD_ID_KEY")
          )
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>