- `us-east`. This is the bucket’s region.
- `s3.us-east.cloud-object-storage.appdomain.cloud`. (Ignore the values of Private and Direct.) This is the bucket’s public endpoint.
- `access_key_id` under `cos_hmac_keys`, which represents the HMAC access key ID.
- `secret_access_key` under `cos_hmac_keys`, which represents the HMAC secret access key.

`https://`.

Replace `<catalog-name>` with the name of the target catalog and `<schema-name>` with the name of the target schema:
If incoming data has an element named `sent_from` and there is no column named `sent_from` in the table, the `sent_from` element will be dropped upon record insertion. You should modify the preceding sample table creation statement to add columns for any additional elements that you want to be included upon record insertion.
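The dropping behavior described above can be pictured with a minimal Python sketch. It is an illustration only, not the connector's actual code: the helper name `keep_matching_columns`, the column set, and the sample element are all hypothetical.

```python
def keep_matching_columns(element: dict, table_columns: set) -> dict:
    """Keep only the fields that have a same-named column in the table.

    Hypothetical helper: it mimics the documented behavior, in which
    fields without a matching column are dropped on insertion.
    """
    return {name: value for name, value in element.items() if name in table_columns}


# Hypothetical table columns and incoming element, for illustration only.
table_columns = {"record_id", "text", "type"}
element = {
    "record_id": "abc123",
    "text": "Hello, world.",
    "type": "NarrativeText",
    "sent_from": "someone@example.com",
}

# sent_from has no matching column, so it does not reach the row.
row = keep_matching_columns(element, table_columns)

# Once a sent_from column exists in the table, the field is kept.
row_after_adding_column = keep_matching_columns(element, table_columns | {"sent_from"})
```

In this sketch, `row` contains only `record_id`, `text`, and `type`, while `row_after_adding_column` also keeps `sent_from`, which is why the table needs a column for every element you want to retain.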
To increase query performance, Iceberg uses hidden partitioning to
group similar rows together when writing. You can also
explicitly define partitions as part of the
preceding CREATE TABLE
statement.
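As a conceptual illustration only, and not Iceberg's actual implementation, the following Python sketch shows the idea behind hidden partitioning: a partition value is derived from an existing column at write time, so rows are grouped without a separate, user-maintained partition column. The `created_at` column, the day-based transform, and the function names are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def day_transform(timestamp: datetime) -> str:
    """Derive a partition value (the UTC calendar date) from a timestamp."""
    return timestamp.astimezone(timezone.utc).date().isoformat()


def group_by_derived_partition(rows, source_column="created_at"):
    """Group rows by a value derived from source_column.

    The rows themselves carry no explicit partition column; the grouping
    key is computed from data that is already present.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[day_transform(row[source_column])].append(row)
    return dict(partitions)


# Hypothetical rows, for illustration only.
rows = [
    {"record_id": "a", "created_at": datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"record_id": "b", "created_at": datetime(2025, 1, 1, 17, 30, tzinfo=timezone.utc)},
    {"record_id": "c", "created_at": datetime(2025, 1, 2, 8, 15, tzinfo=timezone.utc)},
]

grouped = group_by_derived_partition(rows)
```

Here the first two rows land in the same group because they share a calendar day, and the third row lands in its own group.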
`CREATE TABLE` statement, or other SQL statements such as `ALTER TABLE`, to set this behavior.) To get the values for the specified environment variables, see the preceding instructions.
- `IBM_IAM_API_KEY` - An API key for the target IBM Cloud account, represented by `--iam-api-key` (CLI) or `iam_api_key` (Python).
- `IBM_COS_ACCESS_KEY` - An HMAC access key ID for the target IBM Cloud Object Storage (COS) instance, represented by `--access-key-id` (CLI) or `access_key_id` (Python).
- `IBM_COS_SECRET_ACCESS_KEY` - The associated HMAC secret access key for the target HMAC access key ID, represented by `--secret-access-key` (CLI) or `secret_access_key` (Python).
- `IBM_ICEBERG_CATALOG_METASTORE_REST_ENDPOINT` - The metastore REST endpoint value for the target Apache Iceberg catalog in the target IBM watsonx.data data store instance, represented by `--iceberg-endpoint` (CLI) or `iceberg_endpoint` (Python). Do not include `https://` in this value.
- `IBM_COS_BUCKET_PUBLIC_ENDPOINT` - The target COS instance’s endpoint value, represented by `--object-storage-endpoint` (CLI) or `object_storage_endpoint` (Python).
- `IBM_COS_BUCKET_REGION` - The target COS instance’s region short ID, represented by `--object-storage-region` (CLI) or `object_storage_region` (Python).
- `IBM_ICEBERG_CATALOG` - The name of the target Iceberg catalog, represented by `--catalog` (CLI) or `catalog` (Python).
- `IBM_ICEBERG_SCHEMA` - The name of the target namespace (also known as a schema) in the target catalog, represented by `--namespace` (CLI) or `namespace` (Python).
- `IBM_ICEBERG_TABLE` - The name of the target table in the target schema, represented by `--table` (CLI) or `table` (Python).
- `IBM_ICEBERG_TABLE_UNIQUE_RECORD_COLUMN` - The name of the column that uniquely identifies each record in the target table, represented by `--record-id-key` (CLI) or `record_id_key` (Python). The default is `record_id`.

- `--max-retries-connection` (CLI) or `max_retries_connection` (Python) is an optional parameter that specifies the maximum number of retries when connecting to the catalog. Typically, an optimal setting is `15`. The default is `10`. If specified, it must be a number between `2` and `100`, inclusive.
- `--max-retries` (CLI) or `max_retries` (Python) is an optional parameter that specifies the number of times to retry uploading data. Typically, an optimal setting is `150`. The default is `50`. If specified, it must be a number between `2` and `500`, inclusive.

Use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:
- To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  - `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  - `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  - The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`
- To send files to Unstructured for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, which you provide by adding the following:

  - `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  - `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  - The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  The default API URL is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the Unstructured Partition Endpoint. However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.

  If you do not have an API key, get one now.

  If the Unstructured API is self-hosted, the process for generating Unstructured API keys, and the Unstructured API URL that you use, are different. For details, contact Unstructured Sales at sales@unstructured.io.
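The settings described in this section can be gathered and checked before they are handed to the connector. The following is a minimal Python sketch under stated assumptions: the function names `load_connector_settings` and `partition_settings`, the returned dictionaries, and the placeholder values in `example_env` are hypothetical and are not part of the Unstructured library. Only the environment variable names, parameter names, defaults, and ranges come from the text above.

```python
import os

# Maps each Python parameter named in this section to its environment variable.
ENV_BY_PARAMETER = {
    "iam_api_key": "IBM_IAM_API_KEY",
    "access_key_id": "IBM_COS_ACCESS_KEY",
    "secret_access_key": "IBM_COS_SECRET_ACCESS_KEY",
    "iceberg_endpoint": "IBM_ICEBERG_CATALOG_METASTORE_REST_ENDPOINT",
    "object_storage_endpoint": "IBM_COS_BUCKET_PUBLIC_ENDPOINT",
    "object_storage_region": "IBM_COS_BUCKET_REGION",
    "catalog": "IBM_ICEBERG_CATALOG",
    "namespace": "IBM_ICEBERG_SCHEMA",
    "table": "IBM_ICEBERG_TABLE",
}


def load_connector_settings(env, max_retries_connection=10, max_retries=50):
    """Collect settings from env (a mapping) and check the documented limits."""
    missing = sorted(name for name in ENV_BY_PARAMETER.values() if not env.get(name))
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))

    settings = {param: env[name] for param, name in ENV_BY_PARAMETER.items()}

    # The metastore REST endpoint must be given without the https:// prefix.
    if settings["iceberg_endpoint"].startswith("https://"):
        raise ValueError("iceberg_endpoint must not include https://")

    # record_id_key falls back to the documented default.
    settings["record_id_key"] = env.get(
        "IBM_ICEBERG_TABLE_UNIQUE_RECORD_COLUMN", "record_id"
    )

    # Documented ranges: 2 to 100 and 2 to 500, inclusive.
    if not 2 <= max_retries_connection <= 100:
        raise ValueError("max_retries_connection must be between 2 and 100")
    if not 2 <= max_retries <= 500:
        raise ValueError("max_retries must be between 2 and 500")
    settings["max_retries_connection"] = max_retries_connection
    settings["max_retries"] = max_retries
    return settings


def partition_settings(env, partition_by_api):
    """Return the partitioning options for local or API-based processing."""
    if not partition_by_api:
        # Local processing needs no Unstructured API key or API URL.
        return {"partition_by_api": False}
    return {
        "partition_by_api": True,
        "api_key": env.get("UNSTRUCTURED_API_KEY"),
        "partition_endpoint": env.get("UNSTRUCTURED_API_URL"),
    }


# Placeholder values for illustration only; real code would pass os.environ.
example_env = {
    "IBM_IAM_API_KEY": "placeholder-iam-key",
    "IBM_COS_ACCESS_KEY": "placeholder-access-key-id",
    "IBM_COS_SECRET_ACCESS_KEY": "placeholder-secret",
    "IBM_ICEBERG_CATALOG_METASTORE_REST_ENDPOINT": "metastore.example.com:443",
    "IBM_COS_BUCKET_PUBLIC_ENDPOINT": "s3.us-east.cloud-object-storage.appdomain.cloud",
    "IBM_COS_BUCKET_REGION": "us-east",
    "IBM_ICEBERG_CATALOG": "my_catalog",
    "IBM_ICEBERG_SCHEMA": "my_schema",
    "IBM_ICEBERG_TABLE": "elements",
}

settings = load_connector_settings(example_env, max_retries_connection=15, max_retries=150)
local_options = partition_settings(example_env, partition_by_api=False)
```

In practice you would call `load_connector_settings(os.environ)` so that the values come from your shell. Validating the ranges and the endpoint format up front surfaces configuration mistakes before any connection is attempted.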