A field named record_id with a text string data type.
Unstructured can use this field to do intelligent record overwrites. Without this field, duplicate records might be written to the table or, in some cases, the operation could fail altogether.
s3://<bucket-name>[/<folder-name>].
az://<container-name>[/<folder-name>].
gs://<bucket-name>[/<folder-name>].
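The URI formats above can be told apart by their scheme. The following is a minimal sketch that uses only the Python standard library. The function name describe_lancedb_uri and the labels it returns are hypothetical and are not part of Unstructured or LanceDB. It also assumes that any value without a scheme is a local folder path, so Windows drive-letter paths are not handled.

```python
from urllib.parse import urlparse

# Schemes taken from the URI formats listed in this section.
SCHEMES = {
    "s3": "Amazon S3",
    "az": "Azure Blob Storage",
    "gs": "Google Cloud Storage",
}


def describe_lancedb_uri(uri: str) -> tuple[str, str, str]:
    """Return (storage label, bucket or container or local path, folder path).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme == "":
        # Assumption: no scheme means a local folder path.
        return ("local", uri, "")
    if parsed.scheme not in SCHEMES:
        raise ValueError(f"Unsupported URI scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("The URI is missing a bucket or container name")
    # The optional [/<folder-name>] part ends up in parsed.path.
    return (SCHEMES[parsed.scheme], parsed.netloc, parsed.path.strip("/"))


if __name__ == "__main__":
    for example in ("s3://my-bucket/my-folder", "az://my-container", "/tmp/lancedb"):
        print(describe_lancedb_uri(example))
```

A check like this can catch a malformed LANCEDB_URI value before any data is written.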
LANCEDB_URI
- The local path to the folder where the LanceDB data is stored, represented by --uri (CLI) or uri (Python).
LANCEDB_TABLE
- The name of the target LanceDB table within the local data folder, represented by --table-name (CLI) or table_name (Python).
LANCEDB_URI
- The URI for the target Amazon S3 bucket and any target folder path within that bucket. Use the format s3://<bucket-name>[/<folder-name>]. This is represented by --uri (CLI) or uri (Python).
LANCEDB_TABLE
- The name of the target LanceDB table within the Amazon S3 bucket, represented by --table-name (CLI) or table_name (Python).
AWS_ACCESS_KEY_ID
- The AWS access key ID for the AWS IAM entity that has access to the Amazon S3 bucket, represented by --aws-access-key-id (CLI) or aws_access_key_id (Python).
AWS_SECRET_ACCESS_KEY
- The AWS secret access key for the AWS IAM entity that has access to the Amazon S3 bucket, represented by --aws-secret-access-key (CLI) or aws_secret_access_key (Python).
LANCEDB_URI
- The URI for the target container within the Azure Blob Storage account and any target folder path within that container. Use the format az://<container-name>[/<folder-name>]. This is represented by --uri (CLI) or uri (Python).
LANCEDB_TABLE
- The name of the target LanceDB table within the Azure Blob Storage account, represented by --table-name (CLI) or table_name (Python).
AZURE_STORAGE_ACCOUNT_NAME
- The name of the target Azure Blob Storage account, represented by --azure-storage-account-name (CLI) or azure_storage_account_name (Python).
AZURE_STORAGE_ACCOUNT_KEY
- The access key for the Azure Blob Storage account, represented by --azure-storage-account-key (CLI) or azure_storage_account_key (Python).
LANCEDB_URI
- The URI for the target Google Cloud Storage bucket and any target folder path within that bucket. Use the format gs://<bucket-name>[/<folder-name>]. This is represented by --uri (CLI) or uri (Python).
LANCEDB_TABLE
- The name of the target LanceDB table within the Google Cloud Storage bucket, represented by --table-name (CLI) or table_name (Python).
GCS_SERVICE_ACCOUNT_KEY
- A single-line string that contains the contents of the downloaded service account key file for the Google Cloud service account that has access to the Google Cloud Storage bucket, represented by --google-service-account-key (CLI) or google_service_account_key (Python).
Use the --partition-by-api
option (CLI) or partition_by_api (Python) parameter to specify where files are processed:
To process files locally, omit --partition-by-api (CLI) or partition_by_api (Python), or explicitly specify partition_by_api=False (Python).
Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:
--api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
--partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL
To process files through the Unstructured API, specify --partition-by-api (CLI) or partition_by_api=True (Python).
This option also requires an Unstructured API key and API URL, which you supply by adding the following:
--api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
--partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL, representing your API key and API URL, respectively.
The API URL for the Unstructured Partition Endpoint is https://api.unstructuredapp.io/general/v0/general. However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.
If you do not have an API key, get one now.
If the Unstructured API is self-hosted, the process for generating Unstructured API keys, and the Unstructured API URL that you use, are different. For details, contact Unstructured Sales at sales@unstructured.io.
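The environment variables and the partition_by_api choice described in this section can be checked before a run. The following is a minimal sketch that uses only the Python standard library and does not call Unstructured or LanceDB. The storage labels ("local", "s3", "azure", "gcs") and the helper names missing_env and partitioner_kwargs are hypothetical. The dictionary keys partition_by_api, api_key, and partition_endpoint mirror the Python parameter names given above.

```python
import os

# Environment variables named in this section, grouped by storage location.
# The group labels are hypothetical and used only for this illustration.
REQUIRED_ENV = {
    "local": ["LANCEDB_URI", "LANCEDB_TABLE"],
    "s3": ["LANCEDB_URI", "LANCEDB_TABLE",
           "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "azure": ["LANCEDB_URI", "LANCEDB_TABLE",
              "AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_ACCOUNT_KEY"],
    "gcs": ["LANCEDB_URI", "LANCEDB_TABLE", "GCS_SERVICE_ACCOUNT_KEY"],
}

# Needed only when files are processed through the Unstructured API.
API_ENV = ["UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL"]


def missing_env(storage, partition_by_api=False, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    names = list(REQUIRED_ENV[storage])
    if partition_by_api:
        names += API_ENV
    return [name for name in names if not env.get(name)]


def partitioner_kwargs(partition_by_api, env=None):
    """Build the partitioning keyword arguments described in this section."""
    env = os.environ if env is None else env
    if not partition_by_api:
        # Local processing: the API key and API URL are omitted.
        return {"partition_by_api": False}
    return {
        "partition_by_api": True,
        "api_key": env.get("UNSTRUCTURED_API_KEY"),
        "partition_endpoint": env.get("UNSTRUCTURED_API_URL"),
    }


if __name__ == "__main__":
    print("Missing:", missing_env("s3", partition_by_api=True))
    print("Partitioner settings:", partitioner_kwargs(partition_by_api=False))
```

Passing a plain dictionary as env, instead of relying on os.environ, keeps the helpers easy to test without changing the real environment.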