s3:ListBucket and s3:GetObject for that bucket. Learn how.
s3:PutObject for that bucket. Learn how.
The path to the S3 bucket, formatted as protocol://bucket/ (for example, s3://my-bucket/).
If the target files are in a folder, the path to the target folder in the S3 bucket, formatted as protocol://bucket/path/to/folder/ (for example, s3://my-bucket/my-folder/).
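The path formats above can be sketched in Python as follows. This is a minimal illustration, not part of the connector: the helper names build_s3_url and split_s3_url are hypothetical, and the only behavior assumed is the s3://bucket/ and s3://bucket/path/to/folder/ shapes described above, including the trailing slash.

```python
from typing import Tuple
from urllib.parse import urlparse


def build_s3_url(bucket: str, folder: str = "") -> str:
    """Return s3://bucket/ or s3://bucket/path/to/folder/ (hypothetical helper)."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name without slashes")
    # Normalize the folder so the result always ends with exactly one slash.
    folder = folder.strip("/")
    return f"s3://{bucket}/{folder}/" if folder else f"s3://{bucket}/"


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split an S3 URL into (bucket, folder); folder is '' for the bucket root."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.strip("/")
```

For example, build_s3_url("my-bucket", "my-folder") produces the folder form shown above, and split_s3_url reverses it.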
To change the following bucket policy to restrict it to a specific user in the AWS account, change root to that specific username.
In this policy, replace the following:
<my-account-id> with your AWS account ID.
<my-bucket-name> in two places with the name of your bucket.
create-s3-bucket.yaml. To change the following bucket policy to restrict it to a specific user in the AWS account, change root to that specific username.
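To illustrate the root-versus-user substitution, the following Python sketch builds a bucket policy document. It is an assumption-laden example, not the policy from this page: the function name build_bucket_policy, the statement layout, and the choice of the two Resource ARNs are illustrative, and the actions are limited to the s3:ListBucket, s3:GetObject, and s3:PutObject permissions mentioned earlier. The principal ARN forms (arn:aws:iam::ACCOUNT:root and arn:aws:iam::ACCOUNT:user/NAME) are standard AWS IAM syntax.

```python
import json
from typing import Optional


def build_bucket_policy(account_id: str, bucket_name: str,
                        username: Optional[str] = None) -> dict:
    """Build an illustrative bucket policy (hypothetical helper).

    With no username, the principal is the account root, which covers any
    authenticated user in the account. With a username, access is restricted
    to that one IAM user.
    """
    if username:
        principal = f"arn:aws:iam::{account_id}:user/{username}"
    else:
        principal = f"arn:aws:iam::{account_id}:root"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
                # Assumption: the bucket name appears twice, once for the
                # bucket itself and once for the objects inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_bucket_policy("<my-account-id>", "<my-bucket-name>"), indent=2))
```

Passing a username switches the principal from the account root to that single user. Compare the output against the actual policy you apply before relying on it.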
create-s3-bucket.sh. To change the following bucket policy to restrict it to a specific user in the AWS account, change root to that specific username.
In this script, replace the following:
<my-account-id> with your AWS account ID.
<my-unique-bucket-name> with the name of your bucket.
<us-east-1> with your AWS Region.
One Parquet (.parquet) file per file in the source location. For example, for a file in the source location named my-file.pdf, an associated file with the extension .parquet is generated. Various kinds of file transactions can result in additional Parquet files being generated. These Parquet filenames are automatically generated by the Delta Lake engine and are not meant to be manually modified.
A folder named _delta_log that contains metadata and change history about the .parquet files. As Parquet files are added to, changed, or removed from the specified bucket or folder path, the _delta_log folder is updated with any related metadata and change history details.
Together, the Parquet files and the _delta_log folder (and its contents) describe a single, versioned Delta table. Because of this, Unstructured recommends the following usage best practices:
Do not directly modify the Parquet files or the _delta_log folder within a Delta table’s directory. This can lead to data loss or table corruption.
If you copy or move a Delta table, copy or move the Parquet files and the _delta_log folder (and its contents) together as a unit. Note that the copied or moved Delta table will no longer be controlled by the original Delta Tables in S3 destination connector.
AWS_S3_URL - The path to the S3 bucket or folder, formatted as s3://my-bucket/ (if the files are in the bucket’s root) or s3://my-bucket/my-folder/, represented by --table-uri (CLI) or table_uri (Python).
AWS_ACCESS_KEY_ID - The AWS access key ID for the authenticated AWS IAM user, represented by --aws-access-key-id (CLI) or aws_access_key (Python).
AWS_SECRET_ACCESS_KEY - The corresponding AWS secret access key, represented by --aws-secret-access-key (CLI) or aws_secret_access_key (Python).
Use the --partition-by-api option (CLI) or partition_by_api (Python) parameter to specify where files are processed:
To process files locally, omit --partition-by-api (CLI) or partition_by_api (Python), or explicitly specify partition_by_api=False (Python).
Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:
--api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
--partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL
To send files to Unstructured for processing, specify --partition-by-api (CLI) or partition_by_api=True (Python). Unstructured also requires an Unstructured API key and API URL, which you provide by adding the following:
--api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
--partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL, representing your API key and API URL, respectively.
The default API URL is https://api.unstructuredapp.io/general/v0/general, which is the API URL for the Unstructured Partition Endpoint. However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.
If you do not have an API key, get one now.
If the Unstructured API is self-hosted, the process for generating Unstructured API keys, and the Unstructured API URL that you use, are different. For details, contact Unstructured Sales at sales@unstructured.io.
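The choice between local processing and processing by Unstructured can be sketched as a small Python helper. This is only an illustration: the function name partitioner_settings is hypothetical, and the sketch assumes nothing beyond the partition_by_api, api_key, and partition_endpoint parameter names and the UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL environment variables described above.

```python
import os
from typing import Dict, Mapping, Optional, Union


def partitioner_settings(
    partition_by_api: bool,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Union[bool, str]]:
    """Return keyword settings for partitioning (hypothetical helper).

    Local processing needs no API key or URL. Processing by Unstructured
    requires both, read here from environment variables.
    """
    env = os.environ if env is None else env
    if not partition_by_api:
        # Local processing: omit the API key and API URL entirely.
        return {"partition_by_api": False}

    api_key = env.get("UNSTRUCTURED_API_KEY")
    api_url = env.get("UNSTRUCTURED_API_URL")
    if not api_key or not api_url:
        raise ValueError(
            "Set UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL "
            "when partition_by_api is True"
        )
    return {
        "partition_by_api": True,
        "api_key": api_key,
        "partition_endpoint": api_url,
    }
```

The returned dictionary is meant to be passed as keyword arguments to whichever Python configuration object accepts these parameters. Failing early when a variable is missing avoids sending requests with an empty key or URL.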