Connect a Delta Table to your preprocessing pipeline, and use unstructured-ingest to batch process all your documents and store the structured outputs locally on your filesystem.

Make sure to have the Delta Table dependencies installed:

Shell
pip install "unstructured-ingest[delta-table]"

AWS credentials must be available for use with the storage options. Specify the URI to the Delta Table using the table-uri argument, and pass a dictionary of the options to use for the storage backend via storage_options.
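
As a rough illustration of how these arguments fit together, the sketch below assembles the storage options from environment variables and only calls the CLI when RUN_INGEST=1 is set. The bucket path, output directory, default region, and the RUN_INGEST switch are hypothetical placeholders. The --table-uri, --storage_options, and --output-dir flag spellings, and the comma-separated KEY=VALUE form for the storage options, are assumptions that should be confirmed against the --help output for your installed version.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own table location and output directory.
TABLE_URI="s3://my-bucket/path/to/delta-table/"
OUTPUT_DIR="delta-table-output"

# Build the storage options from AWS credentials already present in the environment.
# Assumption: the CLI accepts them as one comma-separated KEY=VALUE string.
STORAGE_OPTIONS="AWS_REGION=${AWS_REGION:-us-east-2},AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-},AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:-}"

if [ "${RUN_INGEST:-0}" = "1" ]; then
  # Flag names are assumed; check `unstructured-ingest delta-table --help`.
  unstructured-ingest delta-table \
    --table-uri "$TABLE_URI" \
    --storage_options "$STORAGE_OPTIONS" \
    --output-dir "$OUTPUT_DIR"
else
  # Dry run: print the command without the storage options so secrets are not echoed.
  echo "dry run: unstructured-ingest delta-table --table-uri $TABLE_URI --output-dir $OUTPUT_DIR"
fi
```

Running the script without RUN_INGEST set prints the command it would execute, which is a cheap way to check the table URI and output directory before touching the storage backend.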

For a full list of the options the Unstructured Ingest CLI accepts, run unstructured-ingest delta-table --help.