Connect Airtable to your preprocessing pipeline, and batch process all your documents using unstructured-ingest to store structured outputs locally on your filesystem.

Make sure to have the Airtable dependencies installed:

Shell
pip install "unstructured-ingest[airtable]"

Before connecting your preprocessing pipeline to Airtable, obtain a personal access token to authenticate into Airtable. Check the Airtable documentation for more information.
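A common pattern is to store the token in an environment variable and reference it later when running the ingest command. The variable name below is just a convention used in this example, not something the CLI requires:

Shell
export AIRTABLE_PERSONAL_ACCESS_TOKEN="<your-personal-access-token>"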

Unless otherwise specified, Unstructured processes all tables within every base in your Airtable organization. Optionally, you can specify the locations to ingest data from within Airtable using the --list-of-paths argument (list_of_paths in the Python example). An Airtable path has the following structure: base_id/table_id(optional)/view_id(optional)/
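For example, the following list of paths (the IDs are hypothetical placeholders) would ingest a single table from one base plus every table in a second base. Passing the paths as one space-separated, quoted argument is an assumption here; confirm the expected format with unstructured-ingest airtable --help:

Shell
--list-of-paths "appAbcDeF1ghijKlm/tblAbcDeF1ghijKlm appNoPqRs2tuvwXyz"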

Refer to the Airtable documentation to learn how to obtain base, table, and view IDs in bulk.

Finally, make sure to set the --partition-by-api flag and pass in your Unstructured API key with --api-key (see the example below).

Additionally, if you're using the Unstructured Serverless API, a locally deployed Unstructured API, or an Unstructured API deployed on Azure or AWS, you also need to specify the API URL via the --partition-endpoint argument:
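Putting it all together, a minimal end-to-end invocation might look like the sketch below. The --personal-access-token, --output-dir, and --num-processes flags follow the connector's documented CLI, but flag names can vary between releases, so verify them with unstructured-ingest airtable --help; the environment variable names are placeholders:

Shell
unstructured-ingest \
  airtable \
  --personal-access-token "$AIRTABLE_PERSONAL_ACCESS_TOKEN" \
  --output-dir airtable-ingest-output \
  --num-processes 2 \
  --partition-by-api \
  --api-key "$UNSTRUCTURED_API_KEY" \
  --partition-endpoint "$UNSTRUCTURED_API_URL"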