Connect Notion to your preprocessing pipeline and use unstructured-ingest to batch process your documents, storing the structured outputs on your local filesystem.

First, install the Notion dependencies as shown here:

pip install "unstructured-ingest[notion]"

Make sure to provide your Notion API key (the --api-key option in the unstructured-ingest command below). To get the credentials for your Notion workspace, follow the steps described in the Notion documentation.
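
One way to keep the key out of scripts and shell history is to read it from an environment variable and stop early when it is missing. The sketch below assumes a variable named NOTION_API_KEY; that name is chosen here for illustration and is not something the CLI is documented to read on its own.

```shell
#!/usr/bin/env bash

# check_notion_key: succeed only if NOTION_API_KEY is set and non-empty.
# NOTION_API_KEY is a hypothetical variable name used for this sketch.
check_notion_key() {
  if [ -z "${NOTION_API_KEY:-}" ]; then
    echo "error: NOTION_API_KEY is not set" >&2
    return 1
  fi
}

# Possible usage (left commented so the sketch does not start an ingest run):
# check_notion_key && unstructured-ingest notion --api-key "$NOTION_API_KEY" ...
```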

Optionally, specify the following parameters:

  • page-ids: Notion page IDs to extract text from.
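  • database-ids: Notion database IDs to extract text from.

Then run the connector from the command line:

#!/usr/bin/env bash

# Set LOCAL_FILE_OUTPUT_DIR to the directory that should receive the outputs,
# and replace the <...> placeholders with your own values before running.
unstructured-ingest \
  notion \
    --api-key "<Notion api key>" \
    --output-dir "$LOCAL_FILE_OUTPUT_DIR" \
    --page-ids "<Comma delimited list of page ids to process>" \
    --database-ids "<Comma delimited list of database ids to process>" \
    --num-processes 2 \
    --verbose \
    --strategy hi_res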
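
Both --page-ids and --database-ids take a single comma-delimited string. If the IDs are kept one per argument or in a shell array, a small helper can build that string. This is a minimal sketch; join_ids is a hypothetical helper name, not part of unstructured-ingest.

```shell
#!/usr/bin/env bash

# join_ids ID...: print all arguments as one comma-delimited string.
# "$*" joins the arguments with the first character of IFS, set to a comma here.
join_ids() {
  local IFS=,
  printf '%s\n' "$*"
}

# Possible usage (placeholders stand in for real page IDs):
# unstructured-ingest notion ... --page-ids "$(join_ids "<page id 1>" "<page id 2>")"
```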

For the full list of options the Unstructured Ingest CLI accepts for this connector, run unstructured-ingest notion --help.
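
After a run, a quick sanity check is to count the files written to the output directory. The sketch below assumes the structured outputs are .json files somewhere under that directory; count_outputs is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# count_outputs DIR: print how many .json files exist anywhere under DIR.
# Assumes the ingest outputs are written as .json files.
count_outputs() {
  find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}

# Possible usage:
# count_outputs "$LOCAL_FILE_OUTPUT_DIR"
```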