Connect Discord to your preprocessing pipeline and use unstructured-ingest to batch process all your documents, storing the structured outputs locally on your filesystem.

Make sure the Discord dependencies are installed:

Shell
pip install "unstructured-ingest[discord]"

To ingest the contents of Discord channels, you need to supply the following information:

  • token: an authentication token used to access the Discord API
  • channels: a list of Discord channel IDs to ingest from
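Before starting a run, it can help to confirm that both values are present and well formed. The sketch below is a hypothetical preflight check, not part of unstructured-ingest: valid_channels and preflight are illustrative names, and it assumes several channel IDs are passed as one comma-separated string of numeric IDs.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of unstructured-ingest): succeeds only if the
# value is a comma-separated list of numeric IDs with no empty entries.
# The comma-separated form is an assumption made for this sketch.
valid_channels() {
  case "$1" in
    ''|*[!0-9,]*|,*|*,|*,,*) return 1 ;;  # empty, non-digit, or stray comma
    *) return 0 ;;
  esac
}

# Hypothetical preflight: $1 is the token, $2 is the channel list.
# Prints a message to stderr and returns non-zero when either is unusable.
preflight() {
  if [ -z "$1" ]; then
    echo "error: token is empty" >&2
    return 1
  fi
  if ! valid_channels "$2"; then
    echo "error: channels must be numeric IDs separated by commas" >&2
    return 1
  fi
  return 0
}
```

One way to use it is to gate the ingest command on the check, for example by calling preflight "$DISCORD_TOKEN" "12345678" and only continuing when it returns success.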

Optionally, you can set how many days of channel history to go back via the period argument.
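As a rough illustration of the optional look-back window, the following sketch prints the arguments for a run and adds the period option only when a day count is given. It assumes the argument is exposed on the CLI as --period, mirroring --token and --channels; confirm the exact flag with unstructured-ingest discord --help. discord_args is a hypothetical helper name, and the token shown is a placeholder.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the CLI arguments one per line.
# $1 = channel ID(s), $2 = token, $3 = optional number of days to go back.
# The --period flag name is an assumption; check the CLI help to confirm it.
discord_args() {
  printf '%s\n' discord --channels "$1" --token "$2"
  if [ -n "${3:-}" ]; then
    printf '%s\n' --period "$3"
  fi
}

# Dry run: show the arguments for a 7-day look-back without contacting Discord.
discord_args 12345678 "example-token" 7
```

Printing the arguments instead of executing them keeps the sketch safe to run without credentials; the same list could then be passed to unstructured-ingest.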

#!/usr/bin/env bash

unstructured-ingest \
  discord \
    --channels 12345678 \
    --token "$DISCORD_TOKEN" \
    --download-dir "$LOCAL_FILE_DOWNLOAD_DIR" \
    --output-dir "$LOCAL_FILE_OUTPUT_DIR" \
    --preserve-downloads \
    --verbose \
    --strategy hi_res
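Once a run finishes, a quick sanity check is to count the files written to the output directory. This sketch assumes the structured outputs are saved as .json files somewhere under the output directory; count_outputs is a hypothetical helper, not an unstructured-ingest command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the .json files below the given directory.
# Assumes the structured outputs are written as .json files.
count_outputs() {
  find "$1" -type f -name '*.json' | wc -l | tr -d '[:space:]'
}

# Report the count only when the output directory is set and exists.
if [ -n "${LOCAL_FILE_OUTPUT_DIR:-}" ] && [ -d "$LOCAL_FILE_OUTPUT_DIR" ]; then
  echo "structured outputs: $(count_outputs "$LOCAL_FILE_OUTPUT_DIR")"
fi
```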

For a full list of the options the Unstructured Ingest CLI accepts, run unstructured-ingest discord --help.