Connect Biomed to your preprocessing pipeline, and use unstructured-ingest to batch process your documents and store the structured outputs locally on your filesystem.

This connector lets you extract biomedical documents from the supported FTP directories.

Make sure to have the Biomed dependencies installed:

Shell
pip install "unstructured-ingest[biomed]"

You need to provide the path from which the documents should be downloaded. For example, to download the documents under https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/, set the --path parameter to oa_pdf/07/.

#!/usr/bin/env bash

unstructured-ingest \
  biomed \
    --path "oa_pdf/07/07/sbaa031.073.PMC7234218.pdf" \
    --output-dir "$LOCAL_FILE_OUTPUT_DIR" \
    --num-processes 2 \
    --verbose \
    --preserve-downloads \
    --strategy hi_res
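
If you start from a full URL, the value for --path is whatever remains once the common prefix is removed. The snippet below is a minimal sketch of that step using plain bash parameter expansion. The prefix https://ftp.ncbi.nlm.nih.gov/pub/pmc/ is inferred from the example URL above rather than taken from the connector's documentation, and the variable names base_url, full_url, and path_param are hypothetical.

```bash
#!/usr/bin/env bash

# Assumed common prefix, inferred from the example URL in this page.
base_url="https://ftp.ncbi.nlm.nih.gov/pub/pmc/"

# Full URL of the directory (or file) you want to download.
full_url="https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/"

# Strip the prefix; the remainder is the value to pass to --path.
path_param="${full_url#"$base_url"}"

echo "$path_param"
```

The resulting value can then be passed as --path "$path_param" in the command above.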

For a full list of the options the Unstructured Ingest CLI accepts, run unstructured-ingest biomed --help.