Connect Biomed to your preprocessing pipeline, and use unstructured-ingest to batch process your documents and store the structured outputs locally on your filesystem.

This connector lets you extract biomedical documents from the supported FTP directories.

Make sure to have the Biomed dependencies installed:

Shell
pip install "unstructured-ingest[biomed]"

You need to provide the path from which the documents should be downloaded. For example, to download the documents under https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/, set the path parameter to oa_pdf/07/.
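As a minimal sketch of how the path value relates to the full URL, the snippet below strips the leading part of the example URL to leave the value for the path parameter. The variable names (base_url, full_url, path_param) are hypothetical, and the base URL is an assumption inferred from the single example above, not a documented constant.

```shell
#!/bin/sh
# Assumed base of the PMC FTP area, inferred from the example URL in this page.
base_url="https://ftp.ncbi.nlm.nih.gov/pub/pmc/"

# Full URL of the directory whose documents should be downloaded.
full_url="https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/"

# Remove the base prefix; what remains is the value for the path parameter.
path_param="${full_url#"$base_url"}"

echo "$path_param"
```

The resulting value would then be supplied as the connector's path parameter. The exact option name on the command line is not stated here, so confirm it with unstructured-ingest biomed --help before relying on it.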

For a full list of the options the Unstructured Ingest CLI accepts, run unstructured-ingest biomed --help.