Connect Biomed to your preprocessing pipeline and use unstructured-ingest to batch process all your documents, storing the structured outputs locally on your filesystem.

This connector allows you to extract biomedical documents from the supported FTP directories.

Make sure to have the Biomed dependencies installed:

Shell
pip install "unstructured-ingest[biomed]"

You need to provide the path from which the documents should be downloaded. For example, to download the documents at https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/, set the path parameter to oa_pdf/07/.
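
As a minimal sketch of how the path value relates to the full URL, the snippet below strips the leading part of the address with POSIX parameter expansion. The base prefix ending in pub/pmc/ is inferred from the single example above, and the variable names url, base, and path are illustrative only.

```shell
# URL of the directory to ingest (taken from the example above).
url="https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/"

# Assumed base prefix, inferred from the example: everything up to and
# including "pub/pmc/".
base="https://ftp.ncbi.nlm.nih.gov/pub/pmc/"

# Remove the base prefix from the front of the URL.
path=${url#"$base"}

echo "$path"   # prints: oa_pdf/07/
```

The printed value is what you would pass as the path parameter.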

Make sure to set the --partition-by-api flag and pass in your API key with --api-key.

Additionally, if you’re using the Unstructured Serverless API, a locally deployed Unstructured API, or an Unstructured API deployed on Azure or AWS, you also need to specify the API URL via the --partition-endpoint argument.
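
Putting the flags together, a full invocation might look roughly like the sketch below. Only --partition-by-api, --api-key, and --partition-endpoint are named on this page. The biomed subcommand name, the --path spelling of the path parameter, and the --output-dir flag are assumptions, so check them against `unstructured-ingest --help` for your installed version. The environment variable names, the placeholder values, and the output directory are hypothetical. The command is prefixed with echo so that it is printed for review instead of executed.

```shell
# Placeholders: replace these, or export UNSTRUCTURED_API_KEY and
# UNSTRUCTURED_API_URL (hypothetical variable names) before running.
API_KEY="${UNSTRUCTURED_API_KEY:-YOUR_API_KEY}"
API_URL="${UNSTRUCTURED_API_URL:-YOUR_API_URL}"

# Directory to download, relative to the FTP base (from the example above).
INGEST_PATH="oa_pdf/07/"

# Assumed: the "biomed" subcommand, --path, and --output-dir.
# Dry run: "echo" prints the command; remove it to run the ingest.
echo unstructured-ingest biomed \
  --path "$INGEST_PATH" \
  --output-dir "biomed-ingest-output" \
  --partition-by-api \
  --api-key "$API_KEY" \
  --partition-endpoint "$API_URL"
```

Drop the --partition-endpoint line if your setup does not need a custom API URL.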