GitHub
Connect GitHub to your preprocessing pipeline and use unstructured-ingest to batch process all your documents,
storing the structured outputs locally on your filesystem.
First, install the GitHub dependencies as shown here.
Provide the GitHub repo URL (url) to fetch the files from, e.g. "https://github.com/Unstructured-IO/unstructured" or "Unstructured-IO/unstructured", and supply your GitHub access token (git-access-token).
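Both forms of the repo reference point at the same repository. The sketch below illustrates that equivalence; it is not the library's own parsing code. normalize_repo is a hypothetical helper, and it assumes a reference of exactly the form owner/repo, with or without a host in front.

```python
from urllib.parse import urlparse


def normalize_repo(url: str) -> str:
    """Reduce a GitHub URL or an "owner/repo" shorthand to "owner/repo".

    Hypothetical helper for illustration; not part of unstructured-ingest.
    """
    parsed = urlparse(url)
    # A full URL has a host; the shorthand form is only a path.
    path = parsed.path if parsed.netloc else url
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"expected 'owner/repo', got {url!r}")
    owner, repo = parts
    # Drop a trailing ".git" in case a clone URL was pasted.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"


print(normalize_repo("https://github.com/Unstructured-IO/unstructured"))
print(normalize_repo("Unstructured-IO/unstructured"))
```

Both calls print the same owner/repo string, which is why either form can be passed as url.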
Learn more about GitHub authentication here.
Optionally, specify a branch and what file types to limit the ingestion to:
git-branch: The branch to fetch files from. If not given, the default repository branch is used.
git-file-glob: A comma-separated list of file globs to limit which types of files are accepted, e.g. '*.html,*.txt'
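To make the effect of a comma-separated glob list concrete, here is a minimal sketch of such a filter. It assumes shell-style matching as implemented by Python's fnmatch module; the connector's actual matching rules may differ. matches_file_glob and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_file_glob(path: str, file_glob: str) -> bool:
    """Return True if the path matches any glob in a comma-separated list.

    Hypothetical helper for illustration; not part of unstructured-ingest.
    """
    patterns = [p.strip() for p in file_glob.split(",") if p.strip()]
    # fnmatchcase is case-sensitive on every platform; "*" also matches "/".
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Sample repository paths (made up for this example).
paths = ["README.md", "docs/index.html", "notes/todo.txt", "src/main.py"]
accepted = [p for p in paths if matches_file_glob(p, "*.html,*.txt")]
print(accepted)  # ['docs/index.html', 'notes/todo.txt']
```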
For a full list of the options the Unstructured Ingest CLI accepts, check unstructured-ingest github --help.
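The options described on this page can be combined into a single invocation. The sketch below only assembles and prints the command; it does not run it. It assumes each option name is passed as a flag prefixed with two dashes, and the branch name main and the token placeholder are illustrative values. Options not mentioned on this page, such as where outputs are written, are left out; consult the --help output for those. build_github_ingest_command is a hypothetical helper.

```python
import shlex
from typing import List, Optional


def build_github_ingest_command(
    url: str,
    git_access_token: str,
    git_branch: Optional[str] = None,
    git_file_glob: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for the GitHub connector.

    Hypothetical helper; flag spellings are assumed from the option names
    given on this page.
    """
    command = [
        "unstructured-ingest", "github",
        "--url", url,
        "--git-access-token", git_access_token,
    ]
    # Optional settings are added only when provided.
    if git_branch:
        command += ["--git-branch", git_branch]
    if git_file_glob:
        command += ["--git-file-glob", git_file_glob]
    return command


command = build_github_ingest_command(
    "Unstructured-IO/unstructured",
    git_access_token="<your-token>",  # placeholder, not a real token
    git_branch="main",                # assumed branch name
    git_file_glob="*.html,*.txt",
)
# shlex.join quotes arguments so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Building the arguments as a list rather than a single string keeps the glob pattern from being expanded by the shell if the list is later passed to subprocess.run.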