GitLab
Connect GitLab to your preprocessing pipeline and use unstructured-ingest to batch process all your documents,
storing the structured outputs locally on your filesystem.
First, install the GitLab dependencies as shown here.
pip install "unstructured-ingest[gitlab]"
Provide the URL of the GitLab repository to fetch files from (url), for example "https://gitlab.com/gitlab-com/content-sites/docsy-gitlab",
and supply your GitLab access token (git-access-token).
Learn more about GitLab authentication here.
Optionally, specify a branch and limit which file types are ingested:
- git-branch: The branch to fetch files from. If not given, the repository's default branch is used.
- git-file-glob: A comma-separated list of file globs that limits which types of files are accepted, e.g. '*.html,*.txt'.
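To make the glob list concrete, here is a small sketch that approximates how a value such as '*.html,*.txt' selects files: a file is accepted if it matches any one of the comma-separated patterns. This is an illustration only, not unstructured-ingest's own matching code; the helper name matches_file_glob is hypothetical, and whether the connector matches against the bare file name or the full repository path is an assumption here.

```shell
#!/bin/sh
# Illustration only (hypothetical helper, NOT unstructured-ingest's code):
# approximates how a comma-separated glob list such as '*.html,*.txt'
# accepts or rejects files.

# matches_file_glob NAME GLOBS
# Returns 0 (success) if NAME matches any glob in the comma-separated GLOBS.
matches_file_glob() {
  name=$1
  globs=$2
  old_ifs=$IFS
  IFS=','   # split the glob list on commas
  set -f    # stop the shell from expanding the globs against the current directory
  for g in $globs; do
    case $name in
      $g) IFS=$old_ifs; set +f; return 0 ;;
    esac
  done
  IFS=$old_ifs
  set +f
  return 1
}

for f in README.md index.html notes.txt logo.png; do
  if matches_file_glob "$f" '*.html,*.txt'; then
    echo "accept $f"
  else
    echo "skip   $f"
  fi
done
```

With '*.html,*.txt', only index.html and notes.txt are accepted; README.md and logo.png are skipped.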
Make sure to set the --partition-by-api flag and pass in your API key with --api-key.
If you're using the Unstructured Serverless API, a locally deployed Unstructured API, or an Unstructured API
deployed on Azure or AWS, you must also specify the API URL with the --partition-endpoint
argument.
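Putting these options together, a run might look like the following sketch. The gitlab subcommand name and the --output-dir flag are assumptions not stated on this page; the remaining flags follow the parameter names described above. The environment variable names (GITLAB_URL, GITLAB_TOKEN, GITLAB_BRANCH, GITLAB_FILE_GLOB, UNSTRUCTURED_API_KEY, UNSTRUCTURED_API_URL) and the RUN switch are hypothetical conventions for this sketch. By default it only prints the assembled command, with credentials masked; set RUN=1 to execute it. Confirm the flag names with unstructured-ingest gitlab --help for your installed version.

```shell
#!/bin/sh
# Sketch of a GitLab -> local filesystem ingest run.
# ASSUMPTIONS: the "gitlab" subcommand and --output-dir are not confirmed by
# this page; the env var names and the RUN switch are hypothetical. Verify with:
#   unstructured-ingest gitlab --help

# Required settings. Placeholders exist only so the dry run can print.
set -- unstructured-ingest gitlab \
  --url "${GITLAB_URL:-https://gitlab.com/gitlab-com/content-sites/docsy-gitlab}" \
  --git-access-token "${GITLAB_TOKEN:-YOUR_GITLAB_TOKEN}" \
  --output-dir gitlab-ingest-output \
  --partition-by-api \
  --api-key "${UNSTRUCTURED_API_KEY:-YOUR_UNSTRUCTURED_API_KEY}"

# Optional: branch to fetch (the default branch is used when omitted).
if [ -n "${GITLAB_BRANCH:-}" ]; then
  set -- "$@" --git-branch "$GITLAB_BRANCH"
fi

# Optional: comma-separated file globs, e.g. '*.html,*.txt'.
if [ -n "${GITLAB_FILE_GLOB:-}" ]; then
  set -- "$@" --git-file-glob "$GITLAB_FILE_GLOB"
fi

# Needed for Serverless, local, Azure, or AWS deployments of the Unstructured API.
if [ -n "${UNSTRUCTURED_API_URL:-}" ]; then
  set -- "$@" --partition-endpoint "$UNSTRUCTURED_API_URL"
fi

if [ "${RUN:-0}" = 1 ]; then
  # Refuse to run with placeholder credentials.
  : "${GITLAB_TOKEN:?set GITLAB_TOKEN}" "${UNSTRUCTURED_API_KEY:?set UNSTRUCTURED_API_KEY}"
  "$@"
else
  # Dry run: print one argument per line, masking secret values.
  mask_next=0
  for arg in "$@"; do
    if [ "$mask_next" = 1 ]; then
      printf '%s\n' '***'
      mask_next=0
    else
      printf '%s\n' "$arg"
      case $arg in
        --git-access-token|--api-key) mask_next=1 ;;
      esac
    fi
  done
fi
```

Building the argument list with set -- keeps values that contain spaces or glob characters intact, and the optional flags are appended only when their variables are set.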