
Connect GitLab to your preprocessing pipeline, and use the Unstructured CLI or Python to batch process all your documents and store structured outputs locally on your filesystem.

You will need:

The GitLab prerequisites:

  • A GitLab account.
  • The URL for the target GitLab repository.
  • A GitLab access token that allows access to the repository.
  • Optionally, the name of a specific branch to access in the repository. The repository’s default branch is used if no branch is specified.
  • Optionally, a list of file globs to limit which types of files are accepted, for example *.html or *.txt.
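To illustrate how file globs such as *.html or *.txt narrow a set of repository paths, here is a minimal sketch that uses Python's standard fnmatch module. The accept_file helper and the sample paths are hypothetical and serve only as illustration. The connector's own matching rules may differ, so treat this as an approximation of glob semantics rather than the connector's implementation.

```python
from fnmatch import fnmatchcase


def accept_file(path, file_globs=None):
    """Return True if `path` matches any glob, or if no globs are given.

    Hypothetical helper for illustration only; not part of the connector.
    """
    if not file_globs:
        # No globs supplied: every file is accepted.
        return True
    # fnmatchcase compares case-sensitively on every platform,
    # and its `*` wildcard also matches path separators.
    return any(fnmatchcase(path, pattern) for pattern in file_globs)


# Sample paths, made up for this example.
paths = ["docs/index.html", "notes/todo.txt", "README.md"]
accepted = [p for p in paths if accept_file(p, ["*.html", "*.txt"])]
print(accepted)
```

With the globs *.html and *.txt, only the first two sample paths are accepted. With no globs at all, every path is accepted.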

The GitLab connector dependencies:

CLI, Python
pip install "unstructured-ingest[gitlab]"

You might also need to install additional dependencies, depending on your needs. Learn more.

The following environment variables:

  • GITLAB_REPO_URL - The URL for the target GitLab repository, represented by --url (CLI) or url (Python).
  • GITLAB_TOKEN - The GitLab access token that allows access to the repository, represented by --git-access-token (CLI) or access_token (Python).

The Unstructured API environment variables:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.
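Before running the pipeline, it can help to confirm that all four environment variables listed above are set. The following is a minimal sketch that uses only the Python standard library. The missing_env_vars helper is a hypothetical name and is not part of the Unstructured SDK.

```python
import os

# Variable names taken from the lists above.
REQUIRED_VARS = (
    "GITLAB_REPO_URL",
    "GITLAB_TOKEN",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this check first surfaces a missing token or API key as a clear message instead of an authentication error partway through a batch run.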

Now call the Unstructured CLI or Python SDK. You can use any supported destination connector. This example uses the local destination connector: