
Connect OneDrive to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

You will need:

The OneDrive prerequisites:

  • A OneDrive account.

  • Path to the target OneDrive folder.

  • The client ID, client secret, and tenant ID for the Azure app that is registered with Microsoft Graph and assigned the correct OneDrive authentication scopes in Microsoft Entra ID (formerly Azure Active Directory (Azure AD)). See "Registering your app for Microsoft Graph" and "OneDrive authentication and sign-in".

  • The Entra ID principal name (typically your Entra ID email).

See also the OneDrive API documentation.

The OneDrive connector dependencies:

CLI, Python
pip install "unstructured-ingest[onedrive]"

You might also need to install additional dependencies, depending on your needs. Learn more.

The following environment variables:

  • ONEDRIVE_PATH - The path to the target OneDrive folder, represented by --path (CLI) or path (Python).
  • ONEDRIVE_CLIENT_ID - The client ID for the Azure app that is registered with Microsoft Graph and assigned the correct OneDrive authentication scopes in Microsoft Entra ID (formerly Azure Active Directory (Azure AD)), represented by --client-id (CLI) or client_id (Python).
  • ONEDRIVE_CLIENT_CRED - The client secret for the Azure app, represented by --client-cred (CLI) or client_cred (Python).
  • ONEDRIVE_TENANT - The tenant ID for the Azure app, represented by --tenant (CLI) or tenant (Python).
  • ONEDRIVE_USER_PNAME - The Entra ID principal name (typically your Entra ID email), represented by --user-pname (CLI) or user_pname (Python).
  • ONEDRIVE_AUTHORITY_URL - The authentication token provider for Microsoft apps (typically https://login.microsoftonline.com, the default if not otherwise specified), represented by --authority-url (CLI) or authority_url (Python).

The following Unstructured API environment variables:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.
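
Before running the connector, it can help to confirm that the environment variables listed above are set. The following is a minimal sketch that uses only the Python standard library. It is not part of the Unstructured Ingest library, and the helper name load_settings is hypothetical. It assumes that every variable except ONEDRIVE_AUTHORITY_URL is required, and it maps each OneDrive variable to the Python parameter name given in the list above.

```python
import os

# Environment variable -> Python parameter name, as given in the list above.
ONEDRIVE_ENV_TO_PARAM = {
    "ONEDRIVE_PATH": "path",
    "ONEDRIVE_CLIENT_ID": "client_id",
    "ONEDRIVE_CLIENT_CRED": "client_cred",
    "ONEDRIVE_TENANT": "tenant",
    "ONEDRIVE_USER_PNAME": "user_pname",
    "ONEDRIVE_AUTHORITY_URL": "authority_url",
}

# Only the authority URL has a documented default.
# Assumption: every other variable is treated as required.
DEFAULTS = {"ONEDRIVE_AUTHORITY_URL": "https://login.microsoftonline.com"}

UNSTRUCTURED_ENV = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")


def load_settings(environ=None):
    """Hypothetical helper: return (settings, missing).

    settings maps Python parameter names to OneDrive values;
    missing lists every required variable that is unset or empty.
    """
    env = os.environ if environ is None else environ
    settings, missing = {}, []
    for name, param in ONEDRIVE_ENV_TO_PARAM.items():
        # Fall back to the documented default when one exists.
        value = env.get(name) or DEFAULTS.get(name)
        if value:
            settings[param] = value
        else:
            missing.append(name)
    for name in UNSTRUCTURED_ENV:
        if not env.get(name):
            missing.append(name)
    return settings, missing
```

Calling load_settings() with no arguments reads from os.environ. If the returned missing list is not empty, set those variables before continuing.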

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector: