# SharePoint

Connect SharePoint to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

<Note>
  If you are setting up the SharePoint connector for the first time, you can skip past this note.

  Previous versions of the SharePoint connector relied on SharePoint app principals for authentication. Current versions of the
  SharePoint connector no longer support these SharePoint app principals. Microsoft deprecated support for SharePoint app principals on November 27, 2023.
  SharePoint app principals will no longer work for SharePoint tenants that were created on or after November 1, 2024, and they will stop working
  for all SharePoint tenants as of April 2, 2026. [Learn more](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/retirement-announcement-for-azure-acs).

  Current versions of the SharePoint connector now rely on Microsoft Entra ID app registrations for authentication.

  To migrate from SharePoint app principals to Entra ID app registrations, replace the following settings in your existing SharePoint connector,
  as listed in the requirements following this note:

  * Replace the deprecated SharePoint app principal's application client ID value with your replacement Entra ID app registration's **Application (client) ID** value.
  * Replace the deprecated SharePoint app principal's client secret value with your replacement Entra ID app registration's **Client secret** value.
  * Add your replacement Entra ID app registration's **Directory (tenant) ID** value, token authority URL value, and the correct set of Microsoft Graph access permissions for SharePoint Online.

  If you need migration help, get assistance from our [Slack community](https://short.unstructured.io/pzw05l7), or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
</Note>

* A SharePoint Online plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes SharePoint Online.
  [Learn more](https://www.microsoft.com/en-us/microsoft-365/SharePoint/compare-SharePoint-plans).
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).

* A OneDrive for Business plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes OneDrive.
  (Even if you only plan to use SharePoint Online, you still need a plan that includes OneDrive, because the SharePoint connector is built on OneDrive technology.)
  [Learn more](https://www.microsoft.com/microsoft-365/onedrive/compare-onedrive-plans).
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).
  OneDrive personal accounts, and Microsoft 365 Free, Basic, Personal, and Family plans are not supported.

* The SharePoint Online and OneDrive plans must share the same Microsoft Entra ID tenant.
  [Learn more](https://learn.microsoft.com/microsoft-365/enterprise/subscriptions-licenses-accounts-and-tenants-for-microsoft-cloud-offerings?view=o365-worldwide).

* The SharePoint Online site base URL. This URL must meet the following requirements:

  1. Starts with `https://`.
  2. Followed by any number of uppercase or lowercase letters, digits, or hyphens (`-`). This portion of the URL must contain at least
     one dot segment after the first domain label; for example, `.com`, `.sharepoint.com`, or `.co.uk`.
  3. Followed optionally by `/sites/` or `/teams/` and ends with any name that can include any number of
     uppercase or lowercase letters, digits, hyphens (`-`), or underscores (`_`); for example, `/sites/my-site` or `/teams/my_team`.

  Some examples of valid site URLs include:

  * `https://contoso.com`
  * `https://my-company.sharepoint.com`
  * `https://example.co.uk`
  * `https://contoso.sharepoint.com/sites/engineering`
  * `https://teams.example.com/teams/marketing`

  Additionally, note the following about collection-level URLs and all site URLs:

  * Site collection-level URLs typically follow the pattern `https://<tenant>.sharepoint.com/sites/<site-collection-name>`.
  * Team collection-level URLs typically follow the pattern `https://<tenant>.sharepoint.com/teams/<team-collection-name>`.
  * Root site collection-level URLs typically follow the pattern `https://<tenant>.sharepoint.com`.
  * To process all sites within a SharePoint tenant, use a site URL that follows the pattern `https://<tenant>-admin.sharepoint.com`.

  [Learn more](https://learn.microsoft.com/microsoft-365/community/query-string-url-tricks-sharepoint-m365).

* The display name of the SharePoint Online library to use. The default is `Documents`.

* The path to the SharePoint Online library to use. By default, the root of the target library is used.
  To start from a path other than the root, enter the path that you want to use, beginning from the root. For example, to use
  the **my-folder > my-subfolder** path in the target library, you would specify `my-folder/my-subfolder`.

  The following video shows how to get the site URL and a path within the site:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/E3fRwJU-KTc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* Two types of authentication are supported: client credentials and a username and password. Both authentication types require a
  Microsoft Entra ID app registration. You will need to provide
  the **Application (client) ID**, **Directory (tenant) ID**, and **Client secret** for the Entra ID app registration, and the
  app registration must have the correct set of Microsoft Graph access permissions. These permissions include:

  * `Sites.ReadWrite.All` (if both reading and writing are needed)
  * `User.Read.All`
    [Learn more](https://learn.microsoft.com/answers/questions/2116616/service-principal-access-to-sharepoint-online).

  1. [Create an Entra ID app registration](https://learn.microsoft.com/entra/identity-platform/quickstart-register-app?pivots=portal).
  2. [Add Graph access permissions to an app registration](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#add-permissions-to-an-application).
  3. [Grant consent for the added Graph permissions](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#grant-consent-for-the-added-permissions-for-the-enterprise-application).

  The following video shows how to create an Entra ID app registration:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/aBAY-LKLPSo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  The following video shows how to add the correct set of Graph access permissions to the Entra ID app registration:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/X7fnRYyxy0Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The token authority URL for your Microsoft Entra ID app registration. This is typically `https://login.microsoftonline.com`.

* For username and password authentication, you must also provide the User Principal Name (UPN) and its password for the OneDrive account in the Microsoft Entra ID tenant. This UPN is typically the OneDrive account user's email address. To find a UPN:

  1. Depending on your plan, sign in to your Microsoft 365 admin center (typically [https://admin.microsoft.com](https://admin.microsoft.com)) using your administrator credentials,
     or sign in to your Office 365 portal (typically [https://portal.office.com](https://portal.office.com)) using your credentials.
  2. In the **Users** section, click **Active users**.
  3. Locate the user account in the list of active users.
  4. The UPN is displayed in the **Username** column.

  The following video shows how to get a UPN:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/H0yYfhfyCE0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />
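
The site URL rules listed in the requirements can be checked before you run the connector. The following Python sketch is one reading of those three rules as a regular expression. It is not the connector's own validation logic, and the names `SITE_URL_PATTERN` and `is_valid_site_url` are hypothetical.

```python
import re

# One possible reading of the three site URL rules in this article.
# This is an illustrative check, not the connector's own validator.
SITE_URL_PATTERN = re.compile(
    r"https://"                               # Rule 1: starts with https://
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"      # Rule 2: domain label plus at least one dot segment
    r"(?:/(?:sites|teams)/[A-Za-z0-9_-]+)?"   # Rule 3: optional /sites/<name> or /teams/<name>
)


def is_valid_site_url(url: str) -> bool:
    """Return True if the whole string matches the pattern above."""
    return SITE_URL_PATTERN.fullmatch(url) is not None
```

For example, `is_valid_site_url("https://contoso.sharepoint.com/sites/engineering")` returns `True`, while a URL that starts with `http://` or that has no dot segment after the first domain label returns `False`.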

The SharePoint connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[sharepoint]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `SHAREPOINT_SITE_URL` - The SharePoint site URL, represented by `--site` (CLI) or `site` (Python).
* `SHAREPOINT_LIBRARY_NAME` - The display name of the SharePoint library to use, represented by `--library` (CLI) or `library` (Python). The default is `Documents`.
* `SHAREPOINT_SITE_PATH` - The path to use within the library, represented by `--path` (CLI) or `path` (Python). The default is the root of the target library. To use a different path, specify the correct path format as described previously in this article.
* `ENTRA_ID_APP_CLIENT_ID` - The **Application (client) ID** value for the Microsoft Entra ID app registration, represented by `--client-id` (CLI) or `client_id` (Python).
* `ENTRA_ID_APP_TENANT_ID` - The **Directory (tenant) ID** value for the Entra ID app registration, represented by `--tenant` (CLI) or `tenant` (Python).
* `ENTRA_ID_TOKEN_AUTHORITY_URL` - The token authority URL for the Entra ID app registration, represented by `--authority-url` (CLI) or `authority_url` (Python). The default is `https://login.microsoftonline.com`.
* `ENTRA_ID_APP_CLIENT_SECRET` - The **Client secret** value for the Entra ID app registration, represented by `--client-cred` (CLI) or `client_cred` (Python).
* `ENTRA_ID_USER_PRINCIPAL_NAME` - For username and password authentication, the User Principal Name (UPN) for the target OneDrive account in the Microsoft Entra ID tenant, represented by `--user-pname` (CLI) or `user_pname` (Python).
* `ENTRA_ID_USER_PASSWORD` - For username and password authentication, the password for the target UPN, represented by `--password` (CLI) or `password` (Python).
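
Before you call the connector, it can help to confirm that the variables without defaults are actually set. The following is a minimal sketch of such a check, not part of Unstructured Ingest. The helper name `missing_variables` and the `use_user_auth` flag are hypothetical, and the split between required and optional variables is an assumption based on the defaults described in this article.

```python
import os

# Assumed to be required for both authentication types, because this
# article lists no default for them.
REQUIRED_VARS = [
    "SHAREPOINT_SITE_URL",
    "ENTRA_ID_APP_CLIENT_ID",
    "ENTRA_ID_APP_TENANT_ID",
    "ENTRA_ID_APP_CLIENT_SECRET",
]

# Needed only for username and password authentication.
USER_AUTH_VARS = [
    "ENTRA_ID_USER_PRINCIPAL_NAME",
    "ENTRA_ID_USER_PASSWORD",
]


def missing_variables(env, use_user_auth=False):
    """Return the names of expected variables that are unset or empty."""
    names = REQUIRED_VARS + (USER_AUTH_VARS if use_user_auth else [])
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ, use_user_auth=False)
    print("Missing:", ", ".join(missing) if missing else "none")
```

`SHAREPOINT_LIBRARY_NAME`, `SHAREPOINT_SITE_PATH`, and `ENTRA_ID_TOKEN_AUTHORITY_URL` are left out of the check because each has a default.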

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  unstructured-ingest \
    sharepoint \
      --site "$SHAREPOINT_SITE_URL" \
      --library "$SHAREPOINT_LIBRARY_NAME" \
      --path "$SHAREPOINT_SITE_PATH" \
      --recursive \
      --client-id "$ENTRA_ID_APP_CLIENT_ID" \
      --tenant "$ENTRA_ID_APP_TENANT_ID" \
      --authority-url "$ENTRA_ID_TOKEN_AUTHORITY_URL" \
      --client-cred "$ENTRA_ID_APP_CLIENT_SECRET" \
      --user-pname "$ENTRA_ID_USER_PRINCIPAL_NAME" \
      --password "$ENTRA_ID_USER_PASSWORD" \
      --download-dir "$LOCAL_FILE_DOWNLOAD_DIR" \
      --partition-by-api \
      --api-key "$UNSTRUCTURED_API_KEY" \
      --partition-endpoint "$UNSTRUCTURED_API_URL" \
      --strategy hi_res \
      --output-dir "$LOCAL_FILE_OUTPUT_DIR" \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.sharepoint import (
      SharepointIndexerConfig,
      SharepointDownloaderConfig,
      SharepointConnectionConfig,
      SharepointAccessConfig
  )

  from unstructured_ingest.processes.connectors.local import (
      LocalUploaderConfig
  )

  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=SharepointIndexerConfig(
              path=os.getenv("SHAREPOINT_SITE_PATH"),
              recursive=True  # True to recursively download files in their respective folders.
          ),
          downloader_config=SharepointDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
          source_connection_config=SharepointConnectionConfig(
              access_config=SharepointAccessConfig(
                  client_cred=os.getenv("ENTRA_ID_APP_CLIENT_SECRET"),
                  password=os.getenv("ENTRA_ID_USER_PASSWORD"), # For username and password authentication.
              ),
              site=os.getenv("SHAREPOINT_SITE_URL"),
              library=os.getenv("SHAREPOINT_LIBRARY_NAME"),
              client_id=os.getenv("ENTRA_ID_APP_CLIENT_ID"),
              tenant=os.getenv("ENTRA_ID_APP_TENANT_ID"),
              authority_url=os.getenv("ENTRA_ID_TOKEN_AUTHORITY_URL"),
              user_pname=os.getenv("ENTRA_ID_USER_PRINCIPAL_NAME")  # For username and password authentication.
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "reprocess": True,
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Sending files to Unstructured also requires an Unstructured API key and API URL. Add the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>
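
The choice between local processing and processing by API can be sketched as a small helper that builds the keyword arguments for `PartitionerConfig`. This is only an illustration under the assumptions stated in this article. The helper name `partitioner_kwargs` is hypothetical, and it uses only the `partition_by_api`, `api_key`, and `partition_endpoint` parameters shown in the Python example on this page.

```python
import os


def partitioner_kwargs(partition_by_api, env=os.environ):
    """Build keyword arguments for PartitionerConfig (illustrative only)."""
    if not partition_by_api:
        # Local processing: no API key or API URL is needed.
        return {"partition_by_api": False}

    # Processing by API: an API key is required.
    kwargs = {
        "partition_by_api": True,
        "api_key": env["UNSTRUCTURED_API_KEY"],
    }

    # Pass the API URL only if one is set; otherwise the default applies.
    api_url = env.get("UNSTRUCTURED_API_URL")
    if api_url:
        kwargs["partition_endpoint"] = api_url
    return kwargs
```

You could then pass the result to the pipeline as `partitioner_config=PartitionerConfig(**partitioner_kwargs(True))`, or use `partitioner_kwargs(False)` to process files locally.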
