SharePoint
Connect SharePoint to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.
The requirements are as follows.
- The SharePoint site URL.
  - Site collection-level URLs typically have the format https://<tenant>.sharepoint.com/sites/<site-collection-name>.
  - Root site collection-level URLs typically have the format https://<tenant>.sharepoint.com.
  - To process all sites within a tenant, use a site URL of https://<tenant>-admin.sharepoint.com.
- The path in the SharePoint site from which to start parsing files, for example "Shared Documents". If the connector is to process all sites within the tenant, this filter will be applied to all site document libraries.
- A SharePoint app principal with its application (client) ID, client secret, and the appropriate access permissions.
Complete the steps in the following sections, depending on whether you want to access sites at the site collection level, the root site collection level, or all sites within a tenant.
Two of the main factors in the following sections are the scope of access and the level of administrative permissions required to create the app principal. Tenant-wide app principals offer the broadest access but require the highest level of administrative rights, while site collection app principals are more restricted but can be created by users with lower-level permissions.
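The three site URL formats listed earlier determine which kind of app principal you need. As a minimal sketch, the following Python classifies a site URL by scope before you choose a section below. The function name classify_site_url and the returned labels are hypothetical and are not part of Unstructured Ingest; the check only mirrors the URL formats described on this page.

```python
from urllib.parse import urlparse

SHAREPOINT_SUFFIX = ".sharepoint.com"


def classify_site_url(url: str) -> str:
    """Return the access scope implied by a SharePoint site URL.

    Hypothetical helper: it only encodes the three URL formats
    described on this page and nothing more.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host.endswith(SHAREPOINT_SUFFIX):
        raise ValueError(f"not a SharePoint site URL: {url!r}")

    # Everything before ".sharepoint.com", for example "contoso" or "contoso-admin".
    tenant_label = host[: -len(SHAREPOINT_SUFFIX)]
    path = parsed.path.strip("/")

    if tenant_label.endswith("-admin"):
        return "tenant"
    if not path:
        return "root site collection"
    if path.lower().startswith("sites/") and len(path) > len("sites/"):
        return "site collection"
    raise ValueError(f"unrecognized SharePoint site URL format: {url!r}")
```

For example, classify_site_url("https://contoso.sharepoint.com/sites/marketing") returns "site collection", where contoso and marketing are placeholder names.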
Tenant-wide SharePoint app principals
Create a tenant-wide SharePoint app principal when you want the power and flexibility of a principal that can process all sites within a tenant.
SharePoint app principals that are created in the SharePoint admin center have tenant-wide scope and can potentially access all sites within the tenant. Only global or SharePoint administrators typically have access to the following URLs.
- To create a tenant-wide SharePoint app principal and then get its client ID and client secret, go to the following URL: https://<tenant>-admin.sharepoint.com/_layouts/15/appregnew.aspx
- To add access permissions to a tenant-wide SharePoint app principal, go to the following URL: https://<tenant>-admin.sharepoint.com/_layouts/15/appinv.aspx
- Apply the following permissions XML to the tenant-wide SharePoint app principal:
  Available Right settings include Read, Write, Manage, and FullControl. To learn more, see Add-in permissions in SharePoint.
Learn how to complete these preceding steps. Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.
Root site collection-level SharePoint app principals
Create a root site collection-level SharePoint app principal when you want a principal that can only access a root site collection, for example with a URL that has the format https://<tenant>.sharepoint.com.
SharePoint app principals that are created at the root site collection level have a scope limited to the root site collection. Site collection administrators can usually access the following URLs.
- To create a root site collection-level SharePoint app principal and then get its client ID and client secret, go to the following URL: https://<tenant>.sharepoint.com/_layouts/15/appregnew.aspx
- To add access permissions to a root site collection-level SharePoint app principal, go to the following URL: https://<tenant>.sharepoint.com/_layouts/15/appinv.aspx
- Apply the following permissions XML to the root site collection-level SharePoint app principal:
  Available Right settings include Read, Write, Manage, and FullControl. To learn more, see Add-in permissions in SharePoint.
Learn how to complete these preceding steps. Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.
Site collection-level SharePoint app principals
Create a site collection-level SharePoint app principal when you want a principal that can only access a specific site collection, for example with a URL that has or starts with the format https://<tenant>.sharepoint.com/sites/<site-collection-name>.
SharePoint app principals that are created at the site collection level have the most limited scope, restricted to that specific site collection and its subsites. Site owners or those with appropriate permissions on that site collection can access the following URLs.
- To create a site collection-level SharePoint app principal, go to the following URL: https://<tenant>.sharepoint.com/sites/<site-collection-name>/_layouts/15/appregnew.aspx
- To add access permissions to a site collection-level SharePoint app principal, go to the following URL: https://<tenant>.sharepoint.com/sites/<site-collection-name>/_layouts/15/appinv.aspx
- Apply the following permissions XML to the site collection-level SharePoint app principal:
  Available Right settings include Read, Write, Manage, and FullControl. To learn more, see Add-in permissions in SharePoint.
Learn how to complete these preceding steps. Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.
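The permissions XML that the three sections above refer to is not reproduced on this page. As a rough sketch only, and assuming it follows Microsoft's add-in permission request format (an AppPermissionRequests element that wraps one AppPermissionRequest with Scope and Right attributes), the following Python builds such a document. The helper name build_permissions_xml, the scope keys, and the default right are illustrative assumptions; which scope URI and right fit your app principal should be confirmed against the Add-in permissions in SharePoint article.

```python
import xml.etree.ElementTree as ET

# Scope URIs defined by the SharePoint add-in permission model.
# ASSUMPTION: which of these a given section of this page intends is not
# stated here, so the keys below are labels chosen for illustration.
SCOPE_URIS = {
    "tenant": "http://sharepoint/content/tenant",
    "site_collection": "http://sharepoint/content/sitecollection",
    "web": "http://sharepoint/content/sitecollection/web",
}

# The Right settings named on this page.
VALID_RIGHTS = ("Read", "Write", "Manage", "FullControl")


def build_permissions_xml(scope: str, right: str = "Read") -> str:
    """Build an app-only permission request XML string for one scope."""
    if scope not in SCOPE_URIS:
        raise ValueError(f"unknown scope {scope!r}; expected one of {sorted(SCOPE_URIS)}")
    if right not in VALID_RIGHTS:
        raise ValueError(f"unknown right {right!r}; expected one of {VALID_RIGHTS}")

    root = ET.Element("AppPermissionRequests", {"AllowAppOnlyPolicy": "true"})
    ET.SubElement(
        root,
        "AppPermissionRequest",
        {"Scope": SCOPE_URIS[scope], "Right": right},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print a request for read-only access across the tenant.
    print(build_permissions_xml("tenant", "Read"))
```

Defaulting to Read keeps the sketch on the least-privileged of the listed rights; pass a broader right only if your pipeline needs it.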
The SharePoint connector dependencies:
You might also need to install additional dependencies, depending on your needs. Learn more.
The following environment variables:
- SHAREPOINT_APP_CLIENT_ID - The application (client) ID for the SharePoint app principal, represented by --client-id (CLI) or client_id (Python).
- SHAREPOINT_APP_CLIENT_SECRET - The client secret for the SharePoint app principal, represented by --client-cred (CLI) or client_cred (Python).
- SHAREPOINT_SITE - The SharePoint site URL, represented by --site (CLI) or site (Python).
- SHAREPOINT_PATH - The path in the SharePoint site from which to start parsing files, represented by --path (CLI) or path (Python).
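The mapping between these environment variables and the connector options can be sketched in Python as follows. This is a minimal illustration, not part of the Unstructured Ingest library: the helper names load_sharepoint_settings and to_cli_flags are hypothetical, and treating all four variables as required is an assumption made for the example.

```python
import os
from typing import Dict, List, Mapping, Optional

# Environment variable -> Python parameter name, as listed on this page.
ENV_TO_PARAM = {
    "SHAREPOINT_APP_CLIENT_ID": "client_id",
    "SHAREPOINT_APP_CLIENT_SECRET": "client_cred",
    "SHAREPOINT_SITE": "site",
    "SHAREPOINT_PATH": "path",
}


def load_sharepoint_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the SharePoint settings and fail early if any are unset.

    ASSUMPTION: all four variables are treated as required here.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_PARAM if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {param: env[name] for name, param in ENV_TO_PARAM.items()}


def to_cli_flags(settings: Mapping[str, str]) -> List[str]:
    """Turn Python parameter names into the matching CLI flags and values."""
    flags: List[str] = []
    for param, value in settings.items():
        # client_cred -> --client-cred, site -> --site, and so on.
        flags += ["--" + param.replace("_", "-"), value]
    return flags
```

The dictionary returned by load_sharepoint_settings uses the Python parameter names from the list above, and to_cli_flags produces the corresponding CLI flags, so the same values can feed either interface.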
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.
This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.
For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the --partition-by-api option (CLI) or partition_by_api (Python) parameter to specify where files are processed:
- To do local file processing, omit --partition-by-api (CLI) or partition_by_api (Python), or explicitly specify partition_by_api=False (Python).
  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:
  - --api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
  - --partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
  - The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL
- To send files to Unstructured API services for processing, specify --partition-by-api (CLI) or partition_by_api=True (Python).
  Unstructured API services also require an Unstructured API key and API URL, which you supply by adding the following:
  - --api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
  - --partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
  - The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL, representing your API key and API URL, respectively.
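The choice between the two processing modes can be captured in one small function. The following Python is a sketch under stated assumptions: the helper name partition_options is hypothetical, and it only assembles the partition_by_api, api_key, and partition_endpoint values named above. Passing the result to the partitioning configuration of your pipeline is left out, because that configuration object is not shown on this page.

```python
import os
from typing import Any, Dict, Mapping, Optional


def partition_options(
    by_api: bool, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Any]:
    """Assemble partition settings for local or API-based processing.

    Hypothetical helper: the keys match the Python parameter names on
    this page, but where they are passed is up to your pipeline code.
    """
    env = os.environ if env is None else env

    if not by_api:
        # Local processing needs neither an API key nor an API URL.
        return {"partition_by_api": False}

    required = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    return {
        "partition_by_api": True,
        "api_key": env["UNSTRUCTURED_API_KEY"],
        "partition_endpoint": env["UNSTRUCTURED_API_URL"],
    }
```

Failing fast when the API key or URL is unset avoids starting a batch run that would only fail once the first file is sent for processing.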