First time creating a connector? Read this first.
Ingest your files into Unstructured from Google Drive.

Requirements

You will need:
  • A Google Cloud account.
  • The Google Drive API enabled in the account. Learn how.
  • Within the account, a Google Cloud service account and its related credentials.json key file or its contents in JSON format. Create a service account. Create credentials for a service account. To ensure maximum compatibility across Unstructured service offerings, give the service account key information to Unstructured as a single-line string that contains the contents of the downloaded service account key file (not the key file itself). To print this single-line string without line breaks, suitable for copying, run one of the following commands from your terminal (macOS or Linux) or from PowerShell (Windows). In the command, replace <path-to-downloaded-key-file> with the path to the credentials.json key file that you downloaded by following the preceding instructions.
    • For macOS or Linux:
      tr -d '\n' < <path-to-downloaded-key-file>
      
    • For Windows (PowerShell):
      (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
      
  • A Google Drive shared folder or shared drive.
  • Give the service account access to the shared folder or shared drive. To do this, share the folder or drive with the service account’s email address. Learn how. Learn more.
  • Get the shared folder’s ID or shared drive’s ID. This ID is the part of the URL for your Google Drive shared folder or shared drive that is represented as {folder-id} in the following URL: https://drive.google.com/drive/folders/{folder-id}.
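As an alternative to the shell commands above, the following is a minimal Python sketch that produces the single-line key string and pulls the ID out of a folder URL. The function names key_file_to_single_line and folder_id_from_url are hypothetical helpers, not part of any Unstructured library. Note that this sketch re-serializes the JSON compactly rather than only deleting line breaks, so the result is equivalent JSON but not byte-identical to the output of the commands above.

```python
import json
from urllib.parse import urlparse


def key_file_to_single_line(path: str) -> str:
    """Return the JSON key file's contents as one line, with no line breaks."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # also checks that the file is well-formed JSON
    # Compact separators keep the string short; escaped "\n" sequences inside
    # values (such as the private key) stay escaped, so the output is one line.
    return json.dumps(data, separators=(",", ":"))


def folder_id_from_url(url: str) -> str:
    """Extract the ID that follows /folders/ in a Google Drive URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" not in parts or parts.index("folders") + 1 >= len(parts):
        raise ValueError(f"no folder ID found in URL: {url!r}")
    return parts[parts.index("folders") + 1]
```

For example, folder_id_from_url("https://drive.google.com/drive/folders/abc123") returns "abc123"; any query string on the URL is ignored because only the path is inspected.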

Document permissions metadata

The source connector outputs any permissions information that it can find in the source location about the processed source documents and associates that information with each corresponding element that is generated. This permissions information is output into the permissions_data field, which is within the data_source field under the element’s metadata field. This information lists the users or groups, if any, that have permissions to read, update, or delete the element’s associated source document.
Unstructured outputs document permissions metadata that is accurate only at the point in time when Unstructured ingested the corresponding document. Because this metadata is a point-in-time copy of the permissions in the source location, the metadata sent to your destination location is not guaranteed to match the current permissions in the source location.
Whenever Unstructured performs incremental processing of documents for a workflow (in other words, if Reprocess All Files is turned off or set to false for the workflow), that workflow will not output metadata for any document permissions that have been added, changed, or removed since the previous workflow run, unless the corresponding documents’ content has also changed since the previous workflow run. This is because Unstructured performs incremental processing only when documents’ content has changed, not when only the documents’ permissions have changed.
This permissions metadata should not be used for runtime authorization or access control enforcement.
The following example shows what the output looks like. Ellipses indicate content that has been omitted from this example for brevity.
[
    {
        "...": "...",
        "metadata": {
            "...": "...",
            "data_source": {
                "...": "...",
                "permissions_data": [
                    {
                        "read": {
                            "users": [
                                "11111111111111111111"
                            ],
                            "groups": [
                                "22222222222222222222",
                                "33333333333333333333"
                            ]
                        }
                    },
                    {
                        "update": {
                            "users": [
                                "44444444444444444444",
                                "55555555555555555555"
                            ],
                            "groups": [
                                "66666666666666666666"
                            ]
                        }
                    },
                    {
                        "delete": {
                            "users": [
                                "77777777777777777777"
                            ],
                            "groups": [
                                "88888888888888888888"
                            ]
                        }
                    }
                ],
                "...": "..."
            }
        }
    }
]
To look up information about a particular Google Cloud user, use the user’s ID along with the Admin SDK API or the People API for Google Cloud. To look up information about a particular Google Cloud group, use the group’s ID along with the Admin SDK API or the Cloud Identity API for Google Cloud.
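To illustrate how this structure can be read back, here is a minimal Python sketch that walks a list of elements and gathers the user and group IDs for each operation. The helper name collect_permissions is hypothetical, and the sample data is a trimmed copy of the example output above. In keeping with the earlier caution, the result is suitable for inspection or auditing, not for runtime access control.

```python
from collections import defaultdict


def collect_permissions(elements):
    """Map each operation (such as "read") to the user and group IDs listed for it."""
    summary = defaultdict(lambda: {"users": set(), "groups": set()})
    for element in elements:
        data_source = element.get("metadata", {}).get("data_source", {})
        # permissions_data may be absent if no permissions were found.
        for entry in data_source.get("permissions_data") or []:
            for operation, principals in entry.items():
                summary[operation]["users"].update(principals.get("users", []))
                summary[operation]["groups"].update(principals.get("groups", []))
    return dict(summary)


# Trimmed copy of the example output shown above.
elements = [
    {
        "metadata": {
            "data_source": {
                "permissions_data": [
                    {"read": {"users": ["11111111111111111111"],
                              "groups": ["22222222222222222222", "33333333333333333333"]}},
                    {"update": {"users": ["44444444444444444444", "55555555555555555555"],
                                "groups": ["66666666666666666666"]}},
                    {"delete": {"users": ["77777777777777777777"],
                                "groups": ["88888888888888888888"]}},
                ]
            }
        }
    }
]

permissions = collect_permissions(elements)
print(sorted(permissions["read"]["groups"]))
# prints ['22222222222222222222', '33333333333333333333']
```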

Examples

To create a Google Drive source connector, see the following examples. For more information on working with source connectors using the Unstructured API, see Source endpoints.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import CreateSourceConnector

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type="google_drive",
                config={
                    "drive_id": "<drive-id>",
                    "service_account_key": "<service-account-key>",
                    "extensions": [
                        "<extension>",
                        "<extension>"
                    ],
                    "recursive": <True|False>
                }
            )
        )
    )

    print(response.source_connector_information)

Configuration settings

Replace the preceding placeholders as follows:
  • name (string, required): A unique name for this connector.
  • drive_id (string): The ID for the target Google Drive folder or drive.
  • service_account_key (string): The contents of the credentials.json key file as a single-line string.
  • extensions (array of strings): Set one or more file extension values (such as pdf or docx) to process files with only those extensions. If not specified, all extensions are included. Do not include the leading dot in the file extensions. For example, use pdf or docx instead of .pdf or .docx.
  • recursive (boolean, default: false): Set to true to recursively process data from subfolders within the target folder or drive.
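The settings above can be assembled programmatically before they are passed as the config argument in the preceding example. The following is a minimal sketch; build_google_drive_config is a hypothetical helper, not part of the Unstructured SDK. It strips any leading dots from extensions and omits the extensions key entirely when no extensions are given, so that all extensions are included.

```python
def build_google_drive_config(drive_id, service_account_key, extensions=None, recursive=False):
    """Assemble a config dictionary with the keys described above."""
    config = {
        "drive_id": drive_id,
        "service_account_key": service_account_key,
        "recursive": bool(recursive),
    }
    if extensions:
        # The setting expects "pdf", not ".pdf", so drop any leading dots.
        config["extensions"] = [ext.strip().lstrip(".") for ext in extensions]
    return config


config = build_google_drive_config(
    drive_id="<drive-id>",
    service_account_key="<service-account-key>",
    extensions=[".pdf", "docx"],
    recursive=True,
)
print(config["extensions"])
# prints ['pdf', 'docx']
```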