If you’re new to Unstructured, read this note first.

Before you can create a source connector, you must first sign up for Unstructured and get your Unstructured API key. After you sign up, the Unstructured user interface (UI) appears, which you use to get the key. To learn how, watch this 40-second how-to video.

After you create the source connector, add it along with a destination connector to a workflow. Then run the workflow as a job. To learn how, try out the hands-on Workflow Endpoint quickstart, go directly to the quickstart notebook, or watch the two 4-minute video tutorials for the Unstructured Python SDK.

You can also create source connectors with the Unstructured user interface (UI). Learn how.

If you need help, reach out to the community on Slack, or contact us directly.

You are now ready to start creating a source connector! Keep reading to learn how.

Ingest your files into Unstructured from Google Drive.

The requirements are as follows.

  • A Google Cloud account.

  • The Google Drive API enabled in the account. Learn how.

  • Within the account, a Google Cloud service account and its related credentials.json key file or its contents in JSON format. Create a service account. Create credentials for a service account.

    To ensure maximum compatibility across Unstructured service offerings, give the service account key information to Unstructured as a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself). To print this single-line string without line breaks, suitable for copying, run one of the following commands from your terminal (macOS or Linux) or from PowerShell (Windows). In the command, replace <path-to-downloaded-key-file> with the path to the credentials.json key file that you downloaded by following the preceding instructions.

    • For macOS or Linux:

      tr -d '\n' < <path-to-downloaded-key-file>
      
    • For Windows (in PowerShell):

      (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
      
  • A Google Drive folder.

  • Give the service account access to the folder. To do this, share the folder with the service account’s email address. Learn how. Learn more.

  • Get the folder’s ID. This is the part of your Google Drive folder’s URL that is represented as {folder-id} in the following URL: https://drive.google.com/drive/folders/{folder-id}.
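
If you would rather prepare these two values in Python than with the shell commands above, the following is a minimal sketch. The function names key_file_to_single_line and folder_id_from_url are hypothetical helpers, not part of any Unstructured or Google library. The sketch assumes that the key file is valid JSON and that the folder URL follows the https://drive.google.com/drive/folders/{folder-id} pattern, possibly with a trailing query string.

```python
import json
import re
from pathlib import Path


def key_file_to_single_line(path: str) -> str:
    """Return the contents of a service account key file as one line of JSON."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # json.dumps without an indent argument emits no line breaks.
    return json.dumps(data)


def folder_id_from_url(url: str) -> str:
    """Extract the folder ID from a .../drive/folders/<folder-id> URL."""
    match = re.search(r"/folders/([^/?#]+)", url)
    if match is None:
        raise ValueError(f"No folder ID found in URL: {url}")
    return match.group(1)
```

For example, folder_id_from_url("https://drive.google.com/drive/folders/abc123?usp=sharing") returns "abc123". Unlike the shell commands, this approach parses and re-serializes the JSON, so spacing in the output can differ from the original file while the content stays the same.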

To create a Google Drive source connector, see the following examples.

import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import (
    CreateSourceConnector,
    SourceConnectorType,
    GoogleDriveSourceConnectorConfigInput
)

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type=SourceConnectorType.GOOGLE_DRIVE,
                config=GoogleDriveSourceConnectorConfigInput(
                    drive_id="<drive-id>",
                    service_account_key="<service-account-key>",
                    extensions=[
                        "<extension>",
                        "<extension>"
                    ],
                    recursive=<True|False>
                )
            )
        )
    )

    print(response.source_connector_information)

Replace the preceding placeholders as follows:

  • <name> (required) - A unique name for this connector.
  • <drive-id> - The ID for the target Google Drive folder.
  • <service-account-key> - The contents of the credentials.json key file as a single-line string.
  • For extensions, set one or more <extension> values (such as .pdf or .docx) to process files with only those extensions. The default is to include all extensions.
  • Set recursive to True to recursively process data from subfolders within the target folder. The default is False if not otherwise specified.
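
To keep real values out of the script itself, you could collect the settings in one place before you build the connector configuration. The following is a minimal sketch, not an official pattern. The helper build_google_drive_config is hypothetical. It only assembles a plain dictionary whose keys match the keyword arguments shown in the preceding example, and it leaves out extensions when none are given so that the documented default (all extensions) applies.

```python
from typing import Any, Dict, List, Optional


def build_google_drive_config(
    drive_id: str,
    service_account_key: str,
    extensions: Optional[List[str]] = None,
    recursive: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the Google Drive connector config."""
    config: Dict[str, Any] = {
        "drive_id": drive_id,
        "service_account_key": service_account_key,
        "recursive": bool(recursive),
    }
    # Omit the key entirely when no extensions are requested,
    # so that the connector falls back to its default.
    if extensions:
        config["extensions"] = list(extensions)
    return config
```

You could then pass the result to the class from the preceding example, for instance GoogleDriveSourceConnectorConfigInput(**build_google_drive_config(folder_id, key_string, [".pdf"], True)), where folder_id and key_string are values you read from environment variables or a secrets store of your choice.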