Ingest your files into Unstructured from Google Cloud Storage.

The requirements are as follows.

  • A Google Cloud service account. To create one, see Create a service account.

  • A service account key for the service account. See Create a service account key in Create and delete service account keys.

    To ensure maximum compatibility across Unstructured service offerings, give the service account key information to Unstructured as a single-line string that contains the contents of the downloaded service account key file (not the service account key file itself). To print this single-line string without line breaks, suitable for copying, run one of the following commands from your terminal (macOS or Linux) or PowerShell (Windows). In either command, replace <path-to-downloaded-key-file> with the path to the service account key file that you downloaded by following the preceding instructions.

    • For macOS or Linux:
      tr -d '\n' < <path-to-downloaded-key-file>
      
    • For Windows (PowerShell):
      (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
      
  • The URI for a Google Cloud Storage bucket. This URI consists of the target bucket name, plus any target folder within the bucket, expressed as gs://<bucket-name>[/folder-name]. If you do not have a bucket yet, see Create a bucket.

    The target Google Cloud service account must have, at minimum, one of the following roles on this bucket:

    • Storage Object Viewer for bucket read access.
    • Storage Object Creator for bucket write access.
    • Storage Object Admin for bucket read and write access, plus access to additional bucket operations.

    To apply one of these roles to a service account for a bucket, see Add a principal to a bucket-level policy in Set and manage IAM policies on buckets.
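
As an alternative to the shell commands above, the single-line key string can also be produced in Python. The following is a minimal sketch, not an official Unstructured utility; the function name key_file_to_single_line is hypothetical. It assumes the downloaded key file is valid JSON. Unlike the shell commands, it also drops the indentation between JSON tokens, which leaves the key contents unchanged.

```python
import json
from pathlib import Path


def key_file_to_single_line(path: str) -> str:
    """Return the contents of a service account key file as one line.

    Hypothetical helper for illustration; it is not part of any
    Unstructured or Google Cloud library.
    """
    # Parsing first confirms that the file really is JSON.
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Re-serializing without indentation removes the line breaks used for
    # pretty-printing. Newlines inside the private key value stay escaped
    # as \n, so the result is still a single line.
    return json.dumps(data, separators=(",", ":"))
```

Calling key_file_to_single_line("<path-to-downloaded-key-file>") returns a string that can be pasted wherever <service-account-key> is expected.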

To create or change a Google Cloud Storage source connector, see the following examples.

In the examples, replace the placeholders as follows:

  • <name> (required) - A unique name for this connector.
  • <service-account-key> (required) - The contents of a service account key file, expressed as a single string without line breaks, for a Google Cloud service account that has the required access permissions to the bucket.
  • <remote-url> (required) - The URI for the Google Cloud Storage bucket and any target folder path within the bucket. This URI takes the format gs://<bucket-name>[/folder-name].
  • recursive (source connector only) - Set to true to ingest data recursively from any subfolders, starting from the path specified by <remote-url>. The default is false.
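
The placeholder values above can be collected and checked before they are sent anywhere. The following is a minimal sketch under stated assumptions: the function name build_gcs_source_config is hypothetical, and the field names remote_url and service_account_key, the "gcs" type value, and the nesting under "config" are assumptions inferred from the placeholder names, not a confirmed request schema. Check the API reference for the exact shape before relying on it.

```python
def build_gcs_source_config(name, service_account_key, remote_url, recursive=False):
    """Assemble connector settings as a plain dictionary.

    Hypothetical helper for illustration only. The dictionary layout is
    an assumption, not a documented Unstructured request body.
    """
    # The URI must take the form gs://<bucket-name>[/folder-name].
    if not remote_url.startswith("gs://") or len(remote_url) <= len("gs://"):
        raise ValueError("remote_url must look like gs://<bucket-name>[/folder-name]")
    # The key must be a single string without line breaks.
    if "\n" in service_account_key or "\r" in service_account_key:
        raise ValueError("service_account_key must not contain line breaks")
    return {
        "name": name,
        "type": "gcs",  # assumed connector type identifier
        "config": {
            "remote_url": remote_url,                    # assumed field name
            "service_account_key": service_account_key,  # assumed field name
            "recursive": recursive,                      # false unless set
        },
    }
```

For example, build_gcs_source_config("my-gcs-source", key_string, "gs://my-bucket/docs", recursive=True) returns a dictionary with recursive set to true, and a URI that does not start with gs:// raises ValueError.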

To change a connector, replace <connector-id> with the source connector’s unique ID. To get this ID, see List source connectors.