Batch process all your records to store structured outputs in Google Cloud Storage.

You will need:

The Google Cloud Storage prerequisites:

  • A Google Cloud service account. See Create a service account.

  • A service account key for the service account. See Create a service account key in Create and delete service account keys.

    To ensure maximum compatibility across Unstructured service offerings, give the service account key information to Unstructured as a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself). To print this single-line string without line breaks, suitable for copying, run one of the following commands from your Terminal or Command Prompt. In either command, replace <path-to-downloaded-key-file> with the path to the service account key file that you downloaded by following the preceding instructions.

    • For macOS or Linux:
      tr -d '\n' < <path-to-downloaded-key-file>
      
    • For Windows (PowerShell):
      (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
      
  • The URI for a Google Cloud Storage bucket. This URI consists of the target bucket name, plus any target folder within the bucket, expressed as gs://<bucket-name>[/folder-name]. See Create a bucket.

    This bucket must have, at minimum, one of the following roles applied to the target Google Cloud service account:

    • Storage Object Viewer for bucket read access.
    • Storage Object Creator for bucket write access.
    • Storage Object Admin for bucket read and write access, plus access to additional bucket operations.

    To apply one of these roles to a service account for a bucket, see Add a principal to a bucket-level policy in Set and manage IAM policies on buckets.
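
As a cross-platform alternative to the shell commands above, the following is a minimal Python sketch that prepares the same two values: the single-line key string and the bucket URI. The helper names key_file_to_single_line and split_gcs_uri are hypothetical and are not part of any Unstructured library. The sketch assumes the downloaded key file is a standard JSON file.

```python
from __future__ import annotations

import json
from pathlib import Path


def key_file_to_single_line(path: str) -> str:
    """Return the key file's contents with all line breaks removed.

    Hypothetical helper: mirrors the `tr -d '\\n'` and Get-Content commands.
    """
    text = Path(path).read_text(encoding="utf-8")
    single_line = text.replace("\r", "").replace("\n", "")
    # Sanity check (assumes the key file is JSON): raises ValueError if not.
    json.loads(single_line)
    return single_line


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split gs://<bucket-name>[/folder-name] into (bucket, folder).

    Hypothetical helper: the folder is an empty string when none is given.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}, got {uri!r}")
    bucket, _, folder = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"no bucket name found in {uri!r}")
    return bucket, folder.strip("/")
```

For example, print(key_file_to_single_line("<path-to-downloaded-key-file>")) prints the string to copy, and split_gcs_uri raises a ValueError early if the URI does not start with gs://.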

The Google Cloud Storage connector dependencies:

CLI, Python
pip install "unstructured-ingest[gcs]"

You might also need to install additional dependencies, depending on your needs. Learn more.

The following environment variables:

  • GCS_SERVICE_ACCOUNT_KEY - The Google Cloud service account key for Google Cloud Storage, represented by --service-account-key (CLI) or service_account_key (Python).
  • GCS_REMOTE_URL - The Google Cloud Storage bucket URL, represented by --remote-url (CLI) or remote_url (Python).

These environment variables:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.
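
Before running the connector, it can help to confirm that all four environment variables listed above are set. The following is a minimal sketch of such a check. The function name check_environment is hypothetical, and the gs:// prefix test is only a basic format check based on the URI format described earlier; it does not verify that the bucket exists or that the key is valid.

```python
from __future__ import annotations

import os
from typing import Mapping

# The variable names come from the lists above.
REQUIRED_VARS = (
    "GCS_SERVICE_ACCOUNT_KEY",
    "GCS_REMOTE_URL",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
)


def check_environment(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    remote_url = env.get("GCS_REMOTE_URL", "")
    if remote_url and not remote_url.startswith("gs://"):
        problems.append(
            "GCS_REMOTE_URL should look like gs://<bucket-name>[/folder-name]"
        )
    return problems


if __name__ == "__main__":
    # Print each problem on its own line, or a short confirmation.
    found = check_environment()
    print("\n".join(found) if found else "All required variables are set.")
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise without changing the real environment.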

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector: