If you’re new to Unstructured, read this note first.

Before you can create a destination connector, you must first sign up for Unstructured and get your Unstructured API key. After you sign up, the Unstructured user interface (UI) appears, which you use to get the key. To learn how, watch this 40-second how-to video.

After you create the destination connector, add it along with a source connector to a workflow. Then run the workflow as a job. To learn how, try out the hands-on Workflow Endpoint quickstart, go directly to the quickstart notebook, or watch the two 4-minute video tutorials for the Unstructured Python SDK.

You can also create destination connectors with the Unstructured user interface (UI). Learn how.

If you need help, reach out to the community on Slack, or contact us directly.

You are now ready to start creating a destination connector! Keep reading to learn how.

Send processed data from Unstructured to Google Cloud Storage.

The requirements are as follows.

  • A Google Cloud service account. Create a service account.

  • A service account key for the service account. See Create a service account key in Create and delete service account keys.

    To ensure maximum compatibility across Unstructured service offerings, give the service account key information to Unstructured as a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself). To print this single-line string without line breaks, suitable for copying, run one of the following commands from your terminal (macOS or Linux) or PowerShell (Windows). In either command, replace <path-to-downloaded-key-file> with the path to the service account key file that you downloaded by following the preceding instructions.

    • For macOS or Linux:
      tr -d '\n' < <path-to-downloaded-key-file>
      
    • For Windows (PowerShell):
      (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
      
  • The URI for a Google Cloud Storage bucket. This URI consists of the target bucket name, plus any target folder within the bucket, expressed as gs://<bucket-name>[/folder-name]. Create a bucket.

    The target Google Cloud service account must have, at minimum, one of the following roles applied to it for this bucket:

    • Storage Object Viewer for bucket read access.
    • Storage Object Creator for bucket write access.
    • Storage Object Admin for bucket read and write access, plus access to additional bucket operations.

    To apply one of these roles to a service account for a bucket, see Add a principal to a bucket-level policy in Set and manage IAM policies on buckets.
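
If you prefer to prepare the single-line key string from Python rather than from the shell, the following is a minimal sketch of one way to do it. It is an alternative to the commands above, not part of the Unstructured SDK, and the function name key_file_to_single_line is hypothetical. It assumes the downloaded key file is valid JSON, and it produces an equivalent JSON string rather than a byte-for-byte copy of the file with line breaks removed.

```python
import json
from pathlib import Path


def key_file_to_single_line(path: str) -> str:
    """Return the contents of a service account key file as one line.

    Hypothetical helper for illustration; assumes the file is valid JSON.
    """
    # Parse the pretty-printed key file into a dictionary.
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Re-serialize without indentation. json.dumps emits no line breaks by
    # default, and newlines inside values (such as the private key) are
    # written as escaped "\n" sequences, so the result is a single line.
    return json.dumps(data)
```

You could then pass the returned string wherever the service account key is expected, for example through an environment variable instead of pasting it into source code.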

To create a Google Cloud Storage destination connector, see the following examples.

import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateDestinationRequest
from unstructured_client.models.shared import (
    CreateDestinationConnector,
    DestinationConnectorType,
    GCSDestinationConnectorConfigInput
)

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.destinations.create_destination(
        request=CreateDestinationRequest(
            create_destination_connector=CreateDestinationConnector(
                name="<name>",
                type=DestinationConnectorType.GCS,
                config=GCSDestinationConnectorConfigInput(
                    remote_url="<remote-url>",
                    service_account_key="<service-account-key>"
                )
            )
        )
    )

    print(response.destination_connector_information)

Replace the preceding placeholders as follows:

  • <name> (required) - A unique name for this connector.
  • <service-account-key> (required) - The contents of a service account key file, expressed as a single string without line breaks, for a Google Cloud service account that has the required access permissions to the bucket.
  • <remote-url> (required) - The URI for the Google Cloud Storage bucket and any target folder path within the bucket. This URI takes the format gs://<bucket-name>[/folder-name].
  • recursive (source connector only) - Set to true to ingest data recursively from any subfolders, starting from the path specified by <remote-url>. The default is false if not otherwise specified.
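
Because <remote-url> must follow the gs://<bucket-name>[/folder-name] format, it can help to check the value before you create the connector. The following is a minimal sketch of such a check. The function name split_gcs_uri is hypothetical and is not part of the Unstructured SDK, and the check covers only the URI shape described above, not Google Cloud's full bucket naming rules.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket_name, folder_path).

    Hypothetical helper for illustration. The folder path is an empty
    string when the URI points at the bucket root.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected a URI starting with {prefix!r}, got {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the folder.
    bucket, _, folder = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in {uri!r}")
    return bucket, folder.strip("/")
```

For example, split_gcs_uri("gs://my-bucket/processed/output") returns the bucket name and the folder path as separate values, and a value such as "s3://my-bucket" raises a ValueError before any request is sent.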