Ingest your files into Unstructured from Confluence.

The requirements are as follows.

To create a Confluence source connector, see the following examples.

import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import (
    CreateSourceConnector,
    SourceConnectorType,
    ConfluenceSourceConnectorConfigInput
)

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type=SourceConnectorType.CONFLUENCE,
                config=ConfluenceSourceConnectorConfigInput(
                    url="<url>",
                    max_num_of_spaces=<max-num-of-spaces>,
                    max_num_of_docs_from_each_space=<max-num-of-docs-from-each-space>,
                    spaces=["<space-name>", "<space-name>"],
                    extract_images=<True|False>,
                    extract_files=<True|False>,

                    # Specify the settings for exactly one of the following
                    # authentication methods, and remove the other two.

                    # For API token authentication:

                    username="<username>",
                    token="<api-token>",
                    cloud=<True|False>,

                    # For personal access token (PAT) authentication:

                    token="<personal-access-token>",
                    cloud=False,

                    # For password authentication:

                    username="<username>",
                    password="<password>",
                    cloud=<True|False>,
                )
            )
        )
    )

    print(response.source_connector_information)

Replace the preceding placeholders as follows:

  • <name> (required) - A unique name for this connector.
  • <url> (required) - The URL to the target Confluence Cloud instance.
  • <max-num-of-spaces> - The maximum number of Confluence spaces to access within the Confluence Cloud instance. The default is 500 unless otherwise specified.
  • <max-num-of-docs-from-each-space> - The maximum number of documents to access within each space. The default is 150 unless otherwise specified.
  • spaces - An array of strings, with each <space-name> specifying the name of a space to access, for example: ["luke","paul"]. If no space names are specified and the instance contains more spaces than <max-num-of-spaces>, you might get unexpected results.
  • extract_images - Set to true to download images and replace the HTML content with Base64-encoded images. The default is false if not otherwise specified.
  • extract_files - Set to true to download any embedded files in pages. The default is false if not otherwise specified.
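
The defaults described above can be gathered in one place before the connector config is built. The following is a minimal sketch, not part of the Unstructured SDK: general_settings is a hypothetical helper, its keyword names simply mirror the parameters shown in the example, and the default values (500, 150, and false) are the ones listed above.

```python
from typing import Any, Dict, List, Optional


def general_settings(
    url: str,
    spaces: Optional[List[str]] = None,
    max_num_of_spaces: int = 500,
    max_num_of_docs_from_each_space: int = 150,
    extract_images: bool = False,
    extract_files: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: return the non-authentication settings as a dict."""
    if not url:
        # The URL is the only required setting in this group.
        raise ValueError("url is required")
    settings: Dict[str, Any] = {
        "url": url,
        "max_num_of_spaces": max_num_of_spaces,
        "max_num_of_docs_from_each_space": max_num_of_docs_from_each_space,
        "extract_images": extract_images,
        "extract_files": extract_files,
    }
    if spaces:
        # Include the space names only when the caller lists them explicitly.
        settings["spaces"] = list(spaces)
    return settings
```

Listing the space names explicitly, for example general_settings("<url>", spaces=["luke", "paul"]), avoids depending on the space limit when the instance contains many spaces.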

For API token authentication:

  • <username> - The name or email address of the target user.
  • <api-token> - The user’s API token value.
  • cloud - Set to true if you are using Confluence Cloud. The default is false if not otherwise specified.

For personal access token (PAT) authentication:

  • <personal-access-token> - The target user’s PAT value.
  • cloud - Always set to false.

For password authentication:

  • <username> - The name or email address of the target user.
  • <password> - The user’s password.
  • cloud - Set to true if you are using Confluence Cloud. The default is false if not otherwise specified.
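
Because the three authentication methods are mutually exclusive, it can help to build the settings for exactly one of them and check the constraints described above before creating the connector. The following is a minimal sketch under stated assumptions: auth_settings and its method names ("api_token", "pat", "password") are hypothetical and not part of the Unstructured SDK; only the returned keys (username, token, password, cloud) come from the example.

```python
from typing import Any, Dict, Optional


def auth_settings(
    method: str,
    secret: str,
    username: Optional[str] = None,
    cloud: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: return the settings for exactly one authentication method."""
    if method == "api_token":
        if not username:
            raise ValueError("API token authentication requires a username")
        return {"username": username, "token": secret, "cloud": cloud}
    if method == "pat":
        # For PAT authentication, cloud is always false.
        if cloud:
            raise ValueError("PAT authentication requires cloud=False")
        return {"token": secret, "cloud": False}
    if method == "password":
        if not username:
            raise ValueError("Password authentication requires a username")
        return {"username": username, "password": secret, "cloud": cloud}
    raise ValueError(f"Unknown authentication method: {method!r}")
```

Assuming the config class accepts these keyword names as in the example, the result could be unpacked into it, for instance ConfluenceSourceConnectorConfigInput(url="<url>", **auth_settings("api_token", "<api-token>", username="<username>", cloud=True)).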