Ingest your files into Unstructured from Confluence.

Requirements

You will need:

Document permissions metadata

The source connector outputs any permissions information that it can find in the source location about the processed source documents and associates that information with each corresponding element that is generated. This permissions information is output into the permissions_data field, which is within the data_source field under the element’s metadata field. This information lists the users or groups, if any, that have permissions to read, update, or delete the element’s associated source document.
Unstructured outputs document permissions metadata that is accurate only as of the point in time when Unstructured ingested the corresponding document. Because this metadata is a point-in-time copy of the permissions in the source location, the metadata sent to your destination location might not match the current permissions in the source location.
Whenever Unstructured performs incremental processing of documents for a workflow (in other words, if Reprocess All Files is turned off or set to false for the workflow), that workflow does not output updated metadata for any document permissions that have been added, changed, or removed since the previous workflow run, unless the corresponding documents' content has also changed since that run. This is because Unstructured reprocesses documents incrementally only when their content has changed, not when only their permissions have changed.
Do not use this permissions metadata for runtime authorization or access control enforcement.
The following example shows what the output looks like. Ellipses indicate content that has been omitted from this example for brevity.
[
    {
        "...": "...",
        "metadata": {
            "...": "...",
            "data_source": {
                "...": "...",
                "permissions_data": [
                    {
                        "read": {
                            "users": [
                                "11111:11111111-1111-1111-1111-111111111111"
                            ],
                            "groups": [
                                "22222222-2222-2222-2222-22222222",
                                "33333333-3333-3333-3333-33333333"
                            ]
                        }
                    },
                    {
                        "update": {
                            "users": [
                                "44444:44444444-4444-4444-4444-44444444",
                                "55555:55555555-5555-5555-5555-55555555"
                            ],
                            "groups": [
                                "66666666-6666-6666-6666-66666666"
                            ]
                        }
                    },
                    {
                        "delete": {
                            "users": [
                                "77777:77777777-7777-7777-7777-77777777"
                            ],
                            "groups": [
                                "88888888-8888-8888-8888-88888888"
                            ]
                        }
                    }
                ],
                "...": "..."
            }
        }
    }
]
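If you want to work with this structure programmatically, here is a minimal sketch that flattens it. It assumes an element has already been loaded as a Python dict with the shape shown in the output above. The helper name summarize_permissions is hypothetical and is not part of any Unstructured library. The sample IDs are shortened placeholders.

```python
from typing import Any


def summarize_permissions(element: dict[str, Any]) -> dict[str, dict[str, list[str]]]:
    """Flatten metadata.data_source.permissions_data into
    {action: {"users": [...], "groups": [...]}} (hypothetical helper)."""
    permissions = (
        element.get("metadata", {})
        .get("data_source", {})
        .get("permissions_data")
        or []
    )
    summary: dict[str, dict[str, list[str]]] = {}
    # Each list entry is a single-key object such as {"read": {...}}.
    for entry in permissions:
        for action, principals in entry.items():
            bucket = summary.setdefault(action, {"users": [], "groups": []})
            bucket["users"].extend(principals.get("users", []))
            bucket["groups"].extend(principals.get("groups", []))
    return summary


element = {
    "metadata": {
        "data_source": {
            "permissions_data": [
                {"read": {"users": ["11111:aaaa"], "groups": ["g1", "g2"]}},
                {"delete": {"users": ["77777:bbbb"], "groups": []}},
            ]
        }
    }
}
print(summarize_permissions(element)["read"]["groups"])  # ['g1', 'g2']
```

Elements without a permissions_data field produce an empty summary rather than an error.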
To look up information about a particular Confluence user, use the user’s ID (also known as their account ID) from the preceding output to call the GET /wiki/rest/api/user operation in the Confluence REST API. To look up information about a particular Confluence group, use the group’s ID from the preceding output to call the GET /wiki/rest/api/group/by-id operation in the Confluence REST API.
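The following is a minimal sketch of these two lookups using only the Python standard library. It assumes a Confluence Cloud site at a URL such as https://<your-site>.atlassian.net and HTTP basic authentication with an Atlassian account email address and API token. The function names are hypothetical. The code only builds the requests, so you can inspect them before sending.

```python
import base64
import urllib.parse
import urllib.request


def _basic_auth(email: str, api_token: str) -> str:
    # Confluence Cloud accepts basic auth built from "<email>:<api-token>".
    raw = f"{email}:{api_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def _lookup_request(base_url: str, path: str, params: dict[str, str],
                    email: str, api_token: str) -> urllib.request.Request:
    # URL-encode the query so IDs containing ":" are passed safely.
    url = f"{base_url.rstrip('/')}{path}?{urllib.parse.urlencode(params)}"
    headers = {"Authorization": _basic_auth(email, api_token),
               "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def user_lookup_request(base_url: str, account_id: str,
                        email: str, api_token: str) -> urllib.request.Request:
    # GET /wiki/rest/api/user?accountId=<account-id>
    return _lookup_request(base_url, "/wiki/rest/api/user",
                           {"accountId": account_id}, email, api_token)


def group_lookup_request(base_url: str, group_id: str,
                         email: str, api_token: str) -> urllib.request.Request:
    # GET /wiki/rest/api/group/by-id?id=<group-id>
    return _lookup_request(base_url, "/wiki/rest/api/group/by-id",
                           {"id": group_id}, email, api_token)
```

To send a request, pass it to urllib.request.urlopen and decode the JSON response body, for example with json.load.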

Examples

To create a Confluence source connector, see the following examples. For more information on working with source connectors using the Unstructured API, see Source endpoints.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import CreateSourceConnector

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type="confluence",
                config={
                    "url": "<url>",
                    "max_num_of_spaces": <max-num-of-spaces>,
                    "max_num_of_docs_from_each_space": <max-num-of-docs-from-each-space>,
                    "spaces": ["<space-key>", "<space-key>"],
                    "extract_images": <True|False>,
                    "extract_files": <True|False>,

                    # Uncomment exactly one of the following authentication options.
                    # For API token authentication:
                    # "username": "<username>",
                    # "token": "<api-token>",
                    # "cloud": <True|False>,

                    # For personal access token (PAT) authentication:
                    # "token": "<personal-access-token>",
                    # "cloud": False,

                    # For password authentication:
                    # "username": "<username>",
                    # "password": "<password>",
                    # "cloud": <True|False>
                }
            )
        )
    )

    print(response.source_connector_information)

Configuration settings

Replace the preceding placeholders as follows:
name
string
required
A unique name for this connector.
url
string
required
The URL of the target Confluence instance.
max_num_of_spaces
integer
default: 500
The maximum number of Confluence spaces to access within the Confluence instance.
max_num_of_docs_from_each_space
integer
default: 150
The maximum number of documents to access within each space.
spaces
string[]
An array of space keys (not display names) specifying the spaces to access, for example: ["luke","paul"]. If you do not specify any space keys and the instance contains more spaces than max_num_of_spaces, you might get unexpected results.
extract_images
boolean
default: false
Set to true to download images and replace the HTML content with Base64-encoded images.
extract_files
boolean
default: false
Set to true to download any embedded files in pages.
username
string
For username and API token authentication, or username and password authentication: the name or email address of the target user.
token
string
For username and API token authentication: the user’s API token value. For personal access token (PAT) authentication: the target user’s PAT value (set cloud to false).
cloud
boolean
default: false
For username and API token authentication, or username and password authentication: set to true if you are using Confluence Cloud.
password
string
For username and password authentication: the user’s password.
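The following is a minimal sketch of how the settings above combine into the config dict passed to CreateSourceConnector. The defaults mirror the ones listed above. The validation rules are inferred from the descriptions of username, token, cloud, and password: use a token or a password, require username with a password, and require cloud set to false for a PAT. The helper build_confluence_config is hypothetical and is not part of unstructured_client.

```python
from typing import Any, Optional


def build_confluence_config(
    url: str,
    *,
    username: Optional[str] = None,
    token: Optional[str] = None,
    password: Optional[str] = None,
    cloud: bool = False,
    spaces: Optional[list[str]] = None,
    max_num_of_spaces: int = 500,
    max_num_of_docs_from_each_space: int = 150,
    extract_images: bool = False,
    extract_files: bool = False,
) -> dict[str, Any]:
    """Assemble a Confluence source connector config (hypothetical helper)."""
    if token and password:
        raise ValueError("Use either token or password authentication, not both.")
    if not token and not password:
        raise ValueError("Provide a token (API token or PAT) or a password.")
    if password and not username:
        raise ValueError("Password authentication also requires username.")
    if token and not username and cloud:
        # A token without a username is treated as a PAT, which needs cloud=False.
        raise ValueError("Personal access token authentication requires cloud=False.")

    config: dict[str, Any] = {
        "url": url,
        "max_num_of_spaces": max_num_of_spaces,
        "max_num_of_docs_from_each_space": max_num_of_docs_from_each_space,
        "extract_images": extract_images,
        "extract_files": extract_files,
        "cloud": cloud,
    }
    # Only include optional fields that were actually supplied.
    if spaces:
        config["spaces"] = spaces
    if username:
        config["username"] = username
    if token:
        config["token"] = token
    if password:
        config["password"] = password
    return config
```

For example, build_confluence_config("https://<your-site>.atlassian.net", username="<username>", token="<api-token>", cloud=True) returns a dict that you could pass as the config argument in the preceding SDK example.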