> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Confluence

<Note>
  First time creating a connector? [Read this first](/api-reference/workflow/connector-first-time-reqs).
</Note>

Ingest your files into Unstructured from Confluence.

## Requirements

You will need:

* A [Confluence Cloud account](https://www.atlassian.com/software/confluence/pricing) or
  [Confluence Data Center installation](https://confluence.atlassian.com/doc/installing-confluence-data-center-203603.html).

* The site URL for your [Confluence Cloud account](https://community.atlassian.com/t5/Confluence-questions/confluence-cloud-url/qaq-p/1157148) or
  [Confluence Data Center installation](https://confluence.atlassian.com/confkb/how-to-find-your-site-url-to-set-up-the-confluence-data-center-and-server-mobile-app-938025792.html).

* A user in your [Confluence Cloud account](https://confluence.atlassian.com/cloud/invite-edit-and-remove-users-744721624.html) or
  [Confluence Data Center installation](https://confluence.atlassian.com/doc/add-and-invite-users-138313.html).

* The user must have the correct permissions in your
  [Conflunce Cloud account](https://support.atlassian.com/confluence-cloud/docs/what-are-confluence-cloud-permissions-and-restrictions/) or
  [Confluence Data Center installation](https://confluence.atlassian.com/doc/permissions-and-restrictions-139557.html) to
  access the target spaces and pages.

* One of the following:

  * For Confluence Cloud or Confluence Data Center, the target user's name or email address, and password.
    [Change a Confluence Cloud user's password](https://support.atlassian.com/confluence-cloud/docs/change-your-confluence-password/).
    [Change a Confluence Data Center user's password](https://confluence.atlassian.com/doc/change-your-password-139416.html).
  * For Confluence Cloud only, the target user's name or email address, and API token.
    [Create an API token](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).
  * For Confluence Data Center only, the target user's personal access token (PAT).
    [Create a PAT](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html).

* Optionally, the keys (not display names) of the specific [spaces](https://support.atlassian.com/confluence-cloud/docs/navigate-spaces/) in the Confluence instance to access. To get a space's key,
  which is different from a space's display name, open the space in your web browser and look at the URL. The space's key is the part of the URL after `spaces/` but before the next `/` after that.

The following video provides related setup information for Confluence Cloud:

<iframe width="560" height="315" src="https://www.youtube.com/embed/3PsFJkcIotI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

## Document permissions metadata

The source connector outputs any permissions information that it can find in the source location about the processed source documents and associates that information with each
corresponding element that is generated. This permissions information is output into the `permissions_data` field, which is within the
`data_source` field under the element's `metadata` field. This information lists the users or groups, if any, that have
permissions to read, update, or delete the element's associated source document.

<Warning>
  The permissions metadata Unstructured outputs should not be used for runtime authorization
  or access control enforcement.

  Unstructured outputs document permissions metadata that is accurate only
  at the point in time when Unstructured ingested the corresponding document
  to which those permissions applied. Because this metadata is a
  point-in-time copy of the permissions in the source location, these
  metadata outputs that are sent to your destination location are not
  always guaranteed to match the current permissions in the source location.

  Also, be aware that Unstructured updates permission metadata for a document only when the document's content has changed.

  This is because Unstructured performs incremental processing of documents only when documents' *content* has changed—not when only the documents'
  *permissions* have changed. Whenever Unstructured performs incremental processing of documents for a
  workflow (in other words, if **Reprocess All Files** is turned off or set
  to false for a workflow), that worfklow will not output metadata for any
  document permissions that have been added, changed, or removed since the
  previous workflow run, *unless* the corresponding documents' content
  has also been changed since the previous workflow run.
</Warning>

The following example shows what the output looks like. Ellipses indicate content that has been omitted from this example for brevity.

```json theme={null}
[
    {
        "...": "...",
        "metadata": {
            "...": "...",
            "data_source": {
                "...": "...",
                "permissions_data": [
                    {
                        "read": {
                            "users": [
                                "11111:11111111-1111-1111-1111-111111111111"
                            ],
                            "groups": [
                                "22222222-2222-2222-2222-22222222",
                                "33333333-3333-3333-3333-33333333"
                            ]
                        }
                    },
                    {
                        "update": {
                            "users": [
                                "44444:44444444-4444-4444-4444-44444444",
                                "55555:55555555-5555-5555-5555-55555555"
                            ],
                            "groups": [
                                "66666666-6666-6666-6666-66666666",
                            ]
                        }
                    },
                    {
                        "delete": {
                            "users": [
                                "77777:77777777-7777-7777-7777-77777777"
                            ],
                            "groups": [
                                "88888888-8888-8888-8888-88888888"
                            ]
                        }
                    }
                ],
                "...": "..."
            }
        }
    }
]
```

To look up information about a particular Confluence user, use the user's ID (also known as their *account ID*) from the preceding output to call the
[GET /wiki/rest/api/user](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-users/#api-wiki-rest-api-user-get)
operation in the Confluence REST API.

To look up information about a particular Confluence group, use the group's ID from the preceding output to call the
[GET /wiki/rest/api/group/by-id](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-group/#api-wiki-rest-api-group-by-id-get)
operation in the Confluence REST API.

## Examples

To create a Confluence source connector, see the following examples.

For more information on working with source connectors using the Unstructured API, see [Source endpoints](/api-reference/api/source/source-apis).

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="confluence",
                  config={
                      "url": "<url>",
                      "max_num_of_spaces": <max-num-of-spaces>,
                      "max_num_of_docs_from_each_space": <max-num-of-docs-from-each-space>,
                      "spaces": ["<space-key>", "<space-key>"],
                      "extract_images": <True|False>,
                      "extract_files": <True|False>,

                      # For API token authentication:
                      # "username": "<username>",
                      # "token": "<api-token>",
                      # "cloud": <True|False>,

                      # For personal access token (PAT) authentication:
                      # "token": "<personal-access-token>",
                      # "cloud": False,

                      # For password authentication:
                      # "username": "<username>",
                      # "password": "<password>",
                      # "cloud": <True|False>
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "confluence",
      "config": {
          "url": "<url>",
          "max_num_of_spaces": <max-num-of-spaces>,
          "max_num_of_docs_from_each_space": <max-num-of-docs-from-each-space>,
          "spaces": ["<space-key>", "<space-key>"],
          "extract_images": "<true|false>",
          "extract_files": "<true|false>",

          # For API token authentication:

          "username": "<username>",
          "token": "<api-token>",
          "cloud": "<true|false>"

          # For personal access token (PAT) authentication:

          "token": "<personal-access-token>",
          "cloud": "false"

          # For password authentication:

          "username": "<username>",
          "password": "<password>",
          "cloud": "<true|false>"        
      }
  }'
  ```
</CodeGroup>

## Configuration settings

Replace the preceding placeholders as follows:

<ParamField body="name" type="string" required>
  A unique name for this connector.
</ParamField>

<ParamField body="url" type="string" required>
  The URL to the target Confluence Cloud instance.
</ParamField>

<ParamField body="max_num_of_spaces" type="integer" default="500">
  The maximum number of Confluence spaces to access within the Confluence Cloud instance.
</ParamField>

<ParamField body="max_num_of_docs_from_each_space" type="integer" default="150">
  The maximum number of documents to access within each space.
</ParamField>

<ParamField body="spaces" type="string">
  An array of space keys (not display names) specifying the spaces to access, for example: `["luke","paul"]`. If no space keys are specified and `max_num_of_spaces` is exceeded for the instance, you might get unexpected results.
</ParamField>

<ParamField body="extract_images" type="boolean" default="false">
  Set to `true` to download images and replace the HTML content with Base64-encoded images.
</ParamField>

<ParamField body="extract_files" type="boolean" default="false">
  Set to `true` to download any embedded files in pages.
</ParamField>

<ParamField body="username" type="string">
  For username and API token authentication, or username and password authentication: the name or email address of the target user.
</ParamField>

<ParamField body="token" type="string">
  For username and API token authentication: the user's API token value. For personal access token (PAT) authentication: the target user's PAT value (set `cloud` to `false`).
</ParamField>

<ParamField body="cloud" type="boolean" default="false">
  For username and API token authentication, or username and password authentication: set to `true` if you are using Confluence Cloud.
</ParamField>

<ParamField body="password" type="string">
  For username and password authentication: the user's password.
</ParamField>
