
Connect Dropbox to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

  1. A Dropbox account.

  2. A Dropbox app for your Dropbox account. To create a Dropbox app, do the following:

    a) Sign in to the Dropbox Developers portal with the same credentials as your Dropbox account.
    b) Open your App Console.
    c) Click Create app.
    d) For Choose an API, select Scoped access.
    e) For Choose the type of access you need, select App folder.
    f) Enter a name for your app, and then click Create app.
    g) On the app’s Permissions tab, under Files and folders, check the boxes labelled files.content.read or files.content.write or both, depending on whether you want to read files, write files, or both. Then click Submit.
    h) On the app’s Settings tab, note the value of the App folder name field. This is the name of the app folder that Dropbox creates under the Apps top-level folder in your Dropbox account, and that the Dropbox app uses for access. If you change the value of the App folder name field here, Dropbox creates an app folder with the new name under the Apps top-level folder instead.
    i) Under OAuth 2, next to Generated access token, click Generate. Copy the value of this access token. Click Generate only after you have completed all of the preceding steps, because the access token is scoped to the specific app folder and settings at the time the token is generated. If you change the app folder name or any of the permissions later, regenerate the access token.

    Access tokens are valid for only four hours after they are created. After this four-hour period, you can no longer use the expired access token. Dropbox does not allow the creation of access tokens that are valid for more than four hours.

    To replace an expired access token, you must first generate a refresh token for the corresponding access token. To learn how to generate an access token and its corresponding refresh token, see Replace an expired access token, later in this article.

    If you do not already have the corresponding refresh token for an existing access token, or if you lose a refresh token after you generate it, you must generate a new access token and its corresponding refresh token.

    Instead of continually replacing expired access tokens yourself, you can have Unstructured do it for you as needed. To do so, supply Unstructured with the refresh token along with the Dropbox app’s App key and App secret values. To learn how to supply these to Unstructured, look for mentions of “refresh token,” “app key,” and “app secret” in the connector settings later in this article.

  3. The app folder that your Dropbox app will use for access can be found in your Dropbox account under the Apps top-level folder. For example, if the value of the App folder name field above is my-folder, then the app folder that your Dropbox app will use for access can be found under https://dropbox.com/home/Apps/my-folder

    Your Dropbox app will not have access to upload or download files from the root of the app folder. Instead, you must create a subfolder inside of the app folder for your Dropbox app to upload or download files from. You will use the name of that subfolder when specifying your remote URL in the next step. For example, if your Dropbox app uses an app folder named my-folder for access within the Apps top-level folder, and you create a subfolder named data within the my-folder app folder, then the subfolder that your Dropbox app will upload and download files from can be found under https://dropbox.com/home/Apps/my-folder/data

  4. Note the remote URL to your subfolder inside of the app folder, which takes the format dropbox://<subfolder-name>. For example, if your Dropbox app uses an app folder named my-folder for access within the Apps top-level folder, and you create a subfolder named data within the my-folder app folder, then the remote URL is dropbox://data
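
As an illustration of the two URL formats described above, the following Python sketch derives the remote URL and the Dropbox web address from an app folder name and a subfolder name. The function names are hypothetical helpers for this example only; they are not part of Unstructured Ingest or any Dropbox library.

```python
def dropbox_remote_url(subfolder: str) -> str:
    """Return the remote URL in the form dropbox://<subfolder-name>."""
    name = subfolder.strip("/")
    if not name:
        # The app folder root is not usable, so a subfolder is required.
        raise ValueError("a subfolder inside the app folder is required")
    return f"dropbox://{name}"


def dropbox_web_url(app_folder: str, subfolder: str) -> str:
    """Return the web address of the subfolder under the Apps top-level folder."""
    return f"https://dropbox.com/home/Apps/{app_folder.strip('/')}/{subfolder.strip('/')}"


print(dropbox_remote_url("data"))            # dropbox://data
print(dropbox_web_url("my-folder", "data"))  # https://dropbox.com/home/Apps/my-folder/data
```

Note that the app folder name (my-folder here) appears only in the web address, not in the remote URL.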

Replace an expired access token

Dropbox app access tokens are valid for only four hours. After this time, you can no longer use the expired access token.

To have Unstructured automatically replace expired access tokens on your behalf, do the following:

  1. Get the app key and app secret values for your Dropbox app. To do this:

    a) Sign in to the Dropbox Developers portal with the same credentials as your Dropbox account.
    b) Open your App Console.
    c) Click your Dropbox app’s icon.
    d) On the Settings tab, next to App key, copy the value of the app key.
    e) Next to App secret, click Show, and then copy the value of the app secret.

  2. Use your web browser to browse to the following URL, replacing <app-key> with the app key for your Dropbox app:

    https://www.dropbox.com/oauth2/authorize?client_id=<app-key>&response_type=code&token_access_type=offline
    
  3. Click Continue.

  4. Click Allow.

  5. In the Access code generated tile, copy the access code that is shown.

  6. Use the curl utility in your Terminal or Command Prompt, or use a REST API client such as Postman, to make the following REST API call, replacing the following placeholders:

    • Replace <access-code> with the access code that you just copied.
    • Replace <app-key> with the app key for your Dropbox app.
    • Replace <app-secret> with the app secret for your Dropbox app.

    curl https://api.dropbox.com/oauth2/token \
    --data code=<access-code> \
    --data grant_type=authorization_code \
    --user <app-key>:<app-secret>
    
  7. In the response, copy the following two values:

    • The value of access_token (starting with the characters sl) is the new, valid access token.
    • The value of refresh_token is the refresh token, which you can use to replace this access token more quickly and easily next time. If you lose this refresh token, you must go back to Step 2.

    For the Unstructured UI, if you want Unstructured to use this refresh token to automatically replace the expired access token instead of replacing it yourself, then add the following values to your connector settings, and then stop here:

    • Add the refresh_token value to the connector settings Refresh token field.
    • Add the <app-key> value to the connector settings App key field.
    • Add the <app-secret> value to the connector settings App secret field.

    For the Unstructured API and Unstructured Ingest, if you want Unstructured to use this refresh token to automatically replace the expired access token instead of replacing it yourself, then add the following values to your connector settings, and then stop here:

    • Add the refresh_token value to the refresh_token parameter.
    • Add the <app-key> value to the app_key parameter.
    • Add the <app-secret> value to the app_secret parameter.

  8. If you need to replace the expired access token yourself instead of having Unstructured do it for you, use the refresh token that you just copied to get a new access token. Make the following REST API call, replacing the following placeholders:

    • Replace <refresh-token> with the refresh token.
    • Replace <app-key> with the app key for your Dropbox app.
    • Replace <app-secret> with the app secret for your Dropbox app.

    curl https://api.dropbox.com/oauth2/token \
    --data refresh_token=<refresh-token> \
    --data grant_type=refresh_token \
    --data client_id=<app-key> \
    --data client_secret=<app-secret>
    
  9. In the response, copy the following two values:

    • The value of access_token (starting with the characters sl) is the new, valid access token. In the connector, replace the old, expired access token value with this new, valid access token value.

    • The value of refresh_token, if the response includes one, is the new, valid refresh token. The next time you need to replace an expired access token yourself, repeat Step 8.
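
The browser URL in Step 2 and the two curl calls in Steps 6 and 8 can also be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names are hypothetical, and the functions only build the requests so that you can inspect them before sending anything.

```python
import base64
import urllib.parse
import urllib.request

AUTHORIZE_URL = "https://www.dropbox.com/oauth2/authorize"
TOKEN_URL = "https://api.dropbox.com/oauth2/token"


def build_authorize_url(app_key):
    """Step 2: the URL to open in a browser to obtain an access code."""
    query = urllib.parse.urlencode(
        {
            "client_id": app_key,
            "response_type": "code",
            "token_access_type": "offline",
        }
    )
    return f"{AUTHORIZE_URL}?{query}"


def build_code_exchange_request(access_code, app_key, app_secret):
    """Step 6: exchange the access code for an access token and a refresh token."""
    body = urllib.parse.urlencode(
        {"code": access_code, "grant_type": "authorization_code"}
    ).encode()
    request = urllib.request.Request(TOKEN_URL, data=body, method="POST")
    # Equivalent to curl's --user <app-key>:<app-secret> (HTTP Basic authentication).
    credentials = base64.b64encode(f"{app_key}:{app_secret}".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    return request


def build_refresh_request(refresh_token, app_key, app_secret):
    """Step 8: use the refresh token to request a new access token."""
    body = urllib.parse.urlencode(
        {
            "refresh_token": refresh_token,
            "grant_type": "refresh_token",
            "client_id": app_key,
            "client_secret": app_secret,
        }
    ).encode()
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")
```

To send one of these requests, pass it to urllib.request.urlopen and parse the JSON response body, then read access_token (and refresh_token, where present) from the result as described in Steps 7 and 9.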

The Dropbox connector dependencies:

CLI, Python
pip install "unstructured-ingest[dropbox]"

You might also need to install additional dependencies, depending on your needs. Learn more.

The following environment variables:

  • DROPBOX_REMOTE_URL - The remote URL to the target subfolder inside of the app folder for the Dropbox app, represented by --remote-url (CLI) or remote_url (Python).
  • DROPBOX_ACCESS_TOKEN - The value of the access token for the Dropbox app that is associated with the target app folder, represented by --token (CLI) or token (Python). Provide this only if for some reason you do not want Unstructured to automatically refresh expired access tokens.

To have Unstructured automatically refresh expired Dropbox App access tokens on your behalf, do not provide an access token. Instead, provide the following environment variables:

  • DROPBOX_REFRESH_TOKEN - The value of the refresh token for the corresponding access token, represented by --refresh-token (CLI) or refresh_token (Python).
  • DROPBOX_APP_KEY - The app key for the Dropbox app, represented by --app-key (CLI) or app_key (Python).
  • DROPBOX_APP_SECRET - The app secret for the Dropbox app, represented by --app-secret (CLI) or app_secret (Python).
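
The choice between the two sets of environment variables above can be sketched as follows. The helper name resolve_dropbox_auth is hypothetical; the returned keys mirror the Python parameter names listed above (token, refresh_token, app_key, app_secret). The sketch assumes that the refresh-token settings take precedence whenever all three are present.

```python
import os


def resolve_dropbox_auth(env=os.environ):
    """Pick connector credentials from environment variables.

    Prefers the refresh token, app key, and app secret so that expired
    access tokens can be replaced automatically; otherwise falls back
    to a plain access token.
    """
    refresh_names = ("DROPBOX_REFRESH_TOKEN", "DROPBOX_APP_KEY", "DROPBOX_APP_SECRET")
    if all(env.get(name) for name in refresh_names):
        return {
            "refresh_token": env["DROPBOX_REFRESH_TOKEN"],
            "app_key": env["DROPBOX_APP_KEY"],
            "app_secret": env["DROPBOX_APP_SECRET"],
        }
    if env.get("DROPBOX_ACCESS_TOKEN"):
        # Access tokens expire after four hours and are not refreshed here.
        return {"token": env["DROPBOX_ACCESS_TOKEN"]}
    raise ValueError(
        "Set DROPBOX_REFRESH_TOKEN, DROPBOX_APP_KEY, and DROPBOX_APP_SECRET, "
        "or set DROPBOX_ACCESS_TOKEN."
    )
```

Calling resolve_dropbox_auth() with no arguments reads the current process environment; passing a dictionary instead makes the function easy to check in isolation.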

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.

#!/usr/bin/env bash

# Assumes that the DROPBOX_*, LOCAL_FILE_OUTPUT_DIR, and UNSTRUCTURED_* environment variables are already set.
unstructured-ingest \
  dropbox \
    --remote-url "$DROPBOX_REMOTE_URL" \
    --output-dir "$LOCAL_FILE_OUTPUT_DIR" \
    --refresh-token "$DROPBOX_REFRESH_TOKEN" \
    --app-key "$DROPBOX_APP_KEY" \
    --app-secret "$DROPBOX_APP_SECRET" \
    --num-processes 2 \
    --recursive \
    --verbose \
    --partition-by-api \
    --partition-endpoint "$UNSTRUCTURED_API_URL" \
    --api-key "$UNSTRUCTURED_API_KEY" \
    --strategy hi_res \
    --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the --partition-by-api option (CLI) or partition_by_api (Python) parameter to specify where files are processed:

  • To do local file processing, omit --partition-by-api (CLI) or partition_by_api (Python), or explicitly specify partition_by_api=False (Python).

    Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

    • --api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
    • --partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
    • The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL

  • To send files to the Unstructured Partition Endpoint for processing, specify --partition-by-api (CLI) or partition_by_api=True (Python).

    Unstructured also requires an Unstructured API key and API URL. Provide them by adding the following:

    • --api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
    • --partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
    • The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL, representing your API key and API URL, respectively.

    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, for example, if you are using a self-hosted instance of the Unstructured API.

    The default API URL for Unstructured Ingest is https://api.unstructuredapp.io/general/v0/general, which is the API URL for the Unstructured Partition Endpoint.

    If you do not have an API key, get one now.

    If the Unstructured API is self-hosted, the process for generating Unstructured API keys, and the Unstructured API URL that you use, are different. For details, contact Unstructured Sales at sales@unstructured.io.
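
The difference between local processing and processing through the Unstructured Partition Endpoint comes down to which CLI options are passed. The following sketch assembles the argument list for both cases. The helper name build_ingest_command is hypothetical, the flags are the ones described on this page, and the sketch assumes refresh-token authentication.

```python
import os


def build_ingest_command(partition_by_api, env=os.environ):
    """Assemble unstructured-ingest arguments for the Dropbox source connector."""
    command = [
        "unstructured-ingest", "dropbox",
        "--remote-url", env["DROPBOX_REMOTE_URL"],
        "--output-dir", env["LOCAL_FILE_OUTPUT_DIR"],
        "--refresh-token", env["DROPBOX_REFRESH_TOKEN"],
        "--app-key", env["DROPBOX_APP_KEY"],
        "--app-secret", env["DROPBOX_APP_SECRET"],
        "--recursive",
    ]
    if partition_by_api:
        # Remote processing needs the flag and an API key.
        command += ["--partition-by-api", "--api-key", env["UNSTRUCTURED_API_KEY"]]
        # The API URL is needed only when it differs from the default.
        api_url = env.get("UNSTRUCTURED_API_URL")
        if api_url:
            command += ["--partition-endpoint", api_url]
    # For local processing, the flag, API key, and API URL are all omitted.
    return command
```

The resulting list can be passed to subprocess.run(command, check=True) to start the job. Building the command as a list, rather than as a single string, avoids shell quoting problems with tokens and URLs.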