> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unstructured API Quickstart - On-Demand Jobs

<Tip>
  This quickstart requires you to install the Unstructured Python SDK on your local machine. If you cannot or do not want to install
  or run anything on your local machine, use the notebook [Unstructured API On-Demand Jobs Quickstart](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_On_Demand_Jobs_Quickstart.ipynb) instead.
  The notebook is hosted remotely on Google Colab and requires no local machine setup steps.
</Tip>

This quickstart shows how to use the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk)
to have Unstructured process local files by using the Unstructured API's *on-demand jobs* functionality. This functionality
is part of the Unstructured API's collection of [workflow operations](https://docs.unstructured.io/api-reference/workflow/overview).

On-demand jobs take one or more local files as input and return each file's contents as a series of
[document elements and metadata](/ui/document-elements). This format is ideal for retrieval-augmented generation (RAG),
agentic AI, and model fine-tuning.
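
Each element in a job's output is a JSON object. The exact fields depend on the element type and your workflow settings, but a typical element looks roughly like the following (the values shown here are illustrative only):

```json  theme={null}
{
    "type": "NarrativeText",
    "element_id": "5ef1d1117aa0a9d49ef3c7c400d1d8cf",
    "text": "Unstructured turns unstructured files into structured data.",
    "metadata": {
        "filename": "example.pdf",
        "filetype": "application/pdf",
        "languages": ["eng"],
        "page_number": 1
    }
}
```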

<Note>
  The on-demand jobs functionality is designed to work *only by processing local files*.

  To process files (and data) in remote file and blob storage, databases, and vector stores, you must use other
  workflow operations in the Unstructured API. To learn how, see for example the notebook
  [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb).
</Note>

## Requirements

To run this quickstart, you will need:

* An Unstructured account. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
  After you sign up, you are immediately signed in to your new **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

  <Note>
    If you already have an Unstructured **Pay-As-You-Go** or **Business SaaS** account, you are already signed up for Unstructured.
    Sign in to your existing Unstructured **Pay-As-You-Go** or **Business SaaS** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

    If you already have an Unstructured **dedicated instance** or **in-VPC** deployment, your sign-in link will be unique to your deployment.
    If you're not sure what your unique sign-in link is, see your Unstructured account administrator, or email Unstructured Support at
    [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

* An Unstructured API key, as follows:

  1. After you are signed in to your account, on the sidebar click **API Keys**.

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

  2. Click **Generate New Key**.

  3. Enter a meaningful display name for the key, and then click **Continue**.

  4. Next to the new key's name, click the **Copy** icon. The key's value is copied to your system's clipboard.
     If you lose this key, simply return to the list and click **Copy** again.

* Python 3.9 or higher installed on your local machine.

* A Python virtual environment is recommended for isolating and versioning Python project code dependencies on your local machine,
  but this is not required. This quickstart uses the popular Python package and project manager [uv](https://docs.astral.sh/uv/)
  for managing virtual environments. Installation and use of `uv` are described in the following steps.

* One or more local files for Unstructured to process. The following code example assumes that the local files
  you want to process are in a folder named `input`, which is in the same project directory as your Python code. Creation of this
  folder is described in the following steps. The files' types must be
  in the list of [supported file types](/api-reference/supported-file-types).

  <Note>
    Each on-demand job is limited to 10 files, and each file is limited to 10 MB in size.

    If you need to launch a series of on-demand jobs in rapid succession, you must wait at least one second between launch
    requests. Otherwise, you will receive a rate limit error.

    A maximum of 5 on-demand jobs can be running in your Unstructured account. If you launch a new on-demand job
    but 5 existing on-demand jobs are still running, the new on-demand job will remain in a scheduled state until one of the 5
    existing on-demand jobs is done running.
  </Note>

* A destination folder for Unstructured to send its processed results to. The following code example assumes that the
  destination folder is named `output` and is in the same project directory as your Python code. Creation of this
  folder is described in the following steps.
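
The file count and file size limits described above can be checked locally before you launch a job. The following is a minimal sketch that uses only the Python standard library; the `check_input_limits` helper and the limit constants are illustrative, not part of the Unstructured SDK:

```python  theme={null}
import os

MAX_FILES = 10                     # Maximum number of files per on-demand job.
MAX_FILE_BYTES = 10 * 1024 * 1024  # Maximum size per file (10 MB).

def check_input_limits(input_dir: str) -> list[str]:
    """Return the input file paths, raising ValueError if any job limit is exceeded."""
    paths = [
        os.path.join(input_dir, name)
        for name in sorted(os.listdir(input_dir))
        if os.path.isfile(os.path.join(input_dir, name))
    ]

    if len(paths) > MAX_FILES:
        raise ValueError(f"Too many input files: {len(paths)} (the limit is {MAX_FILES}).")

    for path in paths:
        size = os.path.getsize(path)
        if size > MAX_FILE_BYTES:
            raise ValueError(f"File too large: {path} ({size} bytes; the limit is {MAX_FILE_BYTES}).")

    return paths
```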

## Step 1: Create a Python virtual environment

In this step, you use `uv` to create a new Python project, and a virtual environment within this project, for this quickstart.

<Note>
  If you do not want to use `uv` or create a Python project, you can do the following instead:

  1. Install the Unstructured Python SDK globally on your local machine by running the following command:

     ```bash  theme={null}
     pip install unstructured-client
     ```

  2. Skip ahead to [Step 2](#step-2%3A-add-the-example-code).
</Note>

1. [Install uv](https://docs.astral.sh/uv/getting-started/installation) on your local machine, if it is not already installed.

2. Create a new, blank folder on your local machine for this quickstart, and then switch to this new folder.
   This example creates a folder named `unstructured_api_quickstart` within your current working directory
   and then switches to this new folder:

   ```bash  theme={null}
   mkdir unstructured_api_quickstart
   cd unstructured_api_quickstart
   ```

3. Create a new `uv` project for this quickstart by running the following command from within the new folder:

   ```bash  theme={null}
   uv init
   ```

4. Create a new virtual environment within this project by running the following command:

   ```bash  theme={null}
   uv venv
   ```

5. Activate the virtual environment by running the following command:

   ```bash  theme={null}
   source .venv/bin/activate
   ```

6. Install the Unstructured Python SDK into the virtual environment by running the following command:

   ```bash  theme={null}
   uv add unstructured-client
   ```

## Step 2: Add the example code

1. In the same directory as the project's `main.py` code file, add two folders named `input` and `output`.
   Your project directory should now look like this:

   ```text  theme={null}
   unstructured_api_quickstart/
     ├── .venv/
     ├── input/  <- Upload your input files here.
     ├── output/ <- Unstructured will download its output files here.
     ├── .gitignore
     ├── .python-version
     ├── main.py <- Your Python code will go here.
     ├── pyproject.toml
     ├── README.md
     └── uv.lock
   ```

   <Note>
     If you do not want to use `uv` or create a Python project, just create a blank file named `main.py` and two folders
     named `input` and `output` that are all in the same local directory instead, for example:

     ```text  theme={null}
     <parent-directory>/
       ├── input/  <- Upload your input files here.
       ├── output/ <- Unstructured will download its output files here.
       └── main.py <- Your Python code will go here.
     ```
   </Note>

2. Upload the files you want Unstructured to process into the new `input` folder.

3. Overwrite the contents of the `main.py` file with the following code. In this code, replace `<your-unstructured-api-key>`
   (in the `main()` function, near the end of the following code) with the value of your Unstructured API key. Then save this file.

```python  theme={null}
from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateJobRequest
from unstructured_client.models.operations import DownloadJobOutputRequest
from unstructured_client.models.shared import BodyCreateJob, InputFiles, JobInformation
import json, os, time
from typing import Optional


def run_on_demand_job(
        client: UnstructuredClient,
        input_dir: str,
        job_template_id: Optional[str] = None, 
        job_nodes: Optional[list[dict[str, object]]] = None
) -> tuple[str, list[str], list[dict[str, str]]]:
    """Runs an Unstructured on-demand job.

    Arguments:
    - client {UnstructuredClient}: The initialized Unstructured API client to use.
    - input_dir {str}: The directory that contains the input files.
    - job_template_id {Optional[str]}: If this job is to use a workflow template, the ID of the workflow template to use.
    - job_nodes {Optional[list[dict[str, object]]]}: If this job is to use a custom workflow definition, the list of custom workflow nodes to use.

    Raises:
    - ValueError: If neither a job template ID nor job nodes are specified.
        
    Returns:
    - job_id {str}: The ID of the on-demand job.
    - job_input_file_ids {list[str]}: The input file IDs of the on-demand job.
    - job_output_node_files {list[dict[str, str]]}: The output node files of the on-demand job.
    """
    files = []

    for filename in os.listdir(input_dir):
        full_path = os.path.join(input_dir, filename)

        # Skip non-files (for example, directories).
        if not os.path.isfile(full_path):
            continue

        files.append(
            InputFiles(
                content=open(full_path, "rb"),
                file_name=filename,
                # This example assumes that the input files are PDFs. Change
                # (or detect) the content type to match your own input files.
                content_type="application/pdf"
            )
        )

    if job_template_id is not None:
        request_data = json.dumps({"template_id": job_template_id})
    elif job_nodes is not None:
        request_data = json.dumps({"job_nodes": job_nodes})
    else:
        raise ValueError("Must specify a job template ID or job nodes.")

    response = client.jobs.create_job(
        request=CreateJobRequest(
            body_create_job=BodyCreateJob(
                request_data=request_data,
                input_files=files
            )
        )
    )

    job_id = response.job_information.id
    job_input_file_ids = response.job_information.input_file_ids
    job_output_node_files = response.job_information.output_node_files

    return job_id, job_input_file_ids, job_output_node_files


def poll_for_job_status(client: UnstructuredClient, job_id: str) -> JobInformation:
    """Keeps checking a job's status until the job is completed.

    Arguments:
    - client {UnstructuredClient}: The initialized Unstructured API client to use.
    - job_id {str}: The job ID to check the status of.

    Returns:
    - job {JobInformation}: Information about the Unstructured job.
    """
    while True:
        response = client.jobs.get_job(
            request={
                "job_id": job_id
            }
        )

        job = response.job_information

        if job.status == "SCHEDULED":
            print("Job is scheduled, polling again in 10 seconds...")
            time.sleep(10)
        elif job.status == "IN_PROGRESS":
            print("Job is in progress, polling again in 10 seconds...")
            time.sleep(10)
        else:
            print("Job is completed.")
            break

    return job


def download_job_output(
        client: UnstructuredClient,
        job_id: str,
        job_input_file_ids: list[str],
        output_dir: str
) -> None:
    """Downloads the output of an Unstructured job.

    Arguments:
    - client {UnstructuredClient}: The initialized Unstructured API client to use.
    - job_id {str}: The job ID to download the output from.
    - job_input_file_ids {list[str]}: The input file IDs of the job.
    - output_dir {str}: The directory to download the output into.
    """
    for job_input_file_id in job_input_file_ids:
        print(f"Attempting to get processed results from file_id '{job_input_file_id}'...")

        response = client.jobs.download_job_output(
            request=DownloadJobOutputRequest(
                job_id=job_id,
                file_id=job_input_file_id
            )
        )

        output_path = os.path.join(output_dir, f"{job_input_file_id}.json")

        with open(output_path, "w") as f:
            json.dump(response.any, f, indent=4)

        print(f"Saved output for file_id '{job_input_file_id}' to '{output_path}'.\n")


def main():
    # API key and source/destination folder paths.
    UNSTRUCTURED_API_KEY = "<your-unstructured-api-key>"
    INPUT_FOLDER_PATH = "./input"
    OUTPUT_FOLDER_PATH = "./output"

    # On-demand job settings.
    job_template_id = "hi_res_and_enrichment"
    job_nodes = [] # Applies only if the job is to use a custom workflow definition.

    # Internal tracking variables.
    job_id = ""
    job_input_file_ids = []
    job_output_node_files = []

    with UnstructuredClient(api_key_auth=UNSTRUCTURED_API_KEY) as client:
        print("-" * 80)
        print(f"Attempting to run the on-demand job, ingesting the input files from '{INPUT_FOLDER_PATH}'...")
        job_id, job_input_file_ids, job_output_node_files = run_on_demand_job(
            client = client,
            input_dir = INPUT_FOLDER_PATH,
            job_template_id = job_template_id
        )

        print(f"Job ID: {job_id}\n")
        print("Input file details:\n")

        for job_input_file_id in job_input_file_ids:
            print(job_input_file_id)

        print("\nOutput node file details:\n")

        for output_node_file in job_output_node_files:
            print(output_node_file)

        print("-" * 80)
        print("Polling for job status...")

        job = poll_for_job_status(client, job_id)
        
        print(f"Job details:\n---\n{job.model_dump_json(indent=4)}")
    
        if job.status != "COMPLETED":
            print("Job did not complete successfully. Stopping this script without downloading any output.")
            exit(1)

        print("-" * 80)
        print("Attempting to download the job output...")
        download_job_output(client, job_id, job_input_file_ids, OUTPUT_FOLDER_PATH)
        
        print("-" * 80)
        print(f"Script completed. Check the output folder '{OUTPUT_FOLDER_PATH}' for the results.")
        exit(0)


if __name__ == "__main__":
    main()
```
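
Hardcoding the API key is fine for a quickstart, but for any code you share or commit to version control, consider reading the key from an environment variable instead. The following is a minimal sketch; the variable name `UNSTRUCTURED_API_KEY` is a convention used here, not something the SDK requires:

```python  theme={null}
import os

def get_api_key() -> str:
    """Read the Unstructured API key from the environment, failing fast if it is not set."""
    api_key = os.environ.get("UNSTRUCTURED_API_KEY")
    if not api_key:
        raise RuntimeError("Set the UNSTRUCTURED_API_KEY environment variable before running this script.")
    return api_key
```

In `main()`, you would then replace the hardcoded string with `UNSTRUCTURED_API_KEY = get_api_key()` and export the variable in your shell before running the script.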

## Step 3: Run the code and view the results

1. Run the code in the `main.py` file, by running the following command:

   ```bash  theme={null}
   uv run main.py
   ```

   <Note>
     If you do not want to use `uv` or create a Python project, you can run the code by running the following command instead:

     ```bash  theme={null}
     python main.py
     ```
   </Note>

2. After the code finishes running, look in the `output` folder to see Unstructured's results.
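
As a quick sanity check, you can load one of the downloaded JSON files and count the element types it contains. The following is a minimal sketch that assumes each output file holds a JSON array of element objects, each with a `type` field; the `summarize_elements` helper is illustrative, not part of the Unstructured SDK:

```python  theme={null}
import json
from collections import Counter

def summarize_elements(path: str) -> Counter:
    """Count how many elements of each type appear in one output JSON file."""
    with open(path, "r") as f:
        elements = json.load(f)
    return Counter(element.get("type", "Unknown") for element in elements)
```

For example, calling `summarize_elements("./output/<file-id>.json")` on a processed PDF might return something like `Counter({'NarrativeText': 42, 'Title': 7})`, depending on the file's contents.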

## Next steps

This quickstart showed you how to use the Unstructured Python SDK to process local files by using Unstructured's *on-demand jobs* functionality.

To learn more about how to use Unstructured to process local files, see the example notebook
[Unstructured API On-Demand Jobs Walkthrough](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_On_Demand_Jobs_Walkthrough.ipynb), running remotely in Google Colab. This notebook requires no local setup.

To learn how to process files (and data) in remote file and blob storage, databases, and vector stores, see for example the notebook
[Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
also running remotely in Google Colab. This notebook requires no local setup.
