
Before you begin, complete the requirements. You can also learn about Unstructured’s partitioning first.
1. Create the on-demand job

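Replace INPUT_DIR with the path to your local directory of files to process. The script reads the UNSTRUCTURED_API_URL and UNSTRUCTURED_API_KEY environment variables. The response includes the job ID and the input file IDs.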
Each on-demand job is limited to 10 files, and each file is limited to 10 MB in size.
If you need to launch a series of on-demand jobs in rapid succession, you must wait at least one second between launch requests. Otherwise, you will receive a rate limit error.
A maximum of 5 on-demand jobs can be running in your Unstructured account. If you launch a new on-demand job but 5 existing on-demand jobs are still running, the new on-demand job will remain in a scheduled state until one of the 5 existing on-demand jobs is done running.
Save and run this script:
#!/usr/bin/env bash

INPUT_DIR="/full/path/to/your/directory"

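# Build one --form argument per regular file in INPUT_DIR.
form_args=()
for filepath in "$INPUT_DIR"/*; do
    [ -f "$filepath" ] || continue
    filename=$(basename "$filepath")
    mimetype=$(file --mime-type -b "$filepath")
    form_args+=(--form "input_files=@${filepath};filename=${filename};type=${mimetype}")
done

# Stop early if there is nothing to upload.
if [ "${#form_args[@]}" -eq 0 ]; then
    echo "No files found in $INPUT_DIR" >&2
    exit 1
fi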

response=$(curl --request POST --location \
  "$UNSTRUCTURED_API_URL/api/v1/jobs/" \
  --header "accept: application/json" \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --form 'request_data={"job_nodes":[{"name":"Partitioner","type":"partition","subtype":"vlm","settings":{"is_dynamic":true,"allow_fast":true}}]}' \
  "${form_args[@]}")

echo "Job ID: $(echo "$response" | jq -r '.id')"
echo "Input file IDs: $(echo "$response" | jq -c '.input_file_ids')"
This script requires jq to parse the JSON response.
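The limits noted in this step (10 files per job, 10 MB per file) can be checked locally before a job is launched. The following is a minimal sketch, not part of Unstructured's tooling: `check_batch`, `MAX_FILES`, and `MAX_BYTES` are hypothetical names, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption.

```shell
#!/usr/bin/env bash

# Limits taken from the note in this step.
MAX_FILES=10
# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES=$((10 * 1024 * 1024))

# check_batch DIR
# Hypothetical helper: prints one line per problem and returns non-zero if
# DIR holds no regular files, too many files, or an oversized file.
check_batch() {
    local dir=$1 count=0 problems=0 filepath size
    for filepath in "$dir"/*; do
        [ -f "$filepath" ] || continue
        count=$((count + 1))
        # wc -c reports the size in bytes; the arithmetic strips any padding.
        size=$(($(wc -c < "$filepath")))
        if [ "$size" -gt "$MAX_BYTES" ]; then
            echo "Too large: $filepath ($size bytes)"
            problems=$((problems + 1))
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "No files found in $dir"
        problems=$((problems + 1))
    elif [ "$count" -gt "$MAX_FILES" ]; then
        echo "Too many files: $count (limit $MAX_FILES)"
        problems=$((problems + 1))
    fi
    [ "$problems" -eq 0 ]
}
```

One way to use it is to call `check_batch "$INPUT_DIR" || exit 1` before the curl request, so an oversized batch is caught locally rather than by the API.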
2. Poll for job status

Replace JOB_ID with the job ID from the previous step. This script polls every 10 seconds. It stops when the job status is COMPLETED, and exits with an error if the status is FAILED or STOPPED.
Save and run this script:
#!/usr/bin/env bash

JOB_ID="<job-id>"

while true; do
    job=$(curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY")

    status=$(echo "$job" | jq -r '.status')
    echo "Job status: $status"

    if [ "$status" = "COMPLETED" ]; then
        echo "Job completed."
        echo "Output node file IDs: $(echo "$job" | jq -c '[.output_node_files[].file_id]')"
        break
    elif [ "$status" = "FAILED" ] || [ "$status" = "STOPPED" ]; then
        echo "Job did not complete successfully: $status"
        exit 1
    fi

    sleep 10
done
This script requires jq to parse the JSON response.
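The polling loop in this step runs until the job reaches a terminal status. If an upper bound on the wait is wanted, the loop can be wrapped as in the sketch below. `wait_for_job` and `fetch_status` are hypothetical helper names, and the attempt limit is an assumption rather than anything the API requires. Only the three status values used in this step are treated as terminal.

```shell
#!/usr/bin/env bash

# wait_for_job MAX_ATTEMPTS INTERVAL_SECONDS STATUS_COMMAND [ARGS...]
# Hypothetical helper. STATUS_COMMAND must print the job status on stdout.
# Returns 0 on COMPLETED, 1 on FAILED or STOPPED, 2 if attempts run out.
wait_for_job() {
    local max_attempts=$1 interval=$2 attempt=1 status
    shift 2
    while [ "$attempt" -le "$max_attempts" ]; do
        # A failed status command is treated as "no status yet".
        status=$("$@") || status=""
        echo "Attempt $attempt: status=${status:-unknown}" >&2
        case "$status" in
            COMPLETED) return 0 ;;
            FAILED|STOPPED) return 1 ;;
        esac
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 2
}

# Status command built from the request used in this step.
# Assumes JOB_ID, UNSTRUCTURED_API_URL, and UNSTRUCTURED_API_KEY are set.
fetch_status() {
    curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" | jq -r '.status'
}

# Example call (not executed here): poll every 10 seconds, up to 60 times.
# wait_for_job 60 10 fetch_status
```

Passing the status command as an argument keeps the retry logic separate from the HTTP request, which also makes the loop easy to exercise with a stub command.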
3. Download the job output

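Replace JOB_ID, INPUT_FILE_IDS, OUTPUT_NODE_FILE_IDS, and OUTPUT_DIR with your values from the previous steps. INPUT_FILE_IDS and OUTPUT_NODE_FILE_IDS are Bash arrays, so list each ID as a separate quoted element inside the parentheses.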
Save and run this script:
#!/usr/bin/env bash

JOB_ID="<job-id>"
INPUT_FILE_IDS=("<input-file-id>")         # From Step 1
OUTPUT_NODE_FILE_IDS=("<output-file-id>")  # From Step 2
OUTPUT_DIR="/full/path/to/your/output/directory"

mkdir -p "$OUTPUT_DIR"

all_file_ids=()
for file_id in "${INPUT_FILE_IDS[@]}" "${OUTPUT_NODE_FILE_IDS[@]}"; do
    printf '%s\n' "${all_file_ids[@]}" | grep -qxF "$file_id" || all_file_ids+=("$file_id")
done

for file_id in "${all_file_ids[@]}"; do
    curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID/download?file_id=$file_id" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
      --output "$OUTPUT_DIR/$file_id.json"
    echo "Saved: $OUTPUT_DIR/$file_id.json"
done
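The first loop in the script above merges the input and output file IDs and drops duplicates, so each file is downloaded once. The same first-seen de-duplication can be sketched with a common awk idiom. `unique_ids` is a hypothetical helper name, not something the script above defines.

```shell
#!/usr/bin/env bash

# unique_ids ID...
# Hypothetical helper: prints each argument once, in first-seen order.
unique_ids() {
    [ "$#" -gt 0 ] || return 0
    # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
    # is true only for first occurrences, and awk prints those lines.
    printf '%s\n' "$@" | awk '!seen[$0]++'
}

# Example use with the arrays from the script above (not executed here):
# all_file_ids=()
# while IFS= read -r id; do
#     all_file_ids+=("$id")
# done < <(unique_ids "${INPUT_FILE_IDS[@]}" "${OUTPUT_NODE_FILE_IDS[@]}")
```

Either form gives the same list; the awk version makes a single pass over the IDs instead of running grep once per ID.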

Complete end-to-end script

Replace INPUT_DIR and OUTPUT_DIR with your directory paths, then save and run this script.
This script requires jq to parse JSON responses.
#!/usr/bin/env bash

INPUT_DIR="/full/path/to/your/input/directory"
OUTPUT_DIR="/full/path/to/your/output/directory"

# Step 1: Create the on-demand job.
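form_args=()
for filepath in "$INPUT_DIR"/*; do
    [ -f "$filepath" ] || continue
    filename=$(basename "$filepath")
    mimetype=$(file --mime-type -b "$filepath")
    form_args+=(--form "input_files=@${filepath};filename=${filename};type=${mimetype}")
done

# Stop early if there is nothing to upload.
if [ "${#form_args[@]}" -eq 0 ]; then
    echo "No files found in $INPUT_DIR" >&2
    exit 1
fi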

response=$(curl --request POST --location \
  "$UNSTRUCTURED_API_URL/api/v1/jobs/" \
  --header "accept: application/json" \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --form 'request_data={"job_nodes":[{"name":"Partitioner","type":"partition","subtype":"vlm","settings":{"is_dynamic":true,"allow_fast":true}}]}' \
  "${form_args[@]}")

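JOB_ID=$(echo "$response" | jq -r '.id')
input_file_ids=$(echo "$response" | jq -r '.input_file_ids[]')

# Stop if the response has no job ID; otherwise the polling loop never ends.
if [ -z "$JOB_ID" ] || [ "$JOB_ID" = "null" ]; then
    echo "Job creation failed: $response" >&2
    exit 1
fi
echo "Job ID: $JOB_ID"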

# Step 2: Poll until the job completes.
while true; do
    job=$(curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY")

    status=$(echo "$job" | jq -r '.status')
    echo "Job status: $status"

    if [ "$status" = "COMPLETED" ]; then
        echo "Job completed."
        break
    elif [ "$status" = "FAILED" ] || [ "$status" = "STOPPED" ]; then
        echo "Job did not complete successfully: $status"
        exit 1
    fi

    sleep 10
done

output_node_file_ids=$(echo "$job" | jq -r '.output_node_files[].file_id')

# Step 3: Download the job output.
mkdir -p "$OUTPUT_DIR"

all_file_ids=()
for file_id in $input_file_ids $output_node_file_ids; do
    printf '%s\n' "${all_file_ids[@]}" | grep -qxF "$file_id" || all_file_ids+=("$file_id")
done

for file_id in "${all_file_ids[@]}"; do
    curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID/download?file_id=$file_id" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
      --output "$OUTPUT_DIR/$file_id.json"
    echo "Saved: $OUTPUT_DIR/$file_id.json"
done
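The scripts on this page call curl, jq, file, and basename, and read UNSTRUCTURED_API_URL and UNSTRUCTURED_API_KEY from the environment. A short pre-flight check can report anything missing before the first request is sent. The sketch below uses hypothetical helper names (`require_cmd`, `require_env`) and assumes the two variables are exported, which is what a saved script needs in order to see them.

```shell
#!/usr/bin/env bash

# require_cmd NAME...
# Hypothetical helper: returns non-zero if any command is not on PATH.
require_cmd() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing command: $cmd" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# require_env NAME...
# Hypothetical helper: returns non-zero if any named environment variable
# is unset or empty. printenv only sees exported variables.
require_env() {
    local name missing=0
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "Missing environment variable: $name" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example use at the top of the end-to-end script (not executed here):
# require_cmd curl jq file basename || exit 1
# require_env UNSTRUCTURED_API_URL UNSTRUCTURED_API_KEY || exit 1
```

Both helpers check every name before returning, so a single run lists all missing prerequisites instead of stopping at the first one.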
