Complete the requirements before you begin. You can also read about Unstructured’s partitioning and structured data extraction first.
Step 1: Create the on-demand job

Replace EXTRACTION_PROMPT with your extraction prompt, and INPUT_DIR with the path to your local directory of files to process. The response includes the job ID.
Each on-demand job is limited to 10 files, and each file is limited to 10 MB in size.
If you need to launch a series of on-demand jobs in rapid succession, you must wait at least one second between launch requests. Otherwise, you will receive a rate limit error.
A maximum of 5 on-demand jobs can be running in your Unstructured account. If you launch a new on-demand job while 5 existing on-demand jobs are still running, the new job remains in a scheduled state until one of the 5 existing jobs finishes.
Save and run this script:
#!/usr/bin/env bash

EXTRACTION_PROMPT="<your-extraction-prompt>"
INPUT_DIR="/full/path/to/your/directory"

form_args=()
for filepath in "$INPUT_DIR"/*; do
    [ -f "$filepath" ] || continue
    filename=$(basename "$filepath")
    mimetype=$(file --mime-type -b "$filepath")
    form_args+=(--form "input_files=@${filepath};filename=${filename};type=${mimetype}")
done

request_data=$(jq -n --arg prompt "$EXTRACTION_PROMPT" '{
  "job_nodes": [
    {"name":"Partitioner","type":"partition","subtype":"vlm","settings":{"is_dynamic":true,"allow_fast":true}},
    {"name":"Extractor","type":"structured_data_extractor","subtype":"llm","settings":{"schema_to_extract":{"extraction_guidance":$prompt},"output_mode":"extracted_data_only","provider":"openai","model":"gpt-5-mini"}}
  ]
}')

response=$(curl --request POST --location \
  "$UNSTRUCTURED_API_URL/api/v1/jobs/" \
  --header "accept: application/json" \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --form "request_data=$request_data" \
  "${form_args[@]}")

echo "Job ID: $(echo "$response" | jq -r '.id')"
echo "Input file IDs: $(echo "$response" | jq -c '.input_file_ids')"
This script requires jq to build the request payload and to parse the JSON response.
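The limits noted in this step (10 files per job, 10 MB per file) can be checked locally before any request is sent. The following is a minimal sketch, not part of the Unstructured tooling: the function name check_input_dir is hypothetical, and it assumes that "10 MB" means 10 * 1024 * 1024 bytes, which the text above does not specify.

```bash
#!/usr/bin/env bash

# Limits stated in this step: at most 10 files per job, 10 MB per file.
# Assumption: 10 MB is treated as 10 * 1024 * 1024 bytes.
MAX_FILES=10
MAX_BYTES=$((10 * 1024 * 1024))

# check_input_dir DIR  (hypothetical helper)
# Reports each problem on stderr and returns 1 if DIR breaks a limit
# or contains no regular files; returns 0 otherwise.
check_input_dir() {
    local dir="$1" count=0 failed=0 filepath size
    for filepath in "$dir"/*; do
        [ -f "$filepath" ] || continue
        count=$((count + 1))
        # Arithmetic expansion strips the padding some wc versions add.
        size=$(( $(wc -c < "$filepath") ))
        if [ "$size" -gt "$MAX_BYTES" ]; then
            echo "Too large: $filepath ($size bytes)" >&2
            failed=1
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "No files found in $dir" >&2
        failed=1
    elif [ "$count" -gt "$MAX_FILES" ]; then
        echo "Too many files: $count (limit is $MAX_FILES)" >&2
        failed=1
    fi
    return "$failed"
}
```

Calling check_input_dir "$INPUT_DIR" || exit 1 before the loop that builds form_args would stop the script early instead of sending a request that is likely to be rejected.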
Step 2: Poll for job status

Replace JOB_ID with the job ID from the previous step. This script polls every 10 seconds. It stops when the job completes, and exits with an error if the job fails or is stopped.
Save and run this script:
#!/usr/bin/env bash

JOB_ID="<job-id>"

while true; do
    job=$(curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY")

    status=$(echo "$job" | jq -r '.status')
    echo "Job status: $status"

    if [ "$status" = "COMPLETED" ]; then
        echo "Job completed."
        echo "Output node file IDs: $(echo "$job" | jq -c '[.output_node_files[].file_id]')"
        break
    elif [ "$status" = "FAILED" ] || [ "$status" = "STOPPED" ]; then
        echo "Job did not complete successfully: $status"
        exit 1
    fi

    sleep 10
done
This script requires jq to parse the JSON response.
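The polling loop above runs until the job reaches COMPLETED, FAILED, or STOPPED, so it has no upper bound on how long it waits. One way to cap the wait is sketched below. The helper names poll_until and get_job_status are hypothetical, and the exit codes (0, 1, 2) are a convention chosen for this example rather than anything defined by the API.

```bash
#!/usr/bin/env bash

# poll_until MAX_ATTEMPTS INTERVAL_SECONDS STATUS_CMD [ARG...]  (hypothetical helper)
# Runs STATUS_CMD repeatedly until it prints a terminal status.
# Returns 0 on COMPLETED, 1 on FAILED or STOPPED, 2 if attempts run out.
poll_until() {
    local max_attempts="$1" interval="$2" attempt status
    shift 2
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        status=$("$@")
        echo "Attempt $attempt: $status" >&2
        case "$status" in
            COMPLETED) return 0 ;;
            FAILED|STOPPED) return 1 ;;
        esac
        # Skip the sleep after the final attempt.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$interval"
        fi
    done
    return 2
}

# get_job_status JOB_ID  (hypothetical helper)
# Wraps the same GET request as the polling script above and prints
# only the status field.
get_job_status() {
    curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$1" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" | jq -r '.status'
}
```

For example, poll_until 60 10 get_job_status "$JOB_ID" would check the job every 10 seconds and give up after 60 attempts, which is roughly ten minutes.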
Step 3: Download the job output

Replace JOB_ID, INPUT_FILE_IDS, and OUTPUT_NODE_FILE_IDS with the values from the previous steps, and OUTPUT_DIR with the path to a local directory for the downloaded files.
Save and run this script:
#!/usr/bin/env bash

JOB_ID="<job-id>"
INPUT_FILE_IDS=("<input-file-id>")         # From Step 1
OUTPUT_NODE_FILE_IDS=("<output-file-id>")  # From Step 2
OUTPUT_DIR="/full/path/to/your/output/directory"

mkdir -p "$OUTPUT_DIR"

all_file_ids=()
for file_id in "${INPUT_FILE_IDS[@]}" "${OUTPUT_NODE_FILE_IDS[@]}"; do
    printf '%s\n' "${all_file_ids[@]}" | grep -qxF "$file_id" || all_file_ids+=("$file_id")
done

for file_id in "${all_file_ids[@]}"; do
    curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID/download?file_id=$file_id" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
      --output "$OUTPUT_DIR/$file_id.json"
    echo "Saved: $OUTPUT_DIR/$file_id.json"
done
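
Steps 1 and 2 print the file IDs as compact JSON arrays, while the download script expects Bash arrays. The following sketch shows one way to bridge the two and drop duplicates at the same time. The function name merge_file_ids and the sample IDs are made up for illustration; real IDs come from the earlier steps.

```bash
#!/usr/bin/env bash

# merge_file_ids JSON_ARRAY...  (hypothetical helper)
# Prints the unique elements of one or more JSON arrays of strings,
# one per line, in first-seen order. Requires jq and awk.
merge_file_ids() {
    local json
    for json in "$@"; do
        printf '%s\n' "$json" | jq -r '.[]'
    done | awk '!seen[$0]++'
}

# Made-up sample values; paste the arrays printed in Steps 1 and 2 instead.
input_ids_json='["id-1","id-2"]'
output_ids_json='["id-2","id-3"]'

# Read the merged IDs into a Bash array, one element per line.
all_file_ids=()
while IFS= read -r file_id; do
    all_file_ids+=("$file_id")
done < <(merge_file_ids "$input_ids_json" "$output_ids_json")

echo "Merged ${#all_file_ids[@]} file IDs: ${all_file_ids[*]}"
```

The resulting all_file_ids array can be passed straight to the download loop, in place of the printf and grep check used in the script above.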

Complete end-to-end script

Replace EXTRACTION_PROMPT, INPUT_DIR, and OUTPUT_DIR with your values, then save and run this script.
This script requires jq to parse JSON responses.
#!/usr/bin/env bash

EXTRACTION_PROMPT="<your-extraction-prompt>"
INPUT_DIR="/full/path/to/your/input/directory"
OUTPUT_DIR="/full/path/to/your/output/directory"

# Step 1: Create the on-demand job.
form_args=()
for filepath in "$INPUT_DIR"/*; do
    [ -f "$filepath" ] || continue
    filename=$(basename "$filepath")
    mimetype=$(file --mime-type -b "$filepath")
    form_args+=(--form "input_files=@${filepath};filename=${filename};type=${mimetype}")
done

request_data=$(jq -n --arg prompt "$EXTRACTION_PROMPT" '{
  "job_nodes": [
    {"name":"Partitioner","type":"partition","subtype":"vlm","settings":{"is_dynamic":true,"allow_fast":true}},
    {"name":"Extractor","type":"structured_data_extractor","subtype":"llm","settings":{"schema_to_extract":{"extraction_guidance":$prompt},"output_mode":"extracted_data_only","provider":"openai","model":"gpt-5-mini"}}
  ]
}')

response=$(curl --request POST --location \
  "$UNSTRUCTURED_API_URL/api/v1/jobs/" \
  --header "accept: application/json" \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --form "request_data=$request_data" \
  "${form_args[@]}")

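JOB_ID=$(echo "$response" | jq -r '.id // empty')
if [ -z "$JOB_ID" ]; then
    # Stop here if no job ID came back, so the polling loop
    # below does not spin on an invalid job.
    echo "Job creation failed: $response" >&2
    exit 1
fi
input_file_ids=$(echo "$response" | jq -r '.input_file_ids[]')
echo "Job ID: $JOB_ID"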

# Step 2: Poll until the job completes.
while true; do
    job=$(curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY")

    status=$(echo "$job" | jq -r '.status')
    echo "Job status: $status"

    if [ "$status" = "COMPLETED" ]; then
        echo "Job completed."
        break
    elif [ "$status" = "FAILED" ] || [ "$status" = "STOPPED" ]; then
        echo "Job did not complete successfully: $status"
        exit 1
    fi

    sleep 10
done

output_node_file_ids=$(echo "$job" | jq -r '.output_node_files[].file_id')

# Step 3: Download the job output.
mkdir -p "$OUTPUT_DIR"

all_file_ids=()
for file_id in $input_file_ids $output_node_file_ids; do
    printf '%s\n' "${all_file_ids[@]}" | grep -qxF "$file_id" || all_file_ids+=("$file_id")
done

for file_id in "${all_file_ids[@]}"; do
    curl --request GET --silent --location \
      "$UNSTRUCTURED_API_URL/api/v1/jobs/$JOB_ID/download?file_id=$file_id" \
      --header "accept: application/json" \
      --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
      --output "$OUTPUT_DIR/$file_id.json"
    echo "Saved: $OUTPUT_DIR/$file_id.json"
done
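
The scripts on this page assume that UNSTRUCTURED_API_URL and UNSTRUCTURED_API_KEY are set in the environment and that curl, jq, and file are installed, but they do not verify either. A minimal pre-flight check might look like the sketch below. The function names require_env and require_cmd are hypothetical.

```bash
#!/usr/bin/env bash

# require_env NAME...  (hypothetical helper)
# Returns 1 if any named environment variable is unset or empty.
require_env() {
    local name missing=0
    for name in "$@"; do
        # ${!name} is Bash indirect expansion: the value of the
        # variable whose name is stored in $name.
        if [ -z "${!name:-}" ]; then
            echo "Missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# require_cmd NAME...  (hypothetical helper)
# Returns 1 if any named command is not found on the PATH.
require_cmd() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Placed at the top of the end-to-end script, a pair of lines such as require_env UNSTRUCTURED_API_URL UNSTRUCTURED_API_KEY || exit 1 and require_cmd curl jq file || exit 1 would fail fast with a readable message rather than partway through a request.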

What’s next?