# Create workflow

> Create a new workflow, either custom or auto, and configure its settings.

<Note>
  This endpoint creates a workflow that persists until it is explicitly deleted (a *long-lived workflow*). To create a workflow that exists only for the duration of a single job run using local files as input, use the [create job endpoint](/api-reference/api/job/create-job) instead.
</Note>

## Body

<ParamField body="name" type="string" required>
  Workflow name.
</ParamField>

<ParamField body="workflow_type" type="string" required>
  Execution mode. `auto` applies default workflow settings so that you get good-quality results quickly. `custom` lets you fine-tune the workflow settings for more specific results.
</ParamField>

<Note>
  The workflow types `advanced`, `basic`, and `platinum` are non-operational and will be removed in a future release.
</Note>

<Note>
  Workflows with no `source_id` use a local file source. Local-source workflows must set `workflow_type` to `custom`, cannot be set to run on a repeating schedule, and cannot be run from Unstructured Pipelines (though they can be run via the API or Python SDK).
</Note>
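
The local-source rules above can be checked on the client before sending a request. The following is a minimal sketch, not part of the Unstructured SDK: the helper name `local_source_problems` is hypothetical, and it covers only the two constraints that are visible in the request body (workflow type and schedule).

```python
from typing import Any


def local_source_problems(payload: dict[str, Any]) -> list[str]:
    """Return the local-source rule violations found in a create-workflow body."""
    problems: list[str] = []
    # A workflow with no source_id uses a local file source.
    if payload.get("source_id") is None:
        if payload.get("workflow_type") != "custom":
            problems.append("local-source workflows must set workflow_type to 'custom'")
        if payload.get("schedule") is not None:
            problems.append("local-source workflows cannot run on a repeating schedule")
    return problems


# This body has no source_id, so both rules are violated.
print(local_source_problems({"name": "my-workflow", "workflow_type": "auto", "schedule": "daily"}))
```

An empty list means the body satisfies both rules; the API remains the authority on what it accepts.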

<ParamField body="source_id" type="string">
  ID of the source connector.
</ParamField>

<ParamField body="destination_id" type="string">
  ID of the destination connector.
</ParamField>

<ParamField body="workflow_nodes" type="array">
  Processing pipeline stages. Each node requires `id` (string, UUID) and `node_type` (string), and supports optional `node_subtype` (string), `config` (object), and `params` (object).

  For more information on workflow nodes, see [Workflow nodes](/api-reference/workflow/nodes/overview).
</ParamField>
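
As an illustration of the node shape described above, the following sketch builds one entry for `workflow_nodes` as a plain dictionary. The helper `make_node` is hypothetical, and the `node_type`, `node_subtype`, and `params` values are placeholders; see the Workflow nodes page for the values the API actually accepts.

```python
import uuid
from typing import Any, Optional


def make_node(
    node_type: str,
    node_subtype: Optional[str] = None,
    config: Optional[dict[str, Any]] = None,
    params: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build one workflow node with the two required keys plus any optional ones."""
    # `id` must be a UUID string; `node_type` is required.
    node: dict[str, Any] = {"id": str(uuid.uuid4()), "node_type": node_type}
    # Optional keys are included only when a value is supplied.
    if node_subtype is not None:
        node["node_subtype"] = node_subtype
    if config is not None:
        node["config"] = config
    if params is not None:
        node["params"] = params
    return node


# Placeholder values, for illustration only.
workflow_nodes = [make_node("partition", node_subtype="example_subtype", params={"example": True})]
print(workflow_nodes)
```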

<ParamField body="template_id" type="string">
  ID of a pre-built workflow template to use as the basis for the workflow.
</ParamField>

<ParamField body="schedule" type="string">
  Repeating run schedule. Valid values and their cron equivalents:

  | Value              | cron           | Description                                          |
  | ------------------ | -------------- | ---------------------------------------------------- |
  | `every 15 minutes` | `*/15 * * * *` | Every 15 minutes.                                    |
  | `every hour`       | `0 * * * *`    | At the first minute of every hour.                   |
  | `every 2 hours`    | `0 */2 * * *`  | At the first minute of every second hour.            |
  | `every 4 hours`    | `0 */4 * * *`  | At the first minute of every fourth hour.            |
  | `every 6 hours`    | `0 */6 * * *`  | At the first minute of every sixth hour.             |
  | `every 8 hours`    | `0 */8 * * *`  | At the first minute of every eighth hour.            |
  | `every 10 hours`   | `0 */10 * * *` | At the first minute of every tenth hour.             |
  | `every 12 hours`   | `0 */12 * * *` | At the first minute of every twelfth hour.           |
  | `daily`            | `0 0 * * *`    | At the first minute of every day.                    |
  | `weekly`           | `0 0 * * 0`    | At the first minute of every Sunday.                 |
  | `monthly`          | `0 0 1 * *`    | At the first minute of the first day of every month. |

  If omitted, the workflow does not automatically run on a repeating schedule.

  Workflows with a local source cannot be set to run on a repeating schedule.
</ParamField>
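
The schedule table above can be mirrored in a client-side lookup to catch typos before a request is sent. This is a sketch under stated assumptions: `SCHEDULE_TO_CRON` and `cron_for` are hypothetical names, and the cron strings are for reference only, since the request body takes the value string (for example `daily`), not the cron expression.

```python
from typing import Optional

# Valid `schedule` values and their cron equivalents, copied from the table above.
SCHEDULE_TO_CRON = {
    "every 15 minutes": "*/15 * * * *",
    "every hour": "0 * * * *",
    "every 2 hours": "0 */2 * * *",
    "every 4 hours": "0 */4 * * *",
    "every 6 hours": "0 */6 * * *",
    "every 8 hours": "0 */8 * * *",
    "every 10 hours": "0 */10 * * *",
    "every 12 hours": "0 */12 * * *",
    "daily": "0 0 * * *",
    "weekly": "0 0 * * 0",
    "monthly": "0 0 1 * *",
}


def cron_for(schedule: Optional[str]) -> Optional[str]:
    """Return the cron equivalent of a schedule value, or None if no schedule is set."""
    if schedule is None:
        return None  # Omitted schedule: the workflow does not run automatically.
    if schedule not in SCHEDULE_TO_CRON:
        raise ValueError(f"unsupported schedule value: {schedule!r}")
    return SCHEDULE_TO_CRON[schedule]


print(cron_for("weekly"))
```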

<ParamField body="reprocess_all" type="boolean">
  Default: `false`. If `true`, reprocesses all documents in the source location on every run. If `false`, the workflow excludes from future processing any files Unstructured determines are unchanged since the last time the workflow ran.

  Unstructured determines if a document has changed based on the document version. For each workflow, Unstructured maintains a record of documents (and their versions, if present) processed by that workflow. Each document record consists of:

  * A `record_id` derived from the document name and path.
  * A `record_version` derived from either the document ETag (if the source provider generates one) or the source provider's native version identifier.

  When you set `reprocess_all` to `false` for a source connector that supports `reprocess_all`, Unstructured uses this list of records to determine whether or not to process each document:

  * If the `record_id` does not exist in the workflow records, Unstructured processes the document.
  * If the `record_id` exists, but the `record_version` has changed, or there is no `record_version`, Unstructured processes the document.

  The following table lists the possible `record_id` and `record_version` combinations and the action Unstructured takes in each case:

  | `record_id` | `record_version` | Action              |
  | ----------- | ---------------- | ------------------- |
  | Exists      | Unchanged        | Do not process file |
  | Exists      | Changed          | Process file        |
  | Exists      | (none)           | Process file        |
  | New         | (Does not apply) | Process file        |

  <Note>
    Renaming a document results in a new `record_id`; Unstructured will then reprocess the renamed document when the workflow runs.
  </Note>

  The following table lists the source connectors that support the `reprocess_all` setting. The **`record_version` base** column specifies the versioning information Unstructured uses to generate the record version for each processed document.

  Source connectors that do not support `reprocess_all` reprocess every document in the source location each time the workflow runs.

  | Connector                                                                | `record_version` base |
  | ------------------------------------------------------------------------ | --------------------- |
  | [Amazon S3](/api-reference/workflow/sources/s3)                          | ETag                  |
  | [Azure Blob Storage](/api-reference/workflow/sources/azure-blob-storage) | ETag                  |
  | [Box](/api-reference/workflow/sources/box)                               | Provider version ID   |
  | [Dropbox](/api-reference/workflow/sources/dropbox)                       | Provider version ID   |
  | [Elasticsearch](/api-reference/workflow/sources/elasticsearch)           | Provider version ID   |
  | [Google Cloud Storage](/api-reference/workflow/sources/google-cloud)     | ETag                  |
  | [Google Drive](/api-reference/workflow/sources/google-drive)             | Provider version ID   |
  | [Microsoft OneDrive](/api-reference/workflow/sources/onedrive)           | Provider version ID   |
  | [Microsoft SharePoint](/api-reference/workflow/sources/sharepoint)       | Provider version ID   |

  Additional considerations to take into account when setting `reprocess_all` to `false`:

  * Unstructured only adds document records for documents that it successfully processes. Documents that failed to process will be reprocessed the next time the workflow is run.
  * Because S3 ETags are content-based, changing the metadata on an S3 object will not result in it being reprocessed.
  * For source providers that support the S3 protocol, deleting an object and then reuploading it to the source location keeps the same `record_id` but may generate a different `record_version`, especially with multipart uploads. Unstructured then reprocesses the document.
  * For source providers that offer Key Management Services (KMS), server-side encryption can change document ETags. This changes the document's `record_version`, and Unstructured reprocesses the document.
  * If you clone or recreate a source connector, the resulting connector does not include the document processing history of the previous connector.
  * Changing a workflow's configuration does not automatically cause Unstructured to reprocess all documents. For example, changing chunker, embedder, enrichment, or partitioner settings may not result in reprocessing all documents. To reprocess all documents using new workflow settings, set `reprocess_all` to `true` for at least the next workflow run.
</ParamField>
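
The `reprocess_all` decision rules described above can be sketched as a small function. This is an illustration only, not Unstructured's implementation: the in-memory `records` dictionary (mapping `record_id` to the last processed `record_version`) and the function name `should_process` are assumptions made for the example.

```python
from typing import Optional


def should_process(
    records: dict[str, Optional[str]],
    record_id: str,
    record_version: Optional[str],
    reprocess_all: bool = False,
) -> bool:
    """Decide whether a document is processed on this run."""
    if reprocess_all:
        return True  # Every document is reprocessed on every run.
    if record_id not in records:
        return True  # New record_id: process the document.
    if record_version is None:
        return True  # No record_version available: process the document.
    # Known record_id: process only if the version has changed.
    return records[record_id] != record_version


# Hypothetical processing history for one workflow.
history = {"docs/report.pdf": "etag-1"}
print(should_process(history, "docs/report.pdf", "etag-1"))
print(should_process(history, "docs/report.pdf", "etag-2"))
```

Because the `record_id` is derived from the document name and path, a renamed document falls into the "new `record_id`" branch and is processed again.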

## Response

<ResponseField name="id" type="string" required>
  Unique identifier for the workflow.
</ResponseField>

<ResponseField name="name" type="string" required>
  Workflow name.
</ResponseField>

<ResponseField name="workflow_type" type="string" required>
  Workflow type: `custom` or `auto`.
</ResponseField>

<ResponseField name="status" type="string" required>
  Workflow state: `active`, `inactive`, or `paused`.
</ResponseField>

<ResponseField name="created_at" type="string" required>
  ISO 8601 timestamp when the workflow was created.
</ResponseField>

<ResponseField name="source_id" type="string">
  Source connector ID.
</ResponseField>

<ResponseField name="destination_id" type="string">
  Destination connector ID.
</ResponseField>

<ResponseField name="schedule" type="string">
  Repeating run schedule.
</ResponseField>

<ResponseField name="dag_nodes" type="array">
  Workflow processing pipeline nodes.
</ResponseField>

<ResponseField name="updated_at" type="string">
  ISO 8601 timestamp when the workflow was last updated.
</ResponseField>
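
If you call the endpoint without the SDK, the response fields listed above can be checked and converted as in the following sketch. The helper `parse_workflow` and the `REQUIRED_FIELDS` tuple are hypothetical names; the sample body uses the same shape as the response example on this page.

```python
import json
from datetime import datetime
from typing import Any

# Fields marked as required in the response.
REQUIRED_FIELDS = ("id", "name", "workflow_type", "status", "created_at")


def parse_workflow(body: str) -> dict[str, Any]:
    """Parse a create-workflow response body and convert its timestamps."""
    data = json.loads(body)
    missing = [field for field in REQUIRED_FIELDS if data.get(field) is None]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    for key in ("created_at", "updated_at"):
        value = data.get(key)
        if value is not None:
            # Normalize a trailing "Z" so older Python versions can parse it.
            data[key] = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return data


sample = """{
  "id": "9b8c7d6e-5f4a-3b2c-1d0e-9f8a7b6c5d4e",
  "name": "my-workflow",
  "workflow_type": "auto",
  "status": "active",
  "created_at": "2026-04-29T10:00:00Z",
  "updated_at": null
}"""
workflow = parse_workflow(sample)
print(workflow["id"], workflow["status"], workflow["created_at"].isoformat())
```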

<RequestExample>
  ```bash cURL theme={null}
  curl --request POST \
    --url "${UNSTRUCTURED_API_URL}/api/v1/workflows/" \
    --header "unstructured-api-key: ${UNSTRUCTURED_API_KEY}" \
    --header "Content-Type: application/json" \
    --data '{
      "name": "my-workflow",
      "workflow_type": "auto",
      "source_id": "7f3e2a1b-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
      "destination_id": "1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d",
      "schedule": "daily"
    }'
  ```

  ```python Python SDK theme={null}
  import os
  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateWorkflowRequest
  from unstructured_client.models.shared import CreateWorkflow, WorkflowType

  client = UnstructuredClient(
      api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
      server_url=os.getenv("UNSTRUCTURED_API_URL"),
  )

  response = client.workflows.create_workflow(
      request=CreateWorkflowRequest(
          create_workflow=CreateWorkflow(
              name="my-workflow",
              workflow_type=WorkflowType.AUTO,
              source_id="7f3e2a1b-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
              destination_id="1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d",
              schedule="daily",
          )
      )
  )
  print(response)
  ```

  ```python Python SDK (async) theme={null}
  import asyncio
  import os
  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateWorkflowRequest
  from unstructured_client.models.shared import CreateWorkflow, WorkflowType

  async def create_workflow():
      client = UnstructuredClient(
          api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
          server_url=os.getenv("UNSTRUCTURED_API_URL"),
      )
      response = await client.workflows.create_workflow_async(
          request=CreateWorkflowRequest(
              create_workflow=CreateWorkflow(
                  name="my-workflow",
                  workflow_type=WorkflowType.AUTO,
                  source_id="7f3e2a1b-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
                  destination_id="1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d",
                  schedule="daily",
              )
          )
      )
      print(response)

  asyncio.run(create_workflow())
  ```
</RequestExample>

<ResponseExample>
  ```json Response theme={null}
  {
    "id": "9b8c7d6e-5f4a-3b2c-1d0e-9f8a7b6c5d4e",
    "name": "my-workflow",
    "workflow_type": "auto",
    "status": "active",
    "source_id": "7f3e2a1b-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
    "destination_id": "1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d",
    "schedule": "daily",
    "dag_nodes": null,
    "created_at": "2026-04-29T10:00:00Z",
    "updated_at": null
  }
  ```
</ResponseExample>
