Workflows
To use the Unstructured Platform Workflow Endpoint to manage workflows, do the following:
- To get a list of available workflows, use the UnstructuredClient object’s workflows.list_workflows function (for the Python SDK) or the GET method to call the /workflows endpoint (for curl or Postman).
- To get information about a workflow, use the UnstructuredClient object’s workflows.get_workflow function (for the Python SDK) or the GET method to call the /workflows/<workflow-id> endpoint (for curl or Postman).
- To create a workflow, use the UnstructuredClient object’s workflows.create_workflow function (for the Python SDK) or the POST method to call the /workflows endpoint (for curl or Postman).
- To run a workflow manually, use the UnstructuredClient object’s workflows.run_workflow function (for the Python SDK) or the POST method to call the /workflows/<workflow-id>/run endpoint (for curl or Postman).
- To update a workflow, use the UnstructuredClient object’s workflows.update_workflow function (for the Python SDK) or the PUT method to call the /workflows/<workflow-id> endpoint (for curl or Postman).
- To delete a workflow, use the UnstructuredClient object’s workflows.delete_workflow function (for the Python SDK) or the DELETE method to call the /workflows/<workflow-id> endpoint (for curl or Postman).
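The mapping between SDK functions and HTTP calls in the list above can be summarized as a small lookup table. The following is a minimal sketch that uses only the Python standard library and builds a request object without sending it. The base URL and the unstructured-api-key header name are assumptions, and build_request is a hypothetical helper, not part of the SDK.

```python
from urllib.request import Request

# Assumption: replace with the Workflow Endpoint URL for your account.
BASE_URL = "https://platform.unstructuredapp.io/api/v1"

# SDK function name -> (HTTP method, path template), as listed above.
OPERATIONS = {
    "list_workflows": ("GET", "/workflows"),
    "get_workflow": ("GET", "/workflows/{workflow_id}"),
    "create_workflow": ("POST", "/workflows"),
    "run_workflow": ("POST", "/workflows/{workflow_id}/run"),
    "update_workflow": ("PUT", "/workflows/{workflow_id}"),
    "delete_workflow": ("DELETE", "/workflows/{workflow_id}"),
}


def build_request(operation, api_key, workflow_id=None, body=None):
    """Build (but do not send) the HTTP request for a workflow operation.

    body, if given, must already be JSON encoded as bytes.
    """
    method, template = OPERATIONS[operation]
    if "{workflow_id}" in template and workflow_id is None:
        raise ValueError(f"{operation} requires a workflow ID")
    url = BASE_URL + template.format(workflow_id=workflow_id)
    headers = {
        "accept": "application/json",
        # Assumption: the API key is passed in this header.
        "unstructured-api-key": api_key,
    }
    if body is not None:
        headers["content-type"] = "application/json"
    return Request(url, data=body, headers=headers, method=method)


req = build_request("run_workflow", "<api-key>", workflow_id="<workflow-id>")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send the call; that step is left out so the sketch runs without credentials.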
The following examples assume that you have already met the requirements and understand the basics of working with the Unstructured Platform Workflow Endpoint.
Create a workflow
To create a workflow, use the UnstructuredClient object’s workflows.create_workflow function (for the Python SDK) or the POST method to call the /workflows endpoint (for curl or Postman).
In the CreateWorkflow object (for the Python SDK) or the request body (for curl or Postman), specify the settings for the workflow, as follows:
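A minimal sketch of such a request body, assembled with the Python standard library. The key names source_id, destination_id, workflow_type, and schedule appear elsewhere on this page; the name key is an assumption, and the values in angle brackets are placeholders rather than working values.

```python
import json

# Placeholder values; replace them as described after this example.
create_workflow_body = {
    "name": "<name>",  # assumed key name for the workflow's unique name
    "source_id": "<source-connector-id>",
    "destination_id": "<destination-connector-id>",
    "workflow_type": "<type>",
    # Optional: omit "schedule" to disable repeating automatic runs.
    "schedule": "<schedule-timeframe>",
}

# Serialize for use as the body of a POST request to /workflows.
print(json.dumps(create_workflow_body, indent=4))
```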
Replace the preceding placeholders as follows:
- <name> (required) - A unique name for this workflow.
- <source-connector-id> (required) - The ID of the target source connector. To get the ID, use the UnstructuredClient object’s sources.list_sources function (for the Python SDK) or the GET method to call the /sources endpoint (for curl or Postman).
- <destination-connector-id> (required) - The ID of the target destination connector. To get the ID, use the UnstructuredClient object’s destinations.list_destinations function (for the Python SDK) or the GET method to call the /destinations endpoint (for curl or Postman).
- <TYPE> (for the Python SDK) or <type> (for curl or Postman) (required) - The workflow optimization type. Available values include ADVANCED, BASIC, PLATINUM, and CUSTOM (for the Python SDK) or advanced, basic, platinum, and custom (for curl or Postman). If <TYPE> is set to CUSTOM (for the Python SDK), or if <type> is set to custom (for curl or Postman), you must add a workflow_nodes array. For instructions, see Custom workflow DAG nodes.
- <schedule-timeframe> - The repeating automatic run schedule, specified as a predefined phrase. The available predefined phrases are:
  - every 15 minutes (for curl or Postman): Every 15 minutes (cron expression: */15 * * * *).
  - every hour: At the first minute of every hour (cron expression: 0 * * * *).
  - every 2 hours: At the first minute of every second hour (cron expression: 0 */2 * * *).
  - every 4 hours: At the first minute of every fourth hour (cron expression: 0 */4 * * *).
  - every 6 hours: At the first minute of every sixth hour (cron expression: 0 */6 * * *).
  - every 8 hours: At the first minute of every eighth hour (cron expression: 0 */8 * * *).
  - every 10 hours: At the first minute of every tenth hour (cron expression: 0 */10 * * *).
  - every 12 hours: At the first minute of every twelfth hour (cron expression: 0 */12 * * *).
  - daily: At the first minute of every day (cron expression: 0 0 * * *).
  - weekly: At the first minute of every Sunday (cron expression: 0 0 * * 0).
  - monthly: At the first minute of the first day of every month (cron expression: 0 0 1 * *).

  If schedule is not specified, the workflow does not automatically run on a repeating schedule.
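Because the schedule must be one of the predefined phrases, checking the value before sending a request can catch typos early. The following is a sketch of such a client-side check, with the phrases and cron expressions copied from the list above. validate_schedule is a hypothetical helper, not part of the SDK.

```python
# Predefined schedule phrases and their cron equivalents, from the list above.
SCHEDULE_PHRASES = {
    "every 15 minutes": "*/15 * * * *",
    "every hour": "0 * * * *",
    "every 2 hours": "0 */2 * * *",
    "every 4 hours": "0 */4 * * *",
    "every 6 hours": "0 */6 * * *",
    "every 8 hours": "0 */8 * * *",
    "every 10 hours": "0 */10 * * *",
    "every 12 hours": "0 */12 * * *",
    "daily": "0 0 * * *",
    "weekly": "0 0 * * 0",
    "monthly": "0 0 1 * *",
}


def validate_schedule(schedule):
    """Return the cron expression for a phrase, or None when no schedule is set."""
    if schedule is None:
        # No schedule: the workflow does not run on a repeating schedule.
        return None
    try:
        return SCHEDULE_PHRASES[schedule]
    except KeyError:
        raise ValueError(f"unsupported schedule phrase: {schedule!r}") from None


print(validate_schedule("every 6 hours"))
```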
Update a workflow
To update information about a workflow, use the UnstructuredClient object’s workflows.update_workflow function (for the Python SDK) or the PUT method to call the /workflows/<workflow-id> endpoint (for curl or Postman), replacing <workflow-id> with the workflow’s unique ID. To get this ID, see List workflows.
In the request body, specify the settings for the workflow. For the specific settings to include, see Create a workflow.
Custom workflow DAG nodes
If WorkflowType is set to CUSTOM (for the Python SDK), or if workflow_type is set to custom (for curl or Postman), you must also specify the settings for the workflow’s directed acyclic graph (DAG) nodes. These nodes’ settings are specified in the workflow_nodes array.
- A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.
- A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.
- You can specify Partitioner, Chunker, Enrichment, and Embedder nodes.
- The order of the nodes in the workflow_nodes array will be the same order that these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.
- Be sure to specify nodes in the allowed order. The following DAG placements are all allowed:
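Separately from the question of which placements are allowed, the ordering rule in the list above (Source first, then the array's nodes in order, then Destination) can be illustrated with a short sketch. dag_sequence is a hypothetical helper, and the node names and their order in the example array are illustrative assumptions only.

```python
def dag_sequence(workflow_nodes):
    """Return node names in DAG order, including the implicit end nodes."""
    # The Source node comes first and the Destination node last; both are
    # created from source_id and destination_id, not from workflow_nodes.
    return ["Source", *(node["name"] for node in workflow_nodes), "Destination"]


# Illustrative array; the names and the chosen order are assumptions.
workflow_nodes = [
    {"name": "Partitioner", "type": "partition"},
    {"name": "Chunker", "type": "chunk"},
    {"name": "Embedder", "type": "embed"},
]

print(" -> ".join(dag_sequence(workflow_nodes)))
```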
Partitioner node
A Partitioner node has a type of partition and a subtype of unstructured_api. The strategy setting determines the partition strategy to use.
Fast strategy
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "partition" is type=WorkflowNodeType.PARTITION.
- true and false are True and False.
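A minimal sketch of a Fast strategy Partitioner node, parsed from JSON with the Python standard library to show how true and false become True and False. The type and subtype values come from this page; the strategy value fast and the include_page_breaks setting are assumptions used only for illustration.

```python
import json

# JSON form of the node, as it might appear in a curl or Postman request body.
# Assumptions: the "fast" strategy value and the "include_page_breaks" setting.
node_json = """
{
    "name": "Partitioner",
    "type": "partition",
    "subtype": "unstructured_api",
    "settings": {
        "strategy": "fast",
        "include_page_breaks": false
    }
}
"""

node = json.loads(node_json)

# Parsing turns the JSON literal false into the Python literal False.
print(node["settings"])
```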
High Res strategy
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "partition" is type=WorkflowNodeType.PARTITION.
- true and false are True and False.
VLM strategy
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "partition" is type=WorkflowNodeType.PARTITION.
- true and false are True and False.
- null is None.
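A sketch of a client-side check for the provider and model pair of a VLM Partitioner node, using the allowed values listed after this example. The page says the allowed values "include" these, so the table may not be exhaustive. vlm_settings is a hypothetical helper, and where the returned pair sits inside the node is not shown here.

```python
# Provider -> models, copied from the allowed values on this page.
VLM_MODELS = {
    "anthropic": {"claude-3-5-sonnet-20241022"},
    "openai": {"gpt-4o"},
    "bedrock": {
        "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
        "us.anthropic.claude-3-opus-20240229-v1:0",
        "us.anthropic.claude-3-haiku-20240307-v1:0",
        "us.anthropic.claude-3-sonnet-20240229-v1:0",
        "us.amazon.nova-pro-v1:0",
        "us.amazon.nova-lite-v1:0",
        "us.meta.llama3-2-90b-instruct-v1:0",
        "us.meta.llama3-2-11b-instruct-v1:0",
    },
}


def vlm_settings(provider, model):
    """Return the provider/model pair, or raise ValueError if it is not listed."""
    models = VLM_MODELS.get(provider)
    if models is None:
        raise ValueError(f"unknown provider: {provider!r}")
    if model not in models:
        raise ValueError(f"model {model!r} is not listed for provider {provider!r}")
    return {"provider": provider, "model": model}


print(vlm_settings("bedrock", "us.amazon.nova-pro-v1:0"))
```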
Allowed values for provider and model include:
- "provider": "anthropic"
  - "model": "claude-3-5-sonnet-20241022"
- "provider": "openai"
  - "model": "gpt-4o"
- "provider": "bedrock"
  - "model": "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
  - "model": "us.anthropic.claude-3-opus-20240229-v1:0"
  - "model": "us.anthropic.claude-3-haiku-20240307-v1:0"
  - "model": "us.anthropic.claude-3-sonnet-20240229-v1:0"
  - "model": "us.amazon.nova-pro-v1:0"
  - "model": "us.amazon.nova-lite-v1:0"
  - "model": "us.meta.llama3-2-90b-instruct-v1:0"
  - "model": "us.meta.llama3-2-11b-instruct-v1:0"
Chunker node
Chunk by Character strategy
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "chunk" is type=WorkflowNodeType.CHUNK.
- true and false are True and False.
- null is None.
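The type mapping in the notes above follows one pattern for every node kind: the lowercase JSON value corresponds to an uppercase enum member. The sketch below uses a local stand-in for the SDK's WorkflowNodeType enum that contains only the four members named on this page. It is not the SDK class itself, and sdk_type_spelling is a hypothetical helper.

```python
from enum import Enum


class WorkflowNodeType(Enum):
    """Local stand-in for the SDK enum, limited to members named on this page."""

    PARTITION = "partition"
    CHUNK = "chunk"
    PROMPTER = "prompter"
    EMBED = "embed"


def sdk_type_spelling(json_type):
    """Return the Python SDK spelling for a JSON "type" value."""
    member = WorkflowNodeType(json_type)  # raises ValueError if unknown
    return f"WorkflowNodeType.{member.name}"


print(sdk_type_spelling("chunk"))
```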
Chunk by Title strategy
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "chunk" is type=WorkflowNodeType.CHUNK.
- true and false are True and False.
- null is None.
Chunk by Page strategy
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "chunk" is type=WorkflowNodeType.CHUNK.
- true and false are True and False.
- null is None.
Chunk by Similarity strategy
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "chunk" is type=WorkflowNodeType.CHUNK.
- true and false are True and False.
- null is None.
Enrichment node
Image Description task
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "prompter" is type=WorkflowNodeType.PROMPTER.

Allowed values for <subtype> include:
- openai_image_description
- anthropic_image_description
- bedrock_image_description
Table Description task
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "prompter" is type=WorkflowNodeType.PROMPTER.

Allowed values for <subtype> include:
- openai_table_description
- anthropic_table_description
- bedrock_table_description
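The image and table description subtypes listed on this page share a provider_task naming pattern, so a node for either task can be assembled from its two parts. The following is a sketch only: enrichment_node is a hypothetical helper, the node names are illustrative, and the empty settings object is an assumption.

```python
# Providers and tasks taken from the subtype values listed on this page.
PROVIDERS = ("openai", "anthropic", "bedrock")
TASKS = ("image_description", "table_description")


def enrichment_node(name, provider, task):
    """Build an Enrichment node dict whose subtype is '<provider>_<task>'."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return {
        "name": name,
        "type": "prompter",
        "subtype": f"{provider}_{task}",
        "settings": {},  # assumption: no settings are needed for these tasks
    }


print(enrichment_node("Image Description", "anthropic", "image_description"))
```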
Table to HTML task
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "prompter" is type=WorkflowNodeType.PROMPTER.
Named Entity Recognition (NER) task
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "prompter" is type=WorkflowNodeType.PROMPTER.
Embedder node
The following example is for curl and Postman. For the Python SDK:
- The name, type, subtype, and settings parameters must be followed by an equals character (=) instead of a colon character (:).
- "type": "embed" is type=WorkflowNodeType.EMBED.
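A sketch of an Embedder node built as a Python dict, checked against the subtype and model_name values listed after this example. embedder_node is a hypothetical helper, the default node name is illustrative, and placing model_name inside settings is an assumption.

```python
# Subtype -> model names, copied from the allowed values on this page.
EMBEDDING_MODELS = {
    "azure_openai": (
        "text-embedding-3-small",
        "text-embedding-3-large",
        "text-embedding-ada-002",
    ),
    "bedrock": (
        "amazon.titan-embed-text-v2:0",
        "amazon.titan-embed-text-v1",
        "amazon.titan-embed-image-v1",
        "cohere.embed-english-v3",
        "cohere.embed-multilingual-v3",
    ),
    "togetherai": (
        "togethercomputer/m2-bert-80M-2k-retrieval",
        "togethercomputer/m2-bert-80M-8k-retrieval",
        "togethercomputer/m2-bert-80M-32k-retrieval",
    ),
}


def embedder_node(subtype, model_name, name="Embedder"):
    """Build an Embedder node dict after checking the subtype/model pair."""
    if model_name not in EMBEDDING_MODELS.get(subtype, ()):
        raise ValueError(f"{model_name!r} is not listed for subtype {subtype!r}")
    return {
        "name": name,
        "type": "embed",
        "subtype": subtype,
        # Assumption: model_name is passed inside the settings object.
        "settings": {"model_name": model_name},
    }


print(embedder_node("azure_openai", "text-embedding-3-small"))
```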
Allowed values for subtype and model_name include:
- "subtype": "azure_openai"
  - "model_name": "text-embedding-3-small"
  - "model_name": "text-embedding-3-large"
  - "model_name": "text-embedding-ada-002"
- "subtype": "bedrock"
  - "model_name": "amazon.titan-embed-text-v2:0"
  - "model_name": "amazon.titan-embed-text-v1"
  - "model_name": "amazon.titan-embed-image-v1"
  - "model_name": "cohere.embed-english-v3"
  - "model_name": "cohere.embed-multilingual-v3"
- "subtype": "togetherai"
  - "model_name": "togethercomputer/m2-bert-80M-2k-retrieval"
  - "model_name": "togethercomputer/m2-bert-80M-8k-retrieval"
  - "model_name": "togethercomputer/m2-bert-80M-32k-retrieval"