To list workflows, use the UnstructuredClient object’s workflows.list_workflows function (for the Python SDK) or the GET method to call the /workflows endpoint (for curl or Postman). Learn more.
To get information about a workflow, use the UnstructuredClient object’s workflows.get_workflow function (for the Python SDK) or the GET method to call the /workflows/<workflow-id> endpoint (for curl or Postman). Learn more.
To create a workflow, use the UnstructuredClient object’s workflows.create_workflow function (for the Python SDK) or the POST method to call the /workflows endpoint (for curl or Postman). Learn more.
To run a workflow, use the UnstructuredClient object’s workflows.run_workflow function (for the Python SDK) or the POST method to call the /workflows/<workflow-id>/run endpoint (for curl or Postman). Learn more.
To update a workflow, use the UnstructuredClient object’s workflows.update_workflow function (for the Python SDK) or the PUT method to call the /workflows/<workflow-id> endpoint (for curl or Postman). Learn more.
To delete a workflow, use the UnstructuredClient object’s workflows.delete_workflow function (for the Python SDK) or the DELETE method to call the /workflows/<workflow-id> endpoint (for curl or Postman). Learn more.
To create a workflow, use the UnstructuredClient object’s workflows.create_workflow function (for the Python SDK) or the POST method to call the /workflows endpoint (for curl or Postman).
In the CreateWorkflow object (for the Python SDK) or the request body (for curl or Postman), specify the settings for the workflow, as follows:
Python SDK (remote source and remote destination)
Python SDK (local source and local destination)
Do not specify a source_id or destination_id value. Also, the workflow_type must be set to CUSTOM. … curl or Postman. Learn how.
Python SDK (local source and remote destination)
Specify a destination_id value, and do not specify a source_id value. Also, the workflow_type must be set to CUSTOM. … curl or Postman. Learn how.
Python SDK (async) (remote source and remote destination)
Python SDK (async) (local source and local destination)
Do not specify a source_id or destination_id value. Also, the workflow_type must be set to CUSTOM. … curl or Postman. Learn how.
Python SDK (async) (local source and remote destination)
Specify a destination_id value, and do not specify a source_id value. Also, the workflow_type must be set to CUSTOM. … curl or Postman. Learn how.
curl (remote source and remote destination)
curl (local source and local destination)
Do not specify a source_id or destination_id value. Also, the workflow_type must be set to custom. … curl (or Postman). Learn how.
curl (local source and remote destination)
Specify a destination_id value, and do not specify a source_id value. Also, the workflow_type must be set to custom. … curl (or Postman). Learn how.
Postman (remote source and remote destination)
Key: unstructured-api-key, Value: {{UNSTRUCTURED_API_KEY}}
Key: accept, Value: application/json
Postman (local source and local destination)
Do not specify a source_id or destination_id value. Also, the workflow_type must be set to custom. … curl. Learn how.
Key: unstructured-api-key, Value: {{UNSTRUCTURED_API_KEY}}
Key: accept, Value: application/json
Postman (local source and remote destination)
Specify a destination_id value, and do not specify a source_id value. Also, the workflow_type must be set to custom. … curl. Learn how.
Key: unstructured-api-key, Value: {{UNSTRUCTURED_API_KEY}}
Key: accept, Value: application/json
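For curl or Postman users, the six operations summarized at the start of this section correspond to plain HTTP requests. The following minimal Python sketch uses only the standard library. It builds those requests without sending them and attaches the same two headers shown in the Postman tabs. Several parts are assumptions rather than documented API: the base URL, the content-type header for JSON bodies, and the helper names OPERATIONS and build_request. Check the endpoint URL for your account before sending anything.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: replace with the Workflow Endpoint URL for your account.
DEFAULT_BASE_URL = "https://platform.unstructuredapp.io/api/v1"

# HTTP method and path template for each workflow operation described above.
OPERATIONS = {
    "list_workflows": ("GET", "/workflows"),
    "get_workflow": ("GET", "/workflows/{workflow_id}"),
    "create_workflow": ("POST", "/workflows"),
    "run_workflow": ("POST", "/workflows/{workflow_id}/run"),
    "update_workflow": ("PUT", "/workflows/{workflow_id}"),
    "delete_workflow": ("DELETE", "/workflows/{workflow_id}"),
}


def build_request(
    operation: str,
    api_key: str,
    workflow_id: Optional[str] = None,
    body: Optional[dict] = None,
    base_url: str = DEFAULT_BASE_URL,
) -> urllib.request.Request:
    """Build, but do not send, the HTTP request for one workflow operation."""
    method, template = OPERATIONS[operation]
    if "{workflow_id}" in template and not workflow_id:
        raise ValueError(f"{operation} requires a workflow_id")
    # Percent-encode the ID so it cannot change the path structure.
    path = template.format(workflow_id=urllib.parse.quote(workflow_id or "", safe=""))
    headers = {
        "unstructured-api-key": api_key,  # same header as in the Postman tabs
        "accept": "application/json",
    }
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["content-type"] = "application/json"  # assumption for JSON bodies
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


# Example: the request that would run the workflow with ID "abc-123".
req = build_request("run_workflow", api_key="YOUR_API_KEY", workflow_id="abc-123")
print(req.get_method(), req.full_url)
# POST https://platform.unstructuredapp.io/api/v1/workflows/abc-123/run
```

To send a request, pass it to urllib.request.urlopen. urllib normalizes header-name capitalization (for example, to Unstructured-api-key), which is harmless because HTTP header names are case-insensitive.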
<name> (required) - A unique name for this workflow.
<source-connector-id> (required, except when using a local source) - The ID of the target source connector. To get the ID, use the UnstructuredClient object’s sources.list_sources function (for the Python SDK) or the GET method to call the /sources endpoint (for curl or Postman). Learn more.
<destination-connector-id> (required, except when using a local destination) - The ID of the target destination connector. To get the ID, use the UnstructuredClient object’s destinations.list_destinations function (for the Python SDK) or the GET method to call the /destinations endpoint (for curl or Postman). Learn more.
<TYPE> (for the Python SDK) or <type> (for curl or Postman) (required) - The workflow type. Available values include CUSTOM (for the Python SDK) and custom (for curl or Postman).
If <TYPE> is set to CUSTOM (for the Python SDK), or if <type> is set to custom (for curl or Postman), you must add a workflow_nodes array. For instructions, see Custom workflow DAG nodes.
The workflow types ADVANCED, BASIC, and PLATINUM (for the Python SDK) and advanced, basic, and platinum (for curl or Postman) are non-operational and planned to be fully removed in a future release.
Creating an automatic workflow type is currently not available but is planned for a future release.
<schedule-timeframe> - The repeating automatic run schedule, specified as a predefined phrase. The available predefined phrases are:
every 15 minutes (for curl or Postman): Every 15 minutes (cron expression: */15 * * * *).
every hour: At the first minute of every hour (cron expression: 0 * * * *).
every 2 hours: At the first minute of every second hour (cron expression: 0 */2 * * *).
every 4 hours: At the first minute of every fourth hour (cron expression: 0 */4 * * *).
every 6 hours: At the first minute of every sixth hour (cron expression: 0 */6 * * *).
every 8 hours: At the first minute of every eighth hour (cron expression: 0 */8 * * *).
every 10 hours: At the first minute of every tenth hour (cron expression: 0 */10 * * *).
every 12 hours: At the first minute of every twelfth hour (cron expression: 0 */12 * * *).
daily: At the first minute of every day (cron expression: 0 0 * * *).
weekly: At the first minute of every Sunday (cron expression: 0 0 * * 0).
monthly: At the first minute of the first day of every month (cron expression: 0 0 1 * *).
If schedule is not specified, the workflow does not automatically run on a repeating schedule.
Workflows with a local source cannot be set to run on a repeating schedule.
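The following minimal Python sketch makes the rules above concrete. It assembles a create-workflow request body in the curl/Postman form and rejects two combinations the text rules out:

- an unknown schedule phrase
- a schedule on a workflow with a local source (no source_id)

It assumes the JSON field for <name> is called name. The function name build_create_workflow_body is hypothetical, and the service itself remains the authority on validation.

```python
from typing import Any, Optional

# Predefined schedule phrases and their cron equivalents, as listed above.
SCHEDULE_CRON = {
    "every 15 minutes": "*/15 * * * *",
    "every hour": "0 * * * *",
    "every 2 hours": "0 */2 * * *",
    "every 4 hours": "0 */4 * * *",
    "every 6 hours": "0 */6 * * *",
    "every 8 hours": "0 */8 * * *",
    "every 10 hours": "0 */10 * * *",
    "every 12 hours": "0 */12 * * *",
    "daily": "0 0 * * *",
    "weekly": "0 0 * * 0",
    "monthly": "0 0 1 * *",
}


def build_create_workflow_body(
    name: str,
    workflow_nodes: list[dict[str, Any]],
    source_id: Optional[str] = None,
    destination_id: Optional[str] = None,
    schedule: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a curl/Postman-style create-workflow body and check the documented rules."""
    if schedule is not None:
        if schedule not in SCHEDULE_CRON:
            raise ValueError(f"unknown schedule phrase: {schedule!r}")
        if source_id is None:
            # No source_id means a local source, which cannot be scheduled.
            raise ValueError("workflows with a local source cannot run on a schedule")
    body: dict[str, Any] = {
        "name": name,  # assumption: JSON field name for <name>
        "workflow_type": "custom",  # the only operational workflow type
        "workflow_nodes": workflow_nodes,  # required when the type is custom
    }
    # Omit source_id / destination_id for a local source or destination.
    if source_id is not None:
        body["source_id"] = source_id
    if destination_id is not None:
        body["destination_id"] = destination_id
    if schedule is not None:
        body["schedule"] = schedule
    return body


# Example: a remote-to-remote workflow that runs daily at midnight (0 0 * * *).
body = build_create_workflow_body(
    "nightly-run",
    workflow_nodes=[],
    source_id="<source-connector-id>",
    destination_id="<destination-connector-id>",
    schedule="daily",
)
```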
To update a workflow, use the UnstructuredClient object’s workflows.update_workflow function (for the Python SDK) or the PUT method to call the /workflows/<workflow-id> endpoint (for curl or Postman), replacing <workflow-id> with the workflow’s unique ID. To get this ID, see List workflows.
In the request body, specify the settings for the workflow. For the specific settings to include, see Create a workflow.
Python SDK
Python SDK (async)
curl
Postman
Key: unstructured-api-key, Value: {{UNSTRUCTURED_API_KEY}}
Key: accept, Value: application/json
If WorkflowType is set to CUSTOM (for the Python SDK), or if workflow_type is set to custom (for curl or Postman), you must also specify the settings for the workflow’s directed acyclic graph (DAG) nodes. These nodes’ settings are specified in the workflow_nodes array.
A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.
A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.
The order of the nodes in the workflow_nodes array is the order in which these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.
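The ordering rule above can be checked with a short Python sketch. The node key layout (name, type, subtype, settings) is an assumption for illustration; this section documents only type, subtype, and settings. The helpers make_node and dag_order are hypothetical. dag_order returns node labels in DAG order: Source, then the array entries in array order, then Destination.

```python
from typing import Any, Optional

# Node types documented in this section.
NODE_TYPES = {"partition", "chunk", "prompter", "embed"}


def make_node(
    node_type: str,
    settings: dict[str, Any],
    subtype: Optional[str] = None,
    name: Optional[str] = None,
) -> dict[str, Any]:
    """Build one workflow_nodes entry (the "name" key is an assumption)."""
    if node_type not in NODE_TYPES:
        # Source and Destination nodes come from source_id / destination_id, not the array.
        raise ValueError(f"unsupported node type: {node_type!r}")
    node: dict[str, Any] = {"name": name or node_type, "type": node_type, "settings": settings}
    if subtype is not None:
        node["subtype"] = subtype
    return node


def dag_order(workflow_nodes: list[dict[str, Any]]) -> list[str]:
    """Source first, then the array's nodes in array order, then Destination."""
    return ["Source", *(node["name"] for node in workflow_nodes), "Destination"]


nodes = [
    make_node("partition", {"strategy": "fast"}),
    make_node("chunk", {"max_characters": 1000}),
]
print(dag_order(nodes))  # ['Source', 'partition', 'chunk', 'Destination']
```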
A Partitioner node has a type of partition.
Learn about the available partitioning strategies.
Python SDK
curl, Postman
For the Auto partitioning strategy, the settings include:
strategy: Required. The partitioning strategy to use. This field must be set to auto.
provider: Optional. If the Auto partitioning strategy needs to use the VLM partitioning strategy, then use the specified VLM provider. Allowed values include auto, openai, anthropic, and bedrock. The default value is anthropic.
provider_api_key: Optional. If specified, use a non-default API key for calls to the specified VLM provider as needed. The default is none, which means to rely on Unstructured’s internal default API key for the VLM provider.
model: Optional. If the Auto partitioning strategy needs to use the VLM partitioning strategy, then use the specified VLM. The default value is claude-3-5-sonnet-20241022.
If provider is openai, available values for model are gpt-4o and gpt-4o-mini.
If provider is anthropic, available values for model are claude-3-5-sonnet-20241022 and claude-3-7-sonnet-20250219.
If provider is bedrock, available values for model are:
us.amazon.nova-lite-v1:0
us.amazon.nova-pro-v1:0
us.anthropic.claude-3-opus-20240229-v1:0
us.anthropic.claude-3-haiku-20240307-v1:0
us.anthropic.claude-3-sonnet-20240229-v1:0
us.anthropic.claude-3-5-sonnet-20241022-v2:0
us.meta.llama3-2-11b-instruct-v1:0
us.meta.llama3-2-90b-instruct-v1:0
output_format: Optional. The format of the response. Allowed values include text/html and application/json. The default is text/html.
prompt.text: Optional. If the Auto partitioning strategy needs to use the VLM partitioning strategy, then use the specified prompt when calling the specified VLM. The default value is none, which means to rely on Unstructured’s internal default prompt when calling the VLM.
format_html: Optional. If the Auto partitioning strategy needs to use the VLM partitioning strategy: true (the default) to apply Beautiful Soup’s prettify method to the HTML that the VLM partitioner generates, which, for example, adds indentation for better readability.
unique_element_ids: Optional. True (the default) to assign UUIDs to element IDs, which guarantees their uniqueness. This is useful, for example, when using them as primary keys in a database. False to assign a SHA-256 of the element’s text as its element ID.
is_dynamic: Required. True to enable dynamic routing of pages to Fast, High Res, or VLM as needed for better overall performance and cost savings.
allow_fast: Required. True to allow routing of pages to Fast as needed for better overall performance and cost savings.
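The following minimal Python sketch assembles the settings for an Auto-strategy Partitioner node. It makes the two required booleans (is_dynamic, allow_fast) mandatory arguments and adds optional fields only when they are given, so the documented defaults otherwise apply. The function name auto_partition_settings is hypothetical, and the check of provider and output_format against the listed values is client-side only.

```python
from typing import Any, Optional

PROVIDERS = {"auto", "openai", "anthropic", "bedrock"}
OUTPUT_FORMATS = {"text/html", "application/json"}


def auto_partition_settings(
    is_dynamic: bool,
    allow_fast: bool,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    output_format: Optional[str] = None,
) -> dict[str, Any]:
    """Settings for a Partitioner node using the Auto strategy."""
    settings: dict[str, Any] = {
        "strategy": "auto",  # must be "auto" for this strategy
        "is_dynamic": is_dynamic,  # required
        "allow_fast": allow_fast,  # required
    }
    if provider is not None:
        if provider not in PROVIDERS:
            raise ValueError(f"unsupported provider: {provider!r}")
        settings["provider"] = provider  # default is anthropic when omitted
    if model is not None:
        settings["model"] = model
    if output_format is not None:
        if output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output_format: {output_format!r}")
        settings["output_format"] = output_format  # default is text/html when omitted
    return settings


partitioner = {"type": "partition", "settings": auto_partition_settings(is_dynamic=True, allow_fast=True)}
```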
Python SDK
curl, Postman
For the VLM partitioning strategy, the settings include:
provider: Optional. Use the specified VLM provider. Allowed values include auto, openai, anthropic, and bedrock. The default value is anthropic.
provider_api_key: Optional. If specified, use a non-default API key for calls to the specified VLM provider as needed. The default is none, which means to rely on Unstructured’s internal default API key for the VLM provider.
model: Optional. Use the specified VLM. The default value is claude-3-5-sonnet-20241022.
If provider is openai, available values for model are gpt-4o and gpt-4o-mini.
If provider is anthropic, available values for model are claude-3-5-sonnet-20241022 and claude-3-7-sonnet-20250219.
If provider is bedrock, available values for model are:
us.amazon.nova-lite-v1:0
us.amazon.nova-pro-v1:0
us.anthropic.claude-3-opus-20240229-v1:0
us.anthropic.claude-3-haiku-20240307-v1:0
us.anthropic.claude-3-sonnet-20240229-v1:0
us.anthropic.claude-3-5-sonnet-20241022-v2:0
us.meta.llama3-2-11b-instruct-v1:0
us.meta.llama3-2-90b-instruct-v1:0
output_format: Optional. The format of the response. Allowed values include text/html and application/json. The default is text/html.
prompt.text: Optional. Use the specified prompt when calling the specified VLM. The default value is none, which means to rely on Unstructured’s internal default prompt when calling the VLM.
format_html: Optional. True (the default) to apply Beautiful Soup’s prettify method to the HTML that the VLM partitioner generates, which, for example, adds indentation for better readability.
unique_element_ids: Optional. True (the default) to assign UUIDs to element IDs, which guarantees their uniqueness. This is useful, for example, when using them as primary keys in a database. False to assign a SHA-256 of the element’s text as its element ID.
is_dynamic: Required. False to use the VLM strategy.
allow_fast: Optional. True (the default) to allow routing of pages to Fast as needed for better overall performance and cost savings.
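Because each VLM provider accepts only certain model values, a client-side check can catch mismatches before a workflow is created. The following minimal Python sketch copies the provider/model pairs from the lists above. No model list is given for provider auto, so the sketch does not check the model in that case. It assumes that the dotted name prompt.text maps to a nested JSON object. The function name vlm_settings is hypothetical.

```python
from typing import Any, Optional

# Provider -> allowed model values, as listed above.
VLM_MODELS = {
    "openai": {"gpt-4o", "gpt-4o-mini"},
    "anthropic": {"claude-3-5-sonnet-20241022", "claude-3-7-sonnet-20250219"},
    "bedrock": {
        "us.amazon.nova-lite-v1:0",
        "us.amazon.nova-pro-v1:0",
        "us.anthropic.claude-3-opus-20240229-v1:0",
        "us.anthropic.claude-3-haiku-20240307-v1:0",
        "us.anthropic.claude-3-sonnet-20240229-v1:0",
        "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
        "us.meta.llama3-2-11b-instruct-v1:0",
        "us.meta.llama3-2-90b-instruct-v1:0",
    },
}


def vlm_settings(
    provider: str = "anthropic",
    model: str = "claude-3-5-sonnet-20241022",
    prompt_text: Optional[str] = None,
) -> dict[str, Any]:
    """Settings for a Partitioner node using the VLM strategy."""
    if provider != "auto":  # no model list is documented for "auto"
        if provider not in VLM_MODELS:
            raise ValueError(f"unsupported provider: {provider!r}")
        if model not in VLM_MODELS[provider]:
            raise ValueError(f"{model!r} is not listed for provider {provider!r}")
    settings: dict[str, Any] = {"is_dynamic": False, "provider": provider, "model": model}
    if prompt_text is not None:
        # Assumption: prompt.text is sent as a nested object.
        settings["prompt"] = {"text": prompt_text}
    return settings
```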
Python SDK
curl, Postman
For the High Res partitioning strategy, the settings include:
strategy: Required. The partitioning strategy to use. This field must be set to hi_res.
include_page_breaks: Optional. True to include page breaks in the output if supported by the file type. The default is false.
pdf_infer_table_structure: Optional. True for any Table elements extracted from a PDF to include an additional metadata field, text_as_html, where the value (string) is just a transformation of the data into an HTML table. The default is false.
exclude_elements: Optional. A list of any Unstructured element types to exclude from the output. The default is none. Available values include:
FigureCaption
NarrativeText
ListItem
Title
Address
Table
PageBreak
Header
Footer
UncategorizedText
Image
Formula
EmailAddress
xml_keep_tags: Optional. True to retain any XML tags in the output. False (the default) to just extract the text from any XML tags instead.
encoding: Optional. The encoding method used to decode the text input. The default is utf-8.
ocr_languages: Optional. A list of languages present in the input, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages. The default is [ 'eng' ]. See the language codes list.
extract_image_block_types: Optional. A list of the Unstructured element types for use in extracting image blocks as Base64-encoded data stored in metadata fields. Available values include Image and Table. The default is [ 'Image', 'Table' ].
infer_table_structure: Optional. True to have any table elements extracted from a PDF include an additional metadata field named text_as_html, containing an HTML <table> transformation. The default is false.
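The following minimal Python sketch builds hi_res settings and validates the two list-valued fields against the values documented above. Values in exclude_elements must come from the listed element types, and values in extract_image_block_types must be Image or Table. The documented defaults are filled in for ocr_languages and extract_image_block_types. exclude_elements is added only when given, because its default is none. The function name hi_res_settings is hypothetical.

```python
from typing import Any, Iterable, Optional

# Element types accepted by exclude_elements, as listed above.
ELEMENT_TYPES = frozenset({
    "FigureCaption", "NarrativeText", "ListItem", "Title", "Address", "Table",
    "PageBreak", "Header", "Footer", "UncategorizedText", "Image", "Formula",
    "EmailAddress",
})
IMAGE_BLOCK_TYPES = frozenset({"Image", "Table"})


def hi_res_settings(
    exclude_elements: Optional[Iterable[str]] = None,
    ocr_languages: Iterable[str] = ("eng",),
    extract_image_block_types: Iterable[str] = ("Image", "Table"),
    infer_table_structure: bool = False,
) -> dict[str, Any]:
    """Settings for a Partitioner node using the hi_res strategy."""
    blocks = list(extract_image_block_types)
    bad_blocks = set(blocks) - IMAGE_BLOCK_TYPES
    if bad_blocks:
        raise ValueError(f"unsupported image block types: {sorted(bad_blocks)}")
    settings: dict[str, Any] = {
        "strategy": "hi_res",
        "ocr_languages": list(ocr_languages),
        "extract_image_block_types": blocks,
        "infer_table_structure": infer_table_structure,
    }
    if exclude_elements is not None:
        excluded = list(exclude_elements)
        unknown = set(excluded) - ELEMENT_TYPES
        if unknown:
            raise ValueError(f"unknown element types: {sorted(unknown)}")
        settings["exclude_elements"] = excluded
    return settings
```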
Python SDK
curl, Postman
For the Fast partitioning strategy, the settings include:
strategy: Required. The partitioning strategy to use. This field must be set to fast.
pdf_infer_table_structure: Optional. Although this field is listed, it applies only to the hi_res strategy and does not work if set to true. The default is false.
exclude_elements: Optional. A list of any Unstructured element types to exclude from the output. The default is none. Available values include:
FigureCaption
NarrativeText
ListItem
Title
Address
Table
PageBreak
Header
Footer
UncategorizedText
Image
Formula
EmailAddress
xml_keep_tags: Optional. True to retain any XML tags in the output. False (the default) to just extract the text from any XML tags instead.
encoding: Optional. The encoding method used to decode the text input. The default is utf-8.
ocr_languages: Optional. A list of languages present in the input, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages. The default is [ 'eng' ]. See the language codes list.
extract_image_block_types: Optional. A list of the Unstructured element types for use in extracting image blocks as Base64-encoded data stored in metadata fields. Available values include Image and Table. The default is [ 'Image', 'Table' ].
infer_table_structure: Optional. True to have any table elements extracted from a PDF include an additional metadata field named text_as_html, containing an HTML <table> transformation. The default is false.
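The main trap in the Fast settings is pdf_infer_table_structure. It is listed but applies only to hi_res. The following minimal Python sketch refuses to build Fast settings with that flag set to true and passes through any other optional fields unchanged. The function name fast_settings and the choice to raise an error (rather than silently drop the flag) are assumptions for illustration.

```python
from typing import Any


def fast_settings(pdf_infer_table_structure: bool = False, **optional: Any) -> dict[str, Any]:
    """Settings for a Partitioner node using the fast strategy."""
    if pdf_infer_table_structure:
        # Listed for fast, but documented to apply only to hi_res.
        raise ValueError("pdf_infer_table_structure only applies to the hi_res strategy")
    if "strategy" in optional:
        raise ValueError("strategy is fixed to 'fast' in this helper")
    return {"strategy": "fast", "pdf_infer_table_structure": False, **optional}
```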
A Chunker node has a type of chunk.
Learn about the available chunking strategies.
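The settings that follow include max_characters and overlap, and their interaction is easier to see on a toy string. The following is a simplified Python illustration, not Unstructured's chunker. It assumes that the overlap prefix counts toward max_characters, and it cuts at exact character positions, whereas a real chunker may prefer to split at whitespace. The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text: str, max_characters: int, overlap: int = 0) -> list[str]:
    """Text-split one oversized element into windows of at most max_characters.

    Each window after the first starts with the last `overlap` characters of the
    previous window. Simplification: cuts at exact character offsets.
    """
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be >= 0 and smaller than max_characters")
    if not text:
        return []
    chunks: list[str] = []
    start = 0
    while True:
        end = min(start + max_characters, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            return chunks
        start = end - overlap  # step back so the next window repeats the tail


print(split_with_overlap("abcdefghij", max_characters=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Because each step advances by max_characters minus overlap (at least one character), the loop always terminates.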
Python SDK
curl, Postman
The settings include:
unstructured_api_url: Optional. If specified, use a non-default API URL for calls to the specified chunker as needed. The default is none, which means to rely on Unstructured’s internal default API URL for the chunker.
unstructured_api_key: Optional. If specified, use a non-default API key for calls to the specified chunker as needed. The default is none, which means to rely on Unstructured’s internal default API key for the chunker.
include_orig_elements: Optional. True to have the elements that are used to form a chunk appear in .metadata.orig_elements for that chunk. The default is false.
new_after_n_chars: Optional. Closes new sections after they reach a length of this many characters. This is an approximate limit. The default is none.
max_characters: Optional. The absolute maximum number of characters in a chunk. The default is none.
overlap: Optional. Applies a prefix of this many trailing characters from the prior text-split chunk to second and later chunks formed from oversized elements by text-splitting. The default is none.
overlap_all: Optional. True to apply overlap to “normal” chunks formed by combining whole elements. Use with caution, as this can introduce noise into otherwise clean semantic units. The default is false.
contextual_chunking_strategy: Optional. If specified, prepends chunk-specific explanatory context to each chunk. Allowed values include v1. The default is none.
Python SDK
curl, Postman
The settings include:
unstructured_api_url: Optional. If specified, use a non-default API URL for calls to the specified chunker as needed. The default is none, which means to rely on Unstructured’s internal default API URL for the chunker.
unstructured_api_key: Optional. If specified, use a non-default API key for calls to the specified chunker as needed. The default is none, which means to rely on Unstructured’s internal default API key for the chunker.
multipage_sections: Optional. … The default is false.
combine_text_under_n_chars: Optional. Combines elements from a section into a chunk until a section reaches a length of this many characters. The default is none.
include_orig_elements: Optional. True to have the elements that are used to form a chunk appear in .metadata.orig_elements for that chunk. The default is false.
new_after_n_chars: Optional. Closes new sections after they reach a length of this many characters. This is an approximate limit. The default is none.
max_characters: Optional. The absolute maximum number of characters in a chunk. The default is none.
overlap: Optional. Applies a prefix of this many trailing characters from the prior text-split chunk to second and later chunks formed from oversized elements by text-splitting. The default is none.
overlap_all: Optional. True to apply overlap to “normal” chunks formed by combining whole elements. Use with caution, as this can introduce noise into otherwise clean semantic units. The default is false.
contextual_chunking_strategy: Optional. If specified, prepends chunk-specific explanatory context to each chunk. Allowed values include v1. The default is none.
Python SDK
curl, Postman
The settings include:
unstructured_api_url: Optional. If specified, use a non-default API URL for calls to the specified chunker as needed. The default is none, which means to rely on Unstructured’s internal default API URL for the chunker.
unstructured_api_key: Optional. If specified, use a non-default API key for calls to the specified chunker as needed. The default is none, which means to rely on Unstructured’s internal default API key for the chunker.
include_orig_elements: Optional. True to have the elements that are used to form a chunk appear in .metadata.orig_elements for that chunk. The default is false.
new_after_n_chars: Optional. Closes new sections after they reach a length of this many characters. This is an approximate limit. The default is none.
max_characters: Optional. The absolute maximum number of characters in a chunk. The default is none.
overlap: Optional. Applies a prefix of this many trailing characters from the prior text-split chunk to second and later chunks formed from oversized elements by text-splitting. The default is none.
overlap_all: Optional. True to apply overlap to “normal” chunks formed by combining whole elements. Use with caution, as this can introduce noise into otherwise clean semantic units. The default is false.
contextual_chunking_strategy: Optional. If specified, prepends chunk-specific explanatory context to each chunk. Allowed values include v1. The default is none.
Python SDK
curl, Postman
The settings include:
unstructured_api_url: Optional. If specified, use a non-default API URL for calls to the specified chunker as needed. The default is none, which means to rely on Unstructured’s internal default API URL for the chunker.
unstructured_api_key: Optional. If specified, use a non-default API key for calls to the specified chunker as needed. The default is none, which means to rely on Unstructured’s internal default API key for the chunker.
include_orig_elements: Optional. True to have the elements that are used to form a chunk appear in .metadata.orig_elements for that chunk. The default is false.
new_after_n_chars: Optional. Closes new sections after they reach a length of this many characters. This is an approximate limit. The default is none.
max_characters: Optional. The absolute maximum number of characters in a chunk. The default is none.
overlap: Optional. Applies a prefix of this many trailing characters from the prior text-split chunk to second and later chunks formed from oversized elements by text-splitting. The default is none.
overlap_all: Optional. True to apply overlap to “normal” chunks formed by combining whole elements. Use with caution, as this can introduce noise into otherwise clean semantic units. The default is false.
contextual_chunking_strategy: Optional. If specified, prepends chunk-specific explanatory context to each chunk. Allowed values include v1. The default is none.
similarity_threshold: Optional. The minimum similarity that text in consecutive elements must have to be included in the same chunk. This must be a value between 0.0 and 1.0, exclusive (0.01 to 0.99). The default is none.
A Prompter node has a type of prompter.
Learn about the available enrichments.
Python SDK
curl, Postman
For image descriptions, allowed values for <subtype> include:
openai_image_description
anthropic_image_description
bedrock_image_description
Python SDK
curl, Postman
For table descriptions, allowed values for <subtype> include:
openai_table_description
anthropic_table_description
bedrock_table_description
Python SDK
curl, Postman
Python SDK
curl, Postman
prompt_interface_overrides.prompt.user: Optional. Any alternative prompt to use with the underlying NER model. The default is none, which means to rely on Unstructured’s internal default prompt when calling the NER model.
The internal default prompt, which you can override by providing an alternative prompt, is as follows:
… (PERSON, ORGANIZATION, LOCATION, and so on).
… (works_for, based_in, has_role, and so on).
For named entity recognition (NER), allowed values for <subtype> include:
openai_ner
anthropic_ner
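The following minimal Python sketch builds a Prompter node and checks its subtype against the three lists above (image descriptions, table descriptions, and NER). It assumes that the dotted name prompt_interface_overrides.prompt.user maps to nested JSON objects and that nodes carry a name key. The function name prompter_node and the default node name are hypothetical. This section documents the prompt override only for NER, so other subtypes may ignore it.

```python
from typing import Any, Optional

# Enrichment -> allowed <subtype> values, as listed above.
PROMPTER_SUBTYPES = {
    "image_description": {
        "openai_image_description",
        "anthropic_image_description",
        "bedrock_image_description",
    },
    "table_description": {
        "openai_table_description",
        "anthropic_table_description",
        "bedrock_table_description",
    },
    "ner": {"openai_ner", "anthropic_ner"},
}


def prompter_node(subtype: str, user_prompt: Optional[str] = None, name: str = "Enrichment") -> dict[str, Any]:
    """Build a Prompter node entry; key layout and default name are assumptions."""
    if not any(subtype in allowed for allowed in PROMPTER_SUBTYPES.values()):
        raise ValueError(f"unsupported prompter subtype: {subtype!r}")
    settings: dict[str, Any] = {}
    if user_prompt is not None:
        # Assumption: prompt_interface_overrides.prompt.user nests as objects.
        settings["prompt_interface_overrides"] = {"prompt": {"user": user_prompt}}
    return {"name": name, "type": "prompter", "subtype": subtype, "settings": settings}
```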
An Embedder node has a type of embed.
Learn about the available embedding providers and models.
Python SDK
curl, Postman
Allowed values for subtype and model_name include:
"subtype": "azure_openai"
"model_name": "text-embedding-3-small"
"model_name": "text-embedding-3-large"
"model_name": "text-embedding-ada-002"
"subtype": "bedrock"
"model_name": "amazon.titan-embed-text-v2:0"
"model_name": "amazon.titan-embed-text-v1"
"model_name": "amazon.titan-embed-image-v1"
"model_name": "cohere.embed-english-v3"
"model_name": "cohere.embed-multilingual-v3"
"subtype": "togetherai"
"model_name": "togethercomputer/m2-bert-80M-32k-retrieval"
"subtype": "voyageai"
"model_name": "voyage-3"
"model_name": "voyage-3-large"
"model_name": "voyage-3-lite"
"model_name": "voyage-code-3"
"model_name": "voyage-finance-2"
"model_name": "voyage-law-2"
"model_name": "voyage-code-2"
"model_name": "voyage-multimodal-3"
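The subtype/model_name pairs above can be encoded as a lookup table, so that an Embedder node is rejected if it pairs a model with the wrong provider. In the following minimal Python sketch, two points are assumptions for illustration: subtype sits at the node level and model_name sits inside settings. The function name embed_node and the default node name are hypothetical.

```python
from typing import Any

# subtype -> allowed model_name values, as listed above.
EMBED_MODELS = {
    "azure_openai": {"text-embedding-3-small", "text-embedding-3-large", "text-embedding-ada-002"},
    "bedrock": {
        "amazon.titan-embed-text-v2:0",
        "amazon.titan-embed-text-v1",
        "amazon.titan-embed-image-v1",
        "cohere.embed-english-v3",
        "cohere.embed-multilingual-v3",
    },
    "togetherai": {"togethercomputer/m2-bert-80M-32k-retrieval"},
    "voyageai": {
        "voyage-3", "voyage-3-large", "voyage-3-lite", "voyage-code-3",
        "voyage-finance-2", "voyage-law-2", "voyage-code-2", "voyage-multimodal-3",
    },
}


def embed_node(subtype: str, model_name: str, name: str = "Embedder") -> dict[str, Any]:
    """Build an Embedder node entry after checking the subtype/model_name pairing."""
    allowed = EMBED_MODELS.get(subtype)
    if allowed is None:
        raise ValueError(f"unsupported embedding subtype: {subtype!r}")
    if model_name not in allowed:
        raise ValueError(f"{model_name!r} is not listed for subtype {subtype!r}")
    # Assumption: model_name goes inside settings.
    return {"name": name, "type": "embed", "subtype": subtype, "settings": {"model_name": model_name}}
```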