If WorkflowType is set to CUSTOM (for the Python SDK), or if workflow_type is set to custom (for curl or Postman), you must also specify the settings for the workflow’s directed acyclic graph (DAG) nodes. These nodes’ settings are specified in the workflow_nodes array.
  • A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.
  • A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.
  • You can specify Partitioner, Enrichment, Chunker, and Embedder nodes.
    Unstructured can generate image summary descriptions, table summary descriptions, table-to-HTML output, and generative OCR optimizations only for workflows that are configured in one of the following ways:
    • With a Partitioner node set to use the Auto or High Res partitioning strategy, and with an image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node added.
    • With a Partitioner node set to use the VLM partitioning strategy. No image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is needed (or allowed).
    Even with these configurations, Unstructured actually generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and are also eligible for processing with the following partitioning strategies:
    • High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
    • VLM or High Res, when the workflow’s Partitioner node is set to use VLM.
    Unstructured never generates image summary descriptions, table summary descriptions, or table-to-HTML output for workflows that are configured as follows:
    • With a Partitioner node set to use the Fast partitioning strategy.
    • With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for any file that does not contain images or tables.
    Unstructured never produces generative OCR optimizations for workflows with a Partitioner node set to use the Fast partitioning strategy.
  • The order of the nodes in the workflow_nodes array is the same order in which these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.
  • Be sure to specify nodes in the allowed order. The following DAG placements are all allowed:
For workflows that use Chunker and enrichment nodes together, the Chunker node should be placed after all enrichment nodes. Placing the Chunker node before any enrichment nodes could cause incomplete or no enrichment results to be generated.
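The ordering rules above can be checked before a workflow is submitted. The following is a minimal sketch in plain Python that makes no API calls. The "name" and "type" keys, the type strings, and the helper function names are assumptions made for illustration only; they are not the confirmed request schema for workflow_nodes.

```python
# Illustrative sketch only. The "name" and "type" keys and the type strings
# below are assumptions for this example, not a confirmed request schema.
ENRICHMENT = "enrichment"
CHUNKER = "chunker"


def dag_order(workflow_nodes):
    """Return node names in DAG order: Source, the array in order, Destination."""
    return ["Source"] + [node["name"] for node in workflow_nodes] + ["Destination"]


def chunker_before_enrichment(workflow_nodes):
    """Return True if any enrichment node appears after a Chunker node."""
    seen_chunker = False
    for node in workflow_nodes:
        if node["type"] == CHUNKER:
            seen_chunker = True
        elif node["type"] == ENRICHMENT and seen_chunker:
            return True
    return False


# Hypothetical node list: enrichment runs before chunking, as recommended.
workflow_nodes = [
    {"name": "Partitioner", "type": "partitioner"},
    {"name": "Image summarizer", "type": ENRICHMENT},
    {"name": "Chunker", "type": CHUNKER},
    {"name": "Embedder", "type": "embedder"},
]

print(dag_order(workflow_nodes))
print(chunker_before_enrichment(workflow_nodes))
```

Running a check like chunker_before_enrichment on the node list before creating the workflow is one way to catch a misplaced Chunker node early, rather than discovering incomplete enrichment results after a job has run.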