Type: chunk Subtype: chunk_by_table_merging

Usage guidance

Use this strategy when documents contain long tables that span multiple pages. It acts only on Table elements that are on adjacent pages and have a metadata.text_as_html field. It pairs well with a table2html enrichment earlier in the workflow.
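To gauge whether a document is likely to benefit, the eligibility rule above can be approximated on partitioned output. The following is a minimal sketch, not Unstructured's actual selection logic. It assumes elements are dictionaries in the usual Unstructured JSON shape (type, metadata.page_number, metadata.text_as_html), and the helper name candidate_table_pairs is hypothetical.

```python
from typing import Any

Element = dict[str, Any]


def candidate_table_pairs(elements: list[Element]) -> list[tuple[Element, Element]]:
    """Pair up neighboring eligible tables that sit on adjacent pages.

    Assumption: "eligible" means type == "Table", a non-empty
    metadata.text_as_html, and an integer metadata.page_number.
    """
    tables = [
        el for el in elements
        if el.get("type") == "Table"
        and el.get("metadata", {}).get("text_as_html")
        and isinstance(el.get("metadata", {}).get("page_number"), int)
    ]
    pairs = []
    # Compare each eligible table with the next eligible table in document order.
    for first, second in zip(tables, tables[1:]):
        gap = second["metadata"]["page_number"] - first["metadata"]["page_number"]
        if gap == 1:
            pairs.append((first, second))
    return pairs


# Illustrative data only; not taken from a real document.
sample = [
    {"type": "Table", "metadata": {"page_number": 1, "text_as_html": "<table>a</table>"}},
    {"type": "NarrativeText", "metadata": {"page_number": 2}},
    {"type": "Table", "metadata": {"page_number": 2, "text_as_html": "<table>b</table>"}},
    {"type": "Table", "metadata": {"page_number": 4, "text_as_html": "<table>c</table>"}},
    {"type": "Table", "metadata": {"page_number": 5}},  # no text_as_html, so ignored
]
pairs = candidate_table_pairs(sample)
print(len(pairs))
```

If this returns no pairs for a representative document, the chunker has nothing to merge, and the table2html enrichment or the partitioning step is worth checking first.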

Settings

table_merging_provider (string, required)
The LLM provider to use for merge decisions. Allowed values: anthropic, openai, bedrock, vertexai. Default: none.

table_merging_model (string)
The model name for the selected provider. Defaults to the provider’s default. For a full list of the models available in Unstructured, see Available models.

confidence_threshold (string)
Minimum LLM confidence required to merge a table pair. Allowed values: low, medium, high. Default: medium.

table_merging_max_concurrency (integer)
Maximum number of merge requests to run at once. Minimum: 1. Default: 4.
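from unstructured_client.models.shared import WorkflowNode

# Replace each <placeholder> with a real value before running.
chunk_by_table_merging_chunker_workflow_node = WorkflowNode(
    name="Chunker",
    subtype="chunk_by_table_merging",
    type="chunk",
    settings={
        "table_merging_provider": "<provider>",  # required: anthropic, openai, bedrock, or vertexai
        "table_merging_model": "<model>",  # optional; defaults to the provider's default
        "confidence_threshold": "<low|medium|high>",  # default: medium
        "table_merging_max_concurrency": <table-merging-max-concurrency>  # integer, minimum 1; default: 4
    }
)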
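Because the placeholders above must be filled in by hand, it can help to validate the values before creating the node. The following is a minimal sketch that encodes only the constraints listed under Settings. The helper name build_table_merging_settings is hypothetical and not part of the Unstructured SDK, and leaving out table_merging_model to get the provider's default is an assumption based on the setting's description.

```python
from typing import Any, Optional

# Allowed values as listed in the Settings section.
ALLOWED_PROVIDERS = {"anthropic", "openai", "bedrock", "vertexai"}
ALLOWED_THRESHOLDS = {"low", "medium", "high"}


def build_table_merging_settings(
    provider: str,
    model: Optional[str] = None,
    confidence_threshold: str = "medium",
    max_concurrency: int = 4,
) -> dict[str, Any]:
    """Return a settings dict for a chunk_by_table_merging node, or raise ValueError."""
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"table_merging_provider must be one of {sorted(ALLOWED_PROVIDERS)}")
    if confidence_threshold not in ALLOWED_THRESHOLDS:
        raise ValueError(f"confidence_threshold must be one of {sorted(ALLOWED_THRESHOLDS)}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_concurrency, bool) or not isinstance(max_concurrency, int):
        raise ValueError("table_merging_max_concurrency must be an integer")
    if max_concurrency < 1:
        raise ValueError("table_merging_max_concurrency must be at least 1")

    settings: dict[str, Any] = {
        "table_merging_provider": provider,
        "confidence_threshold": confidence_threshold,
        "table_merging_max_concurrency": max_concurrency,
    }
    # Assumption: omitting the model lets the provider's default apply.
    if model is not None:
        settings["table_merging_model"] = model
    return settings


settings = build_table_merging_settings("anthropic", confidence_threshold="high")
print(settings)
```

The resulting dictionary can then be passed as the settings argument of the WorkflowNode shown above.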