chunk
Subtype: chunk_by_character
Settings
If specified, overrides the default API URL used for chunker calls. Default: none (uses Unstructured’s internal default).
If specified, overrides the default API key used for chunker calls. Default: none (uses Unstructured’s internal default).
If true, the elements used to form a chunk appear in .metadata.orig_elements for that chunk. Default: false.
Soft maximum length of a chunk in characters. Closes a section after reaching approximately this length. Default: none.
Hard maximum number of characters in a chunk. Default: none.
Number of trailing characters from the prior text-split chunk to prepend to each subsequent chunk formed by splitting an oversized element. Default: none.
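The interaction between the hard maximum and the overlap setting can be illustrated with a minimal sketch. This is not Unstructured's implementation: the function name split_with_overlap and its parameters are hypothetical, and it cuts at fixed character offsets, whereas a real chunker may prefer whitespace or sentence boundaries. It only shows how each chunk after the first begins with the trailing characters of the previous one while staying within the hard maximum.

```python
def split_with_overlap(text: str, max_characters: int, overlap: int = 0) -> list[str]:
    """Split one oversized text into chunks of at most max_characters.

    Hypothetical helper for illustration only. Each chunk after the first
    starts with the last `overlap` characters of the previous chunk.
    """
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be >= 0 and smaller than max_characters")

    chunks: list[str] = []
    step = max_characters - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", max_characters=4, overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

In this sketch the overlap counts toward the hard maximum, so a larger overlap means less new text per chunk and more chunks overall.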
If true, applies overlap to chunks formed by combining whole elements, not just oversized ones. Use with caution: this can introduce noise into otherwise clean semantic units. Default: false.
If true, each table is placed in its own dedicated chunk, separate from any surrounding text elements. If false, small tables may share a chunk with adjacent text, producing mixed chunks that contain both text and table content. Default: true.
Regardless of this setting, narrative overlap is never prepended to table-only chunks, and table overlap is never appended to following narrative chunks.
Note that max_characters applies to a chunk’s visible .text content. When a table joins a mixed chunk with isolate_table set to false, the table’s metadata.text_as_html is not subject to the same size budget. If the table has a large HTML representation, the serialized chunk payload may exceed max_characters. Use isolate_table: true if you have strict payload size requirements.
If specified, prepends chunk-specific explanatory context to each chunk. Allowed value: v1. Default: none.
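The note about max_characters and metadata.text_as_html can be checked on your own output with a small sketch like the one below. It assumes each chunk is available as a plain dictionary with a text field and a metadata.text_as_html field, which matches the field names mentioned above but may not match the exact shape of your pipeline's output. The helper name size_report is hypothetical, and the combined character count is only a rough proxy for serialized payload size.

```python
def size_report(chunk: dict, max_characters: int) -> dict:
    """Compare a chunk's visible text and table HTML against a size budget.

    Hypothetical helper: assumes the chunk is a dict with a "text" key and
    an optional "metadata" dict that may hold "text_as_html".
    """
    text_len = len(chunk.get("text", ""))
    html_len = len(chunk.get("metadata", {}).get("text_as_html") or "")
    return {
        "text_chars": text_len,
        "html_chars": html_len,
        # The budget described above covers only the visible text.
        "text_within_budget": text_len <= max_characters,
        # Rough proxy for the payload once the table HTML is included.
        "combined_within_budget": text_len + html_len <= max_characters,
    }


# Made-up mixed chunk: short narrative text plus a table with large HTML.
mixed_chunk = {
    "text": "Quarterly results. Region Sales North 10 South 12",
    "metadata": {
        "text_as_html": "<table>" + "<tr><td>x</td></tr>" * 50 + "</table>",
    },
}

report = size_report(mixed_chunk, max_characters=100)
```

Here the visible text fits the budget while the combined size does not, which is the situation that isolating tables is meant to avoid.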
