Type: partition
Subtype: unstructured_api
Settings
Partitioning strategy to use. Must be set to `fast`.

If `true`, includes page breaks in the output where supported by the file type. Default: `false`.

Applies only to the `hi_res` strategy and has no effect with the `fast` strategy. Default: `false`.

List of Unstructured element types to exclude from the output. Default: none. Allowed values: Address, EmailAddress, FigureCaption, Footer, Formula, Header, Image, ListItem, NarrativeText, PageBreak, Table, Title, UncategorizedText.
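
To make the effect of the exclusion setting concrete, the following Python sketch filters a list of element dictionaries on the client side. It assumes elements shaped like Unstructured's JSON output (each with a `type` key); the helpers `validate_exclusions` and `drop_excluded` are hypothetical illustrations, not part of any Unstructured SDK, and this is not how the service itself implements exclusion.

```python
# Allowed values for the element-exclusion setting, as listed above.
ALLOWED_EXCLUDE_TYPES = frozenset({
    "Address", "EmailAddress", "FigureCaption", "Footer", "Formula",
    "Header", "Image", "ListItem", "NarrativeText", "PageBreak",
    "Table", "Title", "UncategorizedText",
})


def validate_exclusions(types: list[str]) -> list[str]:
    """Return the exclusion list unchanged, or raise if it names an unsupported type."""
    unknown = sorted(set(types) - ALLOWED_EXCLUDE_TYPES)
    if unknown:
        raise ValueError(f"unsupported element types: {unknown}")
    return types


def drop_excluded(elements: list[dict], types: list[str]) -> list[dict]:
    """Hypothetical client-side equivalent: keep only elements whose type is not excluded."""
    excluded = set(validate_exclusions(types))
    return [el for el in elements if el.get("type") not in excluded]


# Hand-written sample elements (not real API output).
elements = [
    {"type": "Header", "text": "ACME Corp"},
    {"type": "Title", "text": "Quarterly Report"},
    {"type": "NarrativeText", "text": "Revenue grew."},
    {"type": "Footer", "text": "Page 1"},
]
kept = drop_excluded(elements, ["Header", "Footer"])
print([el["type"] for el in kept])  # ['Title', 'NarrativeText']
```

Validating against the allowed list up front catches a misspelled type name before a request is sent, instead of silently excluding nothing.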

If `true`, retains XML tags in the output. If `false`, extracts only the text content of the XML elements. Default: `false`.

Encoding method used to decode the text input. Default: `utf-8`.

Languages present in the input, for use in partitioning, OCR, or both. Multiple values indicate that the text could be in any of the specified languages. Default: `['eng']`. See the language codes list.

Unstructured element types for which image blocks are extracted as Base64-encoded data and stored in metadata fields. Default: none. Allowed values: Abstract, BulletedText, Caption, CodeSnippet, CompositeElement, Figure, FigureCaption, Form, FormKeysValues, Formula, Header, Image, List, List-item, ListItem, NarrativeText, Paragraph, Picture, Table, Text, Threading, Title, UncategorizedText.
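
Once image blocks are stored as Base64 data, a consumer has to decode them back to bytes. The following Python sketch shows one way to do that with the standard `base64` module. The section above does not name the metadata field, so the key `image_base64` used here is an assumption; check your actual output and adjust it. The helper `decode_image_blocks` is hypothetical.

```python
import base64


def decode_image_blocks(elements: list[dict]) -> list[tuple[str, bytes]]:
    """Collect (element type, raw image bytes) for elements that carry Base64 image data.

    ASSUMPTION: the payload lives in metadata["image_base64"]; change the key
    if your output stores it under a different metadata field.
    """
    images = []
    for el in elements:
        payload = el.get("metadata", {}).get("image_base64")
        if payload:
            images.append((el["type"], base64.b64decode(payload)))
    return images


# Hand-written sample: a fake payload stands in for real extracted image data.
sample = [
    {
        "type": "Image",
        "text": "",
        "metadata": {"image_base64": base64.b64encode(b"\x89PNG fake").decode("ascii")},
    },
    {"type": "NarrativeText", "text": "Body text", "metadata": {}},
]
for kind, data in decode_image_blocks(sample):
    print(kind, len(data))  # Image 9
```

The decoded bytes can then be written to a file opened in binary mode (`"wb"`) or passed to an image library; elements without image data are simply skipped.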

If `true`, any table elements extracted from a PDF include an additional `text_as_html` metadata field containing an HTML `<table>` representation of the table. Default: `false`.

If `true`, each element extracted from a PDF includes position information relative to its page. Default: `false`.
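
The `text_as_html` field is useful because it preserves row and column structure that the flat element text loses. The following Python sketch turns such a string into a list of rows using only the standard library's `html.parser` (a deliberate choice to avoid a dependency such as pandas). It assumes table elements shaped like Unstructured's JSON output, with the HTML under `metadata["text_as_html"]`; the helper names `_TableRowCollector` and `table_rows` are hypothetical.

```python
from html.parser import HTMLParser


class _TableRowCollector(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text fragments of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(text_as_html: str) -> list[list[str]]:
    """Turn a text_as_html string into a list of rows of cell text."""
    parser = _TableRowCollector()
    parser.feed(text_as_html)
    parser.close()
    return parser.rows


# Hand-written sample element; real output comes from partitioning a PDF.
element = {
    "type": "Table",
    "text": "Region Sales North 10 South 7",
    "metadata": {
        "text_as_html": (
            "<table><tr><th>Region</th><th>Sales</th></tr>"
            "<tr><td>North</td><td>10</td></tr>"
            "<tr><td>South</td><td>7</td></tr></table>"
        )
    },
}
if element["type"] == "Table":
    print(table_rows(element["metadata"]["text_as_html"]))
    # [['Region', 'Sales'], ['North', '10'], ['South', '7']]
```

Compare the result with the element's plain `text`, where the same values appear as one run of words with no indication of which cell they came from.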
