Configs for Partitioning
- additional_partition_args: A JSON string representation of any values to pass through to the partition function.
- encoding: The encoding method used to decode the text input. By default, UTF-8 is used.
- ocr_languages: The languages present in the document, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages.
- skip_infer_table_types: List of document types for which to skip table extraction.
- strategy: Default: auto. The strategy to use for partitioning PDF and image files. Uses a layout detection model if set to hi_res. Otherwise, partitioning simply extracts the text from the document and processes it.
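As a rough illustration of how these partitioning options fit together, the sketch below collects them in a plain Python dictionary. The dictionary shape, the language codes, the document type in skip_infer_table_types, and the include_page_breaks key are all assumptions made for the example; only the option names come from the list above. The point it shows is that additional_partition_args is a JSON string, not a nested object, so it is serialized before being placed in the config.

```python
import json

# Hypothetical pass-through values. Which keys the partition function
# accepts is an assumption here and is not stated in the list above.
extra_args = {"include_page_breaks": True}

# A plain dict standing in for the partitioning config (assumed shape).
partition_config = {
    "strategy": "hi_res",                # use a layout detection model
    "encoding": "utf-8",                 # the documented default
    "ocr_languages": ["eng", "deu"],     # assumed value format
    "skip_infer_table_types": ["pdf"],   # assumed document type name
    # Serialized to a JSON string, as the option description requires.
    "additional_partition_args": json.dumps(extra_args),
}

# The receiving side can recover the original values with json.loads.
recovered_args = json.loads(partition_config["additional_partition_args"])
```

Serializing with json.dumps rather than writing the string by hand avoids quoting mistakes, such as Python's True being written where JSON expects true.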
Configs for the Process
- api_key: If partition_by_api is set to True, requests that are sent to the Unstructured API will use this Unstructured API key to make authenticated calls.
- fields_include: Fields to include in the output JSON. By default, the following fields are included: element_id, text, type, metadata, and embeddings.
- flatten_metadata: Default: False. If set to True, the hierarchical metadata structure is flattened to have all values exist at the top level.
- hi_res_model_name: The model to use when strategy is set to hi_res. Available values are layout_v1.0.0 (the default) and yolox.
- metadata_exclude: Values from the metadata field to exclude from the output.
- metadata_include: If provided, only the specified fields are preserved in the metadata output.
- partition_by_api: Default: False. If set to True, uses Unstructured to run partitioning. If set to False, runs partitioning locally.
- partition_endpoint: If partition_by_api is set to True, partitioning requests are sent to this Unstructured API URL.
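To make the effect of metadata_include, metadata_exclude, and flatten_metadata more concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper names filter_metadata and flatten_dict, the sample element, the underscore-joined key naming for flattened fields, and the choice to let metadata_include take precedence when both options are given are all assumptions made for illustration.

```python
def filter_metadata(metadata, include=None, exclude=None):
    """Return a copy of metadata with include/exclude applied (illustrative)."""
    if include:
        # metadata_include: keep only the named fields.
        return {k: v for k, v in metadata.items() if k in include}
    if exclude:
        # metadata_exclude: drop the named fields.
        return {k: v for k, v in metadata.items() if k not in exclude}
    return dict(metadata)


def flatten_dict(nested, prefix=""):
    """Lift nested values to the top level, joining keys with underscores.

    The key naming scheme is an assumption; the option description only
    says that all values end up at the top level.
    """
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_dict(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical output element, using the default field names listed above.
element = {
    "element_id": "abc123",
    "text": "Hello",
    "type": "Title",
    "metadata": {"filename": "a.pdf", "page_number": 1},
}

kept = filter_metadata(element["metadata"], include=["filename"])
flattened = flatten_dict({**element, "metadata": kept})
```

In this sketch, filtering runs before flattening so that the include and exclude lists can refer to the original metadata field names.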