additional_partition_args
: A JSON string representation of any values to pass through to the partition
function.
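
Because additional_partition_args expects a JSON string rather than a dictionary, the values have to be serialized before they are passed. A minimal sketch, assuming two illustrative pass-through keys (include_page_breaks and max_characters are placeholders chosen for this example; which keys the partition function accepts is not stated here):

```python
import json

# Hypothetical pass-through values; the key names are placeholders for illustration.
extra_args = {"include_page_breaks": True, "max_characters": 1500}

# Serialize the dictionary to the JSON string that the parameter expects.
additional_partition_args = json.dumps(extra_args)

print(additional_partition_args)
```

Building the string with json.dumps rather than by hand avoids quoting mistakes, such as Python's True appearing where JSON requires true.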
encoding
: The encoding method used to decode the text input. By default, UTF-8 will be used.
ocr_languages
: The languages present in the document, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages.
skip_infer_table_types
: List of document types for which table extraction is skipped.
strategy
: Default: auto. The strategy to use for partitioning PDF and image files. Uses a layout detection model if set to hi_res. Otherwise, partitioning simply extracts the text from the document and processes it.
api_key
: If partition_by_api is set to True, requests that are sent to the Unstructured API will use this Unstructured API key to make authenticated calls.
fields_include
: Fields to include in the output JSON. By default, the following fields are included: element_id, text, type, metadata, and embeddings.
flatten_metadata
: Default: False. If set to True, the hierarchical metadata structure is flattened so that all values exist at the top level.
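
The effect of flattening can be illustrated with a small, self-contained sketch. The function name flatten_element and the choice to join nested keys to their parent with an underscore are assumptions made for this example; the key names the tool actually produces may differ.

```python
from typing import Any


def flatten_element(element: dict[str, Any], sep: str = "_") -> dict[str, Any]:
    """Return a copy of `element` with nested dictionaries lifted to the top level.

    Nested keys are joined to their parent key with `sep`; this naming
    scheme is an assumption made for the sketch.
    """
    flat: dict[str, Any] = {}
    for key, value in element.items():
        if isinstance(value, dict):
            # Recurse, then prefix each child key with the parent key.
            for child_key, child_value in flatten_element(value, sep).items():
                flat[f"{key}{sep}{child_key}"] = child_value
        else:
            flat[key] = value
    return flat


element = {
    "text": "Quarterly report",
    "type": "Title",
    "metadata": {"filename": "report.pdf", "page_number": 1},
}
print(flatten_element(element))
```

After flattening, no value in the result is itself a dictionary, which is convenient for destinations that expect a flat row per element.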
hi_res_model_name
: The model to use when strategy is set to hi_res. Available values are layout_v1.0.0 (the default) and yolox.
metadata_exclude
: Values from the metadata field to exclude from the output.
metadata_include
: If provided, only the specified fields are preserved in the metadata output.
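
The difference between metadata_exclude and metadata_include can be sketched as two filters over a metadata dictionary. The helper filter_metadata below is hypothetical, and two of its behaviors are assumptions not stated above: an empty list is treated as "not provided", and the include list takes precedence when both are given.

```python
from typing import Any, Optional


def filter_metadata(
    metadata: dict[str, Any],
    include: Optional[list[str]] = None,
    exclude: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Apply include/exclude filtering to a metadata dictionary (illustrative only)."""
    if include:
        # Allow-list: keep only the named fields.
        return {k: v for k, v in metadata.items() if k in include}
    if exclude:
        # Deny-list: drop the named fields and keep everything else.
        return {k: v for k, v in metadata.items() if k not in exclude}
    return dict(metadata)


metadata = {"filename": "report.pdf", "page_number": 1, "languages": ["eng"]}

print(filter_metadata(metadata, include=["filename"]))
print(filter_metadata(metadata, exclude=["languages"]))
```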
partition_by_api
: Default: False. If set to True, uses the Unstructured API to run partitioning. If set to False, runs partitioning locally.
partition_endpoint
: If partition_by_api is set to True, partitioning requests are sent to this Unstructured API URL.
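
The way partition_by_api governs api_key and partition_endpoint can be summarized in a short sketch. The PartitionSettings class, its methods, and the placeholder key and URL are hypothetical and exist only to illustrate the relationship described above; they are not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionSettings:
    """Illustrative container for the three API-related parameters."""

    partition_by_api: bool = False
    api_key: Optional[str] = None
    partition_endpoint: Optional[str] = None

    def mode(self) -> str:
        # True routes partitioning through the API; False runs it locally.
        return "api" if self.partition_by_api else "local"

    def api_settings(self) -> dict[str, Optional[str]]:
        # The key and endpoint only matter when partition_by_api is True.
        if not self.partition_by_api:
            return {}
        return {"api_key": self.api_key, "partition_endpoint": self.partition_endpoint}


local = PartitionSettings()
remote = PartitionSettings(
    partition_by_api=True,
    api_key="YOUR_API_KEY",  # placeholder value
    partition_endpoint="https://example.invalid/partition",  # placeholder URL
)

print(local.mode(), local.api_settings())
print(remote.mode(), remote.api_settings())
```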