Ingest configuration
Partition configuration
A partition configuration is a set of parameters that controls how documents are partitioned, whether partitioning runs through the Unstructured API or locally with the unstructured library. The parameters fall into two groups: those passed to the partition method to segment documents into elements, and those that control how the data is handled after processing, including the metadata attached to each element.
Configs for Partitioning
- additional_partition_args: A JSON string representation of any values to pass through to the partition function.
- encoding: The encoding method used to decode the text input. By default, UTF-8 will be used.
- ocr_languages: The languages present in the document, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages.
- pdf_infer_table_structure: Deprecated! Use skip_infer_table_types to opt out of table extraction for any file type. If False and strategy=hi_res, no Table elements will be extracted from PDF files regardless of skip_infer_table_types contents.
- skip_infer_table_types: List of document types for which table extraction should be skipped.
- strategy: Default: auto. The strategy to use for partitioning PDF and image files. Uses a layout detection model if set to hi_res. Otherwise, partitioning simply extracts the text from the document and processes it.
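
To show how these options fit together, here is a minimal sketch in Python. The PartitionSettings class and its to_kwargs method are hypothetical and are not part of the unstructured library; only the field names and the two documented defaults (strategy auto, encoding UTF-8) come from the list above. The empty-list defaults, the include_page_breaks pass-through value, and the choice to let pass-through values override the named options are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PartitionSettings:
    """Hypothetical container; field names mirror the options listed above."""

    strategy: str = "auto"    # documented default
    encoding: str = "utf-8"   # documented default
    ocr_languages: List[str] = field(default_factory=list)           # assumed default
    skip_infer_table_types: List[str] = field(default_factory=list)  # assumed default
    additional_partition_args: Optional[str] = None  # JSON string, as documented

    def to_kwargs(self) -> Dict[str, Any]:
        """Combine the named options with the decoded pass-through values."""
        kwargs: Dict[str, Any] = {
            "strategy": self.strategy,
            "encoding": self.encoding,
            "skip_infer_table_types": self.skip_infer_table_types,
        }
        if self.ocr_languages:
            kwargs["ocr_languages"] = self.ocr_languages
        if self.additional_partition_args:
            extra = json.loads(self.additional_partition_args)
            if not isinstance(extra, dict):
                raise ValueError("additional_partition_args must decode to a JSON object")
            # Assumption: pass-through values win over the named options.
            kwargs.update(extra)
        return kwargs


settings = PartitionSettings(
    strategy="hi_res",
    ocr_languages=["eng", "deu"],
    skip_infer_table_types=["pdf"],
    # Illustrative pass-through value, serialized as a JSON string.
    additional_partition_args=json.dumps({"include_page_breaks": True}),
)
partition_kwargs = settings.to_kwargs()
```

Keeping additional_partition_args as a JSON string means the same value can be supplied on a command line or in a config file and decoded once, just before the partition call.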
Configs for the Process
- api_key: If partition_by_api is set to True, requests that are sent to the Unstructured API will use this Unstructured API key to make authenticated calls.
- fields_include: Fields to include in the output JSON. By default, the following fields are included: element_id, text, type, metadata, and embeddings.
- flatten_metadata: Default: False. If set to True, the hierarchical metadata structure is flattened to have all values exist at the top level.
- hi_res_model_name: The model to use when strategy is set to hi_res. Available values are layout_v1.0.0 (the default) and yolox.
- metadata_exclude: Values from the metadata field to exclude from the output.
- metadata_include: If provided, only the specified fields are preserved in the metadata output.
- partition_by_api: Default: False. If set to True, uses Unstructured API services to run partitioning. If set to False, runs partitioning locally.
- partition_endpoint: If partition_by_api is set to True, partitioning requests are sent to this Unstructured API URL.
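
The output-shaping options (fields_include, metadata_include, metadata_exclude, flatten_metadata) can be pictured as a small transformation applied to each element. The sketch below is illustrative only: shape_element and flatten are hypothetical helpers, not functions from the unstructured library, and the sample element is made up. Joining nested keys with an underscore and applying metadata_include before metadata_exclude are assumptions; the actual tool may name flattened keys differently or treat the two metadata options as mutually exclusive.

```python
from typing import Any, Dict, Iterable, Optional

# Default output fields, as listed for fields_include above.
DEFAULT_FIELDS = ("element_id", "text", "type", "metadata", "embeddings")


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Lift nested values to a single level, joining keys with '_' (assumed)."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def shape_element(
    element: Dict[str, Any],
    fields_include: Iterable[str] = DEFAULT_FIELDS,
    metadata_include: Optional[Iterable[str]] = None,
    metadata_exclude: Optional[Iterable[str]] = None,
    flatten_metadata: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper that applies the output options to one element."""
    fields = set(fields_include)
    metadata = dict(element.get("metadata", {}))
    if metadata_include:
        keep = set(metadata_include)
        metadata = {k: v for k, v in metadata.items() if k in keep}
    if metadata_exclude:
        drop = set(metadata_exclude)
        metadata = {k: v for k, v in metadata.items() if k not in drop}

    # Copy the requested top-level fields; metadata is handled separately.
    output = {k: v for k, v in element.items() if k in fields and k != "metadata"}
    if "metadata" in fields:
        if flatten_metadata:
            output.update(flatten(metadata))  # values move to the top level
        else:
            output["metadata"] = metadata
    return output


element = {
    "element_id": "abc123",
    "type": "Title",
    "text": "Quarterly report",
    "metadata": {
        "filename": "report.pdf",
        "page_number": 1,
        "data_source": {"url": "s3://bucket/report.pdf"},
    },
}
shaped = shape_element(element, metadata_exclude=["page_number"], flatten_metadata=True)
```

With these inputs, shaped keeps element_id, type, and text, drops page_number, and carries filename and data_source_url at the top level instead of under a metadata key.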