Pipeline
Authorizations
Body
The file to extract
The strategy to use for partitioning PDFs and images. Options are fast, hi_res, and auto. Default: auto
If the file is gzipped, the content type to use for it after unzipping.
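A client could gzip a file before upload and send its original content type alongside it, so the server knows what the decompressed bytes are. The sketch below uses only the Python standard library. This page gives parameter descriptions but not their form-field names, so the field name `gz_uncompressed_content_type` and the helper `gzip_for_upload` are assumptions made for illustration.

```python
import gzip
import mimetypes
from typing import Dict, Tuple


def gzip_for_upload(filename: str, data: bytes) -> Tuple[str, bytes, Dict[str, str]]:
    """Compress `data` and return (upload name, gzipped bytes, extra form fields).

    Hypothetical helper; the form-field name below is an assumption.
    """
    # Guess the content type from the original (uncompressed) file name.
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is None:
        content_type = "application/octet-stream"
    gz_bytes = gzip.compress(data)
    # Tell the server which content type to use after it unzips the upload.
    fields = {"gz_uncompressed_content_type": content_type}
    return filename + ".gz", gz_bytes, fields
```

The returned name and bytes would go in the file part of a multipart POST, and the dictionary would be sent as ordinary form fields.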
The format of the response. Supported formats are application/json and text/csv. Default: application/json.
If true, return coordinates for each element. Default: false
The encoding method used to decode the text input. Default: utf-8
The name of the inference model used when strategy is hi_res
If True, the output will include page breaks if the filetype supports it. Default: false
The languages present in the document, for use in partitioning and/or OCR
If True and strategy=hi_res, any Table elements extracted from a PDF will include an additional metadata field, 'text_as_html', whose value (a string) is just a transformation of the table data into an HTML <table>.
The document types for which table extraction is skipped. Default: ['pdf', 'jpg', 'png']
If True, will retain the XML tags in the output. Otherwise it will simply extract the text from within the tags. Only applies to partition_xml.
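The partitioning parameters above are sent as multipart form fields. The sketch below assembles them with the defaults and allowed values stated on this page. The field names (`strategy`, `output_format`, `coordinates`, `encoding`, `include_page_breaks`, `languages`) are assumptions, because the page lists only descriptions. Two more conventions are also assumed: booleans are sent as lowercase strings, and a list is sent as one repeated field per value.

```python
from typing import List, Optional, Tuple

# Allowed values as described on this page.
STRATEGIES = {"fast", "hi_res", "auto"}
OUTPUT_FORMATS = {"application/json", "text/csv"}


def build_partition_fields(
    strategy: str = "auto",
    output_format: str = "application/json",
    coordinates: bool = False,
    encoding: str = "utf-8",
    include_page_breaks: bool = False,
    languages: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Return form fields as (name, value) pairs; the field names are assumptions."""
    if strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {sorted(STRATEGIES)}, got {strategy!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format {output_format!r}")

    fields = [
        ("strategy", strategy),
        ("output_format", output_format),
        # Booleans serialized as lowercase strings (assumed convention).
        ("coordinates", str(coordinates).lower()),
        ("encoding", encoding),
        ("include_page_breaks", str(include_page_breaks).lower()),
    ]
    # A list parameter becomes one repeated field per value (assumed convention).
    for lang in languages or []:
        fields.append(("languages", lang))
    return fields
```

The function returns (name, value) pairs rather than a dict so that repeated fields such as languages can be represented. HTTP clients such as requests accept a list of tuples as form data.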
Use one of the supported strategies to chunk the returned elements. Currently supports: by_title
If chunking strategy is set, combine elements until a section reaches a length of n chars. Default: max_characters
When True (the default), the elements used to form a chunk appear in .metadata.orig_elements for that chunk. Only applies when chunking is specified using the chunking_strategy argument.
If chunking strategy is set, cut off new sections after reaching a length of n chars (hard max). Default: 500
If chunking strategy is set, determines whether sections can span multiple pages. Only applies to the by_title chunking strategy. Default: true
If chunking strategy is set, cut off new sections after reaching a length of n chars (soft max). Default: max_characters (off)
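The interaction between the hard maximum (max_characters) and the soft maximum can be illustrated with a simplified greedy packer over element texts. This is a sketch of the general idea and not the server's by_title algorithm. It ignores Title and section boundaries, page boundaries, and the combine threshold. The `"\n\n"` separator and the parameter name `new_after_n_chars` for the soft maximum are assumptions.

```python
from typing import List, Optional


def pack_chunks(
    texts: List[str],
    max_characters: int = 500,
    new_after_n_chars: Optional[int] = None,
    separator: str = "\n\n",
) -> List[str]:
    """Greedily pack element texts into chunks under a hard and a soft maximum.

    Simplified sketch: no Title/section or page-boundary handling.
    """
    # With no soft max given, it defaults to the hard max ("off").
    soft_max = max_characters if new_after_n_chars is None else min(new_after_n_chars, max_characters)
    chunks: List[str] = []
    current = ""

    def flush() -> None:
        nonlocal current
        if current:
            chunks.append(current)
            current = ""

    for text in texts:
        if len(text) > max_characters:
            # Oversized element: close the open chunk, then text-split the
            # element into pieces no longer than the hard max.
            flush()
            for start in range(0, len(text), max_characters):
                chunks.append(text[start:start + max_characters])
            continue
        candidate = text if not current else current + separator + text
        # Start a new chunk if adding this element would break the hard max,
        # or if the open chunk has already reached the soft max.
        if current and (len(candidate) > max_characters or len(current) >= soft_max):
            flush()
            candidate = text
        current = candidate
    flush()
    return chunks
```

With max_characters=10, the texts "aaaa", "bbbb", "cccc" pack into "aaaa\n\nbbbb" and "cccc". Setting the soft maximum to 4 instead puts each element in its own chunk.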
A prefix of this many trailing characters from the prior text-split chunk is applied to second and later chunks formed from oversized elements by text-splitting. Default: None
When True, overlap is also applied to 'normal' chunks formed by combining whole elements. Use with caution as this can introduce noise into otherwise clean semantic units. Default: None
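The overlap rule for oversized elements is as follows. Each text-split piece after the first starts with the trailing characters of the piece before it, and still stays within the hard maximum. A minimal sketch, assuming fixed character offsets; a real splitter may prefer whitespace boundaries. It models only the oversized-element case, not overlap applied to normal chunks.

```python
from typing import List


def split_with_overlap(text: str, max_characters: int, overlap: int = 0) -> List[str]:
    """Split an oversized text into pieces of at most max_characters.

    Every piece after the first begins with the last `overlap` characters of
    the previous piece.
    """
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be >= 0 and smaller than max_characters")
    if len(text) <= max_characters:
        return [text]
    # Advance by less than a full window so consecutive pieces share `overlap` chars.
    step = max_characters - overlap
    pieces = []
    start = 0
    while True:
        pieces.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break
        start += step
    return pieces
```

For example, splitting "abcdefghij" with max_characters=4 and overlap=1 gives "abcd", "defg", "ghij". With overlap=0 it gives "abcd", "efgh", "ij".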
The element types to extract as image blocks, returned as base64-encoded data stored in metadata fields.
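On the client side, extracted image blocks need to be decoded from base64 back to raw bytes. This page does not name the metadata field or describe the element shape beyond any[]. The sketch therefore makes two assumptions: each element is a dict with "type" and "metadata" keys, and the payload sits in `metadata["image_base64"]`.

```python
import base64
from typing import Any, Dict, List, Tuple


def decode_image_blocks(elements: List[Dict[str, Any]]) -> List[Tuple[str, bytes]]:
    """Collect (element type, raw image bytes) for elements carrying image data.

    The "image_base64" metadata key is an assumption about the response shape.
    """
    images = []
    for element in elements:
        payload = element.get("metadata", {}).get("image_base64")
        if payload:
            # Undo the base64 encoding to recover the original image bytes.
            images.append((element.get("type", ""), base64.b64decode(payload)))
    return images
```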
Response
The response is of type any[].
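Because the response is typed only as any[], client code has to pick out the fields it needs. As one example, the sketch below collects the 'text_as_html' value of every Table element from a JSON response. Only the field name text_as_html comes from this page. The assumptions are that the JSON body is a list of element objects, each with a "type" key and a "metadata" object, and that tables have type "Table".

```python
import json
from typing import List


def table_html_from_response(body: str) -> List[str]:
    """Return the text_as_html value of every Table element in a JSON response.

    The element shape ("type" and "metadata" keys) is an assumption.
    """
    elements = json.loads(body)
    return [
        el["metadata"]["text_as_html"]
        for el in elements
        # Skip non-table elements and tables without the HTML field.
        if el.get("type") == "Table" and "text_as_html" in el.get("metadata", {})
    ]
```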