POST /general/v0/general
curl --request POST \
  --url https://api.unstructured.io/general/v0/general \
  --header 'Content-Type: multipart/form-data' \
  --header 'unstructured-api-key: <api-key>'
[
  "<any>"
]
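The same request can be sketched in Python using only the standard library. The URL, the `unstructured-api-key` header, and the `files` field come from this page. The helper name `build_partition_request`, the repeated-field encoding for `string[]` parameters, and the `true`/`false` spelling for booleans are assumptions of this sketch, not documented behavior. The function only builds the request; nothing is sent.

```python
import uuid
import urllib.request

API_URL = "https://api.unstructured.io/general/v0/general"


def build_partition_request(api_key, filename, file_bytes, fields=None,
                            content_type="application/octet-stream"):
    """Build (but do not send) a multipart/form-data POST request.

    Hypothetical helper: `fields` maps parameter names from this page
    (e.g. "strategy", "languages") to a scalar or a list of values.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (fields or {}).items():
        # Assumption: string[] parameters are sent as repeated form fields.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, bool):
                # Assumption: booleans are spelled "true" / "false".
                item = "true" if item else "false"
            part = (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{item}\r\n"
            )
            parts.append(part.encode("utf-8"))
    # The file itself goes in the "files" field.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "unstructured-api-key": api_key,
        },
    )


request = build_partition_request(
    "<api-key>",
    "example.pdf",
    b"%PDF-1.4 placeholder bytes",
    fields={"strategy": "hi_res", "languages": ["eng"], "coordinates": True},
)
# To actually send it: urllib.request.urlopen(request)
```

Building the body by hand keeps the sketch dependency-free; an HTTP client library that encodes multipart forms would replace most of this function.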

Authorizations

unstructured-api-key
string
header, required

Body

multipart/form-data
files
string

The file to extract

strategy
string

The strategy to use for partitioning PDF/image. Options are fast, hi_res, auto. Default: auto

gz_uncompressed_content_type
string

If file is gzipped, use this content type after unzipping

output_format
string

The format of the response. Supported formats are application/json and text/csv. Default: application/json.

coordinates
boolean

If true, return coordinates for each element. Default: false

encoding
string

The encoding method used to decode the text input. Default: utf-8

hi_res_model_name
string

The name of the inference model used when strategy is hi_res

include_page_breaks
boolean

If True, the output will include page breaks if the filetype supports it. Default: false

languages
string[]

The languages present in the document, for use in partitioning and/or OCR

pdf_infer_table_structure
boolean

If True and strategy=hi_res, any Table elements extracted from a PDF will include an additional metadata field, 'text_as_html', whose value (a string) is a transformation of the table data into an HTML <table>.
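As an illustration of consuming 'text_as_html', the sketch below turns the HTML string back into rows of cell text with the standard-library HTML parser. The element shape (a dict with a `metadata` key) and the sample table are hypothetical; this page only states that the field exists and holds an HTML <table>.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(element):
    """Return the rows of an element's text_as_html table, or [] if absent."""
    html = element.get("metadata", {}).get("text_as_html")
    if not html:
        return []
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical element, shaped only for this example.
sample_element = {
    "type": "Table",
    "metadata": {
        "text_as_html": (
            "<table><tr><th>Item</th><th>Qty</th></tr>"
            "<tr><td>Apples</td><td>3</td></tr></table>"
        )
    },
}
rows = table_rows(sample_element)
```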

skip_infer_table_types
string[]

The document types for which table extraction is skipped. Default: ['pdf', 'jpg', 'png']

xml_keep_tags
boolean

If True, will retain the XML tags in the output. Otherwise it will simply extract the text from within the tags. Only applies to partition_xml.

chunking_strategy
string

Use one of the supported strategies to chunk the returned elements. Currently supports: by_title

combine_under_n_chars
integer

If chunking strategy is set, combine elements until a section reaches a length of n chars. Default: max_characters

include_orig_elements
boolean

When True (the default), the elements used to form a chunk appear in .metadata.orig_elements for that chunk. Only applies when chunking is specified using the chunking_strategy argument.

max_characters
integer

If chunking strategy is set, cut off new sections after reaching a length of n chars (hard max). Default: 500

multipage_sections
boolean

If chunking strategy is set, determines if sections can span multiple pages. Only applies to by_title chunking strategy. Default: true

new_after_n_chars
integer

If chunking strategy is set, cut off new sections after reaching a length of n chars (soft max). Default: max_characters (off)
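The three size parameters are linked through their defaults: combine_under_n_chars and new_after_n_chars both fall back to max_characters, which itself defaults to 500. A minimal sketch of that resolution follows. The function name `resolve_chunking_options` is hypothetical, and capping the two derived values at the hard maximum is a choice of this sketch, not documented service behavior.

```python
def resolve_chunking_options(max_characters=500, new_after_n_chars=None,
                             combine_under_n_chars=None):
    """Fill in the documented defaults for the chunk-size parameters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    # Both parameters default to max_characters when not given.
    soft_max = max_characters if new_after_n_chars is None else new_after_n_chars
    combine = max_characters if combine_under_n_chars is None else combine_under_n_chars
    # Sketch-only choice: a value above the hard max could never take effect,
    # so cap it rather than pass it through.
    return {
        "max_characters": max_characters,
        "new_after_n_chars": min(soft_max, max_characters),
        "combine_under_n_chars": min(combine, max_characters),
    }


defaults = resolve_chunking_options()
custom = resolve_chunking_options(1000, new_after_n_chars=800,
                                  combine_under_n_chars=200)
```

With no arguments, all three values resolve to 500, which matches the "(off)" note for the soft maximum: a soft limit equal to the hard limit never triggers on its own.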

overlap
integer

A prefix of this many trailing characters from the prior text-split chunk is applied to second and later chunks formed from oversized elements by text-splitting. Default: None
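The overlap behavior can be pictured with a simplified text splitter. This is an illustrative sketch only: it cuts at fixed character offsets and counts the overlap prefix toward max_characters, both of which are assumptions; the service may choose split points differently. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, max_characters, overlap=0):
    """Split oversized text into chunks of at most max_characters.

    Every chunk after the first begins with the last `overlap` characters
    of the chunk before it.
    """
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be >= 0 and smaller than max_characters")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_characters, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back so the next chunk repeats the tail of this one.
        start = end - overlap
    return chunks


with_overlap = split_with_overlap("abcdefghij", 4, overlap=1)
without_overlap = split_with_overlap("abcdefghij", 4)
```

Here `with_overlap` is `["abcd", "defg", "ghij"]` and `without_overlap` is `["abcd", "efgh", "ij"]`: the repeated "d" and "g" are the one-character prefixes carried over from the prior chunk.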

overlap_all
boolean

When True, overlap is also applied to 'normal' chunks formed by combining whole elements. Use with caution as this can introduce noise into otherwise clean semantic units. Default: None

extract_image_block_types
string[]

The types of elements to extract, for use in extracting image blocks as base64 encoded data stored in metadata fields

Response

200 - application/json

The response is of type any[].