Task

You want to get, save, or show the contents of elements that are represented as HTML, such as tables that are embedded in a PDF document.

Approach

Extract the value of an element’s text_as_html field, which is nested inside the element’s metadata object.

To run this example

You will need a document of a type that can produce text_as_html output. For the list of applicable document types, see the entries in the table at the beginning of Partitioning where “Table Support” is “Yes.”

This example uses a PDF file with an embedded table.

Code
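
The following is a minimal sketch of the approach described above, not a definitive implementation. It assumes the partitioned elements are already available as a list of JSON-style dictionaries, each with an optional metadata object that may contain text_as_html. The helper name extract_html and the sample element are hypothetical and used only for illustration; the sketch does not call the partitioning library itself.

```python
from typing import Any


def extract_html(elements: list[dict[str, Any]]) -> list[str]:
    """Collect the text_as_html value from every element that has one.

    Hypothetical helper: elements without a metadata object, or without
    text_as_html inside it, are skipped.
    """
    html_fragments = []
    for element in elements:
        # metadata may be missing or null, so fall back to an empty dict.
        metadata = element.get("metadata") or {}
        html = metadata.get("text_as_html")
        if html:
            html_fragments.append(html)
    return html_fragments


# Hypothetical sample data standing in for the JSON output of partitioning
# a PDF file with an embedded table. Real output carries more fields.
sample_elements = [
    {
        "type": "Title",
        "text": "Quarterly results",
        "metadata": {"page_number": 1},
    },
    {
        "type": "Table",
        "text": "A B",
        "metadata": {
            "page_number": 1,
            "text_as_html": "<table><tr><td>A</td><td>B</td></tr></table>",
        },
    },
]

# Show each HTML fragment that was found.
for fragment in extract_html(sample_elements):
    print(fragment)
```

To save a fragment instead of showing it, write the returned string to a file with an .html extension and open that file in a web browser.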

See also