Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt

Use this file to discover all available pages before exploring further.

After partitioning, you can have Unstructured generate text-based summaries of detected tables. This summarization is done by using models offered through various model providers. Here is an example of the output of a detected table using GPT-4o. Note specifically the text field that is added. Line breaks have been inserted here for readability. The output will not contain these line breaks.
{
    "type": "Table",
    "element_id": "5713c0e90194ac7f0f2c60dd614bd24d",
    "text": "The table consists of 6 rows and 7 columns. The columns represent 
        inhibitor concentration (g), bc (V/dec), ba (V/dec), Ecorr (V), icorr 
        (A/cm\u00b2), polarization resistance (\u03a9), and corrosion rate 
        (mm/year). As the inhibitor concentration increases, the corrosion 
        rate generally decreases, indicating the effectiveness of the 
        inhibitor. Notably, the polarization resistance increases with higher 
        inhibitor concentrations, peaking at 6 grams before slightly 
        decreasing. This suggests that the inhibitor is most effective at 
        6 grams, significantly reducing the corrosion rate and increasing 
        polarization resistance. The data provides valuable insights into the 
        optimal concentration of the inhibitor for corrosion prevention.",
    "metadata": {
        "text_as_html": "<table>...<full results omitted for brevity>...</table>",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...//Z",
        "image_mime_type": "image/jpeg",
        "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
        "data_source": {}
    }
}
The image_base64 field is generated only for documents or PDF pages that are partitioned by using the High Res strategy. This field is not generated for documents or PDF pages that are partitioned by using the Fast or VLM strategy.
Here are two examples of the descriptions for detected tables. These descriptions are generated with GPT-4o by OpenAI: Description of a table with information about endoscopic datasets Description of a table with information about potentiodynamic polarization of stainless steel The generated table’s summary will overwrite any text that Unstructured had previously extracted from that table into the text field. The table’s original content is available in the image_base64 field.
The image_base64 field is generated only for documents or PDF pages that are partitioned by using the High Res strategy. This field is not generated for documents or PDF pages that are partitioned by using the Fast or VLM strategy.
For workflows that use chunking, note the following changes:
  • If a Table element must be chunked, the Table element is replaced by a set of related TableChunk elements.
  • Each of these TableChunk elements will contain a summary description only for its own element, as part of the element’s text field.
  • These TableChunk elements will not contain an image_base64 field.
Any embeddings that are produced after these summaries are generated will be based on the new text field’s contents.

Generate table descriptions

To have Unstructured generate image descriptions, do the following:

Learn more