Tables to HTML - Unstructured

After partitioning and chunking, you can have Unstructured generate representations of each detected table in HTML markup format.

This table-to-HTML output is done by using GPT-4o, provided through OpenAI.

Here is an example of the HTML markup output of a detected table using GPT-4o. Note specifically the text_as_html field that is added. Line breaks have been inserted here for readability. The output will not contain these line breaks.

{
    "type": "Table",
    "element_id": "31aa654088742f1388d46ea9c8878272",
    "text": "Inhibitor Polarization Corrosion be (V/dec) ba (V/dec) Ecorr (V) icorr 
        (AJcm?) concentration (g) resistance (Q) rate (mmj/year) 0.0335 0.0409 
        \u20140.9393 0.0003 24.0910 2.8163 1.9460 0.0596 .8276 0.0002 121.440 
        1.5054 0.0163 0.2369 .8825 0.0001 42121 0.9476 s NO 03233 0.0540 
        \u20140.8027 5.39E-05 373.180 0.4318 0.1240 0.0556 .5896 5.46E-05 
        305.650 0.3772 = 5 0.0382 0.0086 .5356 1.24E-05 246.080 0.0919",
    "metadata": {
        "text_as_html": "```html\n
            <table>\n
                <tr>\n<th>Inhibitor concentration (g)</th>\n
                    <th>bc (V/dec)</th>\n<th>ba (V/dec)</th>\n<th>Ecorr (V)</th>\n
                    <th>icorr (A/cm\u00b2)</th>\n<th>Polarization resistance (\u03a9)</th>\n
                    <th>Corrosion rate (mm/year)</th>\n
                </tr>\n  
                <tr>\n
                    <td>0</td>\n<td>0.0335</td>\n<td>0.0409</td>\n<td>\u22120.9393</td>\n
                    <td>0.0003</td>\n<td>24.0910</td>\n<td>2.8163</td>\n  
                </tr>\n
                <tr>\n   
                    <td>2</td>\n<td>1.9460</td>\n<td>0.0596</td>\n<td>\u22120.8276</td>\n<td>0.0002</td>\n<td>121.440</td>\n<td>1.5054</td>\n  
                </tr>\n
                <tr>\n
                    <td>4</td>\n<td>0.0163</td>\n<td>0.2369</td>\n<td>\u22120.8825</td>\n<td>0.0001</td>\n<td>42.121</td>\n<td>0.9476</td>\n  
                </tr>\n  
                <tr>\n
                    <td>6</td>\n<td>0.3233</td>\n<td>0.0540</td>\n<td>\u22120.8027</td>\n<td>5.39E-05</td>\n<td>373.180</td>\n<td>0.4318</td>\n  
                </tr>\n  
                <tr>\n
                    <td>8</td>\n<td>0.1240</td>\n<td>0.0556</td>\n<td>\u22120.5896</td>\n<td>5.46E-05</td>\n<td>305.650</td>\n<td>0.3772</td>\n  
                </tr>\n  
                <tr>\n
                    <td>10</td>\n<td>0.0382</td>\n<td>0.0086</td>\n<td>\u22120.5356</td>\n<td>1.24E-05</td>\n<td>246.080</td>\n<td>0.0919</td>\n
                </tr>\n
            </table>\n```",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...//Z",
        "image_mime_type": "image/jpeg",
        "filename": "embedded-images-tables.pdf",
        "data_source": {}
    }
}

Generate table-to-HTML output

To generate table-to-HTML output, in an Enrichment node in a workflow, for Model, select OpenAI (GPT-4o).

Make sure after you choose this provider and model, that Table to HTML is also selected.

You can change a workflow’s table description settings only through Custom workflow settings.

Table-to-HTML generation happens only when the Partitioner node in a workflow is set to use the High Res partitioning strategy and the workflow also contains a table-to-HTML enrichment node.

Setting the Partitioner node to use Auto, VLM, or Fast in a workflow that also contains a table-to-HTML enrichment node will not generate any table-to-HTML output, and it could also cause the workflow to stop running or produce unexpected results.

On this page

Generate table-to-HTML output

After partitioning and chunking, you can have Unstructured generate representations of each detected table in HTML markup format.

This table-to-HTML output is done by using GPT-4o, provided through OpenAI.

{
    "type": "Table",
    "element_id": "31aa654088742f1388d46ea9c8878272",
    "text": "Inhibitor Polarization Corrosion be (V/dec) ba (V/dec) Ecorr (V) icorr 
        (AJcm?) concentration (g) resistance (Q) rate (mmj/year) 0.0335 0.0409 
        \u20140.9393 0.0003 24.0910 2.8163 1.9460 0.0596 .8276 0.0002 121.440 
        1.5054 0.0163 0.2369 .8825 0.0001 42121 0.9476 s NO 03233 0.0540 
        \u20140.8027 5.39E-05 373.180 0.4318 0.1240 0.0556 .5896 5.46E-05 
        305.650 0.3772 = 5 0.0382 0.0086 .5356 1.24E-05 246.080 0.0919",
    "metadata": {
        "text_as_html": "```html\n
            <table>\n
                <tr>\n<th>Inhibitor concentration (g)</th>\n
                    <th>bc (V/dec)</th>\n<th>ba (V/dec)</th>\n<th>Ecorr (V)</th>\n
                    <th>icorr (A/cm\u00b2)</th>\n<th>Polarization resistance (\u03a9)</th>\n
                    <th>Corrosion rate (mm/year)</th>\n
                </tr>\n  
                <tr>\n
                    <td>0</td>\n<td>0.0335</td>\n<td>0.0409</td>\n<td>\u22120.9393</td>\n
                    <td>0.0003</td>\n<td>24.0910</td>\n<td>2.8163</td>\n  
                </tr>\n
                <tr>\n   
                    <td>2</td>\n<td>1.9460</td>\n<td>0.0596</td>\n<td>\u22120.8276</td>\n<td>0.0002</td>\n<td>121.440</td>\n<td>1.5054</td>\n  
                </tr>\n
                <tr>\n
                    <td>4</td>\n<td>0.0163</td>\n<td>0.2369</td>\n<td>\u22120.8825</td>\n<td>0.0001</td>\n<td>42.121</td>\n<td>0.9476</td>\n  
                </tr>\n  
                <tr>\n
                    <td>6</td>\n<td>0.3233</td>\n<td>0.0540</td>\n<td>\u22120.8027</td>\n<td>5.39E-05</td>\n<td>373.180</td>\n<td>0.4318</td>\n  
                </tr>\n  
                <tr>\n
                    <td>8</td>\n<td>0.1240</td>\n<td>0.0556</td>\n<td>\u22120.5896</td>\n<td>5.46E-05</td>\n<td>305.650</td>\n<td>0.3772</td>\n  
                </tr>\n  
                <tr>\n
                    <td>10</td>\n<td>0.0382</td>\n<td>0.0086</td>\n<td>\u22120.5356</td>\n<td>1.24E-05</td>\n<td>246.080</td>\n<td>0.0919</td>\n
                </tr>\n
            </table>\n```",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...//Z",
        "image_mime_type": "image/jpeg",
        "filename": "embedded-images-tables.pdf",
        "data_source": {}
    }
}

Generate table-to-HTML output

To generate table-to-HTML output, in an Enrichment node in a workflow, for Model, select OpenAI (GPT-4o).

Make sure after you choose this provider and model, that Table to HTML is also selected.

You can change a workflow’s table description settings only through Custom workflow settings.

Table-to-HTML generation happens only when the Partitioner node in a workflow is set to use the High Res partitioning strategy and the workflow also contains a table-to-HTML enrichment node.

On this page

Generate table-to-HTML output

​Generate table-to-HTML output

FAQ

​Generate table-to-HTML output

Generate table-to-HTML output

Generate table-to-HTML output