# Image descriptions

<iframe width="560" height="315" src="https://www.youtube.com/embed/pMQm9ymM3N8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

After partitioning, you can have Unstructured generate text-based summaries of detected images.

These summaries are generated by models that are available through various model providers.

Here is an example of the output for a detected image, generated with GPT-4o. Note specifically the `text` field that is added, which contains a JSON-formatted string.
In this `text` field, `type` indicates the kind of image that was detected (in this case, a `diagram`), and `description` is a summary of the image.
Line breaks have been inserted here for readability. The actual output does not contain these line breaks.

<img src="https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Diagram-Example.png?fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=16c8e1ac1fcbd00932c9f2b4ebfa2005" alt="Example of a diagram" width="1112" height="842" data-path="img/enriching/Diagram-Example.png" />

```json theme={null}
{
  "type": "Image",
  "element_id": "dd1fb72db7937725c9a781906098e6f8",
  "text": "{\n    
    \"type\": \"diagram\",\n    
    \"description\": \"User uploads a flowchart image via a Web Browser, which is then 
      converted to a Base64 Encoded Image. This image is sent to the Back-end System 
      (Node.js) where it is processed by the AI Model Adapter. The output undergoes 
      Validation and Rendering, resulting in Normalized Mermaid Code. AI Assisted 
      Editing is available through an AI Assistant, which allows for the Regenerated 
      Flowchart Image to be viewed again in the Web Browser.\\n\\n
      Text in the image:\\n
        - User\\n
        - Upload flowchart image\\n
        - Web Browser\\n
        - Base64 Encoded Image\\n
        - Back-end System (Node.js)\\n
        - AI Model Adapter\\n
        - Validation and Rendering\\n
        - Normalized Mermaid Code\\n
        - AI Assisted Editing\\n
        - AI Assistant\\n
        - Regenerated Flowchart Image\"\n
  }",
  "metadata": {
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_base64": "/9j...<full results omitted for brevity>...Q==",
    "image_mime_type": "image/jpeg",
    "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
    "data_source": {}
  }
}
```
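Because the `text` field is itself a JSON-formatted string, downstream code typically needs to decode it a second time to reach `type` and `description`. The following is a minimal sketch of that step, assuming the field always holds a JSON object shaped like the example above. The helper name `parse_image_text` and the shortened sample element are illustrative and are not part of Unstructured.

```python
import json

# A trimmed-down Image element shaped like the example above.
# The description is shortened for illustration.
element = {
    "type": "Image",
    "element_id": "dd1fb72db7937725c9a781906098e6f8",
    "text": json.dumps(
        {
            "type": "diagram",
            "description": "User uploads a flowchart image via a Web Browser.",
        }
    ),
    "metadata": {"page_number": 1},
}


def parse_image_text(element):
    """Decode the JSON string in an Image element's `text` field.

    Returns a dict, or None if the element is not an Image or the
    field does not hold a JSON object.
    """
    if element.get("type") != "Image":
        return None
    try:
        payload = json.loads(element["text"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return None
    return payload if isinstance(payload, dict) else None


summary = parse_image_text(element)
print(summary["type"])
print(summary["description"])
```

Returning `None` instead of raising keeps a batch loop simple when some elements carry plain text rather than a JSON payload.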

For technical drawings, the `text` field contains a `type` of `technical drawing` and a `description` object. This object contains
`texts` (the text strings found in the drawing), `tables` (HTML representations of tables found in the drawing), and a nested `description` (a summary of the drawing).
Here is an example. Line breaks have been inserted here for readability. The actual output does not contain these line breaks.

<img src="https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Technical-Drawing-Example.png?fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=e4d3fb249208b3166b0e79a9d54b4cdf" alt="Example of a technical drawing" width="1468" height="782" data-path="img/enriching/Technical-Drawing-Example.png" />

```json theme={null}
{
  "type": "Image",
  "element_id": "7877acdd762f2afc65b193fa89d8ef46",
  "text": "{\n  
    \"type\": \"technical drawing\",\n  
    \"description\": {\n    
      \"texts\": [\n
        \"RTD 1\",\n      
        \"RTD 2\",\n      
        \"01\",\n      
        \"18.50\\\" Cable Length\",\n      
        \"02\",\n      
        \"1/4\\\" Heat Shrink\",\n      
        \"6X Strip wires 0.100\\\" - 0.115\\\" before crimping\",\n      
        \"2X 1.50\",\n      
        \"22.25\\\" Cable Length\"\n    
      ],\n    
      \"tables\": \"<table>
        <thead>
          <tr>
            <th>Item</th>
            <th>Quantity</th>
            <th>Part Number</th>
            <th>Description</th>
            <th>Supplier</th>
            <th>Supplier PN</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>1</td>
            <td>6</td>
            <td>002622</td>
            <td>Conn Socket 20-24AWG Gold</td>
            <td>Digikey</td>
            <td>WM7082CT-ND</td>
          </tr>
          <tr>
            <td>2</td>
            <td>1</td>
            <td>002647</td>
            <td>Conn Recept 16pos 3mm Dual Row</td>
            <td>Digikey</td>
            <td>WM2490-ND</td>
          </tr>
          <tr>
            <td>3</td>
            <td>2</td>
            <td>102961-01</td>
            <td>M12 Q/D Cable, Elbow, 4-Pole, 5m</td>
            <td>Automation Direct</td>
            <td>EVT222</td>
          </tr>
        </tbody>
      </table>\",\n    
      \"description\": \"The technical drawing depicts a wiring setup involving two 
          RTDs (Resistance Temperature Detectors) labeled RTD 1 and RTD 2. Each RTD 
          is connected via cables with specified lengths: RTD 1 has an 18.50-inch 
          cable length, and RTD 2 has a 22.25-inch cable length. The drawing 
          includes annotations for stripping wires, indicating that six wires should 
          be stripped to a length between 0.100 inches and 0.115 inches before 
          crimping. There is a section labeled '1/4\\\" Heat Shrink' and a dimension 
          marked '2X 1.50'. The drawing uses numbered circles to reference specific 
          parts or steps in the process.\"\n  
      }\n
  }",
  "metadata": {
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_base64": "/9j...<full results omitted for brevity>...Q==",
    "image_mime_type": "image/jpeg",
    "filename": "Material-Callouts-c4655c0c.pDF",
    "data_source": {}
  }
}
```
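The `tables` value in a technical drawing is an HTML string, so turning it into rows takes one more parsing step after decoding `text`. The sketch below uses only the Python standard library and assumes a single, simple table with one header row, as in the example above. The `TableRows` class and the shortened sample payload are illustrative.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Shortened stand-in for the `text` field of a technical drawing.
text = json.dumps(
    {
        "type": "technical drawing",
        "description": {
            "texts": ["RTD 1", "RTD 2"],
            "tables": (
                "<table><thead><tr><th>Item</th><th>Quantity</th></tr></thead>"
                "<tbody><tr><td>1</td><td>6</td></tr></tbody></table>"
            ),
            "description": "Wiring setup involving two RTDs.",
        },
    }
)

drawing = json.loads(text)["description"]
parser = TableRows()
parser.feed(drawing["tables"])
parser.close()

# Treat the first row as the header and map each later row onto it.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

For drawings with several tables or merged cells, a dedicated HTML library would likely be a better fit than this small parser.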

For images of data plots (for example, bar charts, line charts, pie charts, and scatter plots),
the `text` field contains a `type` of `graph`, a `description` containing a summary of the graph,
`data` containing numerical values that can be used to regenerate the graph, and `texts` containing
the text strings found in the graph.
Here is an example. Line breaks have been inserted here for readability. The actual output does not contain these line breaks.

<img src="https://mintcdn.com/unstructured-53/MA9021_G4PBhH-to/img/enriching/Data-Plot-Graph-Example.png?fit=max&auto=format&n=MA9021_G4PBhH-to&q=85&s=dfd91ebf0612d8de5a5d2f71148c3653" alt="Example of a data-plot-graph" width="660" height="299" data-path="img/enriching/Data-Plot-Graph-Example.png" />

```json theme={null}
{
  "type": "Image",
  "element_id": "322b067931bc59555ae95dbb63ab9b78",
  "text": "{
    \"type\": \"graph\",
    \"description\": \"This is a comprehensive comparison chart showing 6 different 
      metrics across 7 machine learning models (RF, FFNN, LSTM, MHA, GCN, GCN+MHA, 
      MT+GCN+MHA) represented by different colored bars. The metrics include 
      PAVIS-TW-MSE, PAVIS-LS-MSE, PAVIS-EL@ACC 1, Trainable Parameters, Training 
      Time, and Inference Time. The models are categorized into paradigms: SNN (blue), 
      TNN (green), ReNN (orange), TReNN (purple), and MT-TReNN (red). Key trends show 
      RF and FFNN having higher MSE values but faster inference times, while more 
      complex models like MT+GCN+MHA achieve better accuracy but require more parameters 
      and training time.\", 
    \"data\": {
      \"PAVIS-TW-MSE\": {
        \"RF\": 1.222, \"FFNN\": 1.039, \"LSTM\": 0.817, \"MHA\": 0.790, \"GCN\": 0.800, 
          \"GCN+MHA\": 0.768, \"MT+GCN+MHA\": 0.760
      }, 
      \"PAVIS-LS-MSE\": {
        \"RF\": 0.763, \"FFNN\": 0.878, \"LSTM\": 0.649, \"MHA\": 0.611, \"GCN\": 0.596, 
        \"GCN+MHA\": 0.551, \"MT+GCN+MHA\": 0.549
      }, 
      \"PAVIS-EL@ACC 1\": {
        \"RF\": 79.54, \"FFNN\": 77.05, \"LSTM\": 80.69, \"MHA\": 82.10, \"GCN\": 83.39, 
        \"GCN+MHA\": 85.95, \"MT+GCN+MHA\": 85.98
      }, 
      \"Trainable Parameters\": {
        \"RF\": 328050, \"FFNN\": 237790, \"LSTM\": 264516, \"MHA\": 318941, \"GCN\": 221886, 
        \"GCN+MHA\": 324018, \"MT+GCN+MHA\": 99638
      }, 
      \"Training Time\": {
        \"RF\": 17968, \"FFNN\": 1447, \"LSTM\": 1985, \"MHA\": 2502, \"GCN\": 1620, 
        \"GCN+MHA\": 2808, \"MT+GCN+MHA\": 748
      }, 
      \"Inference Time\": {
        \"RF\": 190, \"FFNN\": 35, \"LSTM\": 75, \"MHA\": 79, \"GCN\": 69, 
        \"GCN+MHA\": 108, \"MT+GCN+MHA\": 18
      }
    }, 
    \"texts\": \"PAVIS - TW - MSE (↓) PAVIS - LS - MSE (↓) PAVIS - EL@ACC 1 (↑) Paradigm 
      SNN TNN ReNN TReNN MT-TReNN 1.222 1.039 0.817 0.790 0.800 0.768 0.760 RF FFNN LSTM 
      MHA GCN GCN+MHA MT+GCN+MHA 0.763 0.878 0.649 0.611 0.596 0.551 0.549 RF FFNN LSTM 
      MHA GCN GCN+MHA MT+GCN+MHA 79.54 77.05 80.69 82.10 83.39 85.95 85.98 RF FFNN LSTM 
      MHA GCN GCN+MHA MT+GCN+MHA Trainable Parameters (↓) Training Time (ms/epoch) (↓) 
      Inference Time (ms) (↓) 328050 237790 264516 318941 221886 324018 99638 RF FFNN LSTM 
      MHA GCN GCN+MHA MT+GCN+MHA 17968 1447 1985 2502 1620 2808 748 RF FFNN LSTM 
      MHA GCN GCN+MHA MT+GCN+MHA 190 35 75 79 69 108 18 RF FFNN 
      LSTM MHA GCN GCN+MHA MT+GCN+MHA\"
  }",
  "metadata": {
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 17,
    "image_base64": "/9j...<full results omitted for brevity>...//Z",
    "image_mime_type": "image/jpeg",
    "filename": "250713305v1-48da34f2.pdf",
    "data_source": {}
  }
}
```
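The `data` object can be flattened into rows for a spreadsheet or a plotting library. Here is a minimal sketch, assuming `data` maps each metric name to a mapping of series name to number, as in the example above. Other graphs may produce a different shape, so treat this layout as an assumption. The sample keeps only a few values from the example.

```python
import json

# Shortened stand-in for the `text` field of a graph image.
text = json.dumps(
    {
        "type": "graph",
        "description": "Comparison of metrics across machine learning models.",
        "data": {
            "PAVIS-TW-MSE": {"RF": 1.222, "FFNN": 1.039},
            "Inference Time": {"RF": 190, "FFNN": 35},
        },
        "texts": "PAVIS - TW - MSE Inference Time (ms) RF FFNN",
    }
)

graph = json.loads(text)

# One (metric, model, value) tuple per data point.
rows = [
    (metric, model, value)
    for metric, series in graph["data"].items()
    for model, value in series.items()
]
for row in rows:
    print(row)

# Example query: the model with the lowest inference time.
inference = graph["data"]["Inference Time"]
fastest = min(inference, key=inference.get)
print(fastest)
```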

<Note>
  The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/concepts/partitioning) by using the High Res strategy. This field is not generated for
  documents or PDF pages that are partitioned by using the Fast or VLM strategy.
</Note>
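When `image_base64` is present, it can be decoded back into the original image bytes. The sketch below is one way to do that, using `image_mime_type` to choose a file extension. The helper name `save_element_image`, the extension table, and the placeholder bytes are illustrative, and the code returns `None` when the field is absent, for example after Fast or VLM partitioning.

```python
import base64
import tempfile
from pathlib import Path

# Illustrative mapping; extend it for other MIME types as needed.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png"}


def save_element_image(element, out_dir):
    """Write an element's embedded image to out_dir and return its path.

    Returns None if the element has no `image_base64` field.
    """
    metadata = element.get("metadata", {})
    encoded = metadata.get("image_base64")
    if not encoded:
        return None
    suffix = EXTENSIONS.get(metadata.get("image_mime_type"), ".bin")
    path = Path(out_dir) / f"{element['element_id']}{suffix}"
    path.write_bytes(base64.b64decode(encoded))
    return path


# Placeholder bytes standing in for real image data.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"placeholder"
element = {
    "type": "Image",
    "element_id": "dd1fb72db7937725c9a781906098e6f8",
    "metadata": {
        "image_base64": base64.b64encode(fake_jpeg).decode("ascii"),
        "image_mime_type": "image/jpeg",
    },
}

out_dir = tempfile.mkdtemp()
saved = save_element_image(element, out_dir)
print(saved)
```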

For workflows that use [chunking](/concepts/chunking), note the following changes:

* Each `Image` element is replaced by a `CompositeElement` element.
* This `CompositeElement` element will contain the image's summary description as part of the element's `text` field.
* This `CompositeElement` element will not contain an `image_base64` field.

Here are three examples of the descriptions for detected images. These descriptions are generated with GPT-4o by OpenAI:

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/enriching/Image-Description-1.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8a0b06395070f17ab506d0fcaf16215a" alt="Description of an image showing a scatter plot graph" width="2970" height="458" data-path="img/enriching/Image-Description-1.png" />

<img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-2.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=267621de3b05f57c3c028dedb956d627" alt="Description of an image showing the Matthews Correlation Coefficient for different VQA datasets" width="2970" height="442" data-path="img/enriching/Image-Description-2.png" />

<img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-3.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=c79e8f0ac5f35e1712ae4518d0b5636e" alt="Description of an image showing three scatter plots" width="2982" height="374" data-path="img/enriching/Image-Description-3.png" />

Any embeddings that are produced after these summaries are generated will be based on the `text` field's contents.

## Generate image descriptions

To have Unstructured generate image descriptions, do the following:

* For **Unstructured UI** users, add an [Enrichment node](/ui/workflows#custom-workflow-node-types) of type **Image Description**
  to an Unstructured [custom workflow](/ui/workflows#create-a-custom-workflow).
* For **Unstructured API** users, add an [Image Description task](/api-reference/workflow/nodes/enrichment/enrichment-image-description). You add this task
  either as an object in a `workflow_nodes` array
  (for curl) or as a `WorkflowNode` in a `WorkflowNodes` collection (for Python). This array or collection applies whenever you
  [create a workflow](/api-reference/api/workflow/create-workflow),
  [update a workflow](/api-reference/api/workflow/update-workflow), or
  [create an on-demand workflow job](/api-reference/api/job/create-job).
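As a rough illustration of the API option, the following sketch builds the kind of object that would go into a `workflow_nodes` array and prints it as JSON. The `name`, `type`, `subtype`, and `settings` values shown here are assumptions, not confirmed by this page. Check the Image Description task reference linked above for the values that your provider and model require. A complete workflow would also typically include other nodes, such as a partitioner, ahead of this one.

```python
import json

# Assumed values for illustration only; confirm the current `type`,
# `subtype`, and `settings` in the Image Description task reference.
image_description_node = {
    "name": "Enrichment",
    "type": "prompter",
    "subtype": "openai_image_description",
    "settings": {},
}

# Fragment of a request body; other required workflow fields are omitted.
request_fragment = {"workflow_nodes": [image_description_node]}
print(json.dumps(request_fragment, indent=2))
```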
