After partitioning and chunking, you can have Unstructured generate a list of the entities that it recognizes in the content, along with their types (such as the names of organizations, products, and people), through a process known as named entity recognition (NER). You can also have Unstructured generate a list of the relationships between the recognized entities. NER is performed by models offered through these providers:

GPT-4o, provided through OpenAI.
Claude 3.5 Sonnet, provided through Anthropic.

Here is an example of a list of recognized entities and their entity types, along with a list of relationships between those entities and their relationship types, using GPT-4o. Note specifically the entities field that is added to the metadata field.

{
    "type": "CompositeElement",
    "element_id": "bc8333ea0d374670ff0bd03c6126e70d",
    "text": "SECTION. 3\n\nThe Senate of the United States shall be composed of two Senators from each State, 
        [chosen by the Legislature there- of,]* for six Years; and each Senator shall have one Vote.\n\n
        Immediately after they shall be assembled in Consequence of the first Election, they shall be divided
        as equally as may be into three Classes. The Seats of the Senators of the first Class shall be vacated
        at the Expiration of the second Year, of the second Class at the Expiration of the fourth Year, and of
        the third Class at the Expiration of the sixth Year, so that one third may be chosen every second Year;
        [and if Vacan- cies happen by Resignation, or otherwise, during the Recess of the Legislature of any
        State, the Executive thereof may make temporary Appointments until the next Meeting of the Legislature,
        which shall then fill such Vacancies.]*\n\nC O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
    "metadata": {
        "filename": "constitution.pdf",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 2,
        "entities": {
            "items": [
                {
                    "entity": "Senate",
                    "type": "ORGANIZATION"
                },
                {
                    "entity": "United States",
                    "type": "LOCATION"
                },
                {
                    "entity": "Senator",
                    "type": "ROLE"
                },
                {
                    "entity": "State",
                    "type": "LOCATION"
                },
                {
                    "entity": "Legislature",
                    "type": "ORGANIZATION"
                },
                {
                    "entity": "Executive",
                    "type": "ROLE"
                },
                {
                    "entity": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
                    "type": "DOCUMENT"
                }
            ],
            "relationships": [
                {
                    "from": "Senate",
                    "relationship": "based_in",
                    "to": "United States"
                },
                {
                    "from": "Senator",
                    "relationship": "has_role",
                    "to": "Senate"
                },
                {
                    "from": "Legislature",
                    "relationship": "has_office_in",
                    "to": "State"
                },
                {
                    "from": "Executive",
                    "relationship": "has_role",
                    "to": "State"
                },
                {
                    "from": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
                    "relationship": "dated",
                    "to": "DATE"
                }
            ]
        }
    }
}
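
To make the shape of this output concrete, the following minimal Python sketch walks a list of elements and collects the contents of each element's entities field. It assumes that the workflow output is a JSON array of element objects and that an element without NER results has no entities field; the sample data is a trimmed copy of the element above, and the collect_entities helper is a hypothetical name, not part of any Unstructured library.

```python
import json

# A trimmed copy of the element shown above. In practice this would be read
# from a workflow's JSON output (assumed here to be a list of elements).
sample_output = """
[
  {
    "type": "CompositeElement",
    "element_id": "bc8333ea0d374670ff0bd03c6126e70d",
    "metadata": {
      "filename": "constitution.pdf",
      "entities": {
        "items": [
          {"entity": "Senate", "type": "ORGANIZATION"},
          {"entity": "United States", "type": "LOCATION"},
          {"entity": "Senator", "type": "ROLE"}
        ],
        "relationships": [
          {"from": "Senate", "relationship": "based_in", "to": "United States"},
          {"from": "Senator", "relationship": "has_role", "to": "Senate"}
        ]
      }
    }
  }
]
"""


def collect_entities(elements):
    """Gather (entity, type) pairs and (from, relationship, to) triples."""
    entities = []
    relationships = []
    for element in elements:
        # Assumption: elements without NER results lack the "entities" field.
        ner = element.get("metadata", {}).get("entities", {})
        for item in ner.get("items", []):
            entities.append((item["entity"], item["type"]))
        for rel in ner.get("relationships", []):
            relationships.append((rel["from"], rel["relationship"], rel["to"]))
    return entities, relationships


elements = json.loads(sample_output)
entities, relationships = collect_entities(elements)

for name, entity_type in entities:
    print(f"{entity_type}: {name}")
for source, relationship, target in relationships:
    print(f"{source} -[{relationship}]-> {target}")
```

The triples returned by this sketch could then be loaded into a graph store or a lookup table, depending on how you plan to use the relationships.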

Here is another example of some of the entities, their entity types, and relationships that are recognized for a given paragraph of text. This information is generated by GPT-4o, provided through OpenAI.

[Figure: Named entity recognition for information in a paragraph of text]

By default, the following entity types are supported for NER:

PERSON
ORGANIZATION
LOCATION
DATE
TIME
EVENT
MONEY
PERCENT
FACILITY
PRODUCT
ROLE
DOCUMENT
DATASET

By default, the following entity relationships are supported for NER:

PERSON - ORGANIZATION: works_for, affiliated_with, founded
PERSON - LOCATION: born_in, lives_in, traveled_to
ORGANIZATION - LOCATION: based_in, has_office_in
Entity - DATE: occurred_on, founded_on, died_on, published_in
PERSON - PERSON: married_to, parent_of, colleague_of
PRODUCT - ORGANIZATION: developed_by, owned_by
EVENT - LOCATION: held_in, occurred_in
Entity - ROLE: has_title, acts_as, has_role
DATASET - PERSON: mentions
DATASET - DOCUMENT: located_in
PERSON - DATASET: published
DOCUMENT - DOCUMENT: referenced_in, contains
DOCUMENT - DATE: dated
PERSON - DOCUMENT: published
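
If you post-process NER output, it can help to hold the default relationship list as a lookup table. The sketch below is one possible encoding, not something Unstructured provides. It assumes that each pair in the list reads as "from type - to type" (consistent with the Senate based_in United States example earlier), and it uses "*" as this sketch's own wildcard for the generic Entity. The is_default_relationship helper is a hypothetical name.

```python
# Default relationship types, keyed by (from_type, to_type).
# "*" is this sketch's own wildcard for the generic "Entity" in the list above.
DEFAULT_RELATIONSHIPS = {
    ("PERSON", "ORGANIZATION"): {"works_for", "affiliated_with", "founded"},
    ("PERSON", "LOCATION"): {"born_in", "lives_in", "traveled_to"},
    ("ORGANIZATION", "LOCATION"): {"based_in", "has_office_in"},
    ("*", "DATE"): {"occurred_on", "founded_on", "died_on", "published_in"},
    ("PERSON", "PERSON"): {"married_to", "parent_of", "colleague_of"},
    ("PRODUCT", "ORGANIZATION"): {"developed_by", "owned_by"},
    ("EVENT", "LOCATION"): {"held_in", "occurred_in"},
    ("*", "ROLE"): {"has_title", "acts_as", "has_role"},
    ("DATASET", "PERSON"): {"mentions"},
    ("DATASET", "DOCUMENT"): {"located_in"},
    ("PERSON", "DATASET"): {"published"},
    ("DOCUMENT", "DOCUMENT"): {"referenced_in", "contains"},
    ("DOCUMENT", "DATE"): {"dated"},
    ("PERSON", "DOCUMENT"): {"published"},
}


def is_default_relationship(from_type, relationship, to_type):
    """Return True if the relationship is a default for this pair of types."""
    allowed = set()
    # Check the exact pair first, then the wildcard "any entity" pair.
    for pair in ((from_type, to_type), ("*", to_type)):
        allowed |= DEFAULT_RELATIONSHIPS.get(pair, set())
    return relationship in allowed


print(is_default_relationship("ORGANIZATION", "based_in", "LOCATION"))
print(is_default_relationship("DOCUMENT", "dated", "DATE"))
print(is_default_relationship("PERSON", "works_for", "LOCATION"))
```

A check like this is only meaningful while you use the default prompt. If you add, rename, or delete relationship types, the table would need the same changes.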

You can add, rename, or delete items in this list of default entity types and default entity relationship types. You can also add any clarifying instructions to the prompt that is used to run NER. To do this, see the next section.

Generate a list of entities and their relationships

To generate a list of recognized entities and their relationships, in an Enrichment node in a workflow, specify the following:

You can change a workflow’s NER settings only through Custom workflow settings.

Entities are recognized only when the Partitioner node in the workflow is also set to use the High Res partitioning strategy.

Select Text.
For Model, select either OpenAI (GPT-4o) or Anthropic (Claude 3.5 Sonnet).
The selected model will follow a default set of instructions (called a prompt) to perform NER using a set of predefined entity types and relationships. To experiment with running the default prompt against some sample data, click Edit, and then click Run Prompt. The selected Model uses the Prompt to run NER on the Input sample and shows the results in the Output. Look specifically at the response_json field for the entities that were recognized and their relationships.
To customize the prompt, change the contents of Prompt.
For best results, Unstructured strongly recommends that you limit your changes only to certain portions of the default prompt, specifically:
- Adding, renaming, or deleting items in the list of predefined types (such as PERSON, ORGANIZATION, LOCATION, and so on).
- Adding, renaming, or deleting items in the list of predefined relationships (such as works_for, based_in, has_role, and so on).
- As needed, adding any clarifying instructions only between these two lines:
  ... Provide the entities and their corresponding types as a structured JSON response.
  (Add any clarifying instructions here only.)
  [START OF TEXT] ...
Changing any other portions of the default prompt could produce unexpected results.
To experiment with different data, change the contents of Input sample. For best results, Unstructured strongly recommends that the JSON structure in Input sample be preserved.
When you are satisfied with the Model and Prompt that you want to use, click Save.
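
If you prepare prompt text outside of the UI before pasting it into Prompt, the placement rule for clarifying instructions can be sketched in Python as follows. This is only an illustration: the add_clarifying_instructions helper is hypothetical, and placeholder_prompt is a stand-in that reproduces only the two lines quoted above, not the actual default prompt.

```python
# The two lines between which clarifying instructions may be added.
RESPONSE_LINE = (
    "Provide the entities and their corresponding types "
    "as a structured JSON response."
)
TEXT_MARKER = "[START OF TEXT]"


def add_clarifying_instructions(prompt: str, instructions: str) -> str:
    """Insert instructions just before [START OF TEXT], after RESPONSE_LINE."""
    response_at = prompt.find(RESPONSE_LINE)
    if response_at == -1:
        raise ValueError("response-format line not found in prompt")
    marker_at = prompt.find(TEXT_MARKER, response_at + len(RESPONSE_LINE))
    if marker_at == -1:
        raise ValueError("[START OF TEXT] not found after the response-format line")
    # Keep everything else in the prompt unchanged.
    before = prompt[:marker_at].rstrip()
    return f"{before}\n{instructions.strip()}\n{prompt[marker_at:]}"


# Stand-in prompt: NOT the real default prompt, only the two quoted lines.
placeholder_prompt = (
    "(stand-in for the earlier part of the default prompt)\n"
    f"{RESPONSE_LINE}\n"
    f"{TEXT_MARKER}\n"
    "(input text goes here)"
)

customized = add_clarifying_instructions(
    placeholder_prompt,
    "Treat names of legislative bodies as ORGANIZATION entities.",
)
print(customized)
```

The helper raises an error rather than guessing when either line is missing, which mirrors the guidance that changes elsewhere in the prompt could produce unexpected results.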
