Named entity recognition (NER)
After partitioning and chunking, you can have Unstructured generate a list of recognized entities and their types (such as the names of organizations, products, and people) in the content, through a process known as named entity recognition (NER).
NER is performed by models offered through these providers:
- GPT-4o, provided through OpenAI.
- Claude 3.5 Sonnet, provided through Anthropic.
Here is an example of a list of recognized entities and their types using GPT-4o. Note specifically the entities field that is added.
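As a rough illustration of how such a list might be consumed downstream, the following Python sketch groups recognized entities by type. The element shape used here (an entities list of entity/type pairs under metadata), the function name group_entities_by_type, and all sample values are assumptions made for illustration, not Unstructured's documented output schema. Adjust the field names to match your actual output.

```python
from collections import defaultdict

# Hypothetical element for illustration only; the real output shape may differ.
sample_element = {
    "type": "CompositeElement",
    "text": "Jane Doe joined Example Corp in Paris.",
    "metadata": {
        "entities": [
            {"entity": "Jane Doe", "type": "PERSON"},
            {"entity": "Example Corp", "type": "ORGANIZATION"},
            {"entity": "Paris", "type": "LOCATION"},
        ]
    },
}


def group_entities_by_type(element):
    """Return {entity_type: [entity names]} for one element dict."""
    grouped = defaultdict(list)
    # Missing metadata or entities simply yields an empty result.
    for item in element.get("metadata", {}).get("entities", []):
        grouped[item["type"]].append(item["entity"])
    return dict(grouped)


grouped = group_entities_by_type(sample_element)
```

Grouping by type makes it straightforward to, for example, list every organization mentioned in a document without scanning each element by hand.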
Generate a list of entities and their types
You can change a workflow’s NER settings only through Custom workflow settings.
Entities are recognized only when the Partitioner node in the workflow is also set to use the High Res partitioning strategy.
To generate a list of recognized entities and their types, in the Task drop-down list of an Enrichment node in a workflow, specify the following:
- Select Named Entity Recognition (NER). By default, OpenAI’s GPT-4o will follow a default set of instructions (called a prompt) to perform NER using a set of predefined entity types.
- To use Anthropic’s Claude 3.5 Sonnet to perform NER instead, or to customize the prompt, click Edit.
  - To switch to using Anthropic’s Claude 3.5 Sonnet, click Anthropic (Claude 3.5 Sonnet).
  - To experiment with running the default prompt against some sample data, click Run Prompt. The selected Model uses the Prompt to run NER on the Input sample and shows the results in the Output. Look specifically at the response_json field for the entities that were recognized and their types.
  - To customize the prompt, change the contents of Prompt.
    For best results, Unstructured strongly recommends that you limit your changes only to certain portions of the default prompt, specifically:
    - Adding, renaming, or deleting items in the list of predefined types (such as PERSON, ORGANIZATION, LOCATION, and so on).
    - As needed, adding any clarifying instructions only between these two lines:
    Changing any other portions of the default prompt could produce unexpected results.
  - To experiment with different data, change the contents of Input sample. For best results, Unstructured strongly recommends that the JSON structure in Input sample be preserved.
  - When you are satisfied with the Model and Prompt that you want to use, click Save.
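Because the recommendation above is to preserve the JSON structure of Input sample when experimenting with different data, a quick local check can help before pasting edited data back in. The following Python sketch compares the structure (keys and value types, ignoring the data itself) of two JSON documents. The function names json_shape and same_structure and the sample strings are hypothetical, and the real Input sample will look different.

```python
import json


def json_shape(value):
    """Reduce a JSON value to its structure: keys and value types, no data."""
    if isinstance(value, dict):
        return {key: json_shape(child) for key, child in value.items()}
    if isinstance(value, list):
        # Represent a list by the shape of its first item (empty list -> []).
        return [json_shape(value[0])] if value else []
    return type(value).__name__


def same_structure(original_text, edited_text):
    """Return True if two JSON documents share the same keys and value types."""
    return json_shape(json.loads(original_text)) == json_shape(json.loads(edited_text))


# Hypothetical samples for illustration only.
original = '{"text": "Alice works at Acme.", "id": 1}'
edited_ok = '{"text": "Bob visited Berlin.", "id": 2}'
edited_bad = '{"content": "Bob visited Berlin.", "id": 2}'

ok_result = same_structure(original, edited_ok)    # only the values changed
bad_result = same_structure(original, edited_bad)  # a key was renamed
```

This check is deliberately simple: it inspects only the first item of each list, so it would not catch lists whose later items differ in shape.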