
edsnlp.metrics.ner

We provide several metrics to evaluate the performance of Named Entity Recognition (NER) components. Let's look at an example and see how they differ. We'll use the following two documents: a reference document (ref) and a document with predicted entities (pred).

pred: [La](PER) [patiente](PER) a une [fièvre aiguë](DIS).
ref: La [patiente](PER) a [une fièvre](DIS) aiguë.

Let's create matching documents in EDS-NLP using the following code snippet:

from edsnlp.data.converters import MarkupToDocConverter

conv = MarkupToDocConverter(preset="md", span_setter="entities")

pred = conv("[La](PER) [patiente](PER) a une [fièvre aiguë](DIS).")
ref = conv("La [patiente](PER) a [une fièvre](DIS) aiguë.")

NerExactMetric

Bases: NerMetric

The eds.ner_exact metric scores the extracted entities (which may overlap or be nested) returned by a given SpanGetter object. It compares predicted spans to gold spans and counts a match only when both the boundaries and the label are identical.

Let's view these elements as collections of (span → label) and count how many of the predicted spans match the gold spans exactly (and vice versa):

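  • pred: La → PER, patiente → PER, fièvre aiguë → DIS
  • ref: patiente → PER, une fièvre → DIS

Only patiente → PER appears in both, so 1 of the 3 predicted spans and 1 of the 2 gold spans are matched.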

Precision, Recall and F1 (micro-average and per‐label) are computed as follows:

  • Precision: p = |matched items of pred| / |pred|
  • Recall: r = |matched items of ref| / |ref|
  • F1: f = 2 / (1/p + 1/r)

Examples

from edsnlp.metrics.ner import NerExactMetric

metric = NerExactMetric(span_getter=conv.span_setter, micro_key="micro")
metric([ref], [pred])
# Out: {
#   'micro': {'f': 0.4, 'p': 0.33, 'r': 0.5, 'tp': 1, 'support': 2, 'positives': 3},
#   'PER': {'f': 0.67, 'p': 0.5, 'r': 1.0, 'tp': 1, 'support': 1, 'positives': 2},
#   'DIS': {'f': 0.0, 'p': 0.0, 'r': 0.0, 'tp': 0, 'support': 1, 'positives': 1},
# }
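As a cross-check of the formulas above, here is a minimal pure-Python sketch (not edsnlp's implementation) that recomputes these exact-match scores. It assumes each entity is a hypothetical (start, end, label) tuple of token offsets with an exclusive end; the helper names select and exact_metric are illustrative only.

```python
# Minimal sketch of exact-match NER scoring, NOT edsnlp's implementation.
# Assumption: an entity is a (start, end, label) tuple of token offsets (end exclusive).
# Tokens: La=0 patiente=1 a=2 une=3 fièvre=4 aiguë=5 .=6
PRED = [(0, 1, "PER"), (1, 2, "PER"), (4, 6, "DIS")]
REF = [(1, 2, "PER"), (3, 5, "DIS")]


def select(spans, label):
    """Keep every span (label=None, micro-average) or only the spans with `label`."""
    return [s for s in spans if label is None or s[2] == label]


def exact_metric(pred, ref, micro_key="micro"):
    pred, ref = set(pred), set(ref)
    matched = pred & ref  # identical boundaries AND identical label
    labels = sorted({lab for _, _, lab in pred | ref})
    results = {}
    for key, label in [(micro_key, None)] + [(lab, lab) for lab in labels]:
        tp = len(select(matched, label))
        positives, support = len(select(pred, label)), len(select(ref, label))
        p = tp / positives if positives else 0.0
        r = tp / support if support else 0.0
        f = 2 / (1 / p + 1 / r) if p and r else 0.0  # harmonic mean of p and r
        results[key] = {"f": f, "p": p, "r": r, "tp": tp,
                        "support": support, "positives": positives}
    return results


for key, res in exact_metric(PRED, REF).items():
    print(key, {k: round(v, 2) for k, v in res.items()})
# micro {'f': 0.4, 'p': 0.33, 'r': 0.5, 'tp': 1, 'support': 2, 'positives': 3}
# DIS {'f': 0.0, 'p': 0.0, 'r': 0.0, 'tp': 0, 'support': 1, 'positives': 1}
# PER {'f': 0.67, 'p': 0.5, 'r': 1.0, 'tp': 1, 'support': 1, 'positives': 2}
```

Its micro and per-label values agree with the NerExactMetric example output above.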

Parameters

  • span_getter (SpanGetterArg): The span getter to use to extract the spans from the document.
  • micro_key (str, default 'micro'): The key to use to store the micro-averaged results for spans of all types.
  • filter_expr (Optional[str], default None): The filter expression to use to filter the documents. Evaluated with doc as the variable.

NerTokenMetric

Bases: NerMetric

The eds.ner_token metric scores the extracted entities (which may overlap or be nested) returned by a given SpanGetter object, such as doc.ents or groups of doc.spans, and compares the predicted and gold entities at the token level.

Assuming we use the eds (or fr or en) tokenizer, the reference in the above example has 3 annotated tokens and the prediction has 4. Let's view these elements as sets of (token, label) and count how many of the predicted tokens match the gold tokens exactly (and vice versa):

  • pred: La → PER, patiente → PER, fièvre → DIS, aiguë → DIS
  • ref: patiente → PER, une → DIS, fièvre → DIS

patiente → PER and fièvre → DIS appear in both, so 2 of the 4 predicted tokens and 2 of the 3 gold tokens are matched.

Precision, Recall and F1 (micro-average and per‐label) are computed as follows:

  • Precision: p = |matched items of pred| / |pred|
  • Recall: r = |matched items of ref| / |ref|
  • F1: f = 2 / (1/p + 1/r)

Examples

from edsnlp.metrics.ner import NerTokenMetric

metric = NerTokenMetric(span_getter=conv.span_setter, micro_key="micro")
metric([ref], [pred])
# Out: {
#   'micro': {'f': 0.57, 'p': 0.5, 'r': 0.67, 'tp': 2, 'support': 3, 'positives': 4},
#   'PER': {'f': 0.67, 'p': 0.5, 'r': 1.0, 'tp': 1, 'support': 1, 'positives': 2},
#   'DIS': {'f': 0.5, 'p': 0.5, 'r': 0.5, 'tp': 1, 'support': 2, 'positives': 2}
# }
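The token-level view can be made explicit with a similar pure-Python sketch (again not edsnlp's implementation; the helper names token_items, count and token_metric are hypothetical). Each (start, end, label) span, with an exclusive end, is expanded into (token_index, label) pairs before the prediction and reference sets are compared.

```python
# Minimal sketch of token-level NER scoring, NOT edsnlp's implementation.
# Assumption: entities are (start, end, label) token offsets (end exclusive)
# over the tokens La=0 patiente=1 a=2 une=3 fièvre=4 aiguë=5 .=6
PRED = [(0, 1, "PER"), (1, 2, "PER"), (4, 6, "DIS")]
REF = [(1, 2, "PER"), (3, 5, "DIS")]


def token_items(spans):
    """Expand each span into a set of (token_index, label) pairs."""
    return {(i, label) for start, end, label in spans for i in range(start, end)}


def count(items, label=None):
    """Count all items (label=None, micro-average) or only those with `label`."""
    return sum(1 for _, lab in items if label is None or lab == label)


def token_metric(pred, ref, micro_key="micro"):
    pred_items, ref_items = token_items(pred), token_items(ref)
    matched = pred_items & ref_items  # same token position AND same label
    labels = sorted({lab for _, lab in pred_items | ref_items})
    results = {}
    for key, label in [(micro_key, None)] + [(lab, lab) for lab in labels]:
        tp = count(matched, label)
        positives, support = count(pred_items, label), count(ref_items, label)
        p = tp / positives if positives else 0.0
        r = tp / support if support else 0.0
        f = 2 / (1 / p + 1 / r) if p and r else 0.0  # harmonic mean of p and r
        results[key] = {"f": f, "p": p, "r": r, "tp": tp,
                        "support": support, "positives": positives}
    return results


for key, res in token_metric(PRED, REF).items():
    print(key, {k: round(v, 2) for k, v in res.items()})
# micro {'f': 0.57, 'p': 0.5, 'r': 0.67, 'tp': 2, 'support': 3, 'positives': 4}
# DIS {'f': 0.5, 'p': 0.5, 'r': 0.5, 'tp': 1, 'support': 2, 'positives': 2}
# PER {'f': 0.67, 'p': 0.5, 'r': 1.0, 'tp': 1, 'support': 1, 'positives': 2}
```

Using sets means a token covered by two overlapping spans with the same label is only counted once, matching the "sets of (token, label)" description above.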

Parameters

  • span_getter (SpanGetterArg): The span getter to use to extract the spans from the document.
  • micro_key (str, default 'micro'): The key to use to store the micro-averaged results for spans of all types.
  • filter_expr (Optional[str], default None): The filter expression to use to filter the documents. Will be evaluated with doc as the variable name, so you can use doc.ents, doc.spans, etc.

NerOverlapMetric

Bases: NerMetric

The eds.ner_overlap metric scores the extracted entities (which may overlap or be nested) returned by a given SpanGetter object, counting a prediction as correct if its Dice coefficient with a gold span of the same label is at least the given threshold.

This metric is useful for evaluating NER systems where exact boundaries matter less than finding the entity in the right place. For instance, you may not want to penalize a system that omits determiners if the rest of the entity is correctly identified.

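Let's view these elements as sets of (span → label) and count how many of the predicted spans reach the Dice threshold with a gold span of the same label (and vice versa):

  • pred: La → PER, patiente → PER, fièvre aiguë → DIS
  • ref: patiente → PER, une fièvre → DIS

With the default threshold of 0.5, patiente → PER matches exactly, and fièvre aiguë → DIS matches une fièvre → DIS because each span shares one of its two tokens with the other (a Dice coefficient of 2 × 1 / (2 + 2) = 0.5). This gives 2 matched spans out of 3 predicted and 2 out of 2 gold.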

Precision, Recall and F1 (micro-average and per‐label) are computed as follows:

  • Precision: p = |matched items of pred| / |pred|
  • Recall: r = |matched items of ref| / |ref|
  • F1: f = 2 / (1/p + 1/r)

Overlap threshold

The threshold is the minimum Dice coefficient for two spans to be considered overlapping. Setting it to 1.0 yields the same results as the eds.ner_exact metric, while setting it to a near-zero value (e.g. 1e-14) matches any two spans that share at least one token.

Examples

from edsnlp.metrics.ner import NerOverlapMetric

metric = NerOverlapMetric(
    span_getter=conv.span_setter, micro_key="micro", threshold=0.5
)
metric([ref], [pred])
# Out: {
#   'micro': {'f': 0.8, 'p': 0.67, 'r': 1.0, 'tp': 2, 'support': 2, 'positives': 3},
#   'PER': {'f': 0.67, 'p': 0.5, 'r': 1.0, 'tp': 1, 'support': 1, 'positives': 2},
#   'DIS': {'f': 1.0, 'p': 1.0, 'r': 1.0, 'tp': 1, 'support': 1, 'positives': 1}
# }
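To illustrate the threshold behaviour described above, the sketch below scores the example at several thresholds. It is a hypothetical, simplified re-implementation rather than edsnlp's code: it assumes spans are (start, end, label) token offsets with an exclusive end, that the Dice coefficient is computed on token positions, and that a span counts as matched when it reaches the threshold with any same-label span on the other side (edsnlp may resolve one-to-many overlaps differently).

```python
# Minimal sketch of Dice-based overlap matching, NOT edsnlp's implementation.
# Assumptions: (start, end, label) token spans, end exclusive; Dice on token
# positions; a span is matched if ANY same-label span on the other side
# reaches the threshold.
PRED = [(0, 1, "PER"), (1, 2, "PER"), (4, 6, "DIS")]
REF = [(1, 2, "PER"), (3, 5, "DIS")]


def dice(a, b):
    """2 * |A ∩ B| / (|A| + |B|) over the token positions of two spans."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return 2 * overlap / ((a[1] - a[0]) + (b[1] - b[0]))


def overlap_prf(pred, ref, threshold=0.5):
    def is_match(x, y):
        return x[2] == y[2] and dice(x, y) >= threshold

    matched_pred = sum(any(is_match(s, g) for g in ref) for s in pred)
    matched_ref = sum(any(is_match(s, g) for s in pred) for g in ref)
    p = matched_pred / len(pred) if pred else 0.0
    r = matched_ref / len(ref) if ref else 0.0
    f = 2 / (1 / p + 1 / r) if p and r else 0.0  # harmonic mean of p and r
    return p, r, f


for threshold in (1.0, 0.5, 1e-14):
    p, r, f = overlap_prf(PRED, REF, threshold)
    print(f"threshold={threshold}: p={p:.2f} r={r:.2f} f={f:.2f}")
# threshold=1.0: p=0.33 r=0.50 f=0.40
# threshold=0.5: p=0.67 r=1.00 f=0.80
# threshold=1e-14: p=0.67 r=1.00 f=0.80
```

With threshold=1.0 the sketch gives the same micro scores as the eds.ner_exact example, and with 0.5 (or a near-zero value, since La and patiente share no token) it gives the micro scores shown above.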

Parameters

  • span_getter (SpanGetterArg): The span getter to use to extract the spans from the document.
  • micro_key (str, default 'micro'): The key to use to store the micro-averaged results for spans of all types.
  • filter_expr (Optional[str], default None): The filter expression to use to filter the documents. Evaluated with doc as the variable.
  • threshold (float, default 0.5): The threshold on the Dice coefficient to consider two spans as overlapping.

dice

Compute the Dice coefficient between two spans
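For reference, the standard Dice coefficient is given below, assuming the two spans are compared as the sets A and B of token positions they cover (consistent with the note that a near-zero threshold matches any spans sharing a token). The second formula applies it to the fièvre aiguë / une fièvre pair from the example, which covers token positions 4, 5 and 3, 4 respectively.

```latex
\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}
\qquad
\mathrm{Dice}(\{4, 5\}, \{3, 4\}) = \frac{2 \times 1}{2 + 2} = 0.5
```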