
Trainable Span Qualifier

The eds.span_qualifier component is a trainable qualifier predictor. In EDS-NLP, we call span attributes "qualifiers". In this context, the span qualification task consists of assigning values (booleans, strings or any complex object) to attributes/extensions of spans such as:

  • span._.negation
  • span._.date.mode
  • span._.cui

In the rest of this page, we will refer to a (qualifier, value) pair as a "binding". For instance, the binding ("_.negation", True) means that the negation qualifier of the span is (or, when predicted, should be) set to True.
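A qualifier name is a dotted attribute path relative to the span, so applying a binding amounts to following that path and setting the last attribute. The plain-Python sketch below illustrates this idea; the helpers get_path and set_path are hypothetical, and a SimpleNamespace stands in for a spaCy Span (on a real spaCy span, custom attributes live under span._ as registered extensions).

```python
from functools import reduce
from types import SimpleNamespace


def get_path(obj, path):
    """Hypothetical helper: follow a dotted path such as "_.date.mode" with getattr."""
    return reduce(getattr, path.split("."), obj)


def set_path(obj, path, value):
    """Hypothetical helper: set the attribute found at the end of a dotted path."""
    *parents, last = path.split(".")
    target = reduce(getattr, parents, obj)
    setattr(target, last, value)


# Stand-in for a spaCy Span: span._ holds the custom attributes.
span = SimpleNamespace(
    label_="DATE",
    _=SimpleNamespace(negation=None, date=SimpleNamespace(mode=None)),
)

# Apply two bindings, i.e. (qualifier, value) pairs ("relative" is illustrative).
for qualifier, value in [("_.negation", True), ("_.date.mode", "relative")]:
    set_path(span, qualifier, value)

print(get_path(span, "_.negation"))   # True
print(get_path(span, "_.date.mode"))  # relative
```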

Architecture

The model performs span classification by:

  1. Calling a span embedding component such as eds.span_pooler, which pools word embeddings into a single embedding for each span
  2. Computing logits for each possible binding using a linear layer
  3. Splitting these bindings into groups of mutually exclusive values, such as:

    • _.event_type=start and _.event_type=stop
    • _.negation=False and _.negation=True

    Note that the groups themselves are not mutually exclusive (a span can be both negated and the start of an event), but the values within each group are.

  4. Applying the highest-scoring binding of each group to each span
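The four steps above can be sketched in plain Python for a single span. This is not the EDS-NLP implementation, which relies on a trainable transformer and PyTorch layers: the toy embeddings, the weights and the helper names (mean_pool, linear, best_binding_per_group) are made up for illustration, and each group is assumed to be the set of values of one qualifier.

```python
from collections import defaultdict

# Toy 3-dimensional word embeddings for a 4-word document (made-up values).
word_embeddings = [
    [1.0, 0.0, 2.0],
    [3.0, 1.0, 0.0],
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 1.0],
]
span = (1, 3)  # word indices [1, 3), i.e. words 1 and 2

# One logit (one row of the linear layer) per binding.
bindings = [
    ("_.negation", False),
    ("_.negation", True),
    ("_.event_type", "start"),
    ("_.event_type", "stop"),
]
weights = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
biases = [0.0, 0.0, 0.0, 0.0]


def mean_pool(embeddings, start, end):
    """Step 1: average the word embeddings of the span into one vector."""
    rows = embeddings[start:end]
    return [sum(column) / len(rows) for column in zip(*rows)]


def linear(vector, weights, biases):
    """Step 2: compute one logit per binding."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]


def best_binding_per_group(bindings, logits):
    """Steps 3 and 4: group bindings by qualifier, keep the top-scoring value of each."""
    groups = defaultdict(list)
    for (qualifier, value), logit in zip(bindings, logits):
        groups[qualifier].append((logit, value))
    return {q: max(candidates, key=lambda c: c[0])[1] for q, candidates in groups.items()}


pooled = mean_pool(word_embeddings, *span)
logits = linear(pooled, weights, biases)
prediction = best_binding_per_group(bindings, logits)
print(prediction)  # {'_.negation': True, '_.event_type': 'start'}
```

Grouping by qualifier is what makes the values of one group compete with each other (here True beats False), while the negation and event_type groups are decided independently.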

Examples

To create a span qualifier component, you can use the following code:

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.span_qualifier(
        # To embed the spans, we will use a span pooler
        embedding=eds.span_pooler(
            pooling_mode="mean",  # mean pooling
            span_getter=["ents", "sc"],
            # that will use a transformer to embed the doc words
            embedding=eds.transformer(
                model="prajjwal1/bert-tiny",
                window=128,
                stride=96,
            ),
        ),
        # For every span embedded by the span pooler
        # (doc.ents and doc.spans["sc"]), we will predict both
        # span._.negation and span._.event_type
        qualifiers=["_.negation", "_.event_type"],
    ),
    name="qualifier",
)

To infer the possible values of each qualifier from annotated documents, call the pipeline's post_init method on your gold data:

# gold_data: your annotated (gold) documents
nlp.post_init(gold_data)
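Conceptually, inferring the possible values amounts to scanning the gold spans and recording the distinct values observed for each qualifier. A minimal sketch under that assumption, where infer_bindings is a hypothetical helper (not the post_init implementation), label restrictions are ignored and None values are skipped:

```python
from functools import reduce
from types import SimpleNamespace


def infer_bindings(qualifiers, gold_spans):
    """Hypothetical helper: list the values seen for each qualifier on gold spans.

    Returns (qualifier, True, values) triples, True meaning "all span labels".
    None values are skipped, as if keep_none were False.
    """
    values = {qualifier: [] for qualifier in qualifiers}
    for span in gold_spans:
        for qualifier in qualifiers:
            value = reduce(getattr, qualifier.split("."), span)
            if value is not None and value not in values[qualifier]:
                values[qualifier].append(value)
    return [(qualifier, True, values[qualifier]) for qualifier in qualifiers]


def make_span(negation, event_type):
    """Stand-in for an annotated spaCy span."""
    return SimpleNamespace(_=SimpleNamespace(negation=negation, event_type=event_type))


gold_spans = [make_span(True, "start"), make_span(False, None), make_span(False, "stop")]
print(infer_bindings(["_.negation", "_.event_type"], gold_spans))
# [('_.negation', True, [True, False]), ('_.event_type', True, ['start', 'stop'])]
```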

To train the model, refer to the Training tutorial.

You can then inspect the bindings that will be used for training and prediction:

print(nlp.pipes.qualifier.bindings)
# list of (qualifier name, span labels or True if all, values)
# Out: [
#   ('_.negation', True, [True, False]),
#   ('_.event_type', True, ['start', 'stop'])
# ]

You can also change these values and update the bindings by calling the update_bindings method. Don't forget to retrain the model if new values are added!
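The middle element of each binding triple restricts it to certain span labels (True meaning all labels), which matches the dict form of the qualifiers parameter. The hypothetical helper bindings_for_label below sketches how such triples could be filtered for a given span label; the "event" and "date" labels are illustrative only.

```python
def bindings_for_label(bindings, label):
    """Hypothetical helper: keep the (qualifier, values) pairs allowed for a span label.

    Each binding is a (qualifier, labels, values) triple, where labels is either
    True (every label) or a collection of span labels.
    """
    return [
        (qualifier, values)
        for qualifier, labels, values in bindings
        if labels is True or label in labels
    ]


# Illustrative bindings: event_type is restricted to spans labelled "event".
bindings = [
    ("_.negation", True, [True, False]),
    ("_.event_type", ["event"], ["start", "stop"]),
]
print(bindings_for_label(bindings, "event"))
# [('_.negation', [True, False]), ('_.event_type', ['start', 'stop'])]
print(bindings_for_label(bindings, "date"))
# [('_.negation', [True, False])]
```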

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline object

TYPE: PipelineProtocol

name

Name of the component

TYPE: str

embedding

The span embedding component (such as eds.span_pooler) that computes one embedding per span

TYPE: SpanEmbeddingComponent

qualifiers

The qualifiers to predict or train on. If a dict is given, keys are the qualifiers and values are the labels for which the qualifier is allowed, or True if the qualifier is allowed for all labels.

TYPE: QualifiersArg

keep_none

If False (default), skip spans for which a qualifier returns None. If True, None values are learned and predicted just like any other value.

TYPE: bool DEFAULT: False
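To make the keep_none option concrete, here is a sketch of how the training targets of one qualifier could be collected under each setting. training_targets is a hypothetical helper, and it assumes that spans are skipped qualifier by qualifier rather than for every qualifier at once.

```python
from functools import reduce
from types import SimpleNamespace


def training_targets(spans, qualifier, keep_none=False):
    """Hypothetical helper: the (span, value) pairs one qualifier is trained on.

    With keep_none=False, spans whose value is None are skipped; with
    keep_none=True, None is kept as an ordinary value to learn and predict.
    """
    pairs = []
    for span in spans:
        value = reduce(getattr, qualifier.split("."), span)
        if value is None and not keep_none:
            continue  # no gold value for this span: leave it out of training
        pairs.append((span, value))
    return pairs


spans = [SimpleNamespace(_=SimpleNamespace(event_type=v)) for v in ["start", None, "stop"]]
print([value for _span, value in training_targets(spans, "_.event_type")])
# ['start', 'stop']
print([value for _span, value in training_targets(spans, "_.event_type", keep_none=True)])
# ['start', None, 'stop']
```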