Trainable Span Qualifier

The eds.span_qualifier component is a trainable span qualifier: it predicts span attributes. In this context, span qualification consists of assigning values (booleans, strings, or any other object) to attributes or extensions of spans, such as:

span.label_,
span._.negation,
span._.date.mode
etc.

Architecture

The underlying eds.span_multilabel_classifier.v1 model performs span classification by:

Pooling the word embeddings (mean, max or sum) into a single embedding per span
Computing logits for each possible binding (i.e. qualifier-value assignment)
Splitting these bindings into independent groups such as
- event=start and event=stop
- negated=False and negated=True
Learning or predicting one combination among the legal combinations of these bindings. For instance, in the second group, negated=True and negated=False cannot both hold, so the legal combinations are [(1, 0), (0, 1)]
Assigning bindings on spans depending on the predicted results
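
The first steps above can be illustrated with a minimal, dependency-free sketch. It is not the actual eds.span_multilabel_classifier.v1 implementation: the helper names `pool` and `binding_logits`, the toy weights, and the use of a plain linear layer to score bindings are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def pool(word_embeddings: Sequence[Sequence[float]], mode: str = "mean") -> List[float]:
    """Pool the word embeddings of one span into a single vector."""
    columns = list(zip(*word_embeddings))  # one tuple per embedding dimension
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "sum":
        return [sum(col) for col in columns]
    raise ValueError(f"Unknown pooling mode: {mode}")


def binding_logits(
    span_embedding: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Compute one logit per binding (assumed here to be a linear layer)."""
    return [
        sum(w * x for w, x in zip(row, span_embedding)) + b
        for row, b in zip(weights, biases)
    ]


# Toy span made of two words with 2-dimensional embeddings
words = [[1.0, 2.0], [3.0, 6.0]]
span_vec = pool(words, "mean")

# One row of (hypothetical) weights per binding
bindings: List[Tuple[str, object]] = [("_.negation", False), ("_.negation", True)]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, -1.0]
logits = binding_logits(span_vec, weights, biases)

# Split the bindings into independent groups, one per qualifier
groups: Dict[str, List[int]] = {}
for index, (qualifier, _value) in enumerate(bindings):
    groups.setdefault(qualifier, []).append(index)
```

Here both bindings share the qualifier `_.negation`, so they end up in a single group whose members are mutually exclusive.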

Step by step

Initialization

During the initialization of the pipeline, the span_qualifier component gathers all spans that match the on_ents and on_span_groups patterns (or the candidate_getter function). It then lists all possible values for each qualifier of the qualifiers list and stores every possible (qualifier, value) pair (i.e. binding).

For instance, a custom qualifier negation with possible values True and False will result in the following bindings [("_.negation", True), ("_.negation", False)], while a custom qualifier event with possible values start, stop, and start-stop will result in the following bindings [("_.event", "start"), ("_.event", "stop"), ("_.event", "start-stop")].
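
As a rough sketch of this enumeration, the snippet below collects bindings from a handful of toy spans. Spans are represented as plain dictionaries rather than spaCy `Span` objects, and `collect_bindings` is a hypothetical helper, not a function of the library. None values are ignored here for simplicity.

```python
from typing import Any, Dict, List, Tuple


def collect_bindings(
    spans: List[Dict[str, Any]], qualifiers: List[str]
) -> List[Tuple[str, Any]]:
    """List every (qualifier, value) pair observed on the given spans."""
    bindings: List[Tuple[str, Any]] = []
    for qualifier in qualifiers:
        seen: List[Any] = []  # keeps first-seen order, no duplicates
        for span in spans:
            value = span.get(qualifier)
            if value is not None and value not in seen:
                seen.append(value)
        bindings.extend((qualifier, value) for value in seen)
    return bindings


# Toy stand-ins for annotated spans
spans = [
    {"_.negation": True, "_.event": "start"},
    {"_.negation": False, "_.event": "stop"},
    {"_.negation": True, "_.event": "start-stop"},
]
bindings = collect_bindings(spans, ["_.negation", "_.event"])
```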

Training

During training, the span_qualifier component will gather spans on the documents in a mini-batch and evaluate each binding on each span to build a supervision matrix. This matrix will then be fed to the underlying model (most likely an eds.span_multilabel_classifier.v1). The model will compute logits for each entry of the matrix and compute a cross-entropy loss for each group of bindings sharing the same qualifier.
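
The sketch below shows what one row of such a supervision matrix and the per-group loss could look like. It assumes that each group holds mutually exclusive values, in which case a cross-entropy over legal combinations reduces to a softmax cross-entropy over the bindings of the group. The helpers `supervision_row` and `group_cross_entropy` are hypothetical, and spans are again plain dictionaries.

```python
import math
from typing import Any, Dict, List, Tuple

bindings: List[Tuple[str, Any]] = [
    ("_.negation", True),
    ("_.negation", False),
    ("_.event", "start"),
    ("_.event", "stop"),
]
# Indices of the bindings that share the same qualifier
groups = {"_.negation": [0, 1], "_.event": [2, 3]}


def supervision_row(span: Dict[str, Any], bindings: List[Tuple[str, Any]]) -> List[bool]:
    """Evaluate every binding on one span: True where the span has that value."""
    return [span.get(qualifier) == value for qualifier, value in bindings]


def group_cross_entropy(logits: List[float], targets: List[bool], group: List[int]) -> float:
    """Softmax cross-entropy restricted to the bindings of one group."""
    log_norm = math.log(sum(math.exp(logits[i]) for i in group))
    gold = next(i for i in group if targets[i])  # index of the true binding
    return log_norm - logits[gold]


span = {"_.negation": True, "_.event": "stop"}
row = supervision_row(span, bindings)

# Untrained model: all logits equal, so each group is a coin flip
logits = [0.0, 0.0, 0.0, 0.0]
loss = sum(group_cross_entropy(logits, row, group) for group in groups.values())
```

With uniform logits and two bindings per group, each group contributes log(2) to the loss.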

Prediction

During prediction, the span_qualifier component will gather spans on a given document and evaluate each binding on each span using the underlying model. Using the same binding exclusion and label constraint mechanisms as during training, scores will be computed for each binding and the best legal combination of bindings will be selected. Finally, the selected bindings will be assigned to the spans.
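
A minimal sketch of this selection step for a single group follows. It assumes that a combination is scored by summing the logits of its active bindings, which may differ from the actual scoring rule. `best_combination` is a hypothetical helper and the span is a plain dictionary.

```python
from typing import Any, Dict, List, Sequence, Tuple


def best_combination(
    logits: Sequence[float], combinations: Sequence[Tuple[int, ...]]
) -> Tuple[int, ...]:
    """Return the legal combination with the highest total score."""
    return max(
        combinations,
        key=lambda combo: sum(logit for logit, on in zip(logits, combo) if on),
    )


group_bindings: List[Tuple[str, Any]] = [("_.negation", False), ("_.negation", True)]
legal = [(1, 0), (0, 1)]  # exactly one value per qualifier
logits = [-0.3, 1.2]  # made-up scores for one span

chosen = best_combination(logits, legal)

# Assign the selected bindings to the span
span: Dict[str, Any] = {}
for (qualifier, value), on in zip(group_bindings, chosen):
    if on:
        span[qualifier] = value
```

Restricting the search to legal combinations guarantees that a span never receives two contradictory values for the same qualifier.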

Examples

Let us define the pipeline. We provide utilities to train the model through an API, but you can use a spaCy config file as well.

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(
    "eds.transformer",
    name="transformer",
    config=dict(
        model="prajjwal1/bert-tiny",
        window=128,
        stride=96,
    ),
)
nlp.add_pipe(
    "eds.span_qualifier",
    name="qualifier",
    config={
        "embedding": {
            "@factory": "eds.span_pooler",
            "embedding": nlp.get_pipe("transformer"),
            "span_getter": ["ents", "sc"],
        },
        "qualifiers": ["_.negation", "_.event_type"],
    },
)

Parameters

PARAMETER	DESCRIPTION
`nlp`	Spacy vocabulary
`name`	Name of the component
`embedding`	The span embedding component TYPE: `SpanEmbeddingComponent`
`qualifiers`	The qualifiers to predict or train on. If a dict is given, keys are the qualifiers and values are the labels for which the qualifier is allowed, or True if the qualifier is allowed for all labels.
`keep_none`	If False (default), skip spans for which a qualifier returns None. If True, the None values will be learned and predicted, just as any other value. DEFAULT: `False`
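
To make the dict form of `qualifiers` more concrete, here is a small sketch of how such a mapping could be read. The label "event" and the helper `is_allowed` are hypothetical and only illustrate the description above; the component may resolve the mapping differently.

```python
from typing import Dict, List, Union

# Keys are qualifiers, values are the labels they apply to (or True for all)
qualifiers: Dict[str, Union[bool, List[str]]] = {
    "_.negation": True,  # allowed for every label
    "_.event_type": ["event"],  # only for spans labelled "event" (made-up label)
}


def is_allowed(qualifier: str, label: str) -> bool:
    """Tell whether a qualifier should be predicted for a span with this label."""
    allowed = qualifiers.get(qualifier, False)
    if isinstance(allowed, bool):
        return allowed
    return label in allowed
```

For example, `is_allowed("_.event_type", "drug")` is False with this mapping, so a span labelled "drug" would not receive an `_.event_type` value.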