Trainable Span Qualifier
The eds.span_qualifier component is a trainable qualifier, i.e. a predictor of span attributes. In this context, the span qualification task consists of assigning values (booleans, strings or any complex object) to attributes/extensions of spans, such as:
- span.label_
- span._.negation
- span._.date.mode
- etc.
Architecture
The underlying eds.span_multilabel_classifier.v1 model performs span classification by:
- Pooling the word embeddings (mean, max or sum) into a single embedding per span
- Computing logits for each possible binding (i.e. qualifier-value assignment)
- Splitting these bindings into independent groups such as
  - event=start and event=stop
  - negated=False and negated=True
- Learning or predicting a combination among the legal combinations of these bindings. For instance, in the second group, we can't have both negated=True and negated=False, so the combinations are [(1, 0), (0, 1)]
- Assigning bindings on spans depending on the predicted results
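The pooling and grouping steps above can be sketched in plain Python. This is only an illustration under simplifying assumptions: embeddings are small lists of floats rather than tensors, and the helper names pool and exclusive_combinations are hypothetical, not part of the library.

```python
def pool(token_embeddings, mode="mean"):
    """Pool a list of token vectors into a single span vector."""
    # One tuple per embedding dimension, gathering that dimension over all tokens
    columns = list(zip(*token_embeddings))
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "sum":
        return [sum(col) for col in columns]
    raise ValueError(f"Unknown pooling mode: {mode}")


def exclusive_combinations(n_bindings):
    """Legal combinations for a group in which exactly one binding holds."""
    return [
        tuple(int(i == active) for i in range(n_bindings))
        for active in range(n_bindings)
    ]


# A span made of two tokens, each with a 3-dimensional embedding
tokens = [[1.0, 4.0, 0.0], [3.0, 2.0, 2.0]]
span_mean = pool(tokens, "mean")
span_max = pool(tokens, "max")
span_sum = pool(tokens, "sum")

# Group (negated=False, negated=True): the two bindings are mutually exclusive
negated_combinations = exclusive_combinations(2)
```

Here negated_combinations reproduces the two legal combinations mentioned above, one per admissible value of the qualifier.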
Step by step
Initialization
During the initialization of the pipeline, the span_qualifier component will gather all spans that match on_ents and on_span_groups patterns (or candidate_getter function). It will then list all possible values for each qualifier of the qualifiers list and store every possible (qualifier, value) pair (i.e. binding).
For instance, a custom qualifier negation with possible values True and False will result in the following bindings [("_.negation", True), ("_.negation", False)], while a custom qualifier event with possible values start, stop, and start-stop will result in the following bindings [("_.event", "start"), ("_.event", "stop"), ("_.event", "start-stop")].
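As a rough illustration of this step, the sketch below collects bindings from a handful of candidate spans. It assumes spans are represented as plain dictionaries mapping qualifier names to values, which is a stand-in for real span objects; collect_bindings is a hypothetical helper, not a library function.

```python
def collect_bindings(spans, qualifiers):
    """List every (qualifier, value) pair observed on the candidate spans."""
    bindings = []
    for qualifier in qualifiers:
        seen = []  # values already recorded for this qualifier, in first-seen order
        for span in spans:
            if qualifier in span and span[qualifier] not in seen:
                seen.append(span[qualifier])
        bindings.extend((qualifier, value) for value in seen)
    return bindings


# Dictionary stand-ins for annotated spans (an assumption of this sketch)
spans = [
    {"_.negation": True, "_.event": "start"},
    {"_.negation": False, "_.event": "stop"},
    {"_.negation": True, "_.event": "start-stop"},
]
bindings = collect_bindings(spans, ["_.negation", "_.event"])
```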
Training
During training, the span_qualifier component will gather spans on the documents in a mini-batch and evaluate each binding on each span to build a supervision matrix. This matrix will be fed to the underlying model (most likely an eds.span_multilabel_classifier.v1). The model will compute logits for each entry of the matrix and compute a cross-entropy loss for each group of bindings sharing the same qualifier.
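To make the supervision matrix and the per-group loss more concrete, here is a minimal sketch for a single span. It assumes dictionary stand-ins for spans, a softmax cross-entropy inside each group, and a plain sum over groups; the actual model operates on batched tensors and may reduce the loss differently. The helper names are hypothetical.

```python
import math

bindings = [
    ("_.negation", False), ("_.negation", True),
    ("_.event", "start"), ("_.event", "stop"),
]


def supervision_row(span, bindings):
    """One row of the supervision matrix: 1 where the gold value matches the binding."""
    return [int(q in span and span[q] == v) for q, v in bindings]


def group_cross_entropy(logits, targets, bindings):
    """Sum of softmax cross-entropies, one per group of bindings sharing a qualifier."""
    loss = 0.0
    for qualifier in dict.fromkeys(q for q, _ in bindings):
        idx = [i for i, (q, _) in enumerate(bindings) if q == qualifier]
        # Log of the softmax normaliser, restricted to this group
        log_norm = math.log(sum(math.exp(logits[i]) for i in idx))
        gold = next(i for i in idx if targets[i] == 1)
        loss += log_norm - logits[gold]
    return loss


span = {"_.negation": True, "_.event": "start"}
targets = supervision_row(span, bindings)
logits = [0.0, 0.0, 0.0, 0.0]  # an untrained model that scores every binding equally
loss = group_cross_entropy(logits, targets, bindings)
```

With uniform logits, each two-way group contributes log(2) to the loss, which is the expected starting point before any training.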
Prediction
During prediction, the span_qualifier component will gather spans on a given document and evaluate each binding on each span using the underlying model. Using the same binding exclusion and label constraint mechanisms as during training, scores will be computed for each binding and the best legal combination of bindings will be selected. Finally, the selected bindings will be assigned to the spans.
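The selection of the best legal combination can be sketched as follows for one group of bindings. This assumes a combination is scored by summing the logits of its active bindings, which is a simplification for illustration; spans are again dictionary stand-ins, and best_combination and assign are hypothetical helpers.

```python
def best_combination(logits, combinations):
    """Return the legal combination with the highest total score."""
    return max(
        combinations,
        key=lambda combo: sum(score for score, on in zip(logits, combo) if on),
    )


def assign(span, group_bindings, combo):
    """Write the selected bindings onto the span (a dict stand-in here)."""
    for (qualifier, value), on in zip(group_bindings, combo):
        if on:
            span[qualifier] = value


group_bindings = [("_.negation", False), ("_.negation", True)]
legal = [(1, 0), (0, 1)]  # exactly one value per qualifier
logits = [-0.3, 1.2]      # scores for negated=False and negated=True

span = {}
chosen = best_combination(logits, legal)
assign(span, group_bindings, chosen)
```

Restricting the search to legal combinations is what prevents contradictory assignments such as a span being both negated and not negated.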
Examples
Let us define the pipeline and train it. We provide utils to train the model using an API, but you can use a spaCy config file as well.
import edsnlp

nlp = edsnlp.blank("eds")

# Transformer component that produces the word embeddings
nlp.add_pipe(
    "eds.transformer",
    name="transformer",
    config=dict(
        model="prajjwal1/bert-tiny",
        window=128,
        stride=96,
    ),
)

# Span qualifier that pools the transformer embeddings over each span
nlp.add_pipe(
    "eds.span_qualifier",
    name="qualifier",
    config={
        "embedding": {
            "@factory": "eds.span_pooler",
            "embedding": nlp.get_pipe("transformer"),
            "span_getter": ["ents", "sc"],
        },
        "qualifiers": ["_.negation", "_.event_type"],
    },
)
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | spaCy vocabulary |
name | Name of the component |
embedding | The word embedding component |
qualifiers | The qualifiers to predict or train on. If a dict is given, keys are the qualifiers and values are the labels for which the qualifier is allowed, or True if the qualifier is allowed for all labels. |
keep_none | If False, skip spans for which a qualifier returns None. If True (default), the None values will be learned and predicted, just as any other value. |
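The semantics of the dict form of qualifiers and of keep_none, as described in the table, can be illustrated with the sketch below. The helpers is_allowed and should_skip and the labels "event" and "date" are hypothetical and only mirror the parameter descriptions; they are not the component's actual code.

```python
def is_allowed(qualifiers, qualifier, label):
    """Check whether `qualifier` applies to spans carrying the given label."""
    if not isinstance(qualifiers, dict):
        # List form: every listed qualifier is allowed for all labels
        return qualifier in qualifiers
    allowed = qualifiers.get(qualifier, False)
    # True means "all labels"; otherwise the value is a collection of labels
    return allowed is True or (allowed is not False and label in allowed)


def should_skip(value, keep_none=True):
    """With keep_none=False, spans whose qualifier value is None are skipped."""
    return value is None and not keep_none


# Hypothetical configuration: negation applies everywhere,
# event_type only to spans labelled "event"
qualifiers = {"_.negation": True, "_.event_type": ["event"]}

negation_on_date = is_allowed(qualifiers, "_.negation", "date")
event_type_on_date = is_allowed(qualifiers, "_.event_type", "date")
event_type_on_event = is_allowed(qualifiers, "_.event_type", "event")
```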