Trainable Span Classifier
The eds.span_classifier component is a trainable attribute predictor. In this context, the span classification task consists in assigning values (boolean, strings or any object) to attributes/extensions of spans such as:
- span._.negation
- span._.date.mode
- span._.cui
In the rest of this page, we will refer to a pair of (attribute, value) as a "binding". For instance, the binding ("_.negation", True) means that the attribute negation of the span is (or should be, when predicted) set to True.
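To make the notion of a binding concrete, here is a minimal sketch in plain Python. It does not use EDS-NLP: the span is a stand-in built with SimpleNamespace, and apply_binding is a hypothetical helper written for this illustration, not a library function.

```python
from types import SimpleNamespace


def apply_binding(span, binding):
    """Set the attribute named by a dotted path such as "_.negation" on `span`."""
    path, value = binding
    *parents, leaf = path.split(".")
    target = span
    # Walk down the dotted path ("_" first, then any nested object)
    for name in parents:
        target = getattr(target, name)
    setattr(target, leaf, value)


# Stand-in for a span: the `_` namespace holds the extensions
span = SimpleNamespace(_=SimpleNamespace(negation=None))

# The binding ("_.negation", True) sets span._.negation to True
apply_binding(span, ("_.negation", True))
print(span._.negation)
```

Because the path is split on dots, the same helper would also handle a nested attribute such as "_.date.mode", provided the intermediate object exists.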
Architecture
The model performs span classification by:
- Calling a word pooling embedding such as eds.span_pooler to compute a single embedding for each span
- Computing logits for each possible binding using a linear layer
- Splitting these bindings into groups of exclusive values such as
  - event=start and event=stop
  - negated=False and negated=True

  Note that the above groups are not exclusive, but the values within each group are.
- Applying the best scoring binding in each group to each span
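The last two steps (grouping the bindings and keeping the best one per group) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the bindings, groups and logits are made-up values for a single span, and decode is a hypothetical helper, not the component's actual implementation.

```python
# Hypothetical bindings, one logit per binding
bindings = [
    ("_.event", "start"),
    ("_.event", "stop"),
    ("_.negated", False),
    ("_.negated", True),
]

# Each group lists the indices of mutually exclusive bindings
groups = [[0, 1], [2, 3]]


def decode(logits, bindings, groups):
    """Keep the best scoring binding of each group."""
    chosen = []
    for group in groups:
        best = max(group, key=lambda i: logits[i])
        chosen.append(bindings[best])
    return chosen


# Made-up logits for one span, as a linear layer might produce them
logits = [0.2, 1.5, 2.0, -0.3]
predicted = decode(logits, bindings, groups)
print(predicted)
# [('_.event', 'stop'), ('_.negated', False)]
```

One binding is selected in every group, which is why a span can receive both an event value and a negated value at the same time.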
Examples
To create a span classifier component, you can use the following code:
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.span_classifier(
        # To embed the spans, we will use a span pooler
        embedding=eds.span_pooler(
            pooling_mode="mean",  # mean pooling
            # that will use a transformer to embed the doc words
            embedding=eds.transformer(
                model="prajjwal1/bert-tiny",
                window=128,
                stride=96,
            ),
        ),
        span_getter=["ents", "sc"],
        # For every span embedded by the span pooler
        # (doc.ents and doc.spans["sc"]), we will predict both
        # span._.negation and span._.event_type
        attributes=["_.negation", "_.event_type"],
    ),
    name="span_classifier",
)
To infer the values of the attributes, you can use the pipeline post_init method:
nlp.post_init(gold_data)
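Conceptually, this initialization step has to discover which values each attribute can take in the gold data. A rough sketch of that idea, using stand-in objects rather than real documents, is shown below. The helper collect_bindings and the sample spans are hypothetical, and the real post_init may proceed differently.

```python
from types import SimpleNamespace


def make_span(negation, event_type):
    """Stand-in for an annotated span with two extensions."""
    return SimpleNamespace(_=SimpleNamespace(negation=negation, event_type=event_type))


def collect_bindings(gold_spans, attributes):
    """Collect, for each attribute, the distinct values seen in the gold spans."""
    values = {attr: [] for attr in attributes}
    for span in gold_spans:
        for attr in attributes:
            # "_.negation" -> "negation" (this sketch ignores nested attributes)
            name = attr.split(".", 1)[1]
            value = getattr(span._, name)
            if value not in values[attr]:
                values[attr].append(value)
    # True means "allowed for all span labels" in this sketch
    return [(attr, True, seen) for attr, seen in values.items()]


gold_spans = [
    make_span(True, "start"),
    make_span(False, "stop"),
    make_span(True, "start"),
]
bindings = collect_bindings(gold_spans, ["_.negation", "_.event_type"])
print(bindings)
# [('_.negation', True, [True, False]), ('_.event_type', True, ['start', 'stop'])]
```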
To train the model, refer to the Training tutorial.
You can inspect the bindings that will be used for training and prediction:
print(nlp.pipes.span_classifier.bindings)
# list of (attr name, span labels or True if all, values)
# Out: [
# ('_.negation', True, [True, False]),
# ('_.event_type', True, ['start', 'stop'])
# ]
You can also change these values and update the bindings by calling the update_bindings method. Don't forget to retrain the model if new values are added!
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object |
name | Name of the component |
embedding | The word embedding component |
span_getter | How to extract the candidate spans and the attributes to predict or train on. |
context_getter | What context to use when computing the span embeddings (defaults to the whole document). |
attributes | The attributes to predict or train on. If a dict is given, keys are the attributes and values are the labels for which the attribute is allowed, or True if the attribute is allowed for all labels. |
keep_none | If False, skip spans for which an attribute returns None. If True (default), the None values will be learned and predicted, just as any other value. |
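The interaction between the dict form of attributes and keep_none can be illustrated with a small plain-Python sketch. Everything here is hypothetical (the labels "event" and "drug", the helpers is_allowed and select_targets, and the representation of spans as (label, values) pairs); it only mirrors the behaviour described in the table and is not the component's code.

```python
# Dict form: attribute -> allowed span labels, or True for all labels
attributes = {
    "_.negation": True,
    "_.event_type": ["event"],  # hypothetical label
}


def is_allowed(attr, label, attributes):
    """True if `attr` may be predicted on spans carrying `label`."""
    labels = attributes[attr]
    return labels is True or label in labels


def select_targets(spans, attributes, keep_none=True):
    """List the (label, attribute, value) triples that would be learned."""
    targets = []
    for label, values in spans:
        for attr in attributes:
            if not is_allowed(attr, label, attributes):
                continue
            value = values.get(attr)
            # With keep_none=False, None values are skipped entirely
            if value is None and not keep_none:
                continue
            targets.append((label, attr, value))
    return targets


spans = [
    ("event", {"_.negation": False, "_.event_type": "start"}),
    ("drug", {"_.negation": None, "_.event_type": "stop"}),
]
with_none = select_targets(spans, attributes, keep_none=True)
without_none = select_targets(spans, attributes, keep_none=False)
print(with_none)
print(without_none)
```

In this sketch the "drug" span never contributes an event_type target because that attribute is restricted to "event" spans, and its None negation is kept only when keep_none is True.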