Trainable Span Classifier
The eds.span_classifier component is a trainable attribute predictor. In this context, the span classification task consists of assigning values (booleans, strings or any object) to attributes/extensions of spans such as:
- span._.negation
- span._.date.mode
- span._.cui
In the rest of this page, we will refer to a pair of (attribute, value) as a "binding". For instance, the binding ("_.negation", True) means that the attribute negation of the span is (or should be, when predicted) set to True.
Architecture
The model performs span classification by:
- Calling a word pooling embedding such as eds.span_pooler to compute a single embedding for each span
- Computing logits for each possible binding using a linear layer
- Splitting these bindings into groups of exclusive values such as
  - event=start and event=stop
  - negated=False and negated=True
  Note that the above groups are not exclusive, but the values within each group are.
- Applying the best scoring binding in each group to each span
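The last two steps can be sketched in plain Python. This is only an illustration of the grouping and selection logic, not the component's actual implementation: the bindings list, the logit values and the helper names group_bindings and decode are all hypothetical.

```python
# Hypothetical bindings: one (attribute, value) pair per output logit.
bindings = [
    ("event", "start"),
    ("event", "stop"),
    ("negated", False),
    ("negated", True),
]


def group_bindings(bindings):
    """Group binding indices by attribute; values inside a group are exclusive."""
    groups = {}
    for index, (attr, _value) in enumerate(bindings):
        groups.setdefault(attr, []).append(index)
    return groups


def decode(logits, bindings):
    """For one span, keep the best-scoring binding of each group."""
    result = {}
    for attr, indices in group_bindings(bindings).items():
        best = max(indices, key=lambda i: logits[i])
        result[attr] = bindings[best][1]
    return result


# Made-up logits for a single span, aligned with `bindings`.
span_logits = [0.2, 1.5, 2.0, -1.0]
prediction = decode(span_logits, bindings)
print(prediction)  # {'event': 'stop', 'negated': False}
```

Each group yields exactly one value, so a span can receive both an event and a negated value, but never two values for the same attribute.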
Examples
To create a span classifier component, you can use the following code:
import edsnlp, edsnlp.pipes as eds
nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.span_classifier(
        # To embed the spans, we will use a span pooler
        embedding=eds.span_pooler(
            pooling_mode="mean",  # mean pooling
            # that will use a transformer to embed the doc words
            embedding=eds.transformer(
                model="prajjwal1/bert-tiny",
                window=128,
                stride=96,
            ),
        ),
        span_getter=["ents", "sc"],
        # For every span embedded by the span pooler
        # (doc.ents and doc.spans["sc"]), we will predict both
        # span._.negation and span._.event_type
        attributes=["_.negation", "_.event_type"],
    ),
    name="span_classifier",
)
To infer the values of the attributes, you can use the pipeline post_init method:
nlp.post_init(gold_data)
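As a rough picture of what inferring the values from gold data could involve, the sketch below collects the distinct values observed for each attribute. It is an assumption about the general idea, not a description of what post_init does internally; gold_spans and infer_values are hypothetical stand-ins for annotated documents and the library's own logic.

```python
from collections import defaultdict

# Hypothetical gold annotations: one dict of attribute values per span.
gold_spans = [
    {"_.negation": True, "_.event_type": "start"},
    {"_.negation": False, "_.event_type": "stop"},
    {"_.negation": False, "_.event_type": "start"},
]


def infer_values(gold_spans, attributes):
    """Collect the distinct values seen for each attribute, in first-seen order."""
    seen = defaultdict(list)
    for span in gold_spans:
        for attr in attributes:
            # A missing attribute is recorded as None, like any other value.
            value = span.get(attr)
            if value not in seen[attr]:
                seen[attr].append(value)
    return dict(seen)


values = infer_values(gold_spans, ["_.negation", "_.event_type"])
print(values)  # {'_.negation': [True, False], '_.event_type': ['start', 'stop']}
```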
To train the model, refer to the Training tutorial.
You can inspect the bindings that will be used for training and prediction:
print(nlp.pipes.span_classifier.bindings)
# list of (attr name, span labels or True if all, values)
# Out: [
# ('_.negation', True, [True, False]),
# ('_.event_type', True, ['start', 'stop'])
# ]
You can also change these values and update the bindings by calling the update_bindings method. Don't forget to retrain the model if new values are added!
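One way to see why retraining is needed: if the classifier produces one logit per (attribute, value) pair, as described in the Architecture section, then adding a value adds an output that has never been trained. The sketch below only counts outputs on binding triples in the shape printed above; count_outputs and add_value are hypothetical helpers, not the update_bindings method itself.

```python
# Binding triples in the shape shown above: (attribute, labels or True, values).
bindings = [
    ("_.negation", True, [True, False]),
    ("_.event_type", True, ["start", "stop"]),
]


def count_outputs(bindings):
    """Number of (attribute, value) pairs, i.e. logits to produce per span."""
    return sum(len(values) for _attr, _labels, values in bindings)


def add_value(bindings, attr, new_value):
    """Return a copy of the bindings with new_value added to one attribute."""
    updated = []
    for name, labels, values in bindings:
        values = list(values)
        if name == attr and new_value not in values:
            values.append(new_value)
        updated.append((name, labels, values))
    return updated


before = count_outputs(bindings)
after = count_outputs(add_value(bindings, "_.event_type", "pause"))
print(before, after)  # 4 5
```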
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object. |
name | Name of the component. |
embedding | The word embedding component. |
label_weights | The weight of each label for each attribute. The keys are the attribute names and the values are dictionaries with the labels as keys and the weights as values. |
span_getter | How to extract the candidate spans and the attributes to predict or train on. |
context_getter | What context to use when computing the span embeddings (defaults to the whole document). |
attributes | The attributes to predict or train on. If a dict is given, keys are the attributes and values are the labels for which the attribute is allowed, or True if the attribute is allowed for all labels. |
keep_none | If False, skip spans for which an attribute returns None. If True (default), the None values will be learned and predicted, just as any other value. |
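To make the dictionary shapes described for attributes and label_weights concrete, here is a small sketch. The span label "event", the weights and the is_allowed helper are hypothetical, and reading the "labels" of label_weights as the values an attribute can take is an assumption.

```python
# Hypothetical `attributes` dict: attribute -> span labels it applies to, or True.
attributes = {
    "_.negation": True,          # allowed for spans of every label
    "_.event_type": ["event"],   # only for spans labelled "event" (made-up label)
}

# Hypothetical `label_weights` dict: attribute -> {label: weight}.
# Assumption: the inner keys are the values the attribute can take.
label_weights = {
    "_.negation": {True: 2.0, False: 1.0},
    "_.event_type": {"start": 1.0, "stop": 1.0},
}


def is_allowed(attributes, attr, span_label):
    """Check whether an attribute should be predicted for a span with this label."""
    rule = attributes.get(attr, False)
    if rule is True:
        return True
    return rule is not False and span_label in rule


print(is_allowed(attributes, "_.negation", "drug"))    # True
print(is_allowed(attributes, "_.event_type", "drug"))  # False
print(is_allowed(attributes, "_.event_type", "event")) # True
```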