Trainable Span Classifier

The eds.span_classifier component is a trainable attribute predictor. Here, span classification means assigning values (booleans, strings or any other object) to attributes/extensions of spans, such as:

span._.negation
span._.date.mode
span._.cui

In the rest of this page, we will refer to an (attribute, value) pair as a "binding". For instance, the binding ("_.negation", True) means that the negation attribute of the span is (or should be, when predicted) set to True.

Architecture

The model performs span classification by:

Calling a span pooling component such as eds.span_pooler to compute a single embedding for each span
Computing logits for each possible binding using a linear layer
Splitting these bindings into groups of mutually exclusive values, such as:
- event=start and event=stop
- negated=False and negated=True
Note that the groups themselves are not mutually exclusive (a span receives one value from each group), but the values within a group are.
Applying the best-scoring binding in each group to each span
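
The steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the `GROUPS` table, the `linear` and `classify` helpers, and the hand-picked weights are all hypothetical, and the span embedding is assumed to have already been computed by the pooler.

```python
# Hypothetical binding groups: values inside a group are mutually exclusive,
# while different groups are predicted independently of each other.
GROUPS = {
    "_.event": ["start", "stop"],
    "_.negated": [False, True],
}

# Flatten the groups into one list of (attribute, value) bindings.
BINDINGS = [(attr, value) for attr, values in GROUPS.items() for value in values]


def linear(embedding, weights, bias):
    """One logit per binding: logits[j] = weights[j] . embedding + bias[j]."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def classify(embedding, weights, bias):
    """Return the best-scoring value of each group for one span embedding."""
    logits = linear(embedding, weights, bias)
    scores = dict(zip(BINDINGS, logits))
    return {
        attr: max(values, key=lambda value: scores[(attr, value)])
        for attr, values in GROUPS.items()
    }


# Toy 2-dimensional span embedding and hand-picked weights (one row per binding).
embedding = [1.0, -1.0]
weights = [
    [2.0, 0.0],  # ("_.event", "start")  -> logit  2.0
    [0.0, 1.0],  # ("_.event", "stop")   -> logit -1.0
    [0.0, 1.0],  # ("_.negated", False)  -> logit -1.0
    [1.0, 0.0],  # ("_.negated", True)   -> logit  1.0
]
bias = [0.0, 0.0, 0.0, 0.0]

print(classify(embedding, weights, bias))
# {'_.event': 'start', '_.negated': True}
```

Each group yields exactly one value, so the span ends up with one value for `_.event` and one for `_.negated`.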

Examples

To create a span classifier component, you can use the following code:

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.span_classifier(
        # To embed the spans, we will use a span pooler
        embedding=eds.span_pooler(
            pooling_mode="mean",  # mean pooling
            # that will use a transformer to embed the doc words
            embedding=eds.transformer(
                model="prajjwal1/bert-tiny",
                window=128,
                stride=96,
            ),
        ),
        span_getter=["ents", "sc"],
        # For every span embedded by the span pooler
        # (doc.ents and doc.spans["sc"]), we will predict both
        # span._.negation and span._.event_type
        attributes=["_.negation", "_.event_type"],
    ),
    name="span_classifier",
)

To infer the possible values of each attribute from annotated data, you can use the pipeline post_init method:

nlp.post_init(gold_data)
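
Conceptually, inferring the values amounts to collecting the distinct values observed for each attribute in the gold data. The sketch below illustrates that idea on toy data only; `gold_spans` and `infer_values` are hypothetical names, and the actual logic inside post_init may differ.

```python
# Hypothetical gold annotations: one dict of attribute values per annotated span
gold_spans = [
    {"_.negation": True, "_.event_type": "start"},
    {"_.negation": False, "_.event_type": "stop"},
    {"_.negation": False, "_.event_type": "start"},
]


def infer_values(gold_spans, attributes):
    """Collect the distinct values observed for each attribute, in first-seen order."""
    observed = {attr: [] for attr in attributes}
    for span in gold_spans:
        for attr in attributes:
            value = span.get(attr)
            # Ignore missing values and values that were already recorded
            if value is not None and value not in observed[attr]:
                observed[attr].append(value)
    return observed


print(infer_values(gold_spans, ["_.negation", "_.event_type"]))
# {'_.negation': [True, False], '_.event_type': ['start', 'stop']}
```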

To train the model, refer to the Training tutorial.

You can inspect the bindings that will be used for training and prediction:

print(nlp.pipes.span_classifier.bindings)
# list of (attr name, span labels or True if all, values)
# Out: [
#   ('_.negation', True, [True, False]),
#   ('_.event_type', True, ['start', 'stop'])
# ]
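
To relate this structure to the (attribute, value) bindings defined earlier, the following sketch expands a bindings list into the flat pairs that apply to a span with a given label. It is plain Python on the printed output; `candidate_bindings` and the "disease" and "drug" labels are hypothetical.

```python
# Bindings in the format printed above:
# (attribute name, span labels or True if all, possible values)
bindings = [
    ("_.negation", True, [True, False]),
    ("_.event_type", True, ["start", "stop"]),
]


def candidate_bindings(bindings, span_label):
    """List the (attribute, value) pairs that apply to spans with this label."""
    pairs = []
    for attr, labels, values in bindings:
        # True means the attribute applies to every span label
        if labels is True or span_label in labels:
            pairs.extend((attr, value) for value in values)
    return pairs


print(candidate_bindings(bindings, "disease"))
# [('_.negation', True), ('_.negation', False),
#  ('_.event_type', 'start'), ('_.event_type', 'stop')]
```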

You can also change these values and update the bindings by calling the update_bindings method. Don't forget to retrain the model if new values are added!

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline object TYPE: `PipelineProtocol`
`name`	Name of the component TYPE: `str`
`embedding`	The span embedding component TYPE: `SpanEmbeddingComponent`
`label_weights`	The weight of each label for each attribute. The keys are the attribute names and the values are dictionaries with the labels as keys and the weights as values. For instance, `{"_.negation": {True: 1, False: 2}}` will give a weight of 1 to the `True` value of the `negation` attribute and 2 to the `False` value. DEFAULT: `None`
`span_getter`	How to extract the candidate spans and the attributes to predict or train on. TYPE: `SpanGetterArg` DEFAULT: `None`
`context_getter`	What context to use when computing the span embeddings (defaults to the whole document). This can be either (1) a `SpanGetterArg` to retrieve contexts from the whole document, for example `{"section": "conclusion"}` to use only the conclusion as context (in this case, you must ensure that all spans produced by the `span_getter` argument fall within the conclusion), or (2) a callable that takes a span and returns a context for this span, for instance `lambda span: span.sent` to use the sentence as context. TYPE: `Optional[Union[Callable, SpanGetterArg]]` DEFAULT: `None`
`attributes`	The attributes to predict or train on. If a dict is given, keys are the attributes and values are the labels for which the attribute is allowed, or True if the attribute is allowed for all labels. TYPE: `AttributesArg` DEFAULT: `None`
`keep_none`	If False (default), skip spans for which an attribute returns None. If True, None values are learned and predicted just like any other value. TYPE: `bool` DEFAULT: `False`
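
To make the interaction between the dict form of `attributes` and `keep_none` concrete, here is a minimal sketch on toy data. It mimics the behaviour described in the table and is not the library's code; the `spans` list, the labels, and the `training_targets` helper are hypothetical.

```python
# Hypothetical toy spans: (label, {attribute: gold value})
spans = [
    ("disease", {"_.negation": True, "_.event_type": "stop"}),
    ("drug", {"_.negation": None, "_.event_type": "start"}),
]

# Dict form of `attributes`: attribute -> allowed labels, or True for all labels
attributes = {"_.negation": True, "_.event_type": ["drug"]}


def training_targets(spans, attributes, keep_none=False):
    """Collect (label, attribute, value) training targets from the toy spans."""
    targets = []
    for label, values in spans:
        for attr, allowed in attributes.items():
            if allowed is not True and label not in allowed:
                continue  # attribute not allowed for this span label
            value = values.get(attr)
            if value is None and not keep_none:
                continue  # keep_none=False: skip spans where the attribute is None
            targets.append((label, attr, value))
    return targets


print(training_targets(spans, attributes))
# [('disease', '_.negation', True), ('drug', '_.event_type', 'start')]

print(training_targets(spans, attributes, keep_none=True))
# [('disease', '_.negation', True), ('drug', '_.negation', None),
#  ('drug', '_.event_type', 'start')]
```

With `keep_none=True`, None becomes one more value that can be learned for `_.negation`, whereas the `_.event_type` of the "disease" span is ignored in both cases because that attribute is restricted to "drug" spans.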