Trainable Span Classifier

The eds.span_classifier component is a trainable attribute predictor. In this context, the span classification task consists of assigning values (booleans, strings or any other object) to attributes/extensions of spans, such as:

span._.negation
span._.date.mode
span._.cui

In the rest of this page, we will refer to a pair of (attribute, value) as a "binding". For instance, the binding ("_.negation", True) means that the attribute negation of the span is (or should be, when predicted) set to True.

Architecture

The model performs span classification by:

1. Calling a span embedding component such as eds.span_pooler, which pools word embeddings to compute a single embedding for each span
2. Computing logits for each possible binding using a linear layer
3. Splitting these bindings into groups of exclusive values, such as
   - event=start and event=stop
   - negated=False and negated=True
   Note that the groups themselves are not mutually exclusive (a span receives one value from each group), but the values within a group are.
4. Applying the best scoring binding in each group to each span
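
The last two steps (grouping the bindings, then keeping the best scoring binding in each group) can be sketched in plain Python. This is only an illustration of the decoding logic under simplifying assumptions, not the component's actual implementation: the function decode_bindings, the index-based groups and the logit values are all hypothetical, and the real component works on batched tensors.

```python
from typing import Any, Dict, List, Sequence, Tuple

Binding = Tuple[str, Any]  # (attribute, value)


def decode_bindings(
    logits: Sequence[float],
    bindings: Sequence[Binding],
    groups: Sequence[Sequence[int]],
) -> Dict[str, Any]:
    """Hypothetical helper: keep the best scoring binding of each group.

    logits[i] is the score of bindings[i] for one span, and each group
    lists the indices of bindings whose values are mutually exclusive.
    """
    chosen: Dict[str, Any] = {}
    for group in groups:
        # Values inside a group compete with each other: take the argmax
        best = max(group, key=lambda i: logits[i])
        attr, value = bindings[best]
        chosen[attr] = value
    return chosen


# Made-up bindings and scores for a single span
bindings: List[Binding] = [
    ("_.event", "start"),
    ("_.event", "stop"),
    ("_.negated", False),
    ("_.negated", True),
]
groups = [[0, 1], [2, 3]]  # one group per attribute
logits = [0.2, 1.5, 2.0, -0.3]

result = decode_bindings(logits, bindings, groups)
print(result)
```

Because each group is decoded independently, the span ends up with exactly one value per group, here one for _.event and one for _.negated.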

Examples

To create a span classifier component, you can use the following code:

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.span_classifier(
        # To embed the spans, we will use a span pooler
        embedding=eds.span_pooler(
            pooling_mode="mean",  # mean pooling
            # that will use a transformer to embed the doc words
            embedding=eds.transformer(
                model="prajjwal1/bert-tiny",
                window=128,
                stride=96,
            ),
        ),
        span_getter=["ents", "sc"],
        # For every span embedded by the span pooler
        # (doc.ents and doc.spans["sc"]), we will predict both
        # span._.negation and span._.event_type
        attributes=["_.negation", "_.event_type"],
    ),
    name="span_classifier",
)

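To infer the possible values of each attribute from annotated data, call the pipeline's post_init method on gold documents:

# gold_data: an iterable of annotated documents, assumed to be defined beforehand
nlp.post_init(gold_data)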

To train the model, refer to the Training tutorial.

You can inspect the bindings that will be used for training and prediction:

print(nlp.pipes.span_classifier.bindings)
# list of (attr name, span labels or True if all, values)
# Out: [
#   ('_.negation', True, [True, False]),
#   ('_.event_type', True, ['start', 'stop'])
# ]

You can also change these values and update the bindings by calling the update_bindings method. Don't forget to retrain the model if new values are added!

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline object TYPE: `PipelineProtocol`
`name`	Name of the component TYPE: `str`
`embedding`	The span embedding component TYPE: `SpanEmbeddingComponent`
`span_getter`	How to extract the candidate spans and the attributes to predict or train on. TYPE: `SpanGetterArg` DEFAULT: `None`
`context_getter`	What context to use when computing the span embeddings (defaults to the whole document). This can be either: (1) a `SpanGetterArg` to retrieve contexts from a whole document, for example `{"section": "conclusion"}` to only use the conclusion as context (in this case, you must ensure that all spans produced by the `span_getter` argument fall within the conclusion); or (2) a callable that takes a span and returns a context for this span, for instance `lambda span: span.sent` to use the sentence as context. TYPE: `Optional[Union[Callable, SpanGetterArg]]` DEFAULT: `None`
`attributes`	The attributes to predict or train on. If a dict is given, keys are the attributes and values are the labels for which the attr is allowed, or True if the attr is allowed for all labels. TYPE: `AttributesArg` DEFAULT: `None`
`keep_none`	If False (default), skip spans for which an attribute returns None. If True, the None values will be learned and predicted, just as any other value. TYPE: `bool` DEFAULT: `False`
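
To make the dict form of `attributes` and the effect of `keep_none` more concrete, here is a small pure Python sketch. The helpers allowed_attributes and training_targets are hypothetical and are not part of edsnlp; they only mimic the rules described in the table, and "skipping" is read here as dropping the (span, attribute) pair, which is an assumption.

```python
from typing import Any, Dict, List, Tuple, Union

# Keys are attributes, values are the allowed span labels (or True for all)
AttributesDict = Dict[str, Union[bool, List[str]]]


def allowed_attributes(attributes: AttributesDict, label: str) -> List[str]:
    """Hypothetical helper: attributes that apply to a span with this label."""
    return [
        attr
        for attr, labels in attributes.items()
        if labels is True or label in labels
    ]


def training_targets(
    attributes: AttributesDict,
    label: str,
    values: Dict[str, Any],
    keep_none: bool = False,
) -> List[Tuple[str, Any]]:
    """Hypothetical helper: (attribute, value) pairs kept for one span."""
    targets = []
    for attr in allowed_attributes(attributes, label):
        value = values.get(attr)  # None when the attribute is unset
        if value is None and not keep_none:
            continue  # keep_none=False: nothing to learn for this attribute
        targets.append((attr, value))
    return targets


attributes: AttributesDict = {
    "_.negation": True,         # allowed for every span label
    "_.event_type": ["event"],  # only allowed for spans labelled "event"
}

print(allowed_attributes(attributes, "drug"))
print(training_targets(attributes, "event", {"_.negation": False}))
print(training_targets(attributes, "event", {"_.negation": False}, keep_none=True))
```

With keep_none=True, the unset _.event_type of the "event" span is kept as a None target, so None becomes one more value the model can learn and predict.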