Trainable Span Classifier

The eds.span_classifier component is a trainable attribute predictor. In this context, span classification consists of assigning values (booleans, strings or any other object) to attributes/extensions of spans, such as:

  • span._.negation,
  • span._.cui

In the rest of this page, we will refer to a pair of (attribute, value) as a "binding". For instance, the binding ("_.negation", True) means that the attribute negation of the span is (or should be, when predicted) set to True.


The model performs span classification by:

  1. Calling a span embedding component such as eds.span_pooler, which pools word embeddings into a single embedding for each span
  2. Computing logits for each possible binding using a linear layer
  3. Splitting these bindings into groups of exclusive values such as

    • event=start and event=stop
    • negated=False and negated=True

    Note that the groups themselves are not mutually exclusive (a span receives one value from each group), but the values within each group are.

  4. Applying the best scoring binding in each group to each span
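
Steps 2 to 4 can be sketched in plain Python. This is only an illustration, not the component's actual implementation: the logits are hard-coded instead of coming from a linear layer, one group per attribute is assumed, and the names bindings, logits and predict_bindings are hypothetical.

```python
from collections import defaultdict

# Hypothetical bindings: every (attribute, value) pair the model can predict.
bindings = [
    ("_.event", "start"),
    ("_.event", "stop"),
    ("_.negated", False),
    ("_.negated", True),
]

# Made-up logits for one span, one score per binding (same order as above).
# In the real component these would come from a linear layer.
logits = [2.1, -0.3, 0.4, 1.7]


def predict_bindings(bindings, logits):
    """Pick the best scoring value in each group of exclusive bindings."""
    # Group bindings by attribute: values of one attribute exclude each other.
    groups = defaultdict(list)
    for (attr, value), score in zip(bindings, logits):
        groups[attr].append((score, value))
    # Keep the highest-scoring value of every group.
    return {
        attr: max(scored, key=lambda pair: pair[0])[1]
        for attr, scored in groups.items()
    }


prediction = predict_bindings(bindings, logits)
print(prediction)  # {'_.event': 'start', '_.negated': True}
```

Each group is resolved independently, which is why a span can end up with both an event value and a negation value.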


To create a span classifier component, you can use the following code:

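import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.span_classifier(
        # To embed the spans, we will use a span pooler
        embedding=eds.span_pooler(
            pooling_mode="mean",  # mean pooling
            # that will use a transformer to embed the doc words
            # (the model name below is an example, not a requirement)
            embedding=eds.transformer(model="prajjwal1/bert-tiny"),
        ),
        span_getter=["ents", "sc"],
        # For every span embedded by the span pooler
        # (doc.ents and doc.spans["sc"]), we will predict both
        # span._.negation and span._.event_type
        attributes=["_.negation", "_.event_type"],
    ),
    name="span_classifier",
)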

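To infer the possible values of each attribute from annotated data, call the pipeline's post_init method on an iterable of annotated documents (gold_data below is a placeholder for your own data):

nlp.post_init(gold_data)
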
To train the model, refer to the Training tutorial.

You can inspect the bindings that will be used for training and prediction:

print(nlp.pipes.span_classifier.bindings)
# list of (attr name, span labels or True if all, values)
# Out: [
#   ('_.negation', True, [True, False]),
#   ('_.event_type', True, ['start', 'stop'])
# ]
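
The layout of this list can be illustrated without the library. The sketch below assumes the (attribute, labels, values) triples shown in the output above. The "event" label filter and the values_for_label helper are hypothetical and not part of EDS-NLP.

```python
# Same layout as the printed bindings: (attr name, span labels or True, values).
bindings = [
    ("_.negation", True, [True, False]),
    ("_.event_type", ["event"], ["start", "stop"]),  # hypothetical label filter
]


def values_for_label(bindings, label):
    """Return the candidate values of each attribute allowed for a span label."""
    allowed = {}
    for attr, labels, values in bindings:
        # True means the attribute applies to spans of every label.
        if labels is True or label in labels:
            allowed[attr] = values
    return allowed


print(values_for_label(bindings, "event"))
print(values_for_label(bindings, "drug"))
```

With these made-up bindings, a span labelled "drug" would only be assigned a negation value, while a span labelled "event" would receive both attributes.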

You can also change these values and update the bindings by calling the update_bindings method. Don't forget to retrain the model if new values are added!



Parameters

nlp

The pipeline object

TYPE: PipelineProtocol


name

Name of the component

TYPE: str


embedding

The word embedding component

TYPE: SpanEmbeddingComponent


span_getter

How to extract the candidate spans and the attributes to predict or train on.

TYPE: SpanGetterArg DEFAULT: {'ents': True}


context_getter

What context to use when computing the span embeddings (defaults to the whole document). This can be:

  • a SpanGetterArg to retrieve contexts from a whole document. For example {"section": "conclusion"} to only use the conclusion as context (you must ensure that all spans produced by the span_getter argument do fall in the conclusion in this case)
  • a callable, that gets a span and should return a context for this span. For instance, lambda span: span.sent to use the sentence as context.

TYPE: Optional[Union[Callable, SpanGetterArg]] DEFAULT: None


attributes

The attributes to predict or train on. If a dict is given, keys are the attributes and values are the span labels for which the attribute is allowed, or True if the attribute is allowed for all labels.

TYPE: AttributesArg DEFAULT: None
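
The list form and the dict form can be compared with a small sketch. normalize_attributes is a hypothetical helper written for this page: it only mirrors the rule described above and is not the library's own validation code. The "event" label is made up.

```python
def normalize_attributes(attributes):
    """Turn a list of attributes into the equivalent dict form."""
    if isinstance(attributes, dict):
        return dict(attributes)
    # A plain list means every attribute is allowed for all span labels.
    return {attr: True for attr in attributes}


# List form: both attributes apply to every span label.
as_list = ["_.negation", "_.event_type"]
# Dict form: restrict "_.event_type" to spans labelled "event" (made-up label).
as_dict = {"_.negation": True, "_.event_type": ["event"]}

print(normalize_attributes(as_list))
print(normalize_attributes(as_dict))
```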


keep_none

If False (default), skip spans for which an attribute returns None. If True, the None values will be learned and predicted, just like any other value.

TYPE: bool DEFAULT: False
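
The effect of this flag on the spans used as targets can be sketched as follows. The flag is called keep_none in this sketch, and both the data and the collect_targets helper are hypothetical. They only illustrate the rule stated above, not the component's internals.

```python
def collect_targets(span_values, keep_none=False):
    """Select the (span, value) pairs that would be used as targets."""
    if keep_none:
        # None is treated as a regular value to learn and predict.
        return list(span_values)
    # Otherwise, spans whose attribute is None are skipped.
    return [(span, value) for span, value in span_values if value is not None]


# Made-up spans with their "_.event_type" value.
span_values = [("admission", "start"), ("aspirin", None), ("discharge", "stop")]

print(collect_targets(span_values))
print(collect_targets(span_values, keep_none=True))
```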