edsnlp.utils.span_getters

SpanSetterArg

Bases: Validated

Valid values for the span_setter argument of a component can be:

  • a (doc, matches) -> None callable
  • a span group name
  • a list of span group names
  • a dict mapping each group name to True or to a list of labels

The group name "ents" is a special case: matches are added to doc.ents.

Examples

  • span_setter=["ents", "ckd"] will add the matches to both doc.ents and doc.spans["ckd"]. It is equivalent to {"ents": True, "ckd": True}.
  • span_setter={"ents": ["foo", "bar"]} will add only the matches labelled "foo" or "bar" to doc.ents.
  • span_setter="ents" will add all matches only to doc.ents.
  • span_setter="ckd" will add all matches only to doc.spans["ckd"].
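
The equivalence between the string, list and dict forms can be sketched as a small normalization step. The helper normalize_span_setter below is hypothetical and is not part of edsnlp; it only illustrates how the non-callable forms listed above could be reduced to the dict form.

```python
from typing import Dict, List, Union

# Hypothetical alias: group name -> True (all labels) or a list of labels.
GroupFilter = Dict[str, Union[bool, List[str]]]


def normalize_span_setter(value: Union[str, List[str], GroupFilter]) -> GroupFilter:
    """Reduce the string, list and dict forms to the dict form."""
    if isinstance(value, str):
        # A single group name keeps every match.
        return {value: True}
    if isinstance(value, dict):
        # Already in dict form: return a shallow copy.
        return dict(value)
    # A list of group names keeps every match in each group.
    return {name: True for name in value}


print(normalize_span_setter(["ents", "ckd"]))  # {'ents': True, 'ckd': True}
print(normalize_span_setter("ckd"))  # {'ckd': True}
```

The callable form is left out of this sketch, since a callable decides by itself where the matches go.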

SpanGetterArg

Bases: Validated

Valid values for the span_getter argument of a component can be:

  • a (doc) -> spans callable
  • a span group name
  • a list of span group names
  • a dict mapping each group name to True or to a list of labels

The group name "ents" is a special case: matches are read from doc.ents.

Examples

  • span_getter=["ents", "ckd"] will get the matches from both doc.ents and doc.spans["ckd"]. It is equivalent to {"ents": True, "ckd": True}.
  • span_getter={"ents": ["foo", "bar"]} will get only the matches labelled "foo" or "bar" from doc.ents.
  • span_getter="ents" will get all matches from doc.ents.
  • span_getter="ckd" will get all matches from doc.spans["ckd"].
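
The label filtering implied by the dict form can be illustrated without spaCy. In the sketch below, FakeSpan and get_spans are hypothetical stand-ins (not edsnlp or spaCy objects): ents plays the role of doc.ents and groups the role of doc.spans.

```python
from typing import Dict, List, NamedTuple, Union


class FakeSpan(NamedTuple):
    """Stand-in for a spaCy Span: only the text and the label are modelled."""

    text: str
    label_: str


def get_spans(
    ents: List[FakeSpan],
    groups: Dict[str, List[FakeSpan]],
    span_getter: Dict[str, Union[bool, List[str]]],
) -> List[FakeSpan]:
    """Collect spans according to a dict-form span getter."""
    found = []
    for name, labels in span_getter.items():
        # "ents" is the special case that reads from the entity list.
        candidates = ents if name == "ents" else groups.get(name, [])
        for span in candidates:
            # True keeps every span; a list keeps only the listed labels.
            if labels is True or (labels and span.label_ in labels):
                found.append(span)
    return found


ents = [FakeSpan("diabetes", "foo"), FakeSpan("aspirin", "drug")]
groups = {"ckd": [FakeSpan("renal failure", "ckd")]}

print(get_spans(ents, groups, {"ents": ["foo", "bar"]}))
print(get_spans(ents, groups, {"ents": True, "ckd": True}))
```

The first call keeps only the "foo" entity; the second returns the two entities followed by the span from the "ckd" group.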

make_span_context_getter

Create a span context getter.

Parameters

PARAMETER DESCRIPTION
context_words

Minimum number of words to include on each side of the span. It can be asymmetric: for example, (5, 2) will include 5 words before the start of the span and 2 words after its end.

TYPE: Union[NonNegativeInt, Tuple[NonNegativeInt, NonNegativeInt]]

context_sents

Minimum number of sentences to include on each side of the span:

  • 0: don't use sentences to build the context.
  • 1: include the sentence of the span.
  • n: include n-1 sentences on each side of the span + the sentence of the span

By default, 0 if the document has no sentence annotations, 1 otherwise.

TYPE: Optional[Union[NonNegativeInt, Tuple[NonNegativeInt, NonNegativeInt]]] = 1
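
The arithmetic behind these two parameters can be sketched on plain token offsets. The function context_bounds below is hypothetical and is not the edsnlp implementation; it assumes that the final context is the smallest range covering both the word window and the sentence window (hence "minimum"), and it only handles an integer context_sents.

```python
from typing import List, Tuple, Union

Bounds = Tuple[int, int]  # (start, end) token offsets, end exclusive


def context_bounds(
    sent_bounds: List[Bounds],
    span: Bounds,
    context_words: Union[int, Tuple[int, int]] = 0,
    context_sents: int = 1,
) -> Bounds:
    """Return the token range covered by the context of `span`."""
    start, end = span
    n_tokens = sent_bounds[-1][1]
    if isinstance(context_words, int):
        context_words = (context_words, context_words)
    before, after = context_words

    # Word window, clipped to the document.
    ctx_start = max(0, start - before)
    ctx_end = min(n_tokens, end + after)

    if context_sents > 0:
        # Sentences holding the first and the last token of the span.
        first = next(i for i, (s, e) in enumerate(sent_bounds) if s <= start < e)
        last = next(i for i, (s, e) in enumerate(sent_bounds) if s < end <= e)
        extra = context_sents - 1  # n means n-1 extra sentences on each side
        lo = max(0, first - extra)
        hi = min(len(sent_bounds) - 1, last + extra)
        ctx_start = min(ctx_start, sent_bounds[lo][0])
        ctx_end = max(ctx_end, sent_bounds[hi][1])
    return ctx_start, ctx_end


# Three sentences of 4, 8 and 3 tokens; the span is the single token at offset 5.
sents = [(0, 4), (4, 12), (12, 15)]
print(context_bounds(sents, (5, 6), context_words=(5, 2), context_sents=0))  # (0, 8)
print(context_bounds(sents, (5, 6), context_words=0, context_sents=1))  # (4, 12)
print(context_bounds(sents, (5, 6), context_words=0, context_sents=2))  # (0, 15)
```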

ContextWindow

Bases: Validated, ABC

A ContextWindow specifies how much additional context (such as sentences or words) should be included relative to an anchor span. For example, one might define a context window that extracts the sentence immediately preceding and following the anchor span, or one that extends the span by a given number of words before and after.

ContextWindow objects can be combined using logical operations to create more complex context windows. For example, one can create a context window that includes either words from a -10 to +10 range or words from the sentence.
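
One way to picture these combinations is on (start, end) token ranges. The sketch below is an assumption about the semantics, not the library code: union and intersection are hypothetical helpers, "|" is modelled as the smallest range covering both windows, and "&" as their overlap. Both windows contain the anchor span, so they always overlap.

```python
from typing import Tuple

TokenRange = Tuple[int, int]  # (start, end) token offsets, end exclusive


def union(a: TokenRange, b: TokenRange) -> TokenRange:
    """Smallest range covering both windows (models `a | b`)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def intersection(a: TokenRange, b: TokenRange) -> TokenRange:
    """Overlap of the two windows (models `a & b`)."""
    return (max(a[0], b[0]), min(a[1], b[1]))


# Assumed offsets for an anchor at token 5 in a 15-token document
# whose second sentence spans tokens 4 to 12.
words_3 = (2, 9)  # 3 words on each side of the anchor
words_4 = (1, 10)  # 4 words on each side of the anchor
sent = (4, 12)  # sentence of the anchor
sent_and_next = (4, 15)  # sentence of the anchor plus the next one

print(union(words_3, sent_and_next))  # (2, 15)
print(intersection(words_4, sent))  # (4, 10)
```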

Examples

from confit import validate_arguments
from spacy.tokens import Span

import edsnlp
from edsnlp.utils.span_getters import ContextWindow


@validate_arguments
def apply_context(span: Span, ctx: ContextWindow):
    # ctx will be parsed and cast as a ContextWindow
    return ctx(span)


nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")

doc = nlp("A first sentence. A second sentence, longer this time. A third.")
span = doc[5:6]  # "second"

# Will return a span with the 3 words before and after the span
# and words of the current sentence and the next sentence.
apply_context(span, "words[-3:3] | sents[0:1]").text
# Out: "sentence. A second sentence, longer this time. A third."

# Will return the span covering at most the -4 and +4 words
# around the span and the current sentence of the span.
apply_context(span, "words[-4:4] & sent").text
# Out: "A second sentence, longer this"

Indexing

Unlike standard Python sequence slicing, sents[0:0] returns the current sentence, not an empty span.
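
The difference with Python slicing can be sketched as follows. The helper sentence_window is hypothetical; it assumes that both bounds are inclusive offsets relative to the current sentence, which is consistent with sents[0:1] covering the current and the next sentence.

```python
from typing import List


def sentence_window(
    sentences: List[str], current: int, start: int, stop: int
) -> List[str]:
    """Sentences from current+start to current+stop, both ends included."""
    first = max(0, current + start)
    last = min(len(sentences) - 1, current + stop)
    return sentences[first : last + 1]


sentences = [
    "A first sentence.",
    "A second sentence, longer this time.",
    "A third.",
]

# Relative, inclusive bounds: [0:0] is the current sentence.
print(sentence_window(sentences, 1, 0, 0))
# Plain Python slicing with equal bounds is empty.
print(sentences[1:1])  # []
```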