edsnlp.utils.span_getters
SpanSetterArg
Bases: Validated
Valid values for the span_setter argument of a component are:
- a (doc, matches) -> None callable
- a span group name
- a list of span group names
- a dict of group name to True or list of labels
The group name "ents" is a special case: it adds the matches to doc.ents.
Examples
- span_setter=["ents", "ckd"] will add the matches to both doc.ents and doc.spans["ckd"]. It is equivalent to {"ents": True, "ckd": True}.
- span_setter={"ents": ["foo", "bar"]} will add the matches with label "foo" or "bar" to doc.ents.
- span_setter="ents" will add all matches only to doc.ents.
- span_setter="ckd" will add all matches only to doc.spans["ckd"].
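The string, list and dict forms listed above can be read as shorthand for a single mapping from group name to either True or a list of labels. The sketch below illustrates that equivalence in plain Python. `normalize_span_setter` is a hypothetical helper written for this example, not an edsnlp function, and it makes no claim about how the library validates the argument internally.

```python
from typing import Callable, Dict, List, Union

# Hypothetical type alias and helper, for illustration only (not edsnlp's API).
SetterSpec = Union[str, List[str], Dict[str, Union[bool, List[str]]], Callable]


def normalize_span_setter(spec: SetterSpec):
    """Expand the shorthand forms into a {group name: True | labels} dict."""
    if callable(spec):
        # A (doc, matches) -> None callable is passed through unchanged.
        return spec
    if isinstance(spec, str):
        return {spec: True}
    if isinstance(spec, list):
        return {name: True for name in spec}
    if isinstance(spec, dict):
        return dict(spec)
    raise TypeError(f"Unsupported span_setter value: {spec!r}")


print(normalize_span_setter(["ents", "ckd"]))  # {'ents': True, 'ckd': True}
print(normalize_span_setter("ckd"))  # {'ckd': True}
```

Under this reading, a list is simply the dict form with every group set to True, which is the equivalence stated in the first example.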
SpanGetterArg
Bases: Validated
Valid values for the span_getter argument of a component are:
- a (doc) -> spans callable
- a span group name
- a list of span group names
- a dict of group name to True or list of labels
The group name "ents" is a special case: it gets the matches from doc.ents.
Examples
- span_getter=["ents", "ckd"] will get the matches from both doc.ents and doc.spans["ckd"]. It is equivalent to {"ents": True, "ckd": True}.
- span_getter={"ents": ["foo", "bar"]} will get the matches with label "foo" or "bar" from doc.ents.
- span_getter="ents" will get all matches from doc.ents.
- span_getter="ckd" will get all matches from doc.spans["ckd"].
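To make the label filtering concrete, here is a minimal sketch of how a dict-form getter could be applied. It uses stand-in objects instead of real spaCy documents so that it runs without any dependency. `collect_spans` is a hypothetical name, and the logic is an assumption based on the description above rather than the library's implementation.

```python
from types import SimpleNamespace
from typing import Dict, List, Union


def collect_spans(doc, getter: Dict[str, Union[bool, List[str]]]):
    """Gather spans from doc.ents or doc.spans[name], filtered by label."""
    found = []
    for name, labels in getter.items():
        if not labels:
            # False or an empty label list selects nothing from this group.
            continue
        # "ents" is the special case that reads from doc.ents.
        group = doc.ents if name == "ents" else doc.spans.get(name, [])
        for span in group:
            if labels is True or span.label_ in labels:
                found.append(span)
    return found


# Stand-in objects: only the attributes used above are modelled.
foo = SimpleNamespace(text="a", label_="foo")
baz = SimpleNamespace(text="b", label_="baz")
ckd = SimpleNamespace(text="c", label_="ckd")
doc = SimpleNamespace(ents=[foo, baz], spans={"ckd": [ckd]})

print([s.text for s in collect_spans(doc, {"ents": ["foo", "bar"]})])  # ['a']
print([s.text for s in collect_spans(doc, {"ents": True, "ckd": True})])  # ['a', 'b', 'c']
```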
make_span_context_getter
Create a span context getter.
Parameters
PARAMETER | DESCRIPTION |
---|---|
context_words | Minimum number of words to include on each side of the span. It can be asymmetric: for example, (5, 2) will include 5 words before the start of the span and 2 after the end of the span. |
context_sents | Minimum number of sentences to include on each side of the span. By default, 0 if the document has no sentence annotations, 1 otherwise. |
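The asymmetric form of context_words can be illustrated with simple token arithmetic. The sketch below is an assumption about how such a window might be computed, not the library's code. `word_window` is a hypothetical helper, and clamping to the document bounds is assumed.

```python
from typing import Tuple, Union


def word_window(
    start: int,
    end: int,
    n_tokens: int,
    context_words: Union[int, Tuple[int, int]],
) -> Tuple[int, int]:
    """Return the (start, end) token range widened by context_words.

    start and end delimit the span (end exclusive); n_tokens is the
    document length, used to clamp the window to valid indices.
    """
    if isinstance(context_words, int):
        before = after = context_words
    else:
        before, after = context_words
    return max(0, start - before), min(n_tokens, end + after)


print(word_window(10, 12, 30, (5, 2)))  # (5, 14)
print(word_window(1, 2, 4, 3))  # (0, 4)
```

With (5, 2), a span covering tokens 10 to 12 grows to tokens 5 to 14: five tokens earlier at the start, two tokens later at the end.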
ContextWindow
Bases: Validated, ABC
A ContextWindow specifies how much additional context (such as sentences or words) should be included relative to an anchor span. For example, one might define a context window that extracts the sentence immediately preceding and following the anchor span, or one that extends the span by a given number of words before and after.
ContextWindow objects can be combined using logical operations to create more complex context windows. For example, one can create a context window that includes either words from a -10 to +10 range or words from the sentence.
Examples
from confit import validate_arguments
from spacy.tokens import Span

import edsnlp
from edsnlp.utils.span_getters import ContextWindow


@validate_arguments
def apply_context(span: Span, ctx: ContextWindow):
    # ctx will be parsed and cast as a ContextWindow
    return ctx(span)


nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
doc = nlp("A first sentence. A second sentence, longer this time. A third.")
span = doc[5:6]  # "second"

# Will return a span with the 3 words before and after the span,
# plus the words of the current sentence and the next sentence.
apply_context(span, "words[-3:3] | sents[0:1]").text
# Out: "sentence. A second sentence, longer this time. A third."

# Will return the span covering at most the -4 and +4 words
# around the span and the current sentence of the span.
apply_context(span, "words[-4:4] & sent").text
# Out: "A second sentence, longer this"
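The two outputs above can be reproduced by hand with token ranges. The sketch below assumes that `|` yields the smallest range covering both windows and that `&` yields their overlap. This reading is consistent with the outputs shown, but it is inferred from them rather than taken from the library's code. The `union` and `intersection` helpers are hypothetical, and the token list is written out manually.

```python
tokens = ["A", "first", "sentence", ".", "A", "second", "sentence", ",",
          "longer", "this", "time", ".", "A", "third", "."]


def union(a, b):
    """Smallest half-open range covering both a and b."""
    return min(a[0], b[0]), max(a[1], b[1])


def intersection(a, b):
    """Overlap of two half-open ranges (empty if they do not meet)."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return start, max(start, end)


# The anchor span is tokens[5:6] ("second").
words_3 = (5 - 3, 6 + 3)   # words[-3:3]
sents_0_1 = (4, 15)        # current sentence plus the next one
words_4 = (5 - 4, 6 + 4)   # words[-4:4]
sent = (4, 12)             # current sentence only

print(union(words_3, sents_0_1))   # (2, 15)
print(intersection(words_4, sent)) # (4, 10)
print(tokens[4:10])  # ['A', 'second', 'sentence', ',', 'longer', 'this']
```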
Indexing
Unlike standard Python sequence slicing, sents[0:0] returns the current sentence, not an empty span.
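One way to picture this convention is to treat both bounds as sentence offsets relative to the anchor, with both ends included. The sketch below follows that reading, which is inferred from the statement above and from the sents[0:1] example. `sentence_window` is a hypothetical helper, not part of edsnlp.

```python
from typing import List, Tuple


def sentence_window(
    sentences: List[Tuple[int, int]],
    current: int,
    before: int,
    after: int,
) -> Tuple[int, int]:
    """Token range for sents[before:after] around sentence `current`.

    Both bounds are relative sentence offsets, and both are included.
    """
    first = max(0, current + before)
    last = min(len(sentences) - 1, current + after)
    return sentences[first][0], sentences[last][1]


# Half-open token ranges of three sentences; the anchor is in sentence 1.
sents = [(0, 4), (4, 12), (12, 15)]
print(sentence_window(sents, 1, 0, 0))   # (4, 12): the current sentence
print(sentence_window(sents, 1, 0, 1))   # (4, 15): current and next
print(sentence_window(sents, 1, -1, 1))  # (0, 15): previous, current, next
```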