edsnlp.utils.span_getters

SpanSetterArg

Valid values for the span_setter argument of a component are:

  • a (doc, matches) -> None callable
  • a span group name
  • a list of span group names
  • a dict of group name to True or list of labels

The group name "ents" is a special case: matches are added to doc.ents.

Examples

  • span_setter=["ents", "ckd"] will add the matches to both doc.ents and doc.spans["ckd"]. It is equivalent to {"ents": True, "ckd": True}.
  • span_setter={"ents": ["foo", "bar"]} will add only the matches labeled "foo" or "bar" to doc.ents.
  • span_setter="ents" will add all matches only to doc.ents.
  • span_setter="ckd" will add all matches only to doc.spans["ckd"].
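
The routing rules above can be sketched in plain Python. This is an illustration only, not edsnlp's implementation: `ToyDoc`, `Match` and `set_spans` are hypothetical stand-ins, and the toy ignores spaCy constraints such as `doc.ents` being a tuple that cannot hold overlapping entities.

```python
from collections import namedtuple

# Toy stand-ins for spaCy objects (assumption, for illustration only).
Match = namedtuple("Match", ["start", "end", "label"])


class ToyDoc:
    def __init__(self):
        self.ents = []
        self.spans = {}


def set_spans(doc, matches, span_setter):
    """Hypothetical sketch of how a span_setter value routes matches."""
    if callable(span_setter):
        span_setter(doc, matches)  # (doc, matches) -> None
        return
    # Normalize a name or a list of names to the dict form.
    if isinstance(span_setter, str):
        span_setter = {span_setter: True}
    elif isinstance(span_setter, list):
        span_setter = {name: True for name in span_setter}
    for name, labels in span_setter.items():
        # True keeps every match, a list keeps only the listed labels.
        selected = [m for m in matches if labels is True or m.label in labels]
        if name == "ents":
            doc.ents.extend(selected)  # special case: doc.ents
        else:
            doc.spans.setdefault(name, []).extend(selected)


doc = ToyDoc()
matches = [Match(0, 2, "foo"), Match(5, 6, "baz")]
set_spans(doc, matches, {"ents": ["foo", "bar"], "ckd": True})
print(doc.ents)          # only the "foo" match
print(doc.spans["ckd"])  # both matches
```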

SpanGetterArg

Valid values for the span_getter argument of a component are:

  • a (doc) -> spans callable
  • a span group name
  • a list of span group names
  • a dict of group name to True or list of labels

The group name "ents" is a special case: matches are read from doc.ents.

Examples

  • span_getter=["ents", "ckd"] will get the matches from both doc.ents and doc.spans["ckd"]. It is equivalent to {"ents": True, "ckd": True}.
  • span_getter={"ents": ["foo", "bar"]} will get only the matches labeled "foo" or "bar" from doc.ents.
  • span_getter="ents" will get all matches from doc.ents.
  • span_getter="ckd" will get all matches from doc.spans["ckd"].
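
As a rough sketch of these lookup rules, the snippet below resolves each accepted form against a toy document. Everything here is an assumption made for illustration: `get_spans`, the `Span` tuple and the dict-based `toy_doc` are hypothetical and do not reproduce edsnlp's code or spaCy's types.

```python
from collections import namedtuple

# Toy span type (assumption); real spans are spaCy Span objects.
Span = namedtuple("Span", ["start", "end", "label"])


def get_spans(doc, span_getter):
    """Hypothetical sketch of how a span_getter value selects spans."""
    if callable(span_getter):
        return list(span_getter(doc))  # (doc) -> spans
    # Normalize a name or a list of names to the dict form.
    if isinstance(span_getter, str):
        span_getter = {span_getter: True}
    elif isinstance(span_getter, list):
        span_getter = {name: True for name in span_getter}
    found = []
    for name, labels in span_getter.items():
        # "ents" reads from doc.ents, any other name from doc.spans[name].
        group = doc["ents"] if name == "ents" else doc["spans"].get(name, [])
        found.extend(s for s in group if labels is True or s.label in labels)
    return found


toy_doc = {
    "ents": [Span(0, 2, "foo"), Span(4, 5, "baz")],
    "spans": {"ckd": [Span(7, 9, "ckd")]},
}
print(get_spans(toy_doc, ["ents", "ckd"]))           # all three spans
print(get_spans(toy_doc, {"ents": ["foo", "bar"]}))  # only the "foo" entity
```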

make_span_context_getter

Create a span context getter.

Parameters

PARAMETER DESCRIPTION
span_getter

Span getter selecting the spans for which to get the context.

TYPE: SpanGetterArg

context_words

Minimum number of words to include on each side of the span.

TYPE: NonNegativeInt DEFAULT: 0

context_sents

Minimum number of sentences to include on each side of the span:

  • 0: don't use sentences to build the context.
  • 1: include the sentence of the span.
  • n: include n sentences on each side of the span.

By default, 0 if the document has no sentence annotations, 1 otherwise.

TYPE: Optional[NonNegativeInt] DEFAULT: None

overlap_policy

How to handle overlapping spans:

  • "filter": remove overlapping spans.
  • "merge": merge overlapping spans.

TYPE: Literal['filter', 'merge'] DEFAULT: 'merge'
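
To make the word and sentence settings concrete, here is a minimal sketch of how a context window could be computed from token offsets. It is not edsnlp's implementation: `context_window` is a hypothetical helper, spans and sentences are plain `(start, end)` token ranges, and it assumes the two settings combine by taking the wider window, since each is described as a minimum. Only the "include the sentence of the span" case is covered.

```python
from typing import List, Tuple


def context_window(
    n_tokens: int,
    span: Tuple[int, int],
    sentences: List[Tuple[int, int]],
    context_words: int = 0,
    include_sentence: bool = False,
) -> Tuple[int, int]:
    """Hypothetical sketch: token window [lo, hi) around a span."""
    start, end = span
    # Word context: extend by context_words on each side, clipped to the doc.
    lo = max(0, start - context_words)
    hi = min(n_tokens, end + context_words)
    if include_sentence:
        for s_start, s_end in sentences:
            # Widen the window to cover any sentence the span overlaps.
            if s_start < end and start < s_end:
                lo, hi = min(lo, s_start), max(hi, s_end)
    return lo, hi


sentences = [(0, 6), (6, 14), (14, 20)]
print(context_window(20, (8, 10), sentences, context_words=3))
print(context_window(20, (8, 10), sentences, context_words=3, include_sentence=True))
```

With three words of context, the span covering tokens 8 to 10 gets the window (5, 13); also including its sentence (6, 14) widens that to (5, 14).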

merge_spans

Merge overlapping spans into a single span.

Parameters

PARAMETER DESCRIPTION
spans

List of spans to merge.

TYPE: List[Span]

doc

Document to merge the spans on.

TYPE: Doc

RETURNS DESCRIPTION
List[Span]

Merged spans.
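
The merging step can be illustrated with plain token ranges. This is a sketch under stated assumptions rather than the library's code: `merge_intervals` is a hypothetical helper working on `(start, end)` tuples with exclusive ends instead of spaCy Span objects, and it treats two spans as overlapping only when they share at least one token (adjacent spans are kept separate).

```python
from typing import List, Tuple


def merge_intervals(spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Hypothetical sketch: merge overlapping (start, end) token ranges."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals([(5, 9), (0, 3), (2, 4), (8, 12)]))  # [(0, 4), (5, 12)]
```

Sorting first means each range only has to be compared with the last merged one, so the pass is linear after the sort.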