
edsnlp.pipelines.trainable.span_qualifier.factory

create_component(nlp, model, on_ents=None, on_span_groups=False, qualifiers=None, label_constraints=None, candidate_getter=None, name='span_qualifier', scorer=None)

Create a generic span classification component

PARAMETER DESCRIPTION
nlp

The spaCy Language object (pipeline); its vocabulary (nlp.vocab) is passed to the component

model

The model used to classify the spans

TYPE: Model

name

Name of the component

TYPE: str DEFAULT: 'span_qualifier'

on_ents

Whether to look into doc.ents for spans to classify. If a list of strings is provided, only the spans with the given labels will be considered. If None and on_span_groups is False, the labels mentioned in label_constraints will be used; if label_constraints is also None, all ents will be used.

TYPE: Optional[Union[bool, Sequence[str]]] DEFAULT: None
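The resolution rule described for on_ents can be sketched in plain Python. This is an illustration only, not the library's implementation: `resolve_ent_labels` is a hypothetical helper, and the branch taken when on_ents is None while on_span_groups is set is an assumption, since the description does not cover that case.

```python
from typing import Dict, List, Optional, Sequence, Set, Union


def resolve_ent_labels(
    on_ents: Optional[Union[bool, Sequence[str]]] = None,
    on_span_groups: object = False,
    label_constraints: Optional[Dict[str, List[str]]] = None,
) -> Optional[Set[str]]:
    """Hypothetical helper: which entity labels would be classified.

    Returns None for "every label", or the set of labels to keep
    (an empty set means doc.ents is ignored).
    """
    if on_ents is True:
        return None  # all entities
    if on_ents is False:
        return set()  # doc.ents is not used
    if on_ents is not None:
        return set(on_ents)  # explicit list of labels
    # on_ents is None from here on
    if on_span_groups:
        # Assumption: when span groups are requested and on_ents is left
        # unset, entities are not considered.
        return set()
    if label_constraints is None:
        return None  # no constraints: all entities
    # Labels mentioned anywhere in the constraints
    return {label for labels in label_constraints.values() for label in labels}


constraints = {"_.negation": ["drug", "disease"], "_.family": ["disease"]}
print(resolve_ent_labels(None, False, constraints))
```

Returning None for "no restriction" keeps the unrestricted case distinct from an empty selection.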

on_span_groups

Whether to look into doc.spans for spans to classify:

  • If True, all span groups will be considered
  • If False, no span group will be considered
  • If a list of str is provided, only these span groups will be kept
  • If a mapping is provided, the keys are the span group names and the values are either a list of allowed labels in the group or True to keep them all

TYPE: Union[bool, Sequence[str], Mapping[str, Union[bool, Sequence[str]]]] DEFAULT: False
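To make the four accepted forms of on_span_groups concrete, here is a minimal sketch that applies each of them to a toy stand-in for doc.spans (a plain dict mapping each group name to the labels of its spans). `filter_span_groups` is a hypothetical helper written for this illustration, and skipping a group whose mapping value is False is an assumption.

```python
from collections.abc import Mapping as MappingABC
from typing import Dict, List, Mapping, Sequence, Union

SpanGroupSpec = Union[bool, Sequence[str], Mapping[str, Union[bool, Sequence[str]]]]


def filter_span_groups(
    spec: SpanGroupSpec,
    groups: Mapping[str, List[str]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: apply an on_span_groups-style spec to toy data."""
    if spec is True:
        # All span groups, all labels
        return {name: list(labels) for name, labels in groups.items()}
    if spec is False:
        # No span group at all
        return {}
    if isinstance(spec, MappingABC):
        kept: Dict[str, List[str]] = {}
        for name, allowed in spec.items():
            if name not in groups or allowed is False:
                continue  # assumption: False means "skip this group"
            if allowed is True:
                kept[name] = list(groups[name])
            else:
                kept[name] = [lab for lab in groups[name] if lab in allowed]
        return kept
    # A plain list of group names: keep these groups with all their labels
    return {name: list(groups[name]) for name in spec if name in groups}


toy_groups = {"drugs": ["drug", "dose"], "dates": ["date"]}
print(filter_span_groups({"drugs": ["drug"], "dates": True}, toy_groups))
```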

qualifiers

The qualifiers to predict or train on. If None, keys from the label_constraints will be used

TYPE: Optional[Sequence[str]] DEFAULT: None

label_constraints

Constraints to select qualifiers for each span depending on their labels. Keys of the dict are the qualifiers and values are the labels for which the qualifier is allowed. If None, all qualifiers will be used for all spans

TYPE: Optional[Dict[str, List[str]]] DEFAULT: None
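The interaction between qualifiers and label_constraints can be illustrated with a short sketch. `qualifiers_for_label` is a hypothetical helper, not part of the library; the qualifier names (`_.negation`, `_.family`) and labels are made up for the example, and two behaviours are assumptions: a qualifier absent from label_constraints is treated as unconstrained, and an empty list is returned when both arguments are None.

```python
from typing import Dict, List, Optional, Sequence


def qualifiers_for_label(
    label: str,
    qualifiers: Optional[Sequence[str]] = None,
    label_constraints: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Hypothetical helper: qualifiers that apply to a span with this label."""
    if qualifiers is None:
        # Fall back to the keys of label_constraints, as described above.
        # Assumption: nothing to predict when both arguments are None.
        qualifiers = list(label_constraints) if label_constraints is not None else []
    if label_constraints is None:
        # No constraints: every qualifier applies to every span
        return list(qualifiers)
    selected = []
    for qualifier in qualifiers:
        if qualifier not in label_constraints:
            # Assumption: a qualifier without constraints is always allowed
            selected.append(qualifier)
        elif label in label_constraints[qualifier]:
            selected.append(qualifier)
    return selected


constraints = {"_.negation": ["drug", "disease"], "_.family": ["disease"]}
print(qualifiers_for_label("drug", None, constraints))
print(qualifiers_for_label("disease", None, constraints))
```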

candidate_getter

Optional method to call to extract the candidate spans and the qualifiers to predict or train on. If None, a candidate getter will be created from the other parameters: on_ents, on_span_groups, qualifiers and label_constraints.

TYPE: Optional[Callable[[Doc], Tuple[Spans, Optional[Spans], SpanGroups, List[List[str]]]]] DEFAULT: None

scorer

Optional method to call to score predictions

TYPE: Optional[Callable] DEFAULT: None

Source code in edsnlp/pipelines/trainable/span_qualifier/factory.py
@Language.factory(
    "eds.span_qualifier",
    default_config=SPAN_QUALIFIER_DEFAULTS,
    requires=["doc.ents", "doc.spans"],
    assigns=["doc.ents", "doc.spans"],
    default_score_weights={
        "qual_f": 1.0,
    },
)
def create_component(
    nlp,
    model: Model,
    on_ents: Optional[Union[bool, Sequence[str]]] = None,
    on_span_groups: Union[
        bool, Sequence[str], Mapping[str, Union[bool, Sequence[str]]]
    ] = False,
    qualifiers: Optional[Sequence[str]] = None,
    label_constraints: Optional[Dict[str, List[str]]] = None,
    candidate_getter: Optional[
        Callable[[Doc], Tuple[Spans, Optional[Spans], SpanGroups, List[List[str]]]]
    ] = None,
    name: str = "span_qualifier",
    scorer: Optional[Callable] = None,
) -> TrainableSpanQualifier:
    """
    Create a generic span classification component

    Parameters
    ----------
    nlp: Language
        The spaCy Language object (pipeline); its vocabulary is passed to the
        component
    model: Model
        The model used to classify the spans
    name: str
        Name of the component
    on_ents: Optional[Union[bool, Sequence[str]]]
        Whether to look into `doc.ents` for spans to classify. If a list of strings
        is provided, only the spans with the given labels will be considered. If
        None and `on_span_groups` is False, labels mentioned in `label_constraints`
        will be used, and all ents will be used if `label_constraints` is None.
    on_span_groups: Union[bool, Sequence[str], Mapping[str, Union[bool, Sequence[str]]]]
        Whether to look into `doc.spans` for spans to classify:

        - If True, all span groups will be considered
        - If False, no span group will be considered
        - If a list of str is provided, only these span groups will be kept
        - If a mapping is provided, the keys are the span group names and the values
          are either a list of allowed labels in the group or True to keep them all
    qualifiers: Optional[Sequence[str]]
        The qualifiers to predict or train on. If None, keys from the
        `label_constraints` will be used
    label_constraints: Optional[Dict[str, List[str]]]
        Constraints to select qualifiers for each span depending on their labels.
        Keys of the dict are the qualifiers and values are the labels for which
        the qualifier is allowed. If None, all qualifiers will be used for all spans
    candidate_getter: Optional[Callable[[Doc], Tuple[Spans, Optional[Spans], SpanGroups, List[List[str]]]]]
        Optional method to call to extract the candidate spans and the qualifiers
        to predict or train on. If None, a candidate getter will be created from
        the other parameters: `on_ents`, `on_span_groups`, `qualifiers` and
        `label_constraints`.
    scorer: Optional[Callable]
        Optional method to call to score predictions
    """  # noqa: E501
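    # Normalize to a strict bool: an `or` chain returns one of its operands
    # (e.g. a list or None), which would not compare as intended below.
    do_make_candidate_getter = bool(
        on_ents or on_span_groups or qualifiers or label_constraints
    )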
    if (candidate_getter is not None) == do_make_candidate_getter:
        raise ValueError(
            "You must either provide a candidate getter or the parameters to "
            "make one, but not both."
        )
    if do_make_candidate_getter:
        candidate_getter = create_candidate_getter(
            on_ents=on_ents,
            on_span_groups=on_span_groups,
            qualifiers=qualifiers,
            label_constraints=label_constraints,
        )

    return TrainableSpanQualifier(
        vocab=nlp.vocab,
        model=model,
        candidate_getter=candidate_getter,
        name=name,
        scorer=scorer,
    )
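
The rule stated in the error message above (provide either a candidate getter or the parameters to build one, but not both) can be sketched in isolation. `needs_generated_getter` is a hypothetical name used only for this illustration. The sketch converts the parameter check to a strict boolean, because an `or` chain returns one of its operands rather than True or False.

```python
from typing import Callable, Optional


def needs_generated_getter(
    candidate_getter: Optional[Callable] = None,
    on_ents=None,
    on_span_groups=False,
    qualifiers=None,
    label_constraints=None,
) -> bool:
    """Hypothetical helper: return True when a getter must be built from
    the parameters, False when a user-supplied getter is used.

    Raises ValueError when both or neither are provided.
    """
    has_params = bool(on_ents or on_span_groups or qualifiers or label_constraints)
    has_getter = candidate_getter is not None
    if has_getter == has_params:
        raise ValueError(
            "You must either provide a candidate getter or the parameters to "
            "make one, but not both."
        )
    return has_params


# Parameters only: a getter would be generated
print(needs_generated_getter(on_ents=["drug"]))
# Getter only: it is used as-is
print(needs_generated_getter(candidate_getter=lambda doc: None))
```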