edsnlp.pipelines.core.matcher.factory

create_component(nlp, name='eds.matcher', terms=None, attr=None, regex='TEXT', ignore_excluded=False, ignore_space_tokens=False, term_matcher=GenericTermMatcher.exact, term_matcher_config={})

Provides a generic matcher component.

PARAMETER DESCRIPTION
nlp

The spaCy object.

TYPE: Language

name

The name of the component.

TYPE: str DEFAULT: 'eds.matcher'

terms

A dictionary of terms, mapping each label to a term or a list of terms.

TYPE: Optional[Patterns] DEFAULT: None

regex

A dictionary of regular expressions, mapping each label to a pattern or a list of patterns.

TYPE: Optional[Patterns] DEFAULT: 'TEXT'

attr

The default attribute to use for matching. Can be overridden using the terms and regex configurations.

TYPE: Union[str, Dict[str, str]] DEFAULT: None

ignore_excluded

Whether to skip excluded tokens (requires an upstream pipeline to mark excluded tokens).

TYPE: bool DEFAULT: False

ignore_space_tokens

Whether to skip space tokens during matching.

You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of eds.normalizer is enabled (which it is by default).

TYPE: bool DEFAULT: False

term_matcher

The matcher to use for matching phrases. One of exact or simstring.

TYPE: GenericTermMatcher DEFAULT: GenericTermMatcher.exact

term_matcher_config

Parameters passed to the term matcher class.

TYPE: Dict[str, Any] DEFAULT: {}
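
As a minimal sketch of how these parameters fit together, the configuration below declares one term dictionary and one regex dictionary and passes them to the component through spaCy's `add_pipe`. The labels, patterns, the `attr="LOWER"` choice, the blank French pipeline and the `build_pipeline` helper are illustrative assumptions, not part of the reference above; running `build_pipeline` requires spaCy and EDS-NLP to be installed.

```python
# Label -> term or list of terms (handled by the term matcher).
terms = {
    "covid": ["coronavirus", "covid19"],
    "patient": "patient",
}

# Label -> regular expression or list of regular expressions.
regex = {
    "covid": r"coronavirus|covid[-\s]?19|sars[-\s]cov[-\s]2",
}

# Keyword arguments that spaCy forwards to create_component.
matcher_config = {
    "terms": terms,
    "regex": regex,
    "attr": "LOWER",  # assumption: match on the lowercased text
    "ignore_excluded": False,
    "ignore_space_tokens": False,
}


def build_pipeline():
    """Hypothetical helper: blank French pipeline with eds.matcher added."""
    # Imported lazily so the configuration can be inspected without spaCy.
    import spacy

    nlp = spacy.blank("fr")
    nlp.add_pipe("eds.matcher", config=matcher_config)
    return nlp
```

Calling `nlp = build_pipeline()` and then `doc = nlp("Le patient est atteint de covid19.")` should run the matcher on the text; per the factory's `assigns` declaration, matches are written to `doc.ents` and `doc.spans`.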

Source code in edsnlp/pipelines/core/matcher/factory.py
@deprecated_factory(
    "matcher",
    "eds.matcher",
    default_config=DEFAULT_CONFIG,
    assigns=["doc.ents", "doc.spans"],
)
@Language.factory(
    "eds.matcher", default_config=DEFAULT_CONFIG, assigns=["doc.ents", "doc.spans"]
)
def create_component(
    nlp: Language,
    name: str = "eds.matcher",
    terms: Optional[Dict[str, Union[str, List[str]]]] = None,
    attr: Union[str, Dict[str, str]] = None,
    regex: Optional[Dict[str, Union[str, List[str]]]] = "TEXT",
    ignore_excluded: bool = False,
    ignore_space_tokens: bool = False,
    term_matcher: GenericTermMatcher = GenericTermMatcher.exact,
    term_matcher_config: Dict[str, Any] = {},
):
    """
    Provides a generic matcher component.

    Parameters
    ----------
    nlp : Language
        The spaCy object.
    name: str
        The name of the component.
    terms : Optional[Patterns]
        A dictionary of terms.
    regex : Optional[Patterns]
        A dictionary of regular expressions.
    attr : str
        The default attribute to use for matching.
        Can be overridden using the `terms` and `regex` configurations.
    ignore_excluded : bool
        Whether to skip excluded tokens (requires an upstream
        pipeline to mark excluded tokens).
    ignore_space_tokens: bool
        Whether to skip space tokens during matching.

        You won't be able to match on newlines if this is enabled and
        the "spaces"/"newline" option of `eds.normalizer` is enabled
        (which it is by default).
    term_matcher: GenericTermMatcher
        The matcher to use for matching phrases.
        One of `exact` or `simstring`.
    term_matcher_config: Dict[str,Any]
        Parameters of the matcher class
    """
    assert not (terms is None and regex is None)

    if terms is None:
        terms = dict()
    if regex is None:
        regex = dict()

    return GenericMatcher(
        nlp,
        terms=terms,
        attr=attr,
        regex=regex,
        ignore_excluded=ignore_excluded,
        ignore_space_tokens=ignore_space_tokens,
        term_matcher=term_matcher,
        term_matcher_config=term_matcher_config,
    )
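
The argument handling at the end of the factory can be isolated in a few lines. The sketch below reproduces only the guard and the `None`-to-empty-dictionary defaults shown in the listing; `normalize_patterns` and the local `Patterns` alias are hypothetical names introduced for illustration and are not part of EDS-NLP.

```python
from typing import Dict, List, Optional, Tuple, Union

# Local stand-in for the Patterns type used in the docstring.
Patterns = Dict[str, Union[str, List[str]]]


def normalize_patterns(
    terms: Optional[Patterns], regex: Optional[Patterns]
) -> Tuple[Patterns, Patterns]:
    """Hypothetical helper mirroring the guard and defaults in create_component."""
    # At least one of the two pattern sources must be provided.
    assert not (terms is None and regex is None)

    # A missing source is replaced by an empty dictionary.
    if terms is None:
        terms = dict()
    if regex is None:
        regex = dict()

    return terms, regex
```

With this helper, `normalize_patterns({"patient": "patient"}, None)` returns the terms unchanged alongside an empty regex dictionary, while passing `None` for both arguments raises an `AssertionError`.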