edsnlp.pipelines.ner.umls.factory

create_component(nlp, name='eds.umls', attr='NORM', ignore_excluded=False, ignore_space_tokens=False, term_matcher=TerminologyTermMatcher.exact, term_matcher_config={}, pattern_config=dict(languages=['FRE'], sources=None))

Create a component that recognizes terms in a document and normalizes them to UMLS concepts.

PARAMETER DESCRIPTION
nlp

spaCy Language object.

TYPE: Language

name

The name of the pipe

TYPE: str DEFAULT: 'eds.umls'

attr

Attribute to match on, e.g. TEXT, NORM, etc.

TYPE: Union[str, Dict[str, str]] DEFAULT: 'NORM'

ignore_excluded

Whether to skip excluded tokens during matching.

TYPE: bool DEFAULT: False

ignore_space_tokens

Whether to skip space tokens during matching.

TYPE: bool DEFAULT: False

term_matcher

The term matcher to use: either TerminologyTermMatcher.exact or TerminologyTermMatcher.simstring.

TYPE: TerminologyTermMatcher DEFAULT: TerminologyTermMatcher.exact

term_matcher_config

The configuration for the term matcher

TYPE: Dict[str, Any] DEFAULT: {}

pattern_config

The pattern retriever configuration

TYPE: Dict[str, Any] DEFAULT: dict(languages=['FRE'], sources=None)
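
As a rough usage sketch, the parameters above would normally be supplied through the `config` argument of spaCy's `nlp.add_pipe`. The helper below, `build_umls_config`, is hypothetical (it is not part of EDS-NLP): it merges overrides into the defaults listed above and rejects unknown keys. Writing the matcher as the plain string `"exact"` assumes the `TerminologyTermMatcher` enum is string-valued, and `"ENG"` is only an example of an additional language code. The `add_pipe` call is left as a comment because it needs spaCy, EDS-NLP, and typically access to the UMLS data.

```python
from copy import deepcopy
from typing import Any, Dict

# Defaults copied from the signature documented above.
# "exact" stands in for TerminologyTermMatcher.exact (assumption: the enum
# is string-valued, so its member name can be written as plain text).
DEFAULTS: Dict[str, Any] = {
    "attr": "NORM",
    "ignore_excluded": False,
    "ignore_space_tokens": False,
    "term_matcher": "exact",
    "term_matcher_config": {},
    "pattern_config": {"languages": ["FRE"], "sources": None},
}


def build_umls_config(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown eds.umls parameters: {sorted(unknown)}")
    config = deepcopy(DEFAULTS)  # never mutate the shared defaults
    config.update(overrides)
    return config


# Example: also retrieve English terms (illustrative override only).
config = build_umls_config(
    pattern_config={"languages": ["FRE", "ENG"], "sources": None},
)

# With spaCy and EDS-NLP installed, the component would then be added
# roughly like this:
#
#     import spacy
#     nlp = spacy.blank("fr")
#     nlp.add_pipe("eds.umls", config=config)
```

Keeping the overrides in a plain dictionary makes it easy to log or serialize the exact settings a pipeline was built with.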

Source code in edsnlp/pipelines/ner/umls/factory.py
@Language.factory(
    "eds.umls", default_config=DEFAULT_CONFIG, assigns=["doc.ents", "doc.spans"]
)
def create_component(
    nlp: Language,
    name: str = "eds.umls",
    attr: Union[str, Dict[str, str]] = "NORM",
    ignore_excluded: bool = False,
    ignore_space_tokens: bool = False,
    term_matcher: TerminologyTermMatcher = TerminologyTermMatcher.exact,
    term_matcher_config: Dict[str, Any] = {},
    pattern_config: Dict[str, Any] = dict(
        languages=["FRE"],
        sources=None,
    ),
):
    """
    Create a component that recognizes terms in a document and normalizes
    them to UMLS concepts.

    Parameters
    ----------
    nlp: Language
        spaCy `Language` object.
    name: str
        The name of the pipe
    attr: Union[str, Dict[str, str]]
        Attribute to match on, e.g. `TEXT`, `NORM`, etc.
    ignore_excluded: bool
        Whether to skip excluded tokens during matching.
    ignore_space_tokens: bool
        Whether to skip space tokens during matching.
    term_matcher: TerminologyTermMatcher
        The term matcher to use, either `TerminologyTermMatcher.exact` or
        `TerminologyTermMatcher.simstring`
    term_matcher_config: Dict[str, Any]
        The configuration for the term matcher
    pattern_config: Dict[str, Any]
        The pattern retriever configuration
    """

    return TerminologyMatcher(
        nlp,
        label="umls",
        regex=None,
        terms=patterns.get_patterns(pattern_config),
        attr=attr,
        ignore_excluded=ignore_excluded,
        ignore_space_tokens=ignore_space_tokens,
        term_matcher=term_matcher,
        term_matcher_config=term_matcher_config,
    )
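
To make the returned object a little more concrete, here is a toy, pure-Python illustration of what exact terminology matching amounts to: a mapping from concept identifier to synonyms is searched over normalized tokens, and each hit is reported with the `umls` label and the concept it normalizes to. This is not the `TerminologyMatcher` implementation. The function `match_terms`, the lowercase normalization, and the placeholder identifiers `CUI-1` and `CUI-2` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Match = Tuple[int, int, str, str]  # (start, end, label, concept_id)


def normalize(token: str) -> str:
    # Stand-in for matching on a normalized attribute such as NORM.
    return token.lower()


def match_terms(
    tokens: List[str],
    terms: Dict[str, List[str]],
    label: str = "umls",
) -> List[Match]:
    """Return every exact match; `end` is exclusive."""
    norm = [normalize(t) for t in tokens]

    # Index each synonym as a tuple of normalized tokens.
    index: Dict[Tuple[str, ...], str] = {}
    for concept_id, synonyms in terms.items():
        for synonym in synonyms:
            key = tuple(normalize(t) for t in synonym.split())
            index[key] = concept_id

    matches: List[Match] = []
    for start in range(len(norm)):
        for key, concept_id in index.items():
            end = start + len(key)
            if tuple(norm[start:end]) == key:
                matches.append((start, end, label, concept_id))
    return matches


# Placeholder terminology: the identifiers are NOT real UMLS CUIs.
terms = {
    "CUI-1": ["diabète", "diabète sucré"],
    "CUI-2": ["hypertension artérielle"],
}
tokens = "Patient avec Diabète sucré et hypertension artérielle".split()
matches = match_terms(tokens, terms)

for start, end, label, concept_id in matches:
    print(" ".join(tokens[start:end]), label, concept_id)
```

Note that this sketch keeps overlapping matches (both the short and the long synonym of the same concept); how overlaps are resolved in a real pipeline is outside its scope.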