edsnlp.pipelines.core.terminology.factory
create_component(nlp, label, terms, name='eds.terminology', attr='TEXT', regex=None, ignore_excluded=False, ignore_space_tokens=False, term_matcher='exact', term_matcher_config={})
Provides a terminology matching component.
The terminology matching component differs from the simple matcher component in that the regex and terms keys are used as spaCy's kb_id. All matched entities have the same label, defined in the top-level constructor (argument label).
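The relationship between the dictionary keys, the kb_id and the shared label can be sketched without spaCy. The snippet below is a simplified, character-level illustration using only the standard library, not EDS-NLP's implementation: the `Match` tuple, the `match_terminology` helper and the sample text are all hypothetical, and the assumption that each regex key maps to a list of patterns is made only for this example.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Match(NamedTuple):
    """Hypothetical stand-in for a matched entity."""
    start: int
    end: int
    text: str
    label: str   # identical for every match
    kb_id: str   # the dictionary key that produced the match


def match_terminology(
    text: str,
    label: str,
    terms: Dict[str, List[str]],
    regex: Optional[Dict[str, List[str]]] = None,
) -> List[Match]:
    """Return matches sorted by position; each carries the shared label and its key."""
    found = set()
    # Exact terms: escape each synonym so it is matched literally.
    for kb_id, synonyms in terms.items():
        for synonym in synonyms:
            for m in re.finditer(re.escape(synonym), text):
                found.add(Match(m.start(), m.end(), m.group(), label, kb_id))
    # Regular expressions: used as written.
    for kb_id, patterns in (regex or {}).items():
        for pattern in patterns:
            for m in re.finditer(pattern, text):
                found.add(Match(m.start(), m.end(), m.group(), label, kb_id))
    return sorted(found)


terms = {"covid": ["covid19", "coronavirus"], "flu": ["seasonal flu"]}
regex = {"covid": [r"sars[-\s]cov[-\s]2"]}
text = "Suspected sars-cov-2 infection; covid19 test positive, no seasonal flu."

matches = match_terminology(text, label="disease", terms=terms, regex=regex)
for m in matches:
    print(m.text, m.label, m.kb_id)
```

Every match carries the label "disease", while the kb_id distinguishes "covid" from "flu". The actual component works on spaCy tokens and attributes rather than raw characters.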
PARAMETER | DESCRIPTION |
---|---|
nlp | The spaCy object. |
name | The name of the component (default: 'eds.terminology'). |
label | Top-level label, shared by all matched entities. |
terms | A dictionary of terms. |
regex | A dictionary of regular expressions (default: None). |
attr | The default attribute to use for matching (default: 'TEXT'). Can be overridden using the |
ignore_excluded | Whether to skip excluded tokens; requires an upstream pipeline to mark excluded tokens (default: False). |
ignore_space_tokens | Whether to skip space tokens during matching (default: False). |
term_matcher | The matcher to use for matching phrases. One of exact or simstring (default: 'exact'). |
term_matcher_config | Parameters of the matcher class (default: {}). |
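As a rough illustration of how these parameters fit together, the sketch below collects the defaults from the signature into a plain configuration dict and checks the one constraint the table states (term_matcher must be exact or simstring). The `build_config` helper is hypothetical and not part of EDS-NLP. The commented `nlp.add_pipe` call follows the usual spaCy factory pattern and is assumed, not confirmed by this page, to apply here.

```python
from typing import Any, Dict, List

# Defaults copied from the signature; label and terms have no default.
DEFAULTS: Dict[str, Any] = {
    "attr": "TEXT",
    "regex": None,
    "ignore_excluded": False,
    "ignore_space_tokens": False,
    "term_matcher": "exact",
    "term_matcher_config": {},
}


def build_config(label: str, terms: Dict[str, List[str]], **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    config = {**DEFAULTS, **overrides, "label": label, "terms": terms}
    if config["term_matcher"] not in ("exact", "simstring"):
        raise ValueError("term_matcher must be 'exact' or 'simstring'")
    # Copy the mutable default so callers never share the same dict.
    config["term_matcher_config"] = dict(config["term_matcher_config"])
    return config


config = build_config("disease", {"covid": ["covid19"]}, attr="LOWER")
print(config)

# With spaCy and EDS-NLP installed, a config like this would typically be
# passed to the factory (assumed usage):
#   nlp.add_pipe("eds.terminology", config=config)
```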
Source code in edsnlp/pipelines/core/terminology/factory.py