Matcher
EDS-NLP simplifies matching by exposing an `eds.matcher` component that can match on terms or regular expressions.
Examples
Let us redefine the pipeline:
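```python
import edsnlp

nlp = edsnlp.blank("eds")

# Terms: each key is an entity label, each value a string or a list of strings
terms = dict(
    covid=["coronavirus", "covid19"],
    patient="patient",
)

# Regular expressions, keyed by label in the same way
regex = dict(
    covid=r"coronavirus|covid[-\s]?19|sars[-\s]cov[-\s]2",
)

nlp.add_pipe(
    "eds.matcher",
    config=dict(
        terms=terms,
        regex=regex,
        attr="LOWER",  # match on the lowercased text
        term_matcher="exact",  # or "simstring"
        term_matcher_config={},
    ),
)
```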
This snippet is complete and should run as is.
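To see what the `covid` regular expression above accepts, the short check below applies it with Python's standard `re` module. It assumes, as `attr="LOWER"` suggests, that patterns are searched in the lowercased text; `covid_mentions` is a hypothetical helper, not part of EDS-NLP.

```python
import re
from typing import List

# The "covid" regex from the configuration above, copied verbatim.
COVID_REGEX = r"coronavirus|covid[-\s]?19|sars[-\s]cov[-\s]2"


def covid_mentions(text: str) -> List[str]:
    """Return every substring matched by COVID_REGEX.

    Assumption: with attr="LOWER", matching happens on the lowercased text,
    so the input is lowercased before searching.
    """
    return re.findall(COVID_REGEX, text.lower())


sample = "Patient testé positif au COVID-19 (SARS-CoV-2), suspicion de Covid 19."
print(covid_mentions(sample))  # → ['covid-19', 'sars-cov-2', 'covid 19']
```

Because the pattern is written in lowercase, running it on the raw text would miss `COVID-19`; matching on the lowercased attribute avoids spelling out every casing in the expression.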
Patterns, be they terms or regex, are defined as dictionaries whose keys become the labels of the extracted entities. Dictionary values are either a single expression or a list of expressions that match the concept.
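As a minimal sketch of that convention (assuming nothing about how EDS-NLP stores patterns internally), the hypothetical helpers `normalize_patterns` and `label_lookup` below turn every value into a list and map each expression back to the label it produces.

```python
from typing import Dict, List, Union

# A pattern value is either a single expression or a list of expressions.
Patterns = Dict[str, Union[str, List[str]]]


def normalize_patterns(patterns: Patterns) -> Dict[str, List[str]]:
    """Hypothetical helper: wrap single expressions into one-element lists.

    Keys are kept unchanged, since they become the entity labels.
    """
    return {
        label: [value] if isinstance(value, str) else list(value)
        for label, value in patterns.items()
    }


def label_lookup(patterns: Patterns) -> Dict[str, str]:
    """Hypothetical helper: map each expression to the label it yields."""
    lookup: Dict[str, str] = {}
    for label, expressions in normalize_patterns(patterns).items():
        for expression in expressions:
            lookup[expression] = label
    return lookup


terms = dict(
    covid=["coronavirus", "covid19"],
    patient="patient",
)

print(normalize_patterns(terms))
# → {'covid': ['coronavirus', 'covid19'], 'patient': ['patient']}
print(label_lookup(terms))
# → {'coronavirus': 'covid', 'covid19': 'covid', 'patient': 'patient'}
```

Both `"coronavirus"` and `"covid19"` end up under the same `covid` label, which is what lets several surface forms count as one concept.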
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | The pipeline object. |
| `name` | The name of the component. |
| `terms` | A dictionary of terms. |
| `regex` | A dictionary of regular expressions. |
| `attr` | The default attribute to use for matching. Can be overridden using the … |
| `ignore_excluded` | Whether to skip excluded tokens (requires an upstream pipeline to mark excluded tokens). |
| `ignore_space_tokens` | Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of … |
| `term_matcher` | The matcher to use for matching phrases. One of `exact` or `simstring`. |
| `term_matcher_config` | Parameters passed to the term matcher class. |
| `span_setter` | How to set the spans in the doc. |
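The effect of `attr` and `ignore_space_tokens` can be approximated with a toy, token-level exact matcher. This is a sketch under stated assumptions, not EDS-NLP's implementation: `tokenize` and `match_phrase` are hypothetical, the tokenizer keeps newlines as their own tokens (loosely mimicking spaCy space tokens), and only the `"TEXT"` and `"LOWER"` attributes are modeled.

```python
import re
from typing import List, Tuple

# Hypothetical toy tokenizer: words, punctuation marks and newlines become
# tokens; other whitespace (spaces, tabs) is dropped.
TOKEN_RE = re.compile(r"\n|\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def match_phrase(
    text: str,
    phrase: str,
    attr: str = "TEXT",
    ignore_space_tokens: bool = False,
) -> List[Tuple[int, int]]:
    """Return (start, end) token-index spans where `phrase` matches `text`.

    attr="LOWER" compares lowercased tokens; attr="TEXT" compares them as-is.
    With ignore_space_tokens=True, space tokens inside a candidate match are
    skipped instead of being compared.
    """
    normalize = str.lower if attr == "LOWER" else (lambda s: s)
    tokens = tokenize(text)
    target = [normalize(t) for t in tokenize(phrase)]

    matches = []
    for start in range(len(tokens)):
        i, j = start, 0
        while i < len(tokens) and j < len(target):
            token = tokens[i]
            if ignore_space_tokens and token.isspace() and i > start:
                i += 1  # skip the space token and keep matching
                continue
            if normalize(token) != target[j]:
                break
            i, j = i + 1, j + 1
        if j == len(target):
            matches.append((start, i))
    return matches


text = "Patient atteint de Covid\n19."
print(match_phrase(text, "covid 19"))  # → [] (case differs)
print(match_phrase(text, "covid 19", attr="LOWER"))  # → [] (newline in the way)
print(match_phrase(text, "covid 19", attr="LOWER", ignore_space_tokens=True))
# → [(3, 6)]
```

Skipping space tokens lets a multi-token term span a line break, which is also why a pattern can no longer match the newline itself once the option is on.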
Authors and citation
The eds.matcher pipeline was developed by AP-HP's Data Science team.