Matcher
EDS-NLP simplifies the matching process by exposing an eds.matcher component that can match on terms or regular expressions.
Examples
Let us define the pipeline:
```python
import edsnlp

nlp = edsnlp.blank("eds")

terms = dict(
    covid=["coronavirus", "covid19"],  # (1)
    patient="patient",  # (2)
)

regex = dict(
    covid=r"coronavirus|covid[-\s]?19|sars[-\s]cov[-\s]2",  # (3)
)

nlp.add_pipe(
    "eds.matcher",
    config=dict(
        terms=terms,
        regex=regex,
        attr="LOWER",
        term_matcher="exact",
        term_matcher_config={},
    ),
)
```
(1) Every key in the terms dictionary is mapped to a concept.
(2) The eds.matcher pipeline expects a list of expressions, or a single expression.
(3) We can also define regular expression patterns.
This snippet is complete, and should run as is.
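To make the behaviour of this configuration concrete, here is a toy sketch in plain Python that mimics it on a raw string: term and regex patterns are applied to the lowercased text (a rough analogue of attr="LOWER"), and each match is labelled with its dictionary key. The function toy_match is hypothetical and is not how eds.matcher is implemented (the real component operates on spaCy tokens); it only illustrates the expected labelling.

```python
import re
from typing import Dict, List, Tuple, Union

Patterns = Dict[str, Union[str, List[str]]]

# Same pattern dictionaries as in the pipeline example above.
TERMS: Patterns = dict(
    covid=["coronavirus", "covid19"],
    patient="patient",
)
REGEX: Patterns = dict(
    covid=r"coronavirus|covid[-\s]?19|sars[-\s]cov[-\s]2",
)


def _as_list(value: Union[str, List[str]]) -> List[str]:
    # A single expression is accepted as well as a list of expressions
    return [value] if isinstance(value, str) else list(value)


def toy_match(text: str, terms: Patterns, regex: Patterns) -> List[Tuple[int, int, str, str]]:
    """Hypothetical helper: return sorted (start, end, label, matched_text) tuples."""
    lowered = text.lower()  # rough analogue of attr="LOWER"
    found = set()
    for label, exprs in terms.items():
        for expr in _as_list(exprs):
            # Exact, whole-word match of the lowercased term
            pattern = r"\b" + re.escape(expr.lower()) + r"\b"
            for m in re.finditer(pattern, lowered):
                found.add((m.start(), m.end(), label))
    for label, exprs in regex.items():
        for expr in _as_list(exprs):
            for m in re.finditer(expr, lowered):
                found.add((m.start(), m.end(), label))
    # The set collapses spans matched by both a term and a regex;
    # the returned text keeps the original casing.
    return [(s, e, label, text[s:e]) for s, e, label in sorted(found)]


matches = toy_match("Le patient est atteint du Covid-19 et du coronavirus.", TERMS, REGEX)
for start, end, label, span in matches:
    print(label, repr(span))
```

Note that "Covid-19" is only caught by the regex, since the term "covid19" has no hyphen, while "coronavirus" is matched by both pattern types but reported once.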
Patterns, be they terms or regex, are defined as dictionaries whose keys become the labels of the extracted entities. Dictionary values are either a single expression or a list of expressions that match the concept.
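Because a dictionary value may be either a single expression or a list, code that builds such dictionaries programmatically may find it convenient to normalize them first. The helper below, normalize_patterns, is a hypothetical sketch and not part of the EDS-NLP API; it only shows that both accepted value shapes describe the same label-to-expressions mapping.

```python
from typing import Dict, List, Union

# A pattern value is either one expression or a list of expressions.
Patterns = Dict[str, Union[str, List[str]]]


def normalize_patterns(patterns: Patterns) -> Dict[str, List[str]]:
    """Hypothetical helper: map every label to a list of expressions."""
    normalized: Dict[str, List[str]] = {}
    for label, value in patterns.items():
        # Wrap a single expression so every label maps to a list
        normalized[label] = [value] if isinstance(value, str) else list(value)
    return normalized


print(normalize_patterns(dict(covid=["coronavirus", "covid19"], patient="patient")))
# {'covid': ['coronavirus', 'covid19'], 'patient': ['patient']}
```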
Parameters
| Parameter | Description |
|---|---|
| nlp | The pipeline object. |
| name | The name of the component. |
| terms | A dictionary of terms. |
| regex | A dictionary of regular expressions. |
| attr | The default attribute to use for matching. Can be overridden using the … |
| ignore_excluded | Whether to skip excluded tokens (requires an upstream pipeline to mark excluded tokens). |
| ignore_space_tokens | Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of … |
| term_matcher | The matcher to use for matching phrases. One of exact or simstring. |
| term_matcher_config | Parameters of the matcher class. |
| span_setter | How to set the spans in the doc. |
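The effect of ignore_space_tokens can be illustrated with a toy phrase matcher over a pre-tokenized list of strings. When space tokens are skipped, a two-word term can match across a line break, but a newline token can no longer be matched on its own, which is the trade-off described above. The function match_phrase and the token list are hypothetical; eds.matcher itself operates on spaCy documents rather than lists of strings.

```python
from typing import List, Tuple


def match_phrase(tokens: List[str], phrase: List[str], ignore_space_tokens: bool) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (start, end) token indices where `phrase` matches.

    `end` is exclusive. When ignore_space_tokens is True, whitespace-only
    tokens in the document are skipped while matching.
    """
    matches = []
    for start, tok in enumerate(tokens):
        if tok != phrase[0]:
            continue
        i, j = start, 0  # i walks the document, j walks the phrase
        while i < len(tokens) and j < len(phrase):
            if ignore_space_tokens and tokens[i].isspace():
                i += 1  # skip whitespace-only tokens such as "\n"
                continue
            if tokens[i] != phrase[j]:
                break
            i += 1
            j += 1
        if j == len(phrase):
            matches.append((start, i))
    return matches


tokens = ["insuffisance", "\n", "cardiaque", "aiguë"]
print(match_phrase(tokens, ["insuffisance", "cardiaque"], ignore_space_tokens=True))
print(match_phrase(tokens, ["\n"], ignore_space_tokens=True))
```

With skipping enabled the first call finds the term across the line break, while the second call cannot match the newline itself.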
Authors and citation
The eds.matcher pipeline was developed by AP-HP's Data Science team.