Matcher
EDS-NLP simplifies the matching process by exposing an eds.matcher pipeline that can match on terms or regular expressions.
Usage
Let us redefine the pipeline:
import spacy

nlp = spacy.blank("fr")

terms = dict(
    covid=["coronavirus", "covid19"],
    patient="patient",
)

regex = dict(
    covid=r"coronavirus|covid[-\s]?19|sars[-\s]cov[-\s]2",
)

nlp.add_pipe(
    "eds.matcher",
    config=dict(
        terms=terms,
        regex=regex,
        attr="LOWER",
        term_matcher="exact",
        term_matcher_config={},
    ),
)
This snippet is complete, and should run as is.
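To get a feel for what the covid regular expression above accepts, the following sketch tests it with Python's standard re module, outside of EDS-NLP. Lowercasing the input is only a rough stand-in for attr="LOWER", and the helper name and sample sentences are illustrative assumptions, not part of the library.

```python
import re
from typing import List

# Same expression as the `covid` entry of the `regex` dictionary above.
COVID_PATTERN = re.compile(r"coronavirus|covid[-\s]?19|sars[-\s]cov[-\s]2")


def find_covid_mentions(text: str) -> List[str]:
    """Return every substring of the lowercased text matched by the pattern."""
    # Hypothetical helper: lowercasing approximates matching on LOWER.
    return COVID_PATTERN.findall(text.lower())


samples = [
    "Patient admis pour Covid-19.",
    "Infection à SARS-CoV-2 confirmée.",
    "Test covid19 positif.",
    "Suspicion de sarscov2.",
]
for sample in samples:
    print(sample, "->", find_covid_mentions(sample))
```

The separators in the sars-cov-2 alternative are mandatory, so the last sample, written without any separator, yields no match, whereas the separator in covid-19 is optional.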
Configuration
The pipeline can be configured using the following parameters:
PARAMETER | DESCRIPTION |
---|---|
terms | A dictionary of terms. |
regex | A dictionary of regular expressions. |
attr | The default attribute to use for matching. Can be overridden using the |
ignore_excluded | Whether to skip excluded tokens (requires an upstream pipeline to mark excluded tokens). |
ignore_space_tokens | Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of |
term_matcher | The matcher to use for matching phrases. One of exact or simstring. |
term_matcher_config | Parameters of the matcher class. |
Patterns, be they terms or regex, are defined as dictionaries where keys become the label of the extracted entities. Dictionary values are either a single expression or a list of expressions that match the concept (see the usage example above).
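The dictionary convention can be sketched in plain Python as follows. This is not how eds.matcher is implemented: the Patterns alias and both helpers are hypothetical, and the lookup is a naive lowercase substring search rather than token-based matching. It only illustrates that each key acts as a label and that a value may be a single expression or a list.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical alias: a label mapped to one expression or a list of them.
Patterns = Dict[str, Union[str, List[str]]]


def normalize_patterns(patterns: Patterns) -> Dict[str, List[str]]:
    """Wrap single expressions in a list so every label maps to a list."""
    return {
        label: [expressions] if isinstance(expressions, str) else list(expressions)
        for label, expressions in patterns.items()
    }


def label_terms(text: str, patterns: Patterns) -> List[Tuple[str, str]]:
    """Return (label, term) pairs for each term found in the lowercased text."""
    lowered = text.lower()
    return [
        (label, term)
        for label, terms in normalize_patterns(patterns).items()
        for term in terms
        if term.lower() in lowered  # naive substring check, for illustration
    ]


terms = dict(covid=["coronavirus", "covid19"], patient="patient")
print(normalize_patterns(terms))
print(label_terms("Le patient est atteint du coronavirus.", terms))
```

Here the single string given for patient is treated exactly like a one-element list, and each hit is reported under the dictionary key it was declared with.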
Authors and citation
The eds.matcher pipeline was developed by AP-HP's Data Science team.