Reported Speech
The `eds.reported_speech` pipeline uses a simple rule-based algorithm to detect spans that relate to reported speech (e.g. when the doctor quotes the patient).
It was designed at AP-HP's EDS.
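This page does not spell out the pipeline's rules. To give a rough idea of what a rule-based detector of this kind can look like, the sketch below flags an entity as reported when a reporting-verb cue precedes it in the same sentence, or when it sits between quotation marks. The cue list, the regex tokenisation and the function name `is_reported` are assumptions made for this illustration. They are not the actual `eds.reported_speech` patterns or API.

```python
import re

# Hypothetical cue list: a few conjugated French reporting verbs.
# The real pipeline ships its own pre-defined pattern lists.
REPORTING_VERBS = {"dit", "nie", "rapporte", "déclare", "affirme"}


def is_reported(sentence: str, entity: str) -> bool:
    """Return True if `entity` looks like part of reported speech in `sentence`.

    Two toy rules (illustrative only):
    1. a reporting-verb cue appears before the entity in the sentence;
    2. the entity sits inside an open pair of quotation marks.
    """
    start = sentence.find(entity)
    if start == -1:
        raise ValueError(f"{entity!r} not found in sentence")

    # Rule 1: look for a reporting-verb cue among the preceding words.
    preceding_words = re.findall(r"\w+", sentence[:start].lower())
    if any(word in REPORTING_VERBS for word in preceding_words):
        return True

    # Rule 2: an odd number of quotation marks before the entity means
    # a quotation has been opened but not yet closed.
    quotes_before = sum(sentence[:start].count(q) for q in ('"', "«", "»"))
    return quotes_before % 2 == 1
```

On the sentences from the usage example, `is_reported("Il nie être alcoolisé.", "alcoolisé")` returns `True` because of the cue "nie", while "patient" in the first sentence is not flagged.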
Usage
The following snippet matches a simple terminology and checks whether the extracted entities are part of reported speech. It is complete and can be run as is.
```python
import spacy

# The eds.* components are provided by the edsnlp package, which must be installed
nlp = spacy.blank("fr")
nlp.add_pipe("eds.sentences")
# Dummy matcher: tags "patient" and "alcoolisé" as entities
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(patient="patient", alcool="alcoolisé")),
)
nlp.add_pipe("eds.reported_speech")

text = (
    "Le patient est admis aux urgences ce soir pour une douleur au bras. "
    "Il nie être alcoolisé."
)

doc = nlp(text)

doc.ents
# Out: (patient, alcoolisé)

doc.ents[0]._.reported_speech
# Out: False

doc.ents[1]._.reported_speech
# Out: True
```
Configuration
The pipeline can be configured using the following parameters:

| Parameter | Explanation | Default |
|---|---|---|
| `attr` | spaCy attribute to match on (e.g. `NORM`, `TEXT`, `LOWER`) | `"NORM"` |
| `pseudo` | Pseudo-reported speech patterns | `None` (use pre-defined patterns) |
| `preceding` | Preceding reported speech patterns | `None` (use pre-defined patterns) |
| `following` | Following reported speech patterns | `None` (use pre-defined patterns) |
| `termination` | Termination patterns (for syntagma/proposition extraction) | `None` (use pre-defined patterns) |
| `verbs` | Patterns for verbs that imply reported speech | `None` (use pre-defined patterns) |
| `on_ents_only` | Whether to qualify pre-extracted entities only | `True` |
| `within_ents` | Whether to look for reported speech within entities | `False` |
| `explain` | Whether to keep track of the cues for each entity | `False` |
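The `preceding` and `termination` parameters interact: in rule-based qualifiers of this kind, a preceding cue typically opens a scope that a termination pattern (or the end of the sentence) closes, which is what "syntagma/proposition extraction" refers to. The token-level sketch below illustrates that idea under stated assumptions. The cue and terminator lists and the function name `reported_scope` are hypothetical and do not reproduce the pipeline's defaults.

```python
from typing import List

# Hypothetical cue and terminator lists, chosen for illustration only.
PRECEDING_CUES = {"nie", "dit", "rapporte"}
TERMINATIONS = {"mais", "cependant", ",", ";"}


def reported_scope(tokens: List[str]) -> List[bool]:
    """Flag each token as inside (True) or outside (False) a reported scope.

    A preceding cue opens a scope covering the tokens that follow it,
    until a termination token (or the end of the list) closes it.
    The cue token itself is not flagged.
    """
    flags = []
    in_scope = False
    for token in tokens:
        lowered = token.lower()
        if lowered in TERMINATIONS:
            in_scope = False  # the proposition ends here
        flags.append(in_scope)
        if lowered in PRECEDING_CUES:
            in_scope = True  # tokens after the cue are reported
    return flags
```

With `["Il", "nie", "boire", ",", "mais", "fume"]`, only "boire" falls inside the scope opened by "nie"; the comma terminates it, so "fume" is treated as direct speech.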
Declared extensions
The `eds.reported_speech` pipeline declares two spaCy extensions, on both `Span` and `Token` objects:

- The `reported_speech` attribute is a boolean, set to `True` if the pipeline predicts that the span/token is reported.
- The `reported_speech_` property is a human-readable string, computed from the `reported_speech` attribute. It implements a simple getter function that outputs `DIRECT` or `REPORTED`, depending on the value of `reported_speech`.
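spaCy lets an extension be declared with a getter so that its value is computed on access rather than stored. The sketch below mimics the behaviour described for `reported_speech_` in plain Python, so it runs without spaCy. `QualifiedSpan` is a hypothetical stand-in for a spaCy `Span`, and `reported_speech_getter` is an illustrative name. In spaCy itself, a getter like this would be passed as `Span.set_extension(name, getter=...)`.

```python
from dataclasses import dataclass


def reported_speech_getter(span: "QualifiedSpan") -> str:
    """Map the boolean `reported_speech` flag to a human-readable label."""
    return "REPORTED" if span.reported_speech else "DIRECT"


@dataclass
class QualifiedSpan:
    """Hypothetical stand-in for a spaCy Span carrying the qualifier."""

    text: str
    reported_speech: bool = False

    @property
    def reported_speech_(self) -> str:
        # Computed on each access from the boolean attribute, like a spaCy getter.
        return reported_speech_getter(self)
```

Because the label is derived rather than stored, updating `reported_speech` is enough to keep `reported_speech_` consistent.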
Authors and citation
The `eds.reported_speech` pipeline was developed by AP-HP's Data Science team.