Reported Speech
The eds.reported_speech pipeline uses a simple rule-based algorithm to detect spans that belong to reported speech (e.g., when the doctor quotes the patient).
It was designed at AP-HP's EDS.
Usage
The following snippet matches a simple terminology and checks whether the extracted entities are part of reported speech. It is complete and can be run as is.
import spacy
nlp = spacy.blank("fr")
nlp.add_pipe("eds.sentences")
# Dummy matcher
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(patient="patient", alcool="alcoolisé")),
)
nlp.add_pipe("eds.reported_speech")
text = (
    "Le patient est admis aux urgences ce soir pour une douleur au bras. "
    "Il nie être alcoolisé."
)
doc = nlp(text)
doc.ents
# Out: (patient, alcoolisé)
doc.ents[0]._.reported_speech
# Out: False
doc.ents[1]._.reported_speech
# Out: True
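To build intuition for the result above, here is a deliberately simplified, standard-library sketch of cue-based detection. It is not the eds.reported_speech implementation: the verb list, the sentence splitter, and the helper names `split_sentences` and `is_reported` are all assumptions made for illustration. The only rule it applies is that a term counts as reported if a reporting verb precedes it in the same sentence.

```python
import re

# Hypothetical cue list, for illustration only (not the library's built-in patterns).
REPORTING_VERBS = re.compile(r"\b(nie|dit|déclare|rapporte|affirme)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_reported(text, term):
    """Return True if `term` appears after a reporting verb within one sentence."""
    for sentence in split_sentences(text):
        idx = sentence.lower().find(term.lower())
        if idx == -1:
            continue  # the term is not in this sentence
        if REPORTING_VERBS.search(sentence[:idx]):
            return True
    return False


text = (
    "Le patient est admis aux urgences ce soir pour une douleur au bras. "
    "Il nie être alcoolisé."
)
print(is_reported(text, "patient"))    # False
print(is_reported(text, "alcoolisé"))  # True
```

The real pipeline works on spaCy tokens and supports several cue types (see the configuration parameters), so this sketch should be read as an analogy rather than a description of its internals.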
Configuration
The pipeline can be configured using the following parameters:
| Parameter | Explanation | Default |
|---|---|---|
| attr | spaCy attribute to match on (e.g. NORM, TEXT, LOWER) | "NORM" |
| pseudo | Pseudo-reported speech patterns | None (use pre-defined patterns) |
| preceding | Preceding reported speech patterns | None (use pre-defined patterns) |
| following | Following reported speech patterns | None (use pre-defined patterns) |
| termination | Termination patterns (for syntagma/proposition extraction) | None (use pre-defined patterns) |
| verbs | Patterns for verbs that imply a reported speech | None (use pre-defined patterns) |
| on_ents_only | Whether to qualify pre-extracted entities only | True |
| within_ents | Whether to look for reported speech within entities | False |
| explain | Whether to keep track of the cues for each entity | False |
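As an illustration, the parameters can be passed through the `config` argument of `nlp.add_pipe`, in the same way the usage snippet configures eds.matcher. The helper `add_reported_speech` below is hypothetical and exists only to keep the sketch short; the parameter names and default values are taken from the table, and the pattern parameters are left unset so that the pre-defined patterns apply.

```python
def add_reported_speech(nlp, explain=False):
    """Add eds.reported_speech to an existing pipeline with explicit settings.

    `nlp` is expected to be a spaCy pipeline that already contains
    eds.sentences, as in the usage snippet. This helper is illustrative only.
    """
    config = dict(
        attr="NORM",        # match on the normalised text (default)
        on_ents_only=True,  # qualify pre-extracted entities only (default)
        within_ents=False,  # do not look for cues inside entities (default)
        explain=explain,    # keep track of the cues for each entity
    )
    nlp.add_pipe("eds.reported_speech", config=config)
    return config
```

Calling `add_reported_speech(nlp, explain=True)` in place of the plain `nlp.add_pipe("eds.reported_speech")` line would, under these assumptions, leave the detection unchanged and only ask the component to keep track of the cues.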
Declared extensions
The eds.reported_speech pipeline declares two spaCy extensions, on both Span and Token objects:
- The reported_speech attribute is a boolean, set to True if the pipeline predicts that the span/token is reported.
- The reported_speech_ property is a human-readable string, computed from the reported_speech attribute. It implements a simple getter function that outputs DIRECT or REPORTED, depending on the value of reported_speech.
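The relationship between the two extensions can be sketched in plain Python. The `FakeSpan` class below is a hypothetical stand-in for a spaCy Span, introduced only for this illustration; with the real pipeline the values are read through the `._` namespace, for example `doc.ents[1]._.reported_speech_`.

```python
from dataclasses import dataclass


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span, used only for this illustration."""

    text: str
    reported_speech: bool = False

    @property
    def reported_speech_(self) -> str:
        # Getter: derive the human-readable label from the boolean attribute.
        return "REPORTED" if self.reported_speech else "DIRECT"


spans = [FakeSpan("patient", False), FakeSpan("alcoolisé", True)]
for span in spans:
    print(span.text, span.reported_speech_)
# patient DIRECT
# alcoolisé REPORTED
```

Because the string is computed from the boolean each time it is read, the two values cannot drift apart.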
Authors and citation
The eds.reported_speech pipeline was developed by AP-HP's Data Science team.