Reported Speech
The `eds.reported_speech` component uses a simple rule-based algorithm to detect spans that relate to reported speech (e.g. when the doctor quotes the patient). It was designed at AP-HP's EDS.
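To make the idea of a rule-based detector concrete, here is a minimal, self-contained sketch. It is not the edsnlp implementation: the cue lists (`QUOTATION`, `VERBS`, `PRECEDING`, `FOLLOWING`) and the functions `contains_cue` and `is_reported` are hypothetical, loosely named after the cue types listed under Parameters, and the real component works on spaCy tokens with much richer French patterns. The sketch flags an entity as reported when, within its sentence, it sits between quotation marks, a reported-speech verb or "preceding" cue appears before it, or a "following" cue appears after it.

```python
import re
from typing import List

# Hypothetical, heavily simplified cue lists; the real component ships its own French patterns.
QUOTATION = r'["«»“”]'
VERBS = ["dit", "rapporte", "nie", "affirme", "déclare"]
PRECEDING = ["selon", "d'après"]
FOLLOWING = ["dit-il", "dit-elle"]


def contains_cue(text: str, cues: List[str]) -> bool:
    """Case-insensitive, whole-word search for any cue in `text`."""
    return any(re.search(rf"\b{re.escape(cue)}\b", text, re.IGNORECASE) for cue in cues)


def is_reported(sentence: str, start: int, end: int) -> bool:
    """Return True if the entity sentence[start:end] looks like reported speech."""
    before, after = sentence[:start], sentence[end:]
    # Rule 1: entity inside quotes (odd number of quote marks before it, at least one after it)
    quoted = (
        len(re.findall(QUOTATION, before)) % 2 == 1
        and re.search(QUOTATION, after) is not None
    )
    # Rule 2: a reported-speech verb or a "preceding" cue appears before the entity
    cue_before = contains_cue(before, VERBS + PRECEDING)
    # Rule 3: a "following" cue appears after the entity
    cue_after = contains_cue(after, FOLLOWING)
    return quoted or cue_before or cue_after


if __name__ == "__main__":
    sentence = "Il nie être alcoolisé."
    start = sentence.index("alcoolisé")
    print(is_reported(sentence, start, start + len("alcoolisé")))  # True
```

Restricting the search to the entity's sentence is what makes sentence segmentation a prerequisite in this sketch; a production detector would also need to handle negated verbs, nested quotes and cue scope more carefully.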
Examples
The following snippet matches a simple terminology and checks whether the extracted entities are part of reported speech. It is complete and can be run as is.
```python
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
# Dummy matcher
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(patient="patient", alcool="alcoolisé")),
)
nlp.add_pipe("eds.reported_speech")

text = (
    "Le patient est admis aux urgences ce soir pour une douleur au bras. "
    "Il nie être alcoolisé."
)

doc = nlp(text)
doc.ents
# Out: (patient, alcoolisé)

doc.ents[0]._.reported_speech
# Out: False

doc.ents[1]._.reported_speech
# Out: True
```
Extensions
The `eds.reported_speech` component declares two extensions, on both `Span` and `Token` objects:

- The `reported_speech` attribute is a boolean, set to `True` if the component predicts that the span/token is reported.
- The `reported_speech_` property is a human-readable string, computed from the `reported_speech` attribute. It implements a simple getter function that outputs `DIRECT` or `REPORTED`, depending on the value of `reported_speech`.
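The relation between the two extensions can be mimicked in plain Python. The `Entity` dataclass below is a hypothetical stand-in for a spaCy `Span` or `Token`: in the actual pipeline the values are read through the `._` namespace (e.g. `ent._.reported_speech_`), and in spaCy such computed properties are typically registered with `set_extension(name, getter=...)`. The sketch only shows how the string label can be derived from the boolean attribute.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for a spaCy Span/Token carrying the two extensions."""

    text: str
    # Boolean attribute, as the component would set it
    reported_speech: bool = False

    @property
    def reported_speech_(self) -> str:
        # Human-readable label derived from the boolean attribute
        return "REPORTED" if self.reported_speech else "DIRECT"


if __name__ == "__main__":
    for ent in [Entity("patient"), Entity("alcoolisé", reported_speech=True)]:
        print(ent.text, ent.reported_speech_)
    # patient DIRECT
    # alcoolisé REPORTED
```

Computing the label on the fly, rather than storing it, guarantees the two extensions can never disagree.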
Parameters
PARAMETER | DESCRIPTION |
---|---|
`nlp` | spaCy nlp pipeline to use for matching. |
`name` | The component name. |
`quotation` | String gathering all quotation cues. |
`verbs` | List of reported speech verbs. |
`following` | List of terms that follow reported speech. |
`preceding` | List of terms that precede reported speech. |
`attr` | spaCy attribute to use: either a string with the value `"TEXT"` or `"NORM"`, or a dict with the key `'term_attr'`, optionally with an additional key for each regex. |
`span_getter` | Which entities should be classified. By default, |
`on_ents_only` | Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks. |
`within_ents` | Whether to consider cues within entities. |
`explain` | Whether to keep track of cues for each entity. |
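The `within_ents` and `explain` flags are easiest to picture on character offsets. The sketch below is hypothetical and independent of edsnlp: cues and entities are plain `(start, end)` tuples, and the function `select_cues` only mimics the documented intent of the two flags (ignore cues lying inside the entity unless `within_ents` is true; also return the retained cues when `explain` is true).

```python
from typing import List, Tuple, Union

Offsets = Tuple[int, int]  # (start, end) character offsets, end exclusive


def _inside(cue: Offsets, ent: Offsets) -> bool:
    # True when the cue lies entirely within the entity
    return ent[0] <= cue[0] and cue[1] <= ent[1]


def select_cues(
    cues: List[Offsets],
    ent: Offsets,
    within_ents: bool = False,
    explain: bool = False,
) -> Union[bool, Tuple[bool, List[Offsets]]]:
    """Hypothetical illustration of the `within_ents` and `explain` flags."""
    # Drop cues inside the entity unless within_ents is set
    kept = [c for c in cues if within_ents or not _inside(c, ent)]
    is_reported = bool(kept)
    # With explain=True, also return the cues behind the decision
    return (is_reported, kept) if explain else is_reported


if __name__ == "__main__":
    # A cue at characters 3-6 lies inside the entity spanning characters 0-10
    print(select_cues([(3, 6)], (0, 10)))                    # False
    print(select_cues([(3, 6)], (0, 10), within_ents=True))  # True
```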
Authors and citation
The `eds.reported_speech` component was developed by AP-HP's Data Science team.