Reported Speech

The eds.reported_speech component uses a simple rule-based algorithm to detect spans that relate to reported speech (e.g. when the doctor quotes the patient). It was designed at AP-HP's EDS.

Examples

The following snippet matches a simple terminology and checks whether the extracted entities are part of reported speech. It is complete and can be run as is.

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
# Dummy matcher
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(patient="patient", alcool="alcoolisé")),
)
nlp.add_pipe("eds.reported_speech")

text = (
    "Le patient est admis aux urgences ce soir pour une douleur au bras. "
    "Il nie être alcoolisé."
)

doc = nlp(text)

doc.ents
# Out: (patient, alcoolisé)

doc.ents[0]._.reported_speech
# Out: False

doc.ents[1]._.reported_speech
# Out: True

Extensions

The eds.reported_speech component declares two extensions, on both Span and Token objects:

The reported_speech attribute is a boolean, set to True if the component predicts that the span/token is reported.
The reported_speech_ property is a human-readable string, computed from the reported_speech attribute. It implements a simple getter function that outputs DIRECT or REPORTED, depending on the value of reported_speech.
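
The relation between the two extensions can be sketched in plain Python. This is an illustration only, not the library's code: the function name `reported_speech_label` is hypothetical, and only the two output strings come from the description above.

```python
def reported_speech_label(reported_speech: bool) -> str:
    """Map the boolean attribute to its human-readable form.

    Hypothetical helper mirroring the described getter: it returns
    "REPORTED" when the flag is True and "DIRECT" otherwise.
    """
    return "REPORTED" if reported_speech else "DIRECT"


print(reported_speech_label(True))   # REPORTED
print(reported_speech_label(False))  # DIRECT
```

In the component itself, the string form is read from the reported_speech_ extension rather than computed by hand.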

Parameters

PARAMETER	DESCRIPTION
`nlp`	spaCy nlp pipeline to use for matching. TYPE: `PipelineProtocol`
`name`	The component name. TYPE: `Optional[str]` DEFAULT: `'eds.reported_speech'`
`quotation`	String gathering all quotation cues. TYPE: `str` DEFAULT: `None`
`verbs`	List of reported speech verbs. TYPE: `List[str]` DEFAULT: `None`
`following`	List of terms following a reported speech. TYPE: `List[str]` DEFAULT: `None`
`preceding`	List of terms preceding a reported speech. TYPE: `List[str]` DEFAULT: `None`
`attr`	spaCy attribute to use: either a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. A key can also be added for each regex. TYPE: `str` DEFAULT: `NORM`
`span_getter`	Which entities should be classified. By default, `doc.ents`. TYPE: `SpanGetterArg` DEFAULT: `None`
`on_ents_only`	Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks. If True, will look in all ents located in `doc.ents` only. If an iterable of strings is passed, will additionally look in `doc.spans[key]` for each key in the iterable. TYPE: `Union[bool, str, List[str], Set[str]]` DEFAULT: `None`
`within_ents`	Whether to consider cues within entities. TYPE: `bool` DEFAULT: `False`
`explain`	Whether to keep track of cues for each entity. TYPE: `bool` DEFAULT: `False`
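
To make the roles of the cue lists more concrete, here is a toy sketch in plain Python. It is not the component's implementation: the function `is_reported`, its signature, and the cue list used below are hypothetical. It assumes the simplest reading of the descriptions above, namely that a `verbs` or `preceding` cue found before the entity, or a `following` cue found after it, within the same sentence, marks the entity as reported. Quotation cues are left out, and verbs are given as surface forms rather than lemmas.

```python
import re
from typing import Iterable


def is_reported(
    sentence: str,
    start: int,
    end: int,
    verbs: Iterable[str] = (),
    preceding: Iterable[str] = (),
    following: Iterable[str] = (),
) -> bool:
    """Toy cue-based check (illustrative assumption, not the library's logic).

    `start` and `end` are character offsets of the entity in `sentence`.
    """
    before = sentence[:start].lower()
    after = sentence[end:].lower()

    def has_cue(cues: Iterable[str], context: str) -> bool:
        # Whole-word, case-insensitive search for any cue in the context.
        return any(
            re.search(rf"\b{re.escape(cue.lower())}\b", context) for cue in cues
        )

    # Verb and "preceding" cues are looked up before the entity,
    # "following" cues after it.
    cues_before = list(verbs) + list(preceding)
    return has_cue(cues_before, before) or has_cue(following, after)


sentence = "Il nie être alcoolisé."
start = sentence.index("alcoolisé")
end = start + len("alcoolisé")
print(is_reported(sentence, start, end, verbs=["nie"]))  # True

other = "Le patient est admis aux urgences ce soir pour une douleur au bras."
start = other.index("patient")
end = start + len("patient")
print(is_reported(other, start, end, verbs=["nie"]))  # False
```

The two sentences are the ones used in the example at the top of this page, and the toy check reproduces the same True/False pattern for "alcoolisé" and "patient".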

Authors and citation

The eds.reported_speech component was developed by AP-HP's Data Science team.