Reported Speech

The eds.reported_speech component uses a simple rule-based algorithm to detect spans that belong to reported speech (e.g., when the doctor quotes the patient). It was designed at AP-HP's EDS.

Examples

The following snippet matches a simple terminology and checks whether each extracted entity is part of reported speech. It is complete and can be run as is.

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
# Dummy matcher
nlp.add_pipe(eds.matcher(terms=dict(patient="patient", alcool="alcoolisé")))
nlp.add_pipe(eds.reported_speech())

text = (
    "Le patient est admis aux urgences ce soir pour une douleur au bras. "
    "Il nie être alcoolisé."
)

doc = nlp(text)

doc.ents
# Out: (patient, alcoolisé)

doc.ents[0]._.reported_speech
# Out: False

doc.ents[1]._.reported_speech
# Out: True
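
To give an intuition for why the second entity is flagged, here is a minimal, standard-library sketch of one possible cue-based rule: a term counts as reported if a reporting verb appears before it in the same sentence. This is a deliberate simplification for illustration, not the component's actual implementation. The verb list, the naive sentence splitter and the function names are all assumptions.

```python
import re

# Hypothetical cue list, for illustration only (not the component's default verbs).
REPORTING_VERBS = ("nie", "dit", "rapporte", "affirme")


def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    bounds = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        bounds.append((start, match.end()))
        start = match.end()
    if start < len(text):
        bounds.append((start, len(text)))
    return bounds


def is_reported(text, term):
    """Return True if a reporting verb precedes `term` within its sentence."""
    idx = text.find(term)
    if idx == -1:
        raise ValueError(f"{term!r} not found in text")
    for start, end in split_sentences(text):
        if start <= idx < end:
            # Only look at the part of the sentence that comes before the term.
            prefix = text[start:idx].lower()
            return any(
                re.search(rf"\b{re.escape(verb)}\b", prefix)
                for verb in REPORTING_VERBS
            )
    return False


text = (
    "Le patient est admis aux urgences ce soir pour une douleur au bras. "
    "Il nie être alcoolisé."
)
print(is_reported(text, "patient"))    # False
print(is_reported(text, "alcoolisé"))  # True
```

The real component additionally handles quotation cues as well as preceding and following terms, as described in the parameters below.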

Extensions

The eds.reported_speech component declares two extensions, on both Span and Token objects:

  1. The reported_speech attribute is a boolean, set to True if the component predicts that the span/token is reported.
  2. The reported_speech_ property is a human-readable string, computed from the reported_speech attribute. It implements a simple getter function that outputs DIRECT or REPORTED, depending on the value of reported_speech.
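
The relationship between the two extensions can be pictured as a plain getter function. The sketch below is a standalone illustration of the mapping described above, not the library's code; the function name is hypothetical.

```python
def reported_speech_label(reported_speech: bool) -> str:
    """Map the boolean attribute to its human-readable form."""
    return "REPORTED" if reported_speech else "DIRECT"


print(reported_speech_label(True))   # REPORTED
print(reported_speech_label(False))  # DIRECT
```

Because the string is computed from the boolean on access, the two values cannot drift apart.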

Parameters

PARAMETER DESCRIPTION
nlp

spaCy nlp pipeline to use for matching.

TYPE: PipelineProtocol

name

The component name.

TYPE: Optional[str]

quotation

String gathering all quotation cues.

TYPE: str DEFAULT: None

verbs

List of reported speech verbs.

TYPE: List[str] DEFAULT: None

following

List of terms following a reported speech.

TYPE: List[str] DEFAULT: None

preceding

List of terms preceding a reported speech.

TYPE: List[str] DEFAULT: None

attr

spaCy attribute to match on: either a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. With a dict, a key can also be added for each regex.

TYPE: str DEFAULT: NORM

span_getter

Which entities should be classified. Defaults to doc.ents.

TYPE: SpanGetterArg DEFAULT: None

on_ents_only

Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.

  • If True, only the entities in doc.ents are considered
  • If an iterable of strings is passed, doc.spans[key] is also searched for each key in the iterable

TYPE: Union[bool, str, List[str], Set[str]] DEFAULT: None

within_ents

Whether to consider cues within entities.

TYPE: bool DEFAULT: False

explain

Whether to keep track of cues for each entity.

TYPE: bool DEFAULT: False

Authors and citation

The eds.reported_speech component was developed by AP-HP's Data Science team.