
Aggregating results

Rationale

In some cases, you are not interested in individual extractions but rather in document-level aggregated variables. For instance, you may want to know whether a patient is diabetic without caring about the actual mentions of diabetes. Here, we propose a simple and generic rule that works by:

  • Extracting entities with the methods of your choice
  • Qualifying those entities and discarding the irrelevant ones (e.g. negated, hypothetical, or family-related mentions)
  • Setting a threshold on the minimum number of remaining entities required for the variable to be considered present in the document.
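The three steps above can be sketched independently of any NLP framework. This is a minimal sketch that assumes extraction (step 1) has already happened and produced records carrying boolean qualifier flags; the `Mention` class and the `aggregate` function are hypothetical names for illustration, not part of the library.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical stand-in for an extracted, qualified entity."""
    label: str
    negated: bool = False
    hypothetical: bool = False
    family: bool = False


def aggregate(mentions, min_count=2):
    """Map each label to True if enough qualified mentions remain."""
    labels = {m.label for m in mentions}
    # Step 2: discard negated, hypothetical and family-related mentions
    kept = Counter(
        m.label
        for m in mentions
        if not (m.negated or m.hypothetical or m.family)
    )
    # Step 3: a label is retained only if it reaches the threshold
    # (Counter returns 0 for labels whose mentions were all discarded)
    return {label: kept[label] >= min_count for label in labels}


# Step 1 (extraction) is assumed to have produced these mentions
mentions = [
    Mention("diabetes"),
    Mention("diabetes"),
    Mention("diabetes", negated=True),
    Mention("copd"),
    Mention("copd", family=True),
]
result = aggregate(mentions)
```

Here `diabetes` keeps two affirmed mentions and reaches the default threshold of 2, whereas `copd` keeps only one, so `result` maps `diabetes` to `True` and `copd` to `False`.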

An example for the disorders pipelines

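Below is a simple implementation of this aggregation rule for diabetes. It assumes `doc` has already been processed by a pipeline that extracts diabetes mentions and applies the negation, hypothesis and family qualifiers (it can be adapted to other comorbidity components and other qualification methods):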

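```python
from operator import itemgetter

from spacy.tokens import Doc

MIN_NUMBER_ENTITIES = 2  # minimum number of qualified mentions required

# Register the document-level extension once. A mutable default such as {}
# would be shared by every Doc, so default to None and create a dict per doc.
if not Doc.has_extension("aggregated"):
    Doc.set_extension("aggregated", default=None)

if doc._.aggregated is None:
    doc._.aggregated = {}

spans = doc.spans["diabetes"]

# Keep only affirmed mentions that concern the patient
kept_spans = [
    (span, span._.status, span._.detailed_status)
    for span in spans
    if not any([span._.negation, span._.hypothesis, span._.family])
]

if len(kept_spans) < MIN_NUMBER_ENTITIES:
    status = "ABSENT"
else:
    # Detailed status of the mention with the highest status value
    status = max(kept_spans, key=itemgetter(1))[2]

doc._.aggregated["diabetes"] = status
```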
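Since the rule can be adapted to other comorbidity components, one option is to wrap the snippet above in a function and apply it to several span groups. The sketch below assumes each component writes its spans to `doc.spans[label]`, as `diabetes` does above; `aggregate_status`, `aggregate_doc`, `mock_span` and the labels `ckd` and `copd` are hypothetical. So that it runs without a pipeline, it uses `SimpleNamespace` mocks that expose the same `span._.status`, `span._.detailed_status` and qualifier attributes as the snippet.

```python
from operator import itemgetter
from types import SimpleNamespace


def aggregate_status(spans, min_number_entities=2):
    """Apply the aggregation rule to one group of spans."""
    kept = [
        (span._.status, span._.detailed_status)
        for span in spans
        if not (span._.negation or span._.hypothesis or span._.family)
    ]
    if len(kept) < min_number_entities:
        return "ABSENT"
    # Detailed status of the mention with the highest status value
    return max(kept, key=itemgetter(0))[1]


def aggregate_doc(doc, labels, min_number_entities=2):
    """Hypothetical helper: aggregate several span groups at once."""
    return {
        label: aggregate_status(
            doc.spans[label] if label in doc.spans else [],
            min_number_entities,
        )
        for label in labels
    }


def mock_span(status, detailed_status, negation=False, hypothesis=False, family=False):
    """Mock exposing the same `span._.*` attributes as the snippet."""
    return SimpleNamespace(
        _=SimpleNamespace(
            status=status,
            detailed_status=detailed_status,
            negation=negation,
            hypothesis=hypothesis,
            family=family,
        )
    )


# Illustrative status labels, not necessarily the values the library emits
doc = SimpleNamespace(
    spans={
        "diabetes": [
            mock_span(1, "PRESENT"),
            mock_span(2, "WITH_COMPLICATION"),
            mock_span(2, "WITH_COMPLICATION", negation=True),
        ],
        "ckd": [mock_span(1, "PRESENT")],
    }
)
result = aggregate_doc(doc, ["diabetes", "ckd", "copd"])
```

Here `diabetes` keeps two mentions (the negated one is dropped) and reports the detailed status of the highest-status mention, while `ckd` (one mention) and `copd` (no span group) fall back to `"ABSENT"`. With a real `Doc`, `doc.spans` supports the same membership test and indexing, and the returned dictionary could then be stored on the document, e.g. in `doc._.aggregated`.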