Aggregating results
Rationale
In some cases, you are not interested in individual extractions but in document-level aggregated variables. For instance, you may want to know whether a patient is diabetic without caring about the individual mentions of diabetes. Here, we propose a simple and generic rule that works by:
- Extracting entities via methods of your choice
- Qualifying those entities and discarding the irrelevant ones (e.g. negated or hypothetical mentions)
- Setting a threshold on the minimal number of remaining entities that must be present in the document for them to be aggregated.
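The three steps above can be sketched in plain Python, independently of any NLP library. This is only an illustrative sketch: the `is_present` helper, the `keep` predicate, and the toy `(text, negated)` tuples are assumptions made for the example and are not part of the library.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def is_present(
    entities: Iterable[T],
    keep: Callable[[T], bool],
    min_entities: int = 2,
) -> bool:
    """Return True if enough entities survive the qualification filter."""
    # Step 2: discard entities rejected by the qualification predicate
    kept = [entity for entity in entities if keep(entity)]
    # Step 3: apply the threshold on the number of remaining entities
    return len(kept) >= min_entities


# Step 1 (extraction) is simulated with hypothetical (text, negated) pairs
mentions = [("diabetes", False), ("diabetic", False), ("no diabetes", True)]
flag = is_present(mentions, keep=lambda mention: not mention[1])
```

Here two non-negated mentions remain, so the document-level flag is raised with the default threshold of 2.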
An example for the disorders pipes
Below is a simple implementation of this aggregation rule (this can be adapted for other comorbidity components and other qualification methods):
from operator import itemgetter
from spacy.tokens import Doc
MIN_NUMBER_ENTITIES = 2  # (1)!
if not Doc.has_extension("aggregated"):
    Doc.set_extension("aggregated", default={})  # (2)!
# `doc` is assumed to be a document already processed by the pipeline
spans = doc.spans["diabetes"]  # (3)!
kept_spans = [
    (span, span._.status, span._.detailed_status)
    for span in spans
    if not any([span._.negation, span._.hypothesis, span._.family])
]  # (4)!
if len(kept_spans) < MIN_NUMBER_ENTITIES:  # (5)!
    status = "ABSENT"
else:
    # Keep the detailed status of the span with the highest severity
    status = max(kept_spans, key=itemgetter(1))[2]  # (6)!
doc._.aggregated["diabetes"] = status
- (1) We want at least 2 correct entities
- (2) Storing the status in the doc._.aggregated dictionary
- (3) Getting the spans extracted by the diabetes component
- (4) Disregarding entities which are either negated, hypothetical, or not about the patient themselves
- (5) Setting the status to "ABSENT" if fewer than 2 relevant entities are left
- (6) Getting the maximum severity status
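To reuse the rule across components, it may help to wrap it in a function. The sketch below makes this concrete without requiring a loaded pipeline: `Mention` is a hypothetical stand-in for a qualified span, `aggregate_status` is a name chosen for this example, and the status labels are illustrative placeholders rather than guaranteed library values.

```python
from dataclasses import dataclass
from operator import itemgetter
from typing import List


@dataclass
class Mention:
    """Hypothetical stand-in for a qualified span (not a library class)."""

    status: int  # numeric severity, higher means more severe
    detailed_status: str  # human-readable label for that severity
    negation: bool = False
    hypothesis: bool = False
    family: bool = False


def aggregate_status(mentions: List[Mention], min_entities: int = 2) -> str:
    """Aggregate mentions into a single document-level status."""
    # Discard negated, hypothetical, or family-related mentions
    kept = [
        (m.status, m.detailed_status)
        for m in mentions
        if not any([m.negation, m.hypothesis, m.family])
    ]
    if len(kept) < min_entities:
        return "ABSENT"
    # Return the label of the most severe remaining mention
    return max(kept, key=itemgetter(0))[1]


mentions = [
    Mention(1, "WITHOUT_COMPLICATION"),
    Mention(2, "WITH_COMPLICATION"),
    Mention(2, "WITH_COMPLICATION", negation=True),
]
result = aggregate_status(mentions)
```

With real documents, the same function body could read the attributes from `span._` instead of from the stand-in class.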