Measurements
The eds.measurements pipeline detects and normalizes numerical measurements in medical documents.
We use simple regular expressions to extract and normalize measurements, and use Measurement classes to store them.
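The sketch below illustrates the regex-based approach. It pulls (value, unit) pairs out of a sentence with a single pattern. The pattern, its short unit list and the extract_measurements function are hypothetical and much narrower than the pipeline's own patterns. The sketch only shows the general idea: match a number (with "." or "," as decimal separator) followed by a unit, then normalize the decimal separator.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a number ("." or "," as decimal separator), optional
# whitespace, then one of a few example units. The real patterns cover far more.
MEASUREMENT_RE = re.compile(
    r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>mm|cm|m|kg|g|ml)\b"
)


def extract_measurements(text: str) -> List[Tuple[float, str]]:
    """Return (value, unit) pairs, with "," normalized to "." in values."""
    results = []
    for match in MEASUREMENT_RE.finditer(text):
        value = float(match.group("value").replace(",", "."))
        results.append((value, match.group("unit")))
    return results


print(extract_measurements("Nodule de 2,4mm, patient de 76 kg."))
# [(2.4, 'mm'), (76.0, 'kg')]
```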
Warning
The measurements pipeline is still in active development and has not been rigorously validated.
If you come across a measurement expression that goes undetected, please file an issue!
Scope
The eds.measurements pipeline can extract simple measurements (e.g. 3cm).
It can also detect elliptic enumerations of measurements of the same type (e.g. 32, 33 et 34kg, where only the last value carries the unit) and split them into individual measurements.
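Here is a minimal sketch of how an elliptic enumeration could be split. It assumes the numbers are separated by commas or "et" and that only the last one carries the unit. The ENUMERATION_RE pattern and the split_enumeration function are illustrative, not the pipeline's implementation. Note that "1,2" is read as the decimal 1.2, which matches the 1,2 to 1.2 mm normalization in the usage example.

```python
import re
from typing import List, Tuple

NUMBER = r"\d+(?:[.,]\d+)?"
# Hypothetical pattern: at least two numbers separated by "," or "et",
# followed by a unit that applies to all of them.
ENUMERATION_RE = re.compile(
    rf"(?P<values>{NUMBER}(?:\s*(?:,|et)\s*{NUMBER})+)\s*(?P<unit>mm|cm|kg)\b"
)


def split_enumeration(text: str) -> List[Tuple[float, str]]:
    """Split "32, 33 et 34kg" into one (value, unit) pair per number."""
    results = []
    for match in ENUMERATION_RE.finditer(text):
        unit = match.group("unit")
        # Every number in the enumeration inherits the trailing unit
        for raw in re.findall(NUMBER, match.group("values")):
            results.append((float(raw.replace(",", ".")), unit))
    return results


print(split_enumeration("Poids successifs: 32, 33 et 34kg"))
# [(32.0, 'kg'), (33.0, 'kg'), (34.0, 'kg')]
```

A single measurement such as "76kg" is deliberately not matched here: the pattern requires at least one separator.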
The normalized value can then be accessed via the span._.value attribute and converted on the fly to a desired unit.
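One way to picture on-the-fly conversion is a small value class that resolves unknown attributes as target units. In this sketch, ToyMeasurement and the UNITS table (factors relative to millimetres and grams) are hypothetical. The pipeline's Measurement classes and unit handling may work differently.

```python
from dataclasses import dataclass

# Hypothetical unit table: dimension and factor relative to a base unit
# (millimetres for lengths, grams for weights).
UNITS = {
    "mm": ("length", 1.0),
    "cm": ("length", 10.0),
    "m": ("length", 1000.0),
    "g": ("weight", 1.0),
    "kg": ("weight", 1000.0),
}


@dataclass(frozen=True)
class ToyMeasurement:
    value: float
    unit: str

    def __getattr__(self, name: str) -> float:
        # Only called when normal lookup fails, e.g. for `height.cm`
        if name not in UNITS:
            raise AttributeError(name)
        dim_from, factor_from = UNITS[self.unit]
        dim_to, factor_to = UNITS[name]
        if dim_from != dim_to:
            raise AttributeError(f"cannot convert {self.unit} to {name}")
        return self.value * factor_from / factor_to

    def __str__(self) -> str:
        return f"{self.value} {self.unit}"


height = ToyMeasurement(1.78, "m")
print(str(height), height.cm)  # 1.78 m 178.0
```

Raising AttributeError for incompatible units (e.g. height.kg) keeps hasattr() and getattr() defaults working as usual.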
The current pipeline annotates the following measurements out of the box:
| Measurement name | Example |
|---|---|
| eds.size | 1m50, 1.50m |
| eds.weight | 12kg, 1kg300 |
| eds.bmi | BMI: 24, 24 kg.m-2 |
| eds.volume | 2 cac, 8ml |
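In the "1m50" and "1kg300" forms, the unit sits between the integer part and the following digits. The sketch below shows one way such tokens might be normalized. It assumes the digits after the unit are the decimal part, so "1m78" becomes 1.78 m, as in the usage example. parse_composite is a hypothetical helper, and other readings are possible (e.g. "1m5" as 1 m 5 cm).

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for composite notations: integer part, unit, then
# more digits, as in "1m78" or "1kg300".
COMPOSITE_RE = re.compile(r"(?P<major>\d+)(?P<unit>m|kg)(?P<minor>\d+)")


def parse_composite(token: str) -> Optional[Tuple[float, str]]:
    match = COMPOSITE_RE.fullmatch(token)
    if match is None:
        return None
    # Assumption: the digits after the unit are the decimal part,
    # so "1m78" -> 1.78 m and "1kg300" -> 1.300 kg.
    value = float(f"{match['major']}.{match['minor']}")
    return value, match["unit"]


for token in ["1m78", "1kg300", "12kg"]:
    print(token, parse_composite(token))
# 1m78 (1.78, 'm')
# 1kg300 (1.3, 'kg')
# 12kg None
```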
Usage
```python
import spacy

nlp = spacy.blank("eds")
nlp.add_pipe(
    "eds.measurements",
    config=dict(
        # Measurement types to detect
        measurements=["eds.size", "eds.weight", "eds.bmi"],
        # Also extract ranges such as "entre 1 et 1.5 cm"
        extract_ranges=True,
    ),
)

text = """
Le patient est admis hier, fait 1m78 pour 76kg.
Les deux nodules bénins sont larges de 1,2 et 2.4mm.
BMI: 24.
Le nodule fait entre 1 et 1.5 cm
"""

doc = nlp(text)

measurements = doc.spans["measurements"]

measurements
# Out: [1m78, 76kg, 1,2, 2.4mm, 24, entre 1 et 1.5 cm]

measurements[0]
# Out: 1m78

str(measurements[0]._.value)
# Out: '1.78 m'

measurements[0]._.value.cm
# Out: 178.0

measurements[2]
# Out: 1,2

str(measurements[2]._.value)
# Out: '1.2 mm'

measurements[2]._.value.mm
# Out: 1.2

measurements[4]
# Out: 24

str(measurements[4]._.value)
# Out: '24 kg_per_m2'

measurements[4]._.value.kg_per_m2
# Out: 24

str(measurements[5]._.value)
# Out: '1-1.5 cm'
```
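With extract_ranges=True, "entre 1 et 1.5 cm" comes back as a single range measurement. The sketch below shows how such a range could be matched and rendered as "1-1.5 cm". It only handles the "entre X et Y unit" form. RANGE_RE, extract_range and format_range are hypothetical names, not part of the pipeline.

```python
import re
from typing import Optional, Tuple

NUMBER = r"\d+(?:[.,]\d+)?"
# Hypothetical pattern covering only the "entre X et Y unit" form
RANGE_RE = re.compile(
    rf"\bentre\s+(?P<low>{NUMBER})\s+et\s+(?P<high>{NUMBER})\s*(?P<unit>mm|cm|m)\b"
)


def extract_range(text: str) -> Optional[Tuple[float, float, str]]:
    match = RANGE_RE.search(text)
    if match is None:
        return None
    low = float(match["low"].replace(",", "."))
    high = float(match["high"].replace(",", "."))
    return low, high, match["unit"]


def format_range(low: float, high: float, unit: str) -> str:
    # ":g" drops a trailing ".0", so 1.0 is rendered as "1"
    return f"{low:g}-{high:g} {unit}"


bounds = extract_range("Le nodule fait entre 1 et 1.5 cm")
print(bounds, format_range(*bounds))  # (1.0, 1.5, 'cm') 1-1.5 cm
```

Requiring the word "entre" keeps the sketch from confusing a range with an elliptic enumeration such as "de 1,2 et 2.4mm".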
To extract all sizes in centimeters, averaging the two bounds of range measurements, you can use the following snippet:
```python
# Average the values of each size measurement (a single value averages to itself)
sizes = [
    sum(item.cm for item in m._.value) / len(m._.value)
    for m in doc.spans["measurements"]
    if m.label_ == "eds.size"
]

sizes
# Out: [178.0, 0.12, 0.24, 1.25]
```
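The averaging snippet handles both kinds of values because it only iterates over m._.value and calls len() on it. The sketch below mimics that protocol with two hypothetical stand-in classes, Simple and Range. It assumes a simple value yields itself once and a range yields its two bounds. It is not the pipeline's Measurement implementation.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Simple:
    """Hypothetical stand-in for a single value already expressed in cm."""
    cm: float

    def __iter__(self) -> Iterator["Simple"]:
        yield self  # a single value behaves like a one-element collection

    def __len__(self) -> int:
        return 1


@dataclass
class Range:
    """Hypothetical stand-in for a range: iterating yields both bounds."""
    low: Simple
    high: Simple

    def __iter__(self) -> Iterator[Simple]:
        yield self.low
        yield self.high

    def __len__(self) -> int:
        return 2


def average_cm(value) -> float:
    # Same expression as in the averaging snippet
    return sum(item.cm for item in value) / len(value)


print([average_cm(v) for v in [Simple(178.0), Range(Simple(1.0), Simple(1.5))]])
# [178.0, 1.25]
```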
Custom measurement
You can declare custom measurements by passing a dictionary of measurement definitions, instead of a list of names, to the measurements parameter:
```python
import spacy

nlp = spacy.blank("eds")
nlp.add_pipe(
    "eds.measurements",
    config=dict(
        measurements={
            # This name will be used to define the labels of the matched entities
            "my_custom_surface_measurement": {
                # The reference unit of this measurement is square meters
                "unit": "m2",
                # To handle cases like "surface: 1.8" (implied m2), we can use
                # unitless patterns
                "unitless_patterns": [
                    {
                        "terms": ["surface", "aire"],
                        "ranges": [
                            {
                                "unit": "m2",
                                "min": 0,
                                "max": 9,
                            }
                        ],
                    }
                ],
            },
        }
    ),
)
```
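Here is a rough sketch of how the unitless pattern above could be interpreted. It assumes a bare number that follows one of the terms receives the range's unit when min <= value <= max. Treating the bounds as inclusive is also an assumption. match_unitless is hypothetical: it does not look for explicit units after the number, and it only checks the first occurrence of the terms for each pattern.

```python
import re
from typing import Optional, Tuple

# Mirrors the configuration above
CUSTOM = {
    "unit": "m2",
    "unitless_patterns": [
        {
            "terms": ["surface", "aire"],
            "ranges": [{"unit": "m2", "min": 0, "max": 9}],
        }
    ],
}


def match_unitless(text: str, config: dict) -> Optional[Tuple[float, str]]:
    """Hypothetical interpretation of unitless patterns (not the pipeline's code)."""
    for pattern in config["unitless_patterns"]:
        terms = "|".join(re.escape(term) for term in pattern["terms"])
        # A term, an optional colon, then a number that is not cut mid-way
        regex = rf"\b(?:{terms})\s*:?\s*(?P<value>\d+(?:[.,]\d+)?)(?!\d|[.,]\d)"
        match = re.search(regex, text, flags=re.IGNORECASE)
        if match is None:
            continue
        value = float(match["value"].replace(",", "."))
        # Assumption: bounds are inclusive; the first matching range wins
        for value_range in pattern["ranges"]:
            if value_range["min"] <= value <= value_range["max"]:
                return value, value_range["unit"]
    return None


print(match_unitless("Surface: 1.8", CUSTOM))  # (1.8, 'm2')
print(match_unitless("Aire: 12", CUSTOM))  # None (outside the 0-9 range)
```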
Declared extensions
The eds.measurements pipeline declares a single spaCy extension on the Span object: the value attribute, which holds a Measurement instance.
Configuration
The pipeline can be configured using the following parameters:
Authors and citation
The eds.measurements pipeline was developed by AP-HP's Data Science team.