Measurements
The `eds.measurements` matcher detects and normalizes numerical measurements within a medical document.
Warning
The `measurements` pipeline is still in active development and has not been rigorously validated. If you come across a measurement expression that goes undetected, please file an issue!
Scope
The `eds.measurements` matcher can extract simple measurements (e.g. `3cm`). It can also detect elliptic enumerations of measurements of the same type (e.g. `32, 33 et 34kg`) and split them into individual measurements.
The normalized value can then be accessed via the `span._.{measure_name}` attribute, for instance `span._.size` or `span._.weight`, and be converted on the fly to a desired unit. As with other components, the `span._.value` extension can also be used to access the normalized value of any measurement span.
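To make the idea of splitting an elliptic enumeration concrete, here is a minimal standalone sketch in plain Python. It is not how `eds.measurements` is implemented: the `ENUMERATION` regex, the short unit list and the `split_enumeration` helper are hypothetical names introduced only for illustration.

```python
import re

# Illustrative assumption: a number is a run of digits with an optional
# decimal part written with "." or ",".
NUMBER = r"\d+(?:[.,]\d+)?"

# Illustrative assumption: numbers are joined by "," or "et" and share
# one trailing unit taken from a small hard-coded list.
ENUMERATION = re.compile(
    rf"((?:{NUMBER}\s*(?:,|et)\s*)+{NUMBER})\s*(kg|cm|mm|m)\b"
)


def split_enumeration(text):
    """Return one (value, unit) pair per number of an elliptic enumeration."""
    match = ENUMERATION.search(text)
    if match is None:
        return []
    numbers, unit = match.group(1), match.group(2)
    # Every number in the enumeration inherits the single trailing unit.
    return [
        (float(number.replace(",", ".")), unit)
        for number in re.findall(NUMBER, numbers)
    ]


print(split_enumeration("32, 33 et 34kg"))
```

The unit is written once but applies to every number, which is why the helper returns three separate `kg` values for the example above.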
The current matcher annotates the following measurements out of the box:
Measurement name | Example |
---|---|
size | 1m50 , 1.50m |
weight | 12kg , 1kg300 |
bmi | BMI: 24 , 24 kg.m-2 |
volume | 2 cac , 8ml |
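Composite forms such as `1m50` or `1kg300` place the unit between two groups of digits. The following plain-Python sketch shows one way such expressions could be normalized to a single value. It assumes that the trailing digits are centimeters after `m` and grams after `kg`; the `SUBUNITS_PER_UNIT` table and the `normalize_composite` helper are hypothetical and are not part of the library.

```python
import re

# Illustrative assumption: how many trailing sub-units make up one main unit
# (100 cm in 1 m, 1000 g in 1 kg).
SUBUNITS_PER_UNIT = {"m": 100, "kg": 1000}

COMPOSITE = re.compile(r"(\d+)\s*(kg|m)\s*(\d+)")


def normalize_composite(text):
    """Turn '1m50' into (1.5, 'm') and '1kg300' into (1.3, 'kg')."""
    match = COMPOSITE.fullmatch(text.strip())
    if match is None:
        return None  # not a composite expression
    whole, unit, rest = match.groups()
    value = int(whole) + int(rest) / SUBUNITS_PER_UNIT[unit]
    return round(value, 3), unit


print(normalize_composite("1m50"))
print(normalize_composite("1kg300"))
```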
Examples
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe(
    "eds.measurements",
    config=dict(
        measurements=["size", "weight", "bmi"],
        extract_ranges=True,
    ),
)
text = """
Le patient est admis hier, fait 1m78 pour 76kg.
Les deux nodules bénins sont larges de 1,2 et 2.4mm.
BMI: 24.
Le nodule fait entre 1 et 1.5 cm
"""
doc = nlp(text)
measurements = doc.spans["measurements"]
measurements
# Out: [1m78, 76kg, 1,2, 2.4mm, 24, entre 1 et 1.5 cm]
measurements[0]
# Out: 1m78
str(measurements[0]._.size), str(measurements[0]._.value)
# Out: ('1.78 m', '1.78 m')
measurements[0]._.value.cm
# Out: 178.0
measurements[2]
# Out: 1,2
str(measurements[2]._.value)
# Out: '1.2 mm'
str(measurements[2]._.value.mm)
# Out: 1.2
measurements[4]
# Out: 24
str(measurements[4]._.value)
# Out: '24 kg_per_m2'
str(measurements[4]._.value.kg_per_m2)
# Out: 24
str(measurements[5]._.value)
# Out: 1-1.5 cm
To extract all sizes in centimeters, and average range measurements, you can use the following snippet:
sizes = [
    sum(item.cm for item in m._.value) / len(m._.value)
    for m in doc.spans["measurements"]
    if m.label_ == "size"
]
sizes
# Out: [178.0, 0.12, 0.24, 1.25]
Customization
You can declare custom measurements by altering the patterns:
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe(
    "eds.measurements",
    config=dict(
        measurements={
            "my_custom_surface_measurement": {
                # This measurement unit is homogeneous to square meters
                "unit": "m2",
                # Handle cases like "surface: 1.8" (implied m2),
                # vs "surface: 50" (implied cm2)
                "unitless_patterns": [
                    {
                        "terms": ["surface", "aire"],
                        "ranges": [
                            {"unit": "m2", "min": 0, "max": 9},
                            {"unit": "cm2", "min": 10, "max": 100},
                        ],
                    }
                ],
            },
        }
    ),
)
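The `ranges` entries of a unitless pattern suggest that the implied unit is chosen from the magnitude of the number. The sketch below illustrates that idea in plain Python. It assumes that both bounds are inclusive and that the first matching range wins; the source does not confirm either point, and the `infer_unit` helper is hypothetical.

```python
# Same ranges as in the configuration above.
RANGES = [
    {"unit": "m2", "min": 0, "max": 9},
    {"unit": "cm2", "min": 10, "max": 100},
]


def infer_unit(value, ranges):
    """Return the unit of the first range containing `value`, else None.

    Assumption: bounds are inclusive and ranges are tried in order.
    """
    for candidate in ranges:
        if candidate["min"] <= value <= candidate["max"]:
            return candidate["unit"]
    return None


print(infer_unit(1.8, RANGES))  # "surface: 1.8" falls in the m2 range
print(infer_unit(50, RANGES))   # "surface: 50" falls in the cm2 range
```

Under these assumptions, a value outside every range (for instance 500) gets no implied unit.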
Extensions
The `eds.measurements` pipeline declares its extensions dynamically, depending on the `measurements` parameter: each measurement gets its own extension, and is assigned to a different span group.
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object TYPE: |
name | The name of the component. TYPE: |
measurements | A mapping from measure names to MsrConfig. Each measure's configuration has the following shape: TYPE: |
number_terms | A mapping of numbers to their lexical variants DEFAULT: |
stopwords | A list of stopwords that do not matter when placed between a unitless trigger and a number DEFAULT: |
unit_divisors | A list of terms used to divide two units (like: m / s) DEFAULT: |
attr | Whether to match on the text ('TEXT') or on the normalized text ('NORM') TYPE: |
ignore_excluded | Whether to exclude pollution patterns when matching in the text TYPE: |
compose_units | Whether to compose units (like "m/s" or "m.s-1") DEFAULT: |
extract_ranges | Whether to extract ranges (like "entre 1 et 2 cm") DEFAULT: |
range_patterns | A list of "{FROM} xx {TO} yy" patterns to match range measurements DEFAULT: |
after_snippet_limit | Maximum word distance allowed when linking a part of a measurement located after its number DEFAULT: |
before_snippet_limit | Maximum word distance allowed when linking a part of a measurement located before its number DEFAULT: |
span_setter | How to set the spans in the document. By default, each measurement will be assigned to its own span group (using either the "name" field of the config, or the key if you passed a dict), and to the "measurements" group. DEFAULT: |
span_getter | Where to look for measurements in the doc. By default, look in the whole doc. You can combine this with the TYPE: |
merge_mode | How to merge matches with the spans from TYPE: |
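As a rough illustration of what a "{FROM} xx {TO} yy" range pattern describes, the plain-Python sketch below matches expressions such as "entre 1 et 1.5 cm". It is a simplified stand-in, not the library's matching logic: the `RANGE_PATTERNS` list, the regex and the `extract_range` helper are hypothetical.

```python
import re

# Hypothetical (FROM, TO) keyword pairs, chosen for illustration only.
RANGE_PATTERNS = [("entre", "et"), ("de", "à")]

# A number with an optional decimal part written with "." or ",".
NUMBER = r"(\d+(?:[.,]\d+)?)"


def extract_range(text, patterns=RANGE_PATTERNS):
    """Return (low, high, unit) for the first range found, else None."""
    for start, end in patterns:
        regex = re.compile(
            rf"\b{re.escape(start)}\s+{NUMBER}\s+{re.escape(end)}\s+{NUMBER}"
            r"\s*([a-zA-Z]+)"
        )
        match = regex.search(text)
        if match:
            low, high, unit = match.groups()
            return float(low.replace(",", ".")), float(high.replace(",", ".")), unit
    return None


low, high, unit = extract_range("Le nodule fait entre 1 et 1.5 cm")
print(low, high, unit, (low + high) / 2)
```

Averaging the two bounds gives 1.25, matching the value obtained for this range in the Examples section.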
Authors and citation
The `eds.measurements` pipeline was developed by AP-HP's Data Science team.