Measurements

The eds.measurements pipeline's role is to detect and normalise numerical measurements within a medical document. We use simple regular expressions to extract and normalize measurements, and use Measurement classes to store them.

Warning

The measurements pipeline is still in active development and has not been rigorously validated. If you come across a measurement expression that goes undetected, please file an issue !

Scope

The eds.measurements pipeline can extract simple (eg 3cm) measurements. It can detect elliptic enumerations (eg 32, 33 et 34kg) of measurements of the same type and split the measurements accordingly.

The normalized value can then be accessed via the span._.value attribute and converted on the fly to a desired unit.

The current pipeline annotates the following measurements out of the box:

Measurement name	Example
`eds.size`	`1m50`, `1.50m`
`eds.weight`	`12kg`, `1kg300`
`eds.bmi`	`BMI: 24`, `24 kg.m-2`
`eds.volume`	`2 cac`, `8ml`

Usage

import spacy

nlp = spacy.blank("eds")
nlp.add_pipe(
    "eds.measurements",
    config=dict(
        measurements=["eds.size", "eds.weight", "eds.bmi"],
        extract_ranges=True,
    ),
)

text = """
Le patient est admis hier, fait 1m78 pour 76kg.
Les deux nodules bénins sont larges de 1,2 et 2.4mm.
BMI: 24.

Le nodule fait entre 1 et 1.5 cm
"""

doc = nlp(text)

measurements = doc.spans["measurements"]

measurements
# Out: [1m78, 76kg, 1,2, 2.4mm, 24, entre 1 et 1.5 cm]

measurements[0]
# Out: 1m78

str(measurements[0]._.value)
# Out: '1.78 m'

measurements[0]._.value.cm
# Out: 178.0

measurements[2]
# Out: 1,2

str(measurements[2]._.value)
# Out: '1.2 mm'

str(measurements[2]._.value.mm)
# Out: 1.2

measurements[4]
# Out: 24

str(measurements[4]._.value)
# Out: '24 kg_per_m2'

str(measurements[4]._.value.kg_per_m2)
# Out: 24

str(measurements[5]._.value)
# Out: 1-1.5 cm

To extract all sizes in centimeters, and average range measurements, you can use the following snippet:

sizes = [
    sum(item.cm for item in m._.value) / len(m._.value)
    for m in doc.spans["measurements"]
    if m.label_ == "eds.size"
]
print(sizes)
sizes
# Out: [178.0, 0.12, 0.24, 1.25]

Custom measurement

You can declare custom measurements by changing the patterns

import spacy

nlp = spacy.blank("eds")
nlp.add_pipe(
    "eds.measurements",
    config=dict(
        measurements={
            # this name will be used to define the labels of the matched entities
            "my_custom_surface_measurement": {
                # This measurement unit is homogenous to square meters
                "unit": "m2",
                # To handle cases like "surface: 1.8" (implied m2), we can use
                # unitless patterns
                "unitless_patterns": [
                    {
                        "terms": ["surface", "aire"],
                        "ranges": [
                            {
                                "unit": "m2",
                                "min": 0,
                                "max": 9,
                            }
                        ],
                    }
                ],
            },
        }
    ),
)

Declared extensions

The eds.measurements pipeline declares a single spaCy extension on the Span object, the value attribute that is a Measurement instance.

Configuration

The pipeline can be configured using the following parameters :

Authors and citation

The eds.measurements pipeline was developed by AP-HP's Data Science team.