Tables

The eds.tables matcher detects tables in a document.

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.tables())

text = """
SERVICE
MEDECINE INTENSIVE –
REANIMATION
Réanimation / Surveillance Continue
Médicale

COMPTE RENDU D'HOSPITALISATION du 05/06/2020 au 10/06/2020
Madame DUPONT Marie, née le 16/05/1900, âgée de 20 ans, a été hospitalisée en
réanimation du 05/06/1920 au 10/06/1920 pour intoxication médicamenteuse volontaire.

Examens complémentaires
Hématologie
Numération
Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7
Hématocrite ¦% ¦44.2 ¦39.2-48.6
VGM ¦fL ¦94.4 + ¦79.6-94
TCMH ¦pg ¦31.6 ¦27.3-32.8
CCMH ¦g/dL ¦33.5 ¦32.4-36.3
Plaquettes ¦x10*9/L ¦191 ¦172-398
VMP ¦fL ¦11.5 + ¦7.4-10.8

Sur le plan neurologique : Devant la persistance d'une confusion à distance de
l'intoxication au
...

2/2Pat : <NOM> <Prenom>|F |<date> | <ipp> |Intitulé RCP
"""

doc = nlp(text)

# A table span
table = doc.spans["tables"][0]

# Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
# Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
# Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7
# Hématocrite ¦% ¦44.2 ¦39.2-48.6
# VGM ¦fL ¦94.4 + ¦79.6-94
# TCMH ¦pg ¦31.6 ¦27.3-32.8
# CCMH ¦g/dL ¦33.5 ¦32.4-36.3
# Plaquettes ¦x10*9/L ¦191 ¦172-398
# VMP ¦fL ¦11.5 + ¦7.4-10.8

# Convert span to Pandas table
df = table._.to_pd_table(
    as_spans=False,  # set True to set the table cells as spans instead of strings
    header=False,  # set True to use the first row as header
    index=False,  # set True to use the first column as index
)
type(df)
# Out: pandas.core.frame.DataFrame
The pandas DataFrame:

   0            1         2       3
0  Leucocytes   x10*9/L   4.97    4.09-11
1  Hématies     x10*12/L  4.68    4.53-5.79
2  Hémoglobine  g/dL      14.8    13.4-16.7
3  Hématocrite  %         44.2    39.2-48.6
4  VGM          fL        94.4 +  79.6-94
5  TCMH         pg        31.6    27.3-32.8
6  CCMH         g/dL      33.5    32.4-36.3
7  Plaquettes   x10*9/L   191     172-398
8  VMP          fL        11.5 +  7.4-10.8

Extensions

The eds.tables pipeline declares the span._.to_pd_table() Span extension. Called on a table span, it parses the table and returns it as a pandas DataFrame.
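
To make the parsing step concrete, here is a minimal standard-library sketch of how a table's text can be split into rows and cells with a separator regex. It is an illustration only, not the library's implementation: the `split_rows` helper and its default separator `¦` are assumptions made for this example, and the result is a plain list of lists rather than a DataFrame.

```python
import re


def split_rows(table_text, sep_pattern=r"¦"):
    """Split table text into rows of stripped cell strings.

    Hypothetical helper for illustration; `sep_pattern` is an assumed
    separator regex, not necessarily the library's default.
    """
    rows = []
    for line in table_text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        rows.append([cell.strip() for cell in re.split(sep_pattern, line)])
    return rows


table_text = """Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
VGM ¦fL ¦94.4 + ¦79.6-94"""

rows = split_rows(table_text)
print(rows[0])  # ['Leucocytes', 'x10*9/L', '4.97', '4.09-11']
print(rows[1])  # ['VGM', 'fL', '94.4 +', '79.6-94']
```

Each inner list corresponds to one row of the DataFrame shown in the example above. Note that the trailing `+` flag stays in the same cell as its value.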

Parameters

PARAMETER DESCRIPTION
nlp

Pipeline object

TYPE: PipelineProtocol

name

Name of the component.

tables_pattern

The regex pattern to identify tables. The dictionary key should be tables.

TYPE: Optional[Dict[str, str]] DEFAULT: None

sep_pattern

The regex pattern that identifies the column separator. Used when calling to_pd_table.

TYPE: Optional[str] DEFAULT: None

min_rows

Only tables with more than min_rows lines will be detected.

TYPE: Optional[int] DEFAULT: 2

attr

The spaCy attribute to match on: either a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. The dict can also contain one key per regex.

TYPE: str DEFAULT: TEXT

ignore_excluded

Whether to skip excluded tokens.

TYPE: bool DEFAULT: True
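
As a rough illustration of how a row threshold such as min_rows can filter candidate tables, the sketch below groups consecutive lines that contain a separator and keeps only the groups that are long enough. Everything here is an assumption made for the example: the `find_table_blocks` helper is hypothetical, detection uses a simple substring test rather than the library's tables_pattern regex, and the strict comparison follows the wording "more than min_rows lines", which may not match the library's actual check.

```python
def find_table_blocks(text, sep="¦", min_rows=2):
    """Return groups of consecutive lines containing `sep`.

    Hypothetical helper: a group is kept only if it has more than
    `min_rows` lines (strict comparison assumed from the description).
    """
    blocks, current = [], []
    for line in text.splitlines():
        if sep in line:
            current.append(line)
            continue
        # A line without the separator closes the current group.
        if len(current) > min_rows:
            blocks.append(current)
        current = []
    if len(current) > min_rows:  # flush a group that ends the text
        blocks.append(current)
    return blocks


sample = """Numération
Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7

Poids ¦kg ¦70
Sur le plan neurologique"""

blocks = find_table_blocks(sample, min_rows=2)
print(len(blocks))     # 1
print(len(blocks[0]))  # 3
```

The isolated `Poids` line contains a separator but forms a group of one line, so it is discarded. Raising the threshold is a simple way to avoid treating stray separator characters as tables.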

Authors and citation

The eds.tables pipeline was developed by AP-HP's Data Science team.