Tables
The eds.tables matcher detects tables in a document.
Examples
import edsnlp, edsnlp.pipes as eds
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.tables())
text = """
SERVICE
MEDECINE INTENSIVE –
REANIMATION
Réanimation / Surveillance Continue
Médicale
COMPTE RENDU D'HOSPITALISATION du 05/06/2020 au 10/06/2020
Madame DUPONT Marie, née le 16/05/1900, âgée de 20 ans, a été hospitalisée en
réanimation du 05/06/1920 au 10/06/1920 pour intoxication médicamenteuse volontaire.
Examens complémentaires
Hématologie
Numération
Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7
Hématocrite ¦% ¦44.2 ¦39.2-48.6
VGM ¦fL ¦94.4 + ¦79.6-94
TCMH ¦pg ¦31.6 ¦27.3-32.8
CCMH ¦g/dL ¦33.5 ¦32.4-36.3
Plaquettes ¦x10*9/L ¦191 ¦172-398
VMP ¦fL ¦11.5 + ¦7.4-10.8
Sur le plan neurologique : Devant la persistance d'une confusion à distance de
l'intoxication au
...
2/2Pat : <NOM> <Prenom>|F |<date> | <ipp> |Intitulé RCP
"""
doc = nlp(text)
# A table span
table = doc.spans["tables"][0]
# Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
# Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
# Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7
# Hématocrite ¦% ¦44.2 ¦39.2-48.6
# VGM ¦fL ¦94.4 + ¦79.6-94
# TCMH ¦pg ¦31.6 ¦27.3-32.8
# CCMH ¦g/dL ¦33.5 ¦32.4-36.3
# Plaquettes ¦x10*9/L ¦191 ¦172-398
# VMP ¦fL ¦11.5 + ¦7.4-10.8
# Convert span to Pandas table
df = table._.to_pd_table(
as_spans=False, # set True to set the table cells as spans instead of strings
header=False, # set True to use the first row as header
index=False, # set True to use the first column as index
)
type(df)
# Out: pandas.core.frame.DataFrame
| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | Leucocytes | x10*9/L | 4.97 | 4.09-11 |
1 | Hématies | x10*12/L | 4.68 | 4.53-5.79 |
2 | Hémoglobine | g/dL | 14.8 | 13.4-16.7 |
3 | Hématocrite | % | 44.2 | 39.2-48.6 |
4 | VGM | fL | 94.4 + | 79.6-94 |
5 | TCMH | pg | 31.6 | 27.3-32.8 |
6 | CCMH | g/dL | 33.5 | 32.4-36.3 |
7 | Plaquettes | x10*9/L | 191 | 172-398 |
8 | VMP | fL | 11.5 + | 7.4-10.8 |
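The cells of the resulting DataFrame are still plain strings, so numeric work needs a parsing step. As an illustration of downstream use, the sketch below takes the rows shown above as plain Python tuples (copied by hand here, not produced by eds.tables) and lists the measurements that fall outside their reference range. The helper `parse_row` is hypothetical. It assumes that a trailing `+` is only a flag and that every range is written as `low-high`.

```python
# Rows copied by hand from the table above. With a real DataFrame they
# could be obtained with something like df.itertuples(index=False).
rows = [
    ("Leucocytes", "x10*9/L", "4.97", "4.09-11"),
    ("Hématies", "x10*12/L", "4.68", "4.53-5.79"),
    ("Hémoglobine", "g/dL", "14.8", "13.4-16.7"),
    ("Hématocrite", "%", "44.2", "39.2-48.6"),
    ("VGM", "fL", "94.4 +", "79.6-94"),
    ("TCMH", "pg", "31.6", "27.3-32.8"),
    ("CCMH", "g/dL", "33.5", "32.4-36.3"),
    ("Plaquettes", "x10*9/L", "191", "172-398"),
    ("VMP", "fL", "11.5 +", "7.4-10.8"),
]


def parse_row(row):
    """Hypothetical helper: turn one row of strings into typed values."""
    name, unit, raw_value, raw_range = row
    # Assumption: the first whitespace-separated token is the number and
    # anything after it (such as "+") is a flag that can be ignored.
    value = float(raw_value.split()[0])
    # Assumption: ranges are always "low-high" with non-negative bounds.
    low, high = (float(bound) for bound in raw_range.split("-"))
    return name, unit, value, low, high


out_of_range = []
for row in rows:
    name, unit, value, low, high = parse_row(row)
    if not low <= value <= high:
        out_of_range.append(name)

print(out_of_range)  # ['VGM', 'VMP']
```

The two flagged rows are the ones marked with `+` in the source text, which is a quick sanity check on the parsing assumptions.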
Extensions
The eds.tables pipeline declares the span._.to_pd_table() Span extension. This method returns the table parsed as a pandas DataFrame.
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | Pipeline object TYPE: |
name | Name of the component. |
tables_pattern | The regex pattern to identify tables. The key of dictionary should be TYPE: |
sep_pattern | The regex pattern to identify the separator pattern. Used when calling TYPE: |
min_rows | Only tables with more than TYPE: |
attr | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. We can also add a key for each regex. TYPE: |
ignore_excluded | Whether to skip excluded tokens. TYPE: |
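To give an intuition for how a separator pattern and a minimum row count can work together, here is a minimal sketch using only the standard library. It is not the eds.tables implementation: the function `split_table`, its default separator `¦`, its default threshold of 2 and the strict "more than" comparison are all assumptions chosen for illustration.

```python
import re


def split_table(block, sep_pattern=r"¦", min_rows=2):
    """Hypothetical helper, not part of edsnlp.

    Split a text block into rows of stripped cells, or return None when
    the block has too few rows to be treated as a table.
    """
    rows = []
    for line in block.splitlines():
        if not re.search(sep_pattern, line):
            continue  # lines without a separator are not table rows
        cells = [cell.strip() for cell in re.split(sep_pattern, line)]
        rows.append(cells)
    # Assumption: a block must have strictly more than min_rows rows.
    if len(rows) <= min_rows:
        return None
    return rows


block = """Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7"""

rows = split_table(block)
print(rows[1])  # ['Hématies', 'x10*12/L', '4.68', '4.53-5.79']

# A single matching line is below the threshold and is rejected.
too_short = split_table("VGM ¦fL ¦94.4 + ¦79.6-94")
print(too_short)  # None
```

Stripping each cell after the split is what removes the column padding visible in the raw text.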
Authors and citation
The eds.tables pipeline was developed by AP-HP's Data Science team.