Tables

The eds.tables matcher detects tables in a document.

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.tables())

text = """
SERVICE
MEDECINE INTENSIVE –
REANIMATION
Réanimation / Surveillance Continue
Médicale

COMPTE RENDU D'HOSPITALISATION du 05/06/2020 au 10/06/2020
Madame DUPONT Marie, née le 16/05/1900, âgée de 20 ans, a été hospitalisée en
réanimation du 05/06/1920 au 10/06/1920 pour intoxication médicamenteuse volontaire.

Examens complémentaires
Hématologie
Numération
Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7
Hématocrite ¦% ¦44.2 ¦39.2-48.6
VGM ¦fL ¦94.4 + ¦79.6-94
TCMH ¦pg ¦31.6 ¦27.3-32.8
CCMH ¦g/dL ¦33.5 ¦32.4-36.3
Plaquettes ¦x10*9/L ¦191 ¦172-398
VMP ¦fL ¦11.5 + ¦7.4-10.8

Sur le plan neurologique : Devant la persistance d'une confusion à distance de
l'intoxication au
...

2/2Pat : <NOM> <Prenom>|F |<date> | <ipp> |Intitulé RCP
"""

doc = nlp(text)

# A table span
table = doc.spans["tables"][0]

# Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
# Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
# Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7
# Hématocrite ¦% ¦44.2 ¦39.2-48.6
# VGM ¦fL ¦94.4 + ¦79.6-94
# TCMH ¦pg ¦31.6 ¦27.3-32.8
# CCMH ¦g/dL ¦33.5 ¦32.4-36.3
# Plaquettes ¦x10*9/L ¦191 ¦172-398
# VMP ¦fL ¦11.5 + ¦7.4-10.8

# Convert span to Pandas table
df = table._.to_pd_table(
    as_spans=False,  # set True to set the table cells as spans instead of strings
    header=False,  # set True to use the first row as header
    index=False,  # set True to use the first column as index
)
type(df)
# Out: pandas.core.frame.DataFrame

The pandas DataFrame:

	0	1	2	3
0	Leucocytes	x10*9/L	4.97	4.09-11
1	Hématies	x10*12/L	4.68	4.53-5.79
2	Hémoglobine	g/dL	14.8	13.4-16.7
3	Hématocrite	%	44.2	39.2-48.6
4	VGM	fL	94.4 +	79.6-94
5	TCMH	pg	31.6	27.3-32.8
6	CCMH	g/dL	33.5	32.4-36.3
7	Plaquettes	x10*9/L	191	172-398
8	VMP	fL	11.5 +	7.4-10.8

Extensions

The eds.tables pipeline declares the span._.to_pd_table() Span extension. This method parses the table span and returns it as a pandas DataFrame.
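To give an intuition for what parsing a table span involves, here is a minimal standard-library sketch that splits raw table text into rows and cells using a separator regex. It is an illustration only, not the eds.tables implementation: the `split_table` helper and the `SEP_PATTERN` value are hypothetical names and assumptions made for this example.

```python
import re

# Hypothetical separator pattern: a broken bar or a pipe. This is an
# assumption for the sketch, not necessarily the default used by eds.tables.
SEP_PATTERN = r"¦|\|"


def split_table(table_text, sep_pattern=SEP_PATTERN):
    """Split raw table text into rows, each a list of stripped cell strings."""
    rows = []
    for line in table_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append([cell.strip() for cell in re.split(sep_pattern, line)])
    return rows


raw = """Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
VGM ¦fL ¦94.4 + ¦79.6-94"""

rows = split_table(raw)
for row in rows:
    print(row)
```

Each row comes back as a list of four strings, which is the same shape as the rows of the DataFrame shown in the example above.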

Parameters

PARAMETER	DESCRIPTION
`nlp`	Pipeline object TYPE: `PipelineProtocol`
`name`	Name of the component.
`tables_pattern`	The regex pattern used to identify tables. The dictionary key should be `tables`. TYPE: `Optional[Dict[str, str]]` DEFAULT: `None`
`sep_pattern`	The regex pattern used to identify the cell separator. Used when calling `to_pd_table`. TYPE: `Optional[str]` DEFAULT: `None`
`min_rows`	Only tables with more than `min_rows` lines are detected. TYPE: `Optional[int]` DEFAULT: `2`
`attr`	spaCy attribute to match on: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. A key can also be added for each regex. TYPE: `str` DEFAULT: `TEXT`
`ignore_excluded`	Whether to skip excluded tokens. TYPE: `bool` DEFAULT: `True`
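The effect of `min_rows` can be pictured with a small standard-library sketch: group consecutive lines that contain a separator, then keep only the groups with more than `min_rows` lines. This is a simplified illustration under stated assumptions, not how eds.tables is implemented; the `find_table_blocks` helper and its default separator regex are hypothetical.

```python
import re


def find_table_blocks(text, sep_pattern=r"¦|\|", min_rows=2):
    """Return runs of consecutive separator-bearing lines longer than min_rows.

    Hypothetical helper: the separator regex is an assumption for this
    sketch, and the strict "more than" comparison follows the parameter
    description above.
    """
    blocks, current = [], []
    for line in text.splitlines():
        if re.search(sep_pattern, line):
            current.append(line)
        else:
            if len(current) > min_rows:
                blocks.append(current)
            current = []
    if len(current) > min_rows:  # flush a run that ends the text
        blocks.append(current)
    return blocks


sample = """Numération
Leucocytes ¦x10*9/L ¦4.97 ¦4.09-11
Hématies ¦x10*12/L¦4.68 ¦4.53-5.79
Hémoglobine ¦g/dL ¦14.8 ¦13.4-16.7

Sur le plan neurologique
2/2Pat : <NOM> <Prenom>|F |<date> | <ipp> |Intitulé RCP
"""

blocks = find_table_blocks(sample)
print(len(blocks), "block(s) kept")
```

In this sketch the isolated footer line contains pipe characters but forms a run of a single line, so it is discarded, while the three-line laboratory block is kept. Lowering `min_rows` to 0 would keep both.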

Authors and citation

The eds.tables pipeline was developed by AP-HP's Data Science team.