Skip to content

OMOP Connector

We provide a connector between OMOP-formatted dataframes and spaCy documents.

OMOP-style dataframes

Consider a corpus of just one document:

Le patient est admis pour une pneumopathie au coronavirus.
On lui prescrit du paracétamol.

And its OMOP-style representation, separated in two tables note and note_nlp (here with selected columns) :

note:

note_id note_text note_datetime
0 Le patient est admis pour une pneumopathie... 2021-10-23

note_nlp:

note_nlp_id note_id start_char end_char note_nlp_source_value lexical_variant
0 0 46 57 disease coronavirus
1 0 77 88 drug paracétamol

Using the connector

The following snippet expects the tables note and note_nlp to be already defined (eg through PySpark's toPandas() method).

import spacy
from edsnlp.connectors.omop import OmopConnector

# Instantiate a spacy pipeline
nlp = spacy.blank("fr")

# Instantiate the connector
connector = OmopConnector(nlp)

# Convert OMOP tables (note and note_nlp) to a list of documents
docs = connector.omop2docs(note, note_nlp)
doc = docs[0]

doc.ents
# Out: [coronavirus, paracétamol]

doc.ents[0].label_
# Out: 'disease'

doc.text == note.loc[0].note_text
# Out: True

The object docs now contains a list of documents that reflects the information contained in the OMOP-formatted dataframes.

Back to top