OMOP Connector

We provide a connector between OMOP-formatted dataframes and spaCy documents.

OMOP-style dataframes

Consider a corpus of just one document:

Le patient est admis pour une pneumopathie au coronavirus.
On lui prescrit du paracétamol.

And its OMOP-style representation, separated in two tables note and note_nlp (here with selected columns) :

note:

note_id	note_text	note_datetime
0	Le patient est admis pour une pneumopathie...	2021-10-23

note_nlp:

note_nlp_id	note_id	start_char	end_char	note_nlp_source_value	lexical_variant
0	0	46	57	disease	coronavirus
1	0	77	88	drug	paracétamol

Using the connector

The following snippet expects the tables note and note_nlp to be already defined (eg through PySpark's toPandas() method).

import spacy
from edsnlp.connectors.omop import OmopConnector

# Instantiate a spacy pipeline
nlp = spacy.blank("fr")

# Instantiate the connector
connector = OmopConnector(nlp)

# Convert OMOP tables (note and note_nlp) to a list of documents
docs = connector.omop2docs(note, note_nlp)
doc = docs[0]

doc.ents
# Out: [coronavirus, paracétamol]

doc.ents[0].label_
# Out: 'disease'

doc.text == note.loc[0].note_text
# Out: True

The object docs now contains a list of documents that reflects the information contained in the OMOP-formatted dataframes.