Skip to content

Pandas

TLDR
import edsnlp

stream = edsnlp.data.from_pandas(df, converter="omop")
stream = stream.map_pipeline(nlp)
res = stream.to_pandas(converter="omop")
# or equivalently
edsnlp.data.to_pandas(stream, converter="omop")

We provide methods to read and write documents (raw or annotated) from and to Pandas DataFrames.

As an example, imagine that we have the following OMOP dataframe (we'll name it note_df)

note_id note_text note_datetime
0 Le patient est admis pour une pneumopathie... 2021-10-23

Reading from a Pandas Dataframe[source]

The PandasReader (or edsnlp.data.from_pandas) handles reading from a table and yields documents. At the moment, only entities and attributes are loaded. Relations and events are not supported.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc_iterator = edsnlp.data.from_pandas(df, nlp=nlp, converter="omop")
annotated_docs = nlp.pipe(doc_iterator)

Generator vs list

edsnlp.data.from_pandas returns a Stream. To iterate over the documents multiple times efficiently or to access them by index, you must convert it to a list

docs = list(edsnlp.data.from_pandas(df, converter="omop"))

Parameters

PARAMETER DESCRIPTION
data

Pandas object

shuffle

Whether to shuffle the data. If "dataset", the whole dataset will be shuffled before starting iterating on it (at the start of every epoch if looping).

TYPE: Literal['dataset', False] DEFAULT: False

seed

The seed to use for shuffling.

TYPE: Optional[int] DEFAULT: None

loop

Whether to loop over the data indefinitely.

TYPE: bool DEFAULT: False

converter

Converters to use to convert the rows of the DataFrame (represented as dicts) to Doc objects. These are documented on the Converters page.

TYPE: Optional[AsList[Union[str, Callable]]] DEFAULT: None

kwargs

Additional keyword arguments to pass to the converter. These are documented on the Converters page.

DEFAULT: {}

RETURNS DESCRIPTION
Stream

Writing to a Pandas DataFrame[source]

edsnlp.data.to_pandas writes a list of documents as a pandas table.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)

doc = nlp("My document with entities")

edsnlp.data.to_pandas([doc], converter="omop")

Parameters

PARAMETER DESCRIPTION
data

The data to write (either a list of documents or a Stream).

TYPE: Union[Any, Stream]

dtypes

Dictionary of column names to dtypes. This is passed to pd.DataFrame.astype.

TYPE: Optional[dict] DEFAULT: None

execute

Whether to execute the writing operation immediately or to return a stream

TYPE: bool DEFAULT: True

converter

Converter to use to convert the documents to dictionary objects before storing them in the dataframe. These are documented on the Converters page.

TYPE: Optional[Union[str, Callable]] DEFAULT: None

kwargs

Additional keyword arguments to pass to the converter. These are documented on the Converters page.

DEFAULT: {}

Importing entities from a Pandas DataFrame

If you have a dataframe with entities (e.g., note_nlp in OMOP), you must join it with the dataframe containing the raw text (e.g., note in OMOP) to obtain a single dataframe with the entities next to the raw text. For instance, the second note_nlp dataframe that we will name note_nlp_df.

note_nlp_id note_id start_char end_char note_nlp_source_value lexical_variant
0 0 46 57 disease coronavirus
1 0 77 88 drug paracétamol
... ... ... ... ... ...
df = (
    note_df
    .set_index("note_id")
    .join(
        note_nlp_df
        .set_index('note_id')
        .groupby(level=0)
        .apply(pd.DataFrame.to_dict, orient='records')
        .rename("entities")
    )
).reset_index()
note_id note_text note_datetime entities
0 Le patient... 2021-10-23 [{"note_nlp_id": 0, "start_char": 46, ...]
... ... ... ...