Pandas

TLDR

import edsnlp

iterator = edsnlp.data.from_pandas(df, converter="omop")
docs = nlp.pipe(iterator)
res = edsnlp.data.to_pandas(docs, converter="omop")

We provide methods to read and write documents (raw or annotated) from and to Pandas DataFrames.

As an example, imagine that we have the following OMOP dataframe (we'll name it note_df):

note_id	note_text	note_datetime
0	Le patient est admis pour une pneumopathie...	2021-10-23
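To follow along, a dataframe with the same columns can be built by hand. This is a minimal sketch: the note text is kept truncated exactly as in the table above, and storing note_datetime as a pandas datetime (rather than a string) is an assumption.

```python
import pandas as pd

# One row per clinical note, using the OMOP column names from the table above.
# The text stays truncated ("...") because the full note is not shown here.
note_df = pd.DataFrame(
    {
        "note_id": [0],
        "note_text": ["Le patient est admis pour une pneumopathie..."],
        # Assumption: dates are stored as datetimes rather than plain strings.
        "note_datetime": pd.to_datetime(["2021-10-23"]),
    }
)

print(note_df.dtypes)
```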

Reading from a Pandas DataFrame

The PandasReader (or edsnlp.data.from_pandas) handles reading from a table and yields documents. At the moment, only entities and attributes are loaded. Relations and events are not supported.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc_iterator = edsnlp.data.from_pandas(df, nlp=nlp, converter="omop")
annotated_docs = nlp.pipe(doc_iterator)

Generator vs list

edsnlp.data.from_pandas returns a LazyCollection. To iterate over the documents multiple times efficiently or to access them by index, you must convert it to a list:

docs = list(edsnlp.data.from_pandas(df, converter="omop"))
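The reason for materializing can be illustrated with a generic lazy iterable. The sketch below does not use edsnlp and makes no claim about how LazyCollection is implemented; LazySquares is a hypothetical stand-in that recomputes its items on every pass.

```python
class LazySquares:
    """Hypothetical stand-in for a lazy collection: items are recomputed on each pass."""

    def __init__(self, n):
        self.n = n
        self.computed = 0  # total number of items produced so far

    def __iter__(self):
        for i in range(self.n):
            self.computed += 1
            yield i * i


lazy = LazySquares(3)

# Two passes over the lazy object repeat the work.
first_pass = [x for x in lazy]
second_pass = [x for x in lazy]
work_when_lazy = lazy.computed

# Materializing once allows repeated iteration and indexing with no extra work.
items = list(lazy)
last_item = items[-1]
total = sum(items) + sum(items)
work_after_list = lazy.computed - work_when_lazy

print(work_when_lazy, work_after_list)
```

The list trades memory for speed: every document is held at once, which may matter for large dataframes.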

Parameters

PARAMETER	DESCRIPTION
`data`	Pandas object
`converter`	Converter to use to convert the rows of the DataFrame to Doc objects. TYPE: `Union[str, Callable]`
`kwargs`	Additional keyword arguments passed to the converter. These are documented on the Data schemas page. DEFAULT: `{}`

RETURNS	DESCRIPTION
`LazyCollection`

Writing to a Pandas DataFrame

edsnlp.data.to_pandas converts a list of documents (or a LazyCollection) into a Pandas DataFrame.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)

doc = nlp("My document with entities")

edsnlp.data.to_pandas([doc], converter="omop")

Parameters

PARAMETER	DESCRIPTION
`data`	The data to write (either a list of documents or a LazyCollection). TYPE: `Union[Any, LazyCollection]`
`converter`	Converter to use to convert the documents to dictionary objects before storing them in the dataframe. TYPE: `Optional[Union[str, Callable]]`
`dtypes`	Dictionary of column names to dtypes. This is passed to `pd.DataFrame.astype`. TYPE: `Optional[dict]` DEFAULT: `None`
`kwargs`	Additional keyword arguments passed to the converter. These are documented on the Data schemas page. DEFAULT: `{}`
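Since dtypes is passed to pd.DataFrame.astype, it may help to see what a column-to-dtype dictionary does on a plain dataframe. This sketch uses pandas only; the column names are hypothetical, and the columns actually produced depend on the converter.

```python
import pandas as pd

# Hypothetical output table; real column names depend on the converter used.
df = pd.DataFrame(
    {
        "note_id": [0, 0],
        "start_char": [46, 77],
        "end_char": [57, 88],
    }
)

# The dict maps column names to dtypes; columns not listed are left unchanged.
dtypes = {"start_char": "int32", "end_char": "int32"}
typed = df.astype(dtypes)

print(typed.dtypes)
```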

Importing entities from a Pandas DataFrame

If you have a dataframe with entities (e.g., note_nlp in OMOP), you must join it with the dataframe containing the raw text (e.g., note in OMOP) to obtain a single dataframe with the entities next to the raw text. For instance, suppose we have the following note_nlp dataframe, which we will name note_nlp_df:

note_nlp_id	note_id	start_char	end_char	note_nlp_source_value	lexical_variant
0	0	46	57	disease	coronavirus
1	0	77	88	drug	paracétamol
...	...	...	...	...	...

import pandas as pd

df = (
    note_df
    .set_index("note_id")
    .join(
        # Group the entity rows by note and store each group as a list of dicts
        note_nlp_df
        .set_index("note_id")
        .groupby(level=0)
        .apply(pd.DataFrame.to_dict, orient="records")
        .rename("entities")
    )
).reset_index()

note_id	note_text	note_datetime	entities
0	Le patient...	2021-10-23	`[{"note_nlp_id": 0, "start_char": 46, ...]`
...	...	...	...
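Because DataFrame.join performs a left join by default, a note with no rows in note_nlp_df ends up with a missing value in the entities column. The sketch below is an alternative to the groupby/apply approach, not the library's own method: it builds the same list-of-dicts column with an explicit dictionary and gives such notes an empty list. The second note and the placeholder texts are hypothetical, and whether the converter requires an empty list instead of a missing value is not stated here.

```python
import pandas as pd

# Two notes; only the first one has entities. Texts are placeholders.
note_df = pd.DataFrame(
    {
        "note_id": [0, 1],
        "note_text": ["Le patient...", "..."],
        "note_datetime": ["2021-10-23", "2021-10-24"],
    }
)

note_nlp_df = pd.DataFrame(
    {
        "note_nlp_id": [0, 1],
        "note_id": [0, 0],
        "start_char": [46, 77],
        "end_char": [57, 88],
        "note_nlp_source_value": ["disease", "drug"],
        "lexical_variant": ["coronavirus", "paracétamol"],
    }
)

# Collect each note's entity rows as a list of dicts, keyed by note_id.
entities_by_note = {
    note_id: group.drop(columns="note_id").to_dict(orient="records")
    for note_id, group in note_nlp_df.groupby("note_id")
}

df = note_df.copy()
# Notes without any entity rows get an empty list instead of a missing value.
df["entities"] = [entities_by_note.get(i, []) for i in df["note_id"]]

print(df[["note_id", "entities"]])
```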