
Polars

TLDR
import edsnlp

docs = edsnlp.data.from_polars(df, converter="omop")
docs = docs.map_pipeline(nlp)
res = edsnlp.data.to_polars(docs, converter="omop")

We provide methods to read and write documents (raw or annotated) from and to Polars DataFrames.

As an example, imagine that we have the following OMOP dataframe (we'll name it note_df):

note_id | note_text                                     | note_datetime
------- | --------------------------------------------- | -------------
0       | Le patient est admis pour une pneumopathie... | 2021-10-23

Reading from a Polars DataFrame

The PolarsReader (or edsnlp.data.from_polars) reads the rows of a Polars DataFrame and yields documents. At the moment, only entities and attributes are loaded; relations and events are not supported.

Example
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc_iterator = edsnlp.data.from_polars(note_df, nlp=nlp, converter="omop")
annotated_docs = nlp.pipe(doc_iterator)

Generator vs list

edsnlp.data.from_polars returns a LazyCollection. To iterate over the documents multiple times efficiently, or to access them by index, you must convert it to a list:

docs = list(edsnlp.data.from_polars(note_df, converter="omop"))
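The distinction matters because a lazy collection only describes how to produce the documents: each new iteration runs the conversion again, whereas a list stores the results produced once and supports indexing. The sketch below illustrates this with a hypothetical LazyDocs class in plain Python; it is an analogy under that assumption, not edsnlp's actual LazyCollection implementation.

```python
from typing import Callable, Iterator, List


class LazyDocs:
    """Hypothetical lazy collection: re-runs `convert` on every iteration."""

    def __init__(self, rows: List[dict], convert: Callable[[dict], str]):
        self.rows = rows
        self.convert = convert

    def __iter__(self) -> Iterator[str]:
        # Nothing is cached: each loop converts every row again.
        for row in self.rows:
            yield self.convert(row)


calls = 0


def convert(row: dict) -> str:
    """Stand-in for a converter that counts how often it is called."""
    global calls
    calls += 1
    return row["note_text"]


rows = [{"note_text": "first note"}, {"note_text": "second note"}]
lazy = LazyDocs(rows, convert)

# Iterating twice over the lazy collection converts each row twice.
for _ in lazy:
    pass
for _ in lazy:
    pass
print(calls)  # 4

# Materializing once converts each row once, and the list can be indexed.
calls = 0
docs = list(LazyDocs(rows, convert))
print(docs[1], calls)  # second note 2
```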

Parameters

PARAMETER DESCRIPTION
data

The Polars DataFrame to read the documents from.

converter

Converter to use to convert the rows of the DataFrame (represented as dicts) to Doc objects. These are documented on the Converters page.

TYPE: Union[str, Callable]

kwargs

Additional keyword arguments to pass to the converter. These are documented on the Converters page.

DEFAULT: {}

RETURNS DESCRIPTION
LazyCollection

Writing to a Polars DataFrame

edsnlp.data.to_polars writes a list of documents (or a LazyCollection) to a Polars DataFrame.

Example
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)

doc = nlp("My document with entities")

df = edsnlp.data.to_polars([doc], converter="omop")

Parameters

PARAMETER DESCRIPTION
data

The data to write (either a list of documents or a LazyCollection).

TYPE: Union[Any, LazyCollection]

dtypes

Dictionary of column names to dtypes. This is passed to the schema parameter of pl.from_dicts.

TYPE: Optional[dict] DEFAULT: None

converter

Converter to use to convert the documents to dictionary objects before storing them in the dataframe. These are documented on the Converters page.

TYPE: Optional[Union[str, Callable]] DEFAULT: None

kwargs

Additional keyword arguments to pass to the converter. These are documented on the Converters page.

DEFAULT: {}