Polars

TLDR

import edsnlp

stream = edsnlp.data.from_polars(df, converter="omop")
stream = stream.map_pipeline(nlp)
res = stream.to_polars(converter="omop")
# or, equivalently
res = edsnlp.data.to_polars(stream, converter="omop")

We provide methods to read and write documents (raw or annotated) from and to Polars DataFrames.

As an example, imagine that we have the following OMOP dataframe (we'll name it note_df; the code snippets on this page refer to it as df):

note_id	note_text	note_datetime
0	Le patient est admis pour une pneumopathie...	2021-10-23

Reading from a Polars DataFrame

The PolarsReader (or edsnlp.data.from_polars) handles reading from a table and yields documents. At the moment, only entities and attributes are loaded. Relations and events are not supported.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc_iterator = edsnlp.data.from_polars(df, nlp=nlp, converter="omop")
annotated_docs = nlp.pipe(doc_iterator)

Generator vs list

edsnlp.data.from_polars returns a Stream. To iterate over the documents multiple times efficiently or to access them by index, you must convert it to a list:

docs = list(edsnlp.data.from_polars(df, converter="omop"))
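
The difference can be illustrated without edsnlp. The sketch below uses a hypothetical LazyDocs class as a stand-in for a lazy stream. It assumes, for illustration only, that each pass over a lazy object re-runs the underlying work; it is not a description of how Stream is implemented.

```python
class LazyDocs:
    """Hypothetical stand-in for a lazy stream (not an edsnlp class)."""

    def __init__(self, texts):
        self.texts = texts
        self.n_processed = 0  # counts how many items were (re)computed

    def __iter__(self):
        for text in self.texts:
            self.n_processed += 1  # the "expensive" step runs on every pass
            yield text.upper()


lazy = LazyDocs(["first note", "second note"])

# Two passes over the lazy object do the work twice.
for _ in lazy:
    pass
for _ in lazy:
    pass
work_for_two_lazy_passes = lazy.n_processed

# Materializing once allows indexing and re-iteration without recomputation.
lazy_once = LazyDocs(["first note", "second note"])
docs = list(lazy_once)
second_doc = docs[1]
for _ in docs:
    pass
work_for_list = lazy_once.n_processed
```

The trade-off is memory: a list holds every document at once, so materializing is best kept for datasets that fit comfortably in RAM.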

Parameters

PARAMETER	DESCRIPTION
`data`	The Polars DataFrame or LazyFrame to read from. TYPE: `Union[DataFrame, LazyFrame]`
`shuffle`	Whether to shuffle the data. If "dataset", the whole dataset will be shuffled at the beginning (of every epoch if looping). TYPE: `Literal['dataset', False]` DEFAULT: `False`
`seed`	The seed to use for shuffling. TYPE: `Optional[int]` DEFAULT: `None`
`loop`	Whether to loop over the data indefinitely. TYPE: `bool` DEFAULT: `False`
`converter`	Converters to use to convert the rows of the DataFrame (represented as dicts) to Doc objects. These are documented on the Converters page. TYPE: `Optional[AsList[Union[str, Callable]]]` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the converter. These are documented on the Converters page. DEFAULT: `{}`

RETURNS	DESCRIPTION
`Stream`
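
As a rough illustration of how shuffle, seed and loop interact, the sketch below re-creates the described behaviour on a plain list of rows. The function iter_rows is hypothetical and only mirrors the parameter descriptions above; it is not edsnlp's implementation, and the order edsnlp produces for a given seed will not match it.

```python
import itertools
import random


def iter_rows(rows, shuffle=False, seed=None, loop=False):
    """Hypothetical sketch of the shuffle / seed / loop semantics."""
    rng = random.Random(seed)
    while True:
        epoch = list(rows)
        if shuffle == "dataset":
            # Shuffle the whole dataset at the start of each epoch.
            rng.shuffle(epoch)
        yield from epoch
        if not loop:
            break


rows = [{"note_id": i} for i in range(5)]

# Without shuffling, rows come out once, in their original order.
plain = [r["note_id"] for r in iter_rows(rows)]

# The same seed gives the same shuffled order.
run_a = [r["note_id"] for r in iter_rows(rows, shuffle="dataset", seed=42)]
run_b = [r["note_id"] for r in iter_rows(rows, shuffle="dataset", seed=42)]

# With loop=True the iterator never ends, so take a fixed number of items.
looped = list(itertools.islice(iter_rows(rows, loop=True), 12))
```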

Writing to a Polars DataFrame

edsnlp.data.to_polars writes a list of documents (or a Stream) to a Polars DataFrame.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)

doc = nlp("My document with entities")

edsnlp.data.to_polars([doc], converter="omop")

Parameters

PARAMETER	DESCRIPTION
`data`	The data to write (either a list of documents or a Stream). TYPE: `Union[Any, Stream]`
`dtypes`	Dictionary of column names to dtypes. This is passed to the schema parameter of `pl.from_dicts`. TYPE: `Optional[dict]` DEFAULT: `None`
`converter`	Converter to use to convert the documents to dictionary objects before storing them in the dataframe. These are documented on the Converters page. TYPE: `Optional[Union[str, Callable]]` DEFAULT: `None`
`execute`	Whether to execute the writing operation immediately or to return a Stream. TYPE: `bool` DEFAULT: `True`
`kwargs`	Additional keyword arguments to pass to the converter. These are documented on the Converters page. DEFAULT: `{}`