Polars
TLDR
import edsnlp
docs = edsnlp.data.from_polars(df, converter="omop")
docs = docs.map_pipeline(nlp)
res = edsnlp.data.to_polars(docs, converter="omop")
We provide methods to read and write documents (raw or annotated) from and to Polars DataFrames.
As an example, imagine that we have the following OMOP dataframe (we'll name it note_df):
note_id | note_text | note_datetime |
---|---|---|
0 | Le patient est admis pour une pneumopathie... | 2021-10-23 |
Reading from a Polars DataFrame
The PolarsReader (or edsnlp.data.from_polars) reads from a table and yields documents. At the moment, only entities and attributes are loaded. Relations and events are not supported.
Example
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc_iterator = edsnlp.data.from_polars(df, nlp=nlp, converter="omop")
annotated_docs = nlp.pipe(doc_iterator)
Generator vs list
edsnlp.data.from_polars returns a LazyCollection. To iterate over the documents several times efficiently, or to access them by index, convert it to a list first:
docs = list(edsnlp.data.from_polars(df, converter="omop"))
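To show why materializing matters, here is a small standard-library sketch. RecomputingCollection is a hypothetical stand-in, not EDS-NLP's LazyCollection. It assumes that each pass over a lazy collection redoes the row-to-document work, and the real class may behave differently.

```python
class RecomputingCollection:
    """Hypothetical stand-in for a lazy collection (not EDS-NLP's class)."""

    def __init__(self, rows):
        self.rows = rows
        self.conversions = 0  # number of rows converted so far

    def __iter__(self):
        for row in self.rows:
            self.conversions += 1  # the work is redone on every pass
            yield row["note_text"].upper()  # placeholder for building a document


rows = [
    {"note_id": 0, "note_text": "first note"},
    {"note_id": 1, "note_text": "second note"},
]
lazy_docs = RecomputingCollection(rows)

# Two passes over the lazy object convert every row twice.
for _ in range(2):
    for doc in lazy_docs:
        pass
conversions_after_two_passes = lazy_docs.conversions

# Materializing once allows indexing and repeated iteration at no extra cost.
docs = list(lazy_docs)
second_doc = docs[1]
for _ in range(2):
    for doc in docs:
        pass
conversions_after_list = lazy_docs.conversions
```

In this sketch the two lazy passes convert four rows in total, while the list costs one more pass (two rows) and can then be reused freely.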
Parameters
PARAMETER | DESCRIPTION |
---|---|
data | Polars object TYPE: |
converter | Converter to use to convert the rows of the DataFrame (represented as dicts) to Doc objects. These are documented on the Converters page. TYPE: |
kwargs | Additional keyword arguments to pass to the converter. These are documented on the Converters page. DEFAULT: |
RETURNS | DESCRIPTION |
---|---|
LazyCollection | |
Writing to a Polars DataFrame
edsnlp.data.to_polars writes documents (a list of documents or a LazyCollection) to a Polars DataFrame.
Example
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc = nlp("My document with entities")
edsnlp.data.to_polars([doc], converter="omop")
Parameters
PARAMETER | DESCRIPTION |
---|---|
data | The data to write (either a list of documents or a LazyCollection). TYPE: |
dtypes | Dictionary of column names to dtypes. This is passed to the schema parameter of TYPE: |
converter | Converter to use to convert the documents to dictionary objects before storing them in the dataframe. These are documented on the Converters page. TYPE: |
kwargs | Additional keyword arguments to pass to the converter. These are documented on the Converters page. DEFAULT: |