`edsnlp.data.base`

`from_iterable` [source]

The IterableReader (or edsnlp.data.from_iterable) reads a list of Python objects (texts, dictionaries, ...) and yields documents by passing them through the converter if one is given, or yields them as is otherwise.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc_iterator = edsnlp.data.from_iterable([{...}], nlp=nlp, converter=...)
annotated_docs = nlp.pipe(doc_iterator)

Generator vs list

edsnlp.data.from_iterable returns a Stream. To iterate over the documents multiple times efficiently or to access them by index, you must convert it to a list:

docs = list(edsnlp.data.from_iterable([{...}], converter=...))
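The trade-off between a lazy object and a list can be illustrated without edsnlp. The sketch below uses a plain Python class, `LazyDocs`, which is a hypothetical stand-in and not the library's Stream: it redoes its conversion work on every pass, whereas a list built from it does the work once and supports indexing.

```python
class LazyDocs:
    """Hypothetical stand-in for a lazy stream: work is redone on each pass."""

    def __init__(self, texts):
        self.texts = texts
        self.conversions = 0  # counts how many items have been converted so far

    def __iter__(self):
        for text in self.texts:
            self.conversions += 1  # simulate the cost of converting one item
            yield {"text": text}


lazy = LazyDocs(["first note", "second note"])

# Two passes over the lazy object convert every item twice.
for _ in lazy:
    pass
for _ in lazy:
    pass
conversions_after_two_passes = lazy.conversions

# Materializing once gives a list that can be re-read and indexed
# without triggering any further conversion.
docs = list(lazy)
second_doc = docs[1]
```

The cost of the list is memory: every item is held at once, which may not be acceptable for very large or unbounded sources.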

Parameters

PARAMETER	DESCRIPTION
`data`	The data to read TYPE: `Any`
`converter`	Converters to use to convert the JSON rows of the data source to Doc objects TYPE: `Optional[AsList[Union[str, Callable]]]` DEFAULT: `None`
`read_in_worker`	In multiprocessing mode, whether to read the data in the worker processes. If `True`, the data will be read in the worker processes, which requires pickling the input iterable: this is mostly useful if the pickled iterable is smaller than the data itself (e.g., an infinite generator of synthetic data). If `False`, the data will be read in the main process and distributed to the workers. TYPE: `bool` DEFAULT: `False`
`kwargs`	Additional keyword arguments to pass to the converter. These are documented on the Converters page. DEFAULT: `{}`
`shuffle`	Whether to shuffle the data. If "dataset", the whole dataset will be shuffled before starting iterating on it (at the start of every epoch if looping). TYPE: `Literal['dataset', False]` DEFAULT: `False`
`seed`	The seed to use for shuffling. TYPE: `Optional[int]` DEFAULT: `None`
`loop`	Whether to loop over the data indefinitely. TYPE: `bool` DEFAULT: `False`
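The size argument behind `read_in_worker` can be checked with the standard library alone. The sketch below is only an illustration of why a pickled lazy iterable can be much smaller than the data it produces; it uses a built-in `range` as the lazy source and does not call edsnlp or reproduce how the library ships data to its workers.

```python
import pickle

# A range is a lazy iterable: it stores only start, stop and step.
lazy_source = range(1_000_000)

# Pickling the lazy iterable serializes only that small state ...
pickled_iterable = pickle.dumps(lazy_source)
# ... whereas pickling the materialized data serializes every item.
pickled_data = pickle.dumps(list(lazy_source))

size_iterable = len(pickled_iterable)
size_data = len(pickled_data)

# The restored iterable still produces the same items.
restored = pickle.loads(pickled_iterable)
```

When the input is already a fully materialized list, the pickled iterable and the data are the same size, so reading in the workers brings no such saving.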

RETURNS	DESCRIPTION
`Stream`
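How `shuffle`, `seed` and `loop` interact, as described in the table above, can be sketched in plain Python. The `iterate` function below is hypothetical and written only from those descriptions; it is not the library's implementation, and details such as how the random state is carried across epochs are assumptions.

```python
import itertools
import random


def iterate(data, shuffle=False, seed=None, loop=False):
    """Hypothetical sketch: yield items, reshuffling at the start of each epoch."""
    rng = random.Random(seed)
    while True:
        items = list(data)
        if shuffle == "dataset":
            rng.shuffle(items)  # whole dataset shuffled before iterating on it
        yield from items
        if not loop:
            break


data = ["a", "b", "c", "d"]

# Without looping, a single epoch is produced; the same seed gives the same order.
epoch_1 = list(iterate(data, shuffle="dataset", seed=42))
epoch_1_again = list(iterate(data, shuffle="dataset", seed=42))

# With loop=True the generator never ends, so take a bounded slice of it.
endless = iterate(data, shuffle="dataset", seed=42, loop=True)
two_epochs = list(itertools.islice(endless, 8))
```

Because a looping source never terminates, it has to be consumed with a bound (as with `islice` here) rather than converted to a list.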

`to_iterable` [source]

edsnlp.data.to_iterable returns an iterator of documents, as converted by the converter. Compared with iterating over a Stream directly, this also applies the converter to the documents, which can lower the data transfer overhead when using multiprocessing.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)

doc = nlp("My document with entities")

edsnlp.data.to_iterable([doc], converter="omop")

Parameters

PARAMETER	DESCRIPTION
`data`	The data to write (either a list of documents or a Stream). TYPE: `Union[Any, Stream]`
`converter`	Converter to use to convert the documents to dictionary objects. TYPE: `Optional[Union[str, Callable]]` DEFAULT: `None`
`kwargs`	Additional keyword arguments passed to the converter. These are documented on the Converters page. DEFAULT: `{}`
