edsnlp.data.base
from_iterable
The IterableReader (or edsnlp.data.from_iterable) reads a list of Python objects (texts, dictionaries, ...) and yields documents by passing them through the converter if given, or returns them as is.
Example
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc_iterator = edsnlp.data.from_iterable([{...}], nlp=nlp, converter=...)
annotated_docs = nlp.pipe(doc_iterator)
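The pass-through behaviour described above can be pictured with a small pure-Python sketch. It does not use EDS-NLP: `read_items` and `to_upper` are hypothetical names invented for this illustration, and the real reader returns a Stream rather than a plain generator.

```python
def read_items(data, converter=None, **kwargs):
    """Hypothetical stand-in for the reader: convert each item if a converter is given."""
    for item in data:
        if converter is None:
            yield item  # no converter: items are returned as is
        else:
            yield converter(item, **kwargs)  # extra kwargs are forwarded to the converter


def to_upper(row, key="text"):
    # Hypothetical converter: builds a new object from a dictionary row
    return row[key].upper()


rows = [{"text": "first note"}, {"text": "second note"}]
raw = list(read_items(rows))
converted = list(read_items(rows, converter=to_upper, key="text"))
```

Here `raw` holds the original dictionaries untouched, while `converted` holds whatever the converter produced for each of them.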
Generator vs list
edsnlp.data.from_iterable returns a Stream. To iterate over the documents multiple times efficiently or to access them by index, you must convert it to a list:
docs = list(edsnlp.data.from_iterable([{...}], converter=...))
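As a rough analogy for why materializing helps, the sketch below uses a hypothetical `LazyStream` class (not part of EDS-NLP, and not a description of how Stream is implemented) that redoes its conversion work on every pass. It only illustrates the general trade-off between a lazy object and a list.

```python
class LazyStream:
    """Hypothetical lazy collection: items are recomputed on every iteration."""

    def __init__(self, data, converter):
        self.data = data
        self.converter = converter
        self.calls = 0  # counts how many conversions were performed

    def __iter__(self):
        for item in self.data:
            self.calls += 1
            yield self.converter(item)


stream = LazyStream(["a", "b", "c"], converter=str.upper)

# Iterating twice over the lazy object repeats the conversion work
first_pass = list(stream)
second_pass = list(stream)
calls_after_two_passes = stream.calls

# Materializing once gives a list that can be indexed and reused at no extra cost
docs = list(stream)
second_doc = docs[1]
calls_after_list = stream.calls
```

Once `docs` exists, indexing or looping over it again triggers no further conversions, which is the reason to build the list when the documents are needed more than once.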
Parameters
PARAMETER | DESCRIPTION |
---|---|
data | The data to read. |
converter | Converters to use to convert the input objects to Doc objects. |
read_in_worker | In multiprocessing mode, whether to read the data in the worker processes. |
kwargs | Additional keyword arguments to pass to the converter. These are documented on the Converters page. |
shuffle | Whether to shuffle the data. If "dataset", the whole dataset is shuffled before iteration starts (at the start of every epoch if looping). |
seed | The seed to use for shuffling. |
loop | Whether to loop over the data indefinitely. |
RETURNS | DESCRIPTION |
---|---|
Stream | |
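How the shuffle, seed and loop parameters interact can be sketched in plain Python. This is an assumption-laden illustration, not the library's implementation: `iterate` is a hypothetical helper, and it takes a boolean `shuffle` instead of the "dataset" value mentioned in the table.

```python
import random
from itertools import islice


def iterate(data, shuffle=False, seed=None, loop=False):
    """Hypothetical sketch of dataset-level shuffling with optional looping."""
    rng = random.Random(seed)  # seeded generator so the order is reproducible
    while True:
        items = list(data)
        if shuffle:
            rng.shuffle(items)  # reshuffle at the start of every epoch
        yield from items
        if not loop:
            break  # a single pass when looping is disabled


# Two epochs of a looping, shuffled dataset; islice cuts the infinite iterator short
two_epochs = list(islice(iterate([1, 2, 3, 4], shuffle=True, seed=0, loop=True), 8))
same_again = list(islice(iterate([1, 2, 3, 4], shuffle=True, seed=0, loop=True), 8))
single_pass = list(iterate([1, 2, 3, 4]))
```

With a fixed seed, both runs produce the same sequence, and each block of four items is a full permutation of the dataset.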
to_iterable
edsnlp.data.to_iterable returns an iterator of documents, as converted by the converter. In comparison to just iterating over a Stream, this will also apply the converter to the documents, which can lower the data transfer overhead when using multiprocessing.
Example
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc = nlp("My document with entities")
edsnlp.data.to_iterable([doc], converter="omop")
Parameters
PARAMETER | DESCRIPTION |
---|---|
data | The data to write (either a list of documents or a Stream). |
converter | Converter to use to convert the documents to dictionary objects. |
kwargs | Additional keyword arguments passed to the converter. These are documented on the Converters page. |