# edsnlp.processing.parallel
`nlp = spacy.blank('fr')` *module attribute*
### `_define_nlp(new_nlp)`

Set the global `nlp` variable. Doing it this way saves a non-negligible amount of time.

Source code in `edsnlp/processing/parallel.py`
### `_chunker(iterable, total_length, chunksize)`

Takes an iterable and chunks it.

Source code in `edsnlp/processing/parallel.py`
### `_process_chunk(note, **pipe_kwargs)`

Source code in `edsnlp/processing/parallel.py`
### `pipe(note, nlp, context=[], additional_spans='discarded', extensions=[], chunksize=100, n_jobs=-2, progress_bar=True, **pipe_kwargs)`

Function to apply a spaCy pipe to a pandas DataFrame of notes using multiprocessing.
| Parameter | Description |
|---|---|
| `note` | A pandas DataFrame with a … **TYPE:** … |
| `nlp` | A spaCy pipe. **TYPE:** … |
| `context` | A list of columns to add to the generated spaCy document as an extension. For instance, if … **TYPE:** … |
| `additional_spans` | A name (or list of names) of SpanGroups on which to apply the pipe to: SpanGroups are available as … **TYPE:** … |
| `extensions` | Span extensions to add to the extracted results: for instance, if … **TYPE:** … |
| `chunksize` | Batch size used to split tasks. **TYPE:** … |
| `n_jobs` | Max number of parallel jobs. The default value uses the maximum number of available cores. **TYPE:** … |
| `progress_bar` | Whether to display a progress bar or not. **TYPE:** … |
| `**pipe_kwargs` | Arguments exposed in … **DEFAULT:** … |

| Returns | Description |
|---|---|
| `DataFrame` | A pandas DataFrame with one line per extraction. |
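The overall flow behind `pipe` (split the rows into chunks, process each chunk in a worker, then flatten the per-chunk results into one line per extraction) can be illustrated with a self-contained sketch. Threads and the dummy `process_chunk` below are stand-ins for the library's actual multiprocessing workers and the spaCy pipeline; the names and the per-row record shape are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(rows):
    # Stand-in for applying the spaCy pipe to one chunk of notes
    # and returning one record per extraction.
    return [{"note_id": nid, "n_chars": len(text)} for nid, text in rows]

notes = [(0, "first note"), (1, "second"), (2, "third note text")]
chunksize = 2
chunks = [notes[i : i + chunksize] for i in range(0, len(notes), chunksize)]

with ThreadPoolExecutor(max_workers=2) as executor:
    # Flatten the per-chunk outputs: one line per extraction.
    results = [row for part in executor.map(process_chunk, chunks) for row in part]

print(results)
```

In the real function the flattened records would then be assembled back into a pandas DataFrame, which is what `pipe` returns.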
Source code in `edsnlp/processing/parallel.py`