edsnlp.processing.parallel
pipe(note, nlp, context=[], additional_spans=[], extensions=[], results_extractor=None, chunksize=100, n_jobs=-2, progress_bar=True, **pipe_kwargs)
Apply a spaCy pipe to a pandas DataFrame (note) using multiprocessing.
PARAMETER | DESCRIPTION |
---|---|
note |
A pandas DataFrame with a
TYPE:
|
nlp |
A spaCy pipe
TYPE:
|
context |
A list of columns to add to the generated spaCy document as extensions.
For instance, if
TYPE:
|
results_extractor |
Arbitrary function that extracts serialisable results from the computed
spaCy
TYPE:
|
additional_spans |
A name (or list of names) of SpanGroups on which to apply the pipe as well:
SpanGroups are available as
TYPE:
|
extensions |
Span extensions to add to the extracted results:
For instance, if
TYPE:
|
chunksize |
Batch size used to split tasks
TYPE:
|
n_jobs |
Max number of parallel jobs. The default value uses the maximum number of available cores.
TYPE:
|
progress_bar |
Whether to display a progress bar
TYPE:
|
**pipe_kwargs |
Arguments exposed in
DEFAULT:
|
RETURNS | DESCRIPTION |
---|---|
DataFrame
|
A pandas DataFrame with one row per extraction |
Source code in edsnlp/processing/parallel.py
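The interplay of `chunksize` and `n_jobs` described above can be illustrated with a small, standard-library-only sketch. It is not the code in `edsnlp/processing/parallel.py`: the helper names (`resolve_n_jobs`, `chunked`, `toy_extract`, `pipe_sketch`) and the `id` / `text` / `match` keys are hypothetical, threads stand in for worker processes to keep the example self-contained, and the handling of negative `n_jobs` assumes the joblib convention (`-1` means all cores, `-2` all cores but one), which this page does not confirm.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Turn an n_jobs value into a positive worker count.

    Assumption: negative values follow the joblib convention,
    i.e. workers = n_cpus + 1 + n_jobs.
    """
    n_cpus = n_cpus or os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs must not be 0")
    if n_jobs < 0:
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


def chunked(rows, chunksize):
    """Yield successive lists of at most `chunksize` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def toy_extract(chunk):
    """Hypothetical stand-in for running a pipeline on one chunk.

    Emits one dictionary per "extraction" (here: each capitalised word).
    """
    return [
        {"id": row["id"], "match": word}
        for row in chunk
        for word in row["text"].split()
        if word.istitle()
    ]


def pipe_sketch(rows, chunksize=100, n_jobs=-2):
    """Split rows into chunks, process them concurrently, flatten the results."""
    workers = resolve_n_jobs(n_jobs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        per_chunk = pool.map(toy_extract, chunked(rows, chunksize))
        return [extraction for result in per_chunk for extraction in result]
```

Called on a list of dictionaries, `pipe_sketch` returns a flat list with one entry per extraction, which mirrors the "one row per extraction" shape of the returned DataFrame. A smaller `chunksize` creates more, shorter tasks and spreads the load more evenly, while a larger one reduces the scheduling overhead per task.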