edsnlp.processing.distributed
pyspark_type_finder
Returns, when possible, the PySpark type of any Python object.
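To make the idea of type inference concrete, here is a minimal standard-library sketch. It is not the library's implementation: the function name `guess_spark_type`, the lookup table, and the choice to return Spark SQL type names as strings (rather than PySpark type objects) are all assumptions made for illustration. It returns `None` when no type can be inferred, which is one way to read "when possible".

```python
from datetime import date, datetime
from typing import Any, Optional

# Hypothetical lookup from Python scalar types to Spark SQL type names.
# Order matters: bool is a subclass of int, and datetime is a subclass of date,
# so the more specific type must be tested first.
_SCALARS = [
    (bool, "boolean"),
    (int, "bigint"),
    (float, "double"),
    (str, "string"),
    (datetime, "timestamp"),
    (date, "date"),
]


def guess_spark_type(obj: Any) -> Optional[str]:
    """Return a Spark SQL type name for obj, or None when none can be inferred."""
    for py_type, spark_name in _SCALARS:
        if isinstance(obj, py_type):
            return spark_name
    # For a non-empty list, infer the element type from the first item only.
    if isinstance(obj, list) and obj:
        inner = guess_spark_type(obj[0])
        return f"array<{inner}>" if inner else None
    return None
```

An empty list or an arbitrary object yields `None` here, because nothing in the value indicates which Spark type it should map to.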
pipe
Applies a spaCy pipe to a PySpark or Koalas DataFrame `note`.
Parameters
PARAMETER | DESCRIPTION |
---|---|
note | A PySpark or Koalas DataFrame with a TYPE: |
nlp | A spaCy pipe TYPE: |
context | A list of columns to add to the generated spaCy document as an extension. For instance, if TYPE: |
additional_spans | A name (or list of names) of SpanGroup to which the pipe is also applied: SpanGroup are available as TYPE: |
extensions | Span extensions to add to the extracted results: For instance, if TYPE: |
RETURNS | DESCRIPTION |
---|---|
DataFrame | A PySpark DataFrame with one row per extraction |
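
The "one row per extraction" output shape can be sketched without Spark or spaCy. Everything below is a stand-in: the `Entity` and `Document` classes, the `to_rows` helper, and the column names (`note_id`, `lexical_variant`, `label`, `start`, `end`) are assumptions chosen for illustration, not the documented output schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Entity:
    """Stand-in for an extracted span: matched text, label and character offsets."""
    text: str
    label: str
    start: int
    end: int


@dataclass
class Document:
    """Stand-in for a processed note holding its extracted entities."""
    note_id: int
    ents: List[Entity] = field(default_factory=list)


def to_rows(docs: List[Document]) -> List[Dict[str, Any]]:
    """Flatten documents into one dictionary per extracted entity."""
    return [
        {
            "note_id": doc.note_id,
            "lexical_variant": ent.text,
            "label": ent.label,
            "start": ent.start,
            "end": ent.end,
        }
        for doc in docs
        for ent in doc.ents
    ]
```

With this shape, a note containing two entities contributes two rows, and a note with no entity contributes none.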
custom_pipe
Applies a spaCy pipe to a PySpark or Koalas DataFrame `note`, using a generic callback function that converts a spaCy Doc
object into a list of dictionaries.
Parameters
PARAMETER | DESCRIPTION |
---|---|
note | A PySpark or Koalas DataFrame with a TYPE: |
nlp | A spaCy pipe TYPE: |
results_extractor | Arbitrary function that extracts serialisable results from the computed spaCy Doc. There is no requirement for all entities to provide every dictionary key. TYPE: |
dtypes | Dictionary containing all expected keys from the TYPE: |
context | A list of columns to add to the generated spaCy document as an extension. For instance, if TYPE: |
RETURNS | DESCRIPTION |
---|---|
DataFrame | A PySpark DataFrame with one row per extraction |
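
As an illustration of how `results_extractor` and `dtypes` relate, here is a hedged sketch that runs on plain Python objects. The key names, the `negated` attribute, the string type names in `dtypes`, and the `align` helper are assumptions for this example; in real use the callback would receive a spaCy Doc, custom span attributes would live under `ent._`, and `dtypes` would presumably hold PySpark types rather than strings.

```python
from typing import Any, Dict, List


def results_extractor(doc: Any) -> List[Dict[str, Any]]:
    """Convert a Doc-like object into a list of serialisable dictionaries."""
    rows = []
    for ent in doc.ents:
        row = {
            "note_id": doc.note_id,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        }
        # Optional key: only emitted when the entity carries the attribute.
        negated = getattr(ent, "negated", None)
        if negated is not None:
            row["negated"] = negated
        rows.append(row)
    return rows


# Every key the extractor may emit, with a stand-in type name for each.
dtypes = {
    "note_id": "bigint",
    "label": "string",
    "start": "int",
    "end": "int",
    "negated": "boolean",
}


def align(rows: List[Dict[str, Any]], dtypes: Dict[str, str]) -> List[Dict[str, Any]]:
    """Give every row the full set of expected keys, filling gaps with None."""
    return [{key: row.get(key) for key in dtypes} for row in rows]
```

Because `dtypes` enumerates all expected keys, rows that omit a key can still be mapped onto a fixed set of columns, with the missing values left empty.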