Data connectors

We provide connectors for reading data from, and writing data to, several formats.

Reading from a given path or object takes the following form:

import edsnlp

docs = edsnlp.data.read_{format}(  # or .from_{format} for objects
    path,  # Path to the file or directory
    converter="omop",  # How to convert JSON-like samples to Doc objects
)
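
To make the role of the converter concrete, here is a minimal standard-library sketch of the read pattern. It does not use EDS-NLP itself: the `SimpleDoc` class, the `read_jsonl` helper and the `note_id` / `note_text` field names are assumptions made for illustration, not the library's API.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterator


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Doc object: an identifier and raw text."""

    doc_id: str
    text: str


def sample_to_doc(sample: dict) -> SimpleDoc:
    """Converter: turn one JSON-like sample into a SimpleDoc (assumed field names)."""
    return SimpleDoc(doc_id=str(sample["note_id"]), text=sample["note_text"])


def read_jsonl(path: Path, converter: Callable[[dict], SimpleDoc]) -> Iterator[SimpleDoc]:
    """Read a .jsonl file lazily, applying the converter to each non-empty line."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield converter(json.loads(line))


# Create a small two-line .jsonl file, then read it back as documents.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.jsonl"
    path.write_text(
        '{"note_id": 1, "note_text": "Patient admitted for cough."}\n'
        '{"note_id": 2, "note_text": "No fever reported."}\n',
        encoding="utf-8",
    )
    docs = list(read_jsonl(path, converter=sample_to_doc))

print([d.doc_id for d in docs])
```

The point of the design is that the reader only knows about the file format, while the converter only knows about the sample schema, so either can be swapped independently.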

Writing to a given path or object takes the following form:

import edsnlp

edsnlp.data.write_{format}(  # or .to_{format} for objects
    path,  # Path to the file or directory
    docs,  # Iterable of Doc objects
    converter="omop",  # How to convert Doc objects to JSON-like samples
)
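
The write direction mirrors the read direction: a converter turns each document back into a JSON-like sample, and the writer serializes it. The following is a rough standard-library sketch under that assumption. `SimpleDoc`, `doc_to_sample` and `write_jsonl` are hypothetical names chosen for this example and are not part of EDS-NLP.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Doc object: an identifier and raw text."""

    doc_id: str
    text: str


def doc_to_sample(doc: SimpleDoc) -> dict:
    """Converter: turn a SimpleDoc into a JSON-like sample (assumed field names)."""
    return {"note_id": doc.doc_id, "note_text": doc.text}


def write_jsonl(
    path: Path,
    docs: Iterable[SimpleDoc],
    converter: Callable[[SimpleDoc], dict],
) -> int:
    """Write one JSON object per line and return the number of documents written."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(converter(doc), ensure_ascii=False) + "\n")
            count += 1
    return count


docs = [
    SimpleDoc("1", "Patient admitted for cough."),
    SimpleDoc("2", "No fever reported."),
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.jsonl"
    n_written = write_jsonl(path, docs, converter=doc_to_sample)
    lines = path.read_text(encoding="utf-8").splitlines()

# Parse the first written line back to check the round trip.
first = json.loads(lines[0])
```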

The overall process is illustrated in the following diagram:

[Figure: Data connectors overview]

At the moment, we support the following data sources:

Source            Description
JSON              .json and .jsonl files
Standoff & BRAT   .ann and .txt files
Pandas            Pandas DataFrame objects
Spark             Spark DataFrame objects
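
As background on the Standoff & BRAT row: in the BRAT standoff format, the raw text lives in a .txt file and the annotations live in a sibling .ann file that refers to the text by character offsets. The sketch below parses only contiguous text-bound annotations (lines starting with `T`) using plain Python. The `Entity` type and `parse_text_bound` function are hypothetical helpers for illustration and say nothing about how EDS-NLP implements its own reader.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """One text-bound annotation: id, label, character span and covered text."""

    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(ann: str) -> List[Entity]:
    """Parse contiguous text-bound annotations from the content of a .ann file."""
    entities = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes and blank lines
        ann_id, span, text = line.split("\t", 2)
        if ";" in span:
            continue  # discontinuous spans are out of scope for this sketch
        label, start, end = span.split(" ")
        entities.append(Entity(ann_id, label, int(start), int(end), text))
    return entities


# Content of a hypothetical note.txt / note.ann pair.
txt = "Patient has asthma and takes salbutamol."
ann = "T1\tDisease 12 18\tasthma\nT2\tDrug 29 39\tsalbutamol\n"

entities = parse_text_bound(ann)
for e in entities:
    # Each offset pair should slice the annotated text out of the .txt content.
    print(e.label, txt[e.start:e.end])
```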

and the following schemas:

Schema     Shorthand
OMOP       omop
Standoff   standoff