edsnlp.data.converters

Converters are used to convert documents between Python dictionaries and Doc objects. There are two types of converters: readers, which convert dictionaries to Doc objects, and writers, which convert Doc objects to dictionaries.

AttributesMappingArg

Bases: Validated

A span attribute mapping (a list may be passed instead to keep the same names).

For instance:

  • doc_attributes="note_datetime" will map the note_datetime JSON attribute to the note_datetime extension.
  • span_attributes=["negation", "family"] will map the negation and family JSON attributes to the negation and family extensions.
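As a sketch of how the two forms line up, a list is equivalent to an identity mapping. The normalization function below is illustrative only, not edsnlp's actual validator:

```python
# Illustrative sketch (not edsnlp's actual AttributesMappingArg validator):
# a string or list form is normalized into the mapping form with identical names.
def normalize_attributes_mapping(value):
    """Accept a dict {source: extension}, a list of names, or a single name."""
    if isinstance(value, str):
        value = [value]
    if isinstance(value, (list, tuple)):
        return {name: name for name in value}
    return dict(value)

print(normalize_attributes_mapping("note_datetime"))
# → {'note_datetime': 'note_datetime'}
print(normalize_attributes_mapping(["negation", "family"]))
# → {'negation': 'negation', 'family': 'family'}
```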

StandoffDict2DocConverter [source]

Why does BRAT/Standoff need a converter?

You may wonder: why do I need a converter? Since BRAT is already an NLP-oriented format, it should be straightforward to convert it to a Doc object.

Indeed, we do provide a default converter for the BRAT standoff format, but we also acknowledge that there may be more than one way to convert a standoff document to a Doc object. For instance, an annotated span may be used to represent a relation between two smaller included entities, or another entity scope, etc.

In such cases, we recommend you use a custom converter as described here.
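For example, one might pre-process the raw annotations before the default conversion, reinterpreting a wide annotated span as an attribute on the entities it covers. The sketch below is purely illustrative: the dict shape and the negation_scope label are assumptions for the example, not the exact edsnlp standoff schema.

```python
# Hypothetical sketch: reinterpret a BRAT span labeled "negation_scope" as a
# "negated" attribute on the entities it covers, before building a Doc.
def mark_negated_entities(record):
    scopes = [e for e in record["entities"] if e["label"] == "negation_scope"]
    entities = [e for e in record["entities"] if e["label"] != "negation_scope"]
    for ent in entities:
        # An entity is negated if it lies entirely inside some negation scope
        ent["negated"] = any(
            s["begin"] <= ent["begin"] and ent["end"] <= s["end"] for s in scopes
        )
    return {**record, "entities": entities}

record = {
    "text": "no sign of pneumonia",
    "entities": [
        {"label": "negation_scope", "begin": 0, "end": 20},
        {"label": "disease", "begin": 11, "end": 20},
    ],
}
print(mark_negated_entities(record)["entities"])
# → [{'label': 'disease', 'begin': 11, 'end': 20, 'negated': True}]
```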

Examples

# Any kind of reader (`edsnlp.data.read/from_...`) can be used here
docs = edsnlp.data.read_standoff(
    "path/to/standoff",
    converter="standoff",  # set by default

    # Optional parameters
    tokenizer=tokenizer,
    span_setter={"ents": True, "*": True},
    span_attributes={"negation": "negated"},
    keep_raw_attribute_values=False,
    default_attributes={"negated": False, "temporality": "present"},
)

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline object (optional and likely not needed; prefer to pass the tokenizer argument directly instead).

tokenizer

The tokenizer instance used to tokenize the documents. Likely not needed since by default it uses the current context tokenizer:

  • the tokenizer of the next pipeline run by .map_pipeline in a Stream.
  • or the eds tokenizer by default.

TYPE: Optional[Tokenizer] DEFAULT: None

span_setter

The span setter to use when setting the spans in the documents. Defaults to setting the spans in the ents attribute, and creates a new span group for each JSON entity label.

TYPE: SpanSetterArg DEFAULT: {'ents': True, '*': True}

span_attributes

Mapping from BRAT attributes to Span extensions (can be a list too). By default, all attributes are imported as Span extensions with the same name.

TYPE: Optional[AttributesMappingArg] DEFAULT: None

keep_raw_attribute_values

Whether to keep the raw attribute values (as strings) or to convert them to Python objects (e.g. booleans).

TYPE: bool DEFAULT: False

default_attributes

How to set attributes on spans for which no attribute value was found in the input format. This is especially useful for negation, or frequent attribute values (e.g. "negated" is often False, "temporality" is often "present"), that annotators may not want to annotate every time.

TYPE: AttributesMappingArg DEFAULT: {}

notes_as_span_attribute

If set, the AnnotatorNote annotations will be concatenated and stored in a span attribute with this name.

TYPE: Optional[str] DEFAULT: None

split_fragments

Whether to split the fragments into separate spans or not. If set to False, the fragments will be concatenated into a single span.

TYPE: bool DEFAULT: True
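As an illustration of the keep_raw_attribute_values option above, here is a hedged sketch of the kind of string-to-Python coercion that option disables (the exact rules applied by edsnlp may differ):

```python
# Illustrative sketch of attribute-value coercion, as applied when
# keep_raw_attribute_values=False. The actual edsnlp rules may differ.
def coerce_attribute(value):
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("none", "null"):
        return None
    try:
        return int(value)
    except ValueError:
        return value  # keep as string when nothing else applies

print(coerce_attribute("True"))     # → True (a bool, not the string "True")
print(coerce_attribute("3"))        # → 3
print(coerce_attribute("present"))  # → present
```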

StandoffDoc2DictConverter [source]

Examples

# Any kind of writer (`edsnlp.data.write/to_...`) can be used here
edsnlp.data.write_standoff(
    docs,
    converter="standoff",  # set by default

    # Optional parameters
    span_getter={"ents": True},
    span_attributes=["negation"],
)
# or docs.to_standoff(...) if docs is already a
# Stream (edsnlp.core.stream.Stream)

Parameters

PARAMETER DESCRIPTION
span_getter

The span getter to use when getting the spans from the documents. Defaults to getting the spans in the ents attribute.

TYPE: Optional[SpanGetterArg] DEFAULT: {'ents': True}

span_attributes

Mapping from Span extensions to JSON attributes (can be a list too). By default, no attribute is exported, except note_id.

TYPE: AttributesMappingArg DEFAULT: {}

ConllDict2DocConverter [source]

TODO

OmopDict2DocConverter [source]

Examples

# Any kind of reader (`edsnlp.data.read/from_...`) can be used here
docs = edsnlp.data.from_pandas(
    df,
    converter="omop",

    # Optional parameters
    tokenizer=tokenizer,
    doc_attributes=["note_datetime"],

    # Parameters below should only matter if you plan to import entities
    # from the dataframe. If the data doesn't contain pre-annotated
    # entities, you can ignore these.
    span_setter={"ents": True, "*": True},
    span_attributes={"negation": "negated"},
    default_attributes={"negated": False, "temporality": "present"},
)

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline object (optional and likely not needed; prefer to pass the tokenizer argument directly instead).

tokenizer

The tokenizer instance used to tokenize the documents. Likely not needed since by default it uses the current context tokenizer:

  • the tokenizer of the next pipeline run by .map_pipeline in a Stream.
  • or the eds tokenizer by default.

TYPE: Optional[Tokenizer] DEFAULT: None

span_setter

The span setter to use when setting the spans in the documents. Defaults to setting the spans in the ents attribute, and creates a new span group for each JSON entity label.

TYPE: SpanSetterArg DEFAULT: {'ents': True, '*': True}

doc_attributes

Mapping from JSON attributes to additional Doc extensions (can be a list too). By default, all attributes are imported as Doc extensions with the same name.

TYPE: AttributesMappingArg DEFAULT: {'note_datetime': 'note_datetime'}

span_attributes

Mapping from JSON attributes to Span extensions (can be a list too). By default, all attributes are imported as Span extensions with the same name.

TYPE: Optional[AttributesMappingArg] DEFAULT: None

default_attributes

How to set attributes on spans for which no attribute value was found in the input format. This is especially useful for negation, or frequent attribute values (e.g. "negated" is often False, "temporality" is often "present"), that annotators may not want to annotate every time.

TYPE: AttributesMappingArg DEFAULT: {}

OmopDoc2DictConverter [source]

Examples

# Any kind of writer (`edsnlp.data.write/to_...`) can be used here
df = edsnlp.data.to_pandas(
    docs,
    converter="omop",

    # Optional parameters
    span_getter={"ents": True},
    doc_attributes=["note_datetime"],
    span_attributes=["negation", "family"],
)
# or docs.to_pandas(...) if docs is already a
# Stream (edsnlp.core.stream.Stream)

Parameters

PARAMETER DESCRIPTION
span_getter

The span getter to use when getting the spans from the documents. Defaults to getting the spans in the ents attribute.

TYPE: SpanGetterArg DEFAULT: {'ents': True}

doc_attributes

Mapping from Doc extensions to JSON attributes (can be a list too). By default, no doc attribute is exported, except note_id.

TYPE: AttributesMappingArg DEFAULT: {}

span_attributes

Mapping from Span extensions to JSON attributes (can be a list too). By default, no attribute is exported.

TYPE: AttributesMappingArg DEFAULT: {}

EntsDoc2DictConverter [source]

Parameters

PARAMETER DESCRIPTION
span_getter

The span getter to use when getting the spans from the documents. Defaults to getting the spans in the ents attribute.

TYPE: SpanGetterArg DEFAULT: {'ents': True}

doc_attributes

Mapping from Doc extensions to JSON attributes (can be a list too). By default, no doc attribute is exported, except note_id.

TYPE: AttributesMappingArg DEFAULT: {}

span_attributes

Mapping from Span extensions to JSON attributes (can be a list too). By default, no attribute is exported.

TYPE: AttributesMappingArg DEFAULT: {}

MarkupToDocConverter [source]

Examples

import edsnlp

# Any kind of reader (`edsnlp.data.read/from_...`) can be used here
# If input items are dicts, the converter expects a "text" key/column.
docs = list(
    edsnlp.data.from_iterable(
        [
            "This [is](VERB negation=True) not a [test](NOUN).",
            "This is another [test](NOUN).",
        ],
        converter="markup",
        span_setter="entities",
    ),
)
print(docs[0].spans["entities"])
# Out: [is, test]

You can also use it directly on a string:

from edsnlp.data.converters import MarkupToDocConverter

converter = MarkupToDocConverter(
    span_setter={"verb": "VERB", "noun": "NOUN"},
    preset="xml",
)
doc = converter("This <VERB negation=True>is</VERB> not a <NOUN>test</NOUN>.")
print(doc.spans["verb"])
# Out: [is]
print(doc.spans["verb"][0]._.negation)
# Out: True

Parameters

PARAMETER DESCRIPTION
preset

The preset to use for the markup format. Defaults to "md" (Markdown-like syntax). Use "xml" for XML-like syntax.

TYPE: Literal['md', 'xml'] DEFAULT: 'md'

opener

The regex pattern to match the opening tag of the markup. Defaults to the preset's opener.

TYPE: Optional[str] DEFAULT: None

closer

The regex pattern to match the closing tag of the markup. Defaults to the preset's closer.

TYPE: Optional[str] DEFAULT: None

tokenizer

The tokenizer instance used to tokenize the documents. Likely not needed since by default it uses the current context tokenizer:

  • the tokenizer of the next pipeline run by .map_pipeline in a Stream.
  • or the eds tokenizer by default.

TYPE: Optional[Tokenizer] DEFAULT: None

span_setter

The span setter to use when setting the spans in the documents. Defaults to setting the spans in the ents attribute and creates a new span group for each JSON entity label.

TYPE: SpanSetterArg DEFAULT: {'ents': True, '*': True}

span_attributes

Mapping from markup attributes to Span extensions (can be a list too). By default, all attributes are imported as Span extensions with the same name.

TYPE: Optional[AttributesMappingArg] DEFAULT: None

keep_raw_attribute_values

Whether to keep the raw attribute values (as strings) or to convert them to Python objects (e.g. booleans).

TYPE: bool DEFAULT: False

default_attributes

How to set attributes on spans for which no attribute value was found in the input format. This is especially useful for negation, or frequent attribute values (e.g. "negated" is often False, "temporality" is often "present"), that annotators may not want to annotate every time.

TYPE: AttributesMappingArg DEFAULT: {}

bool_attributes

List of boolean attributes to set to False by default. This is useful for attributes that are often left unannotated but for which you still want a default value.

TYPE: AsList[str] DEFAULT: []
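To make the opener/closer idea concrete, here is a simplified, stand-alone sketch of md-preset parsing. The regex and attribute handling below are assumptions for illustration; the converter's actual patterns are more general:

```python
import re

# Simplified sketch of md-preset parsing: extract (text, label, attributes)
# triples from "[text](LABEL key=value ...)" markup. Not edsnlp's actual regex.
PATTERN = re.compile(r"\[([^\]]+)\]\(([A-Za-z_]+)((?:\s+\w+=\S+)*)\)")

def extract_spans(markup):
    spans = []
    for m in PATTERN.finditer(markup):
        text, label, raw_attrs = m.group(1), m.group(2), m.group(3)
        # "negation=True" pairs become a {name: raw string value} dict
        attrs = dict(pair.split("=", 1) for pair in raw_attrs.split())
        spans.append((text, label, attrs))
    return spans

print(extract_spans("This [is](VERB negation=True) not a [test](NOUN)."))
# → [('is', 'VERB', {'negation': 'True'}), ('test', 'NOUN', {})]
```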

DocToMarkupConverter [source]

Convert a Doc to a string with inline markup.

This is the inverse of MarkupToDocConverter. It renders selected spans as either Markdown-like tags ([text](LABEL key=val ...)) or XML-like tags (<LABEL key=val ...>text</LABEL>).
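As a stand-alone illustration of this rendering, including the default_attributes filtering described below, the sketch uses plain (begin, end, label, attrs) tuples in place of Doc spans; it is not the converter's actual implementation:

```python
# Illustrative md-style rendering: plain tuples stand in for Doc spans.
def render_md(text, spans, defaults=None):
    defaults = defaults or {}
    out, cursor = [], 0
    for begin, end, label, attrs in sorted(spans):
        # Attributes equal to their default value are omitted from the output
        shown = {k: v for k, v in attrs.items() if defaults.get(k) != v}
        tag = " ".join([label, *(f"{k}={v}" for k, v in shown.items())])
        out.append(text[cursor:begin])
        out.append(f"[{text[begin:end]}]({tag})")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

print(render_md(
    "This is not a test.",
    [(5, 7, "VERB", {"negation": True}), (14, 18, "NOUN", {})],
    defaults={"negation": False},
))
# → This [is](VERB negation=True) not a [test](NOUN).
```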

Parameters

PARAMETER DESCRIPTION
span_getter

Which spans to render from the document.

TYPE: SpanGetterArg DEFAULT: {"ents": True}

span_attributes

Mapping from Span extensions (or builtins like label_, kb_id_) to attribute names in the rendered markup. Only attributes with a non-None value are emitted.

TYPE: AttributesMappingArg DEFAULT: {}

default_attributes

When an attribute equals its provided default value, it is omitted from the output (e.g., avoid printing negated=False when False is the default).

TYPE: AttributesMappingArg DEFAULT: {}

preset

Output syntax. "md" produces the Markdown-like form, "xml" the XML-like form.

TYPE: Literal['md', 'xml'] DEFAULT: "md"

HfTextDict2DocConverter [source]

Converter for HuggingFace datasets where each example is a single text field.

This converter expects the dataset examples to contain a single column with the document text (default: "text"). It tokenizes the text using the provided tokenizer (or the current context tokenizer) and returns a Doc object. If the example contains an id column (default: "id") it will be stored as doc._.note_id.

Examples

import edsnlp

docs = edsnlp.data.from_huggingface_dataset(
    "wikimedia/wikipedia",
    name="20231101.ady",
    split="train",
    converter="hf_text",
    id_column="id",
    text_column="text",
)

Parameters

PARAMETER DESCRIPTION
tokenizer

The tokenizer instance used to tokenize the documents. Likely not needed since by default it uses the current context tokenizer:

  • the tokenizer of the next pipeline run by .map_pipeline in a Stream.
  • or the eds tokenizer by default.

TYPE: Optional[Tokenizer] DEFAULT: None

text_column

Column name containing the document text.

TYPE: str

id_column

Column name containing the document id.

TYPE: Optional[str] DEFAULT: None

HfTextDoc2DictConverter [source]

Doc -> dict converter for simple text datasets.

Outputs a dict with the configured id_column and text_column.

HfNerDict2DocConverter [source]

Converter for HuggingFace NER datasets (e.g., WikiNER, CoNLL-2003).

Examples

import edsnlp

docs = edsnlp.data.from_huggingface_dataset(
    "lhoestq/conll2003",
    split="train",
    id_column="id",
    words_column="tokens",
    ner_tags_column="ner_tags",
    tag_order=[
        "O",
        "B-PER",
        "I-PER",
        "B-ORG",
        "I-ORG",
        "B-LOC",
        "I-LOC",
        "B-MISC",
        "I-MISC",
    ],
    converter="hf_ner",
)

Parameters

PARAMETER DESCRIPTION
tokenizer

Optional tokenizer.

TYPE: Optional[Tokenizer] DEFAULT: None

words_column

Column with token words.

TYPE: str

ner_tags_column

Column with token-level tags.

TYPE: str

id_column

Column to use for doc id.

TYPE: Optional[str] DEFAULT: None

tag_map

Mapping from tag ids to labels. If provided, it is used as-is. If not, you may pass tag_order, a sequence of labels (e.g. ['O', 'B-PER', 'I-PER', ...]), used to build the mapping as {i: label for i, label in enumerate(tag_order)}. If neither is provided, tag ids are stringified.

TYPE: Optional[Mapping[Any, str]] DEFAULT: None

tag_order

Optional sequence of labels used to build tag_map when tag_map is not provided.

TYPE: Optional[Sequence[str]] DEFAULT: None

span_setter

Span setter (defaults to {"ents": True}).

TYPE: Optional[SpanSetterArg] DEFAULT: None
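The tag_order fallback described under tag_map amounts to building the mapping by enumeration:

```python
# Building tag_map from tag_order, as described for the tag_map parameter.
tag_order = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]
tag_map = {i: label for i, label in enumerate(tag_order)}
print(tag_map[1])
# → B-PER
```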

HfNerDoc2DictConverter [source]

Doc -> dict converter for token-level NER datasets used by HuggingFace.

Produces a dict with token list in words_column, token tags in ner_tags_column, and an identifier in id_column.