edsnlp.utils.collections

batch_compress_dict

Compress a sequence of dictionaries by deduplicating values that occur multiple times. The keys that share a deduplicated value are merged into a single string, using the "|" character as a separator. This preserves referential identity when the dictionary is decompressed after being serialized and deserialized.

Parameters

PARAMETER DESCRIPTION
seq

Sequence of dictionaries to compress

TYPE: Optional[Iterable[Dict[str, Any]]] DEFAULT: None
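To illustrate the key-merging idea, here is a minimal sketch for a single flat dictionary. It is not the library's implementation: `compress_flat` is a hypothetical helper, and it assumes that "the same value" means the same object (identity), not merely an equal one.

```python
from typing import Any, Dict, List


def compress_flat(d: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the keys of a flat dict that point to the same object."""
    keys_by_id: Dict[int, List[str]] = {}
    value_by_id: Dict[int, Any] = {}
    for key, value in d.items():
        # Assumption: duplicates are detected by object identity
        keys_by_id.setdefault(id(value), []).append(key)
        value_by_id[id(value)] = value
    # Join the keys sharing one object with the "|" separator
    return {"|".join(keys): value_by_id[i] for i, keys in keys_by_id.items()}


shared = [1, 2, 3]
doc = {"input_ids": shared, "labels": shared, "mask": [1, 1, 1]}
compressed = compress_flat(doc)
print(compressed)
```

The shared list is stored once under the merged key `"input_ids|labels"`, so a serializer only has to write it once.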

multi_tee

Makes copies of an iterable such that every iteration over it starts from the beginning. If the iterable is a sequence (list, tuple), it is returned as is, since every iter() call on a sequence already restarts from the beginning.
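The behavior described above can be sketched with `itertools.tee` from the standard library. The class name `RestartableIterable` is hypothetical, and this is only one possible way to get the described effect, not necessarily how the library does it.

```python
import copy
import itertools
from collections.abc import Sequence


class RestartableIterable:
    """Wrap a one-shot iterable so that each iteration starts from the first item."""

    def __new__(cls, iterable):
        # Sequences already restart on every iter(): return them unchanged
        if isinstance(iterable, Sequence):
            return iterable
        return super().__new__(cls)

    def __init__(self, iterable):
        # Keep one tee iterator that is never advanced; it buffers consumed items
        (self._master,) = itertools.tee(iterable, 1)

    def __iter__(self):
        # A copy of a tee iterator starts at the position of the original
        return copy.copy(self._master)


squares = RestartableIterable(x * x for x in range(3))
first_pass = list(squares)
second_pass = list(squares)

items = ["a", "b"]
same_object = RestartableIterable(items) is items
```

Because the master iterator is never consumed, every item seen so far stays buffered in memory, which is the usual cost of `tee`-based approaches.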

FrozenDict

Bases: dict

Copied from spacy.util.SimpleFrozenDict to ensure compatibility.

Initialize the frozen dict. Can be initialized with pre-defined values.

error (str): The error message raised when the user tries to assign to the dict.
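As a rough sketch of what a frozen dict looks like, the class below blocks assignment and reports the configured `error` message. `FrozenDictSketch` is a hypothetical name, and raising `NotImplementedError` is an assumption modeled on spaCy's `SimpleFrozenDict`.

```python
class FrozenDictSketch(dict):
    """A dict that can be read but refuses to be modified."""

    def __init__(self, *args, error: str = "Cannot modify a frozen dict.", **kwargs):
        # Pre-defined values are passed straight to the dict constructor
        super().__init__(*args, **kwargs)
        self.error = error

    def __setitem__(self, key, value):
        raise NotImplementedError(self.error)

    def pop(self, key, default=None):
        raise NotImplementedError(self.error)

    def update(self, other):
        raise NotImplementedError(self.error)


config = FrozenDictSketch({"lang": "fr"}, error="This config is read-only.")
try:
    config["lang"] = "en"
except NotImplementedError as exc:
    message = str(exc)
```

Reading `config["lang"]` still works as with a regular dict; only mutation is rejected.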

FrozenList

Bases: list

Copied from spacy.util.SimpleFrozenList to ensure compatibility.

Initialize the frozen list.

error (str): The error message raised when the user tries to mutate the list.
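The same idea applies to lists. In the sketch below, `FrozenListSketch` is a hypothetical name, and both the set of blocked methods and the `NotImplementedError` exception are assumptions for illustration.

```python
class FrozenListSketch(list):
    """A list that can be read but refuses to be mutated."""

    def __init__(self, *args, error: str = "Cannot modify a frozen list."):
        super().__init__(*args)
        self.error = error

    def _blocked(self, *args, **kwargs):
        raise NotImplementedError(self.error)

    # Route the usual mutating methods to the same error
    append = extend = insert = remove = pop = clear = sort = reverse = _blocked
    __setitem__ = __delitem__ = _blocked


pipes = FrozenListSketch(["tokenizer", "ner"], error="This list is read-only.")
try:
    pipes.append("parser")
    blocked = False
except NotImplementedError:
    blocked = True
```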

ld_to_dl

Convert a list of dictionaries to a dictionary of lists

Parameters

PARAMETER DESCRIPTION
ld

The list of dictionaries

TYPE: Iterable[Mapping[str, T]]

RETURNS DESCRIPTION
Dict[str, List[T]]

The dictionary of lists
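A minimal sketch of this conversion, assuming every dictionary has the same keys as the first one. `ld_to_dl_sketch` is a hypothetical stand-in, not the library function itself.

```python
from typing import Dict, Iterable, List, Mapping, TypeVar

T = TypeVar("T")


def ld_to_dl_sketch(ld: Iterable[Mapping[str, T]]) -> Dict[str, List[T]]:
    """Turn [{"a": 1}, {"a": 2}] into {"a": [1, 2]}."""
    rows = list(ld)
    if not rows:
        return {}
    # Assumption: the keys of the first dictionary define the columns
    return {key: [row[key] for row in rows] for key in rows[0]}


columns = ld_to_dl_sketch([
    {"text": "a", "label": 0},
    {"text": "b", "label": 1},
])
print(columns)
```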

dl_to_ld

Convert a dictionary of lists to a list of dictionaries

Parameters

PARAMETER DESCRIPTION
dl

The dictionary of lists

TYPE: Mapping[str, Sequence[Any]]

RETURNS DESCRIPTION
List[Dict[str, Any]]

The list of dictionaries
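The reverse conversion can be sketched with `zip`, assuming all lists have the same length (`zip` stops at the shortest one). `dl_to_ld_sketch` is a hypothetical stand-in, not the library function itself.

```python
from typing import Any, Dict, List, Mapping, Sequence


def dl_to_ld_sketch(dl: Mapping[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn {"a": [1, 2]} into [{"a": 1}, {"a": 2}]."""
    # zip(*values) walks the columns in parallel, one row at a time
    return [dict(zip(dl.keys(), row)) for row in zip(*dl.values())]


rows = dl_to_ld_sketch({"text": ["a", "b"], "label": [0, 1]})
print(rows)
```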

decompress_dict

Decompress a dictionary of lists into a sequence of dictionaries. This function assumes that the dictionary structure was obtained using the batch_compress_dict class. Keys that were merged into a single string using the "|" character as a separator will be split into a nested dictionary structure.

Parameters

PARAMETER DESCRIPTION
seq

The dictionary to decompress or a sequence of dictionaries to decompress

TYPE: Union[Iterable[Dict[str, Any]], Dict[str, Any]]
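The sketch below shows why splitting merged keys restores referential identity after a serialization round trip. It handles only a single flat dictionary and does not build the nested structure mentioned above; `decompress_flat` is a hypothetical helper, and JSON is used purely as an example serializer.

```python
import json
from typing import Any, Dict


def decompress_flat(compressed: Dict[str, Any]) -> Dict[str, Any]:
    """Give every key of a "|"-merged entry the same value object."""
    result: Dict[str, Any] = {}
    for merged_key, value in compressed.items():
        for key in merged_key.split("|"):
            result[key] = value
    return result


# A compressed dictionary, as it might look before serialization
compressed = {"input_ids|labels": [1, 2, 3], "mask": [1, 1, 1]}

# Serializing the uncompressed form would duplicate the shared list...
shared = [1, 2, 3]
naive = json.loads(json.dumps({"input_ids": shared, "labels": shared}))

# ...while the compressed form brings back a single shared object
restored = decompress_flat(json.loads(json.dumps(compressed)))
```

After the round trip, `naive["input_ids"]` and `naive["labels"]` are two distinct lists, whereas `restored["input_ids"]` and `restored["labels"]` are the same object.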

batchify

Yields batches that contain at most batch_size elements. If a single item contains more than batch_size elements, it is yielded as a batch of its own.

Parameters

PARAMETER DESCRIPTION
iterable

TYPE: Iterable[T]

batch_size

TYPE: int

drop_last

TYPE: bool DEFAULT: False
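A simplified sketch of batching, in which each item counts as exactly one element. `batchify_sketch` is a hypothetical stand-in, and it assumes that `drop_last=True` discards a trailing batch smaller than `batch_size`, which is the conventional meaning of that flag.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batchify_sketch(
    iterable: Iterable[T], batch_size: int, drop_last: bool = False
) -> Iterator[List[T]]:
    """Yield lists of at most batch_size consecutive items."""
    batch: List[T] = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Assumption: an incomplete final batch is kept unless drop_last is set
    if batch and not drop_last:
        yield batch


all_batches = list(batchify_sketch(range(5), batch_size=2))
full_batches = list(batchify_sketch(range(5), batch_size=2, drop_last=True))
```

Since the function is a generator, it also works on one-shot or unbounded iterables without loading them into memory.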