BRAT and Standoff
TLDR
import edsnlp
doc_iterator = edsnlp.data.read_standoff(path)
res = edsnlp.data.write_standoff(docs, path)
You can easily integrate BRAT into your project by using EDS-NLP's BRAT reader and writer.
BRAT annotations are in the standoff format. Consider the following document:
Le patient est admis pour une pneumopathie au coronavirus.
On lui prescrit du paracétamol.
BRAT annotations are stored in a separate file formatted as follows:
T1	Patient 3 10	patient
T2	Disease 30 57	pneumopathie au coronavirus
T3	Drug 78 89	paracétamol
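As a minimal sketch (plain Python, not part of EDS-NLP), the snippet below builds entity lines for the example document and checks that each pair of offsets recovers the annotated text. It assumes the usual BRAT convention of zero-based character offsets with an exclusive end, and tab-separated fields. `make_entity_line` is a hypothetical helper introduced for illustration.

```python
TEXT = (
    "Le patient est admis pour une pneumopathie au coronavirus.\n"
    "On lui prescrit du paracétamol."
)


def make_entity_line(idx: int, label: str, surface: str, text: str) -> str:
    """Build one text-bound annotation line for the first occurrence of `surface`."""
    start = text.index(surface)  # zero-based offset of the first character
    end = start + len(surface)   # exclusive end offset (assumed BRAT convention)
    return f"T{idx}\t{label} {start} {end}\t{surface}"


entities = [
    ("Patient", "patient"),
    ("Disease", "pneumopathie au coronavirus"),
    ("Drug", "paracétamol"),
]
lines = [
    make_entity_line(i, label, surface, TEXT)
    for i, (label, surface) in enumerate(entities, start=1)
]

for line in lines:
    # Parse the line back and check that the offsets slice out the same text
    _, span, surface = line.split("\t")
    _, start, end = span.split(" ")
    assert TEXT[int(start):int(end)] == surface
    print(line)
```

Slicing the text with the stored offsets is a quick way to detect annotation files that no longer match their text file.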
Reading Standoff files
The BratReader (or edsnlp.data.read_standoff) reads a directory of BRAT files and yields documents. At the moment, only entities and attributes are loaded. Relations and events are not supported.
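To see ahead of time which lines of an annotation file fall outside what the reader loads, one option is to count annotations by kind. The sketch below is plain Python, independent of EDS-NLP. It relies on the BRAT convention that the first character of an annotation ID encodes its kind. `count_kinds` and the sample annotations are hypothetical and made up for illustration.

```python
from collections import Counter

# First character of a BRAT annotation ID mapped to the annotation kind
KINDS = {
    "T": "entity",
    "A": "attribute",
    "M": "attribute",
    "R": "relation",
    "E": "event",
    "N": "normalization",
    "#": "note",
}


def count_kinds(ann_content: str) -> Counter:
    """Count the annotations of each kind in the content of an annotation file."""
    counts = Counter()
    for line in ann_content.splitlines():
        if line.strip():  # ignore blank lines
            counts[KINDS.get(line[0], "other")] += 1
    return counts


# Made-up sample: two entities, one attribute, one relation
SAMPLE_ANN = "\n".join([
    "T1\tPatient 0 7\tpatient",
    "T2\tDrug 8 19\tparacétamol",
    "A1\tNegated T2",
    "R1\tPrescribed Arg1:T1 Arg2:T2",
])

counts = count_kinds(SAMPLE_ANN)
# Kinds other than entities and attributes are not loaded by the reader
unsupported = {kind: n for kind, n in counts.items() if kind not in ("entity", "attribute")}
print(counts)
print(unsupported)
```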
Example
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc_iterator = edsnlp.data.read_standoff("path/to/brat/directory")
annotated_docs = nlp.pipe(doc_iterator)
Generator vs list
edsnlp.data.read_standoff returns a LazyCollection. To iterate over the documents multiple times efficiently or to access them by index, you must convert it to a list:
docs = list(edsnlp.data.read_standoff("path/to/brat/directory"))
Parameters
PARAMETER | DESCRIPTION |
---|---|
path | Path to the directory containing the BRAT files (will recursively look for files in subdirectories). TYPE: |
nlp | The pipeline instance (defaults to |
span_setter | The span setter to use when setting the spans in the documents. Defaults to setting the spans in the |
span_attributes | Mapping from BRAT |
RETURNS | DESCRIPTION |
---|---|
LazyCollection | |
Writing Standoff files
edsnlp.data.write_standoff writes a list of documents to a directory in the BRAT/Standoff format. The BRAT files will be named after the note_id attribute of the documents, and subdirectories will be created if the name contains / characters.
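The naming rule can be illustrated with a short pathlib sketch. This is not EDS-NLP's implementation: `standoff_paths` is a hypothetical helper, and the `.ann` extension is an assumption based on the usual BRAT file layout.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def standoff_paths(root: Path, note_id: str) -> Tuple[Path, Path]:
    """Return (.txt, .ann) paths for a note_id, creating parent directories."""
    base = root / note_id  # a "/" in note_id yields nested directories
    base.parent.mkdir(parents=True, exist_ok=True)
    # Append the extension to the name so that dots in note_id are preserved
    txt_path = base.with_name(base.name + ".txt")
    ann_path = base.with_name(base.name + ".ann")  # assumed extension
    return txt_path, ann_path


with tempfile.TemporaryDirectory() as tmp:
    txt_path, ann_path = standoff_paths(Path(tmp), "hospital-a/note-1")
    print(txt_path.relative_to(tmp).as_posix())
    print(ann_path.relative_to(tmp).as_posix())
```

With a note_id such as "hospital-a/note-1", both files end up in a "hospital-a" subdirectory of the output directory.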
Example
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
doc = nlp("My document with entities")
edsnlp.data.write_standoff([doc], "path/to/brat/directory")
Overwriting files
By default, write_standoff will raise an error if the directory already exists and contains files with .a* or .txt suffixes. This is to avoid overwriting existing annotations. To allow overwriting existing files, use overwrite=True.
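If you want to check an output directory yourself before writing, a guard along these lines may help. It is a sketch of the rule described above, not the library's actual check. `has_existing_annotations` is a hypothetical helper, and the recursive search through subdirectories is an assumption.

```python
import tempfile
from pathlib import Path


def has_existing_annotations(directory: Path) -> bool:
    """Return True if the directory holds files with a .txt or .a* suffix."""
    if not directory.is_dir():
        return False
    return any(
        p.is_file() and (p.suffix == ".txt" or p.suffix.startswith(".a"))
        for p in directory.rglob("*")  # assumption: subdirectories are searched too
    )


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    empty_before = has_existing_annotations(out_dir)
    # Simulate a previous export
    (out_dir / "note-1.ann").write_text("", encoding="utf-8")
    found_after = has_existing_annotations(out_dir)
    print(empty_before, found_after)
```

When the guard returns True, you can either pick another directory or pass overwrite=True to write_standoff.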
Parameters
PARAMETER | DESCRIPTION |
---|---|
data | The data to write (either a list of documents or a LazyCollection). TYPE: |
path | Path to the directory where the BRAT files will be written (subdirectories are created when note_id contains / characters). TYPE: |
span_getter | The span getter to use when listing the spans that will be exported as BRAT entities. Defaults to getting the spans in the |
span_attributes | Mapping from BRAT attributes to Span extensions. By default, no attribute will be exported. |
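overwrite | Whether to overwrite existing files in the output directory. When not set, an error is raised if the directory already contains .a* or .txt files. TYPE: |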
converter | Converter to use to convert the documents to dictionary objects. Defaults to the "standoff" format converter. TYPE: |