
Results

Our article is available on arXiv. Below are some of the results it presents, along with interactive charts.

General results

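| Label | Precision (RB) | Precision (ML) | Precision (Hybrid) | Recall (RB) | Recall (ML) | Recall (Hybrid) | F1 (RB) | F1 (ML) | F1 (Hybrid) | Redact (RB) | Redact (ML) | Redact (Hybrid) | Full redact (RB) | Full redact (ML) | Full redact (Hybrid) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ADDRESS | 99.5 | 99.7 | 99.4 | 73.5 | 93.6 | 93.8 | 84.3 | 96.4 | 96.3 | 74.9 | 94.2 | 94.3 | 83.9 | 98.0 | 98.4 |
| BIRTHDATE | 97.8 | 98.1 | 98.2 | 78.2 | 98.5 | 98.5 | 86.6 | 98.3 | 98.3 | 98.5 | 99.8 | 99.8 | 98.7 | 99.7 | 99.7 |
| CITY | 94.9 | 97.5 | 97.3 | 41.7 | 96.4 | 96.5 | 57.3 | 96.9 | 96.9 | 41.7 | 96.5 | 96.5 | 61.4 | 98.1 | 98.2 |
| DATE | 93.9 | 99.7 | 99.7 | 95.8 | 99.3 | 99.3 | 94.9 | 99.5 | 99.5 | 96.1 | 99.6 | 99.6 | 76.7 | 95.4 | 95.4 |
| EMAIL | 99.9 | 79.4 | 90.5 | 96.8 | 66.3 | 100.0 | 98.3 | 66.2 | 93.9 | 96.8 | 66.3 | 100.0 | 98.7 | 99.6 | 99.9 |
| FIRSTNAME | 96.8 | 98.5 | 98.5 | 39.1 | 97.6 | 97.6 | 55.6 | 98.0 | 98.0 | 45.1 | 98.8 | 98.9 | 46.7 | 97.4 | 97.4 |
| LASTNAME | 89.8 | 98.3 | 98.3 | 59.3 | 98.1 | 98.6 | 71.4 | 98.2 | 98.4 | 59.8 | 99.1 | 99.6 | 47.3 | 96.4 | 97.2 |
| NSS | 98.1 | 83.9 | 83.7 | 96.6 | 97.8 | 99.2 | 97.3 | 89.8 | 90.3 | 96.6 | 99.4 | 100.0 | 99.7 | 99.9 | 100.0 |
| PATIENT ID | 99.6 | 99.3 | 99.3 | 89.7 | 95.9 | 95.9 | 94.3 | 97.5 | 97.5 | 89.7 | 98.8 | 98.8 | 93.1 | 99.1 | 99.1 |
| PHONE | 99.9 | 99.7 | 99.7 | 93.9 | 99.6 | 99.6 | 96.8 | 99.7 | 99.7 | 93.9 | 99.6 | 99.6 | 93.2 | 98.8 | 99.0 |
| VISIT ID | 98.8 | 87.3 | 87.3 | 76.9 | 81.1 | 81.3 | 85.5 | 83.7 | 83.9 | 77.1 | 81.8 | 82.0 | 97.0 | 98.1 | 98.3 |
| ZIP | 100.0 | 99.5 | 99.5 | 80.8 | 98.7 | 99.5 | 89.4 | 99.1 | 99.5 | 80.9 | 98.7 | 99.5 | 87.4 | 99.3 | 99.9 |
| ALL | 95.8 | 99.1 | 99.1 | 82.7 | 98.8 | 98.9 | 88.7 | 99.0 | 99.0 | 84.8 | 99.3 | 99.4 | 31.9 | 84.4 | 86.2 |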
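
Each row of the table above carries fifteen scores: five metrics (Precision, Recall, F1, Redact, Full redact), each reported for three systems (RB, ML, Hybrid). The following is a minimal sketch of how one might load a few rows and compare systems. It assumes the rows are available as whitespace-separated strings; `parse_row` and `gain` are illustrative helper names, not part of the project's code.

```python
METRICS = ["Precision", "Recall", "F1", "Redact", "Full redact"]
SYSTEMS = ["RB", "ML", "Hybrid"]


def parse_row(line):
    """Split one table row into (label, {metric: {system: score}})."""
    tokens = line.split()
    n_values = len(METRICS) * len(SYSTEMS)
    # Labels such as "PATIENT ID" contain spaces, so count values from the end.
    label = " ".join(tokens[:-n_values])
    values = [float(t) for t in tokens[-n_values:]]
    scores = {}
    for i, metric in enumerate(METRICS):
        start = i * len(SYSTEMS)
        scores[metric] = dict(zip(SYSTEMS, values[start:start + len(SYSTEMS)]))
    return label, scores


# Three rows copied verbatim from the table above.
rows = [
    "ADDRESS 99.5 99.7 99.4 73.5 93.6 93.8 84.3 96.4 96.3 74.9 94.2 94.3 83.9 98.0 98.4",
    "PATIENT ID 99.6 99.3 99.3 89.7 95.9 95.9 94.3 97.5 97.5 89.7 98.8 98.8 93.1 99.1 99.1",
    "ALL 95.8 99.1 99.1 82.7 98.8 98.9 88.7 99.0 99.0 84.8 99.3 99.4 31.9 84.4 86.2",
]
table = dict(parse_row(r) for r in rows)


def gain(label, metric, system="Hybrid", baseline="RB"):
    """Difference in points between two systems for one label and metric."""
    return round(table[label][metric][system] - table[label][metric][baseline], 1)


print(gain("ALL", "Full redact"))   # 54.3
print(gain("ADDRESS", "Recall"))    # 20.3
```

Read this way, the largest gap between RB and Hybrid on the ALL row is in the Full redact column (31.9 versus 86.2).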

If you have trouble seeing the chart, please refresh the page.

{ "schema-url": "../assets/figures/label_scores.json" }


Impact of the language model

| Transformer | Precision | Recall | F1 | Redact | Full redact |
|---|---|---|---|---|---|
| finetuned raw | 97.8 ± 0.2 | 97.7 ± 0.2 | 97.8 ± 0.2 | 98.2 ± 0.2 | 75.5 ± 1.8 |
| camembert base | 96.8 ± 0.5 | 96.9 ± 0.1 | 96.8 ± 0.3 | 97.4 ± 0.1 | 68.9 ± 0.5 |
| scratch pseudo | 97.3 ± 0.1 | 97.2 ± 0.1 | 97.3 ± 0.1 | 97.6 ± 0.1 | 69.0 ± 1.0 |
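
The scores above are reported as `value ± spread`. The page does not say what the spread denotes; the sketch below assumes it is a dispersion across repeated runs and uses a deliberately crude check (do the two intervals overlap?) to read the Full redact column. `parse_pm` and `intervals_overlap` are hypothetical helper names, and the check is not a statistical test.

```python
import re


def parse_pm(cell):
    """Parse a cell such as '75.5 ± 1.8' into (value, spread)."""
    match = re.fullmatch(r"\s*([\d.]+)\s*±\s*([\d.]+)\s*", cell)
    if match is None:
        raise ValueError(f"not a 'value ± spread' cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


# Full redact column, copied from the table above.
full_redact = {
    "finetuned raw": "75.5 ± 1.8",
    "camembert base": "68.9 ± 0.5",
    "scratch pseudo": "69.0 ± 1.0",
}


def intervals_overlap(a, b):
    """True if [value - spread, value + spread] of a and b intersect."""
    (value_a, spread_a), (value_b, spread_b) = parse_pm(a), parse_pm(b)
    return abs(value_a - value_b) <= spread_a + spread_b


print(intervals_overlap(full_redact["finetuned raw"], full_redact["camembert base"]))   # False
print(intervals_overlap(full_redact["camembert base"], full_redact["scratch pseudo"]))  # True
```

Under that assumption, the Full redact score of "finetuned raw" is separated from the other two models by more than the reported spreads, while "camembert base" and "scratch pseudo" are not distinguishable on this column.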


{ "schema-url": "../assets/figures/bert_ablation.json" }


Impact of the PDF extraction step

| PDF extractor | Precision | Recall | F1 | Redact | Full redact |
|---|---|---|---|---|---|
| edspdf | 99.1 ± 0.1 | 98.8 ± 0.1 | 98.9 ± 0.1 | 99.2 ± 0.1 | 93.1 ± 1.0 |
| pdfbox | 99.1 ± 0.0 | 98.9 ± 0.2 | 99.0 ± 0.1 | 99.4 ± 0.1 | 75.7 ± 3.0 |

Impact of the number of training examples


{ "schema-url": "../assets/figures/limit_ablation.json" }

Impact of the missing document types


{ "schema-url": "../assets/figures/doc_type_ablation.json" }