Tobacco consumption
The eds.tobacco pipeline component extracts mentions of tobacco consumption.
Details of the patterns used
# fmt: off
PA = r"(?:\bp/?a\b|paquets?.?annee)"
QUANTITY = r"(?P<quantity>[\d]{1,3})"
PUNCT = r"\.,-;\(\)"

default_patterns = [
    dict(
        source="tobacco",
        regex=[
            r"tabagi",
            r"tabac",
            r"\bfume\b",
            r"\bfumeu",
            r"\bpipes?\b",
        ],
        exclude=dict(
            regex=[
                "occasion",
                "moder",
                "quelqu",
                "festi",
                "rare",
                "sujet",  # Example: "Chez le sujet fumeur ..." (generic sentences)
            ],
            window=(-3, 5),
        ),
        regex_attr="NORM",
        assign=[
            dict(
                name="stopped",
                regex=r"(\bex\b|sevr|arret|stop|ancien)",
                window=(-3, 15),
                reduce_mode="keep_first",
            ),
            dict(
                name="zero_after",
                regex=r"(?=^[a-z]*\s*:?[\s-]*(0|non|aucun|jamais))",
                window=3,
                reduce_mode="keep_first",
            ),
            dict(
                name="PA",
                regex=rf"{QUANTITY}[^{PUNCT}]{{0,10}}{PA}|{PA}[^{PUNCT}]{{0,10}}{QUANTITY}",
                window=(-10, 10),
                reduce_mode="keep_first",
            ),
            dict(
                name="secondhand",
                regex="(passif)",
                window=5,
                reduce_mode="keep_first",
            ),
        ],
    )
]
# fmt: on
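To get a feel for what the PA assignment pattern captures, the building blocks above can be tried on plain strings. The following is a minimal sketch using Python's standard re module, not the component's own matching code. It ignores the token window, assumes the text is already normalized (lowercase, accents removed), and renames the second quantity group to quantity_after because the standard re module rejects a group name defined twice. Both quantity_after and extract_pack_years are hypothetical names introduced for illustration.

```python
import re

# Same building blocks as in the patterns above.
PA = r"(?:\bp/?a\b|paquets?.?annee)"
QUANTITY = r"(?P<quantity>[\d]{1,3})"
PUNCT = r"\.,-;\(\)"

# Assumption: the second branch uses a different group name, since the
# standard `re` module does not accept two groups called "quantity".
QUANTITY_AFTER = r"(?P<quantity_after>[\d]{1,3})"

PA_REGEX = re.compile(
    rf"{QUANTITY}[^{PUNCT}]{{0,10}}{PA}|{PA}[^{PUNCT}]{{0,10}}{QUANTITY_AFTER}"
)


def extract_pack_years(normalized_text):
    """Return the pack-year quantity found in the text, or None."""
    match = PA_REGEX.search(normalized_text)
    if match is None:
        return None
    # Only one of the two groups takes part in a given match.
    value = match.group("quantity") or match.group("quantity_after")
    return int(value)


print(extract_pack_years("tabagisme evalue a 15 pa"))
print(extract_pack_years("paquets annee = 30"))
print(extract_pack_years("patient tabagique"))
```

The first call exercises the "quantity then unit" branch, the second the "unit then quantity" branch, and the third contains no pack-year mention at all.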
Extensions
On each span span that matches, the following attributes are available:
- span._.detailed_status: either None or "ABSTINENCE" if the patient has stopped smoking
- span._.assigned: dictionary with the following keys, if relevant:
    - PA: the mentioned number of pack-years (= paquet-année)
    - secondhand: if secondhand smoking is mentioned
- span._.negation: set to True when any of the following holds:
    - A pack-year value of 0 is extracted
    - A mention such as "tabac: 0" is found
    - The patient experiences secondhand smoking
Use qualifiers!
Although the tobacco pipe sometimes sets a value for the negation
attribute, generic qualifiers should still be used after the pipe.
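The rules listed for span._.detailed_status and span._.negation can be summarized as a small decision function over the assigned dictionary. This is only an illustrative sketch of the documented behavior, assuming the key names from the patterns (stopped, zero_after, PA, secondhand); the helpers detailed_status and is_negated are hypothetical and are not part of the edsnlp API.

```python
def detailed_status(assigned):
    """Hypothetical helper: "ABSTINENCE" when a cessation cue was assigned."""
    return "ABSTINENCE" if "stopped" in assigned else None


def is_negated(assigned):
    """Hypothetical helper mirroring the three documented negation rules."""
    if assigned.get("PA") == 0:      # a pack-year value of 0 was extracted
        return True
    if "zero_after" in assigned:     # a mention such as "tabac: 0"
        return True
    if "secondhand" in assigned:     # secondhand (passive) smoking
        return True
    return False


# Plain dictionaries standing in for span._.assigned.
print(detailed_status({"stopped": "ancien"}), is_negated({"stopped": "ancien"}))
print(detailed_status({"PA": 15}), is_negated({"PA": 15}))
print(detailed_status({"secondhand": "passif"}), is_negated({"secondhand": "passif"}))
```

In the real component the values in span._.assigned are produced by the matcher, so plain strings and integers are used here only as stand-ins.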
Examples
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.tobacco())
Below are a few examples:
text = "Tabagisme évalué à 15 PA"
doc = nlp(text)
spans = doc.spans["tobacco"]
spans
# Out: [Tabagisme évalué à 15 PA]
span = spans[0]
span._.assigned
# Out: {'PA': 15}
text = "Patient tabagique"
doc = nlp(text)
spans = doc.spans["tobacco"]
spans
# Out: [tabagique]
text = "Tabagisme festif"
doc = nlp(text)
spans = doc.spans["tobacco"]
spans
# Out: []
text = "On a un tabagisme ancien"
doc = nlp(text)
spans = doc.spans["tobacco"]
spans
# Out: [tabagisme ancien]
span = spans[0]
span._.detailed_status
# Out: ABSTINENCE
span._.assigned
# Out: {'stopped': ancien}
text = "Tabac: 0"
doc = nlp(text)
spans = doc.spans["tobacco"]
spans
# Out: [Tabac: 0]
span = spans[0]
span._.detailed_status
# Out: None
span._.negation
# Out: True
span._.assigned
# Out: {'zero_after': [0]}
text = "Tabagisme passif"
doc = nlp(text)
spans = doc.spans["tobacco"]
spans
# Out: [Tabagisme passif]
span = spans[0]
span._.detailed_status
# Out: None
span._.negation
# Out: True
span._.assigned
# Out: {'secondhand': passif}
text = "Tabac: sevré depuis 5 ans"
doc = nlp(text)
spans = doc.spans["tobacco"]
spans
# Out: [Tabac: sevré]
span = spans[0]
span._.detailed_status
# Out: ABSTINENCE
span._.assigned
# Out: {'stopped': sevré}
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object TYPE: |
name | The name of the component TYPE: |
patterns | The patterns to use for matching DEFAULT: |
label | The label to use for the TYPE: |
span_setter | How to set matches on the doc TYPE: |
Authors and citation
The eds.tobacco component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024.
Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069