Association rule mining is a powerful technique for identifying hidden patterns in data and is particularly applicable in healthcare. In this project, I aim to use this technique to explore how diseases and symptoms can be predicted using association rules. The goal is to analyze relationships between symptoms and diseases, considering symptoms both as antecedents and consequents, while also addressing some personal questions about specific symptoms.
During this project we will search for association rules between
different symptoms and diseases. Association rule mining is a method for
identifying frequent patterns, correlations, associations, or causal
structures in datasets. Given a set of transactions, the goal of
association rule mining is to find the rules that allow us to predict
the occurrence of a specific item based on the occurrences of the other
items in the transaction. An association rule consists of two parts: an
antecedent (if) and a consequent (then). Association rules in medical
diagnosis can help doctors diagnose and treat patients. Diagnosis is a
difficult process with many potential errors that can lead to unreliable
results. Relational association rule mining can be used to estimate the
likelihood of illness based on various factors and symptoms. Three main
measures are used to evaluate association rules: support, confidence, and lift.
Support. This measure indicates how frequent an itemset is across all
transactions:
$$\mathrm{supp}(X) = \frac{\mathrm{freq}(X)}{N}$$
where freq(X) is the number of transactions in which the itemset X appears
and N is the total number of transactions in the database. For the symptom
and disease dataset, support measures how frequently a particular symptom or
combination of symptoms appears relative to all transactions. For instance,
a high support value for a symptom indicates that it is commonly reported
across many diseases.
Confidence. Confidence is the proportion of transactions containing the
antecedent X that also contain the consequent Y:
$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$$
For the selected dataset, confidence indicates the strength of the
relationship between symptoms and a disease, or between symptoms and other
symptoms.
Lift. Lift controls for the support (frequency) of the consequent when
calculating the conditional probability of Y given X:
$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)} = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}$$
It measures how much more likely the symptom or disease (rhs) is to occur
given the symptoms (lhs) than it would be if the two were independent.
Significance:
- lift > 1 indicates a positive association, meaning the presence of the
symptom increases the likelihood of the disease or another symptom.
- lift = 1 suggests no association.
- lift < 1 indicates a negative association.
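To make the three measures concrete, here is a minimal base-R sketch that computes them by hand for one rule. The five transactions and the helper `support()` are hypothetical and are not taken from the UMLS data; the sketch only illustrates the arithmetic.

```r
# Toy transactions (hypothetical, for illustration only)
transactions <- list(
  c("flu", "fever", "cough"),
  c("flu", "fever"),
  c("cold", "cough"),
  c("cold", "cough", "fever"),
  c("asthma", "cough")
)

# Share of transactions that contain every item in `items`
support <- function(items, trans) {
  mean(vapply(trans, function(tr) all(items %in% tr), logical(1)))
}

lhs <- "fever"  # antecedent
rhs <- "flu"    # consequent

supp_rule  <- support(c(lhs, rhs), transactions)        # supp(X u Y)
confidence <- supp_rule / support(lhs, transactions)    # supp(X u Y) / supp(X)
lift       <- confidence / support(rhs, transactions)   # conf / supp(Y)

cat("support:", supp_rule, "confidence:", confidence, "lift:", lift, "\n")
```

In this toy example the lift is above 1, so under these made-up data "fever" would count as positively associated with "flu".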
library(arules)
library(reshape2)
library(Matrix)
library(stringr)
library(ggplot2)
library(arulesViz)
The dataset used in this project was sourced from the Disease-Symptom Knowledge Base provided by Columbia University (https://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/). It consists of a mapping between diseases and their associated symptoms. The metadata structure and terminology follow the Unified Medical Language System (UMLS) (https://www.nlm.nih.gov/research/umls/index.html), which provides a standardized medical vocabulary. The knowledge base was built with the MedLEE natural language processing system, which extracted UMLS codes for diseases and symptoms from the underlying clinical notes.
setwd("C:/Users/ydmar/Documents/UW/UW - 1 semester/UL")
UMLS <- read.transactions("UMLS.csv", sep = ",")
## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
## Warning in scan(text = l, what = "character", sep = sep, quote = quote, : EOF
## within quoted string
## Warning in asMethod(object): removing duplicated items in transactions
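The repeated "EOF within quoted string" warnings are consistent with an apostrophe in an item name being treated as an opening quote, which would also explain why the frequency listing further down contains whole rows fused into a single item. The sketch below reproduces that behaviour with base `scan()`, the function named in the warning, on a made-up line; the line itself is an assumption, not a row copied from UMLS.csv.

```r
# Hypothetical CSV line with an apostrophe inside the first item name
line <- "UMLS:C0002395_Alzheimer's disease,UMLS:C0013132_drool,UMLS:C0085631_agitation"

# Both quote characters active: the apostrophe opens a quoted string
# that is never closed, so the remaining fields are not split normally
fused <- suppressWarnings(
  scan(text = line, what = "character", sep = ",", quote = "\"'", quiet = TRUE)
)

# Only the double quote active: the apostrophe is read as ordinary text
split_ok <- scan(text = line, what = "character", sep = ",", quote = "\"", quiet = TRUE)

length(fused)
length(split_ok)
```

If this is indeed the cause, restricting the quote characters when reading the file (for example through a `quote` argument of `read.transactions`, assuming the installed arules version provides one) would likely remove the warnings. It would also change the item counts reported below, so the results in this report refer to the file as originally read.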
summary(UMLS)
## transactions as itemMatrix in sparse format with
## 134 rows (elements/itemsets/transactions) and
## 528 columns (items) and a density of 0.02733492
##
## most frequent items:
## UMLS:C0392680_shortness of breath UMLS:C0030193_pain
## 46 40
## UMLS:C0015967_fever UMLS:C0000737_pain abdominal
## 33 27
## UMLS:C0042963_vomiting (Other)
## 24 1764
##
## element (itemset/transaction) length distribution:
## sizes
## 1 3 4 5 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 29
## 1 1 1 1 2 3 5 7 12 14 16 11 9 14 9 7 3 4 3 4 1 2 1 1 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 12.00 14.00 14.43 17.00 29.00
##
## includes extended item information - examples:
## labels
## 1 UMLS:C0000727_abdomen acute
## 2 UMLS:C0000731_distended abdomen
## 3 UMLS:C0000737_pain abdominal
The dataset contains 134 transactions (rows, one per disease) and 528 unique items, which are diseases or symptoms. The most frequent items are shortness of breath, pain, fever, abdominal pain, and vomiting. The density of the dataset is 0.02733492, meaning that only about 2.7% of the cells in the transaction-by-item matrix are filled. Most transactions contain between 12 and 17 items (the interquartile range), with a few very short (1 item) and very long (29 items) transactions.
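The density and the mean transaction size can be cross-checked from the numbers printed by `summary(UMLS)`. The short sketch below only redoes that arithmetic in base R; the variable names are illustrative.

```r
n_rows  <- 134   # transactions reported by summary(UMLS)
n_items <- 528   # distinct items reported by summary(UMLS)

# Item occurrences: the five most frequent items plus "(Other)"
n_filled <- sum(c(46, 40, 33, 27, 24, 1764))

density   <- n_filled / (n_rows * n_items)  # share of filled cells
mean_size <- n_filled / n_rows              # average items per transaction

cat("filled cells:", n_filled, "\n")
cat("density:", density, "\n")
cat("mean transaction size:", round(mean_size, 2), "\n")
```

Both values agree with the summary output: a density of about 0.0273 and a mean of 14.43 items per transaction.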
size(UMLS)
## [1] 13 15 22 10 20 8 9 10 19 15 13 13 21 11 18 18 21 12 22 25 11 15 15 18 15
## [26] 13 16 17 17 14 16 23 20 12 12 26 13 29 13 14 13 16 10 11 15 11 18 15 15 11
## [51] 13 17 10 16 12 17 24 7 17 12 13 12 21 12 13 13 13 17 14 12 12 29 13 3 16
## [76] 11 10 11 17 14 16 24 19 10 18 14 12 14 11 1 16 11 7 13 11 11 16 16 18 15
## [101] 16 10 12 16 14 20 19 22 17 18 9 14 11 17 13 16 9 13 14 9 22 14 12 5 16
## [126] 8 8 12 12 16 14 20 9 4
length(UMLS)
## [1] 134
UMLS_freq <- sort(itemFrequency(UMLS, type="relative"), decreasing = TRUE)
cat(paste(names(UMLS_freq), UMLS_freq, sep = ": ", collapse = "|"), "\n")
## UMLS:C0392680_shortness of breath: 0.343283582089552|UMLS:C0030193_pain: 0.298507462686567|UMLS:C0015967_fever: 0.246268656716418|UMLS:C0000737_pain abdominal: 0.201492537313433|UMLS:C0042963_vomiting: 0.17910447761194|UMLS:C0011991_diarrhea: 0.171641791044776|UMLS:C0004093_asthenia: 0.164179104477612|UMLS:C0027497_nausea: 0.164179104477612|UMLS:C0241526_unresponsiveness: 0.164179104477612|UMLS:C0010200_cough: 0.156716417910448|UMLS:C0013404_dyspnea: 0.156716417910448|UMLS:C0085593_chill: 0.149253731343284|UMLS:C0008031_pain chest: 0.141791044776119|UMLS:C0043096_decreased body weight: 0.141791044776119|UMLS:C0277797_apyrexial: 0.141791044776119|UMLS:C0034642_rale: 0.134328358208955|UMLS:C0085631_agitation: 0.126865671641791|UMLS:C0221198_lesion: 0.126865671641791|UMLS:C0577559_mass of body structure: 0.126865671641791|UMLS:C0234233_sore to touch: 0.119402985074627|UMLS:C0020649_hypotension: 0.104477611940299|UMLS:C0028081_night sweat: 0.104477611940299|UMLS:C0085619_orthopnea: 0.104477611940299|UMLS:C0205400_thicken: 0.104477611940299|UMLS:C0233762_hallucinations auditory: 0.104477611940299|UMLS:C0019080_haemorrhage: 0.0970149253731343|UMLS:C0038999_swelling: 0.0970149253731343|UMLS:C0039070_syncope: 0.0970149253731343|UMLS:C0040822_tremor: 0.0895522388059701|UMLS:C0086439_hypokinesia: 0.0895522388059701|UMLS:C0424000_feeling suicidal: 0.0895522388059701|UMLS:C0438696_suicidal: 0.0895522388059701|UMLS:C0476273_distress respiratory: 0.0895522388059701|UMLS:C0022107_irritable mood: 0.082089552238806|UMLS:C0150041_feeling hopeless: 0.082089552238806|UMLS:C0231835_tachypnea: 0.082089552238806|UMLS:C0233481_worry: 0.082089552238806|UMLS:C0376405_patient non compliance: 0.082089552238806|UMLS:C0008033_pleuritic pain: 0.0746268656716418|UMLS:C0018991_hemiplegia: 0.0746268656716418|UMLS:C0020625_hyponatremia: 0.0746268656716418|UMLS:C0033774_pruritus: 0.0746268656716418|UMLS:C0036572_seizure: 0.0746268656716418|UMLS:C0038990_sweat^UMLS:C0700590_sweating increased: 
0.0746268656716418|UMLS:C0043144_wheezing: 0.0746268656716418|UMLS:C0312422_blackout: 0.0746268656716418|UMLS:C0850149_non-productive cough: 0.0746268656716418|UMLS:C0917801_sleeplessness: 0.0746268656716418|UMLS:C0002962_angina pectoris: 0.0671641791044776|UMLS:C0003962_ascites: 0.0671641791044776|UMLS:C0015672_fatigue: 0.0671641791044776|UMLS:C0018681_headache: 0.0671641791044776|UMLS:C0030252_palpitation: 0.0671641791044776|UMLS:C0085639_fall: 0.0671641791044776|UMLS:C0231807_dyspnea on exertion: 0.0671641791044776|UMLS:C0233763_hallucinations visual: 0.0671641791044776|UMLS:C0239134_productive cough: 0.0671641791044776|UMLS:C0578150_hemodynamically stable: 0.0671641791044776|UMLS:C0856054_mental status changes: 0.0671641791044776|UMLS:C1299586_difficulty: 0.0671641791044776|UMLS:C0003123_anorexia: 0.0597014925373134|UMLS:C0009806_constipation: 0.0597014925373134|UMLS:C0012833_dizziness: 0.0597014925373134|UMLS:C0232292_chest tightness: 0.0597014925373134|UMLS:C0237154_homelessness: 0.0597014925373134|UMLS:C0242453_prostatism: 0.0597014925373134|UMLS:C0428977_bradycardia: 0.0597014925373134|UMLS:C0744492_guaiac positive: 0.0597014925373134|UMLS:C1269955_tumor cell invasion: 0.0597014925373134|UMLS:C0000731_distended abdomen: 0.0522388059701493|UMLS:C0013144_drowsiness^UMLS:C0234450_sleepy: 0.0522388059701493|UMLS:C0020461_hyperkalemia: 0.0522388059701493|UMLS:C0023380_lethargy: 0.0522388059701493|UMLS:C0029053_decreased translucency: 0.0522388059701493|UMLS:C0041657_unconscious state: 0.0522388059701493|UMLS:C0041834_erythema: 0.0522388059701493|UMLS:C0231528_myalgia: 0.0522388059701493|UMLS:C0232498_abdominal tenderness: 0.0522388059701493|UMLS:C0332575_redness: 0.0522388059701493|UMLS:C0344315_mood depressed: 0.0522388059701493|UMLS:C0424109_weepiness: 0.0522388059701493|UMLS:C0427055_facial paresis: 0.0522388059701493|UMLS:C0549483_abscess bacterial: 0.0522388059701493|UMLS:C1096646_transaminitis: 0.0522388059701493|UMLS:C1273573_unsteady gait: 
0.0522388059701493|UMLS:C0013362_dysarthria: 0.0447761194029851|UMLS:C0018965_hematuria: 0.0447761194029851|UMLS:C0028643_numbness: 0.0447761194029851|UMLS:C0035508_rhonchus: 0.0447761194029851|UMLS:C0235710_chest discomfort: 0.0447761194029851|UMLS:C0239110_consciousness clear: 0.0447761194029851|UMLS:C0240100_jugular venous distention: 0.0447761194029851|UMLS:C0424068_verbal auditory hallucinations: 0.0447761194029851|UMLS:C0438716_pressure chest: 0.0447761194029851|UMLS:C0457096_yellow sputum: 0.0447761194029851|UMLS:C0728899_intoxication: 0.0447761194029851|UMLS:C0007398_catatonia: 0.0373134328358209|UMLS:C0013491_ecchymosis: 0.0373134328358209|UMLS:C0019079_haemoptysis: 0.0373134328358209|UMLS:C0028084_nightmare: 0.0373134328358209|UMLS:C0028961_oliguria: 0.0373134328358209|UMLS:C0038002_splenomegaly: 0.0373134328358209|UMLS:C0042571_vertigo: 0.0373134328358209|UMLS:C0149696_food intolerance: 0.0373134328358209|UMLS:C0149746_orthostasis: 0.0373134328358209|UMLS:C0220870_lightheadedness: 0.0373134328358209|UMLS:C0231530_muscle twitch: 0.0373134328358209|UMLS:C0232517_gurgle: 0.0373134328358209|UMLS:C0234518_speech slurred: 0.0373134328358209|UMLS:C0238844_breath sounds decreased: 0.0373134328358209|UMLS:C0242429_throat sore: 0.0373134328358209|UMLS:C0277794_extreme exhaustion: 0.0373134328358209|UMLS:C0332601_cushingoid facies^UMLS:C0878661_cushingoid habitus: 0.0373134328358209|UMLS:C0424230_motor retardation: 0.0373134328358209|UMLS:C0455769_energy increased: 0.0373134328358209|UMLS:C0553668_labored breathing: 0.0373134328358209|UMLS:C0858924_general discomfort: 0.0373134328358209|UMLS:C1260880_snuffle: 0.0373134328358209|UMLS:C1513183_metastatic lesion: 0.0373134328358209|UMLS:C0006625_cachexia: 0.0298507462686567|UMLS:C0009024_clonus: 0.0298507462686567|UMLS:C0013144_drowsiness: 0.0298507462686567|UMLS:C0013428_dysuria: 0.0298507462686567|UMLS:C0014394_enuresis: 0.0298507462686567|UMLS:C0018834_heartburn: 0.0298507462686567|UMLS:C0020672_hypothermia, 
natural: 0.0298507462686567|UMLS:C0027498_nausea and vomiting: 0.0298507462686567|UMLS:C0027769_nervousness: 0.0298507462686567|UMLS:C0030554_paresthesia: 0.0298507462686567|UMLS:C0032617_polyuria: 0.0298507462686567|UMLS:C0034079_lung nodule: 0.0298507462686567|UMLS:C0043094_weight gain: 0.0298507462686567|UMLS:C0085602_polydypsia: 0.0298507462686567|UMLS:C0085606_urgency of micturition: 0.0298507462686567|UMLS:C0085624_burning sensation: 0.0298507462686567|UMLS:C0150045_urge incontinence: 0.0298507462686567|UMLS:C0221150_painful swallowing: 0.0298507462686567|UMLS:C0231872_egophony: 0.0298507462686567|UMLS:C0231890_fremitus: 0.0298507462686567|UMLS:C0232201_sinus rhythm: 0.0298507462686567|UMLS:C0232766_asterixis: 0.0298507462686567|UMLS:C0232995_gravida 0: 0.0298507462686567|UMLS:C0233308_spontaneous rupture of membranes: 0.0298507462686567|UMLS:C0234215_sensory discomfort: 0.0298507462686567|UMLS:C0234450_sleepy: 0.0298507462686567|UMLS:C0236018_aura: 0.0298507462686567|UMLS:C0239832_numbness of hand: 0.0298507462686567|UMLS:C0424092_withdraw: 0.0298507462686567|UMLS:C0427008_stiffness: 0.0298507462686567|UMLS:C0436331_symptom aggravating factors: 0.0298507462686567|UMLS:C0457097_green sputum: 0.0298507462686567|UMLS:C0557075_has religious belief: 0.0298507462686567|UMLS:C0700292_hypoxemia: 0.0298507462686567|UMLS:C0003862_arthralgia: 0.0223880597014925|UMLS:C0004134_ataxia: 0.0223880597014925|UMLS:C0006157_breech presentation: 0.0223880597014925|UMLS:C0015672_fatigue^UMLS:C0557875_tired: 0.0223880597014925|UMLS:C0016579_formication: 0.0223880597014925|UMLS:C0018800_cardiomegaly: 0.0223880597014925|UMLS:C0019214_hepatosplenomegaly: 0.0223880597014925|UMLS:C0019825_hoarseness: 0.0223880597014925|UMLS:C0020440_hypercapnia: 0.0223880597014925|UMLS:C0030552_paresis: 0.0223880597014925|UMLS:C0037384_snore: 0.0223880597014925|UMLS:C0037763_spasm: 0.0223880597014925|UMLS:C0085628_stupor: 0.0223880597014925|UMLS:C0085632_indifferent mood: 
0.0223880597014925|UMLS:C0085636_photophobia: 0.0223880597014925|UMLS:C0151315_neck stiffness: 0.0223880597014925|UMLS:C0152032_urinary hesitation: 0.0223880597014925|UMLS:C0221151_projectile vomiting: 0.0223880597014925|UMLS:C0221166_paraparesis: 0.0223880597014925|UMLS:C0231187_decompensation: 0.0223880597014925|UMLS:C0231218_malaise: 0.0223880597014925|UMLS:C0231230_fatigability: 0.0223880597014925|UMLS:C0232257_systolic murmur: 0.0223880597014925|UMLS:C0232726_tenesmus: 0.0223880597014925|UMLS:C0233647_neologism: 0.0223880597014925|UMLS:C0238705_left atrial hypertrophy: 0.0223880597014925|UMLS:C0239233_satiety early: 0.0223880597014925|UMLS:C0240233_loose associations: 0.0223880597014925|UMLS:C0241158_scar tissue: 0.0223880597014925|UMLS:C0241252_stool color yellow: 0.0223880597014925|UMLS:C0241705_difficulty passing urine: 0.0223880597014925|UMLS:C0425251_bedridden^UMLS:C0741453_bedridden: 0.0223880597014925|UMLS:C0520888_t wave inverted: 0.0223880597014925|UMLS:C0521516_polymyalgia: 0.0223880597014925|UMLS:C0558089_verbally abusive behavior: 0.0223880597014925|UMLS:C0558261_terrify: 0.0223880597014925|UMLS:C0742985_debilitation: 0.0223880597014925|UMLS:C0743482_emphysematous change: 0.0223880597014925|UMLS:C0744727_hematocrit decreased: 0.0223880597014925|UMLS:C0744740_heme positive: 0.0223880597014925|UMLS:C0746619_monoclonal: 0.0223880597014925|UMLS:C0847488_unhappy: 0.0223880597014925|UMLS:C0848168_out of breath: 0.0223880597014925|UMLS:C0859032_moan: 0.0223880597014925|UMLS:C1305739_presence of q wave: 0.0223880597014925|UMLS:C1321756_achalasia: 0.0223880597014925|UMLS:C1384489_scratch marks: 0.0223880597014925|UMLS:C0002416_ambidexterity: 0.0149253731343284|UMLS:C0004604_pain back: 0.0149253731343284|UMLS:C0008767_cicatrisation: 0.0149253731343284|UMLS:C0008767_cicatrisation^UMLS:C0241158_scar tissue: 0.0149253731343284|UMLS:C0010520_cyanosis: 0.0149253731343284|UMLS:C0016204_flatulence: 0.0149253731343284|UMLS:C0016382_flushing: 
0.0149253731343284|UMLS:C0019209_hepatomegaly: 0.0149253731343284|UMLS:C0019572_hirsutism: 0.0149253731343284|UMLS:C0020175_hunger: 0.0149253731343284|UMLS:C0020303_hydropneumothorax: 0.0149253731343284|UMLS:C0020598_hypocalcemia result: 0.0149253731343284|UMLS:C0020621_hypokalemia: 0.0149253731343284|UMLS:C0024103_mass in breast: 0.0149253731343284|UMLS:C0026827_muscle hypotonia^UMLS:C0241938_hypotonic: 0.0149253731343284|UMLS:C0027066_myoclonus: 0.0149253731343284|UMLS:C0030232_pallor: 0.0149253731343284|UMLS:C0034880_hyperacusis: 0.0149253731343284|UMLS:C0038450_stridor: 0.0149253731343284|UMLS:C0149758_poor dentition: 0.0149253731343284|UMLS:C0151706_bleeding of vagina: 0.0149253731343284|UMLS:C0151878_qt interval prolonged: 0.0149253731343284|UMLS:C0221470_aphagia: 0.0149253731343284|UMLS:C0231221_asymptomatic: 0.0149253731343284|UMLS:C0232118_pulsus paradoxus: 0.0149253731343284|UMLS:C0232488_colic abdominal: 0.0149253731343284|UMLS:C0232854_slowing of urinary stream: 0.0149253731343284|UMLS:C0233070_para 1: 0.0149253731343284|UMLS:C0234133_extrapyramidal sign: 0.0149253731343284|UMLS:C0234238_ache: 0.0149253731343284|UMLS:C0235129_feeling strange: 0.0149253731343284|UMLS:C0235198_unable to concentrate: 0.0149253731343284|UMLS:C0235634_renal angle tenderness: 0.0149253731343284|UMLS:C0239133_hacking cough: 0.0149253731343284|UMLS:C0239301_estrogen use: 0.0149253731343284|UMLS:C0239981_hypoalbuminemia: 0.0149253731343284|UMLS:C0240962_scleral icterus: 0.0149253731343284|UMLS:C0241157_pustule: 0.0149253731343284|UMLS:C0241235_sputum purulent: 0.0149253731343284|UMLS:C0242143_uncoordination: 0.0149253731343284|UMLS:C0264576_mediastinal shift: 0.0149253731343284|UMLS:C0271202_hemianopsia homonymous: 0.0149253731343284|UMLS:C0278014_decreased stool caliber: 0.0149253731343284|UMLS:C0337672_nonsmoker: 0.0149253731343284|UMLS:C0338656_impaired cognition: 0.0149253731343284|UMLS:C0392162_clammy skin: 0.0149253731343284|UMLS:C0392701_giddy mood: 
0.0149253731343284|UMLS:C0424295_behavior hyperactive: 0.0149253731343284|UMLS:C0424530_absences finding: 0.0149253731343284|UMLS:C0425560_cardiovascular finding^UMLS:C1320716_cardiovascular event: 0.0149253731343284|UMLS:C0427108_general unsteadiness: 0.0149253731343284|UMLS:C0429562_superimposition: 0.0149253731343284|UMLS:C0520887_st segment depression: 0.0149253731343284|UMLS:C0541798_awakening early: 0.0149253731343284|UMLS:C0541911_dullness: 0.0149253731343284|UMLS:C0542073_lip smacking: 0.0149253731343284|UMLS:C0554980_moody: 0.0149253731343284|UMLS:C0558195_wheelchair bound: 0.0149253731343284|UMLS:C0578859_disturbed family: 0.0149253731343284|UMLS:C0600142_hot flush: 0.0149253731343284|UMLS:C0694547_systolic ejection murmur: 0.0149253731343284|UMLS:C0741302_atypia: 0.0149253731343284|UMLS:C0748706_side pain: 0.0149253731343284|UMLS:C0848277_room spinning: 0.0149253731343284|UMLS:C0848340_stuffy nose: 0.0149253731343284|UMLS:C0871754_frail: 0.0149253731343284|UMLS:C0872410_posturing: 0.0149253731343284|UMLS:C0917799_hypersomnia: 0.0149253731343284|UMLS:C1511606_cystic lesion: 0.0149253731343284|UMLS:C0000727_abdomen acute: 0.00746268656716418|UMLS:C0001175_acquired immuno-deficiency syndrome^UMLS:C0019682_HIV^UMLS:C0019693_hiv infections: 0.00746268656716418|UMLS:C0001418_adenocarcinoma: 0.00746268656716418|UMLS:C0001511_adhesion: 0.00746268656716418|UMLS:C0001973_chronic alcoholic intoxication: 0.00746268656716418|UMLS:C0002395_Alzheimers disease,UMLS:C0013132_drool,UMLS:C0085631_agitation,UMLS:C0028084_nightmare,UMLS:C0035508_rhonchus,UMLS:C0239110_consciousness clear,UMLS:C0235231_pin-point pupils,UMLS:C0425251_bedridden^UMLS:C0741453_bedridden,UMLS:C0871754_frail,UMLS:C0234379_tremor resting,UMLS:C0020461_hyperkalemia,UMLS:C0427055_facial paresis,UMLS:C0541992_groggy,UMLS:C0231530_muscle twitch,UMLS:C0558195_wheelchair bound,UMLS:C0040822_tremor,UMLS:C0010200_cough,UMLS:C0015967_fever,,,,,,,,,,,: 0.00746268656716418|UMLS:C0002871_anemia: 
0.00746268656716418|UMLS:C0002895_sickle cell anemia: 0.00746268656716418|UMLS:C0003126_anosmia: 0.00746268656716418|UMLS:C0003507_stenosis aortic valve: 0.00746268656716418|UMLS:C0003537_aphasia: 0.00746268656716418|UMLS:C0003864_arthritis: 0.00746268656716418|UMLS:C0004096_asthma: 0.00746268656716418|UMLS:C0004610_bacteremia: 0.00746268656716418|UMLS:C0005001_benign prostatic hypertrophy: 0.00746268656716418|UMLS:C0005586_bipolar disorder: 0.00746268656716418|UMLS:C0006142_malignant neoplasm of breast^UMLS:C0678222_carcinoma breast: 0.00746268656716418|UMLS:C0006266_spasm bronchial: 0.00746268656716418|UMLS:C0006277_bronchitis: 0.00746268656716418|UMLS:C0006318_bruit: 0.00746268656716418|UMLS:C0006826_malignant neoplasms^UMLS:C1306459_primary malignant neoplasm: 0.00746268656716418|UMLS:C0006826_malignant neoplasms: 0.00746268656716418|UMLS:C0006840_candidiasis^UMLS:C0006849_oral candidiasis: 0.00746268656716418|UMLS:C0007097_carcinoma: 0.00746268656716418|UMLS:C0007102_malignant tumor of colon^UMLS:C0699790_carcinoma colon: 0.00746268656716418|UMLS:C0007642_cellulitis: 0.00746268656716418|UMLS:C0007787_transient ischemic attack: 0.00746268656716418|UMLS:C0007859_pain neck: 0.00746268656716418|UMLS:C0008301_choke: 0.00746268656716418|UMLS:C0008325_cholecystitis: 0.00746268656716418|UMLS:C0008350_cholelithiasis^UMLS:C0242216_biliary calculus: 0.00746268656716418|UMLS:C0009319_colitis: 0.00746268656716418|UMLS:C0009676_confusion: 0.00746268656716418|UMLS:C0010054_coronary arteriosclerosis^UMLS:C0010068_coronary heart disease: 0.00746268656716418|UMLS:C0011127_decubitus ulcer: 0.00746268656716418|UMLS:C0011168_deglutition disorder: 0.00746268656716418|UMLS:C0011175_dehydration: 0.00746268656716418|UMLS:C0011206_delirium: 0.00746268656716418|UMLS:C0011253_delusion: 0.00746268656716418|UMLS:C0011570_depression mental^UMLS:C0011581_depressive disorder: 0.00746268656716418|UMLS:C0011847_diabetes: 0.00746268656716418|UMLS:C0011880_ketoacidosis diabetic: 
0.00746268656716418|UMLS:C0012813_diverticulitis: 0.00746268656716418|UMLS:C0013405_paroxysmal dyspnea: 0.00746268656716418|UMLS:C0014118_endocarditis: 0.00746268656716418|UMLS:C0014544_epilepsy: 0.00746268656716418|UMLS:C0014549_tonic-clonic epilepsy^UMLS:C0494475_tonic-clonic seizures: 0.00746268656716418|UMLS:C0015230_exanthema: 0.00746268656716418|UMLS:C0016512_pain foot: 0.00746268656716418|UMLS:C0016927_gag: 0.00746268656716418|UMLS:C0017152_gastritis: 0.00746268656716418|UMLS:C0017160_gastroenteritis: 0.00746268656716418|UMLS:C0017168_gastroesophageal reflux disease: 0.00746268656716418|UMLS:C0017601_glaucoma: 0.00746268656716418|UMLS:C0018099_gout: 0.00746268656716418|UMLS:C0018801_failure heart: 0.00746268656716418|UMLS:C0018802_failure heart congestive: 0.00746268656716418|UMLS:C0018862_Heberdens node,UMLS:C0240100_jugular venous distention,UMLS:C0013404_dyspnea,UMLS:C0038990_sweat^UMLS:C0700590_sweating increased,UMLS:C0376405_patient non compliance,UMLS:C0235710_chest discomfort,UMLS:C0020461_hyperkalemia,UMLS:C0232201_sinus rhythm,UMLS:C0008031_pain chest,UMLS:C0020649_hypotension,UMLS:C0043144_wheezing,,,,,,,,,,,,: 0.00746268656716418|UMLS:C0018932_hematochezia: 0.00746268656716418|UMLS:C0018989_hemiparesis: 0.00746268656716418|UMLS:C0019112_hemorrhoids: 0.00746268656716418|UMLS:C0019158_hepatitis: 0.00746268656716418|UMLS:C0019163_hepatitis B: 0.00746268656716418|UMLS:C0019196_hepatitis C: 0.00746268656716418|UMLS:C0019204_primary carcinoma of the liver cells: 0.00746268656716418|UMLS:C0019270_hernia: 0.00746268656716418|UMLS:C0019291_hernia hiatal: 0.00746268656716418|UMLS:C0020433_hyperbilirubinemia: 0.00746268656716418|UMLS:C0020443_hypercholesterolemia: 0.00746268656716418|UMLS:C0020456_hyperglycemia: 0.00746268656716418|UMLS:C0020458_hyperhidrosis disorder: 0.00746268656716418|UMLS:C0020473_hyperlipidemia: 0.00746268656716418|UMLS:C0020538_hypertensive disease: 0.00746268656716418|UMLS:C0020542_hypertension pulmonary: 
0.00746268656716418|UMLS:C0020578_hyperventilation: 0.00746268656716418|UMLS:C0020580_hypesthesia: 0.00746268656716418|UMLS:C0020615_hypoglycemia: 0.00746268656716418|UMLS:C0020639_hypoproteinemia: 0.00746268656716418|UMLS:C0020676_hypothyroidism: 0.00746268656716418|UMLS:C0021167_incontinence: 0.00746268656716418|UMLS:C0021311_infection: 0.00746268656716418|UMLS:C0021400_influenza: 0.00746268656716418|UMLS:C0022116_ischemia: 0.00746268656716418|UMLS:C0022658_kidney disease: 0.00746268656716418|UMLS:C0022660_kidney failure acute: 0.00746268656716418|UMLS:C0022661_chronic kidney failure: 0.00746268656716418|UMLS:C0023222_pain in lower limb: 0.00746268656716418|UMLS:C0023267_fibroid tumor: 0.00746268656716418|UMLS:C0024031_low back pain: 0.00746268656716418|UMLS:C0024117_chronic obstructive airway disease: 0.00746268656716418|UMLS:C0024228_lymphatic diseases: 0.00746268656716418|UMLS:C0024299_lymphoma: 0.00746268656716418|UMLS:C0024713_manic disorder: 0.00746268656716418|UMLS:C0025202_melanoma: 0.00746268656716418|UMLS:C0026266_mitral valve insufficiency: 0.00746268656716418|UMLS:C0026961_mydriasis: 0.00746268656716418|UMLS:C0027051_myocardial infarction: 0.00746268656716418|UMLS:C0027627_neoplasm metastasis: 0.00746268656716418|UMLS:C0027651_neoplasm: 0.00746268656716418|UMLS:C0027947_neutropenia: 0.00746268656716418|UMLS:C0028754_obesity: 0.00746268656716418|UMLS:C0028756_obesity morbid: 0.00746268656716418|UMLS:C0029408_degenerative polyarthritis: 0.00746268656716418|UMLS:C0029443_osteomyelitis: 0.00746268656716418|UMLS:C0029456_osteoporosis: 0.00746268656716418|UMLS:C0030305_pancreatitis: 0.00746268656716418|UMLS:C0030312_pancytopenia: 0.00746268656716418|UMLS:C0030318_panic: 0.00746268656716418|UMLS:C0030567_parkinson disease: 0.00746268656716418|UMLS:C0030920_ulcer peptic: 0.00746268656716418|UMLS:C0031039_effusion pericardial^UMLS:C1253937_pericardial effusion body substance: 0.00746268656716418|UMLS:C0031212_personality disorder: 
0.00746268656716418|UMLS:C0032285_pneumonia: 0.00746268656716418|UMLS:C0032290_pneumonia aspiration: 0.00746268656716418|UMLS:C0032305_Pneumocystis carinii pneumonia: 0.00746268656716418|UMLS:C0032326_pneumothorax: 0.00746268656716418|UMLS:C0032739_: 0.00746268656716418|UMLS:C0032781_posterior rhinorrhea: 0.00746268656716418|UMLS:C0033975_psychotic disorder: 0.00746268656716418|UMLS:C0034063_edema pulmonary: 0.00746268656716418|UMLS:C0034065_embolism pulmonary: 0.00746268656716418|UMLS:C0034067_emphysema pulmonary: 0.00746268656716418|UMLS:C0034186_pyelonephritis: 0.00746268656716418|UMLS:C0035078_failure kidney: 0.00746268656716418|UMLS:C0036341_schizophrenia: 0.00746268656716418|UMLS:C0036396_sciatica: 0.00746268656716418|UMLS:C0036690_septicemia^UMLS:C0243026_systemic infection^UMLS:C1090821_sepsis (invertebrate): 0.00746268656716418|UMLS:C0037383_sneeze: 0.00746268656716418|UMLS:C0037580_soft tissue swelling: 0.00746268656716418|UMLS:C0038454_accident cerebrovascular: 0.00746268656716418|UMLS:C0038663_suicide attempt: 0.00746268656716418|UMLS:C0039070_syncope^UMLS:C0312422_blackout^UMLS:C0424533_history of - blackout: 0.00746268656716418|UMLS:C0039239_tachycardia sinus: 0.00746268656716418|UMLS:C0040034_thrombocytopaenia: 0.00746268656716418|UMLS:C0040264_tinnitus: 0.00746268656716418|UMLS:C0040961_tricuspid valve insufficiency: 0.00746268656716418|UMLS:C0041667_underweight^UMLS:C1319518_underweight: 0.00746268656716418|UMLS:C0041912_upper respiratory infection: 0.00746268656716418|UMLS:C0042029_infection urinary tract: 0.00746268656716418|UMLS:C0085096_peripheral vascular disease: 0.00746268656716418|UMLS:C0085584_encephalopathy: 0.00746268656716418|UMLS:C0085635_photopsia: 0.00746268656716418|UMLS:C0085702_monocytosis: 0.00746268656716418|UMLS:C0087086_thrombus: 0.00746268656716418|UMLS:C0149871_deep vein thrombosis: 0.00746268656716418|UMLS:C0149931_migraine disorders: 0.00746268656716418|UMLS:C0156543_abortion: 0.00746268656716418|UMLS:C0205254_sedentary: 
0.00746268656716418|UMLS:C0221232_welt: 0.00746268656716418|UMLS:C0231441_immobile: 0.00746268656716418|UMLS:C0231690_titubation: 0.00746268656716418|UMLS:C0232258_pansystolic murmur: 0.00746268656716418|UMLS:C0232267_pericardial friction rub: 0.00746268656716418|UMLS:C0232602_retch: 0.00746268656716418|UMLS:C0232605_regurgitates after swallowing: 0.00746268656716418|UMLS:C0232695_bowel sounds decreased: 0.00746268656716418|UMLS:C0232894_pneumatouria: 0.00746268656716418|UMLS:C0232943_intermenstrual heavy bleeding: 0.00746268656716418|UMLS:C0232997_previous pregnancies 2: 0.00746268656716418|UMLS:C0233071_para 2: 0.00746268656716418|UMLS:C0233467_inappropriate affect: 0.00746268656716418|UMLS:C0233472_affect labile: 0.00746268656716418|UMLS:C0233492_elation: 0.00746268656716418|UMLS:C0233565_bradykinesia: 0.00746268656716418|UMLS:C0234253_rest pain: 0.00746268656716418|UMLS:C0234544_todd paralysis: 0.00746268656716418|UMLS:C0234866_barking cough: 0.00746268656716418|UMLS:C0234979_dysdiadochokinesia: 0.00746268656716418|UMLS:C0235250_hyperemesis: 0.00746268656716418|UMLS:C0235396_hypertonicity: 0.00746268656716418|UMLS:C0237304_noisy respiration: 0.00746268656716418|UMLS:C0240805_prodrome: 0.00746268656716418|UMLS:C0240813_prostate tender: 0.00746268656716418|UMLS:C0242379_malignant neoplasm of lung^UMLS:C0684249_carcinoma of lung: 0.00746268656716418|UMLS:C0262581_no known drug allergies: 0.00746268656716418|UMLS:C0264273_nasal discharge present: 0.00746268656716418|UMLS:C0270844_tonic seizures: 0.00746268656716418|UMLS:C0271276_Stahlis line,UMLS:C0344232_vision blurred,UMLS:C0018681_headache,UMLS:C0848277_room spinning,UMLS:C0039070_syncope,UMLS:C1299586_difficulty,UMLS:C0423982_rambling speech,UMLS:C0233844_clumsiness,,,,,,,,,,,: 0.00746268656716418|UMLS:C0271276_Stahlis line,UMLS:C0581911_heavy legs,UMLS:C0238844_breath sounds decreased,UMLS:C0151315_neck stiffness,UMLS:C0231807_dyspnea on 
exertion,UMLS:C0010520_cyanosis,UMLS:C0020649_hypotension,UMLS:C0238705_left atrial hypertrophy,,,,,,,,,,,,,,,,,,,: 0.00746268656716418|UMLS:C0271276_Stahlis line,UMLS:C0677500_stinging sensation,UMLS:C0522224_paralyse,UMLS:C0009024_clonus,UMLS:C0427055_facial paresis,,,,,,,,,,: 0.00746268656716418|UMLS:C0277823_charleyhorse: 0.00746268656716418|UMLS:C0277845_retropulsion: 0.00746268656716418|UMLS:C0277873_nasal flaring: 0.00746268656716418|UMLS:C0277899_pulse absent: 0.00746268656716418|UMLS:C0277977_Murphys sign,UMLS:C0016204_flatulence,UMLS:C0232488_colic abdominal,UMLS:C0030193_pain,UMLS:C0003962_ascites,UMLS:C0011991_diarrhea,UMLS:C0151878_qt interval prolonged,UMLS:C0425560_cardiovascular finding^UMLS:C1320716_cardiovascular event,UMLS:C0541992_groggy,UMLS:C0232201_sinus rhythm,UMLS:C0425449_gasping for breath,UMLS:C0009806_constipation,UMLS:C0474505_feces in rectum,UMLS:C0702118_abnormally hard consistency,,,,,,,,,,,: 0.00746268656716418|UMLS:C0278141_excruciating pain: 0.00746268656716418|UMLS:C0278146_shooting pain: 0.00746268656716418|UMLS:C0281825_disequilibrium: 0.00746268656716418|UMLS:C0311395_lameness^UMLS:C1456822_claudication: 0.00746268656716418|UMLS:C0332573_macule: 0.00746268656716418|UMLS:C0344232_vision blurred: 0.00746268656716418|UMLS:C0347938_hypometabolism: 0.00746268656716418|UMLS:C0376358_malignant neoplasm of prostate^UMLS:C0600139_carcinoma prostate: 0.00746268656716418|UMLS:C0392674_exhaustion: 0.00746268656716418|UMLS:C0392699_dysesthesia: 0.00746268656716418|UMLS:C0423571_abnormal sensation: 0.00746268656716418|UMLS:C0424337_hoard: 0.00746268656716418|UMLS:C0424749_feels hot/feverish: 0.00746268656716418|UMLS:C0424790_rigor - temperature-associated observation: 0.00746268656716418|UMLS:C0425488_rapid shallow breathing: 0.00746268656716418|UMLS:C0425491_catching breath: 0.00746268656716418|UMLS:C0427629_rhd positive: 0.00746268656716418|UMLS:C0429091_r wave feature: 0.00746268656716418|UMLS:C0439857_dependence: 
0.00746268656716418|UMLS:C0442739_no status change: 0.00746268656716418|UMLS:C0442874_neuropathy: 0.00746268656716418|UMLS:C0443260_milky: 0.00746268656716418|UMLS:C0455204_homicidal thoughts: 0.00746268656716418|UMLS:C0456091_large-for-dates fetus: 0.00746268656716418|UMLS:C0474395_behavior showing increased motor activity: 0.00746268656716418|UMLS:C0476287_breath-holding spell: 0.00746268656716418|UMLS:C0497327_dementia: 0.00746268656716418|UMLS:C0497406_overweight: 0.00746268656716418|UMLS:C0520886_st segment elevation: 0.00746268656716418|UMLS:C0520966_coordination abnormal: 0.00746268656716418|UMLS:C0522224_paralyse: 0.00746268656716418|UMLS:C0522336_rolling of eyes: 0.00746268656716418|UMLS:C0542044_incoherent: 0.00746268656716418|UMLS:C0546817_overload fluid: 0.00746268656716418|UMLS:C0556346_alcohol binge episode: 0.00746268656716418|UMLS:C0558141_transsexual: 0.00746268656716418|UMLS:C0558143_macerated skin: 0.00746268656716418|UMLS:C0558361_sniffle: 0.00746268656716418|UMLS:C0559546_adverse reaction^UMLS:C0879626_adverse effect: 0.00746268656716418|UMLS:C0576456_poor feeding: 0.00746268656716418|UMLS:C0577979_frothy sputum: 0.00746268656716418|UMLS:C0581911_heavy legs: 0.00746268656716418|UMLS:C0581912_heavy feeling: 0.00746268656716418|UMLS:C0700613_anxiety state: 0.00746268656716418|UMLS:C0702118_abnormally hard consistency: 0.00746268656716418|UMLS:C0740844_air fluid level: 0.00746268656716418|UMLS:C0740880_alcoholic withdrawal symptoms: 0.00746268656716418|UMLS:C0751229_hypersomnolence: 0.00746268656716418|UMLS:C0751466_phonophobia: 0.00746268656716418|UMLS:C0751495_focal seizures: 0.00746268656716418|UMLS:C0848621_passed stones: 0.00746268656716418|UMLS:C0857087_dizzy spells: 0.00746268656716418|UMLS:C0857199_red blotches: 0.00746268656716418|UMLS:C0857256_unwell: 0.00746268656716418|UMLS:C0857516_floppy: 0.00746268656716418|UMLS:C0860096_primigravida: 0.00746268656716418|UMLS:C0877040_fear of falling: 
0.00746268656716418|UMLS:C0878544_cardiomyopathy: 0.00746268656716418|UMLS:C0948786_blanch: 0.00746268656716418|UMLS:C1135120_breakthrough pain: 0.00746268656716418|UMLS:C1145670_respiratory failure: 0.00746268656716418|UMLS:C1167754_fecaluria: 0.00746268656716418|UMLS:C1258215_ileus: 0.00746268656716418|UMLS:C1291077_abdominal bloating: 0.00746268656716418|UMLS:C1291692_gravida 10: 0.00746268656716418|UMLS:C1313921_urinoma: 0.00746268656716418|UMLS:C1384606_dyspareunia: 0.00746268656716418|UMLS:C1405524_proteinemia: 0.00746268656716418|UMLS:C1444773_throbbing sensation quality: 0.00746268656716418|UMLS:C1456784_paranoia: 0.00746268656716418|UMLS:C1510475_diverticulosis: 0.00746268656716418|UMLS:C1517205_flare: 0.00746268656716418|UMLS:C1565489_insufficiency renal: 0.00746268656716418|UMLS:C1623038_cirrhosis: 0.00746268656716418
As mentioned above, the most frequent items are shortness of breath (34.3%), pain (29.9%), and fever (24.6%), followed by other items with support down to about 10%. Items with support around 7-10% represent less common symptoms and conditions. To focus on common patterns, the minimum support should be set around 10-15%. To include rarer patterns as well, the threshold can be lowered to 5-7%.
itemFrequencyPlot(UMLS, support = 0.1)
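The link between item counts, support percentages, and a minimum-support threshold can be checked with a few lines of base R. This is only a minimal sketch: the counts below are not printed in the report but are back-calculated from the quoted percentages and the 134 transactions in the dataset, so they should be read as an assumption.

```r
# Number of transactions (diseases) in the dataset
n_transactions <- 134

# Assumed item counts, inferred from the quoted support percentages
counts <- c("shortness of breath" = 46, "pain" = 40, "fever" = 33)

# Support = count / number of transactions
support <- counts / n_transactions
support_pct <- round(100 * support, 1)
print(support_pct)

# Smallest count that still satisfies a 10% minimum support
min_count_10 <- ceiling(0.10 * n_transactions)
print(min_count_10)
```

With 134 transactions, a 10% threshold therefore keeps only items that appear in at least 14 diseases.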
image(sample(UMLS,70))
The plot shows a random sample of 70 diseases from the whole dataset.
The X axis represents the symptoms (items) and the Y axis the 70 sampled
diseases (transactions). Black points mark the symptoms present in each
sampled disease. Nearly vertical lines appear for certain symptoms,
particularly around columns 81, 175, and 391. Looking up the names of
these items shows that they correspond to “fever,” “pain,” and
“shortness of breath.” This agrees with the item frequencies above,
where the same three symptoms have the highest support.
colnames(UMLS[,81])
## [1] "UMLS:C0015967_fever"
colnames(UMLS[,175])
## [1] "UMLS:C0030193_pain"
colnames(UMLS[,391])
## [1] "UMLS:C0392680_shortness of breath"
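The vertical lines in the image correspond to columns of the binary incidence matrix that have many filled cells. The base R sketch below uses a small made-up matrix (the values and symptom names are hypothetical, not taken from the UMLS data) to show how such columns can be found from their column sums and then looked up by name.

```r
# Hypothetical incidence matrix: rows = diseases, columns = symptoms
incidence <- matrix(
  c(1, 1, 0, 0,
    1, 0, 0, 1,
    1, 1, 0, 0,
    0, 1, 1, 0,
    1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("disease_", 1:5),
                  c("fever", "pain", "cough", "rash"))
)

# Column sums count how many diseases list each symptom
symptom_counts <- colSums(incidence)

# Symptoms ordered from most to least frequent
ranked <- sort(symptom_counts, decreasing = TRUE)
print(ranked)

# Column index of a symptom, looked up by its name
fever_col <- which(colnames(incidence) == "fever")
print(fever_col)
```

The densest columns in this toy matrix play the same role as the fever, pain, and shortness of breath columns in the sampled image.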
The Apriori algorithm is a systematic method for discovering
relationships between items in a dataset. It starts by identifying
frequent individual items (1-itemsets) based on a minimum support
threshold, ensuring only commonly occurring items are considered. The
algorithm then iteratively generates larger itemsets (e.g., 2-itemsets,
3-itemsets) by combining frequent items, continuing until no more
frequent itemsets can be formed. A key step involves pruning infrequent
itemsets using the Apriori Property, which states that if an itemset is
infrequent, all its supersets must also be infrequent—this significantly
reduces computational complexity. Finally, the algorithm generates
association rules from the frequent itemsets, using metrics such as
support, confidence, and lift to assess the strength and relevance of
these relationships.
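The level-wise search described above can be illustrated with a short base R sketch. It is a simplified teaching version run on hypothetical toy transactions, not the optimized implementation used by the arules package, and it stops at the frequent itemsets without generating rules. All names in it (transactions, min_count, support_count) are chosen for illustration only.

```r
# Hypothetical toy transactions (not taken from the UMLS data)
transactions <- list(
  c("fever", "chill", "pain"),
  c("fever", "chill"),
  c("fever", "pain"),
  c("nausea", "vomiting"),
  c("nausea", "vomiting", "pain")
)
min_count <- 2  # minimum support as a count: 2 of 5 transactions (40%)

# Number of transactions that contain every item of an itemset
support_count <- function(itemset) {
  sum(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(transactions)))
frequent <- Filter(function(s) support_count(s) >= min_count, as.list(items))
all_frequent <- frequent

# Level k: join frequent (k-1)-itemsets, prune, then count support
while (length(frequent) > 1) {
  k <- length(frequent[[1]]) + 1
  keys <- vapply(frequent, paste, character(1), collapse = "|")

  # Join step: unions of two frequent itemsets that have exactly k items
  candidates <- list()
  for (i in seq_along(frequent)) {
    for (j in seq_along(frequent)) {
      if (i < j) {
        joined <- sort(union(frequent[[i]], frequent[[j]]))
        if (length(joined) == k) {
          candidates[[paste(joined, collapse = "|")]] <- joined
        }
      }
    }
  }

  # Prune step (Apriori property): every (k-1)-subset must be frequent
  candidates <- Filter(function(cand) {
    subsets <- combn(cand, k - 1, simplify = FALSE)
    all(vapply(subsets, paste, character(1), collapse = "|") %in% keys)
  }, candidates)

  # Count step: keep candidates that reach the minimum support
  frequent <- unname(Filter(function(s) support_count(s) >= min_count, candidates))
  all_frequent <- c(all_frequent, frequent)
}

frequent_sets <- vapply(all_frequent, paste, character(1), collapse = " + ")
print(frequent_sets)
```

In this toy example the candidate {chill, fever, pain} is discarded in the prune step without its support ever being counted, because its subset {chill, pain} is already infrequent.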
For the Apriori algorithm, I set the minimum support to 5% so that only
itemsets appearing in at least 5% of transactions are considered. The
minimum confidence is set to 0.5, so that the “then” part occurs in at
least 50% of the transactions where the “if” part occurs. Finally,
maxlen is set to 14 to limit each rule to a maximum of 14 items. With
these settings, 405 rules were found.
UMLS_rules <- apriori(UMLS, parameter = list(support = 0.05, confidence = 0.5, maxlen = 14))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 14 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 6
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[528 item(s), 134 transaction(s)] done [0.00s].
## sorting and recoding items ... [85 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [405 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
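The “Absolute minimum support count” line in the log can be related to the 5% threshold with simple arithmetic. The sketch below uses only the numbers printed in the log (134 transactions, support = 0.05). Since 0.05 × 134 is not a whole number, the reported 6 looks like the truncated value, while an itemset needs at least 7 transactions to actually reach 5% support. That reading is an interpretation of the log, not something the log states.

```r
n_transactions <- 134
min_support <- 0.05

raw_count <- min_support * n_transactions   # 6.7
truncated <- floor(raw_count)               # 6, the value shown in the log
needed <- ceiling(raw_count)                # 7, smallest count reaching 5%

# Support implied by each count
print(truncated / n_transactions)   # about 0.0448, below the threshold
print(needed / n_transactions)      # about 0.0522
```

The value 7 / 134 ≈ 0.0522 is the lowest support that appears in the rule listings of this report.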
Inspect the first 10 rules.
inspect(UMLS_rules[1:10])
## lhs rhs support confidence coverage lift count
## [1] {UMLS:C0041834_erythema} => {UMLS:C0038999_swelling} 0.05223881 1.0000000 0.05223881 10.307692 7
## [2] {UMLS:C0038999_swelling} => {UMLS:C0041834_erythema} 0.05223881 0.5384615 0.09701493 10.307692 7
## [3] {UMLS:C0002962_angina pectoris} => {UMLS:C0392680_shortness of breath} 0.05970149 0.8888889 0.06716418 2.589372 8
## [4] {UMLS:C0232292_chest tightness} => {UMLS:C0392680_shortness of breath} 0.05223881 0.8750000 0.05970149 2.548913 7
## [5] {UMLS:C0038990_sweat^UMLS:C0700590_sweating increased} => {UMLS:C0008031_pain chest} 0.06716418 0.9000000 0.07462687 6.347368 9
## [6] {UMLS:C0038990_sweat^UMLS:C0700590_sweating increased} => {UMLS:C0392680_shortness of breath} 0.06716418 0.9000000 0.07462687 2.621739 9
## [7] {UMLS:C0030252_palpitation} => {UMLS:C0008031_pain chest} 0.05223881 0.7777778 0.06716418 5.485380 7
## [8] {UMLS:C0856054_mental status changes} => {UMLS:C0004093_asthenia} 0.05223881 0.7777778 0.06716418 4.737374 7
## [9] {UMLS:C0237154_homelessness} => {UMLS:C0022107_irritable mood} 0.05223881 0.8750000 0.05970149 10.659091 7
## [10] {UMLS:C0022107_irritable mood} => {UMLS:C0237154_homelessness} 0.05223881 0.6363636 0.08208955 10.659091 7
Inspect the last 11 rules.
inspect(UMLS_rules[395:405])
## lhs rhs support confidence coverage lift count
## [1] {UMLS:C0022107_irritable mood,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0438696_suicidal} => {UMLS:C0085631_agitation} 0.05223881 1.000 0.05223881 7.882353 7
## [2] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0438696_suicidal} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1.000 0.05223881 9.571429 7
## [3] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory} => {UMLS:C0438696_suicidal} 0.05223881 0.875 0.05970149 9.770833 7
## [4] {UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0438696_suicidal} => {UMLS:C0022107_irritable mood} 0.05223881 1.000 0.05223881 12.181818 7
## [5] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0438696_suicidal} => {UMLS:C0233481_worry} 0.05223881 0.875 0.05970149 10.659091 7
## [6] {UMLS:C0022107_irritable mood,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0424109_weepiness,
## UMLS:C0917801_sleeplessness} => {UMLS:C0085631_agitation} 0.05223881 1.000 0.05223881 7.882353 7
## [7] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0424109_weepiness,
## UMLS:C0917801_sleeplessness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1.000 0.05223881 9.571429 7
## [8] {UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0424109_weepiness,
## UMLS:C0917801_sleeplessness} => {UMLS:C0022107_irritable mood} 0.05223881 1.000 0.05223881 12.181818 7
## [9] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0424109_weepiness,
## UMLS:C0917801_sleeplessness} => {UMLS:C0233481_worry} 0.05223881 1.000 0.05223881 12.181818 7
## [10] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0424109_weepiness} => {UMLS:C0917801_sleeplessness} 0.05223881 1.000 0.05223881 13.400000 7
## [11] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1.000 0.05223881 19.142857 7
The listings above already contain rules with high confidence and lift, which point to symptoms that commonly occur together. However, instead of simply printing all the rules, it is more informative to look at the rules with the highest support, confidence, and lift in turn. As the following listings show, a different set of rules ranks highest for each of these three measures.
inspect(sort(UMLS_rules, by = "support")[1:10])
## lhs rhs support confidence coverage lift count
## [1] {UMLS:C0013404_dyspnea} => {UMLS:C0392680_shortness of breath} 0.12686567 0.8095238 0.1567164 2.358178 17
## [2] {UMLS:C0027497_nausea} => {UMLS:C0000737_pain abdominal} 0.11194030 0.6818182 0.1641791 3.383838 15
## [3] {UMLS:C0000737_pain abdominal} => {UMLS:C0027497_nausea} 0.11194030 0.5555556 0.2014925 3.383838 15
## [4] {UMLS:C0008031_pain chest} => {UMLS:C0392680_shortness of breath} 0.10447761 0.7368421 0.1417910 2.146453 14
## [5] {UMLS:C0027497_nausea} => {UMLS:C0042963_vomiting} 0.10447761 0.6363636 0.1641791 3.553030 14
## [6] {UMLS:C0042963_vomiting} => {UMLS:C0027497_nausea} 0.10447761 0.5833333 0.1791045 3.553030 14
## [7] {UMLS:C0085593_chill} => {UMLS:C0015967_fever} 0.10447761 0.7000000 0.1492537 2.842424 14
## [8] {UMLS:C0000737_pain abdominal} => {UMLS:C0030193_pain} 0.10447761 0.5185185 0.2014925 1.737037 14
## [9] {UMLS:C0085619_orthopnea} => {UMLS:C0392680_shortness of breath} 0.09701493 0.9285714 0.1044776 2.704969 13
## [10] {UMLS:C0034642_rale} => {UMLS:C0392680_shortness of breath} 0.09701493 0.7222222 0.1343284 2.103865 13
All the rules listed have relatively high support, meaning they appear frequently in the dataset. Rule 1 links dyspnea and shortness of breath, which occur together in 12.7% of the transactions (unsurprising, since dyspnea is the clinical term for shortness of breath). Rule 1 has a confidence of 80.95%, so in most cases where dyspnea is present, shortness of breath is present too, and its lift of 2.36 means the two co-occur more than twice as often as would be expected if they were independent. Rules 4, 9, and 10 show the same pattern with chest pain, orthopnea, and rale as antecedents. These rules are medically plausible and consistent with known heart or respiratory conditions. Rule 2 shows that nausea and abdominal pain occur together in 11.2% of the transactions, with a confidence of about 68%. Interestingly, Rule 3, which reverses the direction (abdominal pain => nausea), has a lower confidence of 55.6%, so abdominal pain predicts nausea less consistently than nausea predicts abdominal pain. Both rules have a lift of 3.38, which indicates a fairly strong relationship. The same pattern holds for Rules 5 and 6 (nausea and vomiting). These rules suggest a link to gastrointestinal issues. Rule 7 (chill => fever), with a confidence of 70% and a lift of 2.84, is consistent with infections or other systemic illnesses that cause fever. Rule 8 (abdominal pain => pain) has the lowest lift in the list (1.74), so it is the weakest association of the ten.
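The confidence and lift values quoted for Rules 1 to 3 can be reproduced from raw counts. In the sketch below, the joint counts come from the count column, the antecedent counts are coverage × 134, and the count of 46 for shortness of breath is inferred from its 34.3% support. The helper rule_metrics is a hypothetical function written for this illustration, not part of arules.

```r
n <- 134  # transactions in the dataset

# Support, confidence, and lift of a rule X => Y from raw counts
rule_metrics <- function(n_xy, n_x, n_y, n) {
  confidence <- n_xy / n_x        # P(Y | X)
  lift <- confidence / (n_y / n)  # P(Y | X) / P(Y)
  c(support = n_xy / n, confidence = confidence, lift = lift)
}

# Rule 1: dyspnea => shortness of breath
rule1 <- rule_metrics(n_xy = 17, n_x = 21, n_y = 46, n = n)
# Rule 2: nausea => abdominal pain
rule2 <- rule_metrics(n_xy = 15, n_x = 22, n_y = 27, n = n)
# Rule 3: abdominal pain => nausea (same pair, reversed direction)
rule3 <- rule_metrics(n_xy = 15, n_x = 27, n_y = 22, n = n)

print(round(rbind(rule1, rule2, rule3), 4))
```

Reversing a rule changes the denominator of the confidence (22 versus 27 here) but not the lift, which is why Rules 2 and 3 differ in confidence while sharing a lift of 3.38.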
inspect(sort(UMLS_rules, by = "confidence")[1:10])
## lhs rhs support confidence coverage lift count
## [1] {UMLS:C0041834_erythema} => {UMLS:C0038999_swelling} 0.05223881 1 0.05223881 10.307692 7
## [2] {UMLS:C0424109_weepiness} => {UMLS:C0917801_sleeplessness} 0.05223881 1 0.05223881 13.400000 7
## [3] {UMLS:C0424109_weepiness} => {UMLS:C0233481_worry} 0.05223881 1 0.05223881 12.181818 7
## [4] {UMLS:C0424109_weepiness} => {UMLS:C0022107_irritable mood} 0.05223881 1 0.05223881 12.181818 7
## [5] {UMLS:C0424109_weepiness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1 0.05223881 9.571429 7
## [6] {UMLS:C0424109_weepiness} => {UMLS:C0085631_agitation} 0.05223881 1 0.05223881 7.882353 7
## [7] {UMLS:C0022107_irritable mood,
## UMLS:C0237154_homelessness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1 0.05223881 9.571429 7
## [8] {UMLS:C0233762_hallucinations auditory,
## UMLS:C0237154_homelessness} => {UMLS:C0022107_irritable mood} 0.05223881 1 0.05223881 12.181818 7
## [9] {UMLS:C0424109_weepiness,
## UMLS:C0917801_sleeplessness} => {UMLS:C0233481_worry} 0.05223881 1 0.05223881 12.181818 7
## [10] {UMLS:C0233481_worry,
## UMLS:C0424109_weepiness} => {UMLS:C0917801_sleeplessness} 0.05223881 1 0.05223881 13.400000 7
All of the top 10 rules have a confidence of 1, meaning that in this dataset the RHS (then-part) occurs whenever the LHS (if-part) occurs. The high lift values (from 7.88 to 13.4) show that these co-occurrences are far more frequent than would be expected by chance. However, the support is only about 5.2%, meaning each rule is present in just 7 of the 134 transactions, so the perfect confidence rests on a small number of cases. It is also worth noting that the top rules by confidence heavily feature symptoms like weepiness, sleeplessness, worry, and agitation, which are typically associated with mental health conditions such as depression or anxiety. The results differ significantly from the top rules by support because sorting by support favors frequent rules, while sorting by confidence favors reliable ones.
inspect(sort(UMLS_rules, by = "lift")[1:10])
## lhs rhs support confidence coverage lift count
## [1] {UMLS:C0233481_worry,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [2] {UMLS:C0085631_agitation,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [3] {UMLS:C0022107_irritable mood,
## UMLS:C0233481_worry,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [4] {UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [5] {UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [6] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [7] {UMLS:C0085631_agitation,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [8] {UMLS:C0022107_irritable mood,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [9] {UMLS:C0022107_irritable mood,
## UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
## [10] {UMLS:C0085631_agitation,
## UMLS:C0233481_worry,
## UMLS:C0233762_hallucinations auditory,
## UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 1 0.05223881 19.14286 7
In the top rules ordered by lift, all 10 rules have the same RHS: weepiness. The LHS of these rules consists of combinations of symptoms related to mental health conditions. Weepiness occurs infrequently in the dataset (as its low support shows) but is strongly tied to specific combinations of symptoms. The symptoms on the LHS are good predictors of weepiness, resulting in high confidence and lift. All rules have a lift of 19.14, which is quite high: the presence of the LHS symptoms makes weepiness approximately 19 times more likely than its overall frequency in the dataset would suggest. The confidence for all 10 rules is 1, meaning that in this dataset weepiness is present whenever the LHS symptoms are present. However, the support for all these rules is relatively low (5.2%); each applies to 7 transactions, as was the case for the top 10 rules by confidence.
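When confidence equals 1, lift reduces to 1 / support(RHS), so the lift values in the confidence and lift listings depend only on how rare the consequent is. The short sketch below works under that observation. The consequent counts are back-calculated from the reported lift and coverage values and the 134 transactions, so they are an assumption rather than values read directly from the data.

```r
n <- 134  # transactions in the dataset

# Assumed consequent counts, inferred from the reported lift values
rhs_counts <- c(
  "weepiness" = 7,
  "sleeplessness" = 10,
  "irritable mood" = 11,
  "hallucinations auditory" = 14,
  "agitation" = 17
)

# Lift of a rule with confidence 1 is 1 / support(RHS)
confidence <- 1
lift <- confidence / (rhs_counts / n)
print(round(lift, 6))
```

Weepiness is the rarest of these consequents (7 of 134 diseases), which is why every rule predicting it with confidence 1 reaches the maximum lift of 134 / 7 ≈ 19.14.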
plot(UMLS_rules)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
In the plot above, most rules are grouped in small clusters that share
the same support and confidence values. There is a cluster of rules at
the top of the plot (confidence = 1). These rules have perfect
confidence, meaning the RHS is always present when the LHS occurs. Most
rules have support values between 5% and 7%. I assume this happens
because many symptoms and conditions are specific to only a few cases in
the dataset, which is a typical characteristic of healthcare data.
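One likely contributor to the discrete clusters is that, with only 134 transactions, support can only take values of the form count / 134. The sketch below lists the supports for counts 7 to 9, which are the counts that fall in the 5-7% range; that this is what drives the clustering is an assumption, not something the plot output states.

```r
n <- 134        # transactions in the dataset
counts <- 7:9   # rule counts that fall between 5% and 7% support

# Each count maps to exactly one possible support value
support_table <- data.frame(
  count = counts,
  support_pct = round(100 * counts / n, 2)
)
print(support_table)
```

These three values (about 5.22%, 5.97%, and 6.72%) are the same supports that recur throughout the rule listings.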
Now I would like to address some questions of interest regarding the
relationships between symptoms.
What are the most reliable predictors of sleeplessness (insomnia)?
rules_Insomnia <- apriori(data = UMLS, parameter = list(supp = 0.05, conf = 0.5),
                          appearance = list(default = "lhs", rhs = "UMLS:C0917801_sleeplessness"),
                          control = list(verbose = FALSE))
rules_Insomnia.byconf <- sort(rules_Insomnia, by = "confidence", decreasing = TRUE)
inspect(head(rules_Insomnia.byconf))
## lhs rhs support confidence coverage lift count
## [1] {UMLS:C0424109_weepiness} => {UMLS:C0917801_sleeplessness} 0.05223881 1 0.05223881 13.4 7
## [2] {UMLS:C0233481_worry,
## UMLS:C0424109_weepiness} => {UMLS:C0917801_sleeplessness} 0.05223881 1 0.05223881 13.4 7
## [3] {UMLS:C0022107_irritable mood,
## UMLS:C0424109_weepiness} => {UMLS:C0917801_sleeplessness} 0.05223881 1 0.05223881 13.4 7
## [4] {UMLS:C0233762_hallucinations auditory,
## UMLS:C0424109_weepiness} => {UMLS:C0917801_sleeplessness} 0.05223881 1 0.05223881 13.4 7
## [5] {UMLS:C0085631_agitation,
## UMLS:C0424109_weepiness} => {UMLS:C0917801_sleeplessness} 0.05223881 1 0.05223881 13.4 7
## [6] {UMLS:C0022107_irritable mood,
## UMLS:C0150041_feeling hopeless} => {UMLS:C0917801_sleeplessness} 0.05223881 1 0.05223881 13.4 7
plot(rules_Insomnia, method="grouped")
Among these rules, I was particularly surprised to see the symptom
“feeling hopeless”. To be honest, I hadn’t known it was categorized as a
symptom before working on this project. Another symptom in these rules
gave me the idea for the following question:
Are auditory hallucinations more strongly associated with specific mood disturbances?
rules_hallucinations <- apriori(data = UMLS, parameter = list(supp = 0.05, conf = 0.5),
                                appearance = list(default = "lhs", rhs = "UMLS:C0233762_hallucinations auditory"),
                                control = list(verbose = FALSE))
rules_hallucinations.byconf <- sort(rules_hallucinations, by = "confidence", decreasing = TRUE)
inspect(head(rules_hallucinations.byconf))
## lhs rhs support confidence coverage lift count
## [1] {UMLS:C0424109_weepiness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1 0.05223881 9.571429 7
## [2] {UMLS:C0022107_irritable mood,
## UMLS:C0237154_homelessness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1 0.05223881 9.571429 7
## [3] {UMLS:C0424109_weepiness,
## UMLS:C0917801_sleeplessness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1 0.05223881 9.571429 7
## [4] {UMLS:C0233481_worry,
## UMLS:C0424109_weepiness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1 0.05223881 9.571429 7
## [5] {UMLS:C0022107_irritable mood,
## UMLS:C0424109_weepiness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1 0.05223881 9.571429 7
## [6] {UMLS:C0085631_agitation,
## UMLS:C0424109_weepiness} => {UMLS:C0233762_hallucinations auditory} 0.05223881 1 0.05223881 9.571429 7
plot(rules_hallucinations, method="grouped")
We can see now that auditory hallucinations are often accompanied
by symptoms such as irritable mood, agitation, weepiness, sleeplessness,
and homelessness.
In this project, I would like to focus on analyzing the symptom sleeplessness (insomnia) as both an antecedent and a consequent to uncover its relationships with other symptoms and diseases. This analysis is particularly relevant during stressful periods, such as exam seasons, where sleeplessness can significantly impact well-being. To begin, I study sleeplessness as a consequent to identify which related symptoms and conditions commonly lead to its occurrence.
plot(rules_Insomnia)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
There are 38 rules that achieve at least 5% support and 50%
confidence. Some of these rules have quite high lift values.
plot(rules_Insomnia, method="graph")
As mentioned earlier, weepiness and worry are prominently connected to
sleeplessness. They are likely strong predictors of sleeplessness
because of their frequent co-occurrence with it and their high lift. An
interesting observation is that feeling hopeless is one of the more
frequently occurring symptoms in the rules where sleeplessness is the
consequent. Symptoms like auditory hallucinations, irritable mood, and
agitation have high-lift relationships with sleeplessness, indicating
that they are significantly more likely to co-occur with sleeplessness
than expected by chance.
plot(rules_Insomnia, method="paracoord", control=list(reorder=TRUE))
All lines converge at the RHS (UMLS:C0917801_sleeplessness), confirming
that sleeplessness is the consequent for all rules. In these rules,
sleeplessness is more often predicted by a combination of symptoms than
by a single symptom.
Now I turn to sleeplessness as an antecedent.
rules_Insomnia <- apriori(data = UMLS, parameter = list(supp = 0.05, conf = 0.5),
                          appearance = list(default = "rhs", lhs = "UMLS:C0917801_sleeplessness"),
                          control = list(verbose = FALSE))
rules_Insomnia.byconf <- sort(rules_Insomnia, by = "confidence", decreasing = TRUE)
inspect(head(rules_Insomnia.byconf))
## lhs rhs support confidence coverage lift count
## [1] {UMLS:C0917801_sleeplessness} => {UMLS:C0150041_feeling hopeless} 0.05970149 0.8 0.07462687 9.745455 8
## [2] {UMLS:C0917801_sleeplessness} => {UMLS:C0022107_irritable mood} 0.05970149 0.8 0.07462687 9.745455 8
## [3] {UMLS:C0917801_sleeplessness} => {UMLS:C0233762_hallucinations auditory} 0.05970149 0.8 0.07462687 7.657143 8
## [4] {UMLS:C0917801_sleeplessness} => {UMLS:C0424109_weepiness} 0.05223881 0.7 0.07462687 13.400000 7
## [5] {UMLS:C0917801_sleeplessness} => {UMLS:C0233481_worry} 0.05223881 0.7 0.07462687 8.527273 7
## [6] {UMLS:C0917801_sleeplessness} => {UMLS:C0438696_suicidal} 0.05223881 0.7 0.07462687 7.816667 7
plot(rules_Insomnia)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(rules_Insomnia, method="graph")
plot(rules_Insomnia, method="paracoord", control=list(reorder=TRUE))
With sleeplessness as the antecedent, the Apriori algorithm identified
6 association rules. The results indicate that sleeplessness is strongly
associated with symptoms such as auditory hallucinations, irritable
mood, and feeling hopeless. The last of these is something we can
experience in stressful situations like exams. These symptoms appear
both as consequents of sleeplessness and as its antecedents, which
shows that the relationships are bidirectional.
In this project I studied the application of association rule mining to a disease-symptom dataset to uncover meaningful patterns and relationships. Using the Apriori algorithm, the analysis aimed to identify frequent symptom associations and evaluate the number and quality of the derived rules. A particular focus was placed on the symptom sleeplessness, examining its role as both an antecedent (if) and a consequent (then) within the identified rules. This analysis provided insights into how sleeplessness is linked to other symptoms, such as auditory hallucinations, irritable mood, and feeling hopeless, highlighting its potential significance in the context of disease progression, diagnosis, or our own everyday concerns.