First, we import the datasets into three separate data frames.
# begin: load files
demog <- read.csv("dem.csv", header = TRUE, sep = ",")
diag <- read.csv("dia.csv", header = TRUE, sep = ",")
EDvis <- read.table("ed_visits.txt", header = TRUE, sep = "$")
# include R packages
library(ggplot2)
library(dplyr)
library(scales)
library(knitr)
The patients are a mix of races, age groups, and both sexes. The following plots describe the proportions of this mix.
# Bar plot to describe sex grouped by race
sexRacePlot <- ggplot(data = demog, aes(gender, fill = race)) + geom_bar(alpha = 0.7) + coord_flip()
sexRacePlot
This plot yields two important observations.
Next, we visualize the age-mix combined with race using a histogram, as well as combined with gender using a box-plot:
# histogram
ageplotHist <- ggplot(data = demog, aes(x = age, fill = race)) + geom_histogram(bins = 9, col = "white", alpha = 0.8)
ageplotHist
# box-plot
ageplotBox <- ggplot(data = demog, aes(x = gender, y = age)) + geom_boxplot() + coord_flip()
ageplotBox
We note that the population skews older: the largest sub-group is over 90 years of age, and the number of patients generally increases with age. It is interesting to note, however, that both race and sex seem to be fairly evenly represented.
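The even representation noted above is a visual impression. A minimal sketch of how it could be checked numerically with a cross-tabulation follows; `toy_demog` is a made-up stand-in for `demog`, and only the column names `gender` and `race` come from the code above.

```r
# Made-up rows standing in for demog; the values are for illustration only.
toy_demog <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M", "F", "M"),
  race   = c("A", "B", "C", "A", "B", "C", "A", "B")
)

# Counts of each race within each sex.
counts <- table(toy_demog$gender, toy_demog$race)

# Row-wise shares: each row sums to 1, so rows can be compared directly.
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 2))
```

If the rows of `row_share` look alike, race is similarly distributed within each sex; large differences between rows would contradict the visual impression.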
After playing around with the data (aka, fooling around and failing) in different ways, I decided to focus on diseases. Before doing anything else, I wanted to see the spread of diseases. From the data, the number of distinct diagnosis codes is:
length(levels(diag$dia_code))
## [1] 688
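The count above relies on `dia_code` being a factor. Since R 4.0.0, `read.csv()` no longer converts strings to factors by default, in which case `levels()` returns `NULL` and the count comes out as 0. A minimal sketch of a count that works either way, on a made-up `toy_diag` data frame (hypothetical, not the actual data):

```r
# Toy stand-in for the diagnoses table; the codes are made up for illustration.
toy_diag <- data.frame(
  empi = c(1, 1, 2, 3),
  dia_code = c("272.4", "300.02", "272.4", "V10.3"),
  stringsAsFactors = FALSE
)

# levels() is NULL for a character column, so this reports 0 codes.
n_levels <- length(levels(toy_diag$dia_code))

# unique() works for both character and factor columns.
n_codes <- length(unique(toy_diag$dia_code))

print(c(n_levels = n_levels, n_codes = n_codes))
```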
Since there were so many of them to work with, I just wanted to see a landscape of what they looked like for the population at large, and see which ones affected patients disproportionately.
# create combined data frame of the patients and their diagnoses
patientDiagnoses <- merge(demog, diag, by = "empi", all = FALSE)
# plot
diseasePlot <- ggplot(data = patientDiagnoses, aes(dia_code, fill = race)) + geom_bar()
diseasePlot <- diseasePlot + theme(legend.position = "bottom", axis.text.x = element_blank())
diseasePlot
While there is too much going on in this plot, it certainly seems that there are some diseases that affect a lot of patients. Let us try to find out which these might be.
kable(summary(patientDiagnoses$dia_name), col.names = 'Patient Count')
| Diagnosis | Patient Count |
|---|---|
| Other and unspecified hyperlipidemia | 590 |
| Abdominal pain, unspecified site | 226 |
| Other malignant lymphomas, unspecified site, extranodal and solid organ sites | 202 |
| Malignant neoplasm of breast (female), unspecified | 144 |
| Acute reaction to stress | 119 |
| Other acute reactions to stress | 118 |
| Panic disorder without agoraphobia | 118 |
| Unspecified acute reaction to stress | 115 |
| Pain in joint involving lower leg | 113 |
| Hysteria, unspecified | 110 |
| Generalized anxiety disorder | 107 |
| Anxiety, dissociative and somatoform disorders | 103 |
| Other anxiety states | 103 |
| Diabetes mellitus type II [non-insulin dependent type] [NIDDM type] [adult-onset type] or unspecified type, not stated as uncontrolled, with unspecified complication | 98 |
| Hypertensive chronic kidney disease, benign, with chronic kidney disease stage V or end stage renal disease | 84 |
| Hyposmolality and/or hyponatremia | 82 |
| Cirrhosis of liver without mention of alcohol | 77 |
| Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease | 75 |
| Other seborrheic keratosis | 53 |
| Other malignant neoplasm of skin of other and unspecified parts of face | 50 |
| Esophageal reflux | 45 |
| Other paralytic syndromes | 45 |
| Chronic pulmonary heart disease | 43 |
| Mylagia and myositis, unspecified | 43 |
| Secondary malignant neoplasm of other specified sites | 43 |
| Malignant neoplasm without specification of site | 41 |
| Secondary and unspecified malignant neoplasm of lymph nodes | 40 |
| Malignant neoplasm of brain, unspecified | 38 |
| Unspecified disease of pulmonary circulation | 38 |
| Embolism and thrombosis of other specified veins | 37 |
| Secondary malignant neoplasm of respiratory and digestive systems | 36 |
| Swelling, mass, or lump in chest | 36 |
| Chronic pulmonary heart disease, unspecified | 35 |
| Intracerebral hemorrhage | 35 |
| Acute or unspecified hepatitis C without mention of hepatic coma | 33 |
| Atherosclerosis of native arteries of the extremities, unspecified | 32 |
| Pain in joint, site unspecified | 32 |
| Adjustment reaction with prolonged depressive reaction | 31 |
| Hemiplegia and hemiparesis | 31 |
| Cognitive deficits as late effect of cerebrovascular disease | 30 |
| Unspecified essential hypertension | 30 |
| Benign hypertensive heart disease without heart failure | 29 |
| Other second degree atrioventricular block | 28 |
| Acute cor pulmonale | 27 |
| Alcohol-induced psychotic disorder with delusions | 27 |
| Chronic hepatitis C without mention of hepatic coma | 27 |
| Atrioventricular block, unspecified | 26 |
| First degree atrioventricular block | 26 |
| Secondary malignant neoplasm of ovary | 26 |
| Unspecified gastritis and gastroduodenitis, without mention of hemorrhage | 26 |
| Alcohol-induced persisting dementia | 25 |
| Benign essential hypertension | 25 |
| Neurogenic bladder NOS | 25 |
| Epilepsy, unspecified, without mention of intractable epilepsy | 24 |
| Other conditions of brain | 24 |
| Other severe protein-calorie malnutrition | 24 |
| Alcohol-induced persisting amnestic disorder | 23 |
| Chest pain, unspecified | 22 |
| Complications of transplanted liver | 22 |
| Nontoxic multinodular goiter | 22 |
| Nutritional marasmus | 22 |
| Human immunodeficiency virus [HIV] disease | 21 |
| Pain in limb | 21 |
| Cellulitis and abscess of upper arm and forearm | 20 |
| Chronic viral hepatitis B without mention of hepatic coma without mention of hepatitis delta | 20 |
| Subarachnoid hemorrhage | 20 |
| Atherosclerosis of aorta | 19 |
| Other chronic pulmonary heart diseases | 19 |
| Abdominal pain, other specified site | 18 |
| Acidosis | 18 |
| Diabetes mellitus type II [non-insulin dependent type] [NIDDM type] [adult-onset type] or unspecified type, not stated as uncontrolled, with ophthalmic manifestations | 18 |
| Headache | 18 |
| Secondary and unspecified malignant neoplasm of lymph nodes of axilla and upper limb | 18 |
| Acute myeloid leukemia, in relapse | 17 |
| Esophageal varices with bleeding | 17 |
| Kwashiorkor | 17 |
| Epistaxis | 16 |
| Hematemesis | 16 |
| Other and unspecified protein-calorie malnutrition | 16 |
| Thoracic or lumbosacral neuritis or radiculitis, unspecified | 16 |
| Pain in thoracic spine | 15 |
| Sensorineural hearing loss of combined types | 15 |
| Shortness of breath | 15 |
| Atherosclerosis of renal artery | 14 |
| Hyperpotassemia | 14 |
| Leukocytosis, unspecified | 14 |
| Other malaise and fatigue | 14 |
| Subdural hemorrhage following injury, without mention of open intracranial wound, with state of consciousness unspecified | 14 |
| Aftercare following joint replacement | 13 |
| Lumbago | 13 |
| Chronic viral hepatitis B without mention of hepatic coma with hepatitis delta | 12 |
| Congenital deficiency of other clotting factors | 12 |
| Dizziness and giddiness | 12 |
| Mixed acid-base balance disorder | 12 |
| Nocturia | 12 |
| Other and unspecified coagulation defects | 12 |
| Other specified personal history presenting hazards to health | 12 |
| Syncope and collapse | 12 |
| Anxiety state, unspecified | 11 |
| (Other) | 1339 |
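One caveat on reading these counts: `summary()` on a factor counts rows, so each number is a count of diagnosis records rather than of distinct patients, and everything beyond the most frequent levels is lumped into `(Other)`. A sketch of how both counts could be computed side by side with dplyr, on made-up rows (the `toy` data frame and its values are hypothetical):

```r
library(dplyr)

# Made-up merged rows: one row per diagnosis record.
toy <- data.frame(
  empi = c(1, 1, 2, 2, 3, 3),
  dia_name = c("Hyperlipidemia", "Hyperlipidemia", "Hyperlipidemia",
               "Headache", "Headache", "Acidosis")
)

# Records and distinct patients per diagnosis, most common first.
top_conditions <- toy %>%
  group_by(dia_name) %>%
  summarise(records = n(), patients = n_distinct(empi)) %>%
  arrange(desc(patients), desc(records)) %>%
  head(2)

print(top_conditions)
```

Where `records` is much larger than `patients`, the same diagnosis is being recorded repeatedly for the same people.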
The conditions that make it onto this table are fairly diverse, from heart conditions to abdominal pain. Given the age demographic at hand, many of them are physical ailments and seem fairly reasonable.

> What piqued my interest is the prevalence of stress- and anxiety-related conditions, i.e. mental health afflictions, near the very top of the table.

So next, let’s do a deep dive into these conditions to figure out who they’re affecting, how they’re being treated, and in general, what may be going on.
In doing this, I began by gathering some basic domain knowledge on how ICD-9 codes work and how diseases are systematically grouped under them. To group the diseases in the given data, I did some basic data wrangling to pick out the diagnoses that qualify as mental health conditions (ICD-9 codes 290-319).
# select all records with patients who've been diagnosed with mental health conditions
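# convert via as.character() first: as.numeric() on a factor returns level indices, not the codes
# non-numeric codes become NA here and are dropped by which()
diaCodeNum <- suppressWarnings(as.numeric(as.character(patientDiagnoses$dia_code)))
mentalHealthPatients <- patientDiagnoses[which(diaCodeNum >= 290 & diaCodeNum < 320), ]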
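The range filter depends on how `dia_code` is stored. If it is a factor, calling `as.numeric()` on it directly returns internal level indices rather than the codes themselves. The sketch below shows the difference on made-up codes, along with one possible way to extract the numeric ICD-9 category; the helper name `icd9_category` is hypothetical.

```r
# Made-up ICD-9 codes stored as a factor, as read.csv() did before R 4.0.
codes <- factor(c("272.4", "300.02", "V10.3", "296.20", "789.00"))

# as.numeric() on a factor yields level indices, not the codes.
level_index <- as.numeric(codes)

# Hypothetical helper: numeric ICD-9 category, NA for codes that
# do not start with a digit (such as V and E codes).
icd9_category <- function(x) {
  x <- as.character(x)
  out <- rep(NA_real_, length(x))
  is_num <- grepl("^[0-9]", x)
  out[is_num] <- floor(as.numeric(x[is_num]))
  out
}

category <- icd9_category(codes)
is_mental <- !is.na(category) & category >= 290 & category <= 319
print(data.frame(codes, level_index, category, is_mental))
```

Comparing the `level_index` and `category` columns makes it clear why a filter on the raw factor would select an arbitrary set of diagnoses.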
Interestingly, while the previous table shows well over 400 diagnoses of mental health conditions, there are only
# count distinct patients; length() on a data frame would return its number of columns
length(unique(mentalHealthPatients$empi))
## [1] 132
unique patients with mental health disorders, which suggests that several of these disorders occur together in the same patients. I now seek to answer who these patients might be.
# age and race
ageplotHist2 <- ggplot(data = mentalHealthPatients, aes(x = age, fill = race)) + geom_histogram(bins = 9, col = "white", alpha = 0.8)
ageplotHist2
How does this distribution compare with the cohort’s overall patient population?
# population
ageplotHist
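Because the two histograms have very different totals, comparing shares per age band may be easier than comparing raw counts. A rough sketch, assuming ten-year bands; the two vectors are made up and merely stand in for `demog$age` and `mentalHealthPatients$age`:

```r
# Made-up ages standing in for demog$age and mentalHealthPatients$age.
all_ages <- c(23, 35, 47, 52, 58, 64, 71, 77, 83, 88, 91, 95)
mh_ages <- c(47, 58, 71, 83, 91, 95)

# Ten-year bands, closed on the left: [20,30), [30,40), ..., [90,100).
breaks <- seq(20, 100, by = 10)

# Share of a group falling into each age band, rounded to two decimals.
band_share <- function(ages) {
  tab <- table(cut(ages, breaks = breaks, right = FALSE))
  setNames(round(as.vector(tab) / sum(tab), 2), names(tab))
}

comparison <- rbind(cohort = band_share(all_ages),
                    mental_health = band_share(mh_ages))
print(comparison)
```

Bands where the `mental_health` row is clearly above the `cohort` row would point to age groups that are over-represented among these patients.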
# age and sex
ageplotHist3 <- ggplot(data = mentalHealthPatients, aes(x = age, fill = gender)) + geom_histogram(bins = 9, col = "white", alpha = 0.8)
ageplotHist3
The gender distribution is rather even and shows no particularly interesting outliers.
mhID <- mentalHealthPatients$empi
mentalHealth <- filter(patientDiagnoses, empi %in% mhID)
nrow(mentalHealth)
## [1] 1843
This is truly strange: there are 1843 records of patients who suffer from mental health conditions, but only 132 such patients.
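The gap is what one would expect when rows and patients are counted separately: the merged table holds one row per diagnosis, so 1843 records over 132 patients works out to roughly 14 records per patient. A small sketch on made-up rows (the `toy` data frame is hypothetical) shows the three counts that are easy to confuse:

```r
# Made-up merged rows: one row per diagnosis record.
toy <- data.frame(
  empi = c(101, 101, 101, 202, 202, 303),
  dia_name = c("A", "B", "C", "A", "D", "A")
)

n_records  <- nrow(toy)                  # rows, i.e. diagnosis records
n_patients <- length(unique(toy$empi))   # distinct patients
n_columns  <- length(toy)                # length() of a data frame counts columns

# Number of diagnosis records per patient.
per_patient <- table(toy$empi)

print(c(records = n_records, patients = n_patients, columns = n_columns))
print(per_patient)
```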
This makes me curious as to what non-mental health conditions they may be suffering from; the stress from some of these might even have led to the mental health troubles. Let us find out.
# top 5 accompanying conditions
kable(mentalHealth %>% count(dia_name, sort = TRUE) %>% top_n(5))
## Selecting by n
| dia_name | n |
|---|---|
| Other and unspecified hyperlipidemia | 181 |
| Other malignant lymphomas, unspecified site, extranodal and solid organ sites | 70 |
| Abdominal pain, unspecified site | 65 |
| Malignant neoplasm of breast (female), unspecified | 55 |
| Esophageal reflux | 45 |
The condition most commonly accompanying mental health disorders is hyperlipidemia, which, as Google tells me, is the presence of a high level of fats in the blood. Also on the list are malignant lymphomas and neoplasm of the breast, both referring to cancers. There are also two stomach / digestive conditions: abdominal pain and esophageal reflux.
This list is interesting and insightful. However, it isn’t incredibly surprising. If asked to guess, I would say that people with hyperlipidemia and esophageal reflux tend to suffer from mental disorders due to an unhealthy and stressful lifestyle (think long hours at work, with no exercise and fast food for meals). For the other group, with cancerous conditions, it is possible that the onset of cancer could’ve provoked severe stress, anxiety, and, consequently, ill mental health.
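The ranking above counts every diagnosis record of these patients, repeats and mental health codes included. A hedged sketch of a stricter variant, which drops the mental health rows and counts each patient at most once per condition, is given below on made-up data (the `toy` data frame and its codes are hypothetical):

```r
# Made-up diagnosis records; codes are illustrative ICD-9 style strings.
toy <- data.frame(
  empi = c(1, 1, 1, 2, 2, 3, 3),
  dia_code = c("300.02", "272.4", "272.4", "296.20", "272.4", "300.00", "530.81"),
  stringsAsFactors = FALSE
)

# Flag mental health records (ICD-9 categories 290 to 319).
category <- floor(as.numeric(toy$dia_code))
is_mental <- category >= 290 & category <= 319

# Patients with at least one mental health record.
mh_ids <- unique(toy$empi[is_mental])

# Their other diagnoses, keeping one row per patient and code.
other <- toy[toy$empi %in% mh_ids & !is_mental, c("empi", "dia_code")]
other <- unique(other)

# Number of distinct patients per accompanying condition.
co_counts <- sort(table(other$dia_code), decreasing = TRUE)
print(co_counts)
```

Counting patients rather than records keeps a single frequently re-coded patient from pushing a condition up the list.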