Libraries used in the Assignment

library(texreg)
library(stargazer)
library(ggplot2)
library(ggthemes)
library(rCharts)
library(plotly)
library(babynames)
library(Zelig)
library(readr)
library(dplyr)
library(sjmisc)
library(DescTools)
library(plyr)
library(magrittr)
library(tidyverse)
library(Cite)

Loading data from the CSV file into R

d_csv <- read_csv("C:/Users/AHMED/Desktop/noShowDoc.csv", col_names = TRUE)
head(d_csv)
## # A tibble: 6 × 16
##     Age Gender AppointmentRegistration ApointmentData DayOfTheWeek  Status
##   <int>  <chr>                  <dttm>         <dttm>        <chr>   <chr>
## 1    19      M     2014-12-16 14:46:25     2015-01-14    Wednesday Show-Up
## 2    24      F     2015-08-18 07:01:26     2015-08-19    Wednesday Show-Up
## 3     4      F     2014-02-17 12:53:46     2014-02-18      Tuesday Show-Up
## 4     5      M     2014-07-23 17:02:11     2014-08-07     Thursday Show-Up
## 5    38      M     2015-10-21 15:20:09     2015-10-27      Tuesday Show-Up
## 6     5      F     2014-06-17 06:47:27     2014-07-22      Tuesday No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## #   HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## #   Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>

Creating Reference variable for DayOfTheWeek

d_csv$DayOfTheWeek <- recode(d_csv$DayOfTheWeek,
                             "Sunday" = 1, "Monday" = 2, "Tuesday" = 3, "Wednesday" = 4,
                             "Thursday" = 5, "Friday" = 6, "Saturday" = 7)

Factoring the Reference variable into the dataset

d_csv$DayOfTheWeek <- factor(d_csv$DayOfTheWeek)
head(d_csv)
## # A tibble: 6 × 16
##     Age Gender AppointmentRegistration ApointmentData DayOfTheWeek  Status
##   <int>  <chr>                  <dttm>         <dttm>       <fctr>   <chr>
## 1    19      M     2014-12-16 14:46:25     2015-01-14            4 Show-Up
## 2    24      F     2015-08-18 07:01:26     2015-08-19            4 Show-Up
## 3     4      F     2014-02-17 12:53:46     2014-02-18            3 Show-Up
## 4     5      M     2014-07-23 17:02:11     2014-08-07            5 Show-Up
## 5    38      M     2015-10-21 15:20:09     2015-10-27            3 Show-Up
## 6     5      F     2014-06-17 06:47:27     2014-07-22            3 No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## #   HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## #   Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>
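The two steps above (numeric coding, then factoring) can also be expressed with a single factor that has explicit levels. This is a minimal base R sketch on hypothetical toy values, not the assignment's data; the names `days`, `day_levels`, `day_factor`, and `day_code` are illustrative only.

```r
# Toy weekday names standing in for d_csv$DayOfTheWeek (hypothetical values)
days <- c("Wednesday", "Wednesday", "Tuesday", "Thursday", "Tuesday", "Sunday")

# Calendar order with Sunday first, matching the 1 to 7 coding used in this report
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# A factor with explicit levels keeps the readable names for plotting
day_factor <- factor(days, levels = day_levels)

# as.integer() returns each value's position among the levels (1 to 7)
day_code <- as.integer(day_factor)

print(day_code)            # 4 4 3 5 3 1
print(levels(day_factor))  # all seven names, even those absent from the toy data
```

Because the levels are stated explicitly, days that never occur in the data still appear as (empty) levels, which keeps facet and regression labels stable.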

Gender Differences in regards to Age

g <- ggplot(d_csv, mapping = aes(x = Gender, y = Age))
g2 <- g + geom_bin2d()
g2

The graph above shows the age distribution of male and female patients in the dataset. There are more older females than older males: the oldest male is around 105 years of age, while the oldest female is around 115. There are also more young patients among females than among males. Taken together, these observations suggest that female patients schedule more medical appointments than male patients.
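A reading taken from a plot can be cross-checked with a small table of counts and age summaries per gender. The sketch below uses hypothetical toy rows rather than the real file; `toy`, `n_by_gender`, `age_median`, and `age_max` are illustrative names.

```r
# Hypothetical toy rows standing in for d_csv (Age and Gender columns only)
toy <- data.frame(
  Age    = c(19, 24, 4, 5, 38, 5, 70, 61),
  Gender = c("M", "F", "F", "M", "M", "F", "F", "F")
)

# Number of patients per gender
n_by_gender <- table(toy$Gender)

# Median and maximum age within each gender
age_median <- tapply(toy$Age, toy$Gender, median)
age_max    <- tapply(toy$Age, toy$Gender, max)

print(n_by_gender)
print(rbind(median = age_median, max = age_max))
```

On the real data the same calls would take `d_csv$Age` and `d_csv$Gender`, assuming those columns contain no missing values.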

Qplot: Male Vs Female Attendance Rate

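d_csv3 <- d_csv
# factor() sorts levels alphabetically ("F" before "M", "No-Show" before "Show-Up"),
# so levels are given explicitly to keep each label attached to the right value.
d_csv3$Gender <- factor(d_csv3$Gender, levels = c("M", "F"))
d_csv3$DayOfTheWeek <- factor(d_csv3$DayOfTheWeek, levels = 1:7,
                              labels = c("Sunday", "Monday", "Tuesday", "Wednesday",
                                         "Thursday", "Friday", "Saturday"))
d_csv3$Status <- factor(d_csv3$Status, levels = c("Show-Up", "No-Show"), labels = c("S", "NS"))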

qplot(Gender, Age, data = d_csv3, geom="boxplot")+ geom_point() +
 facet_grid(.~DayOfTheWeek, scales="free",space="free") 
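When relabelling with `factor()`, labels are matched to the levels in sorted order unless `levels` is supplied. The short sketch below, on hypothetical toy values, shows how a labels-only call can silently swap categories; `gender`, `swapped`, and `kept` are illustrative names.

```r
# Toy values standing in for d_csv3$Gender (hypothetical)
gender <- c("M", "F", "F", "M")

# Without `levels`, factor() sorts the unique values ("F", "M"),
# so labels = c("M", "F") relabels every F as M and every M as F.
swapped <- factor(gender, labels = c("M", "F"))

# Giving levels and labels in the same order keeps each value's meaning.
kept <- factor(gender, levels = c("M", "F"), labels = c("M", "F"))

print(as.character(swapped))  # "F" "M" "M" "F"
print(as.character(kept))     # "M" "F" "F" "M"
```

The same applies to `Status`, where "No-Show" sorts before "Show-Up".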

The qplot above compares male and female patients' medical appointments across the days of the week. Of the seven days, the box plot for Sunday is much smaller than the others because no female patient in the dataset had a medical appointment on a Sunday. Very few males in the dataset have medical appointments scheduled for Saturday. The box plots indicate that more younger female patients than younger male patients attend medical appointments. From Monday through Saturday, the boxes for females are longer, which indicates greater variability in age. Overall, males appear more likely to attend medical appointments.
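Statements about which gender has no appointments on a given day can be checked directly with a cross-tabulation. This is a minimal sketch on hypothetical toy rows, not the real data; `toy`, `day_levels`, and `counts` are illustrative names.

```r
# Calendar order with Sunday first, as used elsewhere in this report
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# Hypothetical toy appointments standing in for d_csv3
toy <- data.frame(
  Gender = c("M", "F", "F", "M", "M", "F", "M"),
  DayOfTheWeek = factor(c("Wednesday", "Wednesday", "Tuesday", "Sunday",
                          "Tuesday", "Tuesday", "Saturday"),
                        levels = day_levels)
)

# Rows are genders, columns are days; a zero marks an empty gender/day cell
counts <- table(toy$Gender, toy$DayOfTheWeek)
print(counts)

# Which gender/day combinations never occur in the toy data
print(which(counts == 0, arr.ind = TRUE))
```

An empty cell of this kind is also one possible reason for an interaction term being dropped from a model with `Gender*DayOfTheWeek`.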

BoxPlot: Male Vs Female Attendance Rate

# Mean and standard deviation of Age for each attendance status
ds <- ddply(d_csv3, .(Status), summarise, mean = mean(Age), sd = sd(Age))
ggplot() +
  geom_point(data = d_csv3, aes(x = Status, y = Age)) +
  # Constant colour and size belong outside aes(); inside aes() they are treated as data
  geom_point(data = ds, aes(x = Status, y = mean), colour = "red", size = 3) +
  # geom_errorbar() uses ymin and ymax only; passing y caused the "unknown aesthetics" warning
  geom_errorbar(data = ds, aes(x = Status, ymin = mean - sd, ymax = mean + sd),
                colour = "red", width = 0.4) +
  xlab("Show Versus No Show") +
  facet_grid(Gender ~ DayOfTheWeek, margins = TRUE)

The plot created with ggplot gives a result similar to the qplot. Although younger females attend more medical appointments, males are more likely to attend medical appointments overall, as indicated by the spread of the error bars in each panel. As in the previous chart, the data for Sunday are unclear because the dataset may contain missing (NA) values. Ultimately, males are more likely to show up to their medical appointments, although females are slightly more likely to show up on Saturdays.
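Show-up likelihood by gender and day can be computed directly as the mean of the 0/1 `Stat` column within each group (1 = Show-Up, 0 = No-Show, as in the data preview). The sketch below runs on hypothetical toy rows; `toy`, `Day`, `rate_by_gender`, and `rate_by_cell` are illustrative names.

```r
# Hypothetical toy rows standing in for d_csv; Stat is 1 for Show-Up, 0 for No-Show
toy <- data.frame(
  Stat   = c(1, 1, 1, 0, 1, 0, 0, 1),
  Gender = c("M", "F", "F", "M", "M", "F", "M", "F"),
  Day    = c("Tue", "Tue", "Wed", "Wed", "Tue", "Wed", "Tue", "Tue")
)

# The mean of a 0/1 variable is the share of ones, i.e. the show-up rate
rate_by_gender <- tapply(toy$Stat, toy$Gender, mean)

# The same rate within each gender/day combination
rate_by_cell <- tapply(toy$Stat, list(toy$Gender, toy$Day), mean)

print(rate_by_gender)  # F 0.75, M 0.50 in this toy data
print(round(rate_by_cell, 2))
```

These toy rates say nothing about the real dataset; they only show the calculation.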

Descriptive Statistics Table

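d_csv1 <- select(d_csv, Stat, Age, Gender, DayOfTheWeek)
# stargazer expects a plain data.frame; passing a tibble can yield a summary table with no rows
stargazer(as.data.frame(d_csv1), summary = TRUE, type = "html")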
Statistic N Mean St. Dev. Min Max
stargazer(head(d_csv1), summary = FALSE, type = "html")
    Stat  Age  Gender  DayOfTheWeek
1   1     19   M       4
2   1     24   F       4
3   1     4    F       3
4   1     5    M       5
5   1     38   M       3
6   0     5    F       3
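The columns of the summary table (N, mean, standard deviation, minimum, maximum) can also be computed in base R. The sketch below uses only the six preview rows shown above, so the numbers describe those six rows and not the full dataset; `six`, `describe`, and `desc` are illustrative names.

```r
# The numeric columns of the six preview rows shown above (not the full dataset)
six <- data.frame(
  Stat = c(1, 1, 1, 1, 1, 0),
  Age  = c(19, 24, 4, 5, 38, 5)
)

# One row of descriptive statistics for a numeric vector
describe <- function(x) {
  c(N = length(x), Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
}

# sapply() returns one column per variable; t() turns variables into rows
desc <- t(sapply(six, describe))
print(round(desc, 3))
```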

Regression Results Table

m0 <- glm(Stat ~ Gender, family = binomial, data = d_csv)
m1 <- glm(Stat ~ Gender + DayOfTheWeek + Age, family = binomial, data = d_csv)
m2 <- glm(Stat ~ Gender*DayOfTheWeek + Age, family = binomial, data = d_csv)
stargazer(m0, m1, m2, type = "html", title="3 Logit Models with glm")
3 Logit Models with glm

Dependent variable: Stat

                        (1)            (2)            (3)
GenderM                 -0.053***      -0.002         -0.266**
                        (0.008)        (0.009)        (0.123)
DayOfTheWeek2                          -0.765         -0.765
                                       (1.098)        (1.098)
DayOfTheWeek3                          -0.610         -0.608
                                       (1.098)        (1.098)
DayOfTheWeek4                          -0.651         -0.664
                                       (1.098)        (1.098)
DayOfTheWeek5                          -0.631         -0.634
                                       (1.098)        (1.098)
DayOfTheWeek6                          -0.695         -0.681
                                       (1.098)        (1.098)
DayOfTheWeek7                          -1.033         -0.958
                                       (1.099)        (1.099)
Age                                    0.010***       0.010***
                                       (0.0002)       (0.0002)
GenderM:DayOfTheWeek2                                 0.264**
                                                      (0.124)
GenderM:DayOfTheWeek3                                 0.258**
                                                      (0.124)
GenderM:DayOfTheWeek4                                 0.301**
                                                      (0.124)
GenderM:DayOfTheWeek5                                 0.274**
                                                      (0.124)
GenderM:DayOfTheWeek6                                 0.222*
                                                      (0.125)
GenderM:DayOfTheWeek7
Constant                0.853***       1.151          1.151
                        (0.005)        (1.098)        (1.098)
Observations            300,000        300,000        300,000
Log Likelihood          -183,854.500   -182,254.400   -182,247.700
Akaike Inf. Crit.       367,713.000    364,526.800    364,523.300
Note: *p<0.1; **p<0.05; ***p<0.01
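Logit coefficients are on the log-odds scale, so converting them to odds ratios and probabilities can help with reading the table. The sketch below takes the model (1) coefficients and the reported log-likelihoods of models (1) and (2) from the table, applies `exp()` and the inverse logit `plogis()`, and re-derives the AIC values as a consistency check. The variable names are illustrative, and the results inherit the rounding of the printed table.

```r
# Coefficients copied from model (1) in the table above
b_const <- 0.853    # Constant: log-odds of showing up for the reference gender (F)
b_male  <- -0.053   # GenderM: change in log-odds for males

# Odds ratio: multiplicative change in the odds of showing up for males vs females
or_male <- exp(b_male)

# Predicted show-up probabilities via the inverse logit
p_female <- plogis(b_const)
p_male   <- plogis(b_const + b_male)

print(round(c(or_male = or_male, p_female = p_female, p_male = p_male), 3))

# AIC = 2k - 2 * logLik, with k the number of estimated coefficients
aic_m0 <- 2 * 2 - 2 * (-183854.5)  # k = 2: Constant, GenderM
aic_m1 <- 2 * 9 - 2 * (-182254.4)  # k = 9: Constant, GenderM, six day terms, Age

print(c(aic_m0 = aic_m0, aic_m1 = aic_m1))
```

Under model (1), the odds ratio of roughly 0.95 corresponds to predicted show-up probabilities of about 0.70 for females and 0.69 for males, and the recomputed AIC values agree with the table's 367,713.0 and 364,526.8.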

Citations

(Curlee 1970)

(Thomas et al. 1995)

Curlee, Joan. 1970. “A Comparison of Male and Female Patients at an Alcoholism Treatment Center.” The Journal of Psychology 74 (2). Taylor & Francis: 239–47.

Thomas, David L, Jonathan M Zenilman, Harvey J Alter, James W Shih, Noya Galai, Anthony V Carella, and Thomas C Quinn. 1995. “Sexual Transmission of Hepatitis C Virus Among Patients Attending Sexually Transmitted Diseases Clinics in Baltimore—an Analysis of 309 Sex Partnerships.” Journal of Infectious Diseases 171 (4). Oxford University Press: 768–75.