library(texreg)
library(stargazer)
library(ggplot2)
library(ggthemes)
library(rCharts)
library(plotly)
library(babynames)
library(Zelig)
library(readr)
library(dplyr)
library(sjmisc)
library(DescTools)
library(plyr)
library(magrittr)
library(tidyverse)
library(Cite)
d_csv <- read_csv("C:/Users/AHMED/Desktop/noShowDoc.csv", col_names = TRUE)
head(d_csv)
## # A tibble: 6 × 16
## Age Gender AppointmentRegistration ApointmentData DayOfTheWeek Status
## <int> <chr> <dttm> <dttm> <chr> <chr>
## 1 19 M 2014-12-16 14:46:25 2015-01-14 Wednesday Show-Up
## 2 24 F 2015-08-18 07:01:26 2015-08-19 Wednesday Show-Up
## 3 4 F 2014-02-17 12:53:46 2014-02-18 Tuesday Show-Up
## 4 5 M 2014-07-23 17:02:11 2014-08-07 Thursday Show-Up
## 5 38 M 2015-10-21 15:20:09 2015-10-27 Tuesday Show-Up
## 6 5 F 2014-06-17 06:47:27 2014-07-22 Tuesday No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## # HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## # Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>
d_csv$DayOfTheWeek <- recode(d_csv$DayOfTheWeek, "Sunday" = 1, "Monday" = 2, "Tuesday" = 3, "Wednesday" = 4, "Thursday" = 5, "Friday" = 6, "Saturday" = 7)
d_csv$DayOfTheWeek <- factor(d_csv$DayOfTheWeek)
head(d_csv)
## # A tibble: 6 × 16
## Age Gender AppointmentRegistration ApointmentData DayOfTheWeek Status
## <int> <chr> <dttm> <dttm> <fctr> <chr>
## 1 19 M 2014-12-16 14:46:25 2015-01-14 4 Show-Up
## 2 24 F 2015-08-18 07:01:26 2015-08-19 4 Show-Up
## 3 4 F 2014-02-17 12:53:46 2014-02-18 3 Show-Up
## 4 5 M 2014-07-23 17:02:11 2014-08-07 5 Show-Up
## 5 38 M 2015-10-21 15:20:09 2015-10-27 3 Show-Up
## 6 5 F 2014-06-17 06:47:27 2014-07-22 3 No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## # HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## # Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>
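The two-step conversion above (recode to numbers, then `factor()`) can also be sketched as a single `factor()` call with explicit levels. This is only a minimal sketch, not the code used in this analysis: `days`, `x`, `day_factor`, and `day_code` are illustrative names, and `x` holds the six DayOfTheWeek values shown in the first `head(d_csv)` output.

```r
# Week order with Sunday first, matching the 1..7 coding used above.
days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")

# The six DayOfTheWeek values printed by the first head(d_csv) call.
x <- c("Wednesday", "Wednesday", "Tuesday", "Thursday", "Tuesday", "Tuesday")

# Explicit levels fix the order; without them factor() sorts alphabetically.
day_factor <- factor(x, levels = days)

# The underlying integer codes reproduce the recoded values.
day_code <- as.integer(day_factor)
print(day_code)  # 4 4 3 5 3 3
```

Keeping the day names as labels, rather than the digits 1 to 7, would also make the regression output easier to read, at the cost of longer coefficient names.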
g <- ggplot(d_csv, mapping = aes(x = Gender, y = Age))
g2 <- g + geom_bin2d()
g2
The graph above shows the age distribution of male and female patients in the dataset. There are more older females than older males: the oldest male is around 105 years old, while the oldest female is around 115. There are also more young patients among females than among males. Taken together, these observations suggest that female patients schedule more medical appointments than male patients.
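Claims like these can be checked numerically as well as visually. The sketch below counts appointments and finds the oldest patient per gender. It is an illustration only: `first_rows`, `n_by_gender`, and `max_age` are made-up names, and the data frame contains just the six rows printed by `head(d_csv)`, so its numbers say nothing about the full dataset.

```r
# The six rows shown by head(d_csv); a stand-in for the full data.
first_rows <- data.frame(
  Gender = c("M", "F", "F", "M", "M", "F"),
  Age    = c(19, 24, 4, 5, 38, 5)
)

# Number of appointments per gender.
n_by_gender <- table(first_rows$Gender)

# Oldest patient per gender.
max_age <- tapply(first_rows$Age, first_rows$Gender, max)

print(n_by_gender)
print(max_age)
```

Replacing `first_rows` with `d_csv` would, under the assumption that the column names are unchanged, give the counts and maximum ages behind the plot.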
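d_csv3 <- d_csv
# factor() sorts character values alphabetically ("F" before "M", "No-Show" before "Show-Up"),
# so the levels are named explicitly to keep each label attached to the right group
d_csv3$Gender <- factor(d_csv3$Gender, levels = c("M", "F"))
d_csv3$DayOfTheWeek <- factor(d_csv3$DayOfTheWeek, labels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
d_csv3$Status <- factor(d_csv3$Status, levels = c("Show-Up", "No-Show"), labels = c("S", "NS"))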
qplot(Gender, Age, data = d_csv3, geom="boxplot")+ geom_point() +
facet_grid(.~DayOfTheWeek, scales="free",space="free")
The qplot above looks for differences in appointment attendance between male and female patients. Of the seven days of the week, the panel for Sunday is much smaller than the others, because no female patient in the dataset had a medical appointment on Sunday. There are also very few males in the dataset with appointments scheduled for Saturday. The boxplots indicate that more younger female patients than male patients attend medical appointments. From Monday through Saturday, the boxes for females are longer, which indicates greater variability in age. On this reading, males are more likely to attend medical appointments overall.
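Whether a gender is missing on a given day can be confirmed with a cross-tabulation rather than read off the plot. The following is a minimal sketch under stated assumptions: `first_rows`, `counts`, and `days_without_females` are illustrative names, and the data are only the six rows from `head(d_csv)`, so most cells are empty here simply because the sample is tiny.

```r
days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")

# The six rows shown by head(d_csv); a stand-in for the full data.
first_rows <- data.frame(
  Gender = c("M", "F", "F", "M", "M", "F"),
  DayOfTheWeek = factor(
    c("Wednesday", "Wednesday", "Tuesday", "Thursday", "Tuesday", "Tuesday"),
    levels = days
  )
)

# Appointments per gender and day; unused days still appear with a count of 0.
counts <- table(first_rows$Gender, first_rows$DayOfTheWeek)
print(counts)

# Days on which no female patient has an appointment in these rows.
days_without_females <- names(which(counts["F", ] == 0))
print(days_without_females)
```

Run on the full data, an empty Gender by DayOfTheWeek cell would also explain why one interaction term in a logit model cannot be estimated.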
# Mean and standard deviation of age for each attendance status
ds <- ddply(d_csv3, .(Status), summarise, mean = mean(Age), sd = sd(Age))
ggplot() +
  geom_point(data = d_csv3, aes(x = Status, y = Age)) +
  scale_y_continuous() + scale_x_discrete() + xlab("Show Versus no Show") +
  # constant colour and size go outside aes(); inside aes() they are treated as data
  geom_point(data = ds, aes(x = Status, y = mean), colour = "red", size = 3) +
  geom_errorbar(data = ds, aes(x = Status, y = mean, ymin = mean - sd, ymax = mean + sd),
                colour = "red", width = 0.4) +
  facet_grid(Gender ~ DayOfTheWeek, margins = TRUE)
## Warning: Ignoring unknown aesthetics: y
The ggplot chart above, which overlays the mean age and a one-standard-deviation error bar on the individual points, gives a result similar to the qplot. Although younger females attend more medical appointments, males are more likely to attend overall, as indicated by the spread of the bars in each panel. As in the previous chart, the data for Sunday are not clear-cut, since there are potentially missing (NA) values in the dataset. Ultimately, males are more likely to show up to their medical appointments, although females are slightly more likely to show up on Saturdays.
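Because Stat is coded 1 for Show-Up and 0 for No-Show in the `head(d_csv)` output, the share of kept appointments in a group is simply the mean of Stat. The sketch below computes that rate by gender. It is an illustration, not part of the original analysis: `first_rows` and `show_rate` are made-up names, and the six rows from `head(d_csv)` are far too few to support any conclusion.

```r
# The six rows shown by head(d_csv); a stand-in for the full data.
first_rows <- data.frame(
  Gender = c("M", "F", "F", "M", "M", "F"),
  Stat   = c(1, 1, 1, 1, 1, 0)   # 1 = Show-Up, 0 = No-Show
)

# Proportion of appointments attended, per gender.
show_rate <- tapply(first_rows$Stat, first_rows$Gender, mean)
print(round(show_rate, 3))
```

Passing `list(Gender, DayOfTheWeek)` as the grouping argument to `tapply()` would give the same rate for every gender and day combination, which is the quantity the statements about Saturday refer to.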
d_csv1 <- select(d_csv, Stat, Age, Gender, DayOfTheWeek)
stargazer(d_csv1, summary = TRUE, type = "html")
| Statistic | N | Mean | St. Dev. | Min | Max |
stargazer(head(d_csv1), summary = FALSE, type = "html")
| | Stat | Age | Gender | DayOfTheWeek |
| 1 | 1 | 19 | M | 4 |
| 2 | 1 | 24 | F | 4 |
| 3 | 1 | 4 | F | 3 |
| 4 | 1 | 5 | M | 5 |
| 5 | 1 | 38 | M | 3 |
| 6 | 0 | 5 | F | 3 |
m0 <- glm(Stat ~ Gender, family = binomial, data = d_csv)
m1 <- glm(Stat ~ Gender + DayOfTheWeek + Age, family = binomial, data = d_csv)
m2 <- glm(Stat ~ Gender*DayOfTheWeek + Age, family = binomial, data = d_csv)
stargazer(m0, m1, m2, type = "html", title="3 Logit Models with glm")
| | Dependent variable: Stat | | |
| | (1) | (2) | (3) |
| GenderM | -0.053*** | -0.002 | -0.266** |
| | (0.008) | (0.009) | (0.123) |
| DayOfTheWeek2 | | -0.765 | -0.765 |
| | | (1.098) | (1.098) |
| DayOfTheWeek3 | | -0.610 | -0.608 |
| | | (1.098) | (1.098) |
| DayOfTheWeek4 | | -0.651 | -0.664 |
| | | (1.098) | (1.098) |
| DayOfTheWeek5 | | -0.631 | -0.634 |
| | | (1.098) | (1.098) |
| DayOfTheWeek6 | | -0.695 | -0.681 |
| | | (1.098) | (1.098) |
| DayOfTheWeek7 | | -1.033 | -0.958 |
| | | (1.099) | (1.099) |
| Age | | 0.010*** | 0.010*** |
| | | (0.0002) | (0.0002) |
| GenderM:DayOfTheWeek2 | | | 0.264** |
| | | | (0.124) |
| GenderM:DayOfTheWeek3 | | | 0.258** |
| | | | (0.124) |
| GenderM:DayOfTheWeek4 | | | 0.301** |
| | | | (0.124) |
| GenderM:DayOfTheWeek5 | | | 0.274** |
| | | | (0.124) |
| GenderM:DayOfTheWeek6 | | | 0.222* |
| | | | (0.125) |
| GenderM:DayOfTheWeek7 | | | |
| Constant | 0.853*** | 1.151 | 1.151 |
| | (0.005) | (1.098) | (1.098) |
| Observations | 300,000 | 300,000 | 300,000 |
| Log Likelihood | -183,854.500 | -182,254.400 | -182,247.700 |
| Akaike Inf. Crit. | 367,713.000 | 364,526.800 | 364,523.300 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
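Logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios, and the reported log likelihoods allow a rough likelihood-ratio comparison of the second and third models. The sketch below does both using only the rounded numbers printed in the table, so the results are approximate. The variable names are illustrative, and the degrees of freedom assume that five interaction terms were estimated (GenderM:DayOfTheWeek7 has no estimate in the table).

```r
# GenderM coefficients copied from the table, one per model.
# In model (3) this is the gender effect for the reference day only.
gender_coef <- c(m0 = -0.053, m1 = -0.002, m2 = -0.266)

# Odds ratios: values below 1 mean lower odds of Stat = 1 (Show-Up) for males.
odds_ratio <- exp(gender_coef)
print(round(odds_ratio, 3))

# Likelihood-ratio statistic for model (3) against model (2),
# based on the rounded log likelihoods in the table.
loglik_m1 <- -182254.4
loglik_m2 <- -182247.7
lr_stat <- 2 * (loglik_m2 - loglik_m1)

# Assumption: five extra parameters (the estimated interaction terms).
df_extra <- 5
p_value <- pchisq(lr_stat, df = df_extra, lower.tail = FALSE)
print(c(lr_stat = lr_stat, p_value = p_value))
```

With the fitted objects available, `exp(coef(m0))` and `anova(m1, m2, test = "Chisq")` would give the same quantities without rounding error.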
(Curlee 1970)
(Thomas et al. 1995)
Curlee, Joan. 1970. “A Comparison of Male and Female Patients at an Alcoholism Treatment Center.” The Journal of Psychology 74 (2). Taylor & Francis: 239–47.
Thomas, David L, Jonathan M Zenilman, Harvey J Alter, James W Shih, Noya Galai, Anthony V Carella, and Thomas C Quinn. 1995. “Sexual Transmission of Hepatitis C Virus Among Patients Attending Sexually Transmitted Diseases Clinics in Baltimore: An Analysis of 309 Sex Partnerships.” Journal of Infectious Diseases 171 (4). Oxford University Press: 768–75.