library(readr)
d_csv <- read_csv("C:/Users/AHMED/Desktop/noShowDoc.csv", col_names = TRUE)
## Parsed with column specification:
## cols(
## Age = col_integer(),
## Gender = col_character(),
## AppointmentRegistration = col_datetime(format = ""),
## ApointmentData = col_datetime(format = ""),
## DayOfTheWeek = col_character(),
## Status = col_character(),
## Stat = col_integer(),
## Diabetes = col_integer(),
## Alcoolism = col_integer(),
## HiperTension = col_integer(),
## Handcap = col_integer(),
## Smokes = col_integer(),
## Scholarship = col_integer(),
## Tuberculosis = col_integer(),
## Sms_Reminder = col_integer(),
## AwaitingTime = col_integer()
## )
head(d_csv)
## # A tibble: 6 × 16
## Age Gender AppointmentRegistration ApointmentData DayOfTheWeek Status
## <int> <chr> <dttm> <dttm> <chr> <chr>
## 1 19 M 2014-12-16 14:46:25 2015-01-14 Wednesday Show-Up
## 2 24 F 2015-08-18 07:01:26 2015-08-19 Wednesday Show-Up
## 3 4 F 2014-02-17 12:53:46 2014-02-18 Tuesday Show-Up
## 4 5 M 2014-07-23 17:02:11 2014-08-07 Thursday Show-Up
## 5 38 M 2015-10-21 15:20:09 2015-10-27 Tuesday Show-Up
## 6 5 F 2014-06-17 06:47:27 2014-07-22 Tuesday No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## # HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## # Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(sjmisc)
d_csv$DayOfTheWeek <- recode(d_csv$DayOfTheWeek,
                             "Sunday" = 1, "Monday" = 2, "Tuesday" = 3,
                             "Wednesday" = 4, "Thursday" = 5, "Friday" = 6,
                             "Saturday" = 7)
head(d_csv)
## # A tibble: 6 × 16
## Age Gender AppointmentRegistration ApointmentData DayOfTheWeek Status
## <int> <chr> <dttm> <dttm> <dbl> <chr>
## 1 19 M 2014-12-16 14:46:25 2015-01-14 4 Show-Up
## 2 24 F 2015-08-18 07:01:26 2015-08-19 4 Show-Up
## 3 4 F 2014-02-17 12:53:46 2014-02-18 3 Show-Up
## 4 5 M 2014-07-23 17:02:11 2014-08-07 5 Show-Up
## 5 38 M 2015-10-21 15:20:09 2015-10-27 3 Show-Up
## 6 5 F 2014-06-17 06:47:27 2014-07-22 3 No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## # HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## # Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>
library(Zelig)
# d_csv is already in the workspace; data() only loads datasets bundled with packages, so no call is needed here
library(radiant.data)
## Loading required package: magrittr
## Loading required package: ggplot2
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
## Loading required package: tidyr
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
## The following object is masked from 'package:sjmisc':
##
## replace_na
##
## Attaching package: 'radiant.data'
## The following object is masked from 'package:ggplot2':
##
## diamonds
## The following objects are masked from 'package:sjmisc':
##
## center, is_empty
## The following object is masked from 'package:dplyr':
##
## mutate_each
# no data() call is needed: d_csv is still the tibble read in with read_csv()
head(d_csv)
## # A tibble: 6 × 16
## Age Gender AppointmentRegistration ApointmentData DayOfTheWeek Status Stat Diabetes Alcoolism HiperTension Handcap Smokes Scholarship Tuberculosis Sms_Reminder AwaitingTime
## <int> <chr> <dttm> <dttm> <dbl> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 19 M 2014-12-16 14:46:25 2015-01-14 4 Show-Up 1 0 0 0 0 0 0 0 0 -29
## 2 24 F 2015-08-18 07:01:26 2015-08-19 4 Show-Up 1 0 0 0 0 0 0 0 0 -1
## 3 4 F 2014-02-17 12:53:46 2014-02-18 3 Show-Up 1 0 0 0 0 0 0 0 0 -1
## 4 5 M 2014-07-23 17:02:11 2014-08-07 5 Show-Up 1 0 0 0 0 0 0 0 1 -15
## 5 38 M 2015-10-21 15:20:09 2015-10-27 3 Show-Up 1 0 0 0 0 0 0 0 1 -6
## 6 5 F 2014-06-17 06:47:27 2014-07-22 3 No-Show 0 0 0 0 0 0 0 0 1 -35
library(DescTools)
## Warning: package 'DescTools' was built under R version 3.3.3
##
## Attaching package: 'DescTools'
## The following objects are masked from 'package:Zelig':
##
## Median, Mode
# Treat day of the week as categorical; level 1 (Sunday) becomes the reference category
d_csv$DayOfTheWeek <- factor(d_csv$DayOfTheWeek)
m0 <- glm(Stat ~ Gender, family = binomial, data = d_csv)
summary(m0)
##
## Call:
## glm(formula = Stat ~ Gender, family = binomial, data = d_csv)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5546 -1.5306 0.8424 0.8424 0.8614
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.853492 0.004879 174.917 < 0.0000000000000002 ***
## GenderM -0.053211 0.008414 -6.324 0.000000000255 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 367749 on 299999 degrees of freedom
## Residual deviance: 367709 on 299998 degrees of freedom
## AIC: 367713
##
## Number of Fisher Scoring iterations: 4
m1 <- glm(Stat ~ Gender + DayOfTheWeek + Age, family = binomial, data = d_csv)
summary(m1)
##
## Call:
## glm(formula = Stat ~ Gender + DayOfTheWeek + Age, family = binomial,
## data = d_csv)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8668 -1.4323 0.7847 0.8745 1.1278
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1512331 1.0975194 1.049 0.294
## GenderM -0.0024554 0.0085184 -0.288 0.773
## DayOfTheWeek2 -0.7648770 1.0975285 -0.697 0.486
## DayOfTheWeek3 -0.6098948 1.0975284 -0.556 0.578
## DayOfTheWeek4 -0.6514993 1.0975276 -0.594 0.553
## DayOfTheWeek5 -0.6310186 1.0975300 -0.575 0.565
## DayOfTheWeek6 -0.6954399 1.0975339 -0.634 0.526
## DayOfTheWeek7 -1.0333612 1.0989089 -0.940 0.347
## Age 0.0096842 0.0001783 54.313 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 367749 on 299999 degrees of freedom
## Residual deviance: 364509 on 299991 degrees of freedom
## AIC: 364527
##
## Number of Fisher Scoring iterations: 4
m2 <- glm(Stat ~ Gender*DayOfTheWeek + Age, family = binomial, data = d_csv)
summary(m2)
##
## Call:
## glm(formula = Stat ~ Gender * DayOfTheWeek + Age, family = binomial,
## data = d_csv)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8732 -1.4325 0.7848 0.8739 1.2045
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1511336 1.0975204 1.049 0.2942
## GenderM -0.2662785 0.1230512 -2.164 0.0305 *
## DayOfTheWeek2 -0.7648700 1.0975439 -0.697 0.4859
## DayOfTheWeek3 -0.6080630 1.0975432 -0.554 0.5796
## DayOfTheWeek4 -0.6641008 1.0975423 -0.605 0.5451
## DayOfTheWeek5 -0.6343655 1.0975461 -0.578 0.5633
## DayOfTheWeek6 -0.6811631 1.0975523 -0.621 0.5348
## DayOfTheWeek7 -0.9579932 1.0994984 -0.871 0.3836
## Age 0.0096863 0.0001783 54.320 <0.0000000000000002 ***
## GenderM:DayOfTheWeek2 0.2638653 0.1244623 2.120 0.0340 *
## GenderM:DayOfTheWeek3 0.2582795 0.1244897 2.075 0.0380 *
## GenderM:DayOfTheWeek4 0.3014304 0.1244328 2.422 0.0154 *
## GenderM:DayOfTheWeek5 0.2736872 0.1244969 2.198 0.0279 *
## GenderM:DayOfTheWeek6 0.2216500 0.1246689 1.778 0.0754 .
## GenderM:DayOfTheWeek7 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 367749 on 299999 degrees of freedom
## Residual deviance: 364495 on 299986 degrees of freedom
## AIC: 364523
##
## Number of Fisher Scoring iterations: 4
library(texreg)
## Version: 1.36.18
## Date: 2016-10-22
## Author: Philip Leifeld (University of Glasgow)
##
## Please cite the JSS article in your publications -- see citation("texreg").
##
## Attaching package: 'texreg'
## The following object is masked from 'package:tidyr':
##
## extract
## The following object is masked from 'package:magrittr':
##
## extract
screenreg(list(m0, m1, m2))
##
## =====================================================================
## Model 1 Model 2 Model 3
## ---------------------------------------------------------------------
## (Intercept) 0.85 *** 1.15 1.15
## (0.00) (1.10) (1.10)
## GenderM -0.05 *** -0.00 -0.27 *
## (0.01) (0.01) (0.12)
## DayOfTheWeek2 -0.76 -0.76
## (1.10) (1.10)
## DayOfTheWeek3 -0.61 -0.61
## (1.10) (1.10)
## DayOfTheWeek4 -0.65 -0.66
## (1.10) (1.10)
## DayOfTheWeek5 -0.63 -0.63
## (1.10) (1.10)
## DayOfTheWeek6 -0.70 -0.68
## (1.10) (1.10)
## DayOfTheWeek7 -1.03 -0.96
## (1.10) (1.10)
## Age 0.01 *** 0.01 ***
## (0.00) (0.00)
## GenderM:DayOfTheWeek2 0.26 *
## (0.12)
## GenderM:DayOfTheWeek3 0.26 *
## (0.12)
## GenderM:DayOfTheWeek4 0.30 *
## (0.12)
## GenderM:DayOfTheWeek5 0.27 *
## (0.12)
## GenderM:DayOfTheWeek6 0.22
## (0.12)
## ---------------------------------------------------------------------
## AIC 367712.96 364526.82 364523.32
## BIC 367734.18 364622.32 364671.88
## Log Likelihood -183854.48 -182254.41 -182247.66
## Deviance 367708.96 364508.82 364495.32
## Num. obs. 300000 300000 300000
## =====================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
lmtest::lrtest(m0, m1, m2)
## Likelihood ratio test
##
## Model 1: Stat ~ Gender
## Model 2: Stat ~ Gender + DayOfTheWeek + Age
## Model 3: Stat ~ Gender * DayOfTheWeek + Age
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 2 -183854
## 2 9 -182254 7 3200.139 < 0.0000000000000002 ***
## 3 14 -182248 5 13.504 0.01909 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#library(visreg)
#visreg(m2, "Age", scale = "response")
#visreg(m1, "Gender", by = "DayOfTheWeek", scale = "response")
#visreg(m2, "DayOfTheWeek", by = "Gender", scale = "response")
#visreg(m2, "DayOfTheWeek", by = "Gender")
#visreg(m2, "Age", by = "Gender", scale = "response")
#visreg(m2, "Age", by = "Gender")
The following analysis uses a dataset obtained from Kaggle. The original dataset is available at https://www.kaggle.com/joniarroba/noshowappointments . The dataset records whether patients attended their scheduled medical appointments. Although it contains 16 variables in total, I use only four of them (Age, Gender, Stat, and DayOfTheWeek) in this analysis. My objective for this assignment is to examine whether males or females are more likely to miss their scheduled appointments. Age, Gender, and DayOfTheWeek are the independent variables: the patient's age, their gender, and the day of the week of the appointment, respectively. Stat is the dependent variable and indicates whether the patient attended the scheduled appointment (1 = Show-Up, 0 = No-Show).
The first step was to download the csv file from the Kaggle website and read it into R. Second, since DayOfTheWeek was stored as text, I recoded it so that each day is represented by a number from 1 to 7: Sunday is 1, Monday is 2, Tuesday is 3, and so on. Because the variable is categorical rather than a quantity, I then converted it to a factor with factor() (a base R function; DescTools is loaded in the script, but this step does not depend on it). Without this step, glm() would treat the recoded values as a continuous number and estimate a single slope for "day", which has no sensible interpretation. As a factor, the first level, Sunday (1), becomes the reference category: the intercept then describes a female patient of age 0 with a Sunday appointment, and each DayOfTheWeek coefficient compares that day against Sunday. Once these steps were completed, I fit three models.
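As a small illustration of the recode-then-factor step, the sketch below uses a made-up vector of day names (not the appointment data) and builds the factor directly with explicit levels, which gives the same Sunday reference category without an intermediate numeric code. The object names `days`, `day_levels`, `day_f`, and `day_f_wed` are hypothetical.

```r
# Toy vector of day names; these values are invented for illustration
days <- c("Wednesday", "Tuesday", "Sunday", "Thursday", "Tuesday")

# Order the levels Sunday..Saturday so that Sunday is the reference level
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
day_f <- factor(days, levels = day_levels)

levels(day_f)[1]     # reference category that glm() would use
as.integer(day_f)    # integer codes matching the 1-7 recoding

# relevel() switches the reference category if another baseline is preferred
day_f_wed <- relevel(day_f, ref = "Wednesday")
levels(day_f_wed)[1]
```

Keeping the day names as factor labels also makes the model output easier to read, since coefficients would be labelled by day name rather than by number.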
In Model 0, the relationship between gender and showing up to a medical appointment is statistically significant (p < 0.001). The coefficient for male patients is negative (-0.05 on the log-odds scale), so males are somewhat less likely than females to show up for an appointment. The difference is small, however: with 300,000 observations, even a minor difference reaches statistical significance. From the first model alone, then, we can infer that males are slightly more likely than females to miss an appointment. The model cannot tell us why, so any conclusion that males are less willing to discuss their medical condition with a doctor would go beyond the data.
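Because logistic regression coefficients are on the log-odds scale, it can help to convert them before interpreting their size. The sketch below takes the two Model 0 estimates from the summary(m0) output and turns them into an odds ratio and predicted probabilities; the object names are hypothetical.

```r
# Coefficients copied from the summary(m0) output
b0 <- 0.853492       # intercept: log-odds of showing up for females
b_male <- -0.053211  # change in log-odds for males

odds_ratio <- exp(b_male)      # odds of showing up, males relative to females
p_female <- plogis(b0)         # predicted show-up probability for females
p_male <- plogis(b0 + b_male)  # predicted show-up probability for males

round(c(odds_ratio = odds_ratio, p_female = p_female, p_male = p_male), 3)
```

On the probability scale the gap is about one percentage point (roughly 70% for females versus 69% for males).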
In Model 1, Age and DayOfTheWeek are added, and the picture changes. As in the first model, the coefficient for males is negative, but it is now close to zero (-0.002) and no longer statistically significant (p = 0.773). Age is the only statistically significant coefficient in the model: each additional year of age raises the log-odds of showing up by about 0.01, so older patients are more likely to attend. All of the DayOfTheWeek coefficients are negative, meaning that attendance on Monday through Saturday is estimated to be lower than on Sunday, the reference day, but none of them is statistically significant and their standard errors are large (about 1.10). In short, once age and day of the week are taken into account, the gender difference seen in Model 0 largely disappears, and age is the clearest predictor of attendance.
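To make the size of the age effect concrete, the sketch below plugs the Model 1 estimates from the summary(m1) output into the logistic function for a female patient with a Wednesday appointment. The choice of Wednesday and of the ages shown is arbitrary, and the object names are hypothetical.

```r
# Coefficients copied from the summary(m1) output
b0 <- 1.1512331      # intercept (female, Sunday, age 0)
b_wed <- -0.6514993  # DayOfTheWeek4 (Wednesday relative to Sunday)
b_age <- 0.0096842   # change in log-odds per additional year of age

ages <- c(20, 40, 60, 80)
p_show <- plogis(b0 + b_wed + b_age * ages)  # female, Wednesday appointment

data.frame(age = ages, p_show = round(p_show, 3))

exp(b_age * 10)  # odds ratio for a ten-year difference in age
```

A ten-year age difference multiplies the odds of showing up by about 1.10, which is a far larger effect than the gender coefficient in the same model.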
In Model 2, an interaction term between Gender and DayOfTheWeek is added, so the gender difference is allowed to vary by day. The GenderM coefficient (-0.27, p = 0.03) now describes the difference between males and females on Sunday, the reference day, where males are less likely to attend. The interaction coefficients for Monday through Thursday are positive and statistically significant at the 5% level (0.26, 0.26, 0.30, and 0.27), while the Friday coefficient (0.22) is significant only at the 10% level (p = 0.075). The Saturday interaction could not be estimated (NA) because of a singularity, most likely because one gender-by-day combination is absent or too sparse in the data to separate this term from the others. These interaction terms do not mean that males attend more often than females on weekdays. They must be added to the GenderM coefficient, and on each weekday the sum is close to zero (for example, -0.266 + 0.301 = 0.035 on Wednesday, which has the largest interaction), so on weekdays males and females are about equally likely to attend. The DayOfTheWeek main effects, which in this model describe females relative to Sunday, remain negative and not statistically significant. Ultimately, attendance increases with age for both genders, and the gender gap appears to be confined to Sunday appointments, an estimate that rests on imprecise Sunday coefficients and should be read with caution.
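With an interaction in the model, the male-female difference on a given day is the GenderM coefficient plus that day's interaction coefficient. The sketch below does this arithmetic with the estimates from the summary(m2) output; the object names are hypothetical, and Saturday is left out because its interaction term was not estimated.

```r
# Coefficients copied from the summary(m2) output
b_male <- -0.2662785  # GenderM: male vs. female on Sunday (reference day)
b_inter <- c(Monday = 0.2638653, Tuesday = 0.2582795, Wednesday = 0.3014304,
             Thursday = 0.2736872, Friday = 0.2216500)

# Male-minus-female difference in log-odds on each day
gap <- c(Sunday = b_male, b_male + b_inter)
round(gap, 3)

round(exp(gap), 3)  # the same gaps expressed as odds ratios
```

These sums are point estimates only. Their standard errors would require the coefficient covariance matrix from vcov(m2), which is not part of the printed output.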
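After fitting all three models, I used a likelihood ratio test to see which model best fits the data. Model 1 fits significantly better than Model 0 (chi-squared = 3200.1 on 7 degrees of freedom, p < 0.001), and Model 2 fits significantly better than Model 1 at the 5% level (chi-squared = 13.5 on 5 degrees of freedom, p = 0.019). Model 2 also has the lowest AIC (364523.32), although BIC, which penalizes extra parameters more heavily, favors Model 1 (364622.32 versus 364671.88). On balance, Model 2 provides the best fit of the three, but its improvement over Model 1 is modest.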
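The likelihood ratio statistic can be checked by hand: it is twice the difference in log-likelihoods, compared against a chi-squared distribution whose degrees of freedom equal the number of extra parameters. The sketch below does this for Model 1 versus Model 2 using the rounded log-likelihoods from the screenreg table, so the result differs slightly from the lrtest output; the object names are hypothetical.

```r
# Log-likelihoods from the screenreg table, parameter counts from the lrtest output
ll_m1 <- -182254.41
ll_m2 <- -182247.66
k_m1 <- 9
k_m2 <- 14

chisq <- 2 * (ll_m2 - ll_m1)  # likelihood ratio statistic
df_diff <- k_m2 - k_m1        # extra parameters in Model 2
p_value <- pchisq(chisq, df = df_diff, lower.tail = FALSE)

c(chisq = chisq, df = df_diff, p_value = p_value)
```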