Homework 04 Assignment

library(readr)

d_csv <- read_csv("C:/Users/AHMED/Desktop/noShowDoc.csv", col_names = TRUE)

## Parsed with column specification:
## cols(
##   Age = col_integer(),
##   Gender = col_character(),
##   AppointmentRegistration = col_datetime(format = ""),
##   ApointmentData = col_datetime(format = ""),
##   DayOfTheWeek = col_character(),
##   Status = col_character(),
##   Stat = col_integer(),
##   Diabetes = col_integer(),
##   Alcoolism = col_integer(),
##   HiperTension = col_integer(),
##   Handcap = col_integer(),
##   Smokes = col_integer(),
##   Scholarship = col_integer(),
##   Tuberculosis = col_integer(),
##   Sms_Reminder = col_integer(),
##   AwaitingTime = col_integer()
## )

head(d_csv)

## # A tibble: 6 × 16
##     Age Gender AppointmentRegistration ApointmentData DayOfTheWeek  Status
##   <int>  <chr>                  <dttm>         <dttm>        <chr>   <chr>
## 1    19      M     2014-12-16 14:46:25     2015-01-14    Wednesday Show-Up
## 2    24      F     2015-08-18 07:01:26     2015-08-19    Wednesday Show-Up
## 3     4      F     2014-02-17 12:53:46     2014-02-18      Tuesday Show-Up
## 4     5      M     2014-07-23 17:02:11     2014-08-07     Thursday Show-Up
## 5    38      M     2015-10-21 15:20:09     2015-10-27      Tuesday Show-Up
## 6     5      F     2014-06-17 06:47:27     2014-07-22      Tuesday No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## #   HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## #   Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(sjmisc)
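# recode() here is dplyr::recode(); the namespace is spelled out to avoid masking surprises
d_csv$DayOfTheWeek <- dplyr::recode(d_csv$DayOfTheWeek,
                                    "Sunday" = 1, "Monday" = 2, "Tuesday" = 3,
                                    "Wednesday" = 4, "Thursday" = 5,
                                    "Friday" = 6, "Saturday" = 7)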

head(d_csv)

## # A tibble: 6 × 16
##     Age Gender AppointmentRegistration ApointmentData DayOfTheWeek  Status
##   <int>  <chr>                  <dttm>         <dttm>        <dbl>   <chr>
## 1    19      M     2014-12-16 14:46:25     2015-01-14            4 Show-Up
## 2    24      F     2015-08-18 07:01:26     2015-08-19            4 Show-Up
## 3     4      F     2014-02-17 12:53:46     2014-02-18            3 Show-Up
## 4     5      M     2014-07-23 17:02:11     2014-08-07            5 Show-Up
## 5    38      M     2015-10-21 15:20:09     2015-10-27            3 Show-Up
## 6     5      F     2014-06-17 06:47:27     2014-07-22            3 No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## #   HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## #   Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>

library(Zelig)
# d_csv is already in the workspace; data() only loads datasets shipped with packages

head(d_csv)

## # A tibble: 6 × 16
##     Age Gender AppointmentRegistration ApointmentData DayOfTheWeek  Status
##   <int>  <chr>                  <dttm>         <dttm>        <dbl>   <chr>
## 1    19      M     2014-12-16 14:46:25     2015-01-14            4 Show-Up
## 2    24      F     2015-08-18 07:01:26     2015-08-19            4 Show-Up
## 3     4      F     2014-02-17 12:53:46     2014-02-18            3 Show-Up
## 4     5      M     2014-07-23 17:02:11     2014-08-07            5 Show-Up
## 5    38      M     2015-10-21 15:20:09     2015-10-27            3 Show-Up
## 6     5      F     2014-06-17 06:47:27     2014-07-22            3 No-Show
## # ... with 10 more variables: Stat <int>, Diabetes <int>, Alcoolism <int>,
## #   HiperTension <int>, Handcap <int>, Smokes <int>, Scholarship <int>,
## #   Tuberculosis <int>, Sms_Reminder <int>, AwaitingTime <int>

library(radiant.data)

## Loading required package: magrittr

## Loading required package: ggplot2

## Loading required package: lubridate

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

## Loading required package: tidyr

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:magrittr':
## 
##     extract

## The following object is masked from 'package:sjmisc':
## 
##     replace_na

## 
## Attaching package: 'radiant.data'

## The following object is masked from 'package:ggplot2':
## 
##     diamonds

## The following objects are masked from 'package:sjmisc':
## 
##     center, is_empty

## The following object is masked from 'package:dplyr':
## 
##     mutate_each


head(d_csv)

## # A tibble: 6 × 16
##     Age Gender AppointmentRegistration ApointmentData DayOfTheWeek  Status  Stat Diabetes Alcoolism HiperTension Handcap Smokes Scholarship Tuberculosis Sms_Reminder AwaitingTime
##   <int>  <chr>                  <dttm>         <dttm>        <dbl>   <chr> <int>    <int>     <int>        <int>   <int>  <int>       <int>        <int>        <int>        <int>
## 1    19      M     2014-12-16 14:46:25     2015-01-14            4 Show-Up     1        0         0            0       0      0           0            0            0          -29
## 2    24      F     2015-08-18 07:01:26     2015-08-19            4 Show-Up     1        0         0            0       0      0           0            0            0           -1
## 3     4      F     2014-02-17 12:53:46     2014-02-18            3 Show-Up     1        0         0            0       0      0           0            0            0           -1
## 4     5      M     2014-07-23 17:02:11     2014-08-07            5 Show-Up     1        0         0            0       0      0           0            0            1          -15
## 5    38      M     2015-10-21 15:20:09     2015-10-27            3 Show-Up     1        0         0            0       0      0           0            0            1           -6
## 6     5      F     2014-06-17 06:47:27     2014-07-22            3 No-Show     0        0         0            0       0      0           0            0            1          -35

library(DescTools)

## Warning: package 'DescTools' was built under R version 3.3.3

## 
## Attaching package: 'DescTools'

## The following objects are masked from 'package:Zelig':
## 
##     Median, Mode

# factor() is base R; the first level (1 = Sunday) becomes the reference category
d_csv$DayOfTheWeek <- factor(d_csv$DayOfTheWeek)

m0 <- glm(Stat ~ Gender, family = binomial, data = d_csv)
summary(m0)

## 
## Call:
## glm(formula = Stat ~ Gender, family = binomial, data = d_csv)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5546  -1.5306   0.8424   0.8424   0.8614  
## 
## Coefficients:
##              Estimate Std. Error z value             Pr(>|z|)    
## (Intercept)  0.853492   0.004879 174.917 < 0.0000000000000002 ***
## GenderM     -0.053211   0.008414  -6.324       0.000000000255 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 367749  on 299999  degrees of freedom
## Residual deviance: 367709  on 299998  degrees of freedom
## AIC: 367713
## 
## Number of Fisher Scoring iterations: 4

m1 <- glm(Stat ~ Gender + DayOfTheWeek + Age, family = binomial, data = d_csv)
summary(m1)

## 
## Call:
## glm(formula = Stat ~ Gender + DayOfTheWeek + Age, family = binomial, 
##     data = d_csv)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8668  -1.4323   0.7847   0.8745   1.1278  
## 
## Coefficients:
##                 Estimate Std. Error z value            Pr(>|z|)    
## (Intercept)    1.1512331  1.0975194   1.049               0.294    
## GenderM       -0.0024554  0.0085184  -0.288               0.773    
## DayOfTheWeek2 -0.7648770  1.0975285  -0.697               0.486    
## DayOfTheWeek3 -0.6098948  1.0975284  -0.556               0.578    
## DayOfTheWeek4 -0.6514993  1.0975276  -0.594               0.553    
## DayOfTheWeek5 -0.6310186  1.0975300  -0.575               0.565    
## DayOfTheWeek6 -0.6954399  1.0975339  -0.634               0.526    
## DayOfTheWeek7 -1.0333612  1.0989089  -0.940               0.347    
## Age            0.0096842  0.0001783  54.313 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 367749  on 299999  degrees of freedom
## Residual deviance: 364509  on 299991  degrees of freedom
## AIC: 364527
## 
## Number of Fisher Scoring iterations: 4

m2 <- glm(Stat ~ Gender*DayOfTheWeek + Age, family = binomial, data = d_csv)
summary(m2)

## 
## Call:
## glm(formula = Stat ~ Gender * DayOfTheWeek + Age, family = binomial, 
##     data = d_csv)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8732  -1.4325   0.7848   0.8739   1.2045  
## 
## Coefficients: (1 not defined because of singularities)
##                         Estimate Std. Error z value            Pr(>|z|)    
## (Intercept)            1.1511336  1.0975204   1.049              0.2942    
## GenderM               -0.2662785  0.1230512  -2.164              0.0305 *  
## DayOfTheWeek2         -0.7648700  1.0975439  -0.697              0.4859    
## DayOfTheWeek3         -0.6080630  1.0975432  -0.554              0.5796    
## DayOfTheWeek4         -0.6641008  1.0975423  -0.605              0.5451    
## DayOfTheWeek5         -0.6343655  1.0975461  -0.578              0.5633    
## DayOfTheWeek6         -0.6811631  1.0975523  -0.621              0.5348    
## DayOfTheWeek7         -0.9579932  1.0994984  -0.871              0.3836    
## Age                    0.0096863  0.0001783  54.320 <0.0000000000000002 ***
## GenderM:DayOfTheWeek2  0.2638653  0.1244623   2.120              0.0340 *  
## GenderM:DayOfTheWeek3  0.2582795  0.1244897   2.075              0.0380 *  
## GenderM:DayOfTheWeek4  0.3014304  0.1244328   2.422              0.0154 *  
## GenderM:DayOfTheWeek5  0.2736872  0.1244969   2.198              0.0279 *  
## GenderM:DayOfTheWeek6  0.2216500  0.1246689   1.778              0.0754 .  
## GenderM:DayOfTheWeek7         NA         NA      NA                  NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 367749  on 299999  degrees of freedom
## Residual deviance: 364495  on 299986  degrees of freedom
## AIC: 364523
## 
## Number of Fisher Scoring iterations: 4

library(texreg)

## Version:  1.36.18
## Date:     2016-10-22
## Author:   Philip Leifeld (University of Glasgow)
## 
## Please cite the JSS article in your publications -- see citation("texreg").

## 
## Attaching package: 'texreg'

## The following object is masked from 'package:tidyr':
## 
##     extract

## The following object is masked from 'package:magrittr':
## 
##     extract

screenreg(list(m0, m1, m2))

## 
## =====================================================================
##                        Model 1         Model 2         Model 3       
## ---------------------------------------------------------------------
## (Intercept)                  0.85 ***        1.15            1.15    
##                             (0.00)          (1.10)          (1.10)   
## GenderM                     -0.05 ***       -0.00           -0.27 *  
##                             (0.01)          (0.01)          (0.12)   
## DayOfTheWeek2                               -0.76           -0.76    
##                                             (1.10)          (1.10)   
## DayOfTheWeek3                               -0.61           -0.61    
##                                             (1.10)          (1.10)   
## DayOfTheWeek4                               -0.65           -0.66    
##                                             (1.10)          (1.10)   
## DayOfTheWeek5                               -0.63           -0.63    
##                                             (1.10)          (1.10)   
## DayOfTheWeek6                               -0.70           -0.68    
##                                             (1.10)          (1.10)   
## DayOfTheWeek7                               -1.03           -0.96    
##                                             (1.10)          (1.10)   
## Age                                          0.01 ***        0.01 ***
##                                             (0.00)          (0.00)   
## GenderM:DayOfTheWeek2                                        0.26 *  
##                                                             (0.12)   
## GenderM:DayOfTheWeek3                                        0.26 *  
##                                                             (0.12)   
## GenderM:DayOfTheWeek4                                        0.30 *  
##                                                             (0.12)   
## GenderM:DayOfTheWeek5                                        0.27 *  
##                                                             (0.12)   
## GenderM:DayOfTheWeek6                                        0.22    
##                                                             (0.12)   
## ---------------------------------------------------------------------
## AIC                     367712.96       364526.82       364523.32    
## BIC                     367734.18       364622.32       364671.88    
## Log Likelihood         -183854.48      -182254.41      -182247.66    
## Deviance                367708.96       364508.82       364495.32    
## Num. obs.               300000          300000          300000       
## =====================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

lmtest::lrtest(m0, m1, m2)

## Likelihood ratio test
## 
## Model 1: Stat ~ Gender
## Model 2: Stat ~ Gender + DayOfTheWeek + Age
## Model 3: Stat ~ Gender * DayOfTheWeek + Age
##   #Df  LogLik Df    Chisq           Pr(>Chisq)    
## 1   2 -183854                                     
## 2   9 -182254  7 3200.139 < 0.0000000000000002 ***
## 3  14 -182248  5   13.504              0.01909 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#library(visreg)
#visreg(m2, "Age", scale = "response")

#library(visreg)
#visreg(m1, "Gender", by = "DayOfTheWeek", scale = "response")

#library(visreg)
#visreg(m2, "DayOfTheWeek", by = "Gender", scale = "response")

#library(visreg)
#visreg(m2, "DayOfTheWeek", by = "Gender")

#library(visreg)
#visreg(m2, "Age", by = "Gender", scale = "response")

#library(visreg)
#visreg(m2, "Age", by = "Gender")

The following analysis uses a dataset obtained from Kaggle. The original dataset is available at https://www.kaggle.com/joniarroba/noshowappointments . The dataset records whether patients attended their scheduled medical appointments. Although it contains 16 variables in total, I use only four of them (Age, Gender, Stat, and DayOfTheWeek) for this analysis. My objective for this assignment is to examine whether males or females are more likely to miss their scheduled appointments. Age, Gender, and DayOfTheWeek are the independent variables and give the patient's age, the patient's gender, and the day of the week of the appointment, respectively. Stat is the dependent variable: it is 1 when the patient showed up for the scheduled appointment and 0 when the patient did not.

The first step was to download the csv file from the Kaggle website and read it into R. Second, since DayOfTheWeek was stored as text, I recoded it so that each day is represented by a number from 1 to 7: Sunday is 1, Monday is 2, Tuesday is 3, and so on. However, a day-of-week number is still a category rather than a quantity, so I converted the recoded variable to a factor with factor(), which is part of base R and does not require the DescTools package. Without this step, glm() would treat DayOfTheWeek as a single numeric predictor with one slope. With the factor, each day gets its own coefficient measured against a reference day, and the first level, Sunday (1), is that reference. The intercept of the models that include DayOfTheWeek therefore describes a female patient of age 0 with a Sunday appointment. Once these steps were completed, I fit three models.
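As a minimal sketch of how the reference day is chosen, the example below builds a factor from a small made-up vector of day names (not the real d_csv column) and shows how relevel() could pick a different reference day. The object names day_names, week_levels, day_factor, and day_factor_mon are hypothetical and used only for illustration.

```r
# Toy vector standing in for the original text column (illustrative values only)
day_names <- c("Wednesday", "Wednesday", "Tuesday", "Thursday", "Tuesday", "Monday")

# Week order matching the 1-7 recoding; the first level becomes the reference
week_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")

day_factor <- factor(day_names, levels = week_levels)
levels(day_factor)[1]   # reference category that glm() would use

# Choose a different reference day (here Monday) without renumbering anything
day_factor_mon <- relevel(day_factor, ref = "Monday")
levels(day_factor_mon)[1]
table(day_factor_mon)
```

A reference day with many observations generally gives more stable day-of-week coefficients than one with very few, which may be worth considering here.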

In Model 0, the results show that the relationship between gender and showing up to a medical appointment is statistically significant. The coefficient for males is negative (-0.05 on the log-odds scale, p < 0.001), so male patients are somewhat less likely to show up than female patients. The difference is small: exp(-0.05) is about 0.95, which means the odds of showing up are roughly 5% lower for males. With 300,000 observations, even a difference this small is estimated precisely enough to be statistically significant. Therefore, from the first model, we can infer that males are slightly more likely than females to miss an appointment. The model says nothing about why, so it cannot tell us whether males are less willing than females to discuss their medical conditions with a doctor.
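To put the Model 0 estimates on a more readable scale, the sketch below converts the two coefficients printed by summary(m0) into an odds ratio and into predicted probabilities of showing up. It assumes, based on the head(d_csv) output, that Stat = 1 means the patient showed up; the variable names are hypothetical.

```r
# Coefficients copied from the summary(m0) output
b_intercept <- 0.853492    # log-odds of showing up for females (reference group)
b_male      <- -0.053211   # change in log-odds for males

# Odds of showing up for males relative to females
odds_ratio_male <- exp(b_male)

# Predicted probabilities of showing up (Stat = 1); plogis() is the inverse logit
p_female <- plogis(b_intercept)
p_male   <- plogis(b_intercept + b_male)

round(c(odds_ratio = odds_ratio_male, female = p_female, male = p_male), 3)
```

On the probability scale the two groups differ by only about one percentage point (roughly 70% versus 69%), which helps show how small the significant effect is.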

In Model 1, the results provide a clearer picture. As in the first model, the coefficient for males is negative, but it is now very close to zero (-0.002) and no longer statistically significant (p = 0.773) once age and day of the week are controlled for. Age is the only statistically significant coefficient in the model. Each additional year of age raises the log-odds of showing up by about 0.01, which corresponds to roughly 1% higher odds per year. The day-of-week coefficients are all negative, meaning that attendance on every other day is estimated to be lower than on Sunday, the reference day, but none of them is statistically significant. Their standard errors are all about 1.10, as is that of the intercept, which suggests that very few appointments in the data fall on Sunday, so comparisons against Sunday are imprecise. This tells us that, in this model, age rather than gender is what matters: the older a patient is, the more likely the patient is to attend an appointment.
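The age coefficient is easier to interpret as an odds ratio. The sketch below uses the estimate and standard error printed by summary(m1) to compute the odds ratio per year, per ten years, and an approximate 95% Wald interval. The variable names are hypothetical, and the Wald interval is a simple approximation rather than the profile-likelihood interval that confint() would give.

```r
# Estimate and standard error for Age, copied from the summary(m1) output
b_age  <- 0.0096842
se_age <- 0.0001783

# Multiplicative change in the odds of showing up
or_per_year   <- exp(b_age)        # one additional year of age
or_per_decade <- exp(10 * b_age)   # ten additional years of age

# Approximate 95% Wald confidence interval for the per-year odds ratio
ci_per_year <- exp(b_age + c(-1, 1) * qnorm(0.975) * se_age)

round(c(per_year = or_per_year, per_decade = or_per_decade), 4)
round(ci_per_year, 4)
```

A per-year odds ratio just under 1.01 compounds to roughly 10% higher odds of showing up for every ten years of age.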

In Model 2, an interaction term between Gender and DayOfTheWeek was added to Model 1 (this model appears as Model 3 in the screenreg table). Age has the same positive, statistically significant effect as in Model 1. The GenderM coefficient is now -0.27 and statistically significant (p = 0.031), but with the interaction in the model it is no longer an overall gender effect. It is the male-female difference on the baseline day only, and each GenderM:DayOfTheWeek coefficient measures how much that difference changes on the given day. The interaction coefficients for Monday through Thursday are positive and statistically significant at the 0.05 level, while the coefficient for Friday is positive but not significant (p = 0.075). The coefficient for males on Saturday is NA because it could not be estimated (R reports a singularity), which usually means that at least one Gender-by-day combination has no observations. The largest interaction coefficient is for Wednesday (0.30). These positive coefficients do not mean that males attend more often than females on weekdays. The male-female difference on a weekday is the sum of the GenderM coefficient and that day's interaction coefficient, and these sums range from about -0.04 (Friday) to about 0.04 (Wednesday), so on weekdays males and females are almost equally likely to attend. The DayOfTheWeek main effects, which describe females on each day relative to Sunday, are negative and not statistically significant. Ultimately, while attendance rises with age for both genders, Model 2 gives no clear evidence that either gender attends more often on weekdays, and the estimates involving the weekend are too imprecise to support a conclusion.
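Because an interaction coefficient is a change in the gender gap rather than the gap itself, the male-female difference on a given weekday is the GenderM coefficient plus that day's interaction coefficient. The sketch below does this arithmetic with the values printed by summary(m2), using the day numbering from the recoding step (2 = Monday through 6 = Friday). The variable names are hypothetical.

```r
# Coefficients copied from the summary(m2) output
b_male <- -0.2662785
b_male_by_day <- c(
  Monday    = 0.2638653,   # GenderM:DayOfTheWeek2
  Tuesday   = 0.2582795,   # GenderM:DayOfTheWeek3
  Wednesday = 0.3014304,   # GenderM:DayOfTheWeek4
  Thursday  = 0.2736872,   # GenderM:DayOfTheWeek5
  Friday    = 0.2216500    # GenderM:DayOfTheWeek6
)

# Male-minus-female difference in log-odds of showing up on each weekday
gap_log_odds <- b_male + b_male_by_day

# The same gap as an odds ratio (values near 1 mean little difference)
gap_odds_ratio <- exp(gap_log_odds)

round(cbind(gap_log_odds, gap_odds_ratio), 3)
```

Every weekday gap is within about 0.05 of zero on the log-odds scale, so the significant interaction terms mostly cancel the negative GenderM coefficient instead of indicating higher male attendance.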

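After fitting all three models, I used a likelihood ratio test to see which model fit the data best. Model 1 fits far better than Model 0 (chi-square = 3200.14 on 7 degrees of freedom, p < 0.001). Model 2 improves on Model 1 by a much smaller but still statistically significant amount (chi-square = 13.50 on 5 degrees of freedom, p = 0.019). Model 2 also has the lowest AIC (364523.32), although BIC, which penalizes extra parameters more heavily, favors Model 1 (364622.32 versus 364671.88). On balance, Model 2 fits the data best of the three, but its advantage over Model 1 is modest.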
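The likelihood ratio statistics can be reproduced by hand from the log-likelihoods in the screenreg table and the parameter counts in the lrtest output. The sketch below is only a check on that arithmetic, with hypothetical variable names; lmtest::lrtest() remains the more convenient route when the fitted models are available.

```r
# Log-likelihoods from the screenreg table and parameter counts from lrtest()
loglik <- c(m0 = -183854.48, m1 = -182254.41, m2 = -182247.66)
n_par  <- c(m0 = 2, m1 = 9, m2 = 14)

# Each model is compared with the one before it
chisq   <- 2 * diff(loglik)   # twice the gain in log-likelihood
df      <- diff(n_par)        # number of added parameters
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

data.frame(comparison = c("m1 vs m0", "m2 vs m1"),
           chisq = chisq, df = df, p_value = p_value,
           row.names = NULL)
```

The second comparison gives a p-value a little under 0.02, in line with the value reported by lrtest().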

Fahad Ahmed

March 8, 2017