For this week’s assignment, I model the odds that a piece of MTA customer feedback is a complaint, using the “MTA Customer Feedback Data: Beginning 2014” dataset from NYC Open Data. I had to switch from the “NY Bus Breakdown and Delays” dataset I used the previous week because it did not have enough variables to work with. However, both datasets belong to the Transportation category of NYC Open Data, which matches my interest.
Dependent Variable
Independent Variables
I loaded the packages I will use or might use.
library(readr)
library(dplyr)
library(Zelig)
library(texreg)
library(pander)
library(visreg)
library(effects)
library(sjlabelled)
I imported the dataset to R.
mta_feedback<-read_csv("C:/Users/wroni/OneDrive/Documents/QC MADASR/SOC 712/mta_feedback.csv")
## Parsed with column specification:
## cols(
## Agency = col_character(),
## `Commendation or Complaint` = col_character(),
## `Subject Matter` = col_character(),
## `Subject Detail` = col_character(),
## `Issue Detail` = col_character(),
## Year = col_double(),
## Quarter = col_double(),
## `Branch/Line/Route` = col_character()
## )
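I removed the variable labels, which allows me to recode the dependent variable into a binary. From this step on, the code refers to the columns with dots in place of spaces (for example, Commendation.or.Complaint and Subject.Matter).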
mta_feedback2<-sjlabelled::remove_all_labels(mta_feedback)
head(mta_feedback2)
Since I will be focusing on the odds of a complaint by MTA customers, I recoded the dependent variable as 0 (no complaint, that is, a commendation) or 1 (complaint).
mta_feedback3 <- mutate(mta_feedback2, complaint_binary= recode(Commendation.or.Complaint,`Complaint` = 1, `Commendation` = 0))
head(mta_feedback3)
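A quick way to confirm that a recode like this behaved as intended is to cross-tabulate the original and recoded columns. The sketch below uses a small made-up data frame (`toy`, not the MTA data) and base R's `ifelse()` rather than `dplyr::recode()`, so it is an equivalent check under those assumptions, not the code used above.

```r
# Hypothetical toy data standing in for the feedback column
toy <- data.frame(
  Commendation.or.Complaint = c("Complaint", "Commendation", "Complaint", NA),
  stringsAsFactors = FALSE
)

# 1 = complaint, 0 = commendation; a missing value stays missing
toy$complaint_binary <- ifelse(toy$Commendation.or.Complaint == "Complaint", 1, 0)

# Cross-tabulate original against recoded values, keeping NAs visible
check <- table(toy$Commendation.or.Complaint, toy$complaint_binary, useNA = "ifany")
print(check)
```

Every "Complaint" row should land in the 1 column and every "Commendation" row in the 0 column; anything else points to a label that the recode did not cover.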
Model 1 shows the effect of Agency on the odds of a complaint by MTA customers. The coefficients are on the log-odds scale and are relative to the omitted reference agency, Long Island Railroad. The log-odds of a complaint are higher for Subways by 0.376623 (statistically significant). They are lower for NYC Buses by 0.005002 (not statistically significant) and for Metro-North Railroad by 0.183167 (statistically significant).
m1 <- glm(complaint_binary ~ Agency, family = binomial, data = mta_feedback3)
summary(m1)
##
## Call:
## glm(formula = complaint_binary ~ Agency, family = binomial, data = mta_feedback3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.6476 0.2470 0.2979 0.2979 0.3250
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.097879 0.031901 97.109 < 2e-16 ***
## AgencyMetro-North Railroad -0.183167 0.046277 -3.958 7.56e-05 ***
## AgencyNYC Buses -0.005002 0.034467 -0.145 0.885
## AgencySubways 0.376623 0.037277 10.103 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 91621 on 275669 degrees of freedom
## Residual deviance: 91259 on 275666 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 91267
##
## Number of Fisher Scoring iterations: 6
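Because `glm()` with `family = binomial` reports coefficients as log-odds, exponentiating them gives odds ratios, and `plogis()` turns a log-odds value into a probability. The sketch below does this by hand with the Model 1 estimates copied from the summary above; with the fitted object available, `exp(coef(m1))` would give the same odds ratios. Treating Long Island Railroad as the reference agency is an assumption based on the agencies named later in this write-up.

```r
# Coefficients copied from the Model 1 summary above (log-odds scale)
b <- c(
  "(Intercept)"                = 3.097879,
  "AgencyMetro-North Railroad" = -0.183167,
  "AgencyNYC Buses"            = -0.005002,
  "AgencySubways"              = 0.376623
)

# Odds ratios relative to the omitted reference agency
odds_ratios <- exp(b[-1])
print(round(odds_ratios, 3))

# Log-odds of a complaint for each agency: intercept alone for the
# reference agency, intercept plus coefficient for the others
log_odds <- c("Reference agency" = unname(b[1]), b[-1] + b[1])

# Predicted probability of a complaint for each agency
probs <- plogis(log_odds)
print(round(probs, 3))
```

An odds ratio above 1 means higher odds of a complaint than the reference agency, and one below 1 means lower odds.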
Model 2 shows the effect of Agency and Subject Matter on the odds of a complaint by MTA customers. The coefficients are log-odds relative to the omitted reference category. The log-odds of a complaint are higher for Subject Matter Complaint by 14.33084 (not significant), Construction / Capital Projects by 14.12971 (not significant), Customer by 14.03700 (not significant), Reasonable Modification by 13.96709 (not significant), Travel Disruption / Trip Problem by 2.75878 (significant), Trains by 1.23333 (significant), and Station / Bus Stop / Facility / Structure by 1.75159 (significant). The log-odds of a complaint are lower for Subject Matter Employees by 1.66550 (significant) and Public Hearing by 1.80039 (significant, p = 2.70e-05). The coefficients near 14 and the Commendation coefficient of -20.80130 have standard errors ranging from the tens to the thousands, so those estimates are too imprecise to interpret.
m2 <- glm(complaint_binary ~ Agency + Subject.Matter, family = binomial, data = mta_feedback3)
summary(m2)
##
## Call:
## glm(formula = complaint_binary ~ Agency + Subject.Matter, family = binomial,
## data = mta_feedback3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5206 0.0638 0.1366 0.3823 0.7213
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 3.01393 0.08944
## AgencyMetro-North Railroad 0.22130 0.04947
## AgencyNYC Buses 1.23122 0.03788
## AgencySubways 0.42242 0.03982
## Subject.MatterCommendation -20.80130 1097.24701
## Subject.MatterComplaint 14.33084 3956.18034
## Subject.MatterConstruction / Capital Projects 14.12971 144.07561
## Subject.MatterCustomer 14.03700 53.78364
## Subject.MatterEmployees -1.66550 0.08224
## Subject.MatterFerry Service - Hudson River 14.33084 1398.72096
## Subject.MatterMetroCard/Tickets/E-Zpass & Tolls 1.20755 0.18512
## Subject.MatterMTA Agency Cars / Trucks 0.45223 0.51197
## Subject.MatterPolicies, Rules & Regulations 0.26434 0.11551
## Subject.MatterPublic Hearing -1.80039 0.42896
## Subject.MatterReasonable Modification 13.96709 643.29605
## Subject.MatterSchedules / Reservations 0.23576 0.13370
## Subject.MatterStation /Bus Stop /Facility /Structure 1.75159 0.10399
## Subject.MatterTelephone / Website / Mobile Apps 0.95660 0.14067
## Subject.MatterTrains 1.23333 0.10704
## Subject.MatterTravel Disruption / Trip Problem 2.75878 0.13295
## z value Pr(>|z|)
## (Intercept) 33.699 < 2e-16 ***
## AgencyMetro-North Railroad 4.473 7.71e-06 ***
## AgencyNYC Buses 32.504 < 2e-16 ***
## AgencySubways 10.609 < 2e-16 ***
## Subject.MatterCommendation -0.019 0.9849
## Subject.MatterComplaint 0.004 0.9971
## Subject.MatterConstruction / Capital Projects 0.098 0.9219
## Subject.MatterCustomer 0.261 0.7941
## Subject.MatterEmployees -20.251 < 2e-16 ***
## Subject.MatterFerry Service - Hudson River 0.010 0.9918
## Subject.MatterMetroCard/Tickets/E-Zpass & Tolls 6.523 6.88e-11 ***
## Subject.MatterMTA Agency Cars / Trucks 0.883 0.3771
## Subject.MatterPolicies, Rules & Regulations 2.288 0.0221 *
## Subject.MatterPublic Hearing -4.197 2.70e-05 ***
## Subject.MatterReasonable Modification 0.022 0.9827
## Subject.MatterSchedules / Reservations 1.763 0.0778 .
## Subject.MatterStation /Bus Stop /Facility /Structure 16.844 < 2e-16 ***
## Subject.MatterTelephone / Website / Mobile Apps 6.800 1.04e-11 ***
## Subject.MatterTrains 11.522 < 2e-16 ***
## Subject.MatterTravel Disruption / Trip Problem 20.751 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 91621 on 275669 degrees of freedom
## Residual deviance: 76136 on 275650 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 76176
##
## Number of Fisher Scoring iterations: 16
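One plausible explanation for coefficients around 14 or -20 with enormous standard errors is separation: a category in which every observation (or nearly every one) has the same outcome. Whether that is what happens in the MTA data is not checked here. The sketch below only reproduces the symptom on made-up data (`toy`), where group "B" contains nothing but 1s.

```r
# Hypothetical toy data: group "A" has both outcomes, group "B" has only 1s
toy <- data.frame(
  y = c(rep(c(0, 1), each = 10), rep(1, 20)),
  g = factor(rep(c("A", "B"), each = 20))
)

# glm() may warn that fitted probabilities of 0 or 1 occurred; that warning
# is part of the symptom, so it is suppressed only to keep the output short
fit <- suppressWarnings(glm(y ~ g, family = binomial, data = toy))

# The estimate for gB is large and its standard error is far larger still
print(coef(summary(fit)))
```

A cross-tabulation such as `table(mta_feedback3$Subject.Matter, mta_feedback3$complaint_binary)` would show whether any Subject Matter category in the real data has this pattern.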
Model 3 shows the effect of Agency, Subject Matter, Year, and Quarter on the odds of a complaint by MTA customers, including the interaction between Year and Quarter. The log-odds of a complaint increase with Quarter by 8.264 (not significant) and decrease with Year by 0.004808 (not significant). The interaction term Year:Quarter is negative, at -0.004088 (not significant). Because of the interaction, the Quarter coefficient describes the effect of Quarter when Year equals 0, which is why it looks so large; for the years in the data, the net effect of Quarter is small.
m3 <- glm(complaint_binary ~ Agency + Subject.Matter + Year * Quarter, family = binomial, data = mta_feedback3)
summary(m3)
##
## Call:
## glm(formula = complaint_binary ~ Agency + Subject.Matter + Year *
## Quarter, family = binomial, data = mta_feedback3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5445 0.0653 0.1381 0.3796 0.7380
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 1.265e+01 4.166e+01
## AgencyMetro-North Railroad 2.241e-01 4.948e-02
## AgencyNYC Buses 1.233e+00 3.789e-02
## AgencySubways 4.262e-01 3.984e-02
## Subject.MatterCommendation -2.079e+01 1.097e+03
## Subject.MatterComplaint 1.425e+01 3.956e+03
## Subject.MatterConstruction / Capital Projects 1.415e+01 1.441e+02
## Subject.MatterCustomer 1.405e+01 5.378e+01
## Subject.MatterEmployees -1.662e+00 8.226e-02
## Subject.MatterFerry Service - Hudson River 1.433e+01 1.399e+03
## Subject.MatterMetroCard/Tickets/E-Zpass & Tolls 1.209e+00 1.851e-01
## Subject.MatterMTA Agency Cars / Trucks 4.564e-01 5.120e-01
## Subject.MatterPolicies, Rules & Regulations 2.676e-01 1.155e-01
## Subject.MatterPublic Hearing -1.815e+00 4.290e-01
## Subject.MatterReasonable Modification 1.398e+01 6.432e+02
## Subject.MatterSchedules / Reservations 2.424e-01 1.337e-01
## Subject.MatterStation /Bus Stop /Facility /Structure 1.758e+00 1.040e-01
## Subject.MatterTelephone / Website / Mobile Apps 9.577e-01 1.407e-01
## Subject.MatterTrains 1.238e+00 1.071e-01
## Subject.MatterTravel Disruption / Trip Problem 2.764e+00 1.330e-01
## Year -4.808e-03 2.066e-02
## Quarter 8.264e+00 1.383e+01
## Year:Quarter -4.088e-03 6.857e-03
## z value Pr(>|z|)
## (Intercept) 0.304 0.7615
## AgencyMetro-North Railroad 4.530 5.91e-06 ***
## AgencyNYC Buses 32.537 < 2e-16 ***
## AgencySubways 10.698 < 2e-16 ***
## Subject.MatterCommendation -0.019 0.9849
## Subject.MatterComplaint 0.004 0.9971
## Subject.MatterConstruction / Capital Projects 0.098 0.9217
## Subject.MatterCustomer 0.261 0.7939
## Subject.MatterEmployees -20.207 < 2e-16 ***
## Subject.MatterFerry Service - Hudson River 0.010 0.9918
## Subject.MatterMetroCard/Tickets/E-Zpass & Tolls 6.532 6.50e-11 ***
## Subject.MatterMTA Agency Cars / Trucks 0.891 0.3727
## Subject.MatterPolicies, Rules & Regulations 2.316 0.0205 *
## Subject.MatterPublic Hearing -4.230 2.34e-05 ***
## Subject.MatterReasonable Modification 0.022 0.9827
## Subject.MatterSchedules / Reservations 1.812 0.0699 .
## Subject.MatterStation /Bus Stop /Facility /Structure 16.900 < 2e-16 ***
## Subject.MatterTelephone / Website / Mobile Apps 6.808 9.91e-12 ***
## Subject.MatterTrains 11.564 < 2e-16 ***
## Subject.MatterTravel Disruption / Trip Problem 20.787 < 2e-16 ***
## Year -0.233 0.8160
## Quarter 0.598 0.5501
## Year:Quarter -0.596 0.5511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 91621 on 275669 degrees of freedom
## Residual deviance: 76124 on 275647 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 76170
##
## Number of Fisher Scoring iterations: 16
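With a `Year * Quarter` interaction, the effect of one variable depends on the value of the other: the slope for Quarter is the Quarter coefficient plus the interaction coefficient times Year, and likewise for Year. The sketch below works this out from the Model 3 estimates as printed above. The printed values are rounded to four significant digits, so the results are approximate, and the year range 2014 to 2018 is taken from the conclusion of this write-up.

```r
# Coefficients copied from the Model 3 summary above (rounded as printed)
b_year    <- -4.808e-03
b_quarter <-  8.264e+00
b_inter   <- -4.088e-03

# Change in log-odds for a one-quarter step, evaluated in each year
years <- 2014:2018
quarter_slope <- b_quarter + b_inter * years
names(quarter_slope) <- years

# Change in log-odds for a one-year step, evaluated in each quarter
quarters <- 1:4
year_slope <- b_year + b_inter * quarters
names(year_slope) <- quarters

print(round(quarter_slope, 3))
print(round(year_slope, 3))
```

The Quarter slopes come out small and positive and the Year slopes small and negative, which is the scale on which these two variables actually act within the observed data.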
table1 <- htmlreg(list(m1, m2, m3), doctype= FALSE)
pander(table1)
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 3.10*** | 3.01*** | 12.65 |
| | (0.03) | (0.09) | (41.66) |
| AgencyMetro-North Railroad | -0.18*** | 0.22*** | 0.22*** |
| | (0.05) | (0.05) | (0.05) |
| AgencyNYC Buses | -0.01 | 1.23*** | 1.23*** |
| | (0.03) | (0.04) | (0.04) |
| AgencySubways | 0.38*** | 0.42*** | 0.43*** |
| | (0.04) | (0.04) | (0.04) |
| Subject.MatterCommendation | | -20.80 | -20.79 |
| | | (1097.25) | (1097.13) |
| Subject.MatterComplaint | | 14.33 | 14.25 |
| | | (3956.18) | (3956.18) |
| Subject.MatterConstruction / Capital Projects | | 14.13 | 14.15 |
| | | (144.08) | (144.07) |
| Subject.MatterCustomer | | 14.04 | 14.05 |
| | | (53.78) | (53.78) |
| Subject.MatterEmployees | | -1.67*** | -1.66*** |
| | | (0.08) | (0.08) |
| Subject.MatterFerry Service - Hudson River | | 14.33 | 14.33 |
| | | (1398.72) | (1398.55) |
| Subject.MatterMetroCard/Tickets/E-Zpass & Tolls | | 1.21*** | 1.21*** |
| | | (0.19) | (0.19) |
| Subject.MatterMTA Agency Cars / Trucks | | 0.45 | 0.46 |
| | | (0.51) | (0.51) |
| Subject.MatterPolicies, Rules & Regulations | | 0.26* | 0.27* |
| | | (0.12) | (0.12) |
| Subject.MatterPublic Hearing | | -1.80*** | -1.81*** |
| | | (0.43) | (0.43) |
| Subject.MatterReasonable Modification | | 13.97 | 13.98 |
| | | (643.30) | (643.18) |
| Subject.MatterSchedules / Reservations | | 0.24 | 0.24 |
| | | (0.13) | (0.13) |
| Subject.MatterStation /Bus Stop /Facility /Structure | | 1.75*** | 1.76*** |
| | | (0.10) | (0.10) |
| Subject.MatterTelephone / Website / Mobile Apps | | 0.96*** | 0.96*** |
| | | (0.14) | (0.14) |
| Subject.MatterTrains | | 1.23*** | 1.24*** |
| | | (0.11) | (0.11) |
| Subject.MatterTravel Disruption / Trip Problem | | 2.76*** | 2.76*** |
| | | (0.13) | (0.13) |
| Year | | | -0.00 |
| | | | (0.02) |
| Quarter | | | 8.26 |
| | | | (13.83) |
| Year:Quarter | | | -0.00 |
| | | | (0.01) |
| AIC | 91266.63 | 76175.67 | 76170.16 |
| BIC | 91308.73 | 76386.21 | 76412.28 |
| Log Likelihood | -45629.31 | -38067.84 | -38062.08 |
| Deviance | 91258.63 | 76135.67 | 76124.16 |
| Num. obs. | 275670 | 275670 | 275670 |
| *** p < 0.001, ** p < 0.01, * p < 0.05 | | | |
anova(m1, m2, m3, test = "Chisq")
Based on the pander table and the ANOVA (likelihood ratio test), Model 3 is the best-fitting model. Model 3 has the lowest AIC (76170.16) and the lowest residual deviance (76124.16), although BIC is slightly lower for Model 2 (76386.21 versus 76412.28).
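The fit statistics in the table can be cross-checked by hand. For a binary-outcome logistic regression, the residual deviance equals -2 times the log-likelihood, so AIC is the deviance plus twice the number of coefficients, and the likelihood ratio test compares the drop in deviance between nested models with a chi-squared distribution. The sketch below is a minimal recomputation from the numbers reported above, not the output of `anova()` itself; the coefficient counts are read off the coefficient tables.

```r
# Deviances from the regression table above; n_coef is the number of
# coefficients listed for each model (including the intercept)
deviance_m <- c(m1 = 91258.63, m2 = 76135.67, m3 = 76124.16)
n_coef     <- c(m1 = 4, m2 = 20, m3 = 23)

# AIC = deviance + 2 * (number of coefficients) for these models
aic <- deviance_m + 2 * n_coef
print(aic)

# Likelihood ratio tests between successive nested models
lr_stat <- -diff(deviance_m)   # drop in deviance: m1 to m2, then m2 to m3
lr_df   <- diff(n_coef)        # number of coefficients added
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(data.frame(lr_stat, lr_df, p_value))
```

The recomputed AIC values match the AIC row of the table, and the deviance drop from Model 2 to Model 3 is about 11.5 on 3 degrees of freedom.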
visreg(m3,"Quarter", by = "Year", scale="response")
This shows that as the quarters progress within a year, the predicted probability of a complaint increases. It also shows that as the years increase, the predicted probability of a complaint decreases. This is in line with the signs of the Model 3 coefficients, although none of the Year and Quarter terms is statistically significant.
visreg(m3,"Year", by = "Agency", scale="response")
This shows that the predicted probability of a complaint is highest for NYC Buses, followed by Subways, then Metro-North Railroad, and finally Long Island Railroad. As in the previous graph, the predicted probability of a complaint decreases from year to year.
In conclusion, the odds of a complaint by MTA customers are higher for NYC Buses and Subways than for Metro-North Railroad and Long Island Railroad.
Furthermore, the odds of a complaint are higher if the Subject Matter is Complaint, Construction / Capital Projects, Customer, Reasonable Modification, Travel Disruption / Trip Problem, and others (see Model 3). The odds are lower for Subject Matter Employees and Public Hearing.
Finally, the odds of a complaint decrease with each year from 2014-2018 and increase by Quarter from 1-4. The interaction between the two variables is negative; however, none of these three terms is statistically significant.