Probability of Metropolitan Transportation Authority (MTA) Complaint by Agency, Subject Matter, Quarter, and Year

For this week’s assignment, I estimate the odds that a piece of MTA customer feedback is a complaint, using the “MTA Customer Feedback Data: Beginning 2014” dataset from NYC Open Data. I switched from last week’s “NY Bus Breakdown and Delays” dataset because it did not have enough variables to work with. Both datasets belong to the NYC Open Data Transportation category, which matches my interest.

Dependent Variable

  1. Complaint: Whether the feedback is a complaint (1) or a commendation (0).

Independent Variables

  1. Agency: The agency that the e-mail references; Current values are: Long Island Rail Road; Metro-North Rail Road; NYC Buses; Subways.
  2. Subject Matter: Subject of the email; covers a variety of areas such as: Employees; MetroCard/Tickets/EZ-Pass & Tolls, Rules & Regulations; Station/Bus Stop/Facility/Structure; and many others.
  3. Quarter: The quarter in which the complaint or commendation was entered. 1 = Jan - Mar; 2 = Apr - Jun; 3 = Jul - Sep; 4 = Oct - Dec.
  4. Year: The year the complaint or commendation was entered.

Packages

I loaded the packages I will use or might use.

library(readr)
library(dplyr)
library(Zelig)
library(texreg)
library(pander)
library(visreg)
library(effects)
library(sjlabelled)

Opening the dataset

I imported the dataset to R.

mta_feedback<-read_csv("C:/Users/wroni/OneDrive/Documents/QC MADASR/SOC 712/mta_feedback.csv")
## Parsed with column specification:
## cols(
##   Agency = col_character(),
##   `Commendation or Complaint` = col_character(),
##   `Subject Matter` = col_character(),
##   `Subject Detail` = col_character(),
##   `Issue Detail` = col_character(),
##   Year = col_double(),
##   Quarter = col_double(),
##   `Branch/Line/Route` = col_character()
## )

Removing the labels

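I removed the labels of the variables, which will allow me to recode the dependent variable into a binary. In this run, the step also returned the columns with syntactic names (for example, `Commendation or Complaint` became Commendation.or.Complaint and `Subject Matter` became Subject.Matter), which is why the later code uses the dotted names.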

mta_feedback2<-sjlabelled::remove_all_labels(mta_feedback)

Summary of variables (labels removed)

head(mta_feedback2)

Recoding the dataset

Since I am focusing on the odds of a complaint by MTA customers, I recoded the dependent variable as a binary: 0 (commendation, that is, no complaint) or 1 (complaint).

mta_feedback3 <- mutate(mta_feedback2, complaint_binary= recode(Commendation.or.Complaint,`Complaint` = 1, `Commendation` = 0))

Summary of variables (dependent recoded)

head(mta_feedback3)
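
Before fitting any model it can help to confirm that the recode produced only 0, 1, and missing values. The sketch below uses a made-up four-element vector (not the real data) and a base-R equivalent of the recode step; the names `feedback` and `check` are illustrative only.

```r
# Toy stand-in for the Commendation.or.Complaint column (values invented for illustration).
feedback <- c("Complaint", "Commendation", "Complaint", NA)

# Base-R equivalent of the recode step: 1 = complaint, 0 = commendation, NA stays NA.
complaint_binary <- ifelse(feedback == "Complaint", 1, 0)

# Cross-tabulate original against recoded values, keeping missing values visible.
check <- table(feedback, complaint_binary, useNA = "ifany")
print(check)
```

On the real data, the same cross-tabulation of mta_feedback3$Commendation.or.Complaint against mta_feedback3$complaint_binary would show whether any category was recoded unexpectedly.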

Model 1

Model 1 shows the effect of Agency on the odds of a complaint by MTA customers. The coefficients are in log-odds, relative to the reference category (Long Island Rail Road). The log-odds of a complaint increase for Subways by 0.376623 (statistically significant). The log-odds of a complaint decrease for NYC Buses by 0.005002 (not statistically significant) and for Metro-North Railroad by 0.183167 (statistically significant).

m1 <- glm(complaint_binary ~ Agency, family = binomial, data = mta_feedback3)
summary(m1)
## 
## Call:
## glm(formula = complaint_binary ~ Agency, family = binomial, data = mta_feedback3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6476   0.2470   0.2979   0.2979   0.3250  
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 3.097879   0.031901  97.109  < 2e-16 ***
## AgencyMetro-North Railroad -0.183167   0.046277  -3.958 7.56e-05 ***
## AgencyNYC Buses            -0.005002   0.034467  -0.145    0.885    
## AgencySubways               0.376623   0.037277  10.103  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 91621  on 275669  degrees of freedom
## Residual deviance: 91259  on 275666  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 91267
## 
## Number of Fisher Scoring iterations: 6
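
Because glm() reports log-odds, exponentiating a coefficient gives an odds ratio, and applying the inverse logit gives a predicted probability. The following is a minimal sketch that copies the Model 1 estimates from the summary above rather than refitting the model; the variable names are illustrative.

```r
# Coefficients (log-odds) copied from the Model 1 summary.
b <- c("(Intercept)"                = 3.097879,
       "AgencyMetro-North Railroad" = -0.183167,
       "AgencyNYC Buses"            = -0.005002,
       "AgencySubways"              = 0.376623)

# Odds ratios relative to the reference agency (Long Island Rail Road).
odds_ratios <- exp(b[-1])
print(round(odds_ratios, 3))

# Predicted probability of a complaint for each agency (inverse logit of the log-odds).
log_odds <- c("Long Island Rail Road" = b[[1]], b[[1]] + b[-1])
probabilities <- plogis(log_odds)
print(round(probabilities, 4))
```

With the fitted object available, exp(coef(m1)) returns the same odds ratios directly.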

Model 2

Model 2 shows the effect of Agency and Subject Matter on the odds of a complaint by MTA customers. Coefficients are in log-odds, relative to the reference category. The log-odds of a complaint increase for Subject Matter Complaint by 14.33084 (not significant), Construction / Capital Projects by 14.12971 (not significant), Customer by 14.03700 (not significant), Reasonable Modification by 13.96709 (not significant), Travel Disruption / Trip Problem by 2.75878 (significant), Trains by 1.23333 (significant), and Station / Bus Stop / Facility / Structure by 1.75159 (significant). The log-odds of a complaint decrease for Employees by 1.66550 (significant) and Public Hearing by 1.80039 (significant). The estimates near 14 or -20 come with standard errors in the tens to thousands, so they should not be read as real effects; this pattern usually appears when a category contains only complaints or only commendations.

m2 <- glm(complaint_binary ~ Agency + Subject.Matter, family = binomial, data = mta_feedback3)
summary(m2)
## 
## Call:
## glm(formula = complaint_binary ~ Agency + Subject.Matter, family = binomial, 
##     data = mta_feedback3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5206   0.0638   0.1366   0.3823   0.7213  
## 
## Coefficients:
##                                                        Estimate Std. Error
## (Intercept)                                             3.01393    0.08944
## AgencyMetro-North Railroad                              0.22130    0.04947
## AgencyNYC Buses                                         1.23122    0.03788
## AgencySubways                                           0.42242    0.03982
## Subject.MatterCommendation                            -20.80130 1097.24701
## Subject.MatterComplaint                                14.33084 3956.18034
## Subject.MatterConstruction / Capital Projects          14.12971  144.07561
## Subject.MatterCustomer                                 14.03700   53.78364
## Subject.MatterEmployees                                -1.66550    0.08224
## Subject.MatterFerry Service - Hudson River             14.33084 1398.72096
## Subject.MatterMetroCard/Tickets/E-Zpass & Tolls         1.20755    0.18512
## Subject.MatterMTA Agency Cars / Trucks                  0.45223    0.51197
## Subject.MatterPolicies, Rules & Regulations             0.26434    0.11551
## Subject.MatterPublic Hearing                           -1.80039    0.42896
## Subject.MatterReasonable Modification                  13.96709  643.29605
## Subject.MatterSchedules / Reservations                  0.23576    0.13370
## Subject.MatterStation /Bus Stop /Facility /Structure    1.75159    0.10399
## Subject.MatterTelephone / Website / Mobile Apps         0.95660    0.14067
## Subject.MatterTrains                                    1.23333    0.10704
## Subject.MatterTravel Disruption / Trip Problem          2.75878    0.13295
##                                                      z value Pr(>|z|)    
## (Intercept)                                           33.699  < 2e-16 ***
## AgencyMetro-North Railroad                             4.473 7.71e-06 ***
## AgencyNYC Buses                                       32.504  < 2e-16 ***
## AgencySubways                                         10.609  < 2e-16 ***
## Subject.MatterCommendation                            -0.019   0.9849    
## Subject.MatterComplaint                                0.004   0.9971    
## Subject.MatterConstruction / Capital Projects          0.098   0.9219    
## Subject.MatterCustomer                                 0.261   0.7941    
## Subject.MatterEmployees                              -20.251  < 2e-16 ***
## Subject.MatterFerry Service - Hudson River             0.010   0.9918    
## Subject.MatterMetroCard/Tickets/E-Zpass & Tolls        6.523 6.88e-11 ***
## Subject.MatterMTA Agency Cars / Trucks                 0.883   0.3771    
## Subject.MatterPolicies, Rules & Regulations            2.288   0.0221 *  
## Subject.MatterPublic Hearing                          -4.197 2.70e-05 ***
## Subject.MatterReasonable Modification                  0.022   0.9827    
## Subject.MatterSchedules / Reservations                 1.763   0.0778 .  
## Subject.MatterStation /Bus Stop /Facility /Structure  16.844  < 2e-16 ***
## Subject.MatterTelephone / Website / Mobile Apps        6.800 1.04e-11 ***
## Subject.MatterTrains                                  11.522  < 2e-16 ***
## Subject.MatterTravel Disruption / Trip Problem        20.751  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 91621  on 275669  degrees of freedom
## Residual deviance: 76136  on 275650  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 76176
## 
## Number of Fisher Scoring iterations: 16
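
Several Subject Matter terms have very large estimates paired with very large standard errors, which usually points to separation (a category with only one outcome). The sketch below copies a subset of the Model 2 estimates and flags such terms with a simple cutoff. The cutoff of 10 is an assumption chosen for illustration, not a formal test. For the remaining terms it computes odds ratios with 95% Wald intervals.

```r
# Estimates and standard errors copied from the Model 2 summary (subset of terms).
est <- c(Commendation = -20.80130, Complaint = 14.33084, Customer = 14.03700,
         Employees = -1.66550, Trains = 1.23333)
se  <- c(Commendation = 1097.24701, Complaint = 3956.18034, Customer = 53.78364,
         Employees = 0.08224, Trains = 0.10704)

# Heuristic (assumed cutoff): a standard error above 10 on the log-odds scale is suspect.
suspect <- names(se)[se > 10]
print(suspect)

# Odds ratios and 95% Wald intervals for the terms that look well estimated.
ok <- setdiff(names(est), suspect)
or_table <- cbind(OR    = exp(est[ok]),
                  lower = exp(est[ok] - 1.96 * se[ok]),
                  upper = exp(est[ok] + 1.96 * se[ok]))
print(round(or_table, 3))
```

A cross-tabulation of Subject.Matter against complaint_binary on the real data would confirm whether the flagged categories really contain a single outcome.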

Model 3

Model 3 shows the effect of Agency, Subject Matter, Year, and Quarter on the odds of a complaint by MTA customers, including the interaction between Year and Quarter. The log-odds of a complaint increase with Quarter by 8.264 (not significant) and decrease with Year by 0.004808 (not significant). For the interaction term Year:Quarter, the log-odds of a complaint decrease by 0.004088 (not significant). Because of the interaction, the Quarter coefficient is the Quarter effect when Year equals 0, and the Year coefficient is the Year effect when Quarter equals 0, so neither should be read on its own.

m3 <- glm(complaint_binary ~ Agency + Subject.Matter + Year * Quarter, family = binomial, data = mta_feedback3)
summary(m3)
## 
## Call:
## glm(formula = complaint_binary ~ Agency + Subject.Matter + Year * 
##     Quarter, family = binomial, data = mta_feedback3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5445   0.0653   0.1381   0.3796   0.7380  
## 
## Coefficients:
##                                                        Estimate Std. Error
## (Intercept)                                           1.265e+01  4.166e+01
## AgencyMetro-North Railroad                            2.241e-01  4.948e-02
## AgencyNYC Buses                                       1.233e+00  3.789e-02
## AgencySubways                                         4.262e-01  3.984e-02
## Subject.MatterCommendation                           -2.079e+01  1.097e+03
## Subject.MatterComplaint                               1.425e+01  3.956e+03
## Subject.MatterConstruction / Capital Projects         1.415e+01  1.441e+02
## Subject.MatterCustomer                                1.405e+01  5.378e+01
## Subject.MatterEmployees                              -1.662e+00  8.226e-02
## Subject.MatterFerry Service - Hudson River            1.433e+01  1.399e+03
## Subject.MatterMetroCard/Tickets/E-Zpass & Tolls       1.209e+00  1.851e-01
## Subject.MatterMTA Agency Cars / Trucks                4.564e-01  5.120e-01
## Subject.MatterPolicies, Rules & Regulations           2.676e-01  1.155e-01
## Subject.MatterPublic Hearing                         -1.815e+00  4.290e-01
## Subject.MatterReasonable Modification                 1.398e+01  6.432e+02
## Subject.MatterSchedules / Reservations                2.424e-01  1.337e-01
## Subject.MatterStation /Bus Stop /Facility /Structure  1.758e+00  1.040e-01
## Subject.MatterTelephone / Website / Mobile Apps       9.577e-01  1.407e-01
## Subject.MatterTrains                                  1.238e+00  1.071e-01
## Subject.MatterTravel Disruption / Trip Problem        2.764e+00  1.330e-01
## Year                                                 -4.808e-03  2.066e-02
## Quarter                                               8.264e+00  1.383e+01
## Year:Quarter                                         -4.088e-03  6.857e-03
##                                                      z value Pr(>|z|)    
## (Intercept)                                            0.304   0.7615    
## AgencyMetro-North Railroad                             4.530 5.91e-06 ***
## AgencyNYC Buses                                       32.537  < 2e-16 ***
## AgencySubways                                         10.698  < 2e-16 ***
## Subject.MatterCommendation                            -0.019   0.9849    
## Subject.MatterComplaint                                0.004   0.9971    
## Subject.MatterConstruction / Capital Projects          0.098   0.9217    
## Subject.MatterCustomer                                 0.261   0.7939    
## Subject.MatterEmployees                              -20.207  < 2e-16 ***
## Subject.MatterFerry Service - Hudson River             0.010   0.9918    
## Subject.MatterMetroCard/Tickets/E-Zpass & Tolls        6.532 6.50e-11 ***
## Subject.MatterMTA Agency Cars / Trucks                 0.891   0.3727    
## Subject.MatterPolicies, Rules & Regulations            2.316   0.0205 *  
## Subject.MatterPublic Hearing                          -4.230 2.34e-05 ***
## Subject.MatterReasonable Modification                  0.022   0.9827    
## Subject.MatterSchedules / Reservations                 1.812   0.0699 .  
## Subject.MatterStation /Bus Stop /Facility /Structure  16.900  < 2e-16 ***
## Subject.MatterTelephone / Website / Mobile Apps        6.808 9.91e-12 ***
## Subject.MatterTrains                                  11.564  < 2e-16 ***
## Subject.MatterTravel Disruption / Trip Problem        20.787  < 2e-16 ***
## Year                                                  -0.233   0.8160    
## Quarter                                                0.598   0.5501    
## Year:Quarter                                          -0.596   0.5511    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 91621  on 275669  degrees of freedom
## Residual deviance: 76124  on 275647  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 76170
## 
## Number of Fisher Scoring iterations: 16
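
With a Year * Quarter interaction, the effect of one more quarter depends on the year: it equals the Quarter coefficient plus the Year:Quarter coefficient times the year. The sketch below evaluates that expression for 2014 to 2018 using the printed Model 3 estimates. Those estimates are rounded to four significant digits, so the results are approximate, and none of the three terms is statistically significant.

```r
# Coefficients copied from the Model 3 summary (rounded as printed).
b_year         <- -4.808e-03
b_quarter      <-  8.264
b_year_quarter <- -4.088e-03

# Change in log-odds per additional quarter, evaluated at each year in the data.
years <- 2014:2018
quarter_slope <- b_quarter + b_year_quarter * years
names(quarter_slope) <- years
print(round(quarter_slope, 3))

# Change in log-odds per additional year, evaluated at each quarter.
quarters <- 1:4
year_slope <- b_year + b_year_quarter * quarters
names(year_slope) <- paste0("Q", quarters)
print(round(year_slope, 4))
```

Read this way, the large Quarter coefficient of 8.264 is an extrapolation to Year = 0 and is mostly cancelled by the interaction term within the observed years.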

Information Criteria

table1 <- htmlreg(list(m1, m2, m3), doctype= FALSE)

pander(table1)
Statistical models

                                                      Model 1     Model 2     Model 3
(Intercept)                                           3.10***     3.01***     12.65
                                                      (0.03)      (0.09)      (41.66)
AgencyMetro-North Railroad                            -0.18***    0.22***     0.22***
                                                      (0.05)      (0.05)      (0.05)
AgencyNYC Buses                                       -0.01       1.23***     1.23***
                                                      (0.03)      (0.04)      (0.04)
AgencySubways                                         0.38***     0.42***     0.43***
                                                      (0.04)      (0.04)      (0.04)
Subject.MatterCommendation                                        -20.80      -20.79
                                                                  (1097.25)   (1097.13)
Subject.MatterComplaint                                           14.33       14.25
                                                                  (3956.18)   (3956.18)
Subject.MatterConstruction / Capital Projects                     14.13       14.15
                                                                  (144.08)    (144.07)
Subject.MatterCustomer                                            14.04       14.05
                                                                  (53.78)     (53.78)
Subject.MatterEmployees                                           -1.67***    -1.66***
                                                                  (0.08)      (0.08)
Subject.MatterFerry Service - Hudson River                        14.33       14.33
                                                                  (1398.72)   (1398.55)
Subject.MatterMetroCard/Tickets/E-Zpass & Tolls                   1.21***     1.21***
                                                                  (0.19)      (0.19)
Subject.MatterMTA Agency Cars / Trucks                            0.45        0.46
                                                                  (0.51)      (0.51)
Subject.MatterPolicies, Rules & Regulations                       0.26*       0.27*
                                                                  (0.12)      (0.12)
Subject.MatterPublic Hearing                                      -1.80***    -1.81***
                                                                  (0.43)      (0.43)
Subject.MatterReasonable Modification                             13.97       13.98
                                                                  (643.30)    (643.18)
Subject.MatterSchedules / Reservations                            0.24        0.24
                                                                  (0.13)      (0.13)
Subject.MatterStation /Bus Stop /Facility /Structure              1.75***     1.76***
                                                                  (0.10)      (0.10)
Subject.MatterTelephone / Website / Mobile Apps                   0.96***     0.96***
                                                                  (0.14)      (0.14)
Subject.MatterTrains                                              1.23***     1.24***
                                                                  (0.11)      (0.11)
Subject.MatterTravel Disruption / Trip Problem                    2.76***     2.76***
                                                                  (0.13)      (0.13)
Year                                                                          -0.00
                                                                              (0.02)
Quarter                                                                       8.26
                                                                              (13.83)
Year:Quarter                                                                  -0.00
                                                                              (0.01)
AIC                                                   91266.63    76175.67    76170.16
BIC                                                   91308.73    76386.21    76412.28
Log Likelihood                                        -45629.31   -38067.84   -38062.08
Deviance                                              91258.63    76135.67    76124.16
Num. obs.                                             275670      275670      275670

*** p < 0.001, ** p < 0.01, * p < 0.05
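
The information criteria in the table can be reproduced by hand, which makes the penalty for extra parameters visible. For these binary-response models the deviance equals -2 times the log likelihood, so AIC = deviance + 2k and BIC = deviance + k log(n), where k is the number of estimated coefficients. The sketch below uses the deviances and observation count from the table; the parameter counts are read off the coefficient lists.

```r
# Deviances and number of observations copied from the table.
deviance_vals <- c(m1 = 91258.63, m2 = 76135.67, m3 = 76124.16)
n <- 275670

# Number of estimated coefficients in each model (counted from the summaries).
k <- c(m1 = 4, m2 = 20, m3 = 23)

# Information criteria: the same fit term, different penalties per parameter.
aic <- deviance_vals + 2 * k
bic <- deviance_vals + k * log(n)
print(round(rbind(AIC = aic, BIC = bic), 2))

# Model preferred by each criterion (lower is better).
print(c(AIC = names(which.min(aic)), BIC = names(which.min(bic))))
```

The heavier BIC penalty is why the two criteria can disagree about whether the three extra Year and Quarter terms are worth keeping.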

ANOVA

anova(m1, m2, m3, test = "Chisq")

Based on the Pander table and ANOVA (likelihood ratio test), Model 3 is the best-fitting model. Model 3 has the lowest AIC (76170.16) and the lowest residual deviance (76124.16), although its BIC (76412.28) is slightly higher than that of Model 2 (76386.21).
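
The likelihood ratio test behind anova(..., test = "Chisq") compares nested models by the drop in residual deviance against a chi-squared distribution, with degrees of freedom equal to the number of added parameters. The sketch below recomputes that test from the deviances and residual degrees of freedom reported earlier. It is a hand calculation under those copied values, not the printed anova() output.

```r
# Residual deviances (from the table) and residual degrees of freedom (from the summaries).
dev      <- c(m1 = 91258.63, m2 = 76135.67, m3 = 76124.16)
df_resid <- c(m1 = 275666,   m2 = 275650,   m3 = 275647)

# Drop in deviance and number of added parameters for each successive comparison.
lr_stat <- -diff(dev)
lr_df   <- -diff(df_resid)
names(lr_stat) <- names(lr_df) <- c("m1 vs m2", "m2 vs m3")

# Upper-tail chi-squared p-values for the two comparisons.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(data.frame(lr_stat = round(lr_stat, 2), df = lr_df, p_value = signif(p_value, 3)))
```

Adding Subject Matter accounts for nearly all of the improvement in fit; the Year and Quarter terms add a much smaller drop in deviance on 3 degrees of freedom.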

Plotting

visreg(m3,"Quarter", by = "Year", scale="response")

This shows that as the quarters advance within a year, the predicted probability of a complaint increases as well. It also shows that as the years advance, the predicted probability of a complaint decreases. This is in line with the interpretation for Model 3, although neither effect is statistically significant.

visreg(m3,"Year", by = "Agency", scale="response")

This shows that a complaint is most likely with NYC Buses, then Subways, then Metro-North Railroad, and finally Long Island Rail Road. As in the previous graph, the predicted probability of a complaint decreases from year to year.

Conclusion

In conclusion, the odds of a complaint by MTA customers are higher for NYC Buses and Subways than for Metro-North Railroad and Long Island Rail Road.

Furthermore, the odds of a complaint increase if the Subject Matter is Complaint, Construction / Capital Projects, Customer, Reasonable Modification, Travel Disruption / Trip Problem, and others (see Model 3), although the first four of these are not statistically significant. The odds decrease for Subject Matter Employees and Public Hearing.

Finally, the odds of a complaint decrease for each year from 2014 to 2018 and increase by Quarter from 1 to 4. The interaction between the two variables shows that the odds of a complaint decrease; however, none of these three terms is statistically significant.