Data Preparation & Description

The dataset consists of IT support tickets for a small liberal arts college in NYC. The tickets cover technical, computer, software, and access issues for students, faculty, and staff for the full year 2021 and for 2022 through Nov 4, 2022. This span includes one full academic year, Aug 2021 - June 2022 (fall and spring). Columns that identify the college, its students, and its employees have been removed.

#Libraries
library(tidyverse)
library(lubridate)
library(psych)
library(ggplot2)
library(scales)

#Load the data
it_support_tix <- read.csv("https://raw.githubusercontent.com/johnnydrodriguez/data606project/main/IT_Tickets_2022_2021.csv", na.strings=c("","NA"))

#Converts character date column into dates
it_support_tix$resolved_at <- mdy_hm(it_support_tix$resolved_at)
it_support_tix$opened_at <- mdy_hm(it_support_tix$opened_at)

#Calculates the ticket age (date resolved - date opened)
it_support_tix <- it_support_tix %>%
  mutate(age_at_resolution_days = round(difftime(resolved_at, opened_at, units = "days"), digits = 2))

#Converts age_at_resolution_days from difftime to numeric so summary statistics can be computed
it_support_tix$age_at_resolution_days <- as.numeric(it_support_tix$age_at_resolution_days)

glimpse(it_support_tix)
## Rows: 14,069
## Columns: 12
## $ number                 <chr> "INC0120526", "INC0120422", "INC0120775", "INC0…
## $ contact_type           <chr> "Self-service", "Email", "Phone", "Email", "Ema…
## $ u_wait_reason          <chr> NA, NA, NA, NA, NA, NA, "Waiting for Pickup", N…
## $ assignment_group       <chr> "Service Desk", "Service Desk", "Service Desk",…
## $ closed_at              <chr> "11/4/22 18:00", "11/4/22 18:00", NA, "11/4/22 …
## $ resolved_at            <dttm> 2022-11-01 18:00:00, 2022-11-01 18:00:00, NA, …
## $ u_subcategory          <chr> "Computer (Desktop/Laptop)", "ERP", "Desktop Ap…
## $ u_symptom              <chr> "How To/Question", "Configure/Modify", "Securit…
## $ opened_at              <dttm> 2022-10-21 17:03:00, 2022-10-18 10:52:00, 2022…
## $ sys_mod_count          <int> 11, 9, 5, 7, 14, 14, 14, 4, 1, 3, 4, 4, 2, 12, …
## $ reassignment_count     <int> 0, 2, 0, 1, 1, 2, 1, 1, 0, 0, 0, 0, 0, 1, 5, 0,…
## $ age_at_resolution_days <dbl> 11.04, 14.30, NA, 0.31, NA, NA, NA, 0.06, NA, 0…

Research question

Does the contact type (the method the user uses to initiate the support ticket) predict the ticket's age at resolution?

Why this matters to IT operations managers: IT support principles typically favor resolving an issue on first contact and minimizing the time to resolution. IT operations managers therefore try to funnel requests into the contact channels that let IT analysts resolve issues as quickly as possible.

Cases

Each case represents an IT support incident, i.e., a user has been affected by a technical issue that needs to be resolved. There are 14,069 cases and 12 variables.

Data collection

Each incident is system-generated when the IT support request is made through email or the self-service portal. When a user makes a support request by phone or walks into the support office, an IT analyst creates the ticket manually.
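The mapping from contact channel to creation mode described above could be made explicit as a derived column. The following is a minimal sketch on toy values; `creation_mode` and its labels are hypothetical names that do not exist in the exported data.

```r
# Toy contact types; in the real data this would be it_support_tix$contact_type
contact_type <- c("Email", "Phone", "Self-service", "Walk-in", NA)

# Hypothetical helper column (an assumption, not part of the dataset):
# email and self-service tickets are system-generated,
# phone and walk-in tickets are created manually by an analyst
creation_mode <- ifelse(
  is.na(contact_type),
  NA_character_,
  ifelse(contact_type %in% c("Email", "Self-service"),
         "system-generated",
         "analyst-created")
)

data.frame(contact_type, creation_mode)
```

Keeping missing contact types as `NA` avoids silently assigning them to either group.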

Type of study

This is an observational study.

Data Source

The data is exported from an IT support database which stores data on each support interaction.

Dependent Variable

The dependent variable is ticket age (in days) at resolution (date resolved - date opened). This value is numeric.
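As an illustration of this calculation, the sketch below recomputes the age for the first two tickets shown in the `glimpse()` output, using base R only. The third row is a made-up ticket added to show how a negative age (resolved before opened, as the minimum of -3 in the summary statistics suggests can happen) might be flagged. The `negative_age` flag is a hypothetical addition, not part of the original analysis.

```r
# Opened/resolved timestamps: rows 1-2 come from the glimpse() output,
# row 3 is a hypothetical ticket resolved before it was opened
opened_at <- as.POSIXct(c("10/21/22 17:03", "10/18/22 10:52", "11/2/22 9:00"),
                        format = "%m/%d/%y %H:%M", tz = "UTC")
resolved_at <- as.POSIXct(c("11/1/22 18:00", "11/1/22 18:00", "11/1/22 9:00"),
                          format = "%m/%d/%y %H:%M", tz = "UTC")

# Age at resolution in days, rounded to 2 decimals as in the main code
age_days <- round(as.numeric(difftime(resolved_at, opened_at, units = "days")), 2)

# Hypothetical data-quality flag: TRUE where the ticket age is negative
negative_age <- !is.na(age_days) & age_days < 0

data.frame(opened_at, resolved_at, age_days, negative_age)
```

Whether negative ages should be dropped, corrected, or kept is a judgment call that depends on how the timestamps were recorded.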

Independent Variable(s)

The independent variable is the contact type, i.e., one of the 4 methods a user can use to initiate a support request: email, phone, walk-in, or self-service.

Relevant summary statistics

The distribution of the age at resolution is heavily skewed to the right (mean 10.53 days vs. median 1.19 days, skew 5.98). Analyses that depend on normal distribution approximations may therefore not be appropriate.
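One possible way to compare groups without relying on normality is a rank-based test such as Kruskal-Wallis. The sketch below is not the analysis used in this project; it runs on simulated right-skewed data (the `toy` data frame and its parameters are assumptions chosen purely for illustration) and only shows the shape of the call.

```r
set.seed(606)

# Simulated, right-skewed ticket ages (log-normal); NOT the real ticket data
toy <- data.frame(
  contact_type = rep(c("Email", "Phone", "Self-service", "Walk-in"), each = 50),
  age_days = c(rlnorm(50, meanlog = 1),   rlnorm(50, meanlog = -1),
               rlnorm(50, meanlog = 1.2), rlnorm(50, meanlog = -1))
)

# Kruskal-Wallis rank sum test: do the age distributions differ by contact type?
kw <- kruskal.test(age_days ~ contact_type, data = toy)
kw

# A log(1 + x) transform is another common option for reducing right skew
toy$log_age <- log1p(toy$age_days)
tapply(toy$log_age, toy$contact_type, median)
```

On the real data the same call would take `age_at_resolution_days ~ contact_type` with `data = it_support_tix`; rows with missing values are dropped by default.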

# Summary stats of the age at resolution
describe(it_support_tix$age_at_resolution_days)
##    vars     n  mean    sd median trimmed  mad min    max  range skew kurtosis
## X1    1 13849 10.53 30.06   1.19     3.8 1.73  -3 438.21 441.21 5.98    45.47
##      se
## X1 0.26
# More Summary stats of the age at resolution
summary(it_support_tix$age_at_resolution_days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   -3.00    0.06    1.19   10.53    9.01  438.21     220
#Proportional table of support tickets by the contact type
prop.table(table(it_support_tix$contact_type, useNA='ifany')) * 100
## 
##        Email        Phone Self-service      Walk-in         <NA> 
##    59.734167    12.723008    22.105338     4.193617     1.243870
# Summary stats of age at resolution grouped by the contact type
describeBy(it_support_tix$age_at_resolution_days, 
           group = it_support_tix$contact_type, mat=TRUE)
##     item       group1 vars    n      mean       sd median  trimmed      mad
## X11    1        Email    1 8287 11.271945 31.41242   2.03 4.254765 2.965200
## X12    2        Phone    1 1775  5.052231 22.52696   0.04 1.204898 0.044478
## X13    3 Self-service    1 3038 13.044668 31.85594   2.91 5.162418 4.255062
## X14    4      Walk-in    1  576  5.262066 20.81400   0.04 1.097900 0.044478
##       min    max  range     skew kurtosis        se
## X11 -3.00 438.21 441.21 6.030520 46.40055 0.3450665
## X12 -0.03 328.51 328.54 8.992980 94.96848 0.5346915
## X13 -2.99 328.96 331.95 4.649317 26.39643 0.5779583
## X14 -0.08 257.91 257.99 7.706789 72.17335 0.8672502
# Distribution of tickets by age at resolution
ggplot(it_support_tix, aes(x=age_at_resolution_days)) + geom_histogram(binwidth = 20)
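Because most tickets resolve within a few days while a few take hundreds of days, a histogram with a bin width of 20 days puts nearly everything in the first bar. A variant that may be easier to read plots `log1p(age)` and facets by contact type. This is a sketch on simulated data (the `toy` data frame is an assumption, not the real tickets), with negative ages filtered out so the transform is defined.

```r
library(ggplot2)

set.seed(606)

# Simulated right-skewed ages for two contact types; NOT the real ticket data
toy <- data.frame(
  contact_type = rep(c("Email", "Phone"), each = 100),
  age_days = c(rlnorm(100, meanlog = 1), rlnorm(100, meanlog = -1))
)

# Keep only non-missing, non-negative ages before applying log1p()
toy <- toy[!is.na(toy$age_days) & toy$age_days >= 0, ]

# Histogram of log(1 + age), one panel per contact type
p <- ggplot(toy, aes(x = log1p(age_days))) +
  geom_histogram(bins = 30) +
  facet_wrap(~ contact_type) +
  labs(x = "log(1 + age at resolution in days)", y = "Tickets")
p
```

For the real data, `toy` would be replaced by `it_support_tix` and `age_days` by `age_at_resolution_days`.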