The analysis of the given datasets which are as following is done on R.
We have included the code within the documentation within the document to let the technical users leverage it.First we will be analyzing the Service Desk dataset.
library(readxl)
ServiceDeskModified <- read_excel("D:/UIC/3rd Semester/Capstone/datafrommwrd/ServiceDeskModified.xlsx")
Sdm<-ServiceDeskModified
Function to know which column has categorical values so that it can be converted to factor variables.Have to use nrow , length does not work for a tibble
differentValues<-data.frame()
for(i in 1:ncol(Sdm)){
differentValues[i,1]<-nrow(unique(Sdm[,i]))
}
colsFactor<-Sdm[!seq(from = 1,to = ncol(Sdm),by = 1) %in% which(differentValues>20)]
Sdm[,c(2:4,6,7,8)]<-lapply(colsFactor,FUN =factor)
Adding few columns after change in requirements
library(readxl)
ServiceDesk_Data <- read_excel("D:/UIC/MyResearch/Mwrd/mwrdAnalysis/ServiceDesk Data.xlsx",
col_types = c("numeric", "numeric", "text",
"text", "text", "text", "text", "text",
"text", "text", "date", "date", "text",
"text", "numeric", "numeric", "numeric",
"text", "text", "text", "text", "text",
"text", "text", "text", "text", "text",
"text", "text", "text", "text", "text"))
ServiceDesk_Data<-ServiceDesk_Data[,c(4,11,12,18)]
Sdm<-cbind(Sdm,ServiceDesk_Data)
Let’s take a look at the dataset and the variables after a bit of modification
ID | STATE | PRIORITY | CATEGORY | SUBCATEGORY | IS_SERVICE_REQUEST | SITE | DEPARTMENT | DESCRIPTION_NO_HTML | CREATED_AT | UPDATED_AT | CREATED_BY |
---|---|---|---|---|---|---|---|---|---|---|---|
18622955 | Assigned | Medium | IT Audio/Visual/Conference | In-District Video Conference Setup | TRUE | Stickney Water Reclamation Plant | Monitoring & Research | The IT Department is happy to assist with your upcoming event. Services offered are: -Audio/Video Conference Bridge -Recoding A/V Live streaming (Board Room Only) -Projector -Setup and Breakdown of Laptops -Printer -Onsite IT Support Assistance Needed Please share the Room, Date/Time and description of your event. | 2017-10-04 13:22:47 | 2017-10-04 14:05:43 | OKAL, JOHN |
18622953 | Assigned | None | Core-Applications | MWRD Website Requests | TRUE | MOB | General Administration | The ITD department will process website update requests submitted through the Service Desk. Please provide the following information to initiate the process. Name of QA Approver(s) - Please add as a CC to the request if they are in addition to yourself. Is this a Press Release ? Is this a New Page or URL ? URL path of the content to be updated Posting deadline or time details Who can see this content (Public, Internal, Both) New/Replacement content description Please include attachment files as needed using the “Attach Files” hyperlink | 2017-10-04 13:22:43 | 2017-10-04 13:23:21 | LARKIN, DIANE |
18622457 | New | Medium | ServiceDesk - Triage | NA | FALSE | Northside/Obrien Plant | Monitoring & Research | The chip key for the pool vehicle 1704 is not functioning. Gets an error message like this “key not in the file system” Raju Antony Environmental Specialist MWRDGC|Industrial Waste Division O’Brien Water Reclamation Plant 3500 Howard St., Skokie, Illinois 600076-4013 p: 847-568-8237 | f: 847-568-8240 raju.antony@mwrd.org | 2017-10-04 13:09:20 | 2017-10-04 13:09:20 | ANTONY, RAJU |
18622304 | Resolved | Medium | ServiceDesk - Triage | Re-Entered as a Service Request | FALSE | Stickney Water Reclamation Plant | Monitoring & Research | See Below. Marie Biron Adminstrative Specialist Environmental Monitoring and Research Division Monitoring and Research Department 6001 W Pershing Road Cicero, IL 60804 PH:708.588.4264 Email: marie.biron@mwrd.org From: Biron, Marie Sent: Wednesday, October 04, 2017 1:03 PM To: Help Desk (MOB) HelpD@mwrd.org Cc: Nwoko, Marcel NwokoM@mwrd.org; Weaver, John WeaverJ@mwrd.org Subject: Special M&R Seminar We have a special seminar taking place tomorrow in the following conference rooms: Egan Auditorium KWRP Admin Building Conference Room LASMA Auditorium NSWRP Admin Building Room 115 PCBX Conference Room (swrp) ALD Tech Services Conference Room (swrp) 7th Floor Corner Training Room MOBA The seminar is taking place from 8:45 am to 9:45 am. I will need assistance making sure the rooms are all connected. I have already notified the contacts for each room. Marie Biron Adminstrative Specialist Environmental Monitoring and Research Division Monitoring and Research Department 6001 W Pershing Road Cicero, IL 60804 PH:708.588.4264 Email: marie.biron@mwrd.org | 2017-10-04 13:05:15 | 2017-10-04 13:23:58 | BIRON, MARIE |
18622045 | Resolved | Medium | Security Modification | Network Account Password Reset | TRUE | Stickney Water Reclamation Plant | Maintenance & Operations | The IT Service desk can assist you with unlocking and/or resetting your Network, SAP, ReadSoft, Portal or voicemail password. Please provide your Employees ID, network login username and a contact telephone number where you can be reached. For security reasons, the service desk will need to contact you on your phone number and request an additional security details before modifying your account. | 2017-10-04 12:59:56 | 2017-10-04 13:02:33 | OKAL, JOHN |
How many tickets have been resolved
From the plot we see that majority of them are closed ? We don’t have the date embedded in the data source file else we would have calculated
What category are most of my tickets ?
From the plot we see that majority of them are closed ? We don’t have the date embedded in the data source file else we would have calculated
Which Plant are raising the most requests for tickets ?
From the plot we see that majority of them are closed ? We don’t have the date embedded in the data source file else we would have calculated
Which department is raising most of the tickets ?
From the plot we see that majority of them are closed
We want to see who has created most of the tickets
library(knitr)
Sdm$CREATED_BY<-as.factor(Sdm$CREATED_BY)
Most_Tickets_Created<-as.data.frame(head(sort(table(Sdm$CREATED_BY),decreasing = T),n=50))
kable(Most_Tickets_Created,caption = "Most_Tickets_Created")
Var1 | Freq |
---|---|
UKPONG, IMEH | 2116 |
OKAL, JOHN | 1899 |
JONES, RON | 837 |
WRIGHT, SEMAJ | 232 |
SMITH, CHRISTOPHER | 219 |
LARKIN, DIANE | 202 |
KODY, JOSEPH | 137 |
YOUNG, KEVIN | 104 |
REICHLING, JOHN | 95 |
RODGERS, GERALD | 85 |
JEFFERSON, DEJIAH | 82 |
NWOKO, MARCEL | 58 |
ROBERTS, LASHUN | 55 |
CHAPPELL, YVONNE | 53 |
MCENERY, THOMAS | 52 |
FOY, CASSANDRA | 48 |
MALONE, LARRY | 48 |
SANCHEZ, VICTOR | 41 |
JOHNSTON, SAM | 40 |
MIKA, KEVIN | 31 |
SIA, JOEL | 28 |
KELLY, SEAN | 26 |
DIZON, ARTURO | 25 |
KANE, PATRICK | 25 |
ECKLUND, SCOTT | 24 |
RAHMAN, ABDUR | 23 |
LOPATKA, MARK | 20 |
MCCAGUE, RAYMOND | 18 |
THOMPSON, RONALD | 17 |
KAZMIERCZAK, KIM | 16 |
MONDLOCK, ROGER | 16 |
KHALID, ARSHAD | 15 |
MUSCARELLO, THOMAS | 15 |
WOJKOVICH, RICHARD | 15 |
CARTER, HUGH | 13 |
DOMINA, JOSHUA | 12 |
EBERHART, ERIC | 11 |
JACKSON, CHENTILE | 11 |
WARREN, ETHAN | 11 |
COLVIN, CYNTHIA | 10 |
SMITH, ROGER | 10 |
CHENG, ERVIN | 9 |
SENESE, SERGIO | 9 |
BONNER, ROXANNE | 8 |
KONKAPAKA, SHARMA | 8 |
SCHRAMM, MICHAEL | 8 |
SONTA, BIRUTE | 8 |
LABAN, AHMAD | 7 |
PATLOGAN, ILA | 7 |
WRIGHT, SHARON | 7 |
Since our concerned departments are MOB and MOBA we are filtering data and then analyzing by department
library(dplyr)
g4<-ggplot(data=(Sdm %>% filter(SITE %in% c("MOBA","MOB"))),aes(x=SITE,Y=DEPARTMENT))+
geom_bar(aes(x=SITE,fill=DEPARTMENT))
plot(g4)
We see that most of the tickets raised are for Information Technology department in MOB and Engineering department in MOBA
Let’s us see the priority of tickets in these two departments
library(dplyr)
g4<-ggplot(data=(Sdm %>% filter(SITE %in% c("MOBA","MOB"))),aes(x=SITE,Y=PRIORITY))+
geom_bar(aes(x=SITE,fill=PRIORITY))
plot(g4)
In both the departments tickets generated are of priority medium
The approach is to to count the frequency of words that matter and remove those words out which are of common English and to remove those words which have zero information content ( from our perspective and client’s perspective )
Step 1 : Collapse all the descriptions into a paragraph Step 2 : Replace punctuations such as ,_:/ etc. Step 3 : Repace all the digits Step 3 : Replace single letter words Step 4 : Replace stopwords words such as a,and,of,the Step 5 : Delete additional words in a retrospective manner Step 6 : Make a textbag of words and after that create a frequency table
##### Text Mining
text<-paste(Sdm$DESCRIPTION_NO_HTML,collapse = "")
write(text,"description.txt")
#punctuation replacement
text2<-gsub(pattern = "\\W",replacement = " ",text)
# digits replacement
text2<-gsub(pattern="\\d",replacement = "",text2)
text2<-gsub(pattern = "\\b[A-z]\\b{1}",replacement = "",text2)
library(tm)
# words removed with no information
text2<-removeWords(text2,stopwords())
library(stringr)
textbagWithoutCleaning<-str_split(text2,pattern="\\s+")
textbagWithoutCleaning<-unlist(textbagWithoutCleaning)
write.csv(head(sort(table(textbagWithoutCleaning),decreasing = T),n=300),file = "withoutcleaning.csv")
text2<-removeWords(text2,words=c("mwrd","Please","org","User","can","will","The","Thanks","user","From","Thank","To","reached","get","MWRD","see","th","like","This","Can","able","If","We","For","know","use","It","Hi","need","needed","needs","please","Help","My","now","trying","used","help","also","via"))
textbag<-str_split(text2,pattern="\\s+")
textbag<-unlist(textbag)
write.csv(head(sort(table(textbag),decreasing = T),n=300),file = "withcleaning.csv")
We will be creating a wordcloud as well as the frequencies of the word
Creating a wordcloud
library(RColorBrewer)
library(wordcloud)
wordcloud(textbag,min.freq = 250,random.order = FALSE,colors =brewer.pal(8,"Dark2"),scale = c(4,0.5))
library(readr)
Change_Request_Form<- read_csv("D:/UIC/3rd Semester/Capstone/datafrommwrd/Change Request Form Modified.csv")
kable(Change_Request_Form[1:5,],caption = "Change Request Form")
Status Indicator | Create Date | Maintenance Date | Responsible IT Area | Impact Category | Purpose Group | Emergency / Time Senstive |
---|---|---|---|---|---|---|
NA | 10/5/2017 13:52 | 10/19/2017 | GIS | Minor | Preventive | NA |
NA | 10/5/2017 11:42 | 10/19/2017 | Network Infrastructure | Major | Enhancement | NA |
NA | 10/5/2017 11:38 | 10/18/2017 | Network Infrastructure | Major | Enhancement | NA |
NA | 9/29/2017 10:36 | 10/11/2017 | Desktop Engineering (Citrix, SCCM) | Minor | Enhancement | NA |
Follow-up (See Status Notes) | 9/27/2017 10:53 | 9/28/2017 | Desktop Engineering (Citrix, SCCM) | Minor | Fix / Patch | NA |
We see that our data frame has dates.But while importing from a CSV it is read as a character.To convert this column for meaningful purposes we will convert it regex in R.Now we will extract the time and add a column for time
library("dplyr")
Change_Request_Form<-mutate(.data = Change_Request_Form,CreatedTime=gsub(pattern ="[0-9]{1,}/[0-9]{1,}/[0-9]{1,}" ,replacement ="" ,x = Change_Request_Form$`Create Date`))
We will pull the dates now.
library("dplyr")
Change_Request_Form<-mutate(.data = Change_Request_Form,CreatedDate=gsub(pattern ="([0-9]{1,}/[0-9]{1,}/[0-9]{1,}).*" ,replacement ="\\1" ,x = Change_Request_Form$`Create Date`))
We need to delete the existing column for Date time and create proper formatting for Date and Time
Change_Request_Form$CreatedDate<-as.Date(x = Change_Request_Form$CreatedDate,format = "%m/%d/%Y")
Change_Request_Form$CreatedTime<-trimws(Change_Request_Form$CreatedTime,which="left")
library("lubridate")
Change_Request_Form$CreatedTime<-hm(Change_Request_Form$CreatedTime)
Change_Request_Form$`Maintenance Date`<-as.Date(x = Change_Request_Form$`Maintenance Date`,format = "%m/%d/%Y")
Change_Request_Form<-Change_Request_Form[,c(1,8,9,3,4,5,6,7)]
We will be creating a difference in dates which is the most important parameter. This will explain us in how many days the response team responded
Change_Request_Form<-mutate(.data = Change_Request_Form,DiffDays= difftime(time1 = Change_Request_Form$`Maintenance Date`,time2 = Change_Request_Form$CreatedDate,units = "days"))
We can get the descriptive statistics for DiffDays
summary(as.numeric(Change_Request_Form$DiffDays))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5.00 1.00 9.00 16.14 17.00 262.00
DiffDays<-as.data.frame(as.numeric(Change_Request_Form$DiffDays))
We can see that Min=-5 and Max=262.The mean is 16 days and Median is 9 days. We would like to know how many tickets are solved within a given time duration. We will be creating bins for different tickets
D<-data.frame(IntervalDays=c("(-7)-0","0-7","7-14","14-292"),MaintainFreq=c("13","92","59","70"))
View(D)
g5<-ggplot(data=D)+
geom_col(aes(x=IntervalDays,y=MaintainFreq))
plot(g5)
kable(D[1:4,],caption = "Maintainenance")
IntervalDays | MaintainFreq |
---|---|
(-7)-0 | 13 |
0-7 | 92 |
7-14 | 59 |
14-292 | 70 |
We see that most of the tickets are resolved in more than 11 days.The ideal time should be decided by the organization but we have decided a time of 0-3 days as ideal time. We have observed that some tickets have been resolved even before the Create Date
g6<-ggplot(data = Change_Request_Form)+
geom_bar(aes(x=Change_Request_Form$`Responsible IT Area`,fill=Change_Request_Form$`Purpose Group`))+
theme(axis.text.x =element_text(angle = 45, vjust = 0.5 , hjust = 0.5))
plot(g6)
library(readr)
Onm<- read_csv("D:/UIC/MyResearch/Mwrd/mwrdAnalysis/Outage Notification Modified.csv",
col_types = cols(`Create Date` = col_datetime(format = "%m/%d/%Y %H:%M"),
`Outage End Date` = col_date(format = "%m/%d/%Y"),
`Outage Start Date` = col_date(format = "%m/%d/%Y")))
A look at the dataset
kable(Onm[1:5,],caption = "Outage Notification Form")
Create Date | Status Indicator | Outage Start Date | Outage End Date | Description of Outage | IT Area |
---|---|---|---|---|---|
2017-10-03 11:06:00 | Follow-up | 2017-10-02 | 2017-10-03 | Automatic interpretation and transfer to Verify, automatic transfer to Readsoft Process Director not working. | ERP / SAP |
2017-10-03 10:47:00 | Follow-up | 2017-09-28 | 2017-09-28 | End users called when none of their newly scanned invoices were getting into SAP. Windows logs show this error: |
Faulting application name: eiitrp.exe, version: 5.8.16168.110, time stamp: 0x5762601c Faulting module name: MSVCR120.dll, version: 12.0.21005.1, time stamp: 0x524f7ce6 Exception code: 0xc0000409 Fault offset: 0x000a7666 Faulting process id: 0x8544 Faulting application start time: 0x01d33889d465efdb Faulting application path: E:Files (x86).exe Faulting module path: C:32120.dll Report Id: 12edd495-a47d-11e7-80d4-005056843318 Faulting package full name: Faulting package-relative application ID:
This error would repeat every time the transfer app would run.
Determined a large document in image import might have initially caused the problem; this DLL might have been removed or altered in memory to prevent further damage and was not able to be re-intialized in this session of windows. ERP / SAP
2017-10-03 09:32:00 Reviewed 2017-09-30 2017-10-02 After removing index.html and index.htm from the 2 portal servers; MWRDNEPP1 and MWRDNEPP2, the District’s website to the outside was up and down depending on which server the user was connected to. ERP / SAP
2017-10-02 10:46:00 Reviewed 2017-10-02 2017-10-02 Between 8:48am and 9:52am users could not use the Local Sewers application. (LSS) Multi-Team
2017-09-29 16:03:00 Reviewed 2017-09-29 2017-09-29 During the AT&T DDoS readiness test, public access to mwrd.org was blocked during mitigation of the simulated attack. Network team was told by the mitigation team the blocking of access to the IP address had been cleared, which was not true. Network Infrastructure
How long it takes to close the ticket ?
library("knitr")
library("dplyr")
Onm<-mutate(.data = Onm,OnmDiffDays=difftime(time1 = Onm$`Outage End Date`,time2 = Onm$`Outage Start Date`,units = "days"))
summary(as.numeric(Onm$OnmDiffDays))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s 0.0000 0.0000 0.0000 0.7651 1.0000 30.0000 2
setNames(as.data.frame(table(as.numeric(Onm$OnmDiffDays))),c("Outage Lasted","Frequency"))
Outage Lasted Frequency 1 0 121 2 1 25 3 2 7 4 3 3 5 4 4 6 5 2 7 6 1 8 7 1 9 10 1 10 30 1
Which IT area the tickets belong to ?
g7<-ggplot(data = Onm)+
geom_bar(aes(Onm$`IT Area`,fill=Onm$`IT Area`))+
theme(axis.text.x = element_blank())
plot(g7)
Note that the echo = FALSE
parameter was added to the code chunk to prevent printing of the R code that generated the plot.