The dataset required some minor modifications before the presentation below could be assembled.
Once those modifications are complete, the dataset looks like this:
## Status Monitor Total_Time Status2
## 1 Unconfirmed Vedder 51121 7004
## 2 Unconfirmed Vedder 51121 7004
## 3 Unconfirmed WhiteUS 42644 7004
## 4 Unconfirmed WhiteUS 42644 7004
## 5 Unconfirmed Vedder 58819 7004
## 6 Unconfirmed Vedder 58819 7004
## 7 Unconfirmed IRIC01 99160 7005
## 8 Unconfirmed Vedder 43968 7004
## 9 Unconfirmed Vedder 43968 7004
## 10 Confirmed Vedder 58109 7004
## Description Step.Number
## 1 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## 2 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## 3 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## 4 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## 5 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## 6 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## 7 Step 9 (Open JE_Test_001): Element 'Test Fios,' not found. 9
## 8 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## 9 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## 10 Step 11 (Save PDF): 'JE_Test_002' not found. 11
## Date Time Date_Time Day_of_Month Hour Day_of_Week
## 1 2019-04-30 19:02:00 4/30/2019 19:02 30 30 Tuesday
## 2 2019-04-30 19:02:00 4/30/2019 19:02 30 30 Tuesday
## 3 2019-04-30 19:07:00 4/30/2019 19:07 30 30 Tuesday
## 4 2019-04-30 19:07:00 4/30/2019 19:07 30 30 Tuesday
## 5 2019-04-30 19:11:00 4/30/2019 19:11 30 30 Tuesday
## 6 2019-04-30 19:11:00 4/30/2019 19:11 30 30 Tuesday
## 7 2019-04-30 19:17:00 4/30/2019 19:17 30 30 Tuesday
## 8 2019-04-30 19:26:00 4/30/2019 19:26 30 30 Tuesday
## 9 2019-04-30 19:26:00 4/30/2019 19:26 30 30 Tuesday
## 10 2019-04-30 19:28:00 4/30/2019 19:28 30 30 Tuesday
## Status Monitor Total_Time Status2
## Length:30459 Length:30459 Min. : 16 Min. :7001
## Class :character Class :character 1st Qu.: 8836 1st Qu.:7004
## Mode :character Mode :character Median : 44442 Median :7004
## Mean : 48389 Mean :7007
## 3rd Qu.: 83096 3rd Qu.:7009
## Max. :235047 Max. :7020
##
## Description Step.Number Date Time
## Length:30459 Min. : 1.000 Min. :2019-04-30 3:16:00: 44
## Class :character 1st Qu.: 5.000 1st Qu.:2019-05-07 4:08:00: 43
## Mode :character Median : 9.000 Median :2019-05-17 1:00:00: 40
## Mean : 7.564 Mean :2019-05-15 3:14:00: 39
## 3rd Qu.:10.000 3rd Qu.:2019-05-22 3:08:00: 38
## Max. :11.000 Max. :2019-05-30 1:25:00: 37
## (Other):30218
## Date_Time Day_of_Month Hour
## 5/12/2019 17:47: 15 Length:30459 Min. : 1.00
## 5/18/2019 12:05: 14 Class :character 1st Qu.: 8.00
## 5/18/2019 12:39: 14 Mode :character Median :17.00
## 5/18/2019 12:12: 13 Mean :15.19
## 5/18/2019 12:55: 13 3rd Qu.:22.00
## 5/5/2019 2:00 : 13 Max. :30.00
## (Other) :30377
## Day_of_Week
## Length:30459
## Class :character
## Mode :character
##
##
##
##
## [1] "The data in this report is from 2019-04-30 to 2019-05-30"
## [1] "This date range gives us 30459 of data and individual errors to analyze"
## [1] "Within this dataset, we have 37 unique environments"
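As a rough illustration of how the derived date columns and the three summary lines above could be produced, here is a minimal Python sketch. The report's own code is not shown, so this is not the author's method. It uses the `Monitor` and `Date_Time` values from the ten sample rows printed above; the names `SAMPLE` and `derive_fields` are hypothetical. Note that the `Hour` column in the printed sample reads 30 for timestamps at 19:xx, so the sketch takes the hour directly from `Date_Time` rather than from that column.

```python
from datetime import datetime

# (Monitor, Date_Time) pairs copied from the ten sample rows printed above.
SAMPLE = [
    ("Vedder", "4/30/2019 19:02"), ("Vedder", "4/30/2019 19:02"),
    ("WhiteUS", "4/30/2019 19:07"), ("WhiteUS", "4/30/2019 19:07"),
    ("Vedder", "4/30/2019 19:11"), ("Vedder", "4/30/2019 19:11"),
    ("IRIC01", "4/30/2019 19:17"), ("Vedder", "4/30/2019 19:26"),
    ("Vedder", "4/30/2019 19:26"), ("Vedder", "4/30/2019 19:28"),
]

def derive_fields(date_time):
    """Split an 'm/d/YYYY H:MM' string into the derived date columns."""
    stamp = datetime.strptime(date_time, "%m/%d/%Y %H:%M")
    return {
        "Date": stamp.date().isoformat(),
        "Day_of_Month": stamp.day,
        "Hour": stamp.hour,
        "Day_of_Week": stamp.strftime("%A"),  # English names in the default C locale
    }

rows = [dict(Monitor=monitor, **derive_fields(text)) for monitor, text in SAMPLE]
dates = [row["Date"] for row in rows]  # ISO dates sort correctly as strings

summary = {
    "first_date": min(dates),
    "last_date": max(dates),
    "row_count": len(rows),
    "unique_monitors": len({row["Monitor"] for row in rows}),
}
print(summary)
```

Run against the full dataset instead of the ten sample rows, the same three quantities correspond to the date range, row count, and environment count reported above.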
Further exploration follows in the specific sections below in an effort to add context.
We should take a deeper look into WhiteUS, and potentially Vedder, as their error counts are far above those of the rest of the enterprise.
Further investigation into Stoelle and WDBA vs. WhiteUS and Vedder is warranted to understand the differences between the instances (such as user counts, hardware, or configurations) and whether those differences correlate with the number of errors.
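One simple way to single out instances whose error counts stand far above the rest is to compare each count with the median count across instances. The sketch below is only an illustration under stated assumptions: the function name `flag_high_error_monitors` and the `ratio` threshold of 3 are hypothetical, and the input is just the `Monitor` column of the ten sample rows shown earlier, which is not representative of the full dataset.

```python
from collections import Counter
from statistics import median

# Monitor values from the ten sample rows printed earlier in the report.
SAMPLE_MONITORS = [
    "Vedder", "Vedder", "WhiteUS", "WhiteUS", "Vedder",
    "Vedder", "IRIC01", "Vedder", "Vedder", "Vedder",
]

def flag_high_error_monitors(monitors, ratio=3.0):
    """Count errors per monitor and flag those at or above ratio x the median count.

    `ratio` is an arbitrary illustrative threshold, not a value from the report.
    """
    counts = Counter(monitors)
    typical = median(counts.values())
    flagged = {name: n for name, n in counts.items() if n >= ratio * typical}
    return counts, typical, flagged

counts, typical, flagged = flag_high_error_monitors(SAMPLE_MONITORS)
print(counts.most_common())  # monitors ordered from most to fewest errors
print(typical, flagged)
```

On the sample rows only Vedder is flagged; the ranking on the full dataset will differ, since the sample covers less than half an hour of data.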
Please note that the time of day is recorded in 24-hour format and references Greenwich Mean Time (GMT).
Errors spike between 3:00 and 4:00 AM and then taper off until 9:00 AM.
A second lull starts at 5:00 PM, with the lowest number of errors at 8:00 PM, before errors rise again at 9:00 PM.
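Because the timestamps are in GMT, readers in other time zones may want to shift the hourly counts before comparing them with local activity. The following sketch tallies errors per hour and optionally applies a fixed offset. It treats GMT as UTC, which is an assumption, and the function name `errors_per_hour` and the example offset of -5 hours are hypothetical. The input is the `Date_Time` column of the ten sample rows shown earlier.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Date_Time values from the ten sample rows printed earlier in the report.
SAMPLE_TIMES = [
    "4/30/2019 19:02", "4/30/2019 19:02", "4/30/2019 19:07", "4/30/2019 19:07",
    "4/30/2019 19:11", "4/30/2019 19:11", "4/30/2019 19:17", "4/30/2019 19:26",
    "4/30/2019 19:26", "4/30/2019 19:28",
]

def errors_per_hour(date_times, utc_offset_hours=0):
    """Count errors by hour of day after shifting GMT timestamps by a fixed offset."""
    local = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter()
    for text in date_times:
        stamp = datetime.strptime(text, "%m/%d/%Y %H:%M").replace(tzinfo=timezone.utc)
        counts[stamp.astimezone(local).hour] += 1
    return counts

print(errors_per_hour(SAMPLE_TIMES))                       # hours as recorded (GMT)
print(errors_per_hour(SAMPLE_TIMES, utc_offset_hours=-5))  # hypothetical UTC-5 viewer
```

A fixed offset ignores daylight saving changes, so a named time zone would be the safer choice for data that spans a clock change.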
Saturday appears to be the day with the highest number of errors. There is a potential correlation between these errors and maintenance events.
There is a clear spike in errors on the 18th of the month and a slight ramp-up in errors at the beginning of the month.
The highest numbers of errors occur during Steps 5, 9, and 11.
The histogram provides context on the length of time that elapses before a site errors. The box plot allows us to identify environments whose range falls outside the norm, both positively and negatively.
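The `Step.Number` column appears to mirror the step named at the start of each `Description`. As a hedged illustration of how such a column could be derived and tallied, the sketch below pulls the step number out of the description text with a regular expression. The pattern and the names `STEP_PATTERN` and `step_number` are assumptions, and the input reproduces the ten sample rows shown earlier (nine Step 11 errors and one Step 9 error).

```python
import re
from collections import Counter

# The two distinct Description values seen in the ten sample rows.
SAVE_PDF = "Step 11 (Save PDF): 'JE_Test_002' not found."
OPEN_JE = "Step 9 (Open JE_Test_001): Element 'Test Fios,' not found."
SAMPLE_DESCRIPTIONS = [SAVE_PDF] * 6 + [OPEN_JE] + [SAVE_PDF] * 3

# Assumed format: every description starts with "Step <number>".
STEP_PATTERN = re.compile(r"^Step (\d+)\b")

def step_number(description):
    """Return the leading step number, or None if the text does not match."""
    match = STEP_PATTERN.match(description)
    return int(match.group(1)) if match else None

step_counts = Counter(step_number(text) for text in SAMPLE_DESCRIPTIONS)
print(step_counts.most_common())  # steps ordered from most to fewest errors
```

Returning `None` for unmatched descriptions keeps malformed rows visible in the tally instead of silently dropping them.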
## [1] "To understand the data below, the mean of the Total time in milliseconds is 48388.851176992 and the median of the same is 44442"
1. Vedder has a high concentration of outliers right at the 150,000 ms mark, with a very low box. Combined with the histogram, this suggests a strong correlation between the total time at which errors are most frequent and this site.
2. The concentration of outliers on RelateC is troubling. Further investigation follows below.
3. WhiteUS has a small box with outliers at the extreme upper and lower ends of the range. This could suggest the site is failing consistently at about the same time. Further investigation may be required.
4. Cohan is one of our lower error-generating sites; however, its median sits relatively high compared with other sites. This could suggest the site is slow to respond but does not error.
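The outliers discussed in the list above are points that a box plot draws beyond its whiskers. A common convention places the whiskers at 1.5 times the interquartile range (IQR) beyond the quartiles, although the plotting tool used for this report may compute the quartiles slightly differently. The sketch below applies that convention to the `Total_Time` values of the ten sample rows shown earlier; the function name `iqr_outliers` and the choice of the inclusive quantile method are assumptions.

```python
from statistics import quantiles

# Total_Time values (milliseconds) from the ten sample rows printed earlier.
SAMPLE_TOTAL_TIME = [51121, 51121, 42644, 42644, 58819,
                     58819, 99160, 43968, 43968, 58109]

def iqr_outliers(values, whisker=1.5):
    """Return (low_fence, high_fence, outliers) using the 1.5 x IQR convention."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low_fence = q1 - whisker * spread
    high_fence = q3 + whisker * spread
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return low_fence, high_fence, outliers

low_fence, high_fence, outliers = iqr_outliers(SAMPLE_TOTAL_TIME)
print(low_fence, high_fence, outliers)
```

Running the same function once per environment, rather than on the pooled values, would correspond to the per-site boxes described above.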
WhiteUS is magnitudes higher than the rest of the enterprise in terms of error counts. Breaking down the data gives us interesting insight into potential causes.
Unlike the global analysis of Step Number errors, WhiteUS has far more issues with Step 11 than with Steps 5 or 9. Additionally, the ranking of the highest-error steps (5, 9, and 11) is reversed from the global average.
The highest spikes in errors come between May 20th – 22nd. This is not in line with the global errors by day of the month, which show a spike on May 18th. The data does not give us any further details; however, the graph above does give context to the high errors in Step 11.
The outliers on RelateC seem to come from the same day, May 18th, as the high spike in errors for the month.
We have a lull in errors around 8:00 - 9:00 PM; this could potentially be a good time to push updates to the sites.
The analysis has shown that while the majority of sites fall within an acceptable range of each other, there are two instances with extremely high error counts, Vedder and WhiteUS. Further analysis of the two instances with low error counts, Stoelle and WDBA, as well as a comparison of Stoelle and WDBA vs. WhiteUS and Vedder, could provide needed insight into potential reasons for their respective error counts and how to improve performance.
There is a lull in errors around 8:00 PM, which may present a good time to push emergency patches if they are required. Additionally, if those patches could wait until the 5th through the 11th of the month, and preferably a Monday or Tuesday, they would potentially cause fewer disruptions.
The extreme concentration of errors in Steps 5, 9, and 11 is troubling and requires investigation into the possible causes. A full RCA should go into the cause of the spike in errors on the 18th. Additionally, an RCA could be requested for WhiteUS on the spikes in errors between May 20th – 22nd.