Introduction - Timeout Trend Analysis

This analysis is conducted to find the trend on esign timeout on multiple store on each day. It is obvious that each store has some percentage of timeout every day. However if the timeout percentage goes beyond a huge value, that means there is something going on for that store. In this analysis, we sampled the esign timeout stats for 3 days (i.e 1-Jan-2019, 31-Dec-2018, 24-Dec-2018) ,calculate the difference in percentage- timeout of two dates and extract the upper bound confidence range.

Load the data

‘df_1_Jan’ dataframe loaded with 1st Jan 2019 Esign Timeout stat.

‘df_31_Dec’ dataframe loaded with 31 Dec 2018 Esign Timeout stat.

‘df_24_Dec’ dataframe loaded with 24th Dec 2018 Esign Timeout stat.

df_1_Jan <- read.delim("C:/MachineLearning/esign_Label_report/esign_stat/1_Jan_2019_5_00.csv", sep = ",", header = T)

df_31_Dec <- read.delim("C:/MachineLearning/esign_Label_report/esign_stat/31_Dec_2018_5_00.csv", sep = ",", header = T)

df_24_Dec <- read.delim("C:/MachineLearning/esign_Label_report/esign_stat/24_Dec_2018_5_00.csv", sep = ",", header = T)

head(df_24_Dec)
##    Store   Database_Name Total_Tag_Count Success In.Progress Timeout
## 1 MCY001 AIMS_CORE_DB_03            2207    1139           0    1068
## 2 MCY002 AIMS_CORE_DB_11            1438    1235           0     203
## 3 MCY003 AIMS_CORE_DB_20            4387    3455           6     926
## 4 MCY004 AIMS_CORE_DB_12            1640    1457           1     182
## 5 MCY005 AIMS_CORE_DB_12            1521    1352           0     169
## 6 MCY006 AIMS_CORE_DB_13            1851    1495           0     356
##   Timeout_per
## 1       48.39
## 2       14.12
## 3       21.11
## 4       11.10
## 5       11.11
## 6       19.23

Compaire the stat between 1st Jan and 24th Dec timeout stat

Lets join the 1st and 24th Dec stat and determine the difference between the time out %

joinDF_1_Jan_24_Dec <- inner_join(df_1_Jan, df_24_Dec, by = c("Store")) %>% mutate(diff = Timeout_per.x -  Timeout_per.y ) %>% select(Store, diff,Database_Name.x, Total_Tag_Count.x, Total_Tag_Count.y, Timeout_per.x,Timeout_per.y )
## Warning: Column `Store` joining factors with different levels, coercing to
## character vector

Plot a histogram of difference of timeout %. Histogram shows a high timeout over a period of one week. The skewed right hand tail in histogram shows there are a significant # of store that are having high timeout.

hist(joinDF_1_Jan_24_Dec$diff, breaks = 100)

filter the stores that below the 95% confidence interval(i.e with zvalue > 1.96 ). These are the store having sudden upnormal dip in the timeout percentage.

library(knitr)
library(kableExtra)

highestTimeOut_part1 <- joinDF_1_Jan_24_Dec %>% filter(diff > 1.96)  %>% arrange(desc(diff))

nrow(highestTimeOut_part1)
## [1] 45

There are around 45 such stores. Lets print it and see the each stat

highestTimeOut_part1 %>%
  kable() %>%
  kable_styling()
Store diff Database_Name.x Total_Tag_Count.x Total_Tag_Count.y Timeout_per.x Timeout_per.y
MCY746 14.96 AIMS_CORE_DB_03 234 234 16.24 1.28
MCY777 10.97 AIMS_CORE_DB_17 319 319 12.54 1.57
MCY701 9.13 AIMS_CORE_DB_04 1752 1674 16.84 7.71
MCY798 8.94 AIMS_CORE_DB_01 302 302 26.16 17.22
MCY173 8.55 AIMS_CORE_DB_18 2827 2964 34.70 26.15
MCY075 7.76 AIMS_CORE_DB_05 2579 2579 32.65 24.89
MCY042 6.60 AIMS_CORE_DB_08 91 91 13.19 6.59
MCY564 6.10 AIMS_CORE_DB_13 377 377 6.90 0.80
MCY677 5.61 AIMS_CORE_DB_02 915 861 12.46 6.85
MCY325 5.58 AIMS_CORE_DB_11 233 233 5.58 0.00
MCY511 5.31 AIMS_CORE_DB_11 1890 2167 10.16 4.85
MCY730 5.10 AIMS_CORE_DB_06 2599 2982 19.08 13.98
MCY806 4.46 AIMS_CORE_DB_02 305 298 10.16 5.70
MCY344 4.45 AIMS_CORE_DB_12 1461 1461 13.62 9.17
MCY837 4.21 AIMS_CORE_DB_14 166 166 6.02 1.81
MCY560 4.12 AIMS_CORE_DB_16 2430 2475 13.37 9.25
MCY788 4.07 AIMS_CORE_DB_05 123 123 29.27 25.20
MCY033 4.03 AIMS_CORE_DB_18 1438 1438 41.72 37.69
MCY023 3.86 AIMS_CORE_DB_17 1716 1922 17.54 13.68
MCY007 3.53 AIMS_CORE_DB_14 1972 1938 22.16 18.63
MCY052 3.37 AIMS_CORE_DB_19 829 1047 19.42 16.05
MCY015 3.16 AIMS_CORE_DB_15 2587 2586 10.28 7.12
MCY772 3.09 AIMS_CORE_DB_08 2134 2134 18.46 15.37
MCY402 2.98 AIMS_CORE_DB_05 235 235 17.45 14.47
MCY690 2.88 AIMS_CORE_DB_03 277 277 4.69 1.81
MCY370 2.82 AIMS_CORE_DB_14 1771 1771 14.85 12.03
MCY006 2.81 AIMS_CORE_DB_13 1851 1851 22.04 19.23
MCY039 2.77 AIMS_CORE_DB_13 1446 1446 8.16 5.39
MCY501 2.73 AIMS_CORE_DB_02 1480 1238 10.00 7.27
MCY002 2.71 AIMS_CORE_DB_11 1438 1438 16.83 14.12
MCY567 2.63 AIMS_CORE_DB_14 228 228 3.07 0.44
MCY174 2.53 AIMS_CORE_DB_01 1991 1992 44.30 41.77
MCY558 2.51 AIMS_CORE_DB_04 1157 1157 5.88 3.37
MCY720 2.51 AIMS_CORE_DB_09 239 239 8.37 5.86
MCY457 2.46 AIMS_CORE_DB_16 324 324 5.86 3.40
MCY053 2.46 AIMS_CORE_DB_04 2863 2860 18.65 16.19
MCY648 2.29 AIMS_CORE_DB_13 217 212 3.23 0.94
MCY497 2.27 AIMS_CORE_DB_05 88 88 3.41 1.14
MCY220 2.27 AIMS_CORE_DB_12 398 398 23.12 20.85
MCY395 2.19 AIMS_CORE_DB_05 320 320 6.88 4.69
MCY716 2.10 AIMS_CORE_DB_06 2400 2375 8.96 6.86
MCY317 2.07 AIMS_CORE_DB_06 290 290 15.17 13.10
MCY045 2.02 AIMS_CORE_DB_02 1366 1365 38.80 36.78
MCY192 2.00 AIMS_CORE_DB_08 249 249 10.84 8.84
MCY172 1.99 AIMS_CORE_DB_05 151 151 10.60 8.61

Lets take top 10 stores having high diff

highestTimeOut_part1 %>%  top_n(10, diff)  %>% kable() %>% kable_styling()
Store diff Database_Name.x Total_Tag_Count.x Total_Tag_Count.y Timeout_per.x Timeout_per.y
MCY746 14.96 AIMS_CORE_DB_03 234 234 16.24 1.28
MCY777 10.97 AIMS_CORE_DB_17 319 319 12.54 1.57
MCY701 9.13 AIMS_CORE_DB_04 1752 1674 16.84 7.71
MCY798 8.94 AIMS_CORE_DB_01 302 302 26.16 17.22
MCY173 8.55 AIMS_CORE_DB_18 2827 2964 34.70 26.15
MCY075 7.76 AIMS_CORE_DB_05 2579 2579 32.65 24.89
MCY042 6.60 AIMS_CORE_DB_08 91 91 13.19 6.59
MCY564 6.10 AIMS_CORE_DB_13 377 377 6.90 0.80
MCY677 5.61 AIMS_CORE_DB_02 915 861 12.46 6.85
MCY325 5.58 AIMS_CORE_DB_11 233 233 5.58 0.00

Lets group it with “SolumDB/Server” and see if these stores are mapped to a particular solumn server

groupby_part1 <- highestTimeOut_part1 %>% group_by(Database_Name.x) %>% summarise(count=n()) %>% arrange(desc(count))

groupby_part1 %>%
  kable() %>%
  kable_styling()
Database_Name.x count
AIMS_CORE_DB_05 6
AIMS_CORE_DB_02 4
AIMS_CORE_DB_13 4
AIMS_CORE_DB_14 4
AIMS_CORE_DB_04 3
AIMS_CORE_DB_06 3
AIMS_CORE_DB_08 3
AIMS_CORE_DB_11 3
AIMS_CORE_DB_01 2
AIMS_CORE_DB_03 2
AIMS_CORE_DB_12 2
AIMS_CORE_DB_16 2
AIMS_CORE_DB_17 2
AIMS_CORE_DB_18 2
AIMS_CORE_DB_09 1
AIMS_CORE_DB_15 1
AIMS_CORE_DB_19 1

Stores with high % timeout is not mapped a particular solum server and dont exhibit some pattern. Time-out % is just random. Lets find the store mapped to the Solum server(AIMS_CORE_DB_05) which is having highest # of timeout.*

highestTimeOut_part1 %>% filter(Database_Name.x == "AIMS_CORE_DB_05")  %>%
  kable() %>%
  kable_styling()
Store diff Database_Name.x Total_Tag_Count.x Total_Tag_Count.y Timeout_per.x Timeout_per.y
MCY075 7.76 AIMS_CORE_DB_05 2579 2579 32.65 24.89
MCY788 4.07 AIMS_CORE_DB_05 123 123 29.27 25.20
MCY402 2.98 AIMS_CORE_DB_05 235 235 17.45 14.47
MCY497 2.27 AIMS_CORE_DB_05 88 88 3.41 1.14
MCY395 2.19 AIMS_CORE_DB_05 320 320 6.88 4.69
MCY172 1.99 AIMS_CORE_DB_05 151 151 10.60 8.61

Lets Compare the stats on 1st Jan and 31st Dec. That is to compare with previous day and determine the trend

joinDF_1_Jan_31_Dec <- inner_join(df_1_Jan, df_31_Dec, by = c("Store")) %>% mutate(diff = Timeout_per.x -  Timeout_per.y ) %>% select(Store, diff,Database_Name.x, Total_Tag_Count.x, Total_Tag_Count.y, Timeout_per.x,Timeout_per.y )

hist(joinDF_1_Jan_31_Dec$diff, breaks = 100)

Histogram shows a comparatively very less timeout % over a period of one day. The distribution is normal and the right hand tail is very thin and also shows overall timeout % has reduced.

highestTimeOut_part2 <- joinDF_1_Jan_31_Dec %>% filter(diff > 1.96)  %>% arrange(desc(diff))
nrow(highestTimeOut_part2)
## [1] 1
highestTimeOut_part2  %>%
  kable() %>%
  kable_styling()
Store diff Database_Name.x Total_Tag_Count.x Total_Tag_Count.y Timeout_per.x Timeout_per.y
MCY052 2.19 AIMS_CORE_DB_19 829 952 19.42 17.23

There is only one store has highest time-out %. The difference is just 2.19.

Conclusion

This way, we can identify stores having a sudden increase in the timeout % for a particular store over a period of time. This can be because of multiple reasons.