This analysis is conducted to find the trend on esign timeout on multiple store on each day. It is obvious that each store has some percentage of timeout every day. However if the timeout percentage goes beyond a huge value, that means there is something going on for that store. In this analysis, we sampled the esign timeout stats for 3 days (i.e 1-Jan-2019, 31-Dec-2018, 24-Dec-2018) ,calculate the difference in percentage- timeout of two dates and extract the upper bound confidence range.
Load the data
‘df_1_Jan’ dataframe loaded with 1st Jan 2019 Esign Timeout stat.
‘df_31_Dec’ dataframe loaded with 31 Dec 2018 Esign Timeout stat.
‘df_24_Dec’ dataframe loaded with 24th Dec 2018 Esign Timeout stat.
df_1_Jan <- read.delim("C:/MachineLearning/esign_Label_report/esign_stat/1_Jan_2019_5_00.csv", sep = ",", header = T)
df_31_Dec <- read.delim("C:/MachineLearning/esign_Label_report/esign_stat/31_Dec_2018_5_00.csv", sep = ",", header = T)
df_24_Dec <- read.delim("C:/MachineLearning/esign_Label_report/esign_stat/24_Dec_2018_5_00.csv", sep = ",", header = T)
head(df_24_Dec)
## Store Database_Name Total_Tag_Count Success In.Progress Timeout
## 1 MCY001 AIMS_CORE_DB_03 2207 1139 0 1068
## 2 MCY002 AIMS_CORE_DB_11 1438 1235 0 203
## 3 MCY003 AIMS_CORE_DB_20 4387 3455 6 926
## 4 MCY004 AIMS_CORE_DB_12 1640 1457 1 182
## 5 MCY005 AIMS_CORE_DB_12 1521 1352 0 169
## 6 MCY006 AIMS_CORE_DB_13 1851 1495 0 356
## Timeout_per
## 1 48.39
## 2 14.12
## 3 21.11
## 4 11.10
## 5 11.11
## 6 19.23
Compaire the stat between 1st Jan and 24th Dec timeout stat
Lets join the 1st and 24th Dec stat and determine the difference between the time out %
joinDF_1_Jan_24_Dec <- inner_join(df_1_Jan, df_24_Dec, by = c("Store")) %>% mutate(diff = Timeout_per.x - Timeout_per.y ) %>% select(Store, diff,Database_Name.x, Total_Tag_Count.x, Total_Tag_Count.y, Timeout_per.x,Timeout_per.y )
## Warning: Column `Store` joining factors with different levels, coercing to
## character vector
Plot a histogram of difference of timeout %. Histogram shows a high timeout over a period of one week. The skewed right hand tail in histogram shows there are a significant # of store that are having high timeout.
hist(joinDF_1_Jan_24_Dec$diff, breaks = 100)
filter the stores that below the 95% confidence interval(i.e with zvalue > 1.96 ). These are the store having sudden upnormal dip in the timeout percentage.
library(knitr)
library(kableExtra)
highestTimeOut_part1 <- joinDF_1_Jan_24_Dec %>% filter(diff > 1.96) %>% arrange(desc(diff))
nrow(highestTimeOut_part1)
## [1] 45
There are around 45 such stores. Lets print it and see the each stat
highestTimeOut_part1 %>%
kable() %>%
kable_styling()
| Store | diff | Database_Name.x | Total_Tag_Count.x | Total_Tag_Count.y | Timeout_per.x | Timeout_per.y |
|---|---|---|---|---|---|---|
| MCY746 | 14.96 | AIMS_CORE_DB_03 | 234 | 234 | 16.24 | 1.28 |
| MCY777 | 10.97 | AIMS_CORE_DB_17 | 319 | 319 | 12.54 | 1.57 |
| MCY701 | 9.13 | AIMS_CORE_DB_04 | 1752 | 1674 | 16.84 | 7.71 |
| MCY798 | 8.94 | AIMS_CORE_DB_01 | 302 | 302 | 26.16 | 17.22 |
| MCY173 | 8.55 | AIMS_CORE_DB_18 | 2827 | 2964 | 34.70 | 26.15 |
| MCY075 | 7.76 | AIMS_CORE_DB_05 | 2579 | 2579 | 32.65 | 24.89 |
| MCY042 | 6.60 | AIMS_CORE_DB_08 | 91 | 91 | 13.19 | 6.59 |
| MCY564 | 6.10 | AIMS_CORE_DB_13 | 377 | 377 | 6.90 | 0.80 |
| MCY677 | 5.61 | AIMS_CORE_DB_02 | 915 | 861 | 12.46 | 6.85 |
| MCY325 | 5.58 | AIMS_CORE_DB_11 | 233 | 233 | 5.58 | 0.00 |
| MCY511 | 5.31 | AIMS_CORE_DB_11 | 1890 | 2167 | 10.16 | 4.85 |
| MCY730 | 5.10 | AIMS_CORE_DB_06 | 2599 | 2982 | 19.08 | 13.98 |
| MCY806 | 4.46 | AIMS_CORE_DB_02 | 305 | 298 | 10.16 | 5.70 |
| MCY344 | 4.45 | AIMS_CORE_DB_12 | 1461 | 1461 | 13.62 | 9.17 |
| MCY837 | 4.21 | AIMS_CORE_DB_14 | 166 | 166 | 6.02 | 1.81 |
| MCY560 | 4.12 | AIMS_CORE_DB_16 | 2430 | 2475 | 13.37 | 9.25 |
| MCY788 | 4.07 | AIMS_CORE_DB_05 | 123 | 123 | 29.27 | 25.20 |
| MCY033 | 4.03 | AIMS_CORE_DB_18 | 1438 | 1438 | 41.72 | 37.69 |
| MCY023 | 3.86 | AIMS_CORE_DB_17 | 1716 | 1922 | 17.54 | 13.68 |
| MCY007 | 3.53 | AIMS_CORE_DB_14 | 1972 | 1938 | 22.16 | 18.63 |
| MCY052 | 3.37 | AIMS_CORE_DB_19 | 829 | 1047 | 19.42 | 16.05 |
| MCY015 | 3.16 | AIMS_CORE_DB_15 | 2587 | 2586 | 10.28 | 7.12 |
| MCY772 | 3.09 | AIMS_CORE_DB_08 | 2134 | 2134 | 18.46 | 15.37 |
| MCY402 | 2.98 | AIMS_CORE_DB_05 | 235 | 235 | 17.45 | 14.47 |
| MCY690 | 2.88 | AIMS_CORE_DB_03 | 277 | 277 | 4.69 | 1.81 |
| MCY370 | 2.82 | AIMS_CORE_DB_14 | 1771 | 1771 | 14.85 | 12.03 |
| MCY006 | 2.81 | AIMS_CORE_DB_13 | 1851 | 1851 | 22.04 | 19.23 |
| MCY039 | 2.77 | AIMS_CORE_DB_13 | 1446 | 1446 | 8.16 | 5.39 |
| MCY501 | 2.73 | AIMS_CORE_DB_02 | 1480 | 1238 | 10.00 | 7.27 |
| MCY002 | 2.71 | AIMS_CORE_DB_11 | 1438 | 1438 | 16.83 | 14.12 |
| MCY567 | 2.63 | AIMS_CORE_DB_14 | 228 | 228 | 3.07 | 0.44 |
| MCY174 | 2.53 | AIMS_CORE_DB_01 | 1991 | 1992 | 44.30 | 41.77 |
| MCY558 | 2.51 | AIMS_CORE_DB_04 | 1157 | 1157 | 5.88 | 3.37 |
| MCY720 | 2.51 | AIMS_CORE_DB_09 | 239 | 239 | 8.37 | 5.86 |
| MCY457 | 2.46 | AIMS_CORE_DB_16 | 324 | 324 | 5.86 | 3.40 |
| MCY053 | 2.46 | AIMS_CORE_DB_04 | 2863 | 2860 | 18.65 | 16.19 |
| MCY648 | 2.29 | AIMS_CORE_DB_13 | 217 | 212 | 3.23 | 0.94 |
| MCY497 | 2.27 | AIMS_CORE_DB_05 | 88 | 88 | 3.41 | 1.14 |
| MCY220 | 2.27 | AIMS_CORE_DB_12 | 398 | 398 | 23.12 | 20.85 |
| MCY395 | 2.19 | AIMS_CORE_DB_05 | 320 | 320 | 6.88 | 4.69 |
| MCY716 | 2.10 | AIMS_CORE_DB_06 | 2400 | 2375 | 8.96 | 6.86 |
| MCY317 | 2.07 | AIMS_CORE_DB_06 | 290 | 290 | 15.17 | 13.10 |
| MCY045 | 2.02 | AIMS_CORE_DB_02 | 1366 | 1365 | 38.80 | 36.78 |
| MCY192 | 2.00 | AIMS_CORE_DB_08 | 249 | 249 | 10.84 | 8.84 |
| MCY172 | 1.99 | AIMS_CORE_DB_05 | 151 | 151 | 10.60 | 8.61 |
Lets take top 10 stores having high diff
highestTimeOut_part1 %>% top_n(10, diff) %>% kable() %>% kable_styling()
| Store | diff | Database_Name.x | Total_Tag_Count.x | Total_Tag_Count.y | Timeout_per.x | Timeout_per.y |
|---|---|---|---|---|---|---|
| MCY746 | 14.96 | AIMS_CORE_DB_03 | 234 | 234 | 16.24 | 1.28 |
| MCY777 | 10.97 | AIMS_CORE_DB_17 | 319 | 319 | 12.54 | 1.57 |
| MCY701 | 9.13 | AIMS_CORE_DB_04 | 1752 | 1674 | 16.84 | 7.71 |
| MCY798 | 8.94 | AIMS_CORE_DB_01 | 302 | 302 | 26.16 | 17.22 |
| MCY173 | 8.55 | AIMS_CORE_DB_18 | 2827 | 2964 | 34.70 | 26.15 |
| MCY075 | 7.76 | AIMS_CORE_DB_05 | 2579 | 2579 | 32.65 | 24.89 |
| MCY042 | 6.60 | AIMS_CORE_DB_08 | 91 | 91 | 13.19 | 6.59 |
| MCY564 | 6.10 | AIMS_CORE_DB_13 | 377 | 377 | 6.90 | 0.80 |
| MCY677 | 5.61 | AIMS_CORE_DB_02 | 915 | 861 | 12.46 | 6.85 |
| MCY325 | 5.58 | AIMS_CORE_DB_11 | 233 | 233 | 5.58 | 0.00 |
Lets group it with “SolumDB/Server” and see if these stores are mapped to a particular solumn server
groupby_part1 <- highestTimeOut_part1 %>% group_by(Database_Name.x) %>% summarise(count=n()) %>% arrange(desc(count))
groupby_part1 %>%
kable() %>%
kable_styling()
| Database_Name.x | count |
|---|---|
| AIMS_CORE_DB_05 | 6 |
| AIMS_CORE_DB_02 | 4 |
| AIMS_CORE_DB_13 | 4 |
| AIMS_CORE_DB_14 | 4 |
| AIMS_CORE_DB_04 | 3 |
| AIMS_CORE_DB_06 | 3 |
| AIMS_CORE_DB_08 | 3 |
| AIMS_CORE_DB_11 | 3 |
| AIMS_CORE_DB_01 | 2 |
| AIMS_CORE_DB_03 | 2 |
| AIMS_CORE_DB_12 | 2 |
| AIMS_CORE_DB_16 | 2 |
| AIMS_CORE_DB_17 | 2 |
| AIMS_CORE_DB_18 | 2 |
| AIMS_CORE_DB_09 | 1 |
| AIMS_CORE_DB_15 | 1 |
| AIMS_CORE_DB_19 | 1 |
Stores with high % timeout is not mapped a particular solum server and dont exhibit some pattern. Time-out % is just random. Lets find the store mapped to the Solum server(AIMS_CORE_DB_05) which is having highest # of timeout.*
highestTimeOut_part1 %>% filter(Database_Name.x == "AIMS_CORE_DB_05") %>%
kable() %>%
kable_styling()
| Store | diff | Database_Name.x | Total_Tag_Count.x | Total_Tag_Count.y | Timeout_per.x | Timeout_per.y |
|---|---|---|---|---|---|---|
| MCY075 | 7.76 | AIMS_CORE_DB_05 | 2579 | 2579 | 32.65 | 24.89 |
| MCY788 | 4.07 | AIMS_CORE_DB_05 | 123 | 123 | 29.27 | 25.20 |
| MCY402 | 2.98 | AIMS_CORE_DB_05 | 235 | 235 | 17.45 | 14.47 |
| MCY497 | 2.27 | AIMS_CORE_DB_05 | 88 | 88 | 3.41 | 1.14 |
| MCY395 | 2.19 | AIMS_CORE_DB_05 | 320 | 320 | 6.88 | 4.69 |
| MCY172 | 1.99 | AIMS_CORE_DB_05 | 151 | 151 | 10.60 | 8.61 |
Lets Compare the stats on 1st Jan and 31st Dec. That is to compare with previous day and determine the trend
joinDF_1_Jan_31_Dec <- inner_join(df_1_Jan, df_31_Dec, by = c("Store")) %>% mutate(diff = Timeout_per.x - Timeout_per.y ) %>% select(Store, diff,Database_Name.x, Total_Tag_Count.x, Total_Tag_Count.y, Timeout_per.x,Timeout_per.y )
hist(joinDF_1_Jan_31_Dec$diff, breaks = 100)
Histogram shows a comparatively very less timeout % over a period of one day. The distribution is normal and the right hand tail is very thin and also shows overall timeout % has reduced.
highestTimeOut_part2 <- joinDF_1_Jan_31_Dec %>% filter(diff > 1.96) %>% arrange(desc(diff))
nrow(highestTimeOut_part2)
## [1] 1
highestTimeOut_part2 %>%
kable() %>%
kable_styling()
| Store | diff | Database_Name.x | Total_Tag_Count.x | Total_Tag_Count.y | Timeout_per.x | Timeout_per.y |
|---|---|---|---|---|---|---|
| MCY052 | 2.19 | AIMS_CORE_DB_19 | 829 | 952 | 19.42 | 17.23 |
There is only one store has highest time-out %. The difference is just 2.19.
Conclusion
This way, we can identify stores having a sudden increase in the timeout % for a particular store over a period of time. This can be because of multiple reasons.