Trigger Analysis
ID, Phone and User
Project of IT Fizo Team
1 Data Wrangling
1.1 Import Libraries
1.2 Import Data
1.3 Inspecting data
- A brief overview of the data set
## Rows: 4,612
## Columns: 96
## $ vnpostUserName <chr> "dungnt07", "59.0057", "anhpdt", "22.006…
## $ vnpostname <chr> "Nguyễn Thị Dung", "MAI THỊ XUÂN HƯỜNG",…
## $ vnpostprovincename <chr> "Bưu điện TP Hà Nội", "Bưu điện Tỉnh Bìn…
## $ vnpostdistrictname <chr> "BĐH Chương Mỹ", "BĐH Tuy Phước", "BĐH N…
## $ vnpostorganizationname <chr> "Chương Mỹ", "VHX Phước Sơn", "VHX Hiệp …
## $ Ekyc <chr> "Đã OTP thành công", "Đã OTP thành công"…
## $ userGroup <dbl> 2, 2, 2, 2, 2, 6, 2, 5, 5, 2, 2, 2, 2, 1…
## $ flowGroup <dbl> 1, 1, 1, 1, 1, 0, 1, 3, 3, 1, 1, 1, 1, 1…
## $ Status <chr> NA, NA, NA, NA, NA, "Lock", NA, NA, NA, …
## $ Total_app <dbl> 26, 18, 18, 18, 17, 14, 13, 13, 13, 13, …
## $ Total_app7 <dbl> 44, 29, 62, 31, 46, 51, 22, 67, 31, 47, …
## $ Total_app30 <dbl> 198, 67, 155, 31, 49, 51, 55, 262, 138, …
## $ Total_app_cancel <dbl> 5, 0, 1, 2, 6, 0, 5, 3, 3, 1, 0, 1, 2, 1…
## $ Total_app_cancel_7Day <dbl> 7, 3, 8, 7, 14, 0, 5, 17, 4, 4, 3, 1, 2,…
## $ Total_app_cancel_30Day <dbl> 32, 6, 25, 7, 15, 0, 15, 60, 37, 12, 15,…
## $ cancelRatio <dbl> 0.19230769, 0.00000000, 0.05555556, 0.11…
## $ cancelRatio7DDay <dbl> 0.15909091, 0.10344828, 0.12903226, 0.22…
## $ cancelRatio30Day <dbl> 0.16161616, 0.08955224, 0.16129032, 0.22…
## $ fpd30_his_TT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ fpd30_base_TT <dbl> 0, 0, 36, 0, 0, 0, 4, 0, 12, 0, 0, 0, 2,…
## $ fpd30_TT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Total_app_reject <dbl> 19, 11, 11, 14, 10, 11, 4, 7, 9, 10, 10,…
## $ Total_app_reject_7Day <dbl> 30, 18, 38, 21, 26, 47, 9, 38, 23, 32, 1…
## $ Total_app_reject_30Day <dbl> 133, 34, 93, 21, 27, 47, 26, 132, 69, 49…
## $ rejectRatio <dbl> 0.7307692, 0.6111111, 0.6111111, 0.77777…
## $ rejectRatio7day <dbl> 0.6818182, 0.6206897, 0.6129032, 0.67741…
## $ rejectRatio30Day <dbl> 0.6717172, 0.5074627, 0.6000000, 0.67741…
## $ Total_app_approve <dbl> 1, 1, 4, 0, 1, 2, 0, 0, 0, 1, 1, 0, 0, 0…
## $ Total_app_approve_7Day <dbl> 6, 2, 14, 1, 5, 3, 4, 9, 3, 10, 3, 2, 1,…
## $ Total_app_approve_30Day <dbl> 32, 21, 35, 1, 6, 3, 10, 67, 31, 18, 12,…
## $ approveRatio <dbl> 0.03846154, 0.05555556, 0.22222222, 0.00…
## $ approveRatio7day <dbl> 0.13636364, 0.06896552, 0.22580645, 0.03…
## $ approveRatio30Day <dbl> 0.16161616, 0.31343284, 0.22580645, 0.03…
## $ Total_app_disbursed <dbl> 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0…
## $ Total_app_disbursed_7Day <dbl> 3, 1, 12, 1, 3, 1, 4, 9, 3, 9, 3, 2, 1, …
## $ Total_app_disbursed_30Day <dbl> 29, 20, 33, 1, 4, 1, 10, 67, 31, 17, 12,…
## $ disbursedRatio <dbl> 0.00000000, 0.00000000, 0.11111111, 0.00…
## $ disbursedRatio7day <dbl> 0.06818182, 0.03448276, 0.19354839, 0.03…
## $ disbursedRatio30Day <dbl> 0.14646465, 0.29850746, 0.21290323, 0.03…
## $ outsideApp <dbl> 5, 0, 0, 0, 17, 1, 0, 0, 0, 0, 0, 2, 0, …
## $ outsideApp7Day <dbl> 6, 1, 1, 0, 46, 4, 0, 1, 0, 0, 0, 2, 0, …
## $ outsideApp30Day <dbl> 11, 1, 3, 0, 46, 4, 0, 3, 4, 0, 0, 15, 2…
## $ outside_ratio <dbl> 0.19230769, 0.00000000, 0.00000000, 0.00…
## $ outsideRatio7Day <dbl> 0.13636364, 0.03448276, 0.01612903, 0.00…
## $ outsideRatio30Day <dbl> 0.05555556, 0.01492537, 0.01935484, 0.00…
## $ outSidePro <dbl> 4, 0, 0, 0, 9, 1, 0, 0, 0, 0, 0, 2, 0, 1…
## $ outSidePro7Day <dbl> 5, 1, 1, 0, 16, 3, 0, 1, 0, 0, 0, 2, 0, …
## $ outSidePro30Day <dbl> 6, 1, 1, 0, 16, 3, 0, 3, 3, 0, 0, 8, 1, …
## $ upLoadRatio <dbl> 0.1153846, 1.0000000, 0.2777778, 0.11111…
## $ upLoadRatio7Day <dbl> 0.18181818, 0.65517241, 0.17204301, 0.06…
## $ upLoadRatio30Day <dbl> 0.60942761, 0.28358209, 0.39139785, 0.06…
## $ IDHitApp <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ IDHitApp7Day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0…
## $ IDHitApp30Day <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 1, 3, 1, 0, 0, 0…
## $ IDHitAppRatio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ IDHitAppRatio7Day <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00…
## $ IDHitAppRatio30Day <dbl> 0.000000000, 0.000000000, 0.000000000, 0…
## $ phoneHitApp <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ phoneHitApp7Day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
## $ phoneHitApp30Day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0…
## $ phoneHitAppRatio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ phoneHitAppRatio7Day <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00…
## $ phoneHitAppRatio30Day <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00…
## $ userHitApp <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ userHitApp7Day <dbl> 0, 0, 2, 1, 11, 2, 1, 2, 2, 5, 6, 1, 4, …
## $ userHitApp30Day <dbl> 9, 0, 3, 1, 12, 2, 3, 7, 13, 6, 10, 6, 4…
## $ userHitAppRatio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ userHitAppRatio7Day <dbl> 0.00000000, 0.00000000, 0.03225806, 0.03…
## $ userHitAppRatio30Day <dbl> 0.04545455, 0.00000000, 0.01935484, 0.03…
## $ Total_locked <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1…
## $ Total_device <dbl> 9, 2, 2, 3, 3, 2, 1, 9, 1, 3, 2, 8, 2, 2…
## $ oldReferenceTrigger <dbl> 0, 1, 0, 1, 0, 0, 2, 0, 1, 5, 5, 1, 1, 0…
## $ oldReferenceTrigger7Day <dbl> 0, 1, 3, 1, 1, 0, 6, 2, 13, 18, 11, 1, 2…
## $ oldReferenceTrigger30Day <dbl> 4, 7, 5, 1, 1, 0, 14, 28, 56, 19, 21, 3,…
## $ oldReferenceRatio <dbl> 0.00000000, 0.05555556, 0.00000000, 0.05…
## $ oldReferenceRatio7Day <dbl> 0.00000000, 0.03448276, 0.04838710, 0.03…
## $ oldReferenceRatio30Day <dbl> 0.02020202, 0.10447761, 0.03225806, 0.03…
## $ groupFpd30TT <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ groupHitOutside <chr> "Hit", "NoneHit", "NoneHit", "NoneHit", …
## $ groupHitOutside7Day <chr> "Hit", "Hit", "Hit", "NoneHit", "Hit", "…
## $ groupHitOutside30Day <chr> "Hit", "Hit", "Hit", "NoneHit", "Hit", "…
## $ groupHitUpLoadRatio <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "None…
## $ groupHitUpLoadRatio7Day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "Hit"…
## $ groupHitUpLoadRatio30Day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "Hit"…
## $ groupHitOldReferenceRatio <chr> "NoneHit", "Hit", "NoneHit", "Hit", "Non…
## $ groupHitOldReferenceRatio7Day <chr> "NoneHit", "Hit", "Hit", "Hit", "Hit", "…
## $ groupHitOldReference30Day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "None…
## $ groupHitIDApp <chr> "NoneHit", "NoneHit", "NoneHit", "NoneHi…
## $ groupHitIDApp7Day <chr> "NoneHit", "NoneHit", "NoneHit", "NoneHi…
## $ groupHitIDApp30Day <chr> "NoneHit", "NoneHit", "NoneHit", "NoneHi…
## $ groupHitPhoneApp <chr> "NoneHit", "NoneHit", "NoneHit", "NoneHi…
## $ groupHitPhoneApp7Day <chr> "NoneHit", "NoneHit", "NoneHit", "NoneHi…
## $ groupHitPhoneApp30Day <chr> "NoneHit", "NoneHit", "NoneHit", "NoneHi…
## $ groupHitUserApp <chr> "NoneHit", "NoneHit", "NoneHit", "NoneHi…
## $ groupHitUserApp7Day <chr> "NoneHit", "NoneHit", "Hit", "Hit", "Hit…
## $ groupHitUserApp30Day <chr> "Hit", "NoneHit", "Hit", "Hit", "Hit", "…
Name | TriggerOverview |
---|---|
Number of rows | 4612 |
Number of columns | 96 |
Column type frequency: | |
character | 26 |
numeric | 70 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
vnpostUserName | 0 | 1.00 | 4 | 15 | 0 | 4612 | 0 |
vnpostname | 0 | 1.00 | 4 | 28 | 0 | 4281 | 0 |
vnpostprovincename | 0 | 1.00 | 12 | 29 | 0 | 64 | 0 |
vnpostdistrictname | 0 | 1.00 | 6 | 30 | 0 | 612 | 0 |
vnpostorganizationname | 0 | 1.00 | 3 | 41 | 0 | 2844 | 0 |
Ekyc | 181 | 0.96 | 8 | 21 | 0 | 4 | 0 |
Status | 4060 | 0.12 | 4 | 4 | 0 | 1 | 0 |
groupFpd30TT | 0 | 1.00 | 1 | 10 | 0 | 5 | 0 |
groupHitOutside | 3676 | 0.20 | 3 | 7 | 0 | 2 | 0 |
groupHitOutside7Day | 2107 | 0.54 | 3 | 7 | 0 | 2 | 0 |
groupHitOutside30Day | 758 | 0.84 | 3 | 7 | 0 | 2 | 0 |
groupHitUpLoadRatio | 3676 | 0.20 | 3 | 7 | 0 | 2 | 0 |
groupHitUpLoadRatio7Day | 2107 | 0.54 | 3 | 7 | 0 | 2 | 0 |
groupHitUpLoadRatio30Day | 758 | 0.84 | 3 | 7 | 0 | 2 | 0 |
groupHitOldReferenceRatio | 3676 | 0.20 | 3 | 7 | 0 | 2 | 0 |
groupHitOldReferenceRatio7Day | 2107 | 0.54 | 3 | 7 | 0 | 2 | 0 |
groupHitOldReference30Day | 758 | 0.84 | 3 | 7 | 0 | 2 | 0 |
groupHitIDApp | 3676 | 0.20 | 7 | 7 | 0 | 1 | 0 |
groupHitIDApp7Day | 2107 | 0.54 | 3 | 7 | 0 | 2 | 0 |
groupHitIDApp30Day | 758 | 0.84 | 3 | 7 | 0 | 2 | 0 |
groupHitPhoneApp | 3676 | 0.20 | 7 | 7 | 0 | 1 | 0 |
groupHitPhoneApp7Day | 2107 | 0.54 | 3 | 7 | 0 | 2 | 0 |
groupHitPhoneApp30Day | 758 | 0.84 | 3 | 7 | 0 | 2 | 0 |
groupHitUserApp | 3676 | 0.20 | 7 | 7 | 0 | 1 | 0 |
groupHitUserApp7Day | 2107 | 0.54 | 3 | 7 | 0 | 2 | 0 |
groupHitUserApp30Day | 758 | 0.84 | 3 | 7 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
userGroup | 0 | 1.00 | 2.89 | 1.54 | 1 | 2.0 | 2.00 | 5.00 | 7 | ▇▁▁▂▁ |
flowGroup | 0 | 1.00 | 1.25 | 0.84 | 0 | 1.0 | 1.00 | 1.00 | 3 | ▁▇▁▁▂ |
Total_app | 0 | 1.00 | 0.47 | 1.41 | 0 | 0.0 | 0.00 | 0.00 | 26 | ▇▁▁▁▁ |
Total_app7 | 0 | 1.00 | 3.16 | 6.18 | 0 | 0.0 | 1.00 | 4.00 | 67 | ▇▁▁▁▁ |
Total_app30 | 0 | 1.00 | 10.60 | 20.20 | 0 | 1.0 | 4.00 | 12.00 | 351 | ▇▁▁▁▁ |
Total_app_cancel | 0 | 1.00 | 0.06 | 0.33 | 0 | 0.0 | 0.00 | 0.00 | 6 | ▇▁▁▁▁ |
Total_app_cancel_7Day | 0 | 1.00 | 0.54 | 1.36 | 0 | 0.0 | 0.00 | 0.00 | 20 | ▇▁▁▁▁ |
Total_app_cancel_30Day | 0 | 1.00 | 2.18 | 4.42 | 0 | 0.0 | 1.00 | 3.00 | 85 | ▇▁▁▁▁ |
cancelRatio | 3676 | 0.20 | 0.10 | 0.24 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
cancelRatio7DDay | 2107 | 0.54 | 0.15 | 0.23 | 0 | 0.0 | 0.00 | 0.25 | 1 | ▇▂▁▁▁ |
cancelRatio30Day | 758 | 0.84 | 0.21 | 0.24 | 0 | 0.0 | 0.16 | 0.33 | 1 | ▇▃▂▁▁ |
fpd30_his_TT | 0 | 1.00 | 0.12 | 1.18 | 0 | 0.0 | 0.00 | 0.00 | 54 | ▇▁▁▁▁ |
fpd30_base_TT | 0 | 1.00 | 2.39 | 7.62 | 0 | 0.0 | 0.00 | 2.00 | 206 | ▇▁▁▁▁ |
fpd30_TT | 0 | 1.00 | 0.01 | 0.09 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
Total_app_reject | 0 | 1.00 | 0.28 | 0.95 | 0 | 0.0 | 0.00 | 0.00 | 19 | ▇▁▁▁▁ |
Total_app_reject_7Day | 0 | 1.00 | 1.92 | 4.09 | 0 | 0.0 | 0.00 | 2.00 | 47 | ▇▁▁▁▁ |
Total_app_reject_30Day | 0 | 1.00 | 6.06 | 12.54 | 0 | 1.0 | 2.00 | 7.00 | 252 | ▇▁▁▁▁ |
rejectRatio | 3676 | 0.20 | 0.59 | 0.41 | 0 | 0.0 | 0.67 | 1.00 | 1 | ▅▁▃▂▇ |
rejectRatio7day | 2107 | 0.54 | 0.62 | 0.34 | 0 | 0.4 | 0.67 | 1.00 | 1 | ▃▂▅▅▇ |
rejectRatio30Day | 758 | 0.84 | 0.58 | 0.30 | 0 | 0.4 | 0.57 | 0.80 | 1 | ▃▃▇▆▆ |
Total_app_approve | 0 | 1.00 | 0.06 | 0.26 | 0 | 0.0 | 0.00 | 0.00 | 4 | ▇▁▁▁▁ |
Total_app_approve_7Day | 0 | 1.00 | 0.62 | 1.41 | 0 | 0.0 | 0.00 | 1.00 | 15 | ▇▁▁▁▁ |
Total_app_approve_30Day | 0 | 1.00 | 2.28 | 4.54 | 0 | 0.0 | 1.00 | 3.00 | 72 | ▇▁▁▁▁ |
approveRatio | 3676 | 0.20 | 0.14 | 0.30 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
approveRatio7day | 2107 | 0.54 | 0.20 | 0.27 | 0 | 0.0 | 0.10 | 0.33 | 1 | ▇▂▁▁▁ |
approveRatio30Day | 758 | 0.84 | 0.20 | 0.22 | 0 | 0.0 | 0.17 | 0.31 | 1 | ▇▅▁▁▁ |
Total_app_disbursed | 0 | 1.00 | 0.03 | 0.18 | 0 | 0.0 | 0.00 | 0.00 | 2 | ▇▁▁▁▁ |
Total_app_disbursed_7Day | 0 | 1.00 | 0.56 | 1.31 | 0 | 0.0 | 0.00 | 1.00 | 15 | ▇▁▁▁▁ |
Total_app_disbursed_30Day | 0 | 1.00 | 2.22 | 4.47 | 0 | 0.0 | 1.00 | 2.00 | 71 | ▇▁▁▁▁ |
disbursedRatio | 3676 | 0.20 | 0.08 | 0.23 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
disbursedRatio7day | 2107 | 0.54 | 0.18 | 0.25 | 0 | 0.0 | 0.00 | 0.27 | 1 | ▇▂▁▁▁ |
disbursedRatio30Day | 758 | 0.84 | 0.20 | 0.22 | 0 | 0.0 | 0.17 | 0.30 | 1 | ▇▃▁▁▁ |
outsideApp | 0 | 1.00 | 0.04 | 0.42 | 0 | 0.0 | 0.00 | 0.00 | 17 | ▇▁▁▁▁ |
outsideApp7Day | 0 | 1.00 | 0.29 | 1.73 | 0 | 0.0 | 0.00 | 0.00 | 46 | ▇▁▁▁▁ |
outsideApp30Day | 0 | 1.00 | 1.08 | 5.01 | 0 | 0.0 | 0.00 | 0.00 | 130 | ▇▁▁▁▁ |
outside_ratio | 3676 | 0.20 | 0.07 | 0.23 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
outsideRatio7Day | 2107 | 0.54 | 0.07 | 0.20 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
outsideRatio30Day | 758 | 0.84 | 0.08 | 0.20 | 0 | 0.0 | 0.00 | 0.03 | 1 | ▇▁▁▁▁ |
outSidePro | 0 | 1.00 | 0.03 | 0.25 | 0 | 0.0 | 0.00 | 0.00 | 9 | ▇▁▁▁▁ |
outSidePro7Day | 0 | 1.00 | 0.15 | 0.67 | 0 | 0.0 | 0.00 | 0.00 | 16 | ▇▁▁▁▁ |
outSidePro30Day | 0 | 1.00 | 0.49 | 1.52 | 0 | 0.0 | 0.00 | 0.00 | 39 | ▇▁▁▁▁ |
upLoadRatio | 3676 | 0.20 | 0.22 | 0.38 | 0 | 0.0 | 0.00 | 0.33 | 1 | ▇▁▁▁▂ |
upLoadRatio7Day | 2107 | 0.54 | 0.29 | 0.38 | 0 | 0.0 | 0.00 | 0.58 | 1 | ▇▁▁▁▂ |
upLoadRatio30Day | 758 | 0.84 | 0.39 | 0.39 | 0 | 0.0 | 0.32 | 0.75 | 1 | ▇▂▂▂▅ |
IDHitApp | 0 | 1.00 | 0.00 | 0.00 | 0 | 0.0 | 0.00 | 0.00 | 0 | ▁▁▇▁▁ |
IDHitApp7Day | 0 | 1.00 | 0.03 | 0.21 | 0 | 0.0 | 0.00 | 0.00 | 7 | ▇▁▁▁▁ |
IDHitApp30Day | 0 | 1.00 | 0.10 | 0.51 | 0 | 0.0 | 0.00 | 0.00 | 16 | ▇▁▁▁▁ |
IDHitAppRatio | 3676 | 0.20 | 0.00 | 0.00 | 0 | 0.0 | 0.00 | 0.00 | 0 | ▁▁▇▁▁ |
IDHitAppRatio7Day | 2107 | 0.54 | 0.01 | 0.06 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
IDHitAppRatio30Day | 758 | 0.84 | 0.01 | 0.05 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
phoneHitApp | 0 | 1.00 | 0.00 | 0.00 | 0 | 0.0 | 0.00 | 0.00 | 0 | ▁▁▇▁▁ |
phoneHitApp7Day | 0 | 1.00 | 0.01 | 0.13 | 0 | 0.0 | 0.00 | 0.00 | 3 | ▇▁▁▁▁ |
phoneHitApp30Day | 0 | 1.00 | 0.05 | 0.30 | 0 | 0.0 | 0.00 | 0.00 | 8 | ▇▁▁▁▁ |
phoneHitAppRatio | 3676 | 0.20 | 0.00 | 0.00 | 0 | 0.0 | 0.00 | 0.00 | 0 | ▁▁▇▁▁ |
phoneHitAppRatio7Day | 2107 | 0.54 | 0.00 | 0.05 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
phoneHitAppRatio30Day | 758 | 0.84 | 0.01 | 0.05 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
userHitApp | 0 | 1.00 | 0.00 | 0.00 | 0 | 0.0 | 0.00 | 0.00 | 0 | ▁▁▇▁▁ |
userHitApp7Day | 0 | 1.00 | 0.31 | 1.05 | 0 | 0.0 | 0.00 | 0.00 | 19 | ▇▁▁▁▁ |
userHitApp30Day | 0 | 1.00 | 1.03 | 2.51 | 0 | 0.0 | 0.00 | 1.00 | 42 | ▇▁▁▁▁ |
userHitAppRatio | 3676 | 0.20 | 0.00 | 0.00 | 0 | 0.0 | 0.00 | 0.00 | 0 | ▁▁▇▁▁ |
userHitAppRatio7Day | 2107 | 0.54 | 0.09 | 0.20 | 0 | 0.0 | 0.00 | 0.07 | 1 | ▇▁▁▁▁ |
userHitAppRatio30Day | 758 | 0.84 | 0.09 | 0.18 | 0 | 0.0 | 0.00 | 0.12 | 1 | ▇▁▁▁▁ |
Total_locked | 0 | 1.00 | 0.37 | 0.48 | 0 | 0.0 | 0.00 | 1.00 | 1 | ▇▁▁▁▅ |
Total_device | 0 | 1.00 | 1.46 | 0.93 | 0 | 1.0 | 1.00 | 2.00 | 13 | ▇▁▁▁▁ |
oldReferenceTrigger | 0 | 1.00 | 0.06 | 0.34 | 0 | 0.0 | 0.00 | 0.00 | 6 | ▇▁▁▁▁ |
oldReferenceTrigger7Day | 0 | 1.00 | 0.43 | 1.24 | 0 | 0.0 | 0.00 | 0.00 | 18 | ▇▁▁▁▁ |
oldReferenceTrigger30Day | 0 | 1.00 | 1.54 | 3.59 | 0 | 0.0 | 0.00 | 2.00 | 62 | ▇▁▁▁▁ |
oldReferenceRatio | 3676 | 0.20 | 0.14 | 0.31 | 0 | 0.0 | 0.00 | 0.00 | 1 | ▇▁▁▁▁ |
oldReferenceRatio7Day | 2107 | 0.54 | 0.14 | 0.25 | 0 | 0.0 | 0.00 | 0.18 | 1 | ▇▁▁▁▁ |
oldReferenceRatio30Day | 758 | 0.84 | 0.14 | 0.23 | 0 | 0.0 | 0.00 | 0.20 | 1 | ▇▂▁▁▁ |
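In the skim tables, `complete_rate` is simply 1 − `n_missing` / n. The same count also gives the `pct_miss` column that `naniar::miss_var_summary()` reports further down. The base R sketch below reproduces both on a toy data frame. The helper name `missing_profile()` is hypothetical and not part of skimr or naniar.

```r
# Hypothetical helper: per-column missing counts, mirroring skimr's
# n_missing / complete_rate and naniar's pct_miss
missing_profile <- function(df) {
  n_missing <- colSums(is.na(df))
  data.frame(
    skim_variable = names(df),
    n_missing     = unname(n_missing),
    complete_rate = unname(1 - n_missing / nrow(df)),
    pct_miss      = unname(100 * n_missing / nrow(df)),
    row.names     = NULL
  )
}

# Toy data: 4 rows, the second column has 3 missing values
toy <- data.frame(x = c(1, 2, 3, 4), y = c(NA, NA, NA, 0.5))
missing_profile(toy)

# The figures in the tables above: 3676 of 4612 rows missing
round(1 - 3676 / 4612, 2)    # complete_rate, 0.2
round(100 * 3676 / 4612, 1)  # pct_miss, 79.7
```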
1.4 Cleaning column names: Call the janitor
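The renamed columns in the next output suggest this step used `janitor::clean_names()`. For example, `vnpostUserName` becomes `vnpost_user_name` and `cancelRatio7DDay` becomes `cancel_ratio7d_day`. That function converts names to snake_case by default. Below is a minimal sketch on a toy data frame that reuses a few of the original column names; the values are made up.

```r
library(janitor)

# Toy data frame with a few of the original (camelCase / mixed) names
raw <- data.frame(
  vnpostUserName   = "dungnt07",
  Total_app7       = 44,
  cancelRatio7DDay = 0.159,
  IDHitApp         = 0
)

# Default case is "snake": lower case, words separated by underscores
clean <- clean_names(raw)
names(clean)
```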
1.5 Data types
R distinguishes the following data types (the abbreviation in angle brackets is how `glimpse()` labels each column):
* character / <chr>: Textual data, for example the text of a tweet.
* factor / <fct>: Categorical data with a finite number of categories with no particular order.
* ordered / <ord>: Categorical data with a finite number of categories with a particular order.
* double / <dbl>: Numerical data with decimal places.
* integer / <int>: Numerical data with whole numbers only (i.e. no decimals).
* logical / <lgl>: Logical data, which only consists of values TRUE and FALSE.
* date / <date>: Data which consists of dates, e.g. 2021-08-05.
* date-time / <dttm>: Data which consists of dates and times, e.g. 2021-08-05 16:29:25 BST.
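Here is one toy value of each type from the list above. `class()` and `typeof()` show how R labels them. The example also shows the practical difference between a plain factor and an ordered one: only ordered factors support `<` and `>` comparisons.

```r
x_chr  <- "the text of a tweet"
x_fct  <- factor(c("Hit", "NoneHit", "Hit"))
x_ord  <- factor(c("low", "high", "mid"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
x_dbl  <- 0.62
x_int  <- 26L
x_lgl  <- TRUE
x_date <- as.Date("2021-08-05")
x_dttm <- as.POSIXct("2021-08-05 16:29:25", tz = "Europe/London")

class(x_chr)    # "character"
class(x_fct)    # "factor"
class(x_ord)    # "ordered" "factor"
typeof(x_dbl)   # "double"
typeof(x_int)   # "integer"
class(x_lgl)    # "logical"
class(x_date)   # "Date"
class(x_dttm)   # "POSIXct" "POSIXt"

# Ordered factors compare by level order, not alphabetically
x_ord[1] < x_ord[3]   # TRUE: "low" < "mid"
```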
To convert several variables in one go, we can put all the conversions into a single function call:
## Rows: 4,612
## Columns: 96
## $ vnpost_user_name <chr> "dungnt07", "59.0057", "anhpdt", "22…
## $ vnpostname <chr> "Nguyễn Thị Dung", "MAI THỊ XUÂN HƯỜ…
## $ vnpostprovincename <fct> Bưu điện TP Hà Nội, Bưu điện Tỉnh Bì…
## $ vnpostdistrictname <fct> BĐH Chương Mỹ, BĐH Tuy Phước, BĐH Nh…
## $ vnpostorganizationname <fct> "Chương Mỹ", "VHX Phước Sơn", "VHX H…
## $ ekyc <chr> "Đã OTP thành công", "Đã OTP thành c…
## $ user_group <dbl> 2, 2, 2, 2, 2, 6, 2, 5, 5, 2, 2, 2, …
## $ flow_group <dbl> 1, 1, 1, 1, 1, 0, 1, 3, 3, 1, 1, 1, …
## $ status <chr> NA, NA, NA, NA, NA, "Lock", NA, NA, …
## $ total_app <dbl> 26, 18, 18, 18, 17, 14, 13, 13, 13, …
## $ total_app7 <dbl> 44, 29, 62, 31, 46, 51, 22, 67, 31, …
## $ total_app30 <dbl> 198, 67, 155, 31, 49, 51, 55, 262, 1…
## $ total_app_cancel <dbl> 5, 0, 1, 2, 6, 0, 5, 3, 3, 1, 0, 1, …
## $ total_app_cancel_7day <dbl> 7, 3, 8, 7, 14, 0, 5, 17, 4, 4, 3, 1…
## $ total_app_cancel_30day <dbl> 32, 6, 25, 7, 15, 0, 15, 60, 37, 12,…
## $ cancel_ratio <dbl> 0.19230769, 0.00000000, 0.05555556, …
## $ cancel_ratio7d_day <dbl> 0.15909091, 0.10344828, 0.12903226, …
## $ cancel_ratio30day <dbl> 0.16161616, 0.08955224, 0.16129032, …
## $ fpd30_his_tt <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ fpd30_base_tt <dbl> 0, 0, 36, 0, 0, 0, 4, 0, 12, 0, 0, 0…
## $ fpd30_tt <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ total_app_reject <dbl> 19, 11, 11, 14, 10, 11, 4, 7, 9, 10,…
## $ total_app_reject_7day <dbl> 30, 18, 38, 21, 26, 47, 9, 38, 23, 3…
## $ total_app_reject_30day <dbl> 133, 34, 93, 21, 27, 47, 26, 132, 69…
## $ reject_ratio <dbl> 0.7307692, 0.6111111, 0.6111111, 0.7…
## $ reject_ratio7day <dbl> 0.6818182, 0.6206897, 0.6129032, 0.6…
## $ reject_ratio30day <dbl> 0.6717172, 0.5074627, 0.6000000, 0.6…
## $ total_app_approve <dbl> 1, 1, 4, 0, 1, 2, 0, 0, 0, 1, 1, 0, …
## $ total_app_approve_7day <dbl> 6, 2, 14, 1, 5, 3, 4, 9, 3, 10, 3, 2…
## $ total_app_approve_30day <dbl> 32, 21, 35, 1, 6, 3, 10, 67, 31, 18,…
## $ approve_ratio <dbl> 0.03846154, 0.05555556, 0.22222222, …
## $ approve_ratio7day <dbl> 0.13636364, 0.06896552, 0.22580645, …
## $ approve_ratio30day <dbl> 0.16161616, 0.31343284, 0.22580645, …
## $ total_app_disbursed <dbl> 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 1, 0, …
## $ total_app_disbursed_7day <dbl> 3, 1, 12, 1, 3, 1, 4, 9, 3, 9, 3, 2,…
## $ total_app_disbursed_30day <dbl> 29, 20, 33, 1, 4, 1, 10, 67, 31, 17,…
## $ disbursed_ratio <dbl> 0.00000000, 0.00000000, 0.11111111, …
## $ disbursed_ratio7day <dbl> 0.06818182, 0.03448276, 0.19354839, …
## $ disbursed_ratio30day <dbl> 0.14646465, 0.29850746, 0.21290323, …
## $ outside_app <dbl> 5, 0, 0, 0, 17, 1, 0, 0, 0, 0, 0, 2,…
## $ outside_app7day <dbl> 6, 1, 1, 0, 46, 4, 0, 1, 0, 0, 0, 2,…
## $ outside_app30day <dbl> 11, 1, 3, 0, 46, 4, 0, 3, 4, 0, 0, 1…
## $ outside_ratio <dbl> 0.19230769, 0.00000000, 0.00000000, …
## $ outside_ratio7day <dbl> 0.13636364, 0.03448276, 0.01612903, …
## $ outside_ratio30day <dbl> 0.05555556, 0.01492537, 0.01935484, …
## $ out_side_pro <dbl> 4, 0, 0, 0, 9, 1, 0, 0, 0, 0, 0, 2, …
## $ out_side_pro7day <dbl> 5, 1, 1, 0, 16, 3, 0, 1, 0, 0, 0, 2,…
## $ out_side_pro30day <dbl> 6, 1, 1, 0, 16, 3, 0, 3, 3, 0, 0, 8,…
## $ up_load_ratio <dbl> 0.1153846, 1.0000000, 0.2777778, 0.1…
## $ up_load_ratio7day <dbl> 0.18181818, 0.65517241, 0.17204301, …
## $ up_load_ratio30day <dbl> 0.60942761, 0.28358209, 0.39139785, …
## $ id_hit_app <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ id_hit_app7day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, …
## $ id_hit_app30day <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 1, 3, 1, 0, …
## $ id_hit_app_ratio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ id_hit_app_ratio7day <dbl> 0.00000000, 0.00000000, 0.00000000, …
## $ id_hit_app_ratio30day <dbl> 0.000000000, 0.000000000, 0.00000000…
## $ phone_hit_app <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ phone_hit_app7day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ phone_hit_app30day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, …
## $ phone_hit_app_ratio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ phone_hit_app_ratio7day <dbl> 0.00000000, 0.00000000, 0.00000000, …
## $ phone_hit_app_ratio30day <dbl> 0.00000000, 0.00000000, 0.00000000, …
## $ user_hit_app <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ user_hit_app7day <dbl> 0, 0, 2, 1, 11, 2, 1, 2, 2, 5, 6, 1,…
## $ user_hit_app30day <dbl> 9, 0, 3, 1, 12, 2, 3, 7, 13, 6, 10, …
## $ user_hit_app_ratio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ user_hit_app_ratio7day <dbl> 0.00000000, 0.00000000, 0.03225806, …
## $ user_hit_app_ratio30day <dbl> 0.04545455, 0.00000000, 0.01935484, …
## $ total_locked <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, …
## $ total_device <dbl> 9, 2, 2, 3, 3, 2, 1, 9, 1, 3, 2, 8, …
## $ old_reference_trigger <dbl> 0, 1, 0, 1, 0, 0, 2, 0, 1, 5, 5, 1, …
## $ old_reference_trigger7day <dbl> 0, 1, 3, 1, 1, 0, 6, 2, 13, 18, 11, …
## $ old_reference_trigger30day <dbl> 4, 7, 5, 1, 1, 0, 14, 28, 56, 19, 21…
## $ old_reference_ratio <dbl> 0.00000000, 0.05555556, 0.00000000, …
## $ old_reference_ratio7day <dbl> 0.00000000, 0.03448276, 0.04838710, …
## $ old_reference_ratio30day <dbl> 0.02020202, 0.10447761, 0.03225806, …
## $ group_fpd30tt <chr> "0", "0", "0", "0", "0", "0", "0", "…
## $ group_hit_outside <fct> Hit, NoneHit, NoneHit, NoneHit, Hit,…
## $ group_hit_outside7day <fct> Hit, Hit, Hit, NoneHit, Hit, Hit, No…
## $ group_hit_outside30day <fct> Hit, Hit, Hit, NoneHit, Hit, Hit, No…
## $ group_hit_up_load_ratio <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "…
## $ group_hit_up_load_ratio7day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "…
## $ group_hit_up_load_ratio30day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "…
## $ group_hit_old_reference_ratio <chr> "NoneHit", "Hit", "NoneHit", "Hit", …
## $ group_hit_old_reference_ratio7day <chr> "NoneHit", "Hit", "Hit", "Hit", "Hit…
## $ group_hit_old_reference30day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "…
## $ group_hit_id_app <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_id_app7day <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_id_app30day <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_phone_app <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_phone_app7day <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_phone_app30day <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_user_app <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_user_app7day <chr> "NoneHit", "NoneHit", "Hit", "Hit", …
## $ group_hit_user_app30day <chr> "Hit", "NoneHit", "Hit", "Hit", "Hit…
## # A tibble: 4,612 × 97
## vnpost…¹ vnpos…² vnpos…³ vnpos…⁴ vnpos…⁵ ekyc user_…⁶ flow_…⁷ status total…⁸
## <chr> <chr> <fct> <fct> <fct> <chr> <chr> <dbl> <chr> <dbl>
## 1 dungnt07 Nguyễn… Bưu đi… BĐH Ch… Chương… Đã O… 2 1 <NA> 26
## 2 59.0057 MAI TH… Bưu đi… BĐH Tu… VHX Ph… Đã O… 2 1 <NA> 18
## 3 anhpdt PHAN D… Bưu đi… BĐH Nh… VHX Hi… Đã O… 2 1 <NA> 18
## 4 22.0067 Trịnh … Bưu đi… BĐH Qu… Quế Võ Đã O… 2 1 <NA> 18
## 5 43.0415 Đinh … Bưu đi… BĐTP N… Ninh B… Đã O… 2 1 <NA> 17
## 6 10.0912 NGỌC T… Bưu đi… BĐ Tru… Cầu Di… Đã O… 6 0 Lock 14
## 7 hant10 Nguyễn… Bưu đi… BĐH Ph… Phù Yên Đã O… 2 1 <NA> 13
## 8 41.0824 NGUYỄN… Bưu đi… BĐH Ki… Kiến X… Đã O… 5 3 <NA> 13
## 9 53.0426 Lê Thị… Bưu đi… BĐH A … VHX Ph… Đã O… 5 3 <NA> 13
## 10 83.0048 ĐẶNG T… Bưu đi… BĐH Bù… Bù Gia… Đã O… 2 1 <NA> 13
## # … with 4,602 more rows, 87 more variables: total_app7 <dbl>,
## # total_app30 <dbl>, total_app_cancel <dbl>, total_app_cancel_7day <dbl>,
## # total_app_cancel_30day <dbl>, cancel_ratio <dbl>, cancel_ratio7d_day <dbl>,
## # cancel_ratio30day <dbl>, fpd30_his_tt <dbl>, fpd30_base_tt <dbl>,
## # fpd30_tt <dbl>, total_app_reject <dbl>, total_app_reject_7day <dbl>,
## # total_app_reject_30day <dbl>, reject_ratio <dbl>, reject_ratio7day <dbl>,
## # reject_ratio30day <dbl>, total_app_approve <dbl>, …
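The output above shows that the location columns and the three `group_hit_outside*` columns were converted to factors. One way to do this in a single call is `dplyr::mutate()` with `across()`. The sketch below is not the original code. It uses a toy tibble with ASCII placeholder values and selects the `group_hit_outside*` columns by prefix with `starts_with()`.

```r
library(dplyr)

# Toy stand-in for TriggerOverview_clean (made-up ASCII values)
toy <- tibble(
  vnpostprovincename     = c("Ha Noi", "Binh Dinh"),
  vnpostdistrictname     = c("Chuong My", "Tuy Phuoc"),
  group_hit_outside      = c("Hit", NA),
  group_hit_outside7day  = c("Hit", "Hit"),
  group_hit_outside30day = c("Hit", "NoneHit"),
  total_app              = c(26, 18)
)

# Convert all selected columns to factors in one mutate() call
toy_clean <- toy %>%
  mutate(across(c(vnpostprovincename, vnpostdistrictname,
                  starts_with("group_hit_outside")),
                as.factor))

glimpse(toy_clean)
```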
1.6 Handling factors
Factors are R's way of classifying observations into a fixed set of categories. In terms of data wrangling, preparing them for analysis usually involves at least two steps:
* Recoding factors, and
* Reordering factor levels.
1.6.1 Recoding factors
TriggerOverview_clean %>%
  count(group_hit_outside, sort = TRUE)
## # A tibble: 3 × 2
## group_hit_outside n
## <fct> <int>
## 1 <NA> 3676
## 2 NoneHit 829
## 3 Hit 107
TriggerOverview_clean %>%
  count(group_hit_outside) %>%
  ggplot(aes(group_hit_outside, n)) +
  geom_col()
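The chunks above count and plot the levels of `group_hit_outside` but do not recode them yet. Below is a toy sketch of two common recoding steps:

* `forcats::fct_recode()` renames levels, with the new name on the left and the old name on the right.
* `forcats::fct_na_value_to_level()` turns `NA` into an explicit level, so the 3676 missing rows show up as a labelled bar.

The labels "Not hit" and "Missing" are hypothetical choices. `fct_na_value_to_level()` requires forcats 1.0.0 or later; older versions used `fct_explicit_na()`.

```r
library(forcats)

# Toy version of group_hit_outside, including a missing value
f <- factor(c("Hit", "NoneHit", NA, "NoneHit"))

# Rename a level: new_name = "old_name"
f_recoded <- fct_recode(f, "Not hit" = "NoneHit")

# Make NA an explicit level so it is kept in counts and plots
f_explicit <- fct_na_value_to_level(f_recoded, level = "Missing")

levels(f_explicit)
fct_count(f_explicit)
```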
1.6.2 Reordering factor levels
fct_unique(TriggerOverview_clean$user_group)
## factor(0)
## Levels:
TriggerOverview_clean %>%
  filter(str_length(vnpostprovincename) >= 15) %>%
  count(vnpostprovincename)
## # A tibble: 63 × 2
## vnpostprovincename n
## <fct> <int>
## 1 Bưu điện TP Hà Nội 129
## 2 Bưu điện Tỉnh Bình Định 141
## 3 Bưu điện Tỉnh Đồng Nai 112
## 4 Bưu điện Tỉnh Bắc Ninh 38
## 5 Bưu điện Tỉnh Ninh Bình 57
## 6 Bưu điện Tỉnh Sơn La 98
## 7 Bưu điện Tỉnh Thái Bình 66
## 8 Bưu điện Tỉnh Thừa Thiên Huế 77
## 9 Bưu điện Tỉnh Bình Phước 146
## 10 Bưu điện Tỉnh Quảng Ngãi 79
## # … with 53 more rows
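The counts above are still listed in the factor's default level order. Two forcats functions are commonly used to change that order for plots such as the bar chart in 1.6.1:

* `fct_infreq()` orders levels by frequency.
* `fct_reorder()` orders levels by a summary of another variable.

The sketch below uses ASCII province names. The counts in the second part are taken from three rows of the table above.

```r
library(forcats)

# Toy province factor: default levels are alphabetical
province <- factor(c("Ha Noi", "Binh Dinh", "Binh Dinh", "Son La",
                     "Binh Dinh", "Son La"))
levels(province)

# Most frequent level first: "Binh Dinh", "Son La", "Ha Noi"
levels(fct_infreq(province))

# Order levels by another variable (ascending n by default)
counts <- data.frame(province = factor(c("Ha Noi", "Binh Dinh", "Son La")),
                     n        = c(129, 141, 98))
counts$province <- fct_reorder(counts$province, counts$n)
levels(counts$province)   # "Son La", "Ha Noi", "Binh Dinh"
```

In a pipeline, the same idea would be `mutate(vnpostprovincename = fct_reorder(vnpostprovincename, n))` after `count()`, just before `ggplot()`.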
1.7 Dealing with missing data
1.7.1 Mapping missing data
library(naniar)
vis_miss(TriggerOverview_clean)
gg_miss_var(TriggerOverview_clean)
# Summarise the missing values in each variable
miss_var_summary(TriggerOverview_clean)
## # A tibble: 96 × 3
## variable n_miss pct_miss
## <chr> <int> <dbl>
## 1 status 4060 88.0
## 2 cancel_ratio 3676 79.7
## 3 reject_ratio 3676 79.7
## 4 approve_ratio 3676 79.7
## 5 disbursed_ratio 3676 79.7
## 6 outside_ratio 3676 79.7
## 7 up_load_ratio 3676 79.7
## 8 id_hit_app_ratio 3676 79.7
## 9 phone_hit_app_ratio 3676 79.7
## 10 user_hit_app_ratio 3676 79.7
## # … with 86 more rows
gg_miss_upset(TriggerOverview_clean)
1.7.2 Replacing or removing missing data
* Users active on 12-12
TriggerOverview_clean1 <- TriggerOverview_clean %>%
  drop_na(group_hit_outside)
vis_miss(TriggerOverview_clean1)
* Users active in the last 7 days
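# na.omit() ignores the column name and would drop rows with an NA in any column;
# drop_na() only checks the named column
TriggerOverview_clean7 <- TriggerOverview_clean %>%
  drop_na(group_hit_outside7day)
vis_miss(TriggerOverview_clean7)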
* Users active in the last 30 days
TriggerOverview_clean30 <- TriggerOverview_clean %>%
  drop_na(group_hit_outside30day)
vis_miss(TriggerOverview_clean30)
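`stats::na.omit()` has no column argument. Anything passed after the data frame is silently ignored, so `na.omit(df, some_column)` removes every row with an `NA` in any column. Here that would include `status`, which is 88% missing. `tidyr::drop_na()` restricts the check to the columns you name. The toy comparison below shows the difference.

```r
library(tidyr)

toy <- data.frame(
  group_hit_outside7day = c("Hit", NA, "NoneHit", "Hit"),
  status                = c(NA, NA, "Lock", NA)
)

# Extra argument is ignored: only the fully complete row 3 survives
nrow(na.omit(toy, group_hit_outside7day))   # 1

# Only rows with NA in group_hit_outside7day are dropped
nrow(drop_na(toy, group_hit_outside7day))   # 3
```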
1.8 Latent constructs and their reliability
- Compute the mean() of all related items (the same-day, 7-day and 30-day ratios of each hit type)
- Use rowwise(), because each row represents one participant (one user)
TriggerOverview_clean1 <- TriggerOverview_clean %>%
  rowwise() %>%
  mutate(
    user_ratio1 = mean(c(user_hit_app_ratio,
                         user_hit_app_ratio7day,
                         user_hit_app_ratio30day)),
    phone_ratio1 = mean(c(phone_hit_app_ratio,
                          phone_hit_app_ratio7day,
                          phone_hit_app_ratio30day)),
    id_ratio1 = mean(c(id_hit_app_ratio,
                       id_hit_app_ratio7day,
                       id_hit_app_ratio30day))
  )
glimpse(TriggerOverview_clean1)
## Rows: 4,612
## Columns: 99
## Rowwise:
## $ vnpost_user_name <chr> "dungnt07", "59.0057", "anhpdt", "22…
## $ vnpostname <chr> "Nguyễn Thị Dung", "MAI THỊ XUÂN HƯỜ…
## $ vnpostprovincename <fct> Bưu điện TP Hà Nội, Bưu điện Tỉnh Bì…
## $ vnpostdistrictname <fct> BĐH Chương Mỹ, BĐH Tuy Phước, BĐH Nh…
## $ vnpostorganizationname <fct> "Chương Mỹ", "VHX Phước Sơn", "VHX H…
## $ ekyc <chr> "Đã OTP thành công", "Đã OTP thành c…
## $ user_group <dbl> 2, 2, 2, 2, 2, 6, 2, 5, 5, 2, 2, 2, …
## $ flow_group <dbl> 1, 1, 1, 1, 1, 0, 1, 3, 3, 1, 1, 1, …
## $ status <chr> NA, NA, NA, NA, NA, "Lock", NA, NA, …
## $ total_app <dbl> 26, 18, 18, 18, 17, 14, 13, 13, 13, …
## $ total_app7 <dbl> 44, 29, 62, 31, 46, 51, 22, 67, 31, …
## $ total_app30 <dbl> 198, 67, 155, 31, 49, 51, 55, 262, 1…
## $ total_app_cancel <dbl> 5, 0, 1, 2, 6, 0, 5, 3, 3, 1, 0, 1, …
## $ total_app_cancel_7day <dbl> 7, 3, 8, 7, 14, 0, 5, 17, 4, 4, 3, 1…
## $ total_app_cancel_30day <dbl> 32, 6, 25, 7, 15, 0, 15, 60, 37, 12,…
## $ cancel_ratio <dbl> 0.19230769, 0.00000000, 0.05555556, …
## $ cancel_ratio7d_day <dbl> 0.15909091, 0.10344828, 0.12903226, …
## $ cancel_ratio30day <dbl> 0.16161616, 0.08955224, 0.16129032, …
## $ fpd30_his_tt <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ fpd30_base_tt <dbl> 0, 0, 36, 0, 0, 0, 4, 0, 12, 0, 0, 0…
## $ fpd30_tt <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ total_app_reject <dbl> 19, 11, 11, 14, 10, 11, 4, 7, 9, 10,…
## $ total_app_reject_7day <dbl> 30, 18, 38, 21, 26, 47, 9, 38, 23, 3…
## $ total_app_reject_30day <dbl> 133, 34, 93, 21, 27, 47, 26, 132, 69…
## $ reject_ratio <dbl> 0.7307692, 0.6111111, 0.6111111, 0.7…
## $ reject_ratio7day <dbl> 0.6818182, 0.6206897, 0.6129032, 0.6…
## $ reject_ratio30day <dbl> 0.6717172, 0.5074627, 0.6000000, 0.6…
## $ total_app_approve <dbl> 1, 1, 4, 0, 1, 2, 0, 0, 0, 1, 1, 0, …
## $ total_app_approve_7day <dbl> 6, 2, 14, 1, 5, 3, 4, 9, 3, 10, 3, 2…
## $ total_app_approve_30day <dbl> 32, 21, 35, 1, 6, 3, 10, 67, 31, 18,…
## $ approve_ratio <dbl> 0.03846154, 0.05555556, 0.22222222, …
## $ approve_ratio7day <dbl> 0.13636364, 0.06896552, 0.22580645, …
## $ approve_ratio30day <dbl> 0.16161616, 0.31343284, 0.22580645, …
## $ total_app_disbursed <dbl> 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 1, 0, …
## $ total_app_disbursed_7day <dbl> 3, 1, 12, 1, 3, 1, 4, 9, 3, 9, 3, 2,…
## $ total_app_disbursed_30day <dbl> 29, 20, 33, 1, 4, 1, 10, 67, 31, 17,…
## $ disbursed_ratio <dbl> 0.00000000, 0.00000000, 0.11111111, …
## $ disbursed_ratio7day <dbl> 0.06818182, 0.03448276, 0.19354839, …
## $ disbursed_ratio30day <dbl> 0.14646465, 0.29850746, 0.21290323, …
## $ outside_app <dbl> 5, 0, 0, 0, 17, 1, 0, 0, 0, 0, 0, 2,…
## $ outside_app7day <dbl> 6, 1, 1, 0, 46, 4, 0, 1, 0, 0, 0, 2,…
## $ outside_app30day <dbl> 11, 1, 3, 0, 46, 4, 0, 3, 4, 0, 0, 1…
## $ outside_ratio <dbl> 0.19230769, 0.00000000, 0.00000000, …
## $ outside_ratio7day <dbl> 0.13636364, 0.03448276, 0.01612903, …
## $ outside_ratio30day <dbl> 0.05555556, 0.01492537, 0.01935484, …
## $ out_side_pro <dbl> 4, 0, 0, 0, 9, 1, 0, 0, 0, 0, 0, 2, …
## $ out_side_pro7day <dbl> 5, 1, 1, 0, 16, 3, 0, 1, 0, 0, 0, 2,…
## $ out_side_pro30day <dbl> 6, 1, 1, 0, 16, 3, 0, 3, 3, 0, 0, 8,…
## $ up_load_ratio <dbl> 0.1153846, 1.0000000, 0.2777778, 0.1…
## $ up_load_ratio7day <dbl> 0.18181818, 0.65517241, 0.17204301, …
## $ up_load_ratio30day <dbl> 0.60942761, 0.28358209, 0.39139785, …
## $ id_hit_app <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ id_hit_app7day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, …
## $ id_hit_app30day <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 1, 3, 1, 0, …
## $ id_hit_app_ratio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ id_hit_app_ratio7day <dbl> 0.00000000, 0.00000000, 0.00000000, …
## $ id_hit_app_ratio30day <dbl> 0.000000000, 0.000000000, 0.00000000…
## $ phone_hit_app <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ phone_hit_app7day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ phone_hit_app30day <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, …
## $ phone_hit_app_ratio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ phone_hit_app_ratio7day <dbl> 0.00000000, 0.00000000, 0.00000000, …
## $ phone_hit_app_ratio30day <dbl> 0.00000000, 0.00000000, 0.00000000, …
## $ user_hit_app <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ user_hit_app7day <dbl> 0, 0, 2, 1, 11, 2, 1, 2, 2, 5, 6, 1,…
## $ user_hit_app30day <dbl> 9, 0, 3, 1, 12, 2, 3, 7, 13, 6, 10, …
## $ user_hit_app_ratio <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ user_hit_app_ratio7day <dbl> 0.00000000, 0.00000000, 0.03225806, …
## $ user_hit_app_ratio30day <dbl> 0.04545455, 0.00000000, 0.01935484, …
## $ total_locked <dbl> 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, …
## $ total_device <dbl> 9, 2, 2, 3, 3, 2, 1, 9, 1, 3, 2, 8, …
## $ old_reference_trigger <dbl> 0, 1, 0, 1, 0, 0, 2, 0, 1, 5, 5, 1, …
## $ old_reference_trigger7day <dbl> 0, 1, 3, 1, 1, 0, 6, 2, 13, 18, 11, …
## $ old_reference_trigger30day <dbl> 4, 7, 5, 1, 1, 0, 14, 28, 56, 19, 21…
## $ old_reference_ratio <dbl> 0.00000000, 0.05555556, 0.00000000, …
## $ old_reference_ratio7day <dbl> 0.00000000, 0.03448276, 0.04838710, …
## $ old_reference_ratio30day <dbl> 0.02020202, 0.10447761, 0.03225806, …
## $ group_fpd30tt <chr> "0", "0", "0", "0", "0", "0", "0", "…
## $ group_hit_outside <fct> Hit, NoneHit, NoneHit, NoneHit, Hit,…
## $ group_hit_outside7day <fct> Hit, Hit, Hit, NoneHit, Hit, Hit, No…
## $ group_hit_outside30day <fct> Hit, Hit, Hit, NoneHit, Hit, Hit, No…
## $ group_hit_up_load_ratio <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "…
## $ group_hit_up_load_ratio7day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "…
## $ group_hit_up_load_ratio30day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "…
## $ group_hit_old_reference_ratio <chr> "NoneHit", "Hit", "NoneHit", "Hit", …
## $ group_hit_old_reference_ratio7day <chr> "NoneHit", "Hit", "Hit", "Hit", "Hit…
## $ group_hit_old_reference30day <chr> "Hit", "Hit", "Hit", "Hit", "Hit", "…
## $ group_hit_id_app <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_id_app7day <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_id_app30day <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_phone_app <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_phone_app7day <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_phone_app30day <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_user_app <chr> "NoneHit", "NoneHit", "NoneHit", "No…
## $ group_hit_user_app7day <chr> "NoneHit", "NoneHit", "Hit", "Hit", …
## $ group_hit_user_app30day <chr> "Hit", "NoneHit", "Hit", "Hit", "Hit…
## $ user_ratio1 <dbl> 0.01515152, 0.00000000, 0.01720430, …
## $ phone_ratio1 <dbl> 0.00000000, 0.00000000, 0.00000000, …
## $ id_ratio1 <dbl> 0.000000000, 0.000000000, 0.00000000…
TriggerOverview_clean %>%
  select(user_hit_app_ratio, user_hit_app_ratio7day, user_hit_app_ratio30day) %>%
  psych::alpha()
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.76 0.76 0.62 0.62 3.3 0.0069 0.062 0.13 0.62
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.75 0.76 0.78
## Duhachek 0.75 0.76 0.78
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se
## user_hit_app_ratio7day 0.69 0.62 0.38 0.62 1.6 NA
## user_hit_app_ratio30day 0.56 0.62 0.38 0.62 1.6 NA
## var.r med.r
## user_hit_app_ratio7day 0 0.62
## user_hit_app_ratio30day 0 0.62
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## user_hit_app_ratio7day 2505 0.95 0.9 0.71 0.62 0.089 0.20
## user_hit_app_ratio30day 3854 0.95 0.9 0.71 0.62 0.095 0.18
TriggerOverview_clean %>%
  select(id_hit_app_ratio, id_hit_app_ratio7day, id_hit_app_ratio30day) %>%
  psych::alpha()
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.66 0.67 0.5 0.5 2 0.0098 0.0054 0.035 0.5
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.64 0.66 0.68
## Duhachek 0.64 0.66 0.68
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## id_hit_app_ratio7day 0.60 0.5 0.25 0.5 1 NA 0
## id_hit_app_ratio30day 0.42 0.5 0.25 0.5 1 NA 0
## med.r
## id_hit_app_ratio7day 0.5
## id_hit_app_ratio30day 0.5
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## id_hit_app_ratio7day 2505 0.93 0.87 0.61 0.5 0.0074 0.06
## id_hit_app_ratio30day 3854 0.93 0.87 0.61 0.5 0.0086 0.05
TriggerOverview_clean %>%
  select(phone_hit_app_ratio, phone_hit_app_ratio7day, phone_hit_app_ratio30day) %>%
  psych::alpha()
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.68 0.68 0.52 0.52 2.1 0.0094 0.0033 0.031 0.52
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.66 0.68 0.7
## Duhachek 0.66 0.68 0.7
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se
## phone_hit_app_ratio7day 0.51 0.52 0.27 0.52 1.1 NA
## phone_hit_app_ratio30day 0.52 0.52 0.27 0.52 1.1 NA
## var.r med.r
## phone_hit_app_ratio7day 0 0.52
## phone_hit_app_ratio30day 0 0.52
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## phone_hit_app_ratio7day 2505 0.94 0.87 0.63 0.52 0.0042 0.045
## phone_hit_app_ratio30day 3854 0.96 0.87 0.63 0.52 0.0051 0.046
1.8.1 Confirmatory factor analysis
Confirmatory factor analysis (CFA) is used to confirm whether a set of items truly reflects a latent variable that we defined ex ante.
#1: Define the model, which explains how items relate to latent variables
model <- 'latent_today =~ total_app + id_hit_app30day + phone_hit_app30day + user_hit_app30day
          latent_7day  =~ total_app7 + id_hit_app30day + phone_hit_app30day + user_hit_app30day
          latent_30day =~ total_app30 + id_hit_app30day + phone_hit_app30day + user_hit_app30day'
#2: Run the CFA to see how well this model fits our data
fit <- cfa(model, data = TriggerOverview_clean)
#3a: Extract the performance indicators
fit_indices <- fitmeasures(fit)
#3b: We tidy the results with enframe() and
# pick only those indices we are most interested in
fit_indices %>%
  enframe() %>%
  filter(name == "cfi" |
         name == "srmr" |
         name == "rmsea") %>%
  mutate(value = round(value, 3)) # Round to 3 decimal places
## # A tibble: 3 × 2
## name value
## <chr> <lvn.vctr>
## 1 cfi 1.000
## 2 rmsea 0.000
## 3 srmr 0.002
enframe(fit_indices)
## # A tibble: 42 × 2
## name value
## <chr> <lvn.vctr>
## 1 npar 2.100000e+01
## 2 fmin 4.025079e-05
## 3 chisq 3.712732e-01
## 4 df 0.000000e+00
## 5 pvalue NA
## 6 baseline.chisq 8.037557e+03
## 7 baseline.df 1.500000e+01
## 8 baseline.pvalue 0.000000e+00
## 9 cfi 9.999537e-01
## 10 tli 1.000000e+00
## # … with 32 more rows
The column names "name" and "value" were generated when we called enframe(). I often work through chains of analytical steps iteratively to see what each intermediate step produces, which also makes it easier to spot mistakes early on. Therefore, I recommend building up your dplyr chains of function calls slowly, especially if you have only just started learning R and data analysis.
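The stepwise workflow described above can be sketched on R's built-in mtcars data. This data set is used only for illustration and is unrelated to the trigger data. Each dplyr verb is added only after the previous intermediate result has been inspected. The object names step1, step2 and final are hypothetical.

```r
library(dplyr)

# Step 1: run only the first verb and inspect what it returns
step1 <- mtcars %>%
  filter(cyl == 4)
nrow(step1)  # how many rows survived the filter?

# Step 2: once step 1 looks right, extend the chain by one verb
step2 <- step1 %>%
  summarise(mean_mpg = mean(mpg))
step2

# Step 3: the finished chain, written in one go, gives the same result
final <- mtcars %>%
  filter(cyl == 4) %>%
  summarise(mean_mpg = mean(mpg))
final
```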
The results of our CFA appear promising:
The cfi (Comparative Fit Index) is 1.000, above the commonly used cutoff of 0.95,
The rmsea (Root Mean Square Error of Approximation) is 0.000, below the usual threshold of 0.06, and
The srmr (Standardised Root Mean Square Residual) is 0.002, well below 0.08 (Hu & Bentler, 1999; West et al., 2012).
Overall, the indices suggest a good fit with our data. Note, however, that the full output reports df = 0. The model is therefore just-identified, so near-perfect fit indices are expected by construction and should be interpreted with caution.
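The comparison with cutoff values can also be automated. The sketch below is minimal and assumes the widely cited Hu & Bentler (1999) cutoffs: CFI ≥ 0.95, RMSEA ≤ 0.06 and SRMR ≤ 0.08. check_fit() is a hypothetical helper and is not part of lavaan. It only compares numbers and does not check whether the model has positive degrees of freedom.

```r
# Hypothetical helper: compares CFI, RMSEA and SRMR against the commonly
# cited Hu & Bentler (1999) cutoffs; the cutoffs are adjustable arguments.
check_fit <- function(indices,
                      cfi_min = 0.95,
                      rmsea_max = 0.06,
                      srmr_max = 0.08) {
  c(
    cfi   = unname(indices["cfi"])   >= cfi_min,
    rmsea = unname(indices["rmsea"]) <= rmsea_max,
    srmr  = unname(indices["srmr"])  <= srmr_max
  )
}

# Example with the rounded values reported above: all three checks are TRUE
check_fit(c(cfi = 1.000, rmsea = 0.000, srmr = 0.002))
```

With a fitted lavaan model, the input vector could come from fitmeasures(fit, c("cfi", "rmsea", "srmr")).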
Combined with the computed Cronbach's \(\alpha\), we can be reasonably confident in our latent variables and proceed with further analytical steps.
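The raw alpha reported by psych::alpha() is based on the textbook formula \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)\). In this formula, \(k\) is the number of items, \(\sigma_i^2\) is the variance of item \(i\), and \(\sigma_X^2\) is the variance of the total score. A minimal base-R sketch of the formula is shown below. cronbach_alpha() is a hypothetical helper. It drops incomplete rows, whereas psych::alpha() uses pairwise-complete data by default (note the different n per item in the output above). Results on the report data may therefore differ from the printed values.

```r
# Minimal sketch of raw Cronbach's alpha from the item covariance matrix.
# Assumption: `items` is a data frame of numeric items; rows with any NA
# are dropped (listwise deletion), unlike psych::alpha()'s default.
cronbach_alpha <- function(items) {
  items <- na.omit(as.data.frame(items))
  k <- ncol(items)
  cov_mat <- cov(items)
  # diagonal = item variances; sum of all entries = variance of the total score
  (k / (k - 1)) * (1 - sum(diag(cov_mat)) / sum(cov_mat))
}

# Toy example: two items with equal variance and perfect correlation
toy <- data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5))
cronbach_alpha(toy)  # returns 1
```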
2 Analysis
2.1 Central tendency measures: Mean, Median, Mode
2.1.1 Mean
TriggerOverview_clean1 %>%
  filter(total_app > 0) %>%
  group_by(vnpostprovincename) %>%
  summarise(mean_total_app = mean(total_app, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(vnpostprovincename, mean_total_app), y = mean_total_app)) +
  geom_col() +
  coord_flip()
# Total (not mean) number of ID hits in the last 7 days, per province
TriggerOverview_clean %>%
  filter(!is.na(id_hit_app), id_hit_app7day > 0) %>%
  group_by(vnpostprovincename) %>%
  summarise(sum_id_hit_app7day = sum(id_hit_app7day)) %>%
  ggplot(aes(x = vnpostprovincename, y = sum_id_hit_app7day)) +
  geom_col() +
  coord_flip()
# Total (not mean) number of ID hits in the last 30 days, per province
TriggerOverview_clean %>%
  filter(!is.na(id_hit_app), id_hit_app30day > 0) %>%
  group_by(vnpostprovincename) %>%
  summarise(sum_id_hit_app30day = sum(id_hit_app30day)) %>%
  ggplot(aes(x = vnpostprovincename, y = sum_id_hit_app30day)) +
  geom_col() +
  coord_flip()
2.2 Indicators and visualisations to examine the spread of data
TriggerOverview_clean %>%
  filter(id_hit_app30day > 0) %>%
  ggplot(aes(id_hit_app_ratio7day)) +
  # A fixed colour is set outside aes(); mapping "red" inside aes() would not produce red bars
  geom_histogram(fill = "red")
TriggerOverview_clean %>%
  filter(id_hit_app30day > 0) %>%
  ggplot(aes(id_hit_app_ratio30day)) +
  # A fixed colour is set outside aes(); mapping "red" inside aes() would not produce red bars
  geom_histogram(fill = "red")
2.3 Packages to compute descriptive statistics
2.3.1 The psych package for descriptive statistics
2.3.2 The skimr package for descriptive statistics
2.4 Sources of bias: Outliers, normality and other ‘conundrums’
2.4.1 Linearity and additivity
2.4.2 Independence
2.4.3 Normality
2.4.4 Homogeneity of variance (homoscedasticity)
2.4.5 Outliers and how to deal with them
2.4.5.1 Detecting outliers using the standard deviation
A frequently used approach to detecting outliers relies on the standard deviation. Usually, scholars use multiples of the standard deviation to determine thresholds. For example, a value that lies 3 standard deviations above or below the mean could be categorised as an outlier. Unfortunately, authors disagree on how many multiples of the standard deviation should count as an outlier: some use 3, while others settle for 2 (see also Leys et al. (2013)). Let's stick with the definition of 3 standard deviations to get started. We can revisit our previous plot of id_hit_app7day (see Figure 8.5) and add lines that show the thresholds above and below the mean. As before, I will first create a base plot, outlier_plot, so that we do not have to repeat the same code over and over again. We then use outlier_plot and add more layers as we see fit.
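A minimal base-R sketch of the standard-deviation rule is shown below. flag_sd_outliers() is a hypothetical helper. The argument k sets the number of standard deviations, so k = 2 gives the stricter convention mentioned above. Missing values are ignored when computing the mean and standard deviation and are returned as NA.

```r
# Hypothetical helper: TRUE where a value lies more than k standard
# deviations above or below the mean, FALSE otherwise, NA where x is NA.
flag_sd_outliers <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  abs(x - m) > k * s
}

# Toy data: twenty zeros and one extreme value
x <- c(rep(0, 20), 50)
which(flag_sd_outliers(x))  # 21, the position of the value 50
```

In the report, the helper could be used inside a chain as mutate(outlier_sd = flag_sd_outliers(id_hit_app7day)), assuming the column names shown earlier.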
2.4.5.2 Detecting outliers using the interquartile range (IQR)
# Compute the quartiles of the variable we screen for outliers
(TO <- quantile(TriggerOverview_clean$id_hit_app7day, na.rm = TRUE))
# Compute the thresholds: TO[4] is the 75th and TO[2] the 25th percentile
iqr_upper <- TO[4] + 1.5 * IQR(TriggerOverview_clean$id_hit_app7day, na.rm = TRUE)
iqr_lower <- TO[2] - 1.5 * IQR(TriggerOverview_clean$id_hit_app7day, na.rm = TRUE)
# Flag observations outside the thresholds (NA where id_hit_app7day is missing)
TriggerOverview_clean <-
  TriggerOverview_clean %>%
  mutate(outlier = id_hit_app7day > iqr_upper |
                   id_hit_app7day < iqr_lower)
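The same fences can be wrapped in a small reusable function and applied to other trigger variables. The sketch below is minimal. flag_iqr_outliers() is a hypothetical helper. The spread is taken directly from the two quartiles returned by quantile(), so it matches what IQR() computes with the same default quantile type.

```r
# Hypothetical helper: Tukey-style fences. Values below Q1 - multiplier * IQR
# or above Q3 + multiplier * IQR are flagged TRUE; NA stays NA.
flag_iqr_outliers <- function(x, multiplier = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  spread <- q[[2]] - q[[1]]
  lower <- q[[1]] - multiplier * spread
  upper <- q[[2]] + multiplier * spread
  x < lower | x > upper
}

# Toy data: Q1 = 3.25 and Q3 = 7.75, so the upper fence is 14.5
x <- c(1:9, 100)
which(flag_iqr_outliers(x))  # 10, the position of the value 100
```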
TriggerOverview_clean %>%
  ggplot(aes(x = reorder(vnpostprovincename, id_hit_app30day),
             y = id_hit_app7day)
  ) +
  geom_point(size = 1) +
  theme(panel.background = element_blank()) +
  coord_flip()
TriggerOverview_clean %>%
  ggplot(aes(x = reorder(vnpostprovincename, phone_hit_app7day),
             y = phone_hit_app7day)
  ) +
  geom_point(size = 1) +
  theme(panel.background = element_blank()) +
  coord_flip()
TriggerOverview_clean %>%
  ggplot(aes(x = reorder(vnpostprovincename, phone_hit_app30day),
             y = phone_hit_app30day)
  ) +
  geom_point(size = 1) +
  theme(panel.background = element_blank()) +
  coord_flip()
TriggerOverview_clean %>%
  ggplot(aes(x = reorder(vnpostprovincename, user_hit_app7day),
             y = user_hit_app7day)
  ) +
  geom_point(size = 1) +
  theme(panel.background = element_blank()) +
  coord_flip()
TriggerOverview_clean %>%
  ggplot(aes(x = reorder(vnpostprovincename, user_hit_app30day),
             y = user_hit_app30day)
  ) +
  geom_point(size = 1) +
  theme(panel.background = element_blank()) +
  coord_flip()