In today’s hyperconnected world, a growing number of people are spending more time consuming digital media. As the gap between mobile and desktop usage becomes more pronounced, attention is shifting towards navigating the ever-changing mobile landscape.
This analysis aims to explore and understand the existing mobile application landscape, particularly Apple’s App Store.
This dataset contains details of nearly 7200 mobile apps on the Apple iOS App Store. The data was collected in July 2017 and extracted from the iTunes Search API on the Apple Inc. website.
# Read data
apps <- read.csv("data_input/AppleStore.csv")
appd <- read.csv("data_input/appleStore_description.csv")
There are two CSV files, Apple Store and Apple Store Description, and this section covers each in turn.
This data frame contains each application’s ID, name, size (in bytes), currency, price, rating counts (all versions and current version), user rating (all versions and current version), latest version code, content rating, genre, number of supported devices, number of screenshots shown for display, number of supported languages, and whether VPP licensing was enabled.
dim(apps)
#> [1] 7197 17
names(apps)
#> [1] "X" "id" "track_name" "size_bytes"
#> [5] "currency" "price" "rating_count_tot" "rating_count_ver"
#> [9] "user_rating" "user_rating_ver" "ver" "cont_rating"
#> [13] "prime_genre" "sup_devices.num" "ipadSc_urls.num" "lang.num"
#> [17] "vpp_lic"
head(apps)
tail(apps)
This data frame contains the ID, name, size (in bytes), and description of each application.
dim(appd)
#> [1] 7197 4
names(appd)
#> [1] "id" "track_name" "size_bytes" "app_desc"
head(appd)
tail(appd)
For the purpose of this analysis, however, only the Apple Store data will be used, as the other file adds nothing beyond the descriptions of the 7197 apps in the dataset.
As part of the data pre-processing steps, it is important to ensure that each column has the correct data type and that the dataset has no missing or empty values.
# Check data structure
str(apps)
#> 'data.frame': 7197 obs. of 17 variables:
#> $ X : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ id : int 281656475 281796108 281940292 282614216 282935706 283619399 283646709 284035177 284666222 284736660 ...
#> $ track_name : chr "PAC-MAN Premium" "Evernote - stay organized" "WeatherBug - Local Weather, Radar, Maps, Alerts" "eBay: Best App to Buy, Sell, Save! Online Shopping" ...
#> $ size_bytes : num 1.01e+08 1.59e+08 1.01e+08 1.29e+08 9.28e+07 ...
#> $ currency : chr "USD" "USD" "USD" "USD" ...
#> $ price : num 3.99 0 0 0 0 0.99 0 0 9.99 3.99 ...
#> $ rating_count_tot: int 21292 161065 188583 262241 985920 8253 119487 1126879 1117 7885 ...
#> $ rating_count_ver: int 26 26 2822 649 5320 5516 879 3594 4 40 ...
#> $ user_rating : num 4 4 3.5 4 4.5 4 4 4 4.5 4 ...
#> $ user_rating_ver : num 4.5 3.5 4.5 4.5 5 4 4.5 4.5 5 4 ...
#> $ ver : chr "6.3.5" "8.2.2" "5.0.0" "5.10.0" ...
#> $ cont_rating : chr "4+" "4+" "4+" "12+" ...
#> $ prime_genre : chr "Games" "Productivity" "Weather" "Shopping" ...
#> $ sup_devices.num : int 38 37 37 37 37 47 37 37 37 38 ...
#> $ ipadSc_urls.num : int 5 5 5 5 5 5 0 4 5 0 ...
#> $ lang.num : int 10 23 3 9 45 1 19 1 1 10 ...
#> $ vpp_lic : int 1 1 1 1 1 1 1 1 1 1 ...
# Change data type
apps$X <- as.character(apps$X)
apps$id <- as.character(apps$id)
apps$currency <- as.factor(apps$currency)
apps$user_rating <- as.factor(apps$user_rating)
apps$user_rating_ver <- as.factor(apps$user_rating_ver)
apps$cont_rating <- as.factor(apps$cont_rating)
apps$prime_genre <- as.factor(apps$prime_genre)
apps$vpp_lic <- as.factor(apps$vpp_lic)
str(apps)
#> 'data.frame': 7197 obs. of 17 variables:
#> $ X : chr "1" "2" "3" "4" ...
#> $ id : chr "281656475" "281796108" "281940292" "282614216" ...
#> $ track_name : chr "PAC-MAN Premium" "Evernote - stay organized" "WeatherBug - Local Weather, Radar, Maps, Alerts" "eBay: Best App to Buy, Sell, Save! Online Shopping" ...
#> $ size_bytes : num 1.01e+08 1.59e+08 1.01e+08 1.29e+08 9.28e+07 ...
#> $ currency : Factor w/ 1 level "USD": 1 1 1 1 1 1 1 1 1 1 ...
#> $ price : num 3.99 0 0 0 0 0.99 0 0 9.99 3.99 ...
#> $ rating_count_tot: int 21292 161065 188583 262241 985920 8253 119487 1126879 1117 7885 ...
#> $ rating_count_ver: int 26 26 2822 649 5320 5516 879 3594 4 40 ...
#> $ user_rating : Factor w/ 10 levels "0","1","1.5",..: 8 8 7 8 9 8 8 8 9 8 ...
#> $ user_rating_ver : Factor w/ 10 levels "0","1","1.5",..: 9 7 9 9 10 8 9 9 10 8 ...
#> $ ver : chr "6.3.5" "8.2.2" "5.0.0" "5.10.0" ...
#> $ cont_rating : Factor w/ 4 levels "12+","17+","4+",..: 3 3 3 1 3 3 3 1 3 3 ...
#> $ prime_genre : Factor w/ 23 levels "Book","Business",..: 8 16 23 18 17 8 6 12 22 8 ...
#> $ sup_devices.num : int 38 37 37 37 37 47 37 37 37 38 ...
#> $ ipadSc_urls.num : int 5 5 5 5 5 5 0 4 5 0 ...
#> $ lang.num : int 10 23 3 9 45 1 19 1 1 10 ...
#> $ vpp_lic : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
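One caveat of storing ratings as factors: calling `as.numeric()` on a factor returns its internal level codes, not the star values. The sketch below uses made-up ratings (not taken from the dataset) to show the difference and one way to recover the numeric values when they are needed, for example to compute a mean.

```r
# Hypothetical ratings, for illustration only
ratings <- factor(c(4, 4.5, 3.5, 4))

# as.numeric() on a factor returns the integer level codes
codes <- as.numeric(ratings)

# Converting via character recovers the original star values
stars <- as.numeric(as.character(ratings))

print(codes)
print(stars)
print(mean(stars))
```

If numeric summaries of `user_rating` are needed later, the same conversion could be applied to `apps$user_rating`.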
App sizes expressed in bytes are hard to read, so they are converted to megabytes (MB), where 1 MB is 1,000,000 bytes.
# Convert app size from Bytes to MB
apps$size_mb <- apps$size_bytes / 1000000
# Check for missing values in each column
colSums(is.na(apps))
#> X id track_name size_bytes
#> 0 0 0 0
#> currency price rating_count_tot rating_count_ver
#> 0 0 0 0
#> user_rating user_rating_ver ver cont_rating
#> 0 0 0 0
#> prime_genre sup_devices.num ipadSc_urls.num lang.num
#> 0 0 0 0
#> vpp_lic size_mb
#> 0 0
anyNA(apps)
#> [1] FALSE
No missing values were found in this data frame.
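Note that `is.na()` only detects `NA`. By default, `read.csv()` keeps blank fields in character columns as empty strings rather than `NA`, so an "empty value" check may need to look for those as well. The following is a minimal sketch on a toy data frame; `count_missing` and the toy values are hypothetical and not part of the original analysis.

```r
# Toy data frame with hypothetical values (not from AppleStore.csv)
toy <- data.frame(
  track_name = c("App A", "", "App C"),
  ver        = c("1.0", "2.1", "   "),
  price      = c(0, NA, 1.99),
  stringsAsFactors = FALSE
)

# Count values that are NA or, for character columns, empty or whitespace-only
count_missing <- function(df) {
  sapply(df, function(col) {
    blank <- if (is.character(col)) !is.na(col) & trimws(col) == "" else FALSE
    sum(is.na(col) | blank)
  })
}

missing_counts <- count_missing(toy)
print(missing_counts)
```

Applied to the real data, `count_missing(apps)` would return one count per column, in the same shape as `colSums(is.na(apps))`.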
summary(apps)
#> X id track_name size_bytes
#> Length:7197 Length:7197 Length:7197 Min. :5.898e+05
#> Class :character Class :character Class :character 1st Qu.:4.692e+07
#> Mode :character Mode :character Mode :character Median :9.715e+07
#> Mean :1.991e+08
#> 3rd Qu.:1.819e+08
#> Max. :4.026e+09
#>
#> currency price rating_count_tot rating_count_ver
#> USD:7197 Min. : 0.000 Min. : 0 Min. : 0.0
#> 1st Qu.: 0.000 1st Qu.: 28 1st Qu.: 1.0
#> Median : 0.000 Median : 300 Median : 23.0
#> Mean : 1.726 Mean : 12893 Mean : 460.4
#> 3rd Qu.: 1.990 3rd Qu.: 2793 3rd Qu.: 140.0
#> Max. :299.990 Max. :2974676 Max. :177050.0
#>
#> user_rating user_rating_ver ver cont_rating
#> 4.5 :2663 4.5 :2205 Length:7197 12+:1155
#> 4 :1626 0 :1443 Class :character 17+: 622
#> 0 : 929 4 :1237 Mode :character 4+ :4433
#> 3.5 : 702 5 : 964 9+ : 987
#> 5 : 492 3.5 : 533
#> 3 : 383 3 : 304
#> (Other): 402 (Other): 511
#> prime_genre sup_devices.num ipadSc_urls.num lang.num
#> Games :3862 Min. : 9.00 Min. :0.000 Min. : 0.000
#> Entertainment : 535 1st Qu.:37.00 1st Qu.:3.000 1st Qu.: 1.000
#> Education : 453 Median :37.00 Median :5.000 Median : 1.000
#> Photo & Video : 349 Mean :37.36 Mean :3.707 Mean : 5.435
#> Utilities : 248 3rd Qu.:38.00 3rd Qu.:5.000 3rd Qu.: 8.000
#> Health & Fitness: 180 Max. :47.00 Max. :5.000 Max. :75.000
#> (Other) :1570
#> vpp_lic size_mb
#> 0: 50 Min. : 0.59
#> 1:7147 1st Qu.: 46.92
#> Median : 97.15
#> Mean : 199.13
#> 3rd Qu.: 181.93
#> Max. :4025.97
#>
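This summary indicates, among other things, that all prices are in USD, that at least half of the apps are free (the median price is 0), that Games is by far the largest genre (3862 apps), and that 4+ is the most common content rating (4433 apps).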
These are some random interesting facts!
unique(apps$prime_genre)
#> [1] Games Productivity Weather Shopping
#> [5] Reference Finance Music Utilities
#> [9] Travel Social Networking Sports Business
#> [13] Health & Fitness Entertainment Photo & Video Navigation
#> [17] Education Lifestyle Food & Drink News
#> [21] Book Medical Catalogs
#> 23 Levels: Book Business Catalogs Education Entertainment ... Weather
There was a total of 23 genres in this dataset.
summary(apps$price==0)
#> Mode FALSE TRUE
#> logical 3141 4056
# Label each app as FREE or PAID
apps$price_type <- ifelse(apps$price == 0, "FREE", "PAID")
4056 apps were free and 3141 were paid.
# Free and paid apps in each genre
table(apps$prime_genre, apps$price_type)
#>
#> FREE PAID
#> Book 66 46
#> Business 20 37
#> Catalogs 9 1
#> Education 132 321
#> Entertainment 334 201
#> Finance 84 20
#> Food & Drink 43 20
#> Games 2257 1605
#> Health & Fitness 76 104
#> Lifestyle 94 50
#> Medical 8 15
#> Music 67 71
#> Navigation 20 26
#> News 58 17
#> Photo & Video 167 182
#> Productivity 62 116
#> Reference 20 44
#> Shopping 121 1
#> Social Networking 143 24
#> Sports 79 35
#> Travel 56 25
#> Utilities 109 139
#> Weather 31 41
2257 of the 3862 Games apps were free, which is more than half (about 58%)! Other than Games, the Education genre had the most paid apps and the Entertainment genre had the most free apps.
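Raw counts favour large genres, so row-wise percentages can make the free/paid split easier to compare across genres. The sketch below rebuilds three rows of the table above by hand (the counts are copied from the output; the matrix itself is only an illustration) and uses `prop.table()` with `margin = 1`.

```r
# Counts copied from the FREE/PAID table above for three genres
counts <- matrix(
  c(2257, 1605,
     132,  321,
     121,    1),
  ncol = 2, byrow = TRUE,
  dimnames = list(genre = c("Games", "Education", "Shopping"),
                  price_type = c("FREE", "PAID"))
)

# Row-wise percentages: share of free and paid apps within each genre
shares <- round(prop.table(counts, margin = 1) * 100, 1)
print(shares)
```

On the full data, the same idea would be `prop.table(table(apps$prime_genre, apps$price_type), margin = 1)`.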
xtabs(formula = rating_count_tot ~ prime_genre + user_rating, data = apps)
#> user_rating
#> prime_genre 0 1 1.5 2 2.5 3
#> Book 0 1 0 0 1798 201
#> Business 0 0 0 53 2601 6581
#> Catalogs 0 0 0 0 0 0
#> Education 0 2 522 13722 7174 132595
#> Entertainment 0 428 597 80677 98604 684139
#> Finance 0 0 715 1051 3166 45323
#> Food & Drink 0 1 0 4103 1529 2294
#> Games 0 151 1331 190053 97885 843853
#> Health & Fitness 0 1 42 663 502 9084
#> Lifestyle 0 70 3324 1318 7059 6010
#> Medical 0 5 428 0 0 0
#> Music 0 0 0 3018 788 6348
#> Navigation 0 0 0 0 0 82
#> News 0 0 28 209 12108 80664
#> Photo & Video 0 131 888 207 645977 53351
#> Productivity 0 0 431 862 152 500
#> Reference 0 0 103 0 15 22
#> Shopping 0 0 0 0 54936 198298
#> Social Networking 0 118 525 2692 36199 429661
#> Sports 0 27 470 4245 81718 155790
#> Travel 0 2 7 1775 6209 88475
#> Utilities 0 364 1973 3519 3389 122260
#> Weather 0 0 12 0 498 2040
#> user_rating
#> prime_genre 3.5 4 4.5 5
#> Book 252376 67767 163701 88205
#> Business 1947 84943 141905 34891
#> Catalogs 213 2458 1309 13345
#> Education 40689 103096 618951 97620
#> Entertainment 766330 1028667 1337793 33283
#> Finance 202646 385558 408792 101705
#> Food & Drink 125551 29321 452495 262839
#> Games 1577551 4265483 41357161 4545023
#> Health & Fitness 197980 566758 898052 111289
#> Lifestyle 156184 219304 472396 21629
#> Medical 194 29 11774 1204
#> Music 55668 2141110 1771872 1395
#> Navigation 29917 7539 507743 1
#> News 580422 195335 99160 8204
#> Photo & Video 127416 211752 3586071 383153
#> Productivity 210940 574069 608754 37428
#> Reference 29148 228607 1140884 35515
#> Shopping 275544 450641 1110843 180808
#> Social Networking 3743658 980500 2314495 90468
#> Sports 597038 542651 213113 4018
#> Travel 458115 314422 275036 444
#> Utilities 624840 111998 657157 176728
#> Weather 828926 230687 515750 19121
In most genres, the highest rating counts were found in apps rated 4 stars or above. For Social Networking, however, apps with an average rating of 3.5 stars accumulated over 3.7 million ratings!
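When reading the cross-tabulation above, it helps to remember that `xtabs()` with a left-hand side sums that variable: each cell is the combined number of ratings received by apps with that average rating, not the number of users who gave that score or the number of apps. A small sketch with made-up numbers shows the difference between summing and counting.

```r
# Hypothetical apps, for illustration only
toy <- data.frame(
  prime_genre      = c("Games", "Games", "Music", "Music"),
  user_rating      = factor(c("4.5", "4.5", "3.5", "4.5")),
  rating_count_tot = c(1000, 250, 40, 10)
)

# Sum of rating counts per genre and rating (left-hand side present)
rating_sums <- xtabs(rating_count_tot ~ prime_genre + user_rating, data = toy)

# Number of apps per genre and rating (no left-hand side)
app_counts <- xtabs(~ prime_genre + user_rating, data = toy)

print(rating_sums)
print(app_counts)
```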
# Productivity apps larger than the mean app size (199.13 MB)
apps[apps$prime_genre=='Productivity' & apps$size_mb>199.13,
     c('track_name', 'prime_genre', 'size_mb')]
Some of Microsoft’s and Google’s Productivity apps tend to have larger-than-average file sizes.
Users browsing the App Store tend to be drawn to apps with higher ratings and often end up downloading them. This makes user ratings a crucial aspect of an app: they can immediately grab the target audience’s attention, which in turn influences future growth and retention.
This section explores the dataset further to identify the most popular apps based on user rating.
The best apps are normally characterised by a 5-star rating, but those rated 4.5 stars will also be considered. To make the ratings more reliable, apps with a below-average total rating count are excluded.
Therefore, the best apps in this case are those rated 4.5 stars or above with a total rating count of at least 12,893, the mean across all apps.
best <- apps[(apps$user_rating=='5' | apps$user_rating=='4.5') & apps$rating_count_tot>=12893,]
head(best)
nrow(best)
#> [1] 607
607 out of 7197 apps qualified as the best apps
table(best$price_type)
#>
#> FREE PAID
#> 501 106
… and 501 of them were free!
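The selection rule used for `best` could also be written as a small helper that derives the threshold from the data instead of hard-coding the mean. This is only a sketch: `select_best` and the toy data are hypothetical, and the default threshold assumes the mean total rating count is the intended cut-off.

```r
# Hypothetical apps, for illustration only
toy <- data.frame(
  track_name       = c("A", "B", "C", "D"),
  user_rating      = factor(c("5", "4.5", "4.5", "4")),
  rating_count_tot = c(50, 20000, 900000, 30000),
  stringsAsFactors = FALSE
)

# Keep apps rated 4.5 or 5 stars whose rating count is at least min_count
# (by default, the mean rating count of the data frame passed in)
select_best <- function(df, min_count = mean(df$rating_count_tot)) {
  high_rated <- df$user_rating %in% c("4.5", "5")
  df[high_rated & df$rating_count_tot >= min_count, ]
}

best_toy <- select_best(toy)
print(best_toy$track_name)

# An explicit threshold can still be supplied
best_fixed <- select_best(toy, min_count = 12893)
print(best_fixed$track_name)
```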
# App categories count
sort(table(best$prime_genre), decreasing = TRUE)
#>
#> Games Photo & Video Social Networking Entertainment
#> 398 35 21 20
#> Music Shopping Utilities Education
#> 17 14 14 13
#> Productivity Health & Fitness Weather Finance
#> 13 11 10 7
#> Reference Lifestyle Business Food & Drink
#> 6 5 4 4
#> Travel Book Sports Navigation
#> 4 3 3 2
#> News Catalogs Medical
#> 2 1 0
# App categories in percentage
round(sort(prop.table(table(best$prime_genre)), decreasing = TRUE)*100,2)
#>
#> Games Photo & Video Social Networking Entertainment
#> 65.57 5.77 3.46 3.29
#> Music Shopping Utilities Education
#> 2.80 2.31 2.31 2.14
#> Productivity Health & Fitness Weather Finance
#> 2.14 1.81 1.65 1.15
#> Reference Lifestyle Business Food & Drink
#> 0.99 0.82 0.66 0.66
#> Travel Book Sports Navigation
#> 0.66 0.49 0.49 0.33
#> News Catalogs Medical
#> 0.33 0.16 0.00
top3 <- best[(best$prime_genre=='Games') |
(best$prime_genre=='Photo & Video') |
(best$prime_genre=='Social Networking'),]
head(top3)
# top3 total
nrow(top3)
#> [1] 454
In 2017, the top 3 categories with the highest number of best apps were Games (65.57%), Photo & Video (5.77%), and Social Networking (3.46%).
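The top genres can also be derived from the counts rather than typed in by hand, which keeps the filter in step with the data. A minimal sketch, assuming a plain vector of genre labels; `top_genres` and the sample labels are hypothetical.

```r
# Hypothetical genre labels, for illustration only
genres <- c("Games", "Games", "Games", "Music", "Music",
            "Weather", "Photo & Video", "Photo & Video", "Photo & Video", "Games")

# Names of the n most frequent genres
top_genres <- function(x, n = 3) {
  names(sort(table(x), decreasing = TRUE))[seq_len(n)]
}

top3_names <- top_genres(genres)
print(top3_names)

# Filter with %in% instead of chaining == comparisons
in_top3 <- genres %in% top3_names
print(sum(in_top3))
```

With the real data this would amount to `best[best$prime_genre %in% top_genres(best$prime_genre), ]`.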
Other random interesting facts:
best_price <- aggregate(price ~ prime_genre, data = best, FUN = mean)
best_price[order(best_price$price, decreasing = TRUE),]
Business apps had the highest average price, which suggests that many of the best Business apps were paid. The same applied to Health & Fitness, Productivity, and Weather apps.
Genres such as Music, Games, and Entertainment also included some paid apps, although these did not seem as expensive as Business or Health & Fitness apps.
Some of the best apps that users tend to use on a daily basis, such as Social Networking and Navigation apps, were free.
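A high average price does not by itself show that most apps in a genre are paid, since a single expensive app can pull the mean up. One way to check is to compute the share of paid apps next to the mean price. The sketch below uses made-up prices to illustrate the point; the column name `paid_share` is an assumption.

```r
# Hypothetical prices, for illustration only
toy <- data.frame(
  prime_genre = c("Business", "Business", "Business", "Games", "Games", "Games"),
  price       = c(0, 0, 59.99, 0.99, 0.99, 0)
)

# Mean price per genre
mean_price <- aggregate(price ~ prime_genre, data = toy, FUN = mean)

# Share of paid apps per genre (proportion of prices above zero)
paid_share <- aggregate(price ~ prime_genre, data = toy,
                        FUN = function(p) mean(p > 0))
names(paid_share)[2] <- "paid_share"

summary_toy <- merge(mean_price, paid_share, by = "prime_genre")
print(summary_toy)
```

In this toy example the Business genre has the higher mean price even though only one of its three apps is paid.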
# App genres and user rating with the highest total rating count
best_rating <- aggregate(rating_count_tot ~ prime_genre + user_rating + price_type, data = best, FUN = sum)
best_rating[order(best_rating$rating_count_tot, decreasing = TRUE),]
# Most popular app in the top 3 list
top3_rating <- top3[, c('track_name','prime_genre', 'user_rating', 'rating_count_tot')]
top3_rating[order(top3_rating$rating_count_tot, decreasing = TRUE),]
Although the list was heavily dominated by Games, Instagram earned the highest total rating count. This indicates that, aside from being one of the best, Instagram was the most popular app in 2017, followed by Clash of Clans, Temple Run, and Pinterest, all of which had over a million ratings each.
Of the more than 7000 apps in the dataset, 607 were rated 4.5 stars or above and had at least 12,893 ratings. These are referred to as the best apps in this analysis. The top 3 categories with the highest number of best apps were Games (65.57%), Photo & Video (5.77%), and Social Networking (3.46%).
More than 80% of the best apps were free. They did not cost a cent upfront, yet offered users a wide range of capabilities and entertainment. For example, Games were mostly free and appeared among the best apps at least 10 times more often than any other category, while still earning the highest ratings and collecting over 35 million ratings. The analysis suggests that cheaper apps are more likely to receive positive feedback or a high rating, although it does not establish that price is the cause.
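The link between price type and "best" status can be checked directly from the counts reported earlier (4056 free and 3141 paid apps overall; 501 free and 106 paid apps among the best). The sketch below only restates those figures as rates; it is a descriptive comparison, not a test of causation.

```r
# Counts reported earlier in this analysis
total_apps <- c(FREE = 4056, PAID = 3141)
best_apps  <- c(FREE = 501,  PAID = 106)

# Percentage of apps in each price type that made the "best" list
best_rate <- round(best_apps / total_apps * 100, 2)
print(best_rate)

# Percentage of the best apps that were free
free_share <- round(best_apps[["FREE"]] / sum(best_apps) * 100, 1)
print(free_share)
```

A formal comparison of the two proportions could, for instance, use `prop.test(best_apps, total_apps)` from base R's stats package.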
Many would also argue that an app’s performance is reflected in its user ratings. High ratings often indicate users’ level of satisfaction with the app and the quality of their experience. This is why user ratings were taken into account in most aspects of this analysis. For instance, a rating of 4.5 stars or above was required for an app to be considered top or popular. Overall, users were more inclined to give high ratings to apps with entertainment purposes, including Games, Social Networking, Entertainment, and Music apps.