Introduction

In today’s hyperconnected world, a growing number of people are spending more time consuming digital media content. As the gap between mobile and desktop usage becomes more pronounced, attention is shifting towards navigating the ever-changing mobile landscape.

This analysis aims to explore and understand the existing mobile application landscape, particularly Apple’s App Store.

About the Dataset

This dataset contains details of nearly 7,200 mobile apps on the Apple iOS App Store. The data was collected in July 2017 and extracted from the iTunes Search API on the Apple Inc. website.

# Read data
apps <- read.csv("data_input/AppleStore.csv")
appd <- read.csv("data_input/appleStore_description.csv")

Data Inspection

There are two CSV files, Apple Store and Apple Store Description, and this section is divided accordingly.

Apple Store

This data frame contains each application’s ID, name, size (in bytes), currency, price, rating counts (all versions and current version), user rating (all versions and current version), latest version code, content rating, genre or category, number of supported devices, number of screenshots shown for display, number of supported languages, and whether VPP licensing was enabled.

dim(apps)
#> [1] 7197   17
names(apps)
#>  [1] "X"                "id"               "track_name"       "size_bytes"      
#>  [5] "currency"         "price"            "rating_count_tot" "rating_count_ver"
#>  [9] "user_rating"      "user_rating_ver"  "ver"              "cont_rating"     
#> [13] "prime_genre"      "sup_devices.num"  "ipadSc_urls.num"  "lang.num"        
#> [17] "vpp_lic"
head(apps)
tail(apps)

Apple Store Description

This data frame contains the ID, name, size (in bytes), and description of each application.

dim(appd)
#> [1] 7197    4
names(appd)
#> [1] "id"         "track_name" "size_bytes" "app_desc"
head(appd)
tail(appd)

Data Cleansing

As part of the data pre-processing steps, it is important to ensure that each column has the correct data type and that there are no null or empty values.

For the purpose of this analysis, however, only the Apple Store data will be used, as the other file contains no additional information aside from the descriptions of the 7,197 apps in the dataset.
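The claim that the description file adds nothing except the descriptions can be checked by comparing column names and IDs. The sketch below uses small made-up data frames (`apps_toy` and `appd_toy` are hypothetical stand-ins, not the real files) to show the idea.

```r
# Toy stand-ins for the two data frames; all values are made up for illustration.
apps_toy <- data.frame(id = c(1, 2, 3),
                       track_name = c("A", "B", "C"),
                       size_bytes = c(100, 200, 300),
                       price = c(0, 0.99, 0))
appd_toy <- data.frame(id = c(1, 2, 3),
                       track_name = c("A", "B", "C"),
                       size_bytes = c(100, 200, 300),
                       app_desc = c("desc a", "desc b", "desc c"))

# Columns that exist only in the description data frame
extra_cols <- setdiff(names(appd_toy), names(apps_toy))
extra_cols

# Do both data frames cover the same set of app IDs?
same_ids <- setequal(apps_toy$id, appd_toy$id)
same_ids
```

On the real data, the same two calls would be applied to `apps` and `appd`.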

# Check data structure
str(apps)
#> 'data.frame':    7197 obs. of  17 variables:
#>  $ X               : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ id              : int  281656475 281796108 281940292 282614216 282935706 283619399 283646709 284035177 284666222 284736660 ...
#>  $ track_name      : chr  "PAC-MAN Premium" "Evernote - stay organized" "WeatherBug - Local Weather, Radar, Maps, Alerts" "eBay: Best App to Buy, Sell, Save! Online Shopping" ...
#>  $ size_bytes      : num  1.01e+08 1.59e+08 1.01e+08 1.29e+08 9.28e+07 ...
#>  $ currency        : chr  "USD" "USD" "USD" "USD" ...
#>  $ price           : num  3.99 0 0 0 0 0.99 0 0 9.99 3.99 ...
#>  $ rating_count_tot: int  21292 161065 188583 262241 985920 8253 119487 1126879 1117 7885 ...
#>  $ rating_count_ver: int  26 26 2822 649 5320 5516 879 3594 4 40 ...
#>  $ user_rating     : num  4 4 3.5 4 4.5 4 4 4 4.5 4 ...
#>  $ user_rating_ver : num  4.5 3.5 4.5 4.5 5 4 4.5 4.5 5 4 ...
#>  $ ver             : chr  "6.3.5" "8.2.2" "5.0.0" "5.10.0" ...
#>  $ cont_rating     : chr  "4+" "4+" "4+" "12+" ...
#>  $ prime_genre     : chr  "Games" "Productivity" "Weather" "Shopping" ...
#>  $ sup_devices.num : int  38 37 37 37 37 47 37 37 37 38 ...
#>  $ ipadSc_urls.num : int  5 5 5 5 5 5 0 4 5 0 ...
#>  $ lang.num        : int  10 23 3 9 45 1 19 1 1 10 ...
#>  $ vpp_lic         : int  1 1 1 1 1 1 1 1 1 1 ...

Data Coercion

# Change data type
apps$X <- as.character(apps$X)
apps$id <- as.character(apps$id)
apps$currency <- as.factor(apps$currency)
apps$user_rating <- as.factor(apps$user_rating)
apps$user_rating_ver <- as.factor(apps$user_rating_ver)
apps$cont_rating <- as.factor(apps$cont_rating)
apps$prime_genre <- as.factor(apps$prime_genre)
apps$vpp_lic <- as.factor(apps$vpp_lic)

str(apps)
#> 'data.frame':    7197 obs. of  17 variables:
#>  $ X               : chr  "1" "2" "3" "4" ...
#>  $ id              : chr  "281656475" "281796108" "281940292" "282614216" ...
#>  $ track_name      : chr  "PAC-MAN Premium" "Evernote - stay organized" "WeatherBug - Local Weather, Radar, Maps, Alerts" "eBay: Best App to Buy, Sell, Save! Online Shopping" ...
#>  $ size_bytes      : num  1.01e+08 1.59e+08 1.01e+08 1.29e+08 9.28e+07 ...
#>  $ currency        : Factor w/ 1 level "USD": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ price           : num  3.99 0 0 0 0 0.99 0 0 9.99 3.99 ...
#>  $ rating_count_tot: int  21292 161065 188583 262241 985920 8253 119487 1126879 1117 7885 ...
#>  $ rating_count_ver: int  26 26 2822 649 5320 5516 879 3594 4 40 ...
#>  $ user_rating     : Factor w/ 10 levels "0","1","1.5",..: 8 8 7 8 9 8 8 8 9 8 ...
#>  $ user_rating_ver : Factor w/ 10 levels "0","1","1.5",..: 9 7 9 9 10 8 9 9 10 8 ...
#>  $ ver             : chr  "6.3.5" "8.2.2" "5.0.0" "5.10.0" ...
#>  $ cont_rating     : Factor w/ 4 levels "12+","17+","4+",..: 3 3 3 1 3 3 3 1 3 3 ...
#>  $ prime_genre     : Factor w/ 23 levels "Book","Business",..: 8 16 23 18 17 8 6 12 22 8 ...
#>  $ sup_devices.num : int  38 37 37 37 37 47 37 37 37 38 ...
#>  $ ipadSc_urls.num : int  5 5 5 5 5 5 0 4 5 0 ...
#>  $ lang.num        : int  10 23 3 9 45 1 19 1 1 10 ...
#>  $ vpp_lic         : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...

Analysing app size in bytes can be confusing, so it is more convenient to convert it to megabytes (MB).

# Convert app size from Bytes to MB
apps$size_mb <- apps$size_bytes / 1000000
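Dividing by 1,000,000 gives decimal megabytes (MB). Some tools report binary mebibytes (MiB) instead, which divide by 1024^2. The following sketch, using illustrative byte values that are not taken from the dataset, shows how the two units differ.

```r
# Illustrative byte values only; not taken from the dataset.
size_bytes <- c(1e6, 1048576, 100788224)

size_mb  <- size_bytes / 1e6     # decimal megabytes (MB), as used in this analysis
size_mib <- size_bytes / 1024^2  # binary mebibytes (MiB)

data.frame(size_bytes, size_mb, size_mib = round(size_mib, 2))
```

The difference is under 5%, so it does not affect the conclusions here, but it is worth stating which unit is used.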

Missing Value

colSums(is.na(apps))
#>                X               id       track_name       size_bytes 
#>                0                0                0                0 
#>         currency            price rating_count_tot rating_count_ver 
#>                0                0                0                0 
#>      user_rating  user_rating_ver              ver      cont_rating 
#>                0                0                0                0 
#>      prime_genre  sup_devices.num  ipadSc_urls.num         lang.num 
#>                0                0                0                0 
#>          vpp_lic          size_mb 
#>                0                0
anyNA(apps)
#> [1] FALSE

No missing values were found in this data frame.
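Note that `is.na()` only detects `NA` values; an empty string in a character column is not counted as missing. A minimal sketch of an additional check for empty strings, using a made-up data frame (`toy` is hypothetical), could look like this:

```r
# Made-up example: one empty string and one NA.
toy <- data.frame(track_name = c("App A", "", "App C"),
                  ver = c("1.0", "2.1", NA),
                  stringsAsFactors = FALSE)

# NA values per column
na_count <- colSums(is.na(toy))

# Empty (or whitespace-only) strings per column, ignoring NA values
empty_count <- sapply(toy, function(col) sum(!is.na(col) & trimws(col) == ""))

na_count
empty_count
```

Applied to character columns such as `track_name` or `ver` in `apps`, this would confirm whether the "no empty value" requirement also holds.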

Data Overview

summary(apps)
#>       X                  id             track_name          size_bytes       
#>  Length:7197        Length:7197        Length:7197        Min.   :5.898e+05  
#>  Class :character   Class :character   Class :character   1st Qu.:4.692e+07  
#>  Mode  :character   Mode  :character   Mode  :character   Median :9.715e+07  
#>                                                           Mean   :1.991e+08  
#>                                                           3rd Qu.:1.819e+08  
#>                                                           Max.   :4.026e+09  
#>                                                                              
#>  currency       price         rating_count_tot  rating_count_ver  
#>  USD:7197   Min.   :  0.000   Min.   :      0   Min.   :     0.0  
#>             1st Qu.:  0.000   1st Qu.:     28   1st Qu.:     1.0  
#>             Median :  0.000   Median :    300   Median :    23.0  
#>             Mean   :  1.726   Mean   :  12893   Mean   :   460.4  
#>             3rd Qu.:  1.990   3rd Qu.:   2793   3rd Qu.:   140.0  
#>             Max.   :299.990   Max.   :2974676   Max.   :177050.0  
#>                                                                   
#>   user_rating   user_rating_ver     ver            cont_rating
#>  4.5    :2663   4.5    :2205    Length:7197        12+:1155   
#>  4      :1626   0      :1443    Class :character   17+: 622   
#>  0      : 929   4      :1237    Mode  :character   4+ :4433   
#>  3.5    : 702   5      : 964                       9+ : 987   
#>  5      : 492   3.5    : 533                                  
#>  3      : 383   3      : 304                                  
#>  (Other): 402   (Other): 511                                  
#>            prime_genre   sup_devices.num ipadSc_urls.num    lang.num     
#>  Games           :3862   Min.   : 9.00   Min.   :0.000   Min.   : 0.000  
#>  Entertainment   : 535   1st Qu.:37.00   1st Qu.:3.000   1st Qu.: 1.000  
#>  Education       : 453   Median :37.00   Median :5.000   Median : 1.000  
#>  Photo & Video   : 349   Mean   :37.36   Mean   :3.707   Mean   : 5.435  
#>  Utilities       : 248   3rd Qu.:38.00   3rd Qu.:5.000   3rd Qu.: 8.000  
#>  Health & Fitness: 180   Max.   :47.00   Max.   :5.000   Max.   :75.000  
#>  (Other)         :1570                                                   
#>  vpp_lic     size_mb       
#>  0:  50   Min.   :   0.59  
#>  1:7147   1st Qu.:  46.92  
#>           Median :  97.15  
#>           Mean   : 199.13  
#>           3rd Qu.: 181.93  
#>           Max.   :4025.97  
#> 

This summary indicates that:

  • Prices are expressed in US dollars.
  • Although most apps were free (the median price was US$0), the most expensive app was priced at US$299.99.
  • On average, each app received 12,893 ratings, but one app had nearly 3 million ratings!
  • The latest version of each app, as recorded in mid-2017, had 460 ratings on average, with a maximum of 177,050 ratings.
  • On a scale of 0 to 5, the most common rating was 4.5 stars (more than 2,600 apps), followed by 4 stars. More than 900 apps had a rating of 0, which most likely means they were unrated.
  • Similarly, 4.5 was the most common rating for each app’s latest version as of mid-2017.
  • Apps have age limits between 4+, 9+, 12+, and 17+ years of age.
    • The majority of apps (4,433) were rated 4+, which suggests that they were generally suitable for all users.
    • However, almost 1,200 apps were rated 12+, meaning users had to be at least 12 years old to download them.
  • The most popular app categories were Game apps, followed by Entertainment, Education, Photo & Video, and Utilities apps.
  • Typically, apps supported 37 Apple devices, with a minimum of 9 and a maximum of 47.
  • At least half of the apps listed at most one supported language (the median was 1), while about a quarter supported 8 or more, up to a maximum of 75.
  • Across all 7,197 apps in the dataset, the average file size was 199 MB, although the median was only 97 MB.

Detailed summary

These are some random interesting facts!

1. The different kinds of genres or categories

unique(apps$prime_genre)
#>  [1] Games             Productivity      Weather           Shopping         
#>  [5] Reference         Finance           Music             Utilities        
#>  [9] Travel            Social Networking Sports            Business         
#> [13] Health & Fitness  Entertainment     Photo & Video     Navigation       
#> [17] Education         Lifestyle         Food & Drink      News             
#> [21] Book              Medical           Catalogs         
#> 23 Levels: Book Business Catalogs Education Entertainment ... Weather

There were 23 genres in total in this dataset.

2. Total free and paid apps

summary(apps$price==0)
#>    Mode   FALSE    TRUE 
#> logical    3141    4056
# Label each app as FREE or PAID based on its price
apps$price_type <- ifelse(apps$price == 0, "FREE", "PAID")

4056 apps were free and 3141 were paid apps.

# Free and paid apps in each genre
table(apps$prime_genre, apps$price_type)
#>                    
#>                     FREE PAID
#>   Book                66   46
#>   Business            20   37
#>   Catalogs             9    1
#>   Education          132  321
#>   Entertainment      334  201
#>   Finance             84   20
#>   Food & Drink        43   20
#>   Games             2257 1605
#>   Health & Fitness    76  104
#>   Lifestyle           94   50
#>   Medical              8   15
#>   Music               67   71
#>   Navigation          20   26
#>   News                58   17
#>   Photo & Video      167  182
#>   Productivity        62  116
#>   Reference           20   44
#>   Shopping           121    1
#>   Social Networking  143   24
#>   Sports              79   35
#>   Travel              56   25
#>   Utilities          109  139
#>   Weather             31   41

2,257 of 3,862 Games apps were free, which means more than half were free! Other than Games, the Education genre had the most paid apps and the Entertainment genre had the most free apps.
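Raw counts make it hard to compare genres of very different sizes, so row-wise percentages can help. The sketch below rebuilds three rows of the table above by hand (the counts are copied from the output; the `counts` matrix itself is only an illustration) and converts them to percentages with `prop.table()`.

```r
# Three rows copied from the genre-by-price table above.
counts <- matrix(c(2257, 1605,
                    132,  321,
                    121,    1),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("Games", "Education", "Shopping"),
                                 c("FREE", "PAID")))

# Share of free and paid apps within each genre (rows sum to 100)
free_share <- round(prop.table(counts, margin = 1) * 100, 1)
free_share
```

On the full data, the same idea would be `prop.table(table(apps$prime_genre, apps$price_type), margin = 1)`.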

3. Total rating count for each genre and user rating

xtabs(formula = rating_count_tot ~ prime_genre + user_rating, data = apps)
#>                    user_rating
#> prime_genre                0        1      1.5        2      2.5        3
#>   Book                     0        1        0        0     1798      201
#>   Business                 0        0        0       53     2601     6581
#>   Catalogs                 0        0        0        0        0        0
#>   Education                0        2      522    13722     7174   132595
#>   Entertainment            0      428      597    80677    98604   684139
#>   Finance                  0        0      715     1051     3166    45323
#>   Food & Drink             0        1        0     4103     1529     2294
#>   Games                    0      151     1331   190053    97885   843853
#>   Health & Fitness         0        1       42      663      502     9084
#>   Lifestyle                0       70     3324     1318     7059     6010
#>   Medical                  0        5      428        0        0        0
#>   Music                    0        0        0     3018      788     6348
#>   Navigation               0        0        0        0        0       82
#>   News                     0        0       28      209    12108    80664
#>   Photo & Video            0      131      888      207   645977    53351
#>   Productivity             0        0      431      862      152      500
#>   Reference                0        0      103        0       15       22
#>   Shopping                 0        0        0        0    54936   198298
#>   Social Networking        0      118      525     2692    36199   429661
#>   Sports                   0       27      470     4245    81718   155790
#>   Travel                   0        2        7     1775     6209    88475
#>   Utilities                0      364     1973     3519     3389   122260
#>   Weather                  0        0       12        0      498     2040
#>                    user_rating
#> prime_genre              3.5        4      4.5        5
#>   Book                252376    67767   163701    88205
#>   Business              1947    84943   141905    34891
#>   Catalogs               213     2458     1309    13345
#>   Education            40689   103096   618951    97620
#>   Entertainment       766330  1028667  1337793    33283
#>   Finance             202646   385558   408792   101705
#>   Food & Drink        125551    29321   452495   262839
#>   Games              1577551  4265483 41357161  4545023
#>   Health & Fitness    197980   566758   898052   111289
#>   Lifestyle           156184   219304   472396    21629
#>   Medical                194       29    11774     1204
#>   Music                55668  2141110  1771872     1395
#>   Navigation           29917     7539   507743        1
#>   News                580422   195335    99160     8204
#>   Photo & Video       127416   211752  3586071   383153
#>   Productivity        210940   574069   608754    37428
#>   Reference            29148   228607  1140884    35515
#>   Shopping            275544   450641  1110843   180808
#>   Social Networking  3743658   980500  2314495    90468
#>   Sports              597038   542651   213113     4018
#>   Travel              458115   314422   275036      444
#>   Utilities           624840   111998   657157   176728
#>   Weather             828926   230687   515750    19121

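In most genres, the largest rating totals belonged to apps with an average rating of 4 stars or above. For Social Networking, however, apps averaging 3.5 stars accumulated over 3.7 million ratings! Note that each cell is the sum of rating counts for apps with that average rating, not the number of users who gave that exact score.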
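To make clear what this cross-tabulation measures, here is a minimal sketch on made-up data (`toy` is hypothetical). With a variable on the left-hand side of the formula, `xtabs()` sums that variable per cell; with an empty left-hand side, it counts rows instead.

```r
# Made-up data: two Social Networking apps averaging 3.5 stars, one Games app.
toy <- data.frame(
  prime_genre = c("Social Networking", "Social Networking", "Games"),
  user_rating = c("3.5", "3.5", "4.5"),
  rating_count_tot = c(1000, 500, 200)
)

# Sum of rating counts per cell (the same form as the call above)
rating_sum <- xtabs(rating_count_tot ~ prime_genre + user_rating, data = toy)

# Number of apps per cell
app_count <- xtabs(~ prime_genre + user_rating, data = toy)

rating_sum
app_count
```

In this toy example, the Social Networking / 3.5 cell holds 1,500 ratings spread over 2 apps, which says nothing about how many individual users gave 3.5 stars.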

4. Names of Productivity apps with above average file size

# Compare against the mean file size directly instead of a hard-coded value
apps[apps$prime_genre == 'Productivity' & apps$size_mb > mean(apps$size_mb),
     c('track_name', 'prime_genre', 'size_mb')]

Some of Microsoft’s and Google’s Productivity apps tend to have bigger file sizes.

5. Highest rated social networking apps

head(apps[apps$prime_genre=='Social Networking' & apps$user_rating=='5',], 1)

We Heart It was the first 5-star Social Networking app returned (in dataset order, not ranked by rating count). Why was it rated so highly? Perhaps because it supported 31 languages, did not require much space, was compatible with 37 Apple devices and, most importantly, was free!
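Since `head(..., 1)` returns the first matching row in dataset order, ranking the 5-star apps by their rating count is one way to pick a more meaningful "highest rated" app. A minimal sketch on made-up data (`toy` and its app names are hypothetical):

```r
# Made-up data for illustration only.
toy <- data.frame(
  track_name = c("App A", "App B", "App C", "App D"),
  prime_genre = c("Social Networking", "Social Networking",
                  "Social Networking", "Games"),
  user_rating = c("5", "5", "4.5", "5"),
  rating_count_tot = c(120, 4500, 90000, 300),
  stringsAsFactors = FALSE
)

# Keep only the 5-star Social Networking apps
five_star <- toy[toy$prime_genre == "Social Networking" & toy$user_rating == "5", ]

# Rank them by total rating count, largest first
five_star_sorted <- five_star[order(five_star$rating_count_tot, decreasing = TRUE), ]
head(five_star_sorted, 1)
```

Here the unsorted filter would return "App A" first, whereas the sorted version returns "App B", the 5-star app backed by the most ratings.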

Analysis

In most cases, users tend to be more drawn to apps with higher ratings when browsing the App Store and eventually end up downloading them. This makes user ratings a crucial aspect of an app that can immediately grab the target audience’s attention, which ultimately determines future growth and retention.

This section will further explore this dataset to identify the most popular apps based on user rating.

Best apps on the App Store

The best apps are normally those with 5 stars, but apps rated 4.5 stars will also be considered. To make the ratings more reliable, apps with a below-average total rating count will be excluded.

Therefore, the best apps in this case were those rated 4.5 stars and above with rating counts of greater than or equal to 12,893.

best <- apps[(apps$user_rating=='5' | apps$user_rating=='4.5') & apps$rating_count_tot>=12893,]

head(best)
nrow(best)
#> [1] 607

607 out of 7197 apps were the best apps

table(best$price_type)
#> 
#> FREE PAID 
#>  501  106

… and 501 of them were free!
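The free and paid counts can also be expressed as shares of the 607 best apps. The sketch below simply re-enters the two counts from the table above (`best_counts` is an illustrative vector, not part of the original code).

```r
# Counts copied from the table(best$price_type) output above.
best_counts <- c(FREE = 501, PAID = 106)

# Percentage of best apps that were free or paid
best_share <- round(best_counts / sum(best_counts) * 100, 1)
best_share
```

On the real data, `prop.table(table(best$price_type))` would give the same proportions.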

Top 3 app categories

# App categories count
sort(table(best$prime_genre), decreasing = TRUE)
#> 
#>             Games     Photo & Video Social Networking     Entertainment 
#>               398                35                21                20 
#>             Music          Shopping         Utilities         Education 
#>                17                14                14                13 
#>      Productivity  Health & Fitness           Weather           Finance 
#>                13                11                10                 7 
#>         Reference         Lifestyle          Business      Food & Drink 
#>                 6                 5                 4                 4 
#>            Travel              Book            Sports        Navigation 
#>                 4                 3                 3                 2 
#>              News          Catalogs           Medical 
#>                 2                 1                 0
# App categories in percentage
round(sort(prop.table(table(best$prime_genre)), decreasing = T)*100,2)
#> 
#>             Games     Photo & Video Social Networking     Entertainment 
#>             65.57              5.77              3.46              3.29 
#>             Music          Shopping         Utilities         Education 
#>              2.80              2.31              2.31              2.14 
#>      Productivity  Health & Fitness           Weather           Finance 
#>              2.14              1.81              1.65              1.15 
#>         Reference         Lifestyle          Business      Food & Drink 
#>              0.99              0.82              0.66              0.66 
#>            Travel              Book            Sports        Navigation 
#>              0.66              0.49              0.49              0.33 
#>              News          Catalogs           Medical 
#>              0.33              0.16              0.00
top3 <- best[(best$prime_genre=='Games') | 
       (best$prime_genre=='Photo & Video') | 
       (best$prime_genre=='Social Networking'),]

head(top3)
# top3 total
nrow(top3)
#> [1] 454

In 2017, the top 3 categories with the highest number of best apps were Games (65.57%), Photo & Video (5.77%), and Social Networking (3.46%).

Other random interesting facts:

  • Games apps were the most popular, with at least 10 times more best apps than any other category.
  • This suggests that users found games more engaging, or perhaps more addictive, than apps in other categories.
  • Users appeared more willing to give high ratings to apps with entertainment purposes.
  • The gaps between the remaining categories (Games excluded) were relatively small.

Average price

best_price <- aggregate(x=price ~ prime_genre, data = best, FUN = mean)
best_price[order(best_price$price, decreasing = T),]

Business apps had the highest average price, which suggests that the best Business apps tended to require payment. The same also applied to Health & Fitness, Productivity, and Weather apps.

Genres such as Music, Games, and Entertainment also included paid apps, although these did not seem as expensive as Business or Health & Fitness apps.

Some of the best apps that users tend to use on a daily basis, such as Social Networking and Navigation apps, were free.
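A high average price does not by itself show that most apps in a genre are paid, because a single expensive app can pull the mean up. One way to check is to compute the share of paid apps alongside the mean. The sketch below uses made-up prices (`toy` is hypothetical) and passes the formula positionally to `aggregate()`.

```r
# Made-up prices for illustration only.
toy <- data.frame(
  prime_genre = c("Business", "Business", "Business", "Games", "Games", "Games"),
  price = c(0, 0, 59.99, 0.99, 0.99, 0),
  stringsAsFactors = FALSE
)

# Mean price per genre
mean_price <- aggregate(price ~ prime_genre, data = toy, FUN = mean)

# Share of paid apps per genre (proportion of prices above zero)
paid_share <- aggregate(price ~ prime_genre, data = toy,
                        FUN = function(p) mean(p > 0))
names(paid_share)[2] <- "share_paid"

price_summary <- merge(mean_price, paid_share, by = "prime_genre")
price_summary
```

In this toy example, Business has the higher mean price even though only one in three of its apps is paid, so both columns are needed to support a statement about "the majority".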

Rating counts

# App genres and user rating with the highest total rating count
best_rating <- aggregate(x=rating_count_tot ~ prime_genre + user_rating + price_type, data = best, FUN = sum)
best_rating[order(best_rating$rating_count_tot, decreasing = T),]

  • The total rating count for all Games apps rated 4.5 stars was almost 39 million!
  • Popular apps appeared more likely to average 4.5 stars than a perfect 5.
  • Higher rating counts were found in free apps.

# Most popular app in the top 3 list
top3_rating <- top3[, c('track_name','prime_genre', 'user_rating', 'rating_count_tot')]
top3_rating[order(top3_rating$rating_count_tot, decreasing = T),]

Although the list was dominated by Games apps, Instagram had the highest total rating count. This indicates that, aside from being one of the best, Instagram was the most popular of the best apps in 2017, followed by Clash of Clans, Temple Run, and Pinterest, all of which had over a million ratings each.

Conclusion

Of the more than 7,000 apps in the dataset, 607 were rated 4.5 stars or above and had at least 12,893 ratings (the average rating count). These are referred to as the best apps in this analysis. The top 3 categories with the highest number of best apps were Games (65.57%), Photo & Video (5.77%), and Social Networking (3.46%).

More than 80% of the best apps were free. They did not cost a cent upfront, yet offered users a wide range of capabilities and entertainment. For example, Games were mostly free and had at least 10 times more best apps than any other category, while still earning the highest ratings and accumulating over 35 million ratings. The analysis suggests that the cheaper the app, the greater the chance of users leaving positive feedback or a high rating, although this dataset alone cannot establish a causal link.

Many would also argue that the performance of an app can be seen from its user ratings. High ratings often reflect users’ level of satisfaction with the app and what their experience has been. This is the reason why user ratings were taken into account in most aspects of this analysis. For instance, ratings of 4.5 stars or above were required for apps to be considered top or popular. Overall, users were more inclined to give high ratings to apps with entertainment purposes. These include Games, Social Networking, Entertainment, and Music apps.