October 15, 2023

Introduction

In this analysis, we look at several factors that affect the most downloaded applications in the iOS App Store. I obtained the dataset from kaggle.com:

https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps

According to the post: “This data set contains more than 7000 Apple iOS mobile application details. The data was extracted from the iTunes Search API at the Apple Inc website.”

Data overview

##        X               id             track_name          size_bytes       
##  Min.   :    1   Min.   :2.817e+08   Length:7197        Min.   :5.898e+05  
##  1st Qu.: 2090   1st Qu.:6.001e+08   Class :character   1st Qu.:4.692e+07  
##  Median : 4380   Median :9.781e+08   Mode  :character   Median :9.715e+07  
##  Mean   : 4759   Mean   :8.631e+08                      Mean   :1.991e+08  
##  3rd Qu.: 7223   3rd Qu.:1.082e+09                      3rd Qu.:1.819e+08  
##  Max.   :11097   Max.   :1.188e+09                      Max.   :4.026e+09  
##    currency             price         rating_count_tot  rating_count_ver  
##  Length:7197        Min.   :  0.000   Min.   :      0   Min.   :     0.0  
##  Class :character   1st Qu.:  0.000   1st Qu.:     28   1st Qu.:     1.0  
##  Mode  :character   Median :  0.000   Median :    300   Median :    23.0  
##                     Mean   :  1.726   Mean   :  12893   Mean   :   460.4  
##                     3rd Qu.:  1.990   3rd Qu.:   2793   3rd Qu.:   140.0  
##                     Max.   :299.990   Max.   :2974676   Max.   :177050.0  
##   user_rating    user_rating_ver     ver            cont_rating       
##  Min.   :0.000   Min.   :0.000   Length:7197        Length:7197       
##  1st Qu.:3.500   1st Qu.:2.500   Class :character   Class :character  
##  Median :4.000   Median :4.000   Mode  :character   Mode  :character  
##  Mean   :3.527   Mean   :3.254                                        
##  3rd Qu.:4.500   3rd Qu.:4.500                                        
##  Max.   :5.000   Max.   :5.000                                        
##  prime_genre        sup_devices.num ipadSc_urls.num    lang.num     
##  Length:7197        Min.   : 9.00   Min.   :0.000   Min.   : 0.000  
##  Class :character   1st Qu.:37.00   1st Qu.:3.000   1st Qu.: 1.000  
##  Mode  :character   Median :37.00   Median :5.000   Median : 1.000  
##                     Mean   :37.36   Mean   :3.707   Mean   : 5.435  
##                     3rd Qu.:38.00   3rd Qu.:5.000   3rd Qu.: 8.000  
##                     Max.   :47.00   Max.   :5.000   Max.   :75.000  
##     vpp_lic      
##  Min.   :0.0000  
##  1st Qu.:1.0000  
##  Median :1.0000  
##  Mean   :0.9931  
##  3rd Qu.:1.0000  
##  Max.   :1.0000

Most Popular Genres
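A genre ranking like this can be tallied from the `prime_genre` column. The sketch below is only an illustration: the `apps` data frame and its rows are made up and stand in for the full dataset, so the counts are not the real ones.

```r
# Toy stand-in for the App Store data; only prime_genre matters here.
# These rows are made up for illustration and are NOT from the dataset.
apps <- data.frame(
  prime_genre = c("Games", "Games", "Education", "Games",
                  "Photo & Video", "Education"),
  stringsAsFactors = FALSE
)

# Count apps per genre and sort from most to least common.
genre_counts <- sort(table(apps$prime_genre), decreasing = TRUE)
print(genre_counts)

# Name of the most common genre in this toy sample.
top_genre <- names(genre_counts)[1]
```

Applied to the real data frame, the same two lines (`table()` then `sort()`) would give the per-genre totals behind a bar chart.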

Hypothesis Test: Is There a Statistical Difference in the Average User Ratings Between Paid and Free Apps in ‘Games’?

With Games clearly being the most frequently recorded genre in the iOS App Store in 2017, there are many questions we can ask to see whether there is a statistical pattern in the dataset.

Let’s start by seeing how we can use RStudio to set up this test:

Defining the Hypothesis Test:

Null Hypothesis (\(H_0\)): There is no difference in mean user rating between free and paid games. \[ H_0: \mu_{\text{Free}} = \mu_{\text{Paid}} \]

Alternative Hypothesis (\(H_a\)): There is a difference in mean user rating between free and paid games. \[ H_a: \mu_{\text{Free}} \neq \mu_{\text{Paid}} \]

Where:

  • \(\mu_{\text{Free}}\): mean user rating of free games

  • \(\mu_{\text{Paid}}\): mean user rating of paid games

T-Test:

\[ t = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }} \]

Where:

  • \(n_1\) and \(n_2\) are the sample sizes, that is, the number of free games (group 1) and paid games (group 2)

  • \(\bar{X_1}\) and \(\bar{X_2}\) are the sample means of the user ratings of free games and paid games

  • \(s_1^2\) and \(s_2^2\) are the sample variances of the user ratings of free games and paid games

The t-statistic is used to decide whether to reject or fail to reject the null hypothesis.
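To make the formula concrete, here is a minimal sketch that computes the statistic by hand and compares it with base R's `t.test()`. The two rating vectors are hypothetical values made up for illustration, not data from the App Store file. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which is not written out in this report but is what `t.test()` applies by default.

```r
# Hypothetical ratings, made up for illustration (NOT from the dataset).
free <- c(3.5, 4.0, 4.5, 3.0, 4.0, 2.5)
paid <- c(4.5, 4.0, 5.0, 4.5, 3.5)

# Sample sizes, means and variances for each group.
n1 <- length(free); n2 <- length(paid)
m1 <- mean(free);   m2 <- mean(paid)
v1 <- var(free);    v2 <- var(paid)

# Welch t-statistic: difference in means divided by its standard error.
se <- sqrt(v1 / n1 + v2 / n2)
t_manual <- (m1 - m2) / se

# Welch-Satterthwaite approximation for the degrees of freedom.
df_manual <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# t.test() uses the Welch form by default (var.equal = FALSE).
fit <- t.test(free, paid)
print(c(manual = t_manual, t.test = unname(fit$statistic)))
```

Both numbers should agree, which shows that the formula and the built-in test describe the same calculation.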

Code for the Hypothesis Test Boxplot and t-Test

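library(dplyr)    # filter(), mutate(), %>%
library(ggplot2)  # ggplot(), geom_boxplot()

# `data` is the App Store data frame summarized in the Data overview.
# Keep only games and label each one as Free or Paid.
games <- data %>%
  filter(prime_genre == "Games") %>%
  mutate(appType = ifelse(price == 0, "Free", "Paid"))

# Boxplot of each app's user rating, split by app type.
ggplot(games, aes(x = appType, y = user_rating, fill = appType)) +
  geom_boxplot() +
  labs(title = "User Rating of Free Games vs Paid Games", x = "App Type", y = "Average User Rating") +
  theme_minimal() +
  scale_fill_manual(values = c("Free" = "pink", "Paid" = "green"))

# Welch two-sample t-test of user rating by app type.
t_test <- t.test(user_rating ~ appType, data = games)

t_test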

User Rating of Free Games vs Paid Games

The boxplot shows that paid games have a higher median rating (4.5) than free games (4.0), with few outliers in either group.

t-test:

## 
##  Welch Two Sample t-test
## 
## data:  user_rating by appType
## t = -8.3759, df = 3859.8, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Free and group Paid is not equal to 0
## 95 percent confidence interval:
##  -0.4645133 -0.2883000
## sample estimates:
## mean in group Free mean in group Paid 
##           3.528578           3.904984

Meaning of the output:

t-Test Result: \(t = -8.3759, p < 0.0001\)

  • Since the t value is negative, free games have a lower mean rating than paid games (3.53 vs. 3.90).

  • We usually compare the p-value to a 5% significance level; because the p-value here is far smaller than that, there is strong evidence against the null hypothesis.

  • Conclusion: We reject the null hypothesis.

    • There is a significant difference in average rating between free games and paid games.
    • Paid games have a higher average rating.
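As a quick consistency check, the reported confidence interval can be rebuilt from the other numbers in the t-test output. This is a sketch using only the rounded values printed above, so small rounding differences are expected; the variable names are chosen here for illustration.

```r
# Values copied from the t-test output reported above.
mean_free <- 3.528578
mean_paid <- 3.904984
t_stat    <- -8.3759
df        <- 3859.8

# Difference in means (Free minus Paid) and the standard error implied by t.
diff_means <- mean_free - mean_paid
se         <- diff_means / t_stat

# 95% confidence interval: estimate plus/minus critical value times SE.
half_width <- qt(0.975, df) * se
ci <- c(lower = diff_means - half_width, upper = diff_means + half_width)
print(round(ci, 4))

# Two-sided p-value implied by t and df.
p_value <- 2 * pt(-abs(t_stat), df)
print(p_value)
```

The interval lies entirely below zero, which is another way of seeing that the Free-minus-Paid difference is significantly negative at the 5% level.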

3D Plot of Paid Games

We take into account the price, the user rating, and the total rating count to see whether there is a correlation among the three variables:

So Which Paid Games Had the Most Ratings in 2017, and What Were Their Ratings and Prices?
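One way to put numbers on such a relationship is a pairwise correlation matrix. The sketch below uses only the ten paid games listed in this report's top-ten table, so it is not representative of all paid games; the choice of Spearman rank correlation is an assumption made here because rating counts are heavily skewed.

```r
# The ten paid games from this report's top-ten table (price,
# total rating count, user rating). Ten rows only: illustrative,
# not a substitute for the full set of paid games.
top_paid <- data.frame(
  price = c(1.99, 0.99, 6.99, 0.99, 0.99, 2.99, 0.99, 1.99, 0.99, 0.99),
  rating_count_tot = c(698516, 541693, 522012, 426463, 395261,
                       360974, 326482, 266440, 219418, 213092),
  user_rating = c(4.5, 4.5, 4.5, 5.0, 4.5, 4.5, 5.0, 5.0, 4.5, 3.5)
)

# Pairwise Spearman (rank) correlations between the three variables.
cor_matrix <- cor(top_paid, method = "spearman")
print(round(cor_matrix, 2))
```

Running the same `cor()` call on every paid game in the dataset would be the natural next step before drawing any conclusion.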

##                   track_name price rating_count_tot user_rating
## 1        Fruit Ninja Classic  1.99           698516         4.5
## 2         Clear Vision (17+)  0.99           541693         4.5
## 3  Minecraft: Pocket Edition  6.99           522012         4.5
## 4         Plants vs. Zombies  0.99           426463         5.0
## 5                Doodle Jump  0.99           395261         4.5
## 6             Draw Something  2.99           360974         4.5
## 7             Infinity Blade  0.99           326482         5.0
## 8              Geometry Dash  1.99           266440         5.0
## 9                 Tiny Wings  0.99           219418         4.5
## 10              Traffic Rush  0.99           213092         3.5
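A ranking like the one above can be produced by keeping paid apps and sorting by `rating_count_tot`. This is a minimal base-R sketch, not necessarily the code used for the table: it uses four rows copied from the table plus one hypothetical free game (made up here) to show that the price filter removes it.

```r
# Four rows copied from the table above, plus one HYPOTHETICAL free game
# (made up for illustration) that the price filter should drop.
games <- data.frame(
  track_name = c("Doodle Jump", "Fruit Ninja Classic",
                 "Hypothetical Free Game", "Tiny Wings",
                 "Minecraft: Pocket Edition"),
  price = c(0.99, 1.99, 0.00, 0.99, 6.99),
  rating_count_tot = c(395261, 698516, 900000, 219418, 522012),
  user_rating = c(4.5, 4.5, 4.0, 4.5, 4.5),
  stringsAsFactors = FALSE
)

# Keep paid games only, then sort by total rating count, largest first.
paid   <- games[games$price > 0, ]
ranked <- paid[order(paid$rating_count_tot, decreasing = TRUE), ]

# Take the top three and reset the row names for clean printing.
top_n <- head(ranked, 3)
rownames(top_n) <- NULL
print(top_n)
```

With the full Games subset, replacing `3` with `10` in `head()` would give a ten-row listing in the same shape as the table.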

Why these games?

Many of these highly rated, heavily reviewed games have a few things in common:

  • Constant maintenance checks for bugs

  • Consistent updates with new features

    • Some of these features are free and some are paid content.


Thank you for listening!
Questions?