Suman Ghorai

11/04/2019

========================================================

Introduction to the Dataset

In this document, we explore the top trending apps in the iOS App Store. The dataset contains details of more than 7,000 Apple iOS mobile applications, extracted from the iTunes Search API on the Apple Inc. website.

Univariate Plots Section

## [1] 7197   17
## 'data.frame':    7197 obs. of  17 variables:
##  $ X               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ id              : int  281656475 281796108 281940292 282614216 282935706 283619399 283646709 284035177 284666222 284736660 ...
##  $ track_name      : Factor w/ 7195 levels "-The 穴通し3D- 君の記憶力x反射神経を問う! ~Mr.CURVEからの挑戦状 ~",..: 4747 2617 6946 2479 1169 5572 4820 4766 4823 4357 ...
##  $ size_bytes      : num  1.01e+08 1.59e+08 1.01e+08 1.29e+08 9.28e+07 ...
##  $ currency        : Factor w/ 1 level "USD": 1 1 1 1 1 1 1 1 1 1 ...
##  $ price           : num  3.99 0 0 0 0 0.99 0 0 9.99 3.99 ...
##  $ rating_count_tot: int  21292 161065 188583 262241 985920 8253 119487 1126879 1117 7885 ...
##  $ rating_count_ver: int  26 26 2822 649 5320 5516 879 3594 4 40 ...
##  $ user_rating     : num  4 4 3.5 4 4.5 4 4 4 4.5 4 ...
##  $ user_rating_ver : num  4.5 3.5 4.5 4.5 5 4 4.5 4.5 5 4 ...
##  $ ver             : Factor w/ 1590 levels "0.0.15","0.13",..: 1380 1515 1211 1237 1473 455 1360 1523 1009 1068 ...
##  $ cont_rating     : Factor w/ 4 levels "12+","17+","4+",..: 3 3 3 1 3 3 3 1 3 3 ...
##  $ prime_genre     : Factor w/ 23 levels "Book","Business",..: 8 16 23 18 17 8 6 12 22 8 ...
##  $ sup_devices.num : int  38 37 37 37 37 47 37 37 37 38 ...
##  $ ipadSc_urls.num : int  5 5 5 5 5 5 0 4 5 0 ...
##  $ lang.num        : int  10 23 3 9 45 1 19 1 1 10 ...
##  $ vpp_lic         : int  1 1 1 1 1 1 1 1 1 1 ...

From the histogram above, we see that the most common app category in our data is Games. We can investigate further by dividing the apps into two categories: Games and General Apps.

Let’s check a quick summary of free and paid apps:

##    Mode   FALSE    TRUE 
## logical    3141    4056

It would be a good idea to add a new feature indicating whether an app is free or paid.
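A minimal Python sketch of such a feature. The original R code is not shown; this assumes an app is free exactly when its price is 0, and the counts mimic R's summary() of a logical vector like the one above. The function names are hypothetical.

```python
from collections import Counter


def is_free(price: float) -> bool:
    """Assumption: an app is free when its listed price is exactly 0 USD."""
    return price == 0


def free_paid_counts(prices):
    """Count FALSE (paid) and TRUE (free) values, like R's summary() on a logical vector."""
    counts = Counter(is_free(p) for p in prices)
    return {"FALSE": counts[False], "TRUE": counts[True]}


# The first ten `price` values shown by str() above
sample_prices = [3.99, 0, 0, 0, 0, 0.99, 0, 0, 9.99, 3.99]
print(free_paid_counts(sample_prices))  # {'FALSE': 4, 'TRUE': 6}
```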

Most of the apps in our data are rated 4+ (suitable for ages 4 and up). This makes sense, since Games is the largest category in the dataset and most games are rated 4+.

We excluded apps with a rating of 0 to better investigate the dataset. Most apps are rated 4.5, and very few have an average rating of 5.
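A minimal sketch of this filtering step, assuming "0 rating" means an app that has never been rated (rating_count_tot equal to 0); filtering on user_rating > 0 instead would be the other plausible reading. The records below are hypothetical toy values, not rows from the dataset.

```python
from collections import Counter


def rating_distribution(apps):
    """Drop apps with no ratings, then count how many apps have each user_rating."""
    rated = [a for a in apps if a["rating_count_tot"] > 0]
    return dict(sorted(Counter(a["user_rating"] for a in rated).items()))


# Hypothetical toy records; the field names follow the str() output above
apps = [
    {"user_rating": 4.5, "rating_count_tot": 120},
    {"user_rating": 4.5, "rating_count_tot": 45},
    {"user_rating": 4.0, "rating_count_tot": 300},
    {"user_rating": 5.0, "rating_count_tot": 3},
    {"user_rating": 0.0, "rating_count_tot": 0},  # never rated, so excluded
]
print(rating_distribution(apps))  # {4.0: 1, 4.5: 2, 5.0: 1}
```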

## [1] "Top 10 Games based on user rating and total number of ratings"
##                                              track_name rating_count_tot
## 1105                                        Head Soccer           481564
## 303                                  Plants vs. Zombies           426463
## 3087         Sniper 3D Assassin: Shoot to Kill Gun Game           386521
## 2178                                 Geometry Dash Lite           370370
## 499                                      Infinity Blade           326482
## 1878                                      Geometry Dash           266440
## 2732                                       CSR Racing 2           257100
## 1803 Pictoword: Fun 2 Pics Guess What's the Word Trivia           186089
## 351                               Plants vs. Zombies HD           163598
## 1485                                           The Room           143908
## [1] "Top 10 General apps based on user rating and total number of ratings"
##                                              track_name rating_count_tot       prime_genre
## 811                                  Domino's Pizza USA           258624      Food & Drink
## 468                                        Flashlight Ⓞ           130450         Utilities
## 885  Pic Collage - Picture Editor & Photo Collage Maker           123433     Photo & Video
## 546    Zappos: shop shoes & clothes, fast free shipping           103655          Shopping
## 1287 Credit Karma: Free Credit Scores, Reports & Alerts           101679           Finance
## 1412 We Heart It - Fashion, wallpapers, quotes, tattoos            90414 Social Networking
## 3415  Google Photos - unlimited photo and video storage            88742     Photo & Video
## 4208       Color Therapy Adult Coloring Book for Adults            84062              Book
## 2656                 Elevate - Brain Training and Games            58092         Education
## 927  FotoRus -Camera & Photo Editor & Pic Collage Maker            32558     Photo & Video
## [1] "Top 10 apps based on total count of user rating"
##                   track_name rating_count_tot
## 17                  Facebook          2974676
## 520                Instagram          2161558
## 1347          Clash of Clans          2130805
## 708               Temple Run          1724546
## 8    Pandora - Music & Radio          1126879
## 756                Pinterest          1061624
## 5                      Bible           985920
## 1494        Candy Crush Saga           961794
## 179            Spotify Music           878563
## 276              Angry Birds           824451
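The exact ranking rule behind these top-10 lists is not shown. A plausible reading is "optionally filter the apps (for example by user rating or by game/general), then sort by rating_count_tot in descending order". A minimal sketch under that assumption, where top_n and its keep predicate are hypothetical names and the is_game flags are illustrative:

```python
import heapq


def top_n(apps, n=10, keep=lambda app: True):
    """Return the n apps with the largest rating_count_tot among those passing `keep`."""
    candidates = (app for app in apps if keep(app))
    return heapq.nlargest(n, candidates, key=lambda app: app["rating_count_tot"])


# Counts taken from the "Top 10 apps" output above; is_game flags are illustrative
apps = [
    {"track_name": "Facebook", "rating_count_tot": 2974676, "is_game": False},
    {"track_name": "Instagram", "rating_count_tot": 2161558, "is_game": False},
    {"track_name": "Clash of Clans", "rating_count_tot": 2130805, "is_game": True},
    {"track_name": "Temple Run", "rating_count_tot": 1724546, "is_game": True},
]
print([a["track_name"] for a in top_n(apps, n=2)])
# ['Facebook', 'Instagram']
print([a["track_name"] for a in top_n(apps, n=2, keep=lambda a: a["is_game"])])
# ['Clash of Clans', 'Temple Run']
```

A rating filter, such as keeping only apps with the highest user_rating, would go into the keep predicate.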

Univariate Analysis

Structure of the dataset:

There are 7,197 observations of apps in the US iOS App Store in our dataset, with 16 features (the 17th column, X, is a row index). Categorical variables: prime_genre, currency, user_rating, cont_rating, is_game, and ver (the version string, stored as a factor). Numerical variables: price, rating_count_tot, rating_count_ver, size_bytes, size_mb, user_rating_ver.

Most observations have Games as their primary genre; the rest are general apps.

Price can be divided into free and paid.

75% of apps are priced under $2, and they have an average rating of 4.5.

Main feature(s) of interest in the dataset?

The main features of interest in our dataset are app details such as price and user rating.

What other features in the dataset will help support the investigation into the feature(s) of interest?

The number of supported languages and the app size may be useful later for determining their relationship with ratings.

All prices in our dataset are in USD, so there is no need to handle any other currency for this particular dataset.

What about creating new variables in the dataset?

I created size_mb (the app size in MB) to better understand the size of each app and how it relates to user ratings. I also created an "Is free?" variable to check whether being paid or free is reflected in the user ratings. Finally, I created a new variable, is_game, to categorize each application as either a general app or a game.
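A minimal sketch of these derived variables for a single app record. It assumes 1 MB = 1024 × 1024 bytes (the original conversion factor is not shown), that "free" means a price of exactly 0, and that the genre label for games is "Games"; add_features and BYTES_PER_MB are hypothetical names.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes; use 1_000_000 for decimal MB


def add_features(app):
    """Return a copy of an app record with the derived columns size_mb, is_free, is_game."""
    out = dict(app)
    out["size_mb"] = app["size_bytes"] / BYTES_PER_MB
    out["is_free"] = app["price"] == 0
    out["is_game"] = app["prime_genre"] == "Games"  # every other genre counts as a general app
    return out


row = add_features({"size_bytes": 104_857_600, "price": 3.99, "prime_genre": "Games"})
print(row["size_mb"], row["is_free"], row["is_game"])  # 100.0 False True
```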

I subset the data to exclude apps with zero ratings, since it is not fair to include apps that have never been rated in our investigation.

GGally: Let’s get an overview of all variables together, which gives a quick look at the relationship between each variable and the rest.

ggcorrplot: A quick look at the correlations between our numeric columns.

Bivariate Plots Section

Is there a relationship between the price of an app and its mean user rating? Let’s dig into our dataset to answer this question ^^

In the plot above, I included only prices under $50 to exclude the few applications priced over $100. There are some outliers among general apps with 4-5 ratings, which suggests that a general app's user rating (quality) may be the reason for a higher price. However, this happens rarely (4% accuracy).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   1.000   5.435   8.000  75.000

Yes, the number of languages supported by an app may affect its total rating. We will check whether it affects the price in further analysis.

75% of the apps in our dataset support 8 languages or fewer, and most support only one language.
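Statements like "75% of apps support 8 languages or fewer" are read off the 3rd quartile in R's summary() output. A minimal Python sketch that reproduces those six numbers, assuming R's default quantile definition (type 7, linear interpolation between order statistics) and ignoring the rounding R applies when printing; the helper names are hypothetical.

```python
import math


def quantile7(sorted_xs, p):
    """Linear-interpolation quantile, matching R's default quantile(type = 7)."""
    h = (len(sorted_xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def r_summary(xs):
    """Min, quartiles, mean and max, in the same order as R's summary()."""
    s = sorted(xs)
    return {
        "Min.": s[0],
        "1st Qu.": quantile7(s, 0.25),
        "Median": quantile7(s, 0.5),
        "Mean": sum(s) / len(s),
        "3rd Qu.": quantile7(s, 0.75),
        "Max.": s[-1],
    }


# The first ten `lang.num` values shown by str() above
print(r_summary([10, 23, 3, 9, 45, 1, 19, 1, 1, 10]))
# {'Min.': 1, '1st Qu.': 1.5, 'Median': 9.5, 'Mean': 12.2, '3rd Qu.': 16.75, 'Max.': 45}
```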

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.00   37.00   37.00   37.36   38.00   47.00
## [1] -0.04245146

The minimum number of supported devices is 9, and the median is 37.

75% of the apps in our dataset support 38 devices or fewer, and most support 37 devices.

The correlation is weak and negative (about -0.04).
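The -0.04 figure above is presumably Pearson's r, the default method of R's cor(). A minimal, dependency-free sketch of that coefficient; the pearson name is hypothetical and the inputs below are toy values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (the default method of R's cor())."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, so r is (numerically) 1
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly inverse, so r is (numerically) -1
```

A value as close to 0 as -0.04 means almost none of the variation in one variable is linearly explained by the other (r² is below 0.002). The function raises ZeroDivisionError if either input is constant.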

##  12+  17+   4+   9+ 
## 1155  622 4433  987

Apps in all content-rating groups have a mean user rating of nearly 3. There are slight differences, but not enough to conclude that the content rating affects user ratings!

Q: Which category has the highest rating?

We see that Book apps have a high mean rating among paid apps but a much lower one among free apps. The same holds for Catalogs.
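Comparisons like this one (mean rating per genre, split by free/paid) and the later price-per-category comparison are grouped means. A minimal sketch, assuming records that carry prime_genre and the is_free flag described earlier; mean_by is a hypothetical helper and the records are toy values, not rows from the dataset.

```python
from collections import defaultdict


def mean_by(apps, value, *keys):
    """Mean of `value` for every combination of the given grouping fields."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for app in apps:
        group = tuple(app[k] for k in keys)
        sums[group] += app[value]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical toy records, not taken from the dataset
apps = [
    {"prime_genre": "Book", "is_free": False, "user_rating": 4.5},
    {"prime_genre": "Book", "is_free": False, "user_rating": 4.0},
    {"prime_genre": "Book", "is_free": True, "user_rating": 3.0},
    {"prime_genre": "Catalogs", "is_free": True, "user_rating": 4.0},
]
print(mean_by(apps, "user_rating", "prime_genre", "is_free"))
# {('Book', False): 4.25, ('Book', True): 3.0, ('Catalogs', True): 4.0}
```

Calling mean_by(apps, "price", "prime_genre") on records that include a price field would give the mean price per category.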

Q: Which category has the highest total number of ratings?

Wow! A pattern begins to appear: users rarely give feedback or ratings to paid apps, but they do for free apps!

Let’s check whether the app category may affect the price.

Something strange: Medical apps are the most expensive! So yes, the category can affect the price of an app. Shopping apps are much cheaper than other apps.

The correlation between the current-version user rating and the overall user rating is 0.7, a strong positive correlation, as the plot above shows.

Yes! As app size increases, the average user rating for the app increases as well.

Bivariate Analysis

What relationships did I observe in this part of the investigation? How did the feature(s) of interest vary with other features in the dataset?

  • When we look at the relationship between price and the total user ratings for an app, we find that users rarely give high ratings to expensive apps, especially games. Most app ratings are for apps priced under USD 50.

  • Most of the apps in this dataset are priced under $5. Let’s take a closer look at prices by content rating in further analysis.

  • We see that Book apps have a high mean rating among paid apps but a much lower one among free apps. The same holds for Catalogs. This indicates that paid book apps are worth the money: they are rated better than free ones.

  • The Medical category is the most expensive! So yes, the category can affect the price of an app.
  • On the other hand, Shopping and Finance apps are cheaper than other apps.

  • Does the average user rating depend on the app's content rating? No! Apps in all content-rating groups have a mean user rating of nearly 3. There are slight differences, but not enough to conclude that the content rating affects user ratings!

Some observations on the relationships between the other features (not the main feature(s) of interest):

  • The number of supported languages does not strongly affect the average user rating; the smoothed line stays at roughly the same user rating (4).

  • It seems that the total rating always increases with new versions!

  • What is the relationship between app size and price or user_rating? App size clearly does not affect the price of an app. Game prices stay more stable even as app size increases.

  • Is the same true for the relationship between user rating and app size? No! Average user rating increases as app size increases. It seems users give good ratings to larger apps!

What was the strongest relationship you found?

The relationship between the current-version user rating and the overall average user rating was the strongest we found so far.

Multivariate Plots Section

## [1] "Summary of total number of ratings /n"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    -Inf   1.447   2.477    -Inf   3.446   6.473
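The maximum in this summary, 6.473, is log10 of Facebook's 2,974,676 ratings, so it summarizes log10(rating_count_tot). The -Inf minimum and mean most likely come from apps with zero ratings, because log10(0) is -Inf in R. A minimal sketch of two common workarounds: drop zero counts before the transform, or use log10(count + 1). The second option is a swapped-in technique, not something this analysis does, and log10_counts is a hypothetical name.

```python
import math


def log10_counts(counts, drop_zero=True):
    """log10-transform rating counts without producing -Inf for zero counts.

    drop_zero=True removes never-rated apps (like the zero-rating filter
    described earlier); drop_zero=False uses log10(count + 1) instead.
    """
    if drop_zero:
        return [math.log10(c) for c in counts if c > 0]
    return [math.log10(c + 1) for c in counts]


print(log10_counts([0, 10, 100, 1000]))                # [1.0, 2.0, 3.0]
print(log10_counts([0, 9, 99, 999], drop_zero=False))  # [0.0, 1.0, 2.0, 3.0]
```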

Utilities are expensive apps which have low user ratings.

## [1] "Correlation between user rating and total number of ratings"
## [1] 0.08330997

Multivariate Analysis

Low-priced apps are generally rated lower: in most categories, the user rating increases as the price increases, especially for Navigation, Education, and Reference apps.


Final Plots and Summary

Plot One

Description One

  • The Medical category is the most expensive, maybe because these apps provide valuable information?
  • The category can affect the price of an app! Shopping and Finance apps are cheaper than other apps. This makes sense, because they provide services that users already pay to use.

Plot Two

Description Two

For most apps, the current version is rated better than the median rating. Developers try to publish better versions that earn better ratings. The correlation between these two variables is 0.7 (positive).

Plot Three

Description Three

  • What is the relationship between user rating, price, and app size? Average user rating increases as app size increases. It seems users give good ratings to larger apps!

  • App size clearly does not affect the price of an app. Game prices stay more stable even as app size increases.


Reflection

Our dataset contains details of more than 7,000 Apple iOS mobile applications, extracted from the iTunes Search API on the Apple Inc. website. The main goal of investigating this dataset is to determine whether app details (e.g., price, content rating, and size) affect an app's average user rating.

  1. Productivity and Music have the highest average rating among free apps.
  2. Catalogs and Shopping have the highest average rating among paid apps.
  3. Book apps have a high mean rating among paid apps but a much lower one among free apps. The same holds for Catalogs. Paid book apps are worth the money.

Top 3 apps by total number of ratings:

  1. Facebook (General App)
  2. Instagram (General App)
  3. Clash of Clans (Game)

Top apps based on user rating and total number of ratings:

  1. Domino’s Pizza USA (General App)
  2. Flashlight (General App)
  3. Head Soccer (Game)
  4. Plants vs. Zombies (Game)

Future work: Would it be possible to predict the success of an app with a model? We could define a successful app as one with a user_rating above 4. We could also look for a better way to compute a user rating that takes the total number of ratings into account.
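A sketch of both ideas under stated assumptions. "Successful" means a user_rating strictly above 4, as proposed above. The count-aware rating uses a Bayesian (shrinkage) average, which is a swapped-in technique and not something this analysis uses. Its prior_mean and prior_weight are hypothetical parameters; prior_mean would typically be the mean rating across all rated apps.

```python
def is_successful(user_rating: float, threshold: float = 4.0) -> bool:
    """Proposed label: an app is successful if its user_rating exceeds 4."""
    return user_rating > threshold


def bayesian_rating(rating: float, n_ratings: int, prior_mean: float, prior_weight: float) -> float:
    """Shrink an app's average rating toward prior_mean; apps with few ratings move the most."""
    return (n_ratings * rating + prior_weight * prior_mean) / (n_ratings + prior_weight)


print(is_successful(4.5), is_successful(4.0))  # True False
# Hypothetical numbers: a perfect 5.0 from 10 ratings vs. 4.8 from 90 ratings
print(bayesian_rating(5.0, 10, prior_mean=4.0, prior_weight=10))  # 4.5
print(bayesian_rating(4.8, 90, prior_mean=4.0, prior_weight=10))  # about 4.72
```

With the weighting, the 4.8 app backed by 90 ratings ranks above the 5.0 app backed by only 10 ratings. A plain average would rank them the other way round.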