Main feature(s) of interest in the dataset?
The main features of interest in our dataset are app details such as price and user rating.
========================================================
In this document we explore the top trending apps in the iOS App Store. The dataset contains details of more than 7,000 Apple iOS mobile applications. The data was extracted from the iTunes Search API on the Apple Inc. website.
## [1] 7197 17
## 'data.frame': 7197 obs. of 17 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ id : int 281656475 281796108 281940292 282614216 282935706 283619399 283646709 284035177 284666222 284736660 ...
## $ track_name : Factor w/ 7195 levels "-The ç©´é\200šã\201â3D- Ã¥\220âºÃ£\201®è¨\230æâ ¶åŠâºxÃ¥\217\215å°â神çµÅãââÃ¥â¢\217ã\201â ! ~Mr.CURVEã\201â¹Ã£ââ°Ã£\201®æÅâæ\210¦çж ~",..: 4747 2617 6946 2479 1169 5572 4820 4766 4823 4357 ...
## $ size_bytes : num 1.01e+08 1.59e+08 1.01e+08 1.29e+08 9.28e+07 ...
## $ currency : Factor w/ 1 level "USD": 1 1 1 1 1 1 1 1 1 1 ...
## $ price : num 3.99 0 0 0 0 0.99 0 0 9.99 3.99 ...
## $ rating_count_tot: int 21292 161065 188583 262241 985920 8253 119487 1126879 1117 7885 ...
## $ rating_count_ver: int 26 26 2822 649 5320 5516 879 3594 4 40 ...
## $ user_rating : num 4 4 3.5 4 4.5 4 4 4 4.5 4 ...
## $ user_rating_ver : num 4.5 3.5 4.5 4.5 5 4 4.5 4.5 5 4 ...
## $ ver : Factor w/ 1590 levels "0.0.15","0.13",..: 1380 1515 1211 1237 1473 455 1360 1523 1009 1068 ...
## $ cont_rating : Factor w/ 4 levels "12+","17+","4+",..: 3 3 3 1 3 3 3 1 3 3 ...
## $ prime_genre : Factor w/ 23 levels "Book","Business",..: 8 16 23 18 17 8 6 12 22 8 ...
## $ sup_devices.num : int 38 37 37 37 37 47 37 37 37 38 ...
## $ ipadSc_urls.num : int 5 5 5 5 5 5 0 4 5 0 ...
## $ lang.num : int 10 23 3 9 45 1 19 1 1 10 ...
## $ vpp_lic : int 1 1 1 1 1 1 1 1 1 1 ...
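The dimensions and structure listing above are the kind of output that `dim()` and `str()` produce. A minimal sketch, using a tiny made-up data frame in place of the real data (the file name in the comment and all toy values are assumptions, not taken from the report):

```r
# Toy stand-in for the real data. The report presumably loaded a CSV, e.g.
# apps <- read.csv("AppleStore.csv")  # file name is an assumption
apps <- data.frame(
  track_name = c("App A", "App B", "App C"),
  price = c(3.99, 0, 0.99),
  user_rating = c(4, 3.5, 4.5),
  prime_genre = c("Games", "Weather", "Games"),
  stringsAsFactors = TRUE
)

dims <- dim(apps)  # number of rows, then number of columns
print(dims)
str(apps)          # one line per column: type and first values
```

On the full dataset the same two calls would report 7197 rows and 17 columns, as shown above.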
From the histogram above, we see that the top app category in our data is Games. We can investigate further by dividing the categories into two groups: Games and General Apps.
Let’s look at a quick summary of free and paid apps.
## Mode FALSE TRUE
## logical 3141 4056
It would be a good idea to add a new feature that indicates whether an app is paid or free.
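One way such a flag could be derived is sketched below. The column name `is_free` and the toy prices are assumptions made for illustration; only `price` comes from the dataset.

```r
# Toy prices (made up); a price of 0 means the app is free
apps <- data.frame(price = c(3.99, 0, 0, 0.99, 0))

# Hypothetical column name: TRUE when the app costs nothing
apps$is_free <- apps$price == 0

# For a logical vector, summary() reports the counts of FALSE and TRUE
free_summary <- summary(apps$is_free)
print(free_summary)
```

Applied to the real data, a summary of this kind would give the FALSE/TRUE counts shown above.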
Most of our data covers applications rated 4+. That makes sense, since most of the apps in the data are games, which are usually rated 4+.
We excluded apps with a rating of 0 to better investigate the dataset. Most apps are rated 4.5, and very few have an average rating of 5.
## [1] "Top 10 Games based on user rating and total number of ratings"
## track_name rating_count_tot
## 1105 Head Soccer 481564
## 303 Plants vs. Zombies 426463
## 3087 Sniper 3D Assassin: Shoot to Kill Gun Game 386521
## 2178 Geometry Dash Lite 370370
## 499 Infinity Blade 326482
## 1878 Geometry Dash 266440
## 2732 CSR Racing 2 257100
## 1803 Pictoword: Fun 2 Pics Guess What's the Word Trivia 186089
## 351 Plants vs. Zombies HD 163598
## 1485 The Room 143908
## [1] "Top 10 General apps based on user rating and total number of ratings"
## track_name rating_count_tot
## 811 Domino's Pizza USA 258624
## 468 Flashlight âââ 130450
## 885 Pic Collage - Picture Editor & Photo Collage Maker 123433
## 546 Zappos: shop shoes & clothes, fast free shipping 103655
## 1287 Credit Karma: Free Credit Scores, Reports & Alerts 101679
## 1412 We Heart It - Fashion, wallpapers, quotes, tattoos 90414
## 3415 Google Photos - unlimited photo and video storage 88742
## 4208 Color Therapy Adult Coloring Book for Adults 84062
## 2656 Elevate - Brain Training and Games 58092
## 927 FotoRus -Camera & Photo Editor & Pic Collage Maker 32558
## prime_genre
## 811 Food & Drink
## 468 Utilities
## 885 Photo & Video
## 546 Shopping
## 1287 Finance
## 1412 Social Networking
## 3415 Photo & Video
## 4208 Book
## 2656 Education
## 927 Photo & Video
## [1] "Top 10 apps based on total count of user rating"
## track_name rating_count_tot
## 17 Facebook 2974676
## 520 Instagram 2161558
## 1347 Clash of Clans 2130805
## 708 Temple Run 1724546
## 8 Pandora - Music & Radio 1126879
## 756 Pinterest 1061624
## 5 Bible 985920
## 1494 Candy Crush Saga 961794
## 179 Spotify Music 878563
## 276 Angry Birds 824451
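Rankings like the ones above can be produced by sorting and taking the first rows. The sketch below shows one plausible way to do it on made-up data; it is not necessarily how the report built its tables.

```r
# Toy data (made up); column names follow the dataset
apps <- data.frame(
  track_name = c("A", "B", "C", "D"),
  user_rating = c(4.5, 5, 4.5, 5),
  rating_count_tot = c(900, 120, 3000, 450)
)

# Sort by rating (high to low), breaking ties by number of ratings
ranked <- apps[order(-apps$user_rating, -apps$rating_count_tot), ]

# Keep the first two rows and only the columns we want to display
top_n <- head(ranked[, c("track_name", "rating_count_tot")], 2)
print(top_n)
```

Sorting on `rating_count_tot` alone would give a ranking by total count of ratings, like the last table above.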
There are 7197 observations of apps in the US iOS App Store in our dataset, with 16 features (17 columns including the row index X). Categorical variables: prime_genre, currency, user_rating, cont_rating, ver, is_game. Numerical variables: price, rating_count_tot, size_bytes, size_mb, user_rating_ver, rating_count_ver.
The primary category of most observations is Games; the rest are general apps.
Price can be divided into Free or Paid.
The main features of interest in our dataset are app details such as price and user rating.
The number of supported languages and the app size may be useful later for explaining differences in ratings.
All prices in our dataset are in USD, so there is no need to handle any other currency for this particular dataset.
I created an app size variable in MB to better understand the size of each app and how it relates to user ratings. I also created an “Is free?” variable to see whether being paid or free is reflected in the user ratings. Finally, I created a new variable, is_game, to categorize each application as either a game or a general app.
I subset the data to exclude apps with zero ratings, since including them would distort our investigation.
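The derived variables and the subset described above could be built roughly as follows. This is a sketch on made-up rows: the 1024 * 1024 conversion factor and the use of `rating_count_tot > 0` as the filter are assumptions, since the report does not show its code.

```r
# Toy rows (made up); column names follow the str() output of the dataset
apps <- data.frame(
  size_bytes = c(100788224, 158578688, 20971520),
  prime_genre = c("Games", "Productivity", "Weather"),
  rating_count_tot = c(21292, 0, 8253)
)

# Size in megabytes, assuming 1 MB = 1024 * 1024 bytes
apps$size_mb <- apps$size_bytes / (1024 * 1024)

# Game vs. general app, based on the primary genre
apps$is_game <- apps$prime_genre == "Games"

# Keep only apps that received at least one rating
rated <- subset(apps, rating_count_tot > 0)
print(rated)
```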
GGally: Let’s get an overview of all variables together, which should give us a quick look at the relationship between each variable and the rest.
ggcorrplot: A quick look at the correlations between our numeric columns.
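A correlation plot starts from a correlation matrix, which base R can compute with `cor()`. The sketch below uses made-up values for three of the numeric columns; the choice of columns is an assumption.

```r
# Toy numeric columns (made-up values)
apps <- data.frame(
  price = c(0, 0.99, 2.99, 4.99, 0),
  user_rating = c(4, 4.5, 3.5, 5, 3),
  lang.num = c(1, 10, 3, 23, 1)
)

# Pairwise Pearson correlations between the numeric columns
corr <- round(cor(apps), 2)
print(corr)

# With the ggcorrplot package installed, this matrix could be plotted with
# ggcorrplot::ggcorrplot(corr)
```

The matrix is symmetric with ones on the diagonal, so only one triangle carries information.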
Is there a relationship between the price of an app and its mean user rating? Let’s dig into our dataset to answer this question.
In the plot above, I included only prices below $50, which excludes the few applications priced over $100. There are some outlying points among general apps rated 4 to 5, which suggests that the user rating (quality) of a general app may be a reason for a higher price. This happens only rarely, though (4% accuracy).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 1.000 5.435 8.000 75.000
Yes, the number of languages supported by an app may affect its total rating. We will check whether it also affects the price in a further analysis.
75% of the apps in our dataset support 8 languages or fewer. Most of them support only one language.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 37.00 37.00 37.36 38.00 47.00
## [1] -0.04245146
The minimum number of supported devices is 9, and the median is 37.
75% of the apps in our dataset support 38 devices or fewer. Most of them support 37 devices.
The correlation is about -0.04, which is very weak.
## 12+ 17+ 4+ 9+
## 1155 622 4433 987
Apps in every content rating group have a mean user rating of about 3. There are slight differences, but not enough to conclude that content rating affects user ratings.
Q: Which category has the highest rating?
However, paid Book apps have a high mean rating, while free Book apps have a much lower one. The same is true of Catalogs.
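A comparison like this comes down to a grouped mean. The sketch below computes the mean user rating per category, split by free and paid, on made-up data; `is_free` is a hypothetical column name and the values are illustrative only.

```r
# Toy data (made up)
apps <- data.frame(
  prime_genre = c("Book", "Book", "Book", "Games", "Games", "Games"),
  price = c(0, 2.99, 4.99, 0, 0, 0.99),
  user_rating = c(2.5, 4.5, 4, 4, 4.5, 4)
)
apps$is_free <- apps$price == 0

# Mean user rating for every combination of category and free/paid
mean_rating <- aggregate(user_rating ~ prime_genre + is_free,
                         data = apps, FUN = mean)
print(mean_rating)
```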
Q: Which category has the highest total number of ratings?
Wow! The magic begins to appear! Users rarely give feedback or ratings on paid apps, but they do on free apps!
Let’s check whether price varies by app category.
Something strange: Medical apps are the most expensive! Yes, the category can affect the price of an app. Shopping apps are priced much lower than other apps.
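To compare prices across categories, one option is to average the price within each category and sort the result. A minimal sketch on made-up prices (the numbers are illustrative, not the dataset's):

```r
# Toy data (made up)
apps <- data.frame(
  prime_genre = c("Medical", "Medical", "Shopping", "Shopping", "Games", "Games"),
  price = c(9.99, 5.99, 0, 0, 0, 1.99)
)

# Mean price per category, most expensive first
mean_price <- sort(tapply(apps$price, apps$prime_genre, mean),
                   decreasing = TRUE)
print(mean_price)
```

Since most apps are free, the mean is sensitive to a few expensive apps, so the median may be worth checking alongside it.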
The correlation between the current-version user rating and the overall user rating is 0.7, which indicates a strong positive relationship, as the plot above shows.
Yes! As app size increases, the average user rating increases as well.
When we look at the relationship between price and the total user ratings for an app, we find that users rarely give high ratings to expensive apps, especially games. Most app ratings are for apps priced below USD 50.
Most of the apps in this dataset are priced below $5. Let’s take a closer look at prices by content rating in a further analysis.
We see that paid Book apps have a high mean rating, while free Book apps have a much lower one. The opposite is true of Catalogs. This indicates that paid Book apps are worth the money: they are rated better than free ones.
On the other hand, Shopping and Finance apps are priced lower than other apps.
Does the average user rating depend on the app’s content rating? No! Apps in every content rating group have a mean user rating of about 3. There are slight differences, but not enough to conclude that content rating affects user ratings.
The number of supported languages does not strongly affect the average user rating. The smoothed line stays at roughly the same user rating level (4).
It seems that the total rating always increases with new versions!
What is the relationship between app size and price or user_rating? App size clearly does not affect the price of an app. Game prices stay more stable even as app size increases.
Is the same true of the relationship between user rating and app size? No! The average user rating increases as app size increases. It seems users give good ratings to large apps!
The relationship between the current-version user rating and the overall average user rating was the strongest we have found so far.
## [1] "Summary of total number of ratings /n"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -Inf 1.447 2.477 -Inf 3.446 6.473
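The -Inf values in this summary are consistent with taking log10 of the rating counts while some apps have zero ratings, since log10(0) is -Inf in R. The sketch below reproduces the effect on made-up counts and shows two common workarounds. Whether the report applied either of them is not known.

```r
# Toy rating counts (made up), including an app with no ratings
rating_count_tot <- c(0, 28, 300, 2974676)

log_counts <- log10(rating_count_tot)  # log10(0) is -Inf
print(summary(log_counts))             # Min. and Mean come out as -Inf

# Option 1: drop apps without ratings before taking the log
log_rated <- log10(rating_count_tot[rating_count_tot > 0])
print(summary(log_rated))

# Option 2: shift by one so that a count of zero maps to zero
log_shifted <- log10(rating_count_tot + 1)
print(summary(log_shifted))
```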
Utilities apps are expensive and have low user ratings.
## [1] "Correlation between user rating and total number of ratings"
## [1] 0.08330997
Low-priced apps are generally not rated as well; in most categories the user rating increases as the price increases, especially in Navigation, Education and Reference apps.
Low-priced Navigation apps are not rated well. Moderately expensive Music apps have good ratings, though.
Users seem more likely to give a good rating when they notice the app already has a good rating. That may be why the average user_rating increases as the total number of ratings increases.
Newer versions of most apps have a better rating than the median rating. Developers try to publish better versions that deserve better ratings. The correlation between these two variables is a positive 0.7.
What is the relationship between user rating, price and app size? The average user rating increases as app size increases. It seems users give good ratings to large apps!
App size clearly does not affect the price of an app. Game prices stay more stable even as app size increases.
Our dataset contains details of more than 7,000 Apple iOS mobile applications. The data was extracted from the iTunes Search API on the Apple Inc. website. The main goal of investigating this dataset is to determine whether app details (i.e., price, content rating and size) affect the average user rating.
Categories based on price? Medical apps are the most expensive, so the category can affect the price of an app.
For future work: would it be possible to predict the success of an app by building a model? We might define an app with a user_rating above 4 as successful. We might also look for a better way to calculate a rating ratio based on the total number of ratings.