“id” : App ID
“track_name”: App Name
“size_bytes”: Size (in Bytes)
“currency”: Currency Type
“price”: Price amount
“rating_count_tot”: User rating count (all versions)
“rating_count_ver”: User rating count (current version)
“user_rating” : Average user rating (all versions)
“user_rating_ver”: Average user rating (current version)
“ver” : Latest version code
“cont_rating”: Content Rating, i.e. the age group the app is suitable for. There are 4 levels: 4+, 9+, 12+, and 17+
“prime_genre”: Primary Genre
“sup_devices.num”: Number of supporting devices
“ipadSc_urls.num”: Number of screenshots shown for display (can be read as a showcase of the app's features)
“lang.num”: Number of supported languages
“vpp_lic”: Vpp Device Based Licensing Enabled
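Note: Apple's Volume Purchase Program (VPP) is a service that lets organizations enrolled in Apple VPP buy iOS apps in bulk, though not at a discounted price. It is presumably meant mainly for large purchases by businesses; this variable…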
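Of these, id and track_name carry no useful information, every app is priced in US dollars, and ver is dropped as well because version numbering is too inconsistent across developers. That leaves 12 variables, of which only prime_genre, vpp_lic, and cont_rating are categorical; the rest are continuous.
I then add a dummy variable for whether an app is paid or free. Since bytes are not a commonly used unit, size is also converted to MB.
Unfortunately, the dataset does not include download counts for the apps.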
# Dummy variable: "paid" when the price is above zero, otherwise "free"
ios$charge <- as.factor(ifelse(ios$price > 0, "paid", "free"))
ios$cont_rating <- as.factor(ios$cont_rating)
ios$prime_genre <- as.factor(ios$prime_genre)
# Convert size from bytes to MB (1 MB = 1,000,000 bytes)
ios$size_MB <- ios$size_bytes / 1000000
# Drop the first column
ios <- ios[, -1]
1. Which variables affect an app's rating?
2. Do paid apps get better ratings?
3. What is the pricing trend for most apps?
Start with a simple linear regression:
m1 <- lm(user_rating ~. ,ios)
summary(m1)
##
## Call:
## lm(formula = user_rating ~ ., data = ios)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2596 -0.4391 0.0409 0.2901 3.9984
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.7668902987 0.2108192745 3.638
## price 0.0049929616 0.0020943212 2.384
## rating_count_tot 0.0000002440 0.0000001528 1.596
## rating_count_ver 0.0000031042 0.0000029083 1.067
## user_rating_ver 0.6275645946 0.0067876684 92.457
## cont_rating17+ -0.0798582462 0.0491942711 -1.623
## cont_rating4+ -0.0712391264 0.0328462937 -2.169
## cont_rating9+ -0.0563561783 0.0420943357 -1.339
## prime_genreBusiness 0.4109398427 0.1557158530 2.639
## prime_genreCatalogs -0.3681081682 0.3135893167 -1.174
## prime_genreEducation 0.2520276155 0.1015278675 2.482
## prime_genreEntertainment 0.2586697509 0.0989807420 2.613
## prime_genreFinance 0.3419748904 0.1298366861 2.634
## prime_genreFood & Drink 0.5565562638 0.1498499921 3.714
## prime_genreGames 0.2327540903 0.0919633153 2.531
## prime_genreHealth & Fitness 0.4215205607 0.1152081397 3.659
## prime_genreLifestyle 0.3167339549 0.1201637200 2.636
## prime_genreMedical 0.0772279290 0.2184458334 0.354
## prime_genreMusic 0.3328295577 0.1218829074 2.731
## prime_genreNavigation 0.1715624129 0.1668457855 1.028
## prime_genreNews 0.4599554260 0.1423885659 3.230
## prime_genrePhoto & Video 0.4173457055 0.1042918353 4.002
## prime_genreProductivity 0.4019609813 0.1161857042 3.460
## prime_genreReference 0.2527691465 0.1493553744 1.692
## prime_genreShopping 0.8014607497 0.1249376249 6.415
## prime_genreSocial Networking 0.3144874388 0.1176587121 2.673
## prime_genreSports 0.2520433566 0.1267822200 1.988
## prime_genreTravel 0.4681812351 0.1390352061 3.367
## prime_genreUtilities 0.3134406837 0.1087195208 2.883
## prime_genreWeather 0.3777386394 0.1441724645 2.620
## sup_devices.num -0.0063791234 0.0031498794 -2.025
## ipadSc_urls.num 0.0516021847 0.0064638920 7.983
## lang.num 0.0049373377 0.0015058672 3.279
## vpp_lic 0.4884489768 0.1381592055 3.535
## chargepaid 0.0545156332 0.0253416099 2.151
## size_MB -0.0000210260 0.0000348257 -0.604
## Pr(>|t|)
## (Intercept) 0.000277 ***
## price 0.017149 *
## rating_count_tot 0.110436
## rating_count_ver 0.285844
## user_rating_ver < 0.0000000000000002 ***
## cont_rating17+ 0.104564
## cont_rating4+ 0.030126 *
## cont_rating9+ 0.180676
## prime_genreBusiness 0.008332 **
## prime_genreCatalogs 0.240492
## prime_genreEducation 0.013075 *
## prime_genreEntertainment 0.008985 **
## prime_genreFinance 0.008460 **
## prime_genreFood & Drink 0.000205 ***
## prime_genreGames 0.011397 *
## prime_genreHealth & Fitness 0.000255 ***
## prime_genreLifestyle 0.008411 **
## prime_genreMedical 0.723699
## prime_genreMusic 0.006335 **
## prime_genreNavigation 0.303858
## prime_genreNews 0.001242 **
## prime_genrePhoto & Video 0.00006352040437397 ***
## prime_genreProductivity 0.000544 ***
## prime_genreReference 0.090613 .
## prime_genreShopping 0.00000000014990069 ***
## prime_genreSocial Networking 0.007537 **
## prime_genreSports 0.046849 *
## prime_genreTravel 0.000763 ***
## prime_genreUtilities 0.003951 **
## prime_genreWeather 0.008810 **
## sup_devices.num 0.042884 *
## ipadSc_urls.num 0.00000000000000165 ***
## lang.num 0.001048 **
## vpp_lic 0.000410 ***
## chargepaid 0.031491 *
## size_MB 0.546029
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9497 on 7161 degrees of freedom
## Multiple R-squared: 0.6104, Adjusted R-squared: 0.6085
## F-statistic: 320.6 on 35 and 7161 DF, p-value: < 0.00000000000000022
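The coefficient table has rows for cont_rating 17+, 4+, and 9+ but none for 12+, because R sorts factor levels as strings and treats the first level as the baseline that the other coefficients are measured against. A minimal sketch of that behaviour on a made-up data frame (`toy` and its values are hypothetical, not taken from the dataset):

```r
# Toy data, invented for illustration only
toy <- data.frame(
  rating = c(4.5, 3.0, 4.0, 2.5, 3.5, 4.0, 3.0, 5.0),
  cont_rating = factor(c("4+", "9+", "12+", "17+", "4+", "9+", "12+", "17+"))
)

# Levels are sorted as strings, so "12+" comes first and becomes the baseline
levels(toy$cont_rating)

fit_default <- lm(rating ~ cont_rating, data = toy)
names(coef(fit_default))

# relevel() switches the baseline, here to "4+"
toy$cont_rating <- relevel(toy$cont_rating, ref = "4+")
fit_releveled <- lm(rating ~ cont_rating, data = toy)
names(coef(fit_releveled))
```

After releveling, each cont_rating coefficient is read as the difference from the 4+ group rather than from the 12+ group.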
qplot(user_rating, data = ios, geom = "density",
fill = charge, alpha = I(.5),
main="Distribution of App rating",
xlab="Rating",
ylab="Density")
mean(ios$user_rating)
## [1] 3.526956
mean(ios$user_rating[ios$charge == "paid"])
## [1] 3.720949
mean(ios$user_rating[ios$charge == "free"])
## [1] 3.376726
The mean rating across all apps is 3.526956; paid apps average 3.720949 and free apps average 3.376726.
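The per-group means can also be computed in a single call by splitting on the charge factor. A small sketch on invented data (the `toy` data frame is hypothetical; with the real data the same call would presumably be applied to ios):

```r
# Toy data, invented for illustration only
toy <- data.frame(
  user_rating = c(4.5, 4.0, 3.5, 3.0, 2.5, 5.0),
  charge = factor(c("paid", "paid", "free", "free", "free", "paid"))
)

# Mean rating within each level of charge
group_means <- tapply(toy$user_rating, toy$charge, mean)
group_means

# A column that does not exist is NULL, so the selection is empty and
# mean() returns NaN instead of raising an error
empty_mean <- mean(toy$user_rating[which(toy$no_such_column == "paid")])
empty_mean
```

Splitting on the factor avoids typing the column name once per group, which is where a silent NaN from a misspelled name tends to creep in.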
# Compute the analysis of variance
res.aov <- aov(user_rating ~ charge, data = ios)
# Summary of the analysis
summary(res.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## charge 1 210 209.75 92.18 <0.0000000000000002 ***
## Residuals 7195 16371 2.28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
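The one-way ANOVA table tells the same story: if ratings reflect app quality, then the quality of paid apps is statistically significantly higher than that of free apps.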
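With only two groups, a one-way ANOVA is equivalent to a pooled-variance two-sample t-test, and its F statistic equals the squared t statistic. The sketch below shows this on simulated data (the sample sizes, means, standard deviations, and seed are arbitrary assumptions) and adds Welch's t-test as a check that does not assume equal variances:

```r
set.seed(1)
# Simulated ratings, invented for illustration only
toy <- data.frame(
  user_rating = c(rnorm(40, mean = 3.7, sd = 1), rnorm(60, mean = 3.4, sd = 1.5)),
  charge = factor(rep(c("paid", "free"), times = c(40, 60)))
)

aov_fit <- aov(user_rating ~ charge, data = toy)
f_value <- summary(aov_fit)[[1]][["F value"]][1]

# Pooled-variance t-test: its squared statistic matches the ANOVA F value
t_pooled <- t.test(user_rating ~ charge, data = toy, var.equal = TRUE)
all.equal(unname(t_pooled$statistic)^2, f_value)

# Welch's t-test drops the equal-variance assumption
t_welch <- t.test(user_rating ~ charge, data = toy)
t_welch$p.value
```

If the Welch p-value leads to the same conclusion as the ANOVA, the result does not hinge on the two groups having equal variance.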
sum(is.na(ios$price))
## [1] 0
# There are no NA values in price
# Plot the empirical CDF of price
plot(ecdf(ios$price ))
object<- table(ios$price )
barplot(log(object))
#plot(sort(unique(applestore$price)) ,log(object) )
#log(table(applestore$price ))
#qplot(price,data=applestore,geom="histogram" )
#qplot(price,data=applestore,geom="histogram",log = "y")
#plot(applestore$price, log="y", type='histogram')
Most apps are clearly free, and the price distribution appears to follow an exponential-like trend.
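One way to put numbers on this is to compute the share of free apps and to check whether the log of the count at each price falls roughly linearly with price, which is what an exponential-like decay would produce. A sketch on an invented price vector (the values are hypothetical, not the dataset's):

```r
# Toy prices, invented for illustration only
price <- c(rep(0, 50), rep(0.99, 20), rep(1.99, 12), rep(2.99, 8),
           rep(4.99, 4), rep(9.99, 2))

# Share of apps that are free
share_free <- mean(price == 0)
share_free

# Count of apps at each distinct price
counts <- table(price)
price_points <- as.numeric(names(counts))

# A negative slope of log(count) on price means counts decay as price rises
decay_fit <- lm(log(as.vector(counts)) ~ price_points)
coef(decay_fit)
```

A straight-line fit on the log scale is only a rough diagnostic; it says the counts shrink multiplicatively with price, not that prices are drawn from an exponential distribution.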