I. Variable overview:

“id” : App ID

“track_name”: App Name

“size_bytes”: Size (in Bytes)

“currency”: Currency Type

“price”: Price amount

“rating_count_tot”: User rating count (for all versions)

“rating_count_ver”: User rating count (for current version)

“user_rating”: Average user rating (for all versions)

“user_rating_ver”: Average user rating (for current version)

“ver” : Latest version code

“cont_rating”: Content rating, i.e. the age group the app is suitable for. There are four levels: 4+, 9+, 12+, and 17+

“prime_genre”: Primary Genre

“sup_devices.num”: Number of supporting devices

“ipadSc_urls.num”: Number of screenshots shown for display (can be read as a showcase of the app's features)

“lang.num”: Number of supported languages

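“vpp_lic”: VPP device-based licensing enabled

Note: Apple's Volume Purchase Program (VPP) is a service that lets organizations enrolled in it buy iOS apps in bulk, though not at a discounted price. It is presumably used mainly for large purchases by enterprises. This variable…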

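II. Variable processing

id and track_name carry no useful information, and every app is priced in US dollars, so currency is constant. ver (the version code) is also dropped because version numbering is too inconsistent across developers. That leaves 12 variables, of which only prime_genre, vpp_lic, and cont_rating are categorical; the rest are continuous.

I then add a dummy variable indicating whether the app is paid. Because bytes are not a commonly used unit, I also convert the size to MB.

Unfortunately, the dataset does not include download counts for the apps.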

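# Dummy variable: "paid" if the price is above zero, otherwise "free"
ios$charge <- as.factor(ifelse(ios$price > 0, "paid", "free"))

# Store the categorical variables as factors
ios$cont_rating <- as.factor(ios$cont_rating)
ios$prime_genre <- as.factor(ios$prime_genre)

# Convert size from bytes to MB (1 MB = 1,000,000 bytes),
# then drop the first column (the raw size_bytes)
ios$size_MB <- ios$size_bytes / 1000000
ios <- ios[, -1]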
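
As a quick sanity check, the same transformations can be tried on a tiny made-up data frame. This is only a sketch: the data frame `toy` and its values are hypothetical stand-ins for `ios`, not rows from the real dataset.

```r
# Hypothetical stand-in for the cleaned `ios` data frame (values are made up).
toy <- data.frame(
  size_bytes  = c(100788224, 158578688, 9830400),
  price       = c(3.99, 0, 0.99),
  cont_rating = c("4+", "12+", "4+"),
  stringsAsFactors = FALSE
)

# Dummy variable: "paid" when price > 0, otherwise "free".
toy$charge <- factor(ifelse(toy$price > 0, "paid", "free"))

# Store the categorical column as a factor.
toy$cont_rating <- factor(toy$cont_rating)

# Decimal megabytes (1 MB = 1e6 bytes), then drop the raw byte column.
toy$size_MB <- toy$size_bytes / 1e6
toy$size_bytes <- NULL

# Inspect the result: counts per group and the remaining columns.
print(table(toy$charge))
print(names(toy))
```

Counting the groups with `table()` right after creating the dummy is a cheap way to confirm that no app ended up in neither category.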

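III. Questions of interest

1. Which variables affect an app's rating?

2. Do paid apps get better ratings?

3. What is the pricing pattern for most apps?

1 Which variables affect an app's rating?

As a first look, fit a plain linear regression of user_rating on all the other variables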

m1 <- lm(user_rating ~ ., data = ios)
summary(m1)
## 
## Call:
## lm(formula = user_rating ~ ., data = ios)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2596 -0.4391  0.0409  0.2901  3.9984 
## 
## Coefficients:
##                                   Estimate    Std. Error t value
## (Intercept)                   0.7668902987  0.2108192745   3.638
## price                         0.0049929616  0.0020943212   2.384
## rating_count_tot              0.0000002440  0.0000001528   1.596
## rating_count_ver              0.0000031042  0.0000029083   1.067
## user_rating_ver               0.6275645946  0.0067876684  92.457
## cont_rating17+               -0.0798582462  0.0491942711  -1.623
## cont_rating4+                -0.0712391264  0.0328462937  -2.169
## cont_rating9+                -0.0563561783  0.0420943357  -1.339
## prime_genreBusiness           0.4109398427  0.1557158530   2.639
## prime_genreCatalogs          -0.3681081682  0.3135893167  -1.174
## prime_genreEducation          0.2520276155  0.1015278675   2.482
## prime_genreEntertainment      0.2586697509  0.0989807420   2.613
## prime_genreFinance            0.3419748904  0.1298366861   2.634
## prime_genreFood & Drink       0.5565562638  0.1498499921   3.714
## prime_genreGames              0.2327540903  0.0919633153   2.531
## prime_genreHealth & Fitness   0.4215205607  0.1152081397   3.659
## prime_genreLifestyle          0.3167339549  0.1201637200   2.636
## prime_genreMedical            0.0772279290  0.2184458334   0.354
## prime_genreMusic              0.3328295577  0.1218829074   2.731
## prime_genreNavigation         0.1715624129  0.1668457855   1.028
## prime_genreNews               0.4599554260  0.1423885659   3.230
## prime_genrePhoto & Video      0.4173457055  0.1042918353   4.002
## prime_genreProductivity       0.4019609813  0.1161857042   3.460
## prime_genreReference          0.2527691465  0.1493553744   1.692
## prime_genreShopping           0.8014607497  0.1249376249   6.415
## prime_genreSocial Networking  0.3144874388  0.1176587121   2.673
## prime_genreSports             0.2520433566  0.1267822200   1.988
## prime_genreTravel             0.4681812351  0.1390352061   3.367
## prime_genreUtilities          0.3134406837  0.1087195208   2.883
## prime_genreWeather            0.3777386394  0.1441724645   2.620
## sup_devices.num              -0.0063791234  0.0031498794  -2.025
## ipadSc_urls.num               0.0516021847  0.0064638920   7.983
## lang.num                      0.0049373377  0.0015058672   3.279
## vpp_lic                       0.4884489768  0.1381592055   3.535
## chargepaid                    0.0545156332  0.0253416099   2.151
## size_MB                      -0.0000210260  0.0000348257  -0.604
##                                          Pr(>|t|)    
## (Intercept)                              0.000277 ***
## price                                    0.017149 *  
## rating_count_tot                         0.110436    
## rating_count_ver                         0.285844    
## user_rating_ver              < 0.0000000000000002 ***
## cont_rating17+                           0.104564    
## cont_rating4+                            0.030126 *  
## cont_rating9+                            0.180676    
## prime_genreBusiness                      0.008332 ** 
## prime_genreCatalogs                      0.240492    
## prime_genreEducation                     0.013075 *  
## prime_genreEntertainment                 0.008985 ** 
## prime_genreFinance                       0.008460 ** 
## prime_genreFood & Drink                  0.000205 ***
## prime_genreGames                         0.011397 *  
## prime_genreHealth & Fitness              0.000255 ***
## prime_genreLifestyle                     0.008411 ** 
## prime_genreMedical                       0.723699    
## prime_genreMusic                         0.006335 ** 
## prime_genreNavigation                    0.303858    
## prime_genreNews                          0.001242 ** 
## prime_genrePhoto & Video      0.00006352040437397 ***
## prime_genreProductivity                  0.000544 ***
## prime_genreReference                     0.090613 .  
## prime_genreShopping           0.00000000014990069 ***
## prime_genreSocial Networking             0.007537 ** 
## prime_genreSports                        0.046849 *  
## prime_genreTravel                        0.000763 ***
## prime_genreUtilities                     0.003951 ** 
## prime_genreWeather                       0.008810 ** 
## sup_devices.num                          0.042884 *  
## ipadSc_urls.num               0.00000000000000165 ***
## lang.num                                 0.001048 ** 
## vpp_lic                                  0.000410 ***
## chargepaid                               0.031491 *  
## size_MB                                  0.546029    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9497 on 7161 degrees of freedom
## Multiple R-squared:  0.6104, Adjusted R-squared:  0.6085 
## F-statistic: 320.6 on 35 and 7161 DF,  p-value: < 0.00000000000000022
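
When reading the coefficient table, note that it lists cont_rating17+, cont_rating4+, and cont_rating9+ but no 12+ row: with R's default treatment coding, the first factor level is absorbed into the intercept and every other level is reported relative to it. The sketch below uses a small hypothetical factor (not the real `ios` column) to show which level R picks as the baseline and how `relevel()` could change it.

```r
# Hypothetical toy factor with the four content-rating levels from the data.
cont_rating <- factor(c("4+", "9+", "12+", "17+", "4+"))

# Levels are sorted as strings by default, so "12+" comes first
# and becomes the baseline of the regression.
print(levels(cont_rating))

# Make "4+" the baseline instead; the other levels keep their order.
cont_rating_4 <- relevel(cont_rating, ref = "4+")
print(levels(cont_rating_4))
```

The same reading applies to prime_genre: each genre coefficient is a difference from the one genre that does not appear in the table.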

2 Do paid apps get better ratings?

library(ggplot2)  # provides qplot()
qplot(user_rating, data = ios, geom = "density",
  fill = charge, alpha = I(.5),
  main="Distribution of App rating",
  xlab="Rating",
  ylab="Density")

mean(ios$user_rating)
## [1] 3.526956
mean(ios$user_rating[ios$charge == "paid"])
## [1] 3.720949
mean(ios$user_rating[ios$charge == "free"])
## [1] 3.376726

The mean rating across all apps is 3.526956; paid apps average 3.720949 and free apps average 3.376726
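
The per-group means can also be computed in a single call with `tapply()`. A minimal sketch on made-up ratings follows; `toy` is a hypothetical data frame, and only the column names `user_rating` and `charge` are taken from the analysis.

```r
# Hypothetical ratings for three paid and three free apps.
toy <- data.frame(
  user_rating = c(4.5, 3.0, 4.0, 2.5, 5.0, 3.5),
  charge      = factor(c("paid", "free", "paid", "free", "paid", "free"))
)

# Mean rating within each level of `charge`.
group_means <- tapply(toy$user_rating, toy$charge, mean)
print(group_means)
```

Because the result is indexed by the factor levels, a misspelled or missing grouping column fails loudly instead of silently producing an empty subset.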

# Compute the analysis of variance
res.aov <- aov(user_rating ~ charge, data = ios)
# Summary of the analysis
summary(res.aov)
##               Df Sum Sq Mean Sq F value              Pr(>F)    
## charge         1    210  209.75   92.18 <0.0000000000000002 ***
## Residuals   7195  16371    2.28                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

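The one-way ANOVA table tells the same story (F = 92.18, p < 2e-16): if ratings reflect an app's quality, then paid apps are of statistically significantly higher quality than free apps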
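
With only two groups, a one-way ANOVA is equivalent to a two-sample t-test that assumes equal variances: the F statistic equals the squared t statistic. The sketch below checks this on simulated data; the group sizes, means, and the data frame `toy` are assumptions for illustration only.

```r
set.seed(1)

# Simulated ratings: 30 "paid" and 40 "free" apps (made-up values).
toy <- data.frame(
  user_rating = c(rnorm(30, mean = 3.7), rnorm(40, mean = 3.4)),
  charge      = factor(rep(c("paid", "free"), times = c(30, 40)))
)

# F statistic from the one-way ANOVA.
f_value <- summary(aov(user_rating ~ charge, data = toy))[[1]][["F value"]][1]

# t statistic from the pooled-variance two-sample t-test.
t_value <- unname(
  t.test(user_rating ~ charge, data = toy, var.equal = TRUE)$statistic
)

# The two numbers should agree up to floating-point error.
print(c(F = f_value, t_squared = t_value^2))
```

If the equal-variance assumption looks doubtful, calling `t.test()` with its default `var.equal = FALSE` gives the Welch version, which no longer matches the ANOVA F exactly.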

3 What is the pricing pattern for most apps?

sum(is.na(ios$price))
## [1] 0
# There are no NAs in price,
# so we draw the empirical CDF of price directly

plot(ecdf(ios$price))

# Bar plot of the log count of apps at each distinct price
object <- table(ios$price)
barplot(log(object))

#plot(sort(unique(applestore$price)) ,log(object)     )

#log(table(applestore$price  ))
#qplot(price,data=applestore,geom="histogram"     )

#qplot(price,data=applestore,geom="histogram",log = "y")

#plot(applestore$price, log="y", type='histogram')

Most apps are clearly free, and the prices appear to follow a roughly exponential distribution
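
One rough way to probe the exponential pattern: if the number of apps decays exponentially with price, the log counts should fall close to a straight line with a negative slope. The sketch below illustrates the idea on simulated prices; the price points and probabilities are invented, and this is an informal diagnostic, not a formal goodness-of-fit test.

```r
set.seed(42)

# Hypothetical price points: mostly free, fewer apps at higher prices.
price_points <- c(0, 0.99, 1.99, 2.99, 4.99, 9.99)
toy_price <- sample(price_points, size = 2000, replace = TRUE,
                    prob = c(0.56, 0.20, 0.10, 0.07, 0.05, 0.02))

# Count apps at each distinct price and take logs, as in the bar plot.
counts <- table(toy_price)
count_df <- data.frame(
  price     = as.numeric(names(counts)),
  log_count = log(as.vector(counts))
)

# Straight-line fit of log count against price; a negative slope is
# consistent with exponential decay in the counts.
fit <- lm(log_count ~ price, data = count_df)
print(coef(fit))
```

On the real data the same steps would start from `table(ios$price)`; price points with very few apps will make the log counts noisy.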