setwd("/Users/mariaseo/Desktop/MSQM/R/mydata")
library(readr)
apps <- read_csv("/Users/mariaseo/Desktop/MSQM/R/mydata/apps.csv")
## New names:
## • `` -> `...1`
## Rows: 9659 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): App, Category, Installs, Type, Price, Content Rating, Genres, Last...
## dbl  (4): ...1, Rating, Reviews, Size
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#create subset with app categories of interest
apps_subset <- subset(apps, Category %in% c("BUSINESS", "FINANCE", "DATING",
                                            "EDUCATION", "EVENTS", "SOCIAL",
                                            "FOOD_AND_DRINK", "GAME"))
#summary
summary(apps_subset)
##       ...1            App              Category             Rating
##  Min.   :  187   Length:2429        Length:2429        Min.   :1.000
##  1st Qu.: 1667   Class :character   Class :character   1st Qu.:4.000
##  Median : 5640   Mode  :character   Mode  :character   Median :4.300
##  Mean   : 5180                                         Mean   :4.199
##  3rd Qu.: 8129                                         3rd Qu.:4.500
##  Max.   :10835                                         Max.   :5.000
##                                                        NA's   :358
##     Reviews              Size          Installs             Type
##  Min.   :       0   Min.   :  0.00   Length:2429        Length:2429
##  1st Qu.:      37   1st Qu.:  6.70   Class :character   Class :character
##  Median :    2555   Median : 19.00   Mode  :character   Mode  :character
##  Mean   :  368980   Mean   : 26.65
##  3rd Qu.:   55256   3rd Qu.: 39.00
##  Max.   :78158306   Max.   :100.00
##                     NA's   :299
##     Price           Content Rating        Genres          Last Updated
##  Length:2429        Length:2429        Length:2429        Length:2429
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##
##
##
##  Current Ver        Android Ver
##  Length:2429        Length:2429
##  Class :character   Class :character
##  Mode  :character   Mode  :character
##
##
##
##
#consolidate categories
apps_subset$Category[apps_subset$Category == "BUSINESS"] <- "Business & Finance"
apps_subset$Category[apps_subset$Category == "FINANCE"] <- "Business & Finance"
apps_subset$Category[apps_subset$Category == "DATING"] <- "Dating"
apps_subset$Category[apps_subset$Category == "EDUCATION"] <- "Education"
apps_subset$Category[apps_subset$Category == "EVENTS"] <- "Events & Social"
apps_subset$Category[apps_subset$Category == "SOCIAL"] <- "Events & Social"
apps_subset$Category[apps_subset$Category == "FOOD_AND_DRINK"] <- "Food"
apps_subset$Category[apps_subset$Category == "GAME"] <- "Game"
#generate bar chart of mean rating by category
library(ggplot2)
app_rating <- ggplot(apps_subset, aes(x=Category, y=Rating)) +
  geom_bar(fun="mean", stat="summary", fill="#d6d6d6") +
  theme(panel.background = element_blank(), axis.line = element_line(colour = "black"))
app_rating
## Warning: Removed 358 rows containing non-finite values (`stat_summary()`).
Average ratings are similar across categories, but education apps were rated highest. Game and events & social apps also rated highly. Dating apps had the lowest average rating, although the gap was small.
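The bar chart alone cannot show whether the gaps between categories are statistically meaningful. Below is a minimal sketch of one way to check, using base R's `aggregate()` and `kruskal.test()`. The `toy` data frame and its values are made up for illustration and are not taken from apps.csv; with the real data, `apps_subset` would take its place.

```r
# Illustrative data only: these ratings are made up, not from apps.csv
toy <- data.frame(
  Category = rep(c("Dating", "Education", "Game"), each = 5),
  Rating   = c(3.8, 4.0, 3.9, 4.1, NA,
               4.4, 4.5, 4.3, 4.6, 4.4,
               4.2, 4.3, 4.1, 4.4, 4.3)
)

# Mean rating per category; the formula interface drops NA rows by default
rating_means <- aggregate(Rating ~ Category, data = toy, FUN = mean)
print(rating_means)

# Kruskal-Wallis rank-sum test: do rating distributions differ by category?
# A rank-based test is chosen here because ratings are bounded between 1 and 5
rating_test <- kruskal.test(Rating ~ Category, data = toy)
print(rating_test$p.value)
```

A small p-value would suggest that at least one category's ratings differ from the others; it would not say which one.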
#generate bar chart of mean number of reviews by category
options(repr.plot.width =9, repr.plot.height =9)
app_reviews <- ggplot(apps_subset, aes(x=Category, y=Reviews)) +
  geom_bar(fun="mean", stat="summary", fill="#d6d6d6") +
  theme(panel.background = element_blank(), axis.line = element_line(colour = "black"))
app_reviews
Events & social apps received the highest average number of reviews, followed by game apps, suggesting they are the most popular. Dating apps received the lowest number of reviews.
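The summary output reports a mean of 368,980 reviews against a median of 2,555, so a few very large apps can dominate a per-category mean. Below is a minimal sketch that puts mean and median side by side for each category. The `toy` values are made up for illustration and are not from apps.csv.

```r
# Illustrative data only: review counts are made up, not from apps.csv
toy <- data.frame(
  Category = rep(c("Dating", "Game"), each = 4),
  Reviews  = c(10, 50, 200, 400,
               20, 100, 500, 1000000)
)

# Compute mean and median together for each category
review_stats <- aggregate(
  Reviews ~ Category, data = toy,
  FUN = function(x) c(mean = mean(x), median = median(x))
)

# aggregate() returns a matrix column here; flatten it into plain columns
# named Reviews.mean and Reviews.median
review_stats <- do.call(data.frame, review_stats)
print(review_stats)
```

In this toy example a single app with a million reviews pulls the Game mean far above its median, which is the pattern to watch for before reading popularity off a bar chart of means.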
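#Installs was read as character (see column specification), so convert it to numeric before averaging
apps_subset$Installs <- parse_number(apps_subset$Installs)
#generate bar chart of mean installs by category
app_download <- ggplot(apps_subset, aes(x=Category, y=Installs)) +
  geom_bar(fun="mean", stat="summary", fill="#d6d6d6") +
  theme(panel.background = element_blank(), axis.line = element_line(colour = "black"))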
app_download
Events & social apps and game apps had the highest average downloads, further suggesting that they are the most popular. Education apps were downloaded the least.
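The column specification shows that `Installs` was read as character, so a mean of installs is only meaningful once the values are numeric. Below is a minimal base R sketch of such a conversion. It assumes the strings look like "10,000+", which is an assumption about the file's format that the output above does not show, and `installs_to_number` is a hypothetical helper name. readr's `parse_number()` is another option.

```r
# Hypothetical helper: strip everything except digits, then convert to numeric
installs_to_number <- function(x) {
  # Strings with no digits become "" and therefore NA after conversion
  as.numeric(gsub("[^0-9]", "", x))
}

# Assumed example strings; the real format in apps.csv may differ
parsed <- installs_to_number(c("10,000+", "500+", "1,000,000+", "Free"))
print(parsed)
```

Entries that contain no digits come back as `NA`, so they can be excluded from the average rather than silently distorting it.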
Events & social and game apps were downloaded the most, received high ratings, and attracted the most reviews. Interestingly, business & finance apps had a relatively high number of downloads but received fewer reviews than the other categories.