knitr::opts_chunk$set(echo = TRUE)
###P1 Carry out all the data work from Activity 2 up to and including question 3. Also load the ggplot2 package, which is used in this activity. Use the A3.csv dataset.
1.1
library(data.table)
# fill = TRUE pads rows that have fewer fields than the header
app <- fread("A3.csv", fill = TRUE)
1.2
class(app)
## [1] "data.table" "data.frame"
head(app)
##                                                    app       category
## 1:     Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN
## 2:                                Coloring book moana ART_AND_DESIGN
## 3: U Launcher Lite – FREE Live Cool Themes, Hide Apps ART_AND_DESIGN
## 4:                              Sketch - Draw & Paint ART_AND_DESIGN
## 5:              Pixel Draw - Number Art Coloring Book ART_AND_DESIGN
## 6:                         Paper flowers instructions ART_AND_DESIGN
##    rating reviews size    installs type price contentrating
## 1:    4.1     159  19M     10,000+ Free     0      Everyone
## 2:    3.9     967  14M    500,000+ Free     0      Everyone
## 3:    4.7   87510 8.7M  5,000,000+ Free     0      Everyone
## 4:    4.5  215644  25M 50,000,000+ Free     0          Teen
## 5:    4.3     967 2.8M    100,000+ Free     0      Everyone
## 6:    4.4     167 5.6M     50,000+ Free     0      Everyone
##                       genres      lastupdated         currentver
## 1:              Art & Design  January 7, 2018              1.0.0
## 2: Art & Design;Pretend Play January 15, 2018              2.0.0
## 3:              Art & Design   August 1, 2018              1.2.4
## 4:              Art & Design     June 8, 2018 Varies with device
## 5:   Art & Design;Creativity    June 20, 2018                1.1
## 6:              Art & Design   March 26, 2017                1.0
##      androidver
## 1: 4.0.3 and up
## 2: 4.0.3 and up
## 3: 4.0.3 and up
## 4:   4.2 and up
## 5:   4.4 and up
## 6:   2.3 and up
names(app)
## [1] "app" "category" "rating" "reviews"
## [5] "size" "installs" "type" "price"
## [9] "contentrating" "genres" "lastupdated" "currentver"
## [13] "androidver"
# Drop rows that repeat an earlier row in every column
app <- app[!duplicated(app)]
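To see what this deduplication step keeps and drops, the sketch below applies the same `!duplicated()` filter to a small made-up table. The name `toy` and its contents are hypothetical and are not taken from A3.csv.

```r
library(data.table)

# Hypothetical toy data: the first two rows are identical in every column
toy <- data.table(
  app      = c("A", "A", "B"),
  category = c("TOOLS", "TOOLS", "GAME")
)

# duplicated() flags a row that repeats an earlier row across all columns,
# so negating it keeps only the first occurrence of each row
toy_unique <- toy[!duplicated(toy)]
nrow(toy_unique)
```

Only the second "A" row is removed; the first occurrence of each distinct row is kept.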
1.3
names(app)
## [1] "app" "category" "rating" "reviews"
## [5] "size" "installs" "type" "price"
## [9] "contentrating" "genres" "lastupdated" "currentver"
## [13] "androidver"
app1 <- app[, .(app, category, rating, reviews, installs, type, price, contentrating)]
library(ggplot2)
###P2 Create a bar chart with ggplot showing the count of each category in the dataset you built in question 1. Hint: the chart will look crowded because of the number of categories; this is expected.
# Count apps per category (.N is the number of rows in each group)
categorias <- app1[, .N, by = category]
categorias
## category N
## 1: ART_AND_DESIGN 61
## 2: AUTO_AND_VEHICLES 73
## 3: BEAUTY 42
## 4: BOOKS_AND_REFERENCE 169
## 5: BUSINESS 263
## 6: COMICS 54
## 7: COMMUNICATION 256
## 8: DATING 134
## 9: EDUCATION 118
## 10: ENTERTAINMENT 102
## 11: EVENTS 45
## 12: FINANCE 302
## 13: FOOD_AND_DRINK 94
## 14: HEALTH_AND_FITNESS 244
## 15: HOUSE_AND_HOME 62
## 16: LIBRARIES_AND_DEMO 64
## 17: LIFESTYLE 301
## 18: GAME 912
## 19: FAMILY 1608
## 20: MEDICAL 290
## 21: SOCIAL 203
## 22: SHOPPING 180
## 23: PHOTOGRAPHY 263
## 24: SPORTS 260
## 25: TRAVEL_AND_LOCAL 187
## 26: TOOLS 718
## 27: PERSONALIZATION 298
## 28: PRODUCTIVITY 301
## 29: PARENTING 50
## 30: WEATHER 72
## 31: VIDEO_PLAYERS 148
## 32: NEWS_AND_MAGAZINES 204
## 33: MAPS_AND_NAVIGATION 118
## category N
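# categorias already holds the counts, so map N to bar height (`weights` is not a ggplot2 aesthetic)
ggplot(data = categorias, aes(x = category, y = N)) + geom_col()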
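One possible way to ease the crowding mentioned in the hint is to sort the bars and flip the axes so the category labels read horizontally. This is only a sketch: `counts` is a hypothetical four-row stand-in for `categorias` (the values are copied from the table above), and the axis labels are my own choice.

```r
library(data.table)
library(ggplot2)

# Hypothetical stand-in for `categorias`, using four rows from the output above
counts <- data.table(
  category = c("FAMILY", "GAME", "TOOLS", "BEAUTY"),
  N        = c(1608, 912, 718, 42)
)

# reorder() sorts the categories by N; coord_flip() moves the labels to the y axis
p_bar <- ggplot(counts, aes(x = reorder(category, N), y = N)) +
  geom_col() +
  coord_flip() +
  labs(x = "category", y = "number of apps")
```

Printing `p_bar` draws the chart. With all 33 categories the flipped layout usually leaves enough vertical room for each label.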
###P3 Create a scatter plot with ggplot showing the relationship between the number of reviews (x axis) and the rating (y axis). Hint: see the hint for question 6 of Activity 2.
class(app1[,rating])
## [1] "numeric"
# Store the plot, then print it
P3 <- ggplot(data = app1, aes(x = reviews, y = rating)) + geom_point()
P3
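The class check above only covers `rating`. If `reviews` had been read as text (an assumption here; the source does not show its class), it would need converting to numeric before plotting. The sketch below uses made-up data in a hypothetical table `toy`. The log scale on the x axis is an optional extra, suggested because review counts in the `head()` output range from 159 to 215644.

```r
library(data.table)
library(ggplot2)

# Hypothetical toy data; reviews is deliberately stored as character
toy <- data.table(
  reviews = c("159", "967", "87510", "215644"),
  rating  = c(4.1, 3.9, 4.7, 4.5)
)

# Convert to numeric so ggplot treats the x axis as continuous
toy[, reviews := as.numeric(reviews)]

# scale_x_log10() spreads out points that would otherwise bunch near zero
p_scatter <- ggplot(toy, aes(x = reviews, y = rating)) +
  geom_point() +
  scale_x_log10()
```

Any value that cannot be parsed becomes `NA` with a warning, so it is worth counting the `NA`s after the conversion.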
###P4 Create a double histogram with ggplot showing the distribution of price for the SOCIAL and PHOTOGRAPHY categories. That is, one histogram per category, both in the same figure, as in the example diagram. Hint: review question 5 of Activity 2 and recall the facet_wrap() function covered in the last in-class workshop.
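# Keep only the two categories of interest
app_final <- app1[category %in% c("SOCIAL", "PHOTOGRAPHY")]
# The question asks for the distribution of price, so price goes on the x axis (this assumes price is numeric)
ggplot(data = app_final, aes(x = price, fill = category)) + geom_histogram() + facet_wrap(~category)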
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
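The `stat_bin()` message above suggests choosing a bin width explicitly. The sketch below does so on made-up data in a hypothetical table `toy`. Two things are assumptions and not shown in the source: that paid prices may be stored as text with a leading `$`, and that a bin width of 1 suits the data.

```r
library(data.table)
library(ggplot2)

# Hypothetical toy data; the "$" prefix is an assumed storage format
toy <- data.table(
  category = c("SOCIAL", "SOCIAL", "PHOTOGRAPHY", "PHOTOGRAPHY"),
  price    = c("0", "$2.99", "0", "$4.99")
)

# Strip the currency symbol, then convert to numeric
toy[, price := as.numeric(gsub("$", "", price, fixed = TRUE))]

# An explicit binwidth replaces the default of 30 bins; facet_wrap() draws one panel per category
p_hist <- ggplot(toy, aes(x = price, fill = category)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~category)
```

If `price` is already numeric in `app1`, the `gsub()` step is unnecessary and only the `binwidth` argument matters.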