Load the Act3.csv dataset and the packages needed to work with a data.table.
library(data.table)
Act3<-fread("Act3.csv", fill = TRUE)
Inspect the dataset and remove the apps that are duplicated.
duplicated(Act3)
Act3<-Act3[!duplicated(App)]
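As a small illustration of the deduplication step, the sketch below uses a made-up toy table (not Act3) and the `by` argument of `duplicated()` and `unique()`, so that only the App column decides what counts as a repeat. The object names `toy`, `dup_flags` and `toy_unique` are hypothetical.

```r
library(data.table)

# Hypothetical toy table standing in for Act3; the values are made up.
toy <- data.table(
  App     = c("A", "B", "A", "C"),
  Reviews = c(10, 20, 15, 30)
)

# Flag rows whose App value has already appeared earlier in the table.
dup_flags <- duplicated(toy, by = "App")
print(dup_flags)

# Keep only the first row of each App.
toy_unique <- unique(toy, by = "App")
print(toy_unique)
```

Checking `sum(dup_flags)` before dropping rows is a quick way to see how many apps are affected.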
Create a new object containing the following variables of interest: App, Category, Rating, Reviews, Installs, Type, Price, and Content Rating.
Nuevo<-Act3[, c("App","Category","Rating","Reviews","Installs","Type","Price","Content Rating"), with=FALSE]
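Grouping by every column with `.N` also returns those columns, but it appends a count column N that the exercise does not ask for. The sketch below contrasts the two approaches on a made-up toy table; `toy`, `keep`, `selected` and `counted` are hypothetical names.

```r
library(data.table)

# Hypothetical toy table; the values are made up.
toy <- data.table(
  App              = c("A", "B"),
  Category         = c("SOCIAL", "TOOLS"),
  Rating           = c(4.1, 3.9),
  Size             = c("10M", "5M"),
  `Content Rating` = c("Teen", "Everyone")
)

keep <- c("App", "Category", "Rating", "Content Rating")

# Plain column selection: the ".." prefix tells data.table to look up
# 'keep' as a variable outside the table.
selected <- toy[, ..keep]
print(names(selected))

# Counting by all columns returns the same columns plus an extra N.
counted <- toy[, .N, by = keep]
print(names(counted))
```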
For the object above, remove the category 1.9 from the Category variable.
| Category | N |
|---|---|
| 1.9 | 1 |
Nuevo<-Nuevo[Category!="1.9"]
| Category | N |
|---|---|
We can see that it has been removed.
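The check-filter-check pattern used here can be sketched on a made-up toy table. Category is treated as text, so the stray value is compared as the string "1.9"; the names `toy`, `before`, `clean` and `after` are hypothetical.

```r
library(data.table)

# Hypothetical toy table with one stray category value.
toy <- data.table(
  App      = c("A", "B", "C"),
  Category = c("SOCIAL", "1.9", "TOOLS")
)

# Count how many rows carry the stray category before filtering.
before <- toy[Category == "1.9", .N]

# Keep every row whose category is not the stray value.
clean <- toy[Category != "1.9"]

# Count again to confirm that nothing is left.
after <- clean[Category == "1.9", .N]
print(c(before = before, after = after))
```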
Create an object containing the number of observations in each category.
Numobs<-Nuevo[,.N,by=Category]
| Category | N |
|---|---|
| ART_AND_DESIGN | 64 |
| AUTO_AND_VEHICLES | 85 |
| BEAUTY | 53 |
| BOOKS_AND_REFERENCE | 222 |
| BUSINESS | 420 |
| COMICS | 56 |
| COMMUNICATION | 315 |
| DATING | 171 |
| EDUCATION | 119 |
| ENTERTAINMENT | 102 |
| EVENTS | 64 |
| FINANCE | 345 |
| FOOD_AND_DRINK | 112 |
| HEALTH_AND_FITNESS | 288 |
| HOUSE_AND_HOME | 74 |
| LIBRARIES_AND_DEMO | 84 |
| LIFESTYLE | 369 |
| GAME | 959 |
| FAMILY | 1832 |
| MEDICAL | 395 |
| SOCIAL | 239 |
| SHOPPING | 202 |
| PHOTOGRAPHY | 281 |
| SPORTS | 325 |
| TRAVEL_AND_LOCAL | 219 |
| TOOLS | 827 |
| PERSONALIZATION | 376 |
| PRODUCTIVITY | 374 |
| PARENTING | 60 |
| WEATHER | 79 |
| VIDEO_PLAYERS | 163 |
| NEWS_AND_MAGAZINES | 254 |
| MAPS_AND_NAVIGATION | 131 |
Replace the previous object so that it now contains the number of observations, the average number of reviews (Reviews), and the average Rating per category.
Promreviews<-Nuevo[,mean(as.numeric(Reviews)), by=Category]
Promrating<-Nuevo[,mean(as.numeric(Rating),na.rm=TRUE), by=Category]
Numobs<-merge(x=Numobs, y=Promreviews, by="Category")
Numobs<-merge(x=Numobs, y=Promrating, by="Category")
| Category | N | V1.x | V1.y |
|---|---|---|---|
| ART_AND_DESIGN | 64 | 22175.047 | 4.357377 |
| AUTO_AND_VEHICLES | 85 | 13690.188 | 4.190411 |
| BEAUTY | 53 | 7476.226 | 4.278571 |
| BOOKS_AND_REFERENCE | 222 | 75321.234 | 4.344970 |
| BUSINESS | 420 | 23548.202 | 4.098479 |
| COMICS | 56 | 41822.696 | 4.181482 |
| COMMUNICATION | 315 | 907337.676 | 4.121484 |
| DATING | 171 | 21190.316 | 3.970149 |
| EDUCATION | 119 | 112303.765 | 4.364407 |
| ENTERTAINMENT | 102 | 340810.294 | 4.135294 |
| EVENTS | 64 | 2515.906 | 4.435556 |
| FAMILY | 1832 | 78507.362 | 4.179664 |
| FINANCE | 345 | 36701.757 | 4.115563 |
| FOOD_AND_DRINK | 112 | 56473.464 | 4.172340 |
| GAME | 959 | 648903.763 | 4.247368 |
| HEALTH_AND_FITNESS | 288 | 74171.372 | 4.243033 |
| HOUSE_AND_HOME | 74 | 26079.014 | 4.150000 |
| LIBRARIES_AND_DEMO | 84 | 10795.607 | 4.178125 |
| LIFESTYLE | 369 | 32066.859 | 4.093356 |
| MAPS_AND_NAVIGATION | 131 | 135337.008 | 4.036441 |
| MEDICAL | 395 | 2994.863 | 4.166552 |
| NEWS_AND_MAGAZINES | 254 | 91063.890 | 4.121569 |
| PARENTING | 60 | 15972.183 | 4.300000 |
| PERSONALIZATION | 376 | 142401.809 | 4.332215 |
| PHOTOGRAPHY | 281 | 374915.552 | 4.157414 |
| PRODUCTIVITY | 374 | 148638.099 | 4.183389 |
| SHOPPING | 202 | 220553.119 | 4.230000 |
| SOCIAL | 239 | 953672.808 | 4.247291 |
| SPORTS | 325 | 108765.578 | 4.216154 |
| TOOLS | 827 | 277335.644 | 4.039554 |
| TRAVEL_AND_LOCAL | 219 | 122464.571 | 4.069519 |
| VIDEO_PLAYERS | 163 | 414015.755 | 4.044595 |
| WEATHER | 79 | 155634.987 | 4.243056 |
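The same three per-category statistics can also be computed in a single grouped call, which avoids the two merges and the auto-generated V1.x and V1.y names. This is a minimal sketch on a made-up toy table; the column names `N`, `mean_reviews` and `mean_rating` are illustrative only.

```r
library(data.table)

# Hypothetical toy table; Reviews is stored as text, as the
# as.numeric() calls in the exercise suggest.
toy <- data.table(
  Category = c("SOCIAL", "SOCIAL", "TOOLS", "TOOLS"),
  Reviews  = c("100", "300", "50", "150"),
  Rating   = c(4.0, NA, 3.0, 5.0)
)

# One pass over the data: count, mean reviews and mean rating per category.
summary_dt <- toy[, .(
  N            = .N,
  mean_reviews = mean(as.numeric(Reviews)),
  mean_rating  = mean(Rating, na.rm = TRUE)
), by = Category]

print(summary_dt)
```

Naming the columns inside `.()` also makes a later renaming step shorter.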
Rename the columns of the object from Question 6. The first column must be Categoria, the second Numero_Observaciones, the third Promedio_Comentarios, and the fourth Promedio_Rating.
names(Numobs)=c("Categoria", "Numero_Observaciones", "Promedio_Comentarios", "Promedio_Rating")
| Categoria | Numero_Observaciones | Promedio_Comentarios | Promedio_Rating |
|---|---|---|---|
| ART_AND_DESIGN | 64 | 22175.047 | 4.357377 |
| AUTO_AND_VEHICLES | 85 | 13690.188 | 4.190411 |
| BEAUTY | 53 | 7476.226 | 4.278571 |
| BOOKS_AND_REFERENCE | 222 | 75321.234 | 4.344970 |
| BUSINESS | 420 | 23548.202 | 4.098479 |
| COMICS | 56 | 41822.696 | 4.181482 |
| COMMUNICATION | 315 | 907337.676 | 4.121484 |
| DATING | 171 | 21190.316 | 3.970149 |
| EDUCATION | 119 | 112303.765 | 4.364407 |
| ENTERTAINMENT | 102 | 340810.294 | 4.135294 |
| EVENTS | 64 | 2515.906 | 4.435556 |
| FAMILY | 1832 | 78507.362 | 4.179664 |
| FINANCE | 345 | 36701.757 | 4.115563 |
| FOOD_AND_DRINK | 112 | 56473.464 | 4.172340 |
| GAME | 959 | 648903.763 | 4.247368 |
| HEALTH_AND_FITNESS | 288 | 74171.372 | 4.243033 |
| HOUSE_AND_HOME | 74 | 26079.014 | 4.150000 |
| LIBRARIES_AND_DEMO | 84 | 10795.607 | 4.178125 |
| LIFESTYLE | 369 | 32066.859 | 4.093356 |
| MAPS_AND_NAVIGATION | 131 | 135337.008 | 4.036441 |
| MEDICAL | 395 | 2994.863 | 4.166552 |
| NEWS_AND_MAGAZINES | 254 | 91063.890 | 4.121569 |
| PARENTING | 60 | 15972.183 | 4.300000 |
| PERSONALIZATION | 376 | 142401.809 | 4.332215 |
| PHOTOGRAPHY | 281 | 374915.552 | 4.157414 |
| PRODUCTIVITY | 374 | 148638.099 | 4.183389 |
| SHOPPING | 202 | 220553.119 | 4.230000 |
| SOCIAL | 239 | 953672.808 | 4.247291 |
| SPORTS | 325 | 108765.578 | 4.216154 |
| TOOLS | 827 | 277335.644 | 4.039554 |
| TRAVEL_AND_LOCAL | 219 | 122464.571 | 4.069519 |
| VIDEO_PLAYERS | 163 | 414015.755 | 4.044595 |
| WEATHER | 79 | 155634.987 | 4.243056 |
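An alternative for the renaming step is data.table's `setnames()`, which renames by reference and can map old names to new ones explicitly, so the result does not depend on column order. A minimal sketch on a made-up one-row table (`dt` is a hypothetical name):

```r
library(data.table)

# Hypothetical one-row table with the column names produced by the merges.
dt <- data.table(Category = "SOCIAL", N = 2L, V1.x = 200, V1.y = 4)

# Rename in place, pairing each old name with its new name.
setnames(
  dt,
  old = c("Category", "N", "V1.x", "V1.y"),
  new = c("Categoria", "Numero_Observaciones",
          "Promedio_Comentarios", "Promedio_Rating")
)

print(names(dt))
```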
We now want to analyze the level of competition within each category, to see how each category of apps is valued. To do this, create an object with the competition indicator for each category.
The indicator is defined as \(I_{\text{competencia}} = (\text{reviews} \times \text{rating}) / 1000\)
This indicator must be computed with the object from Question 4.
Icompet<-Nuevo[, mean((as.numeric(Reviews)*Rating/1000), na.rm=TRUE), by=Category]
| Category | V1 |
|---|---|
| ART_AND_DESIGN | 104.11229 |
| AUTO_AND_VEHICLES | 71.53203 |
| BEAUTY | 40.79644 |
| BOOKS_AND_REFERENCE | 445.11041 |
| BUSINESS | 162.33082 |
| COMICS | 190.48859 |
| COMMUNICATION | 4782.78510 |
| DATING | 112.84285 |
| EDUCATION | 521.48102 |
| ENTERTAINMENT | 1460.01230 |
| EVENTS | 14.94825 |
| FINANCE | 183.75453 |
| FOOD_AND_DRINK | 294.91254 |
| HEALTH_AND_FITNESS | 400.01601 |
| HOUSE_AND_HOME | 137.40528 |
| LIBRARIES_AND_DEMO | 57.78913 |
| LIFESTYLE | 169.66879 |
| GAME | 3039.76130 |
| FAMILY | 391.18366 |
| MEDICAL | 18.74200 |
| SOCIAL | 4789.05880 |
| SHOPPING | 1102.56610 |
| PHOTOGRAPHY | 1772.95729 |
| SPORTS | 589.64451 |
| TRAVEL_AND_LOCAL | 624.60582 |
| TOOLS | 1449.45133 |
| PERSONALIZATION | 810.04649 |
| PRODUCTIVITY | 817.55453 |
| PARENTING | 88.10854 |
| WEATHER | 753.09723 |
| VIDEO_PLAYERS | 2008.25232 |
| NEWS_AND_MAGAZINES | 487.97032 |
| MAPS_AND_NAVIGATION | 660.85370 |
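The values above are per-category averages of the per-app product reviews × rating / 1000. In general this differs from multiplying the category's average reviews by its average rating. The worked toy example below (made-up numbers, hypothetical names) shows the difference: the per-app products are 4 and 15, so their mean is 9.5, whereas the product of the means is 2000 × 4.5 / 1000 = 9.

```r
library(data.table)

# Hypothetical toy table with a single category; the values are made up.
toy <- data.table(
  Category = c("SOCIAL", "SOCIAL"),
  Reviews  = c("1000", "3000"),
  Rating   = c(4, 5)
)

# Mean of the per-app products (the approach used in the exercise).
mean_of_products <- toy[, mean(as.numeric(Reviews) * Rating / 1000, na.rm = TRUE),
                        by = Category]$V1

# Product of the per-category means (a different quantity).
product_of_means <- toy[, mean(as.numeric(Reviews)) * mean(Rating) / 1000,
                        by = Category]$V1

print(c(mean_of_products = mean_of_products,
        product_of_means = product_of_means))
```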
Rename the columns of the object from Question 8. The first column must be Categoria and the second Identificador.
names(Icompet)=c("Categoria", "Identificador")
| Categoria | Identificador |
|---|---|
| ART_AND_DESIGN | 104.11229 |
| AUTO_AND_VEHICLES | 71.53203 |
| BEAUTY | 40.79644 |
| BOOKS_AND_REFERENCE | 445.11041 |
| BUSINESS | 162.33082 |
| COMICS | 190.48859 |
| COMMUNICATION | 4782.78510 |
| DATING | 112.84285 |
| EDUCATION | 521.48102 |
| ENTERTAINMENT | 1460.01230 |
| EVENTS | 14.94825 |
| FINANCE | 183.75453 |
| FOOD_AND_DRINK | 294.91254 |
| HEALTH_AND_FITNESS | 400.01601 |
| HOUSE_AND_HOME | 137.40528 |
| LIBRARIES_AND_DEMO | 57.78913 |
| LIFESTYLE | 169.66879 |
| GAME | 3039.76130 |
| FAMILY | 391.18366 |
| MEDICAL | 18.74200 |
| SOCIAL | 4789.05880 |
| SHOPPING | 1102.56610 |
| PHOTOGRAPHY | 1772.95729 |
| SPORTS | 589.64451 |
| TRAVEL_AND_LOCAL | 624.60582 |
| TOOLS | 1449.45133 |
| PERSONALIZATION | 810.04649 |
| PRODUCTIVITY | 817.55453 |
| PARENTING | 88.10854 |
| WEATHER | 753.09723 |
| VIDEO_PLAYERS | 2008.25232 |
| NEWS_AND_MAGAZINES | 487.97032 |
| MAPS_AND_NAVIGATION | 660.85370 |
Replace the object from Question 7 by merging the object from Question 7 with the object from Question 9.
Numobs<-merge(x=Numobs, y=Icompet, by="Categoria")
| Categoria | Numero_Observaciones | Promedio_Comentarios | Promedio_Rating | Identificador |
|---|---|---|---|---|
| ART_AND_DESIGN | 64 | 22175.047 | 4.357377 | 104.11229 |
| AUTO_AND_VEHICLES | 85 | 13690.188 | 4.190411 | 71.53203 |
| BEAUTY | 53 | 7476.226 | 4.278571 | 40.79644 |
| BOOKS_AND_REFERENCE | 222 | 75321.234 | 4.344970 | 445.11041 |
| BUSINESS | 420 | 23548.202 | 4.098479 | 162.33082 |
| COMICS | 56 | 41822.696 | 4.181482 | 190.48859 |
| COMMUNICATION | 315 | 907337.676 | 4.121484 | 4782.78510 |
| DATING | 171 | 21190.316 | 3.970149 | 112.84285 |
| EDUCATION | 119 | 112303.765 | 4.364407 | 521.48102 |
| ENTERTAINMENT | 102 | 340810.294 | 4.135294 | 1460.01230 |
| EVENTS | 64 | 2515.906 | 4.435556 | 14.94825 |
| FAMILY | 1832 | 78507.362 | 4.179664 | 391.18366 |
| FINANCE | 345 | 36701.757 | 4.115563 | 183.75453 |
| FOOD_AND_DRINK | 112 | 56473.464 | 4.172340 | 294.91254 |
| GAME | 959 | 648903.763 | 4.247368 | 3039.76130 |
| HEALTH_AND_FITNESS | 288 | 74171.372 | 4.243033 | 400.01601 |
| HOUSE_AND_HOME | 74 | 26079.014 | 4.150000 | 137.40528 |
| LIBRARIES_AND_DEMO | 84 | 10795.607 | 4.178125 | 57.78913 |
| LIFESTYLE | 369 | 32066.859 | 4.093356 | 169.66879 |
| MAPS_AND_NAVIGATION | 131 | 135337.008 | 4.036441 | 660.85370 |
| MEDICAL | 395 | 2994.863 | 4.166552 | 18.74200 |
| NEWS_AND_MAGAZINES | 254 | 91063.890 | 4.121569 | 487.97032 |
| PARENTING | 60 | 15972.183 | 4.300000 | 88.10854 |
| PERSONALIZATION | 376 | 142401.809 | 4.332215 | 810.04649 |
| PHOTOGRAPHY | 281 | 374915.552 | 4.157414 | 1772.95729 |
| PRODUCTIVITY | 374 | 148638.099 | 4.183389 | 817.55453 |
| SHOPPING | 202 | 220553.119 | 4.230000 | 1102.56610 |
| SOCIAL | 239 | 953672.808 | 4.247291 | 4789.05880 |
| SPORTS | 325 | 108765.578 | 4.216154 | 589.64451 |
| TOOLS | 827 | 277335.644 | 4.039554 | 1449.45133 |
| TRAVEL_AND_LOCAL | 219 | 122464.571 | 4.069519 | 624.60582 |
| VIDEO_PLAYERS | 163 | 414015.755 | 4.044595 | 2008.25232 |
| WEATHER | 79 | 155634.987 | 4.243056 | 753.09723 |
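By default `merge()` performs an inner join, so a category present in only one of the two tables would be dropped silently. Here both tables come from the same grouping, so all 33 categories survive, but the sketch below shows the difference on made-up toy tables (`left`, `right`, `inner` and `left_join` are hypothetical names).

```r
library(data.table)

# Hypothetical toy tables; EVENTS is missing from the second one.
left <- data.table(
  Categoria            = c("SOCIAL", "TOOLS", "EVENTS"),
  Numero_Observaciones = c(2L, 3L, 1L)
)
right <- data.table(
  Categoria     = c("TOOLS", "SOCIAL"),
  Identificador = c(1.5, 9.5)
)

# Inner join (default): only categories found in both tables are kept.
inner <- merge(left, right, by = "Categoria")

# Left join: every row of 'left' is kept, with NA where 'right' has no match.
left_join <- merge(left, right, by = "Categoria", all.x = TRUE)

print(inner)
print(left_join)
```

Comparing `nrow()` before and after a merge is a cheap way to confirm that no category was lost.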
Create an object, based on the object from the previous question, that contains information only for the categories most closely related to the company's line of business (SOCIAL and PHOTOGRAPHY).
SyP<-Numobs[Categoria=="SOCIAL" | Categoria=="PHOTOGRAPHY"]
| Categoria | Numero_Observaciones | Promedio_Comentarios | Promedio_Rating | Identificador |
|---|---|---|---|---|
| PHOTOGRAPHY | 281 | 374915.6 | 4.157414 | 1772.957 |
| SOCIAL | 239 | 953672.8 | 4.247291 | 4789.059 |
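When the list of categories of interest grows, `%in%` is a more compact way to write the same filter than chaining `==` comparisons with `|`. A minimal sketch on a made-up toy table (`toy`, `wanted` and `subset_dt` are hypothetical names):

```r
library(data.table)

# Hypothetical toy table; the values are made up.
toy <- data.table(
  Categoria     = c("PHOTOGRAPHY", "SOCIAL", "TOOLS"),
  Identificador = c(1, 2, 3)
)

# Categories to keep.
wanted <- c("SOCIAL", "PHOTOGRAPHY")

# Keep the rows whose category appears in 'wanted'.
subset_dt <- toy[Categoria %in% wanted]
print(subset_dt)
```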
Produce a plot relating two variables that you consider relevant to compare. Use the object from Question 3 for this. Explain your plot.
Grafico<-Nuevo[Category %in% c("SOCIAL","PHOTOGRAPHY") & Price==0]
CR<-Grafico[,.N, by="Content Rating"]
RT<-Grafico[,mean(Rating, na.rm=TRUE) , by= "Content Rating"]
plot(x=RT$V1, y=CR$N, type="p", main="Relationship: rating and number of apps by customer segment", ylab = "Number of apps", xlab = "Average rating")
The plot shows the relationship between the number of apps and the average rating for each customer segment (Content Rating). Note that we filtered for apps in the PHOTOGRAPHY and SOCIAL categories with Price = 0, because the company wants to build a free photography app, so these are the comparable apps.
From left to right, the points represent the following customer segments:
Mature 17+: there are 48 apps in this segment, with an average rating of approximately 4.096.
Everyone 10+: there are 2 apps in this segment, with an average rating of 4.100.
Everyone: there are 334 apps in this segment, with an average rating of approximately 4.197.
Teen: there are 114 apps in this segment, with an average rating of approximately 4.286.
Based on this reading of the plot, competing in the “Everyone” segment is not very attractive, given the strong rivalry (high number of apps) and the high rating. We believe the best option for the company is to build a free photography app for the “Mature 17+” segment, which has moderate rivalry and a low rating compared with the other segments.
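Because the scatter plot has only four points, labelling each one with its segment makes it readable without the accompanying list. The sketch below rebuilds the plot from the four values reported above (rounded as stated in the text) rather than from the original data; the object name `segments` and its column names are hypothetical.

```r
library(data.table)

# Values copied from the interpretation above (rounded as reported).
segments <- data.table(
  content_rating = c("Mature 17+", "Everyone 10+", "Everyone", "Teen"),
  n_apps         = c(48L, 2L, 334L, 114L),
  mean_rating    = c(4.096, 4.100, 4.197, 4.286)
)

# Scatter plot with some head room on both axes so the labels fit.
plot(
  x = segments$mean_rating, y = segments$n_apps, type = "p",
  xlim = c(4.05, 4.35), ylim = c(0, 400),
  xlab = "Average rating", ylab = "Number of apps",
  main = "Rating and number of apps by customer segment"
)

# Write each segment name just above its point.
text(
  x = segments$mean_rating, y = segments$n_apps,
  labels = segments$content_rating, pos = 3
)
```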