The dataset I have used describes apps in the Google Play Store and comes from Kaggle. It contains the app name, category, rating, number of reviews, and similar fields. It also records whether each app is paid or free and the minimum Android version it supports.
'data.frame': 300 obs. of 13 variables:
$ App : chr "Photo Editor & Candy Camera & Grid & ScrapBook" "Coloring book moana" "U Launcher Lite – FREE Live Cool Themes, Hide Apps" "Sketch - Draw & Paint" ...
$ Category : chr "ART_AND_DESIGN" "ART_AND_DESIGN" "ART_AND_DESIGN" "ART_AND_DESIGN" ...
$ Rating : num 4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...
$ Reviews : chr "159" "967" "87510" "215644" ...
$ Size : chr "19M" "14M" "8.7M" "25M" ...
$ Installs : chr "10,000+" "500,000+" "5,000,000+" "50,000,000+" ...
$ Type : chr "Free" "Free" "Free" "Free" ...
$ Price : chr "0" "0" "0" "0" ...
$ Content.Rating: chr "Everyone" "Everyone" "Everyone" "Teen" ...
$ Genres : chr "Art & Design" "Art & Design;Pretend Play" "Art & Design" "Art & Design" ...
$ Last.Updated : chr "07-Jan-18" "15-Jan-18" "01-Aug-18" "08-Jun-18" ...
$ Current.Ver : chr "1.0.0" "2.0.0" "1.2.4" "Varies with device" ...
$ Android.Ver : chr "4.0.3 and up" "4.0.3 and up" "4.0.3 and up" "4.2 and up" ...
The output above shows the structure of the dataset, including whether each column is numeric or character.
[1] "App" "Category" "Rating" "Reviews"
[5] "Size" "Installs" "Type" "Price"
[9] "Content.Rating" "Genres" "Last.Updated" "Current.Ver"
[13] "Android.Ver"
This displays the column names of the dataset.
App Category Rating Reviews
Length:300 Length:300 Min. :3.100 Length:300
Class :character Class :character 1st Qu.:4.100 Class :character
Mode :character Mode :character Median :4.300 Mode :character
Mean :4.304
3rd Qu.:4.500
Max. :4.900
NA's :11
Size Installs Type Price
Length:300 Length:300 Length:300 Length:300
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Content.Rating Genres Last.Updated Current.Ver
Length:300 Length:300 Length:300 Length:300
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Android.Ver
Length:300
Class :character
Mode :character
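The `summary()` function reports the minimum, quartiles, median, mean, maximum, and number of missing values for numeric columns, and the length and class for character columns. These values are used later when reading the plots.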
App Category Rating
1 Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN 4.1
2 Coloring book moana ART_AND_DESIGN 3.9
3 U Launcher Lite – FREE Live Cool Themes, Hide Apps ART_AND_DESIGN 4.7
4 Sketch - Draw & Paint ART_AND_DESIGN 4.5
5 Pixel Draw - Number Art Coloring Book ART_AND_DESIGN 4.3
6 Paper flowers instructions ART_AND_DESIGN 4.4
Reviews Size Installs Type Price Content.Rating Genres
1 159 19.0 10,000+ Free 0 Everyone Art & Design
2 967 14.0 500,000+ Free 0 Everyone Art & Design;Pretend Play
3 87510 8.7 5,000,000+ Free 0 Everyone Art & Design
4 215644 25.0 50,000,000+ Free 0 Teen Art & Design
5 967 2.8 100,000+ Free 0 Everyone Art & Design;Creativity
6 167 5.6 50,000+ Free 0 Everyone Art & Design
Last.Updated Current.Ver Android.Ver
1 07-Jan-18 1.0.0 4.0.3 and up
2 15-Jan-18 2.0.0 4.0.3 and up
3 01-Aug-18 1.2.4 4.0.3 and up
4 08-Jun-18 Varies with device 4.2 and up
5 20-Jun-18 1.1 4.4 and up
6 26-Mar-17 1 2.3 and up
---
title: "Google_playstore_app"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    theme: united
    storyboard: TRUE
    source_code: embed
---
```{r setup, include=FALSE}
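library(flexdashboard)
# plyr is attached before the tidyverse so that it does not mask dplyr verbs.
library(plyr)
library(tidyverse)
library(dplyr)
library(magrittr)
library(lattice)
library(ggplot2)
library(DT)
# Adjust this path to the local copy of the Kaggle CSV.
gplay0 <- read.csv("C:/Users/HP/Downloads/googleplaystore.csv")
# Work with the first 300 rows only.
gplay <- head(gplay0, 300)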
```
-----------------------------------------------------------------------
# Introduction{.tabset}
The dataset I have used describes apps in the Google Play Store and comes from Kaggle. It contains the app name, category, rating, number of reviews, and similar fields. It also records whether each app is paid or free and the minimum Android version it supports.
## DD{.tabset}
### Structure
```{r}
str(gplay)
```
The output above shows the structure of the dataset, including whether each column is numeric or character.
### Columns
```{r}
names(gplay)
```
This displays the column names of the dataset.
### Summary
```{r}
summary(gplay)
```
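The `summary()` function reports the minimum, quartiles, median, mean, maximum, and number of missing values for numeric columns, and the length and class for character columns. These values are used later when reading the plots.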
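Note that `summary()` does not report the standard deviation. A minimal sketch of how it could be computed separately is shown below; the `rating` vector is made-up toy data standing in for `gplay$Rating`, not values from the dataset.

```r
# Toy ratings standing in for gplay$Rating (values are made up).
rating <- c(4.1, 3.9, 4.7, NA, 4.3)

# na.rm = TRUE skips the missing rating instead of returning NA.
rating_mean <- mean(rating, na.rm = TRUE)
rating_sd <- sd(rating, na.rm = TRUE)
n_missing <- sum(is.na(rating))

print(c(mean = rating_mean, sd = rating_sd, missing = n_missing))
```

On the real data the same calls would take `gplay$Rating` in place of `rating`.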
### Convert types and remove missing values
```{r}
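# Drop the trailing unit letter ("19M" -> "19"); this assumes every size is in MB.
gplay$Size <- substr(gplay$Size, 1, nchar(gplay$Size) - 1)
# Non-numeric entries such as "Varies with device" become NA here.
gplay$Size <- as.numeric(gplay$Size)
gplay$Reviews <- as.numeric(gplay$Reviews)
# Remove every row that has a missing value.
gplay <- na.omit(gplay)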
head(gplay)
```
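Removing the last character treats every size as megabytes. If some sizes were given in kilobytes (an assumption here, for example a value like "201k"), a unit-aware conversion might look like the sketch below. The helper name `parse_size_mb` and the factor of 1024 are illustrative choices, not part of the original analysis.

```r
# Hypothetical helper: convert size strings such as "19M" or "201k" to megabytes.
parse_size_mb <- function(x) {
  # Numeric part; strings without a trailing M/k (e.g. "Varies with device") become NA.
  value <- suppressWarnings(as.numeric(sub("[Mk]$", "", x)))
  # Trailing unit letter, or the whole string when there is none.
  unit <- sub("^.*([Mk])$", "\\1", x)
  # Assumption: 1 MB = 1024 kB.
  ifelse(unit == "k", value / 1024, value)
}

sizes <- c("19M", "8.7M", "1024k", "Varies with device")
print(parse_size_mb(sizes))
```

The result could then be assigned to `gplay$Size` before calling `na.omit()`.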
# Top apps
## DD{.tabset}
### Top installed apps under Beauty
```{r}
gp1 <- subset(gplay, Category == "BEAUTY", select = c(App, Installs, Rating))
# Keep the five highest-rated apps (ties may add rows).
gp1 <- slice_max(gp1, order_by = Rating, n = 5)
ggplot(gp1, aes(x = App, y = Installs, fill = Rating)) +
  geom_bar(stat = "identity") +
  labs(title = "Most installed Apps under Beauty Category", x = "Apps", y = "Installs")
```
The bar plot shows installs for the highest-rated apps in the Beauty category. Among them, the most installed app is Hush - Beauty for Everyone, with a rating of 4.7, and the least installed is ipsy: Makeup, Beauty, and Tips, with a rating of 4.9.
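The `Installs` column is stored as text such as "10,000+", so it cannot be sorted by magnitude directly. The sketch below shows one way to turn it into a number and rank apps by installs; the `apps` data frame and the `InstallsNum` column are made up for illustration.

```r
# Toy data; app names and values are placeholders, not rows from the dataset.
apps <- data.frame(
  App = c("A", "B", "C"),
  Installs = c("10,000+", "5,000,000+", "500,000+"),
  Rating = c(4.9, 4.7, 4.5),
  stringsAsFactors = FALSE
)

# Remove the thousands separators and the trailing "+", then convert to numeric.
apps$InstallsNum <- as.numeric(gsub("[,+]", "", apps$Installs))

# Two most installed apps, largest first.
top <- head(apps[order(-apps$InstallsNum), ], 2)
print(top)
```

With a numeric column like this, the bars can be ordered by install count instead of alphabetically by label.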
### Top installed apps under Art and Design
```{r}
gp2 <- subset(gplay, Category == "ART_AND_DESIGN", select = c(App, Installs, Rating))
# Keep the ten highest-rated apps (ties may add rows).
gp2 <- slice_max(gp2, order_by = Rating, n = 10)
ggplot(gp2, aes(x = App, y = Installs, fill = Rating)) +
  geom_bar(stat = "identity")
```
The bar plot shows installs for the highest-rated apps in the Art and Design category. Four apps share the highest install count, each with a rating of 4.7, and the least installed app is Canva, which also has a rating of 4.7.
# Category wise
## DD{.tabset}
### Rating with category
```{r}
ggplot(gplay, aes(x= Category, y= Rating, fill = Type)) +
  geom_bar(position = "dodge", stat = "identity")
```
This bar plot shows rating by app category, split by app type (free or paid). Only the Business category contains a paid app, with a rating of 4.7; all other categories contain only free apps. Because the bars for the apps in a category are drawn on top of one another, the visible height is the highest rating in that category, and these maxima are nearly equal, between 4.7 and 4.9.
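A grouped summary can make the comparison between categories and app types explicit. The sketch below averages the rating per category and type with base R's `aggregate()`; the `apps` data frame is toy data, so the numbers say nothing about the real dataset.

```r
# Toy data; ratings are made up.
apps <- data.frame(
  Category = c("BUSINESS", "BUSINESS", "BUSINESS", "COMICS", "COMICS"),
  Type = c("Free", "Free", "Paid", "Free", "Free"),
  Rating = c(4.0, 4.4, 4.7, 4.5, 4.9),
  stringsAsFactors = FALSE
)

# One row per Category/Type combination, holding the mean rating.
mean_rating <- aggregate(Rating ~ Category + Type, data = apps, FUN = mean)
print(mean_rating)
```

A summary table like `mean_rating` could then be passed to `ggplot()` so that each bar represents one average rather than many overlapping apps.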
### Content Rate with category
```{r}
ggplot(gplay, aes(x= Category, y=Content.Rating , fill = Rating>4.5)) +
  geom_bar(position = "dodge", stat = "identity")
```
This bar chart shows content rating by app category, with bars coloured by whether the app rating is above 4.5. Only in the Comics category do all apps have a rating above 4.5; the other categories are mixed. Business is the only category in which every app has the content rating Everyone.
### Size with category
```{r}
ggplot(gplay, aes(x= Category, y= Size, fill = Content.Rating)) +
  geom_bar(position = "dodge", stat = "identity")
```
This bar chart shows app size by category, coloured by content rating. Again, Business is the only category in which every app has the content rating Everyone; the other categories include other content ratings as well. The largest app is in Auto and Vehicles, at nearly 200 MB.
### Rating distribution by category
```{r}
histogram(~ Rating | Category, data = gplay, col = c(1, 3))
```
These panels show the rating distribution for each category. Apps in the Comics category are rated only around 4.7, about 40% of Books and Reference apps are rated 4.5, and ratings in the Beauty category range from 3.7 to 4.9.
# Histogram
## DD{.tabset}
### Histogram of size
```{r}
hist(gplay$Size,col = "green")
abline(v=mean(gplay$Size),col="black",lwd=2)
```
This histogram shows the frequency of app sizes. Most apps are between 0 and 20 MB. Only one app is about 200 MB, and a few are between 50 and 70 MB. The mean app size, marked by the vertical line, is approximately 20 MB.
### Size with rating
```{r}
# qplot() is deprecated in recent ggplot2 releases; this is the ggplot() equivalent.
ggplot(gplay, aes(x = Size, fill = Rating > 4)) + geom_histogram()
```
Most apps are between 10 and 20 MB; in that range, 53 apps have a rating above 4 and 15 apps have a rating of 4 or below. One app of about 75 MB and one of about 200 MB are outliers.
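Counts like these can also be tabulated directly instead of being read off the plot. The sketch below bins sizes with `cut()` and cross-tabulates them against a rating threshold; the vectors, the bin edges, and the threshold of 4 are illustrative assumptions.

```r
# Toy data; sizes (MB) and ratings are made up.
size <- c(5, 12, 18, 25, 75, 200)
rating <- c(4.6, 4.2, 3.9, 4.8, 4.1, 4.4)

# Size bins: (0,20], (20,50], (50,100], (100,Inf].
bins <- cut(size, breaks = c(0, 20, 50, 100, Inf))

# Rows are size bins, columns say whether the rating is above 4.
counts <- table(SizeMB = bins, Above4 = rating > 4)
print(counts)
```

On the cleaned data, `gplay$Size` and `gplay$Rating` would replace the toy vectors.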
# Scatterplot
## DD{.tabset}
### Rating with Reviews
```{r}
ggplot(gplay, aes(x=Rating, y=Reviews)) +
  geom_point(aes(color = Category)) +
  geom_smooth()
```
This plot shows reviews against rating, coloured by app category. The Books and Reference category has one outlier in the number of reviews. Apps with low ratings also tend to have few reviews, and vice versa.
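The relationship between rating and reviews can be checked numerically with a rank correlation, which is less sensitive to a single extreme review count than the default Pearson coefficient. The sketch below uses made-up vectors, so the result only illustrates the call and is not a finding about the dataset.

```r
# Toy data; values are made up and strictly increasing together.
rating <- c(3.5, 4.0, 4.2, 4.5, 4.7)
reviews <- c(120, 900, 15000, 87000, 215000)

# Spearman correlation compares ranks rather than raw values.
rho <- cor(rating, reviews, method = "spearman")
print(rho)
```

For the real data the call would be `cor(gplay$Rating, gplay$Reviews, method = "spearman")`.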
# Boxplot
## DD{.tabset}
### Category with version
```{r}
# %in% keeps every row in these categories; == would recycle the vector and drop rows.
gp3 <- subset(gplay, Category %in% c("ART_AND_DESIGN", "AUTO_AND_VEHICLES", "BEAUTY"))
ggplot(gp3,aes(x=Category,y=Rating))+geom_boxplot(aes(fill=Android.Ver),outlier.shape = 2,outlier.color = "red")+theme_classic()+coord_flip()
```
This box plot shows rating by Android version for the Art and Design, Auto and Vehicles, and Beauty categories. In the Art and Design category, most apps require version 4.2 and up, and that group has a median rating of 3.8. The Auto and Vehicles category has one outlier, the app with the lowest rating.
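When several categories are selected at once, the choice of comparison operator matters. The sketch below, on a made-up `category` vector, contrasts `==` (which recycles the right-hand vector and compares position by position) with `%in%` (which tests membership).

```r
# Toy category column; values are placeholders.
category <- c("BEAUTY", "ART_AND_DESIGN", "COMICS",
              "BEAUTY", "AUTO_AND_VEHICLES", "ART_AND_DESIGN")
wanted <- c("ART_AND_DESIGN", "AUTO_AND_VEHICLES", "BEAUTY")

# Element-wise comparison against the recycled vector: most matches are missed.
by_equal <- category == wanted

# Membership test: TRUE for every element that appears in `wanted`.
by_in <- category %in% wanted

print(sum(by_equal))
print(sum(by_in))
```

Only the `%in%` version selects all rows that belong to the three categories.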
# Content rating
## DD{.tabset}
```{r}
ggplot(gplay,aes(x=Category,fill=Content.Rating))+geom_bar(position = "fill")
```
This bar chart shows the share of each content rating within each app category. All Business apps are rated Everyone, while Art and Design is 90% Everyone, 8% Teen, and 2% Everyone 10+. The Comics category is split evenly between adult-rated and Teen-rated apps, at 50% each.
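The percentages in a filled bar chart correspond to row proportions of a category by content-rating table. The sketch below computes them with `prop.table()` on made-up data; the column names mirror the dataset, but the rows are placeholders.

```r
# Toy data; rows are made up.
apps <- data.frame(
  Category = c("BUSINESS", "BUSINESS", "COMICS", "COMICS"),
  Content.Rating = c("Everyone", "Everyone", "Teen", "Everyone"),
  stringsAsFactors = FALSE
)

# margin = 1 normalises each row, so every category sums to 1.
shares <- prop.table(table(apps$Category, apps$Content.Rating), margin = 1)
print(round(shares, 2))
```

Multiplying `shares` by 100 gives the percentages that the chart displays.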
# Inference
## DD{.tabset}
```{r}
"There are 6 categories in total in the dataset."
"In the Play Store, most apps are in the Family category and the fewest are in the Comics category."
"The Everyone content rating has the highest number of apps."
"Most apps in the Google Play Store are rated between 3.5 and 4.8."
"Apps in the Family category are the most installed."
"The top-rated app in the Beauty category is ipsy: Makeup, Beauty, and Tips."
```
# Download
```{r}
datatable(gplay,extensions='Buttons',options=list(dom="Bftrip",buttons=c('copy','print','csv','pdf')))
```