The tidyverse is a collection of R packages. Loading tidyverse attaches its core packages, including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats.
forcats and ggplot2:
For this demonstration of the tidyverse, I have selected the forcats and ggplot2 packages; dplyr is used as well. The data is the Disney movies gross income dataset from Kaggle, covering 1937-2016.
The purpose of this blog is to categorize Disney movies according to their genre. The gross income of those movies is also analyzed.
https://www.kaggle.com/rashikrahmanpritom/disney-movies-19372016-total-gross
library(dplyr)
library(forcats)
library(ggplot2)
library(kableExtra)
library(data.table)
library(tidyverse)
disney_movies_total_gross <- read.csv("https://raw.githubusercontent.com/maliat-hossain/FileProcessing/main/disney_movies_total_gross.csv")
head(disney_movies_total_gross) %>%
  kable() %>%
  kable_styling(bootstrap_options = "striped", font_size = 10) %>%
  scroll_box(height = "500px", width = "100%")
#### Only the necessary rows and columns are selected using dplyr from the tidyverse. For this assignment I am focusing on the Disney movies released from 1937 to 1961.
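```r
# Keep only the first column (the movie titles)
DisneyMovies <-
  disney_movies_total_gross %>%
  dplyr::select(1)

# Take the first 10 rows; indexing a one-column data frame this way
# drops it to a plain vector of titles
DisneyMovies1 <-
  DisneyMovies[1:10, ]
```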
The selected titles have been converted to a factor so that categories can be assigned. The movies are categorized as musical, adventure, comedy, and drama. forcats from the tidyverse works really well for manipulating categorical variables.
DisneyMovies2 <- factor(DisneyMovies1)
DisneyMovies2 %>%
  kable() %>%
  kable_styling(bootstrap_options = "striped", font_size = 10) %>%
  scroll_box(height = "500px", width = "100%")

| x |
|---|
| Snow White and the Seven Dwarfs |
| Pinocchio |
| Fantasia |
| Song of the South |
| Cinderella |
| 20,000 Leagues Under the Sea |
| Lady and the Tramp |
| Sleeping Beauty |
| 101 Dalmatians |
| The Absent Minded Professor |
DisneyMovies2 <-
  fct_recode(DisneyMovies2,
  Musical = "Snow White and the Seven Dwarfs",
  Adventure = "Pinocchio",
  Musical = "Fantasia",
  Adventure = "Song of the South",
  Drama = "Cinderella",
  Adventure = "20,000 Leagues Under the Sea",
  Drama = "Lady and the Tramp",
  Drama = "Sleeping Beauty",
  Comedy = "101 Dalmatians",
  Comedy = "The Absent Minded Professor")

The total gross income column for these movies has been added.
DisneyMovies3 <-
  disney_movies_total_gross %>%
  dplyr::select(1, 5)
DisneyMovies3 <-
  DisneyMovies3[1:10, ]

Summary statistics for the total gross revenue of these movies have been calculated.
summary(DisneyMovies3)

##  movie_title         total_gross
##  Length:10           Min.   :  9464608
##  Class :character    1st Qu.: 37400000
##  Mode  :character    Median : 83810000
##                      Mean   : 81219150
##                      3rd Qu.: 91450000
##                      Max.   :184925485
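The mean and median reported by summary() can also be computed directly, which avoids copying numbers by hand into later steps. The following is a minimal sketch, assuming the ten total_gross values listed in this post; the names gross_example and gross_stats are made up for illustration.

```r
library(dplyr)

# Total gross for the ten selected movies, copied from this post
gross_example <- data.frame(
  total_gross = c(184925485, 84300000, 83320000, 65000000, 85000000,
                  28200000, 93600000, 9464608, 153000000, 25381407)
)

# Compute the mean and median in a single summarise() call
gross_stats <- gross_example %>%
  summarise(
    mean_gross   = mean(total_gross),
    median_gross = median(total_gross)
  )

gross_stats
```

Both values should agree with the Mean and Median rows of the summary output.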
case_when from dplyr is used to bin the gross income of the movies. A variable named comparison_movies has been created, which shows whether the gross income of a selected movie is "Below Average", "Around Average", or "Above Average". The cut points come from the summary statistics: income below the mean (81,219,150) is "Below Average", income between the mean and the median (83,810,000) is "Around Average", and everything else is "Above Average".
DisneyMovies4 <-
  DisneyMovies3 %>%
  mutate(comparison_movies = case_when(
  total_gross < 81219150 ~ "Below Average",
  total_gross > 81219150 & total_gross < 83810000 ~ "Around Average",
  TRUE ~ "Above Average")) %>%
  select(movie_title, total_gross, comparison_movies)
DisneyMovies4 %>%
  kable() %>%
  kable_styling(bootstrap_options = "striped", font_size = 10) %>%
  scroll_box(height = "500px", width = "100%")

| movie_title | total_gross | comparison_movies |
|---|---|---|
| Snow White and the Seven Dwarfs | 184925485 | Above Average |
| Pinocchio | 84300000 | Above Average |
| Fantasia | 83320000 | Around Average |
| Song of the South | 65000000 | Below Average |
| Cinderella | 85000000 | Above Average |
| 20,000 Leagues Under the Sea | 28200000 | Below Average |
| Lady and the Tramp | 93600000 | Above Average |
| Sleeping Beauty | 9464608 | Below Average |
| 101 Dalmatians | 153000000 | Above Average |
| The Absent Minded Professor | 25381407 | Below Average |
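The binning in this post hardcodes the mean and median. A variant that derives the cut points from the data may be easier to reuse when the selection of movies changes. This is only a sketch under assumptions: movies_example, avg, med, and binned are illustrative names, the data is retyped from the table above, and the boundaries are treated as inclusive for the "Around Average" bin, which differs slightly from the strict comparisons used earlier.

```r
library(dplyr)

# Titles and gross values retyped from the table in this post
movies_example <- data.frame(
  movie_title = c("Snow White and the Seven Dwarfs", "Pinocchio", "Fantasia",
                  "Song of the South", "Cinderella",
                  "20,000 Leagues Under the Sea", "Lady and the Tramp",
                  "Sleeping Beauty", "101 Dalmatians",
                  "The Absent Minded Professor"),
  total_gross = c(184925485, 84300000, 83320000, 65000000, 85000000,
                  28200000, 93600000, 9464608, 153000000, 25381407)
)

# Cut points computed from the data instead of typed in by hand
avg <- mean(movies_example$total_gross)
med <- median(movies_example$total_gross)

# case_when() evaluates conditions in order, so the second branch
# only sees rows that were not already labelled "Below Average"
binned <- movies_example %>%
  mutate(comparison_movies = case_when(
    total_gross < min(avg, med)  ~ "Below Average",
    total_gross <= max(avg, med) ~ "Around Average",
    TRUE                         ~ "Above Average"
  ))

# How many movies fall into each bin
count(binned, comparison_movies)
```

Using min() and max() on the two statistics keeps the bins well defined whether the mean lies below or above the median.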
The income status of the selected movies is visualized with a bar plot. Each color represents a different income status.
ggplot(data = DisneyMovies4, aes(x = movie_title, fill = comparison_movies)) +
  geom_bar(position = "dodge") +
  coord_flip()
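Because geom_bar() with only an x aesthetic counts rows, every movie in the plot above gets a bar of length one and only the color carries information. If the bar length should show total_gross instead, one option is geom_col() combined with fct_reorder() from forcats to sort the titles by income. The sketch below is an assumption about what such a plot could look like, not the plot used in this post; plot_data and p are illustrative names, and the data is retyped from the table above.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Data retyped from the comparison table in this post
plot_data <- data.frame(
  movie_title = c("Snow White and the Seven Dwarfs", "Pinocchio", "Fantasia",
                  "Song of the South", "Cinderella",
                  "20,000 Leagues Under the Sea", "Lady and the Tramp",
                  "Sleeping Beauty", "101 Dalmatians",
                  "The Absent Minded Professor"),
  total_gross = c(184925485, 84300000, 83320000, 65000000, 85000000,
                  28200000, 93600000, 9464608, 153000000, 25381407),
  comparison_movies = c("Above Average", "Above Average", "Around Average",
                        "Below Average", "Above Average", "Below Average",
                        "Above Average", "Below Average", "Above Average",
                        "Below Average")
)

# Turn the titles into a factor whose levels are sorted by gross income
plot_data <- plot_data %>%
  mutate(movie_title = fct_reorder(movie_title, total_gross))

# geom_col() maps bar length to the y aesthetic rather than to a row count
p <- ggplot(plot_data,
            aes(x = movie_title, y = total_gross, fill = comparison_movies)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Total gross", fill = "Income status")

p
```

After coord_flip(), the lowest-grossing title sits at the bottom of the chart and the highest-grossing at the top.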