Tidyverse Recipe:
The Tidyverse is a collection of R packages. Loading it with library(tidyverse) attaches ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats.
forcats and ggplot2:
To demonstrate the Tidyverse, I have selected the forcats and ggplot2 packages from this collection; dplyr is used as well. The data is the Disney movies gross income dataset, covering 1937-2016, from Kaggle.
The purpose of this analysis is to categorize Disney movies according to their genre. The gross income of those movies is also analyzed.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(forcats)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
disney_movies_total_gross <- read.csv("https://raw.githubusercontent.com/maliat-hossain/FileProcessing/main/disney_movies_total_gross.csv")
head(disney_movies_total_gross) %>% kable() %>%
  kable_styling(bootstrap_options = "striped", font_size = 10) %>%
  scroll_box(height = "500px", width = "100%")
| movie_title | release_date | genre | mpaa_rating | total_gross | inflation_adjusted_gross |
| --- | --- | --- | --- | --- | --- |
| Snow White and the Seven Dwarfs | 12/21/1937 | Musical | G | 184925485 | 5228953251 |
| Pinocchio | 2/9/1940 | Adventure | G | 84300000 | 2188229052 |
| Fantasia | 11/13/1940 | Musical | G | 83320000 | 2187090808 |
| Song of the South | 11/12/1946 | Adventure | G | 65000000 | 1078510579 |
| Cinderella | 2/15/1950 | Drama | G | 85000000 | 920608730 |
| 20,000 Leagues Under the Sea | 12/23/1954 | Adventure | | 28200000 | 528279994 |
Only the necessary rows and columns are selected with the Tidyverse package dplyr. For this assignment I am focusing on the first ten movies in the dataset, the earliest Disney releases, beginning with Snow White and the Seven Dwarfs (1937).
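# Keep only the movie_title column (column 1), then take the first ten rows.
# Subsetting a one-column data frame with [1:10, ] drops it to a plain vector.
DisneyMovies <- disney_movies_total_gross %>% dplyr::select(1)
DisneyMovies1 <- DisneyMovies[1:10, ]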
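The selection step can also be written entirely with dplyr verbs. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the data frame `disney` is a three-row stand-in built from rows shown above, and the names `as_vector`, `first_two`, and `titles` are illustrative only.

```r
library(dplyr)

# Small stand-in data frame; the real dataset has more rows and columns.
disney <- data.frame(
  movie_title = c("Snow White and the Seven Dwarfs", "Pinocchio", "Fantasia"),
  total_gross = c(184925485, 84300000, 83320000),
  stringsAsFactors = FALSE
)

# Base-style row subsetting of a one-column data frame returns a vector.
as_vector <- (disney %>% select(1))[1:2, ]

# slice_head() keeps the data frame; pull() then extracts the column.
first_two <- disney %>% slice_head(n = 2)
titles    <- first_two %>% pull(movie_title)
```

Both routes give the same character vector of titles; the dplyr version makes the conversion from data frame to vector explicit.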
The movie titles are converted to a factor so that categories can be assigned. The movies are categorized as Musical, Adventure, Comedy, or Drama. forcats from the Tidyverse works well for manipulating categorical variables.
# Convert the title vector to a factor so its levels can be recoded.
DisneyMovies2 <- factor(DisneyMovies1)
DisneyMovies2 %>% kable() %>%
  kable_styling(bootstrap_options = "striped", font_size = 10) %>%
  scroll_box(height = "500px", width = "100%")
| x |
| --- |
| Snow White and the Seven Dwarfs |
| Pinocchio |
| Fantasia |
| Song of the South |
| Cinderella |
| 20,000 Leagues Under the Sea |
| Lady and the Tramp |
| Sleeping Beauty |
| 101 Dalmatians |
| The Absent Minded Professor |
DisneyMovies2 <- fct_recode(DisneyMovies2,
  Musical   = "Snow White and the Seven Dwarfs",
  Adventure = "Pinocchio",
  Musical   = "Fantasia",
  Adventure = "Song of the South",
  Drama     = "Cinderella",
  Adventure = "20,000 Leagues Under the Sea",
  Drama     = "Lady and the Tramp",
  Drama     = "Sleeping Beauty",
  Comedy    = "101 Dalmatians",
  Comedy    = "The Absent Minded Professor"
)
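To check a recoding like the one above, the number of movies per genre can be tabulated. This is a minimal sketch, assuming forcats is installed; the four-title vector `titles` is an illustrative subset, and the names `genre` and `genre_counts` are not part of the original analysis.

```r
library(forcats)

# Illustrative subset of the titles used above (not the full selection).
titles <- factor(c("Snow White and the Seven Dwarfs", "Pinocchio",
                   "Fantasia", "Cinderella"))

# fct_recode() takes new_level = "old_level" pairs; mapping several old
# levels to the same new name merges them into a single level.
genre <- fct_recode(titles,
  Musical   = "Snow White and the Seven Dwarfs",
  Adventure = "Pinocchio",
  Musical   = "Fantasia",
  Drama     = "Cinderella"
)

# fct_count() returns a tibble with one row per level (f) and its count (n).
genre_counts <- fct_count(genre)
genre_counts
```

Because two titles map to Musical, the four original levels collapse to three, which is what allows the titles to stand in for genres.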
The total gross income column for these movies is added.
# Keep movie_title (column 1) and total_gross (column 5) for the first ten rows.
DisneyMovies3 <- disney_movies_total_gross %>% dplyr::select(1, 5)
DisneyMovies3 <- DisneyMovies3[1:10, ]
Summary statistics for the total gross revenue of the selected Disney movies are calculated.
summary(DisneyMovies3)
## movie_title total_gross
## Length:10 Min. : 9464608
## Class :character 1st Qu.: 37400000
## Mode :character Median : 83810000
## Mean : 81219150
## 3rd Qu.: 91450000
## Max. :184925485
The income of the selected movies is visualized with a bar plot. Each color represents a different income status.
library(ggplot2)
ggplot(data = DisneyMovies4, aes(x = movie_title, fill = comparison_movies)) +
geom_bar(position = "dodge")+coord_flip()
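The code that builds `DisneyMovies4` and its `comparison_movies` column is not shown above. One plausible construction, offered as an assumption rather than as the method actually used, labels each movie by whether its total gross is above the mean of the selected movies. The labels "Above average" and "Below average" and the names `movies` and `movies_flagged` are hypothetical, and the data frame is a stand-in containing only the six rows visible in the head() output.

```r
library(dplyr)

# Stand-in for DisneyMovies3: the six rows visible in the head() output.
movies <- data.frame(
  movie_title = c("Snow White and the Seven Dwarfs", "Pinocchio", "Fantasia",
                  "Song of the South", "Cinderella",
                  "20,000 Leagues Under the Sea"),
  total_gross = c(184925485, 84300000, 83320000, 65000000, 85000000, 28200000),
  stringsAsFactors = FALSE
)

# Hypothetical reconstruction: flag each movie relative to the mean gross.
movies_flagged <- movies %>%
  mutate(comparison_movies = if_else(total_gross > mean(total_gross),
                                     "Above average", "Below average"))

movies_flagged
```

With all ten rows the mean is 81219150 (see the summary output), so the split between the two groups will differ from this six-row stand-in.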

Conclusion
The plot shows that most of the selected Disney movies, the first ten in the dataset starting from 1937, earned an above-average total gross.