The data I used is the top 100 movies from 2003 to 2022 according to IMDb. Some values are blank because the filming location and certificate rating could not be confirmed for every movie. Matnus is an example of a movie with an unconfirmed rating, since it is not well known. The graphs are bar graphs, a shaded area graph, and a scatter plot. The list of variables:
Title
Rating
Year
Month
Certificate
Run time
Directors
Filming Location
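The blank values mentioned above matter when filtering. By default `read.csv()` treats only the string `"NA"` as missing. An empty field in a text column is read as an empty string `""`, which `is.na()` does not catch. The minimal base-R sketch below shows this on a small hypothetical inline CSV (`csv_text` is made up and stands in for the real `movies.csv`):

```r
# Hypothetical three-row CSV with a blank Certificate and a blank Filming_location
csv_text <- "Title,Certificate,Filming_location
Movie A,PG-13,USA
Movie B,,Canada
Movie C,R,"

# Default reading: blank text fields become "" rather than NA
default_read <- read.csv(text = csv_text)
sum(is.na(default_read$Certificate))   # the blank is not counted as missing
sum(default_read$Certificate == "")    # it is stored as an empty string instead

# Listing "" in na.strings turns the blanks into NA, so is.na() finds them
na_read <- read.csv(text = csv_text, na.strings = c("", "NA"))
colSums(is.na(na_read))
```

If the real file is read with `na.strings = c("", "NA")`, a filter such as `filter(!is.na(Certificate))` also drops blank certificates.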
Load the required libraries:
library(tidyverse)
library(ggplot2)
library(knitr)
library(cleanrmd)
library(rmarkdown)
library(htmlTable)
library(plotly)
library(jpeg)
library(dplyr)
library(reshape2)
library(ggstream)
library(hrbrthemes)
library(tidyquant)
library(showtext)
library(tinytex)
movies <- read.csv("C:/Users/Sophi/Downloads/movies.csv")
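# Keep one row per director (distinct() keeps the first row it finds for each director)
movies <- movies %>%
distinct(Directors, .keep_all = TRUE)
# Sort by rating and keep up to three movies per director
# (after the distinct() step above, each director has only one row left)
movies <- movies %>%
arrange(desc(Rating)) %>%
group_by(Directors) %>%
slice(1:3)
# Drop the unnamed columns X, X.1, X.2 and X.3
# (read.csv() gives names like these to columns with blank headers)
movies <- movies %>%
select(-c(X, X.1, X.2, X.3))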
head(movies)
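`distinct(Directors, .keep_all = TRUE)` keeps the first row it finds for each director, which is not necessarily the highest-rated one. So the later `slice(1:3)` only ever sees one movie per director. The base-R sketch below contrasts the two orders of operations on a hypothetical five-row table (`films` and its titles are made up for illustration). It does not change the cleaning code above.

```r
# Hypothetical data: director "A" has four movies, director "B" has one
films <- data.frame(
  Title     = c("A1", "A2", "A3", "A4", "B1"),
  Rating    = c(6.0, 8.1, 7.5, 7.9, 6.8),
  Directors = c("A", "A", "A", "A", "B")
)

# Base-R equivalent of distinct(Directors, .keep_all = TRUE): first row per director
first_per_director <- films[!duplicated(films$Directors), ]

# Top three ratings per director: sort by director and descending rating,
# then number the rows within each director and keep the first three
sorted <- films[order(films$Directors, -films$Rating), ]
rank_in_group <- ave(sorted$Rating, sorted$Directors, FUN = seq_along)
top3_per_director <- sorted[rank_in_group <= 3, ]

first_per_director$Title   # one movie per director, in file order
top3_per_director$Title    # up to three best-rated movies per director
```

In dplyr terms, the second result corresponds to running `arrange(desc(Rating)) %>% group_by(Directors) %>% slice(1:3)` without the preceding `distinct(Directors, ...)` step.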
## # A tibble: 6 × 8
## # Groups: Directors [6]
## Title Rating Year Month Certificate Runtime Directors Filming_location
## <chr> <dbl> <int> <chr> <chr> <chr> <chr> <chr>
## 1 Like Stars … 8.4 2007 Dece… PG 165 Aamir Kh… India
## 2 Brother Bear 6.8 2003 Nove… G 85 Aaron Bl… Unknown
## 3 Circle 6 2015 Octo… Not Rated 87 Aaron Ha… Unknown
## 4 Greyhound 7 2020 July PG-13 91 Aaron Sc… USA
## 5 The Trial o… 7.7 2020 Octo… R 129 Aaron So… USA
## 6 Blue Is the… 7.7 2013 Octo… NC-17 180 Abdellat… France
Bar graph of the number of movies in each certificate rating, such as G, PG, PG-13, and R.
movies |>
# Drop missing and blank certificates
filter(!is.na(Certificate), Certificate != "") |>
ggplot(aes(x=Certificate, fill= Certificate))+
geom_bar()+
# Label each bar with its count; the argument is colour, not colours
geom_text(aes(label= after_stat(count)),stat = "count", vjust = 1.5, colour = "white")+
ggtitle("Number of Movies by Certificate Rating")+
theme(plot.title = element_text(margin=margin(t=40, b=30)))
#ggplot(data = movies, aes(x=Certificate, fill= Certificate))+ geom_bar(stat = "count")
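`geom_bar()` counts the rows in each category, and `after_stat(count)` labels each bar with that count. The same numbers can be checked without a plot using base R's `table()`. The sketch below assumes a short hypothetical `certificate` vector in place of the real column.

```r
# Hypothetical certificate values, including a blank and an NA
certificate <- c("PG-13", "R", "PG-13", "G", "", NA, "R", "PG-13")

# Drop missing and blank certificates before counting
known <- certificate[!is.na(certificate) & certificate != ""]

# Per-category counts, i.e. the bar heights and labels in the plot
counts <- table(known)
counts
```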
A bar graph showing which directors had the highest-rated 2022 movies filmed in the USA. Most of the 2022 movies in the data were filmed in the USA, so the graph keeps those with a rating higher than 6.4. The director with the highest rating is Joseph Kosinski, for Top Gun: Maverick. The lowest rating among the 2022 USA movies shown is 6.5, for Black Adam, directed by Jaume Collet-Serra.
movies %>%
filter(Rating > 6.4) %>%
filter(Year > 2021)%>%
filter(Filming_location == "USA") %>%
ggplot(aes(x=Directors, y=Rating, fill=Directors)) +
geom_col()+
geom_text(aes(label = Rating), vjust = -0.2)+
ggtitle("Highest-Rated 2022 Movies Filmed in the USA, by Director")+
theme(plot.title = element_text(margin=margin(t=10, b=20)))
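The highest and lowest bars described above can also be found directly instead of being read off the chart. Below is a base-R sketch on made-up rows. The titles and ratings in `films` are hypothetical and are not from the data set. It applies the same three conditions as the `filter()` calls.

```r
# Hypothetical rows standing in for movies.csv
films <- data.frame(
  Title            = c("Movie A", "Movie B", "Movie C", "Movie D", "Movie E"),
  Rating           = c(8.3, 6.5, 7.1, 6.2, 7.8),
  Year             = c(2022, 2022, 2022, 2022, 2021),
  Filming_location = c("USA", "USA", "UK", "USA", "USA")
)

# Same conditions as the pipeline: rating above 6.4, year 2022, filmed in the USA
kept <- subset(films, Rating > 6.4 & Year > 2021 & Filming_location == "USA")

# Highest- and lowest-rated movies among the rows kept
kept$Title[which.max(kept$Rating)]
kept$Title[which.min(kept$Rating)]
```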
This shaded area graph plots the rating of every movie against its release year, from 2003 to 2022. Note that it includes all the movies in the data, whatever their title, director, filming location, certificate, or run time. The shaded area reaches its highest points, a rating of 8.5, in years between 2011 and 2022. The lowest point is a rating of 1.9 in 2008, for Disaster Movie, directed by Jason Friedberg and Aaron Seltzer.
# Area graph of the rating of every movie by year (no filtering, so all movies are shown)
movies %>%
ggplot(mapping = aes(x = Year, y = Rating))+
geom_area(colour="black", fill="blue", alpha=.2)+
geom_text(aes(label = Rating), vjust = -0.2)+
ggtitle("The Rating of Movies from 2003 to 2022")+
theme(plot.title = element_text(margin=margin(t=40, b=30)))
## Warning: Removed 1 rows containing non-finite values (`stat_align()`).
## Warning: Removed 1 rows containing missing values (`geom_text()`).
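Whether `filter()` steps before `ggplot()` take effect depends on R's argument matching. Both `%>%` and `|>` insert the left-hand side as the first argument. If the call also names `data =`, the named argument is matched first, and the piped value falls into `...`, which `ggplot()` does not use. The sketch below reproduces this with a hypothetical function, `rows_plotted()`, whose leading arguments mirror `ggplot()`'s `data`, `mapping`, `...`. It does not call ggplot2 itself, and the ratings are illustrative.

```r
# Hypothetical stand-in with the same leading arguments as ggplot(data, mapping, ...);
# it only reports how many rows it received as `data`
rows_plotted <- function(data = NULL, mapping = NULL, ...) {
  nrow(data)
}

all_movies <- data.frame(Rating = c(8.5, 7.0, 6.6, 1.9))
filtered   <- all_movies[all_movies$Rating > 6.5, , drop = FALSE]

# The pipe passes `filtered` first, but `data = all_movies` is matched by name,
# so `filtered` lands in `...` and is ignored
overridden <- filtered |> rows_plotted(data = all_movies, mapping = "aes")

# Without a named `data`, the piped (filtered) data frame is used
piped <- filtered |> rows_plotted(mapping = "aes")
```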
Scatter plot showing the relation between run time and rating for 2022 movies rated above 6.5 and filmed in the USA or Canada, colored by director. The points show no clear linear pattern, which suggests a weak correlation between run time and rating. Runtime is stored as text in the CSV (the `<chr>` type in the `head()` output), so it is converted to a number before plotting. Left as text, the axis would sort the values alphabetically and place 98 minutes above 107 minutes.
movies %>%
filter(Rating > 6.5) %>%
filter(Year > 2021) %>%
filter(Filming_location %in% c("USA", "Canada")) %>%
# Runtime is a character column, so convert it to minutes for a numeric axis
ggplot(aes(x = Rating, y = as.numeric(Runtime))) +
geom_point(alpha = 1, aes(color = Directors)) +
labs(y = "Run time (minutes)") +
ggtitle("Relation between Run Time and Rating") +
theme(plot.title = element_text(margin = margin(t = 40, b = 30)))
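Two details behind the scatter plot can be checked in base R: how text run times sort, and how strong a linear relation is. The sketch below uses hypothetical `runtime_text` and `rating` vectors (made-up values, not from the data). `cor()` computes the Pearson correlation, which ranges from -1 to 1, with values near 0 indicating a weak linear relation.

```r
# Hypothetical run times stored as text, like the Runtime column
runtime_text <- c("98", "107", "85", "120")
rating       <- c(7.0, 6.8, 7.4, 6.6)

# Text sorts character by character, so "107" and "120" come before "85" and "98"
sort(runtime_text)

# Converting to numbers restores the numeric order
runtime_min <- as.numeric(runtime_text)
sort(runtime_min)

# Pearson correlation between rating and run time (between -1 and 1)
r <- cor(rating, runtime_min)
r
```

On the real data, `cor(Rating, as.numeric(Runtime), use = "complete.obs")` would put a number on the weak correlation described above, skipping rows with missing values.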