About the Graph and the Data

The data I used is the top 100 movies of 2003-2022 according to IMDb. The data may contain blanks, because research on some movies did not turn up a confirmed filming location or certificate rating. Matnus is an example of a movie with an unconfirmed rating, since it is not well known. The graphs are bar graphs, an area graph, and a scatter plot. The variables are Title, Rating, Year, Month, Certificate, Runtime, Directors, and Filming_location.

Packages

Load the required libraries:

library(tidyverse)
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'stringr' was built under R version 4.3.2
library(ggplot2)
library(knitr)
## Warning: package 'knitr' was built under R version 4.3.2
library(cleanrmd)
## Warning: package 'cleanrmd' was built under R version 4.3.2
library(rmarkdown)
library(htmlTable)
## Warning: package 'htmlTable' was built under R version 4.3.2
library(plotly)
## Warning: package 'plotly' was built under R version 4.3.2
library(jpeg)
library(dplyr)
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.3.2
library(ggstream)
## Warning: package 'ggstream' was built under R version 4.3.2
library(hrbrthemes)
## Warning: package 'hrbrthemes' was built under R version 4.3.2
library(tidyquant)
## Warning: package 'tidyquant' was built under R version 4.3.2
## Warning: package 'PerformanceAnalytics' was built under R version 4.3.2
## Warning: package 'xts' was built under R version 4.3.2
## Warning: package 'zoo' was built under R version 4.3.2
## Warning: package 'quantmod' was built under R version 4.3.2
## Warning: package 'TTR' was built under R version 4.3.2
library(showtext)
## Warning: package 'showtext' was built under R version 4.3.2
## Warning: package 'sysfonts' was built under R version 4.3.2
## Warning: package 'showtextdb' was built under R version 4.3.2
library(tinytex)
## Warning: package 'tinytex' was built under R version 4.3.2

Reading the CSV file

movies <- read.csv("C:/Users/Sophi/Downloads/movies.csv")

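# Remove duplicates: keep the first row found for each director
movies <- movies %>%
  distinct(Directors, .keep_all = TRUE)

# Sort by rating (highest first) and keep up to three rows per director
movies <- movies %>%
  arrange(desc(Rating)) %>%
  group_by(Directors) %>%
  slice(1:3)

# Drop the unused extra columns X, X.1, X.2, and X.3
movies <- subset(movies, select = -c(X, X.1, X.2, X.3))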
head(movies)
## # A tibble: 6 × 8
## # Groups:   Directors [6]
##   Title        Rating  Year Month Certificate Runtime Directors Filming_location
##   <chr>         <dbl> <int> <chr> <chr>       <chr>   <chr>     <chr>           
## 1 Like Stars …    8.4  2007 Dece… PG          165     Aamir Kh… India           
## 2 Brother Bear    6.8  2003 Nove… G           85      Aaron Bl… Unknown         
## 3 Circle          6    2015 Octo… Not Rated   87      Aaron Ha… Unknown         
## 4 Greyhound       7    2020 July  PG-13       91      Aaron Sc… USA             
## 5 The Trial o…    7.7  2020 Octo… R           129     Aaron So… USA             
## 6 Blue Is the…    7.7  2013 Octo… NC-17       180     Abdellat… France
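
The two steps above interact: once `distinct(Directors, .keep_all = TRUE)` has kept a single row per director, a later per-director slice has at most one row to choose from. The sketch below illustrates the difference on a small made-up data frame (`demo` and its values are hypothetical, not taken from movies.csv). It uses `slice_max()` as one possible way to keep the top-rated rows per group; this is an alternative shown for comparison, not the method used in this report.

```r
library(dplyr)

# Hypothetical toy data, not taken from movies.csv
demo <- data.frame(
  Directors = c("A", "A", "A", "A", "B", "B"),
  Title     = c("a1", "a2", "a3", "a4", "b1", "b2"),
  Rating    = c(7.1, 8.3, 6.0, 7.9, 5.5, 6.2)
)

# distinct() on Directors keeps only the first row seen for each director
one_per_director <- demo %>%
  distinct(Directors, .keep_all = TRUE)

# slice_max() keeps up to three of the highest-rated rows in each group
top3_per_director <- demo %>%
  group_by(Directors) %>%
  slice_max(Rating, n = 3, with_ties = FALSE) %>%
  ungroup()

print(one_per_director)
print(top3_per_director)
```

Here `one_per_director` has one row for each of the two directors, while `top3_per_director` has three rows for director A and two for director B.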

Plotting Graph 1

A bar graph counting the movies in each certificate category, such as G, PG, PG-13, or R.

movies |>
  filter(!is.na(Certificate)) |>
  ggplot(aes(x = Certificate, fill = Certificate)) +
  geom_bar() +
  # Print each bar's count inside the bar, in white
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 1.5, colour = "white") +
  ggtitle("Count of Certified Rating of Movies") +
  theme(plot.title = element_text(margin = margin(t = 40, b = 30)))

#ggplot(data = movies, aes(x=Certificate, fill= Certificate))+ geom_bar(stat = "count")
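
The bar graph drops rows whose Certificate is missing, so it can be useful to check how many rows that affects and to see the counts as numbers. The following is a minimal base R sketch on a made-up vector (`certificate` is hypothetical and only stands in for `movies$Certificate`). Treating empty strings as missing is an assumption about how blanks might be stored in the CSV.

```r
# Hypothetical toy column standing in for movies$Certificate
certificate <- c("PG", "R", NA, "PG-13", "R", "", "G", "R")

# Assumption: blanks may be stored as empty strings, so recode them to NA
certificate[!is.na(certificate) & certificate == ""] <- NA

# Number of rows the bar graph would drop
n_missing <- sum(is.na(certificate))

# Counts per category; table() leaves out NA by default
counts <- table(certificate)

print(n_missing)
print(counts)
```

The values in `counts` correspond to the bar heights that `geom_bar()` would draw for the same data.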

Plotting Graph 2

A bar graph showing which directors of movies filmed in the USA had the highest ratings in 2022. Because most of the movies were filmed in the USA, the graph keeps only USA movies from 2022 with a rating above 6.4. The highest-rated director is Joseph Kosinski, for Top Gun: Maverick. The lowest rating among these 2022 USA movies is 6.5, for Black Adam, directed by Jaume Collet-Serra.

movies %>%
  # Keep 2022 movies filmed in the USA with a rating above 6.4
  filter(Rating > 6.4, Year > 2021, Filming_location == "USA") %>%
  ggplot(aes(x = Directors, y = Rating, fill = Directors)) +
  geom_col() +
  geom_text(aes(label = Rating), vjust = -0.2) +
  ggtitle("Highest-rated movies of 2022 filmed in the USA, by director") +
  theme(plot.title = element_text(margin = margin(t = 10, b = 20)))

Plotting Graph 3

This shaded area graph plots rating against year for 2003 to 2022. Unfortunately, it also draws on every row of the data, regardless of title, director, filming location, certificate, or run time. The shaded area peaks at a rating of 8.5 in the years 2011-2022. The lowest point is a rating of 1.9 in 2008, for Disaster Movie, directed by Jason Friedberg and Aaron Seltzer.

# Plot every movie in the data; no rating or location filter is applied here
movies %>%
  ggplot(aes(x = Year, y = Rating)) +
  geom_area(colour = "black", fill = "blue", alpha = .2) +
  geom_text(aes(label = Rating), vjust = -0.2) +
  ggtitle("Movie Ratings, 2003-2022") +
  theme(plot.title = element_text(margin = margin(t = 40, b = 30)))
## Warning: Removed 1 rows containing non-finite values (`stat_align()`).
## Warning: Removed 1 rows containing missing values (`geom_text()`).
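
The highest and lowest points of a graph like this can also be looked up directly in the data instead of being read off the plot. Below is a minimal base R sketch on a made-up data frame (`demo`, its titles, and its values are hypothetical). It includes one missing rating, similar to the single row that the warnings above report as removed.

```r
# Hypothetical toy data, not taken from movies.csv
demo <- data.frame(
  Title  = c("m1", "m2", "m3", "m4"),
  Year   = c(2008, 2011, 2015, 2022),
  Rating = c(2.5, 8.0, NA, 7.0)
)

# which.min() and which.max() ignore NA values
lowest  <- demo[which.min(demo$Rating), ]
highest <- demo[which.max(demo$Rating), ]

# Rows with a missing rating, which a plot would drop
missing_rating <- demo[is.na(demo$Rating), ]

print(lowest)
print(highest)
print(missing_rating)
```

Applied to `movies`, the same indexing would return the full rows, including the title and director, for the extreme ratings.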

Plotting Graph 4

This scatter plot shows the relation between rating and run time for 2022 movies filmed in the USA or Canada, with points colored by director. The points show no clear linear pattern, which suggests a weak correlation between run time and rating. The run time axis places 98 at the top and 107 at the bottom, which is what happens when Runtime is stored as text (the <chr> type in the table above) and its values are ordered alphabetically rather than numerically.

movies %>%
  # Keep 2022 movies filmed in the USA or Canada with a rating above 6.5
  filter(Rating > 6.5) %>%
  filter(Year > 2021) %>%
  filter(Filming_location == "USA" | Filming_location == "Canada") %>%
  ggplot(aes(x = Rating, y = Runtime)) +
  geom_point(alpha = 1, aes(color = Directors)) +
  ggtitle("Relation between Rating and Run Time by Director") +
  theme(plot.title = element_text(margin = margin(t = 40, b = 30)))
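
Since `Runtime` appears as a character column in the `head(movies)` output, one way to put a number on the relation is to convert it to numeric first and then compute a correlation. The sketch below does this in base R on a made-up data frame (`demo`, the new column `Runtime_min`, and all values are hypothetical). It is an illustration under those assumptions, not a result from the actual data.

```r
# Hypothetical toy data, not taken from movies.csv
demo <- data.frame(
  Rating  = c(6.6, 7.0, 7.4, 8.3),
  Runtime = c("98", "107", "130", "139"),
  stringsAsFactors = FALSE
)

# As text, "107" sorts before "98" because "1" comes before "9"
text_order <- sort(demo$Runtime)

# Convert to numbers so that ordering and arithmetic behave as expected
demo$Runtime_min <- as.numeric(demo$Runtime)
numeric_order <- sort(demo$Runtime_min)

# Pearson correlation between rating and run time, skipping missing pairs
r <- cor(demo$Rating, demo$Runtime_min, use = "complete.obs")

print(text_order)
print(numeric_order)
print(r)
```

A value of `r` near 0 would support the reading of a weak linear relation, while a value near 1 or -1 would indicate a strong one.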