library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
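# readr, tidyr and dplyr are already attached by library(tidyverse) above;
# these explicit calls are redundant but harmless.
library(readr)
library(tidyr)
library(dplyr)
# The file is semicolon-delimited, hence sep = ";" (the path is specific to the author's machine)
Movie_data <- read.csv("C:/Users/dbrusche/Desktop/movies (2).csv", sep = ";")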
head(Movie_data)
##                                 Name ReleaseDate Action Adventure Children
## 1                   Toy Story (1995)        1995      0         0        1
## 2                     Jumanji (1995)        1995      0         1        1
## 3            Grumpier Old Men (1995)        1995      0         0        0
## 4           Waiting to Exhale (1995)        1995      0         0        0
## 5 Father of the Bride Part II (1995)        1995      0         0        0
## 6                        Heat (1995)        1995      1         0        0
##   Comedy Crime Documentary Drama Fantasy Noir Horror Musical Mystery Romance
## 1      1     0           0     0       0    0      0       0       0       0
## 2      0     0           0     0       1    0      0       0       0       0
## 3      1     0           0     0       0    0      0       0       0       1
## 4      1     0           0     1       0    0      0       0       0       0
## 5      1     0           0     0       0    0      0       0       0       0
## 6      0     1           0     0       0    0      0       0       0       0
##   SciFi Thriller War Western AvgRating Watches
## 1     0        0   0       0      4.15    2077
## 2     0        0   0       0      3.20     701
## 3     0        0   0       0      3.02     478
## 4     0        0   0       0      2.73     170
## 5     0        0   0       0      3.01     296
## 6     0        1   0       0      3.88     940
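Because `read.csv()` points at a file on the author's desktop, the steps that follow cannot be rerun without it. A minimal sketch, assuming the file starts with the same header and rows that `head()` printed above and uses `.` as the decimal mark, rebuilds those six rows from an inline semicolon-delimited string (`movies_csv` and `Movie_sample` are hypothetical names, not part of the original analysis):

```r
# Hypothetical inline copy of the six rows printed by head(Movie_data) above.
# Assumption: same header line, ";" as separator and "." as decimal mark.
movies_csv <- paste(
  "Name;ReleaseDate;Action;Adventure;Children;Comedy;Crime;Documentary;Drama;Fantasy;Noir;Horror;Musical;Mystery;Romance;SciFi;Thriller;War;Western;AvgRating;Watches",
  "Toy Story (1995);1995;0;0;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;4.15;2077",
  "Jumanji (1995);1995;0;1;1;0;0;0;0;1;0;0;0;0;0;0;0;0;0;3.20;701",
  "Grumpier Old Men (1995);1995;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;3.02;478",
  "Waiting to Exhale (1995);1995;0;0;0;1;0;0;1;0;0;0;0;0;0;0;0;0;0;2.73;170",
  "Father of the Bride Part II (1995);1995;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;3.01;296",
  "Heat (1995);1995;1;0;0;0;1;0;0;0;0;0;0;0;0;0;1;0;0;3.88;940",
  sep = "\n"
)

# read.csv() accepts literal text via `text =`; sep = ";" mirrors the original call
Movie_sample <- read.csv(text = movies_csv, sep = ";")
str(Movie_sample)
```

The same `sep = ";"` argument is what makes the original call split the file into 21 columns; with the default comma separator each line would land in a single column.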
This is an R Markdown document: knitting it produces a report that contains both this text and the output of the embedded R code chunks. The chunk below reshapes the movie data and computes the average rating per category:
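# Reshape to long format: one row per (movie, category) pair whose
# genre indicator is 1
movie_long <- Movie_data %>%
  pivot_longer(
    cols = Action:Western,          # all genre indicator columns
    names_to = "category",
    values_to = "is_in_category"
  ) %>%
  filter(is_in_category == 1) %>%
  select(Name, ReleaseDate, AvgRating, Watches, category)

# Keep 1995 releases in the five categories of interest
selected_data <- movie_long %>%
  filter(ReleaseDate == 1995,
         category %in% c("Drama", "Crime", "Comedy", "Romance", "Children"))

# Mean of the per-movie AvgRating within each category (each movie counts once,
# regardless of its number of Watches)
average_ratings <- selected_data %>%
  group_by(category) %>%
  summarise(avg_rating = mean(AvgRating, na.rm = TRUE))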
# I first converted the movie data to long format, collapsing the individual
# genre indicator columns into a single category column. I then filtered the
# long dataset to movies released in 1995 in five categories: Drama, Crime,
# Comedy, Romance, and Children. Finally, I calculated the average rating for
# each of these categories so the results can be visualized.
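The three steps described above can be checked on a small scale. This sketch assumes a hypothetical `toy_movies` tibble holding the six rows shown by `head()`, cut down to seven genre columns so that `Action:Thriller` stands in for `Action:Western`; it also adds a hypothetical `n_movies` count per category. Its averages come from six films only and are not the document's results.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy input: the six head() rows, reduced to a few genre columns
toy_movies <- tribble(
  ~Name, ~ReleaseDate, ~Action, ~Children, ~Comedy, ~Crime, ~Drama, ~Romance, ~Thriller, ~AvgRating, ~Watches,
  "Toy Story (1995)", 1995, 0, 1, 1, 0, 0, 0, 0, 4.15, 2077,
  "Jumanji (1995)", 1995, 0, 1, 0, 0, 0, 0, 0, 3.20, 701,
  "Grumpier Old Men (1995)", 1995, 0, 0, 1, 0, 0, 1, 0, 3.02, 478,
  "Waiting to Exhale (1995)", 1995, 0, 0, 1, 0, 1, 0, 0, 2.73, 170,
  "Father of the Bride Part II (1995)", 1995, 0, 0, 1, 0, 0, 0, 0, 3.01, 296,
  "Heat (1995)", 1995, 1, 0, 0, 1, 0, 0, 1, 3.88, 940
)

# Step 1: wide indicator columns -> one row per (movie, category) flagged 1
toy_long <- toy_movies %>%
  pivot_longer(
    cols = Action:Thriller,
    names_to = "category",
    values_to = "is_in_category"
  ) %>%
  filter(is_in_category == 1) %>%
  select(Name, ReleaseDate, AvgRating, Watches, category)

# Steps 2 and 3: keep 1995 and the five categories, then average per category
toy_avg <- toy_long %>%
  filter(ReleaseDate == 1995,
         category %in% c("Drama", "Crime", "Comedy", "Romance", "Children")) %>%
  group_by(category) %>%
  summarise(avg_rating = mean(AvgRating, na.rm = TRUE), n_movies = n())

print(toy_avg)
```

On these rows Heat's Action and Thriller entries are dropped by the category filter, Crime rests on a single film, and Comedy averages four films; reporting `n_movies` alongside `avg_rating` makes such thin groups visible.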
The average ratings can then be plotted as a bar chart:
# ggplot2 is already attached by library(tidyverse); loading it again is harmless
library(ggplot2)
# Bar chart of the average rating for each selected category (1995)
ggplot(average_ratings, aes(x = category, y = avg_rating)) +
  geom_col(fill = "lightblue") +   # geom_col() is shorthand for geom_bar(stat = "identity")
  labs(
    title = "Average Ratings by Movie Category (1995)",
    x = "Category",
    y = "Average Rating"
  ) +
  theme_minimal()
# The chart and the underlying average_ratings values show that, in 1995, Crime
# movies had the highest average rating (3.41), followed by Drama (3.28),
# Romance (3.18), Comedy (3.07), and Children (2.84). Note that this ranks the
# categories by average rating, not by popularity.
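The exact averages quoted above are not printed on the original chart, and its bars follow alphabetical order. A minimal sketch that sorts the bars by rating and labels each with its value, assuming the five averages from the comment (typed in by hand here as a hypothetical `avg_1995` tibble, rather than taken from `average_ratings`); `reorder()` and `geom_text()` are a presentation choice, not part of the original analysis:

```r
library(tibble)
library(ggplot2)

# Averages copied from the comment above; in the document itself they would
# come from average_ratings instead of being typed in
avg_1995 <- tibble(
  category = c("Crime", "Drama", "Romance", "Comedy", "Children"),
  avg_rating = c(3.41, 3.28, 3.18, 3.07, 2.84)
)

# reorder(category, -avg_rating) puts the highest-rated category first;
# geom_text() prints each average just above its bar
p <- ggplot(avg_1995, aes(x = reorder(category, -avg_rating), y = avg_rating)) +
  geom_col(fill = "lightblue") +
  geom_text(aes(label = sprintf("%.2f", avg_rating)), vjust = -0.3) +
  labs(
    title = "Average Ratings by Movie Category (1995)",
    x = "Category",
    y = "Average Rating"
  ) +
  theme_minimal()

print(p)
```

With the real data, replacing `avg_1995` by `average_ratings` gives the same sorted, labelled chart without copying numbers by hand.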