title: "IMDBDATADIVE2"
output: html_document
date: "2023-09-04"
data <- read.csv("C:/Users/lohit allaparti/Downloads/imdb.csv")
summary(data$date_x)
summary(data$score)
summary(data$budget_x)
summary(data$revenue)
(The columns above are the numeric summaries in my dataset.)
table(data$name)
table(data$genre)
table(data$overview)
table(data$crew)
table(data$orig_title)
table(data$status)
table(data$orig_lang)
table(data$country)
(The columns above are the categorical summaries in my dataset.)
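Calling table() on a column with many distinct values prints a very long table, so it can help to sort the counts and show only the most frequent categories. The sketch below is a minimal illustration of that idea; the data frame `toy` and its values are invented stand-ins and do not come from the IMDB file.

```r
# Toy data with invented values, standing in for data$orig_lang.
toy <- data.frame(
  orig_lang = c("English", "English", "Korean", "French", "English", "Korean")
)

# Count each category and sort from most to least frequent.
lang_counts <- sort(table(toy$orig_lang), decreasing = TRUE)

# Show only the top categories instead of the full table.
head(lang_counts, 3)

# Number of distinct categories in the column.
n_languages <- length(unique(toy$orig_lang))
n_languages
```

On the real data, the same pattern would apply to `data$orig_lang` or any other categorical column.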
Column Summaries
# What Is the Distribution of Movie Ratings (Scores)?
# How Do Movie Budgets Vary Across Different Genres?
Data Documentation
# Is There a Correlation Between Movie Budget and Revenue?
# How Have Viewer Ratings Evolved Over Time?
Project Goal/Purpose
# Which Genres Are Most Profitable?
# Do User Ratings Vary by Movie Genre?
#Group by
Group by original language and calculate the average revenue
# Load the dplyr package
library(dplyr)
language_avg_revenue <- data %>% group_by(orig_lang) %>% summarize(avg_revenue = mean(revenue, na.rm = TRUE))
language_avg_revenue
(The code above calculates the average revenue of the movies, grouped by their original language.)
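The same group-by pattern could be used to explore the question of which genres are most profitable. The sketch below is only an illustration: it assumes profit is defined as `revenue - budget_x`, and it runs on a small data frame `toy` with invented values instead of the IMDB file.

```r
library(dplyr)

# Toy data with invented values, standing in for the real dataset.
toy <- data.frame(
  genre    = c("Drama", "Drama", "Action", "Action", "Comedy"),
  budget_x = c(10, 20, 50, 70, 15),
  revenue  = c(30, 20, 150, 90, 20)
)

# Assumption: profit is revenue minus budget.
genre_profit <- toy %>%
  mutate(profit = revenue - budget_x) %>%
  group_by(genre) %>%
  summarize(avg_profit = mean(profit, na.rm = TRUE)) %>%
  arrange(desc(avg_profit))  # most profitable genre first

genre_profit
```

With the real data, `toy` would be replaced by `data`. If a movie lists several genres in one `genre` value, each distinct combination is treated as its own group.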
Calculating the average movie score
avg_score <- mean(data$score, na.rm = TRUE)
avg_score
(The code above computes the average score across all movies in my dataset, ignoring missing values.)
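The overall average can also be broken down by release year to look at how ratings have changed over time. The sketch below is a rough illustration under one loud assumption: that `date_x` is stored as text in month/day/year form. That format has not been checked against the IMDB file, and the data frame `toy` contains invented values.

```r
# Toy data with invented values; the date format is an assumption.
toy <- data.frame(
  date_x = c("01/15/2020", "06/30/2020", "03/10/2021", "11/05/2021"),
  score  = c(60, 70, 80, NA)
)

# Parse the date text and extract the year as an integer.
toy$year <- as.integer(format(as.Date(toy$date_x, format = "%m/%d/%Y"), "%Y"))

# Mean score per year; rows with a missing score are dropped by aggregate().
yearly_score <- aggregate(score ~ year, data = toy, FUN = mean)
yearly_score
```

If the real `date_x` column uses a different layout, the `format` string passed to as.Date() would need to change to match it.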
Creating a scatter plot for ‘budget_x’ vs. ‘revenue’
# Load the ggplot2 package
library(ggplot2)
ggplot(data, aes(x = budget_x, y = revenue)) + geom_point() + labs(title = "Budget vs. Revenue", x = "Budget", y = "Revenue") + theme_minimal()
(The scatter plot above helps us visualize the relationship between budget and revenue in my dataset.)
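A scatter plot shows the shape of the relationship, and a correlation coefficient can put a number on it. The sketch below uses base R's cor() on a small data frame `toy` with invented values. It is an illustration of the approach, not a result from the IMDB data.

```r
# Toy data with invented values, standing in for the real dataset.
toy <- data.frame(
  budget_x = c(10, 20, 30, 40, NA),
  revenue  = c(25, 35, 70, 90, 50)
)

# Pearson correlation, using only rows where both values are present.
budget_revenue_cor <- cor(toy$budget_x, toy$revenue, use = "complete.obs")
budget_revenue_cor
```

A value near 1 would indicate that higher budgets tend to go with higher revenue, a value near 0 would indicate little linear relationship, and a value near -1 would indicate an inverse relationship. On the real data the call would be `cor(data$budget_x, data$revenue, use = "complete.obs")`.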
Creating a box plot for 'genre' vs. 'score'
ggplot(data, aes(x = genre, y = score)) + geom_boxplot() + labs(title = "Distribution of Scores by Genre", x = "Genre", y = "Score") + theme_minimal() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
(The box plot above helps us visualize the distribution of movie scores by genre in my dataset.)
Creating a bar plot for 'status'
ggplot(data, aes(x = status)) + geom_bar() + labs(title = "Distribution of Movies by Status", x = "Status", y = "Count") + theme_minimal() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
(The bar plot above helps us visualize the distribution of movies by their status in my dataset.)