---
title: "IMDBDATADIVE2"
output: html_document
date: "2023-09-04"
---

data <- read.csv("C:/Users/lohit allaparti/Downloads/imdb.csv")
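Later code refers to columns by name, so it can help to confirm that the loaded data frame actually contains them. The sketch below is only an illustration. `movies` is a made-up stand-in for the data frame returned by `read.csv()`, and `expected_cols` is a hypothetical subset of the column names used in this document.

```r
# Hypothetical subset of the column names this analysis refers to
expected_cols <- c("date_x", "score", "budget_x", "revenue", "genre",
                   "status", "orig_lang", "country")

# Hypothetical stand-in for the data frame returned by read.csv(); only some
# of the expected columns are present, to show what the check reports
movies <- data.frame(score    = c(73, 78),
                     budget_x = c(75e6, 150e6),
                     revenue  = c(271e6, 900e6),
                     genre    = c("Drama", "Action"))

# setdiff() keeps the expected names that are absent from the data frame
missing_cols <- setdiff(expected_cols, names(movies))
if (length(missing_cols) > 0) {
  message("Columns not found: ", paste(missing_cols, collapse = ", "))
}
```

Running the same check on the real `data` object would catch a misspelled column name before any summary silently returns `NULL`.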

Numeric summary for selected columns

summary(data$date_x)
summary(data$score)
summary(data$budget_x)
summary(data$revenue)

(The code above produces numeric summaries for these columns of my dataset.)
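Instead of calling `summary()` once per column, you can use `sapply()` to apply it to several columns at once and collect the statistics in a single matrix. This is a minimal sketch that assumes the selected columns are numeric. The `movies` data frame and its values are made up for illustration.

```r
# Hypothetical toy data standing in for the IMDb data (values are made up)
movies <- data.frame(
  score    = c(73, 78, 76, 60, 82),
  budget_x = c(75e6, 150e6, 200e6, 10e6, 90e6),
  revenue  = c(271e6, 900e6, 1200e6, 30e6, 500e6)
)

numeric_cols <- c("score", "budget_x", "revenue")

# sapply() applies summary() to each column; since every result has the same
# six statistics (Min., 1st Qu., Median, Mean, 3rd Qu., Max.), the output is
# simplified to a 6 x 3 matrix with one column per variable
summary_matrix <- sapply(movies[numeric_cols], summary)
print(summary_matrix)
```

If some columns contain missing values, `summary()` adds an extra `NA's` entry for those columns only. The results then have different lengths, so `sapply()` may return a list instead of a matrix.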

Categorical summary for selected columns

table(data$name)
table(data$genre)
table(data$overview)
table(data$crew)
table(data$orig_title)
table(data$status)
table(data$orig_lang)
table(data$country)

(The code above produces frequency tables for the categorical columns of my dataset.)
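Raw `table()` output is ordered alphabetically, which makes the most frequent categories hard to spot in columns with many values. One option is to sort the counts and convert them to proportions with `prop.table()`. The sketch below uses a made-up vector in place of `data$orig_lang`.

```r
# Hypothetical toy values standing in for data$orig_lang (made up)
orig_lang <- c("English", "English", "Spanish", "Japanese", "English", "Spanish")

# Count each category and sort from most to least frequent
lang_counts <- sort(table(orig_lang), decreasing = TRUE)
print(lang_counts)

# prop.table() converts the counts into shares of the total
lang_share <- round(prop.table(lang_counts), 2)
print(lang_share)
```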

5 novel questions

Column Summaries
# What Is the Distribution of Movie Ratings (Scores)?
# How Do Movie Budgets Vary Across Different Genres?

Data Documentation
# Is There a Correlation Between Movie Budget and Revenue?
# How Have Viewer Ratings Evolved Over Time?

Project Goal/Purpose
# Which Genres Are Most Profitable?
# Do User Ratings Vary by Movie Genre?
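The profitability question needs a profit measure, which the dataset does not store directly. This is a minimal sketch that makes two assumptions: profit is defined as `revenue - budget_x`, and each movie has a single genre label. The `movies` data frame is made up. Ranking by mean profit rather than total profit is one choice among several.

```r
# Hypothetical toy data (made-up values); profit is assumed to be revenue - budget
movies <- data.frame(
  genre    = c("Drama", "Action", "Drama", "Comedy", "Action"),
  budget_x = c(20e6, 150e6, 10e6, 30e6, 100e6),
  revenue  = c(60e6, 500e6, 5e6, 90e6, 250e6)
)
movies$profit <- movies$revenue - movies$budget_x

# aggregate() with a formula computes the mean profit for each genre
genre_profit <- aggregate(profit ~ genre, data = movies, FUN = mean)

# Order genres from most to least profitable on average
genre_profit <- genre_profit[order(genre_profit$profit, decreasing = TRUE), ]
print(genre_profit)
```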

USE OF AGGREGATION FUNCTIONS

#Group by

Group by original language and calculate the average revenue

# Load the dplyr package
library(dplyr)

language_avg_revenue <- data %>%
  group_by(orig_lang) %>%
  summarize(avg_revenue = mean(revenue, na.rm = TRUE))

language_avg_revenue

(The code above calculates the average revenue of movies grouped by their original language.)
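For comparison, the same grouped average can be computed in base R without dplyr. The sketch below applies `tapply()` to a made-up data frame. As in the dplyr version, `na.rm = TRUE` makes `mean()` skip missing revenues.

```r
# Hypothetical toy data (made-up values) with one missing revenue
movies <- data.frame(
  orig_lang = c("English", "Spanish", "English", "Spanish", "Korean"),
  revenue   = c(400e6, 20e6, NA, 40e6, 100e6)
)

# tapply() splits revenue by orig_lang and applies mean() to each group;
# na.rm = TRUE is passed through to mean() so the NA is skipped
avg_revenue_by_lang <- tapply(movies$revenue, movies$orig_lang, mean, na.rm = TRUE)
print(sort(avg_revenue_by_lang, decreasing = TRUE))
```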

Calculating the average movie score

avg_score <- mean(data$score, na.rm = TRUE)

avg_score

(The code above computes the mean score across all movies in my dataset, ignoring missing values.)
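The `na.rm = TRUE` argument matters here: without it, a single missing score makes `mean()` return `NA`. The example below illustrates this with a made-up vector of scores.

```r
# Hypothetical scores with one missing value
scores <- c(70, 85, NA, 65)

mean_with_na    <- mean(scores)                # NA, because one value is missing
mean_without_na <- mean(scores, na.rm = TRUE)  # mean of 70, 85 and 65

# The median is less sensitive to a few extreme scores than the mean
median_without_na <- median(scores, na.rm = TRUE)

print(c(mean_with_na, mean_without_na, median_without_na))
```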

VISUAL SUMMARY OF 5 COLUMNS OF THE DATA

Creating a scatter plot for ‘budget_x’ vs. ‘revenue’

# Load the ggplot2 package for plotting
library(ggplot2)

ggplot(data, aes(x = budget_x, y = revenue)) +
  geom_point() +
  labs(title = "Budget vs. Revenue", x = "Budget", y = "Revenue") +
  theme_minimal()

(The scatter plot above visualizes the relationship between budget and revenue in my dataset.)
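The scatter plot shows the relationship visually, and `cor()` can put a number on it. This is a minimal sketch with made-up values. `use = "complete.obs"` drops pairs with a missing value. Pearson correlation measures linear association, while Spearman correlation works on ranks. Comparing the two shows whether a relationship is monotonic but not linear.

```r
# Hypothetical toy values (made up): revenue grows with the square of budget,
# and the last movie has no recorded revenue
budget_x <- c(10, 20, 30, 40, 50, 60)
revenue  <- c(100, 400, 900, 1600, 2500, NA)

# Pearson correlation measures linear association; incomplete pairs are dropped
pearson <- cor(budget_x, revenue, use = "complete.obs", method = "pearson")

# Spearman correlation works on ranks, so a strictly increasing relationship gives 1
spearman <- cor(budget_x, revenue, use = "complete.obs", method = "spearman")

print(round(c(pearson = pearson, spearman = spearman), 3))
```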

Creating a box plot for ‘genre’ vs. ‘score’

ggplot(data, aes(x = genre, y = score)) +
  geom_boxplot() +
  labs(title = "Distribution of Scores by Genre", x = "Genre", y = "Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

(The box plot above shows the distribution of movie scores within each genre in my dataset.)
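In a box plot, the box spans the first to third quartile and has a line at the median. The same comparison can therefore be made numerically. The sketch below uses `split()` and `quantile()` on made-up scores for two genres.

```r
# Hypothetical toy scores for two genres (values are made up)
movies <- data.frame(
  genre = c("Drama", "Drama", "Drama", "Horror", "Horror", "Horror"),
  score = c(70, 80, 75, 50, 65, 55)
)

# split() makes one score vector per genre; sapply() then applies quantile()
# to each, giving a matrix with one column per genre and one row per quartile
genre_quartiles <- sapply(split(movies$score, movies$genre),
                          quantile, probs = c(0.25, 0.5, 0.75))
print(genre_quartiles)
```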

Creating a bar plot for ‘status’

ggplot(data, aes(x = status)) +
  geom_bar() +
  labs(title = "Distribution of Movies by Status", x = "Status", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

(The bar plot above shows the number of movies in each status category in my dataset.)
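`geom_bar()` counts rows per category, and for a character column the bars appear in alphabetical order. The sketch below orders them by frequency instead. `table()` supplies the counts, and the level order of a factor controls the bar order. A made-up vector stands in for `data$status`.

```r
# Hypothetical toy status values (made up)
status <- c("Released", "Released", "Post Production", "Released", "In Production")

# table() gives the same counts that geom_bar() plots
status_counts <- table(status)

# Reorder the factor levels from most to least frequent
status_ordered <- factor(status,
                         levels = names(sort(status_counts, decreasing = TRUE)))
print(levels(status_ordered))
```

Storing a factor reordered this way back in the data frame before calling `ggplot()` should make the most common status appear as the first bar.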