```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r}
library(tidyverse)
```

I have chosen the TMDb movie data set from Kaggle (https://www.kaggle.com/datasets/nagrajdesai/latest-10000-movies-dataset-from-tmdb/code) for my investigation in this project.
This data set contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.
Project Goals:

1. Learn R.
2. Learn how to investigate problems in a data set and wrangle the data into a format that can be used.
3. Draw insights from the movie data set.

Importing the Data Set:
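
The import call itself is not shown here, so the following is only a minimal sketch of one way to load the downloaded CSV with `readr`. The helper name `load_movies` and the file name `imdb_movies.csv` are assumptions for illustration, not taken from the project.

```r
library(readr)

# Hypothetical helper: read the downloaded CSV into a tibble.
load_movies <- function(path) {
  # show_col_types = FALSE silences the column-specification message
  read_csv(path, show_col_types = FALSE)
}

# "imdb_movies.csv" is an assumed file name; use the real download path.
# IMDB_Movies <- load_movies("imdb_movies.csv")
```

Wrapping the call in a function keeps the file path in one place if the data set is re-downloaded later.
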
```{r}
IMDB_Movies
```

Summary of Data Set:
```{r}
summary(IMDB_Movies)
```

```
    Names              Date_x              Score          Genre
 Length:10178       Length:10178       Min.   :  0.0   Length:10178
 Class :character   Class :character   1st Qu.: 59.0   Class :character
 Mode  :character   Mode  :character   Median : 65.0   Mode  :character
                                       Mean   : 63.5
                                       3rd Qu.: 71.0
                                       Max.   :100.0
   Overview             Crew            Orig_title           Status
 Length:10178       Length:10178       Length:10178       Length:10178
 Class :character   Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character   Mode  :character
  Orig_lang            Budget_x             Revenue            Country
 Length:10178       Min.   :        1   Min.   :0.000e+00   Length:10178
 Class :character   1st Qu.: 15000000   1st Qu.:2.859e+07   Class :character
 Mode  :character   Median : 50000000   Median :1.529e+08   Mode  :character
                    Mean   : 64882379   Mean   :2.531e+08
                    3rd Qu.:105000000   3rd Qu.:4.178e+08
                    Max.   :460000000   Max.   :2.924e+09
```

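Data documentation for the variables:
The summary above reports the type, length, and summary statistics of each of the 12 variables across 10,178 rows. Nine variables are stored as character (including Date_x, so dates are not yet parsed), and three are numeric: Score (0 to 100, median 65), Budget_x, and Revenue. The minimum Budget_x of 1 and the minimum Revenue of 0 look like placeholders rather than real figures and are worth checking before drawing conclusions.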
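
Since one project goal is to investigate problems in the data, a quick check for missing values complements `summary()`. The sketch below counts the `NA`s in every column; `count_missing` is a hypothetical helper, and the `toy` tibble is made-up stand-in data that only reuses three column names from the data set.

```r
library(dplyr)

# Hypothetical helper: returns one row with one column per variable,
# each cell holding the number of missing values in that variable.
count_missing <- function(df) {
  df |>
    summarise(across(everything(), ~ sum(is.na(.x))))
}

# Toy stand-in for IMDB_Movies (made-up values, assumed for illustration).
toy <- tibble(
  Names    = c("A", "B", "C"),
  Score    = c(65, NA, 71),
  Budget_x = c(1.5e7, 5e7, NA)
)

count_missing(toy)
```

On the real data the call would be `count_missing(IMDB_Movies)`, assuming the data set has already been loaded under that name.
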
Plotting Budget_x vs Score
```{r}
ggplot(data = IMDB_Movies) +
  geom_point(mapping = aes(x = Budget_x, y = Score))
```

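
Because Budget_x spans from 1 to 460,000,000 in the summary, most points in a linear-scale scatter plot are likely to crowd toward the left edge. As a possible variation, the sketch below puts the budget on a log scale and makes the points semi-transparent. The helper name `plot_budget_score` is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: budget vs. score with a log-scaled budget axis.
plot_budget_score <- function(df) {
  ggplot(df, aes(x = Budget_x, y = Score)) +
    geom_point(alpha = 0.3) +  # transparency makes dense regions visible
    scale_x_log10() +          # spreads out budgets that differ by orders of magnitude
    labs(x = "Budget (log scale)", y = "Score")
}

# Usage, assuming IMDB_Movies is already loaded:
# plot_budget_score(IMDB_Movies)
```
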
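Plotting Top Ten Movies per Orig_lang vs Budget_x
```{r}
# Keep the ten highest-budget movies within each original language.
top_movies <- IMDB_Movies |>
  group_by(Orig_lang) |>
  slice_max(Budget_x, n = 10, with_ties = FALSE) |>
  ungroup()
```
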
```{r}
# Straight quotes are required; curly quotes are a syntax error in R.
ggplot(data = top_movies, aes(x = Orig_lang, y = Budget_x)) +
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  labs(
    title = "Top Ten Movies by Original Language vs. Budget",
    x = "Original Language",
    y = "Budget"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

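```{r}
# Average rating per genre, highest first. The data set stores ratings in
# `Score` (0 to 100); it has no `Rating` column.
IMDB_Movies |>
  group_by(Genre) |>
  summarise(Avg_Rating = mean(Score, na.rm = TRUE)) |>
  arrange(desc(Avg_Rating))

# Distribution of ratings; binwidth = 5 suits the 0 to 100 scale.
p <- ggplot(IMDB_Movies, aes(x = Score)) +
  geom_histogram(binwidth = 5, fill = "blue", alpha = 0.7) +
  labs(title = "Distribution of Ratings")
p  # print the plot; the assignment alone draws nothing

# Rating distribution within each genre.
ggplot(IMDB_Movies, aes(x = Genre, y = Score)) +
  geom_boxplot(fill = "purple", alpha = 0.7) +
  labs(title = "Rating Distribution by Genre")

# There is no `Year` column, so take the first four-digit number in
# `Date_x` as the year (assumes every date string contains one).
IMDB_Movies |>
  mutate(Year = as.integer(str_extract(Date_x, "\\d{4}"))) |>
  ggplot(aes(x = Year, y = Score, color = Genre)) +
  geom_point() +
  labs(title = "Rating Trends Over the Years by Genre")
```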