# Load the packages used in the analysis
pacotes <- c("readr", "dplyr", "ggplot2", "PerformanceAnalytics", "tidyr")
invisible(lapply(pacotes, library, character.only = TRUE))
filmes <- read_csv("imdb_movies.csv")
# Strip the " min" suffix so Runtime can be treated as a number
filmes$Runtime <- as.numeric(gsub(" min", "", filmes$Runtime, fixed = TRUE))
# Non-numeric year entries, if any, become NA (with a warning)
filmes$Released_Year <- as.numeric(filmes$Released_Year)
The initial exploratory analysis already gives an idea of which factors most influence a movie’s revenue. However, additional analyses focused specifically on revenue can be performed.
chart.Correlation(filmes[, c(3, 5, 7, 9, 15, 16)], histogram = TRUE)
From the correlation chart, we can infer that only the number of votes is meaningfully correlated with the movie’s revenue. This indicates that popular movies, which tend to receive a large number of votes, are the ones that earn the most, regardless of their ratings.
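To read a single pairwise relationship from that chart more precisely, each variable can be tested against revenue on its own. The following is only a minimal sketch on simulated data, not the analysis performed on `filmes`: the column names `No_of_Votes` and `IMDB_Rating` are assumptions, the helper `cor_with_gross` is hypothetical, and Spearman's rank correlation is my choice here because revenue figures are usually strongly skewed.

```r
# Simulated stand-in for the movie data; the real analysis would use `filmes`.
# No_of_Votes and IMDB_Rating are assumed column names, not taken from the text.
set.seed(42)
n <- 200
sim <- data.frame(
  No_of_Votes = round(rlnorm(n, meanlog = 11, sdlog = 1)),
  IMDB_Rating = round(runif(n, 7.6, 9.3), 1)
)
# In this simulation Gross depends on the number of votes only, plus noise.
sim$Gross <- 50 * sim$No_of_Votes * rlnorm(n, meanlog = 0, sdlog = 0.8)

# Hypothetical helper: correlate Gross with each predictor and
# return the estimate and p-value of the test, one row per predictor.
cor_with_gross <- function(df, predictors, method = "spearman") {
  rows <- lapply(predictors, function(p) {
    # exact = FALSE avoids the warning about ties in rank-based tests
    test <- cor.test(df[[p]], df$Gross, method = method, exact = FALSE)
    data.frame(predictor = p,
               estimate  = unname(test$estimate),
               p_value   = test$p.value)
  })
  do.call(rbind, rows)
}

result <- cor_with_gross(sim, c("No_of_Votes", "IMDB_Rating"))
print(result)
```

On the real data, the same helper could be called with `filmes` and the names of its numeric columns. `cor.test` drops incomplete pairs, so rows with missing `Gross` do not need to be removed first.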
Therefore, a studio aiming for high revenue for its movie needs to ensure that it is widely known and popular. Investing in marketing is essential to achieve this goal, ensuring broad exposure that generates curiosity and interest among the audience. This effective promotion will not only increase the film’s visibility but also encourage people to pay to watch it, thus boosting its revenue.
In addition to marketing, we cannot ignore the fact that some directors and actors have the power to attract larger audiences to their films. I already explored this relationship between cast and revenue during the EDA phase, since the challenge aims to assist a studio with a project involving a high financial investment.
During the EDA, I investigated which actors, directors, and genres are frequently found in high-revenue films. However, I did not test whether these elements are actually associated with differences in revenue. Therefore, I will now use one-way ANOVA to check whether the lead actor (`Star1`), the director, and the genre each have a statistically significant effect on revenue.
anova1 <- aov(Gross ~ Genre, data = filmes)
anova2 <- aov(Gross ~ Director, data = filmes)
anova3 <- aov(Gross ~ Star1, data = filmes)
summary(anova1)
##              Df    Sum Sq   Mean Sq F value Pr(>F)
## Genre       181 4.315e+18 2.384e+16   2.719 <2e-16 ***
## Residuals   648 5.681e+18 8.767e+15
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 169 observations deleted due to missingness
summary(anova2)
##              Df    Sum Sq   Mean Sq F value Pr(>F)
## Director    471 7.521e+18 1.597e+16    2.31 <2e-16 ***
## Residuals   358 2.475e+18 6.912e+15
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 169 observations deleted due to missingness
summary(anova3)
##              Df    Sum Sq   Mean Sq F value   Pr(>F)
## Star1       554 8.127e+18 1.467e+16   2.159 1.38e-12 ***
## Residuals   275 1.869e+18 6.795e+15
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 169 observations deleted due to missingness
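At the 1% significance level (99% confidence), genre, director, and lead actor were all statistically significant for movie revenue, with p-values below 0.01. Each model was fitted on the 830 films with a recorded revenue, since 169 observations were dropped for missing values.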
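A significant F test says that mean revenue differs between groups, but not how much of the variation each factor accounts for. As a rough sketch, that share can be computed from the sums of squares and degrees of freedom printed in the `summary()` output above. The numbers below are copied from that output; eta squared and adjusted R squared are measures I am adding for illustration, not part of the original analysis.

```r
# Sums of squares and degrees of freedom copied from the summary() output above.
anova_out <- data.frame(
  factor   = c("Genre", "Director", "Star1"),
  df       = c(181, 471, 554),
  ss       = c(4.315e18, 7.521e18, 8.127e18),
  df_resid = c(648, 358, 275),
  ss_resid = c(5.681e18, 2.475e18, 1.869e18)
)

ss_total <- anova_out$ss + anova_out$ss_resid
df_total <- anova_out$df + anova_out$df_resid

# Eta squared: share of the total sum of squares attributed to the factor.
anova_out$eta_sq <- anova_out$ss / ss_total

# Adjusted R squared: the same idea, penalised for the number of levels,
# which matters here because each factor has hundreds of levels.
anova_out$adj_r_sq <- 1 - (anova_out$ss_resid / anova_out$df_resid) /
  (ss_total / df_total)

print(anova_out[, c("factor", "eta_sq", "adj_r_sq")], digits = 3)
```

Because `Director` and `Star1` have 471 and 554 model degrees of freedom among 830 films, many groups contain very few films, so the unadjusted eta squared will tend to overstate their explanatory power. The adjusted figure is the more cautious one to compare across the three factors.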
Based on these results, the factors most closely associated with a movie’s high revenue are its popularity, which investment in marketing can increase, and the strategic choice of actors, directors, and the film’s genre.
This does not mean that other factors, such as the intrinsic quality of the film or public perception of quality, are not influential for revenue. It simply means that, based on the analyzed data, popularity and the composition of actors, directors, and genre are the main drivers of revenue.