For this task, I create a binary “success” column from the “revenue” variable. Movies with revenue above the median are considered “successful” (1), while those at or below the median are “not successful” (0).
# Creating a binary success variable based on revenue
data$success <- ifelse(data$revenue > median(data$revenue), 1, 0)
head(data)
## names date_x score
## 1 Creed III 03/02/2023 73
## 2 Avatar: The Way of Water 12/15/2022 78
## 3 The Super Mario Bros. Movie 04/05/2023 76
## 4 Mummies 01/05/2023 70
## 5 Supercell 03/17/2023 61
## 6 Cocaine Bear 02/23/2023 66
## genre
## 1 Drama, Action
## 2 Science Fiction, Adventure, Action
## 3 Animation, Adventure, Family, Fantasy, Comedy
## 4 Animation, Comedy, Family, Adventure, Fantasy
## 5 Action
## 6 Thriller, Comedy, Crime
## overview
## 1 After dominating the boxing world, Adonis Creed has been thriving in both his career and family life. When a childhood friend and former boxing prodigy, Damien Anderson, resurfaces after serving a long sentence in prison, he is eager to prove that he deserves his shot in the ring. The face-off between former friends is more than just a fight. To settle the score, Adonis must put his future on the line to battle Damien — a fighter who has nothing to lose.
## 2 Set more than a decade after the events of the first film, learn the story of the Sully family (Jake, Neytiri, and their kids), the trouble that follows them, the lengths they go to keep each other safe, the battles they fight to stay alive, and the tragedies they endure.
## 3 While working underground to fix a water main, Brooklyn plumbers—and brothers—Mario and Luigi are transported down a mysterious pipe and wander into a magical new world. But when the brothers are separated, Mario embarks on an epic quest to find Luigi.
## 4 Through a series of unfortunate events, three mummies end up in present-day London and embark on a wacky and hilarious journey in search of an old ring belonging to the Royal Family, stolen by ambitious archaeologist Lord Carnaby.
## 5 Good-hearted teenager William always lived in hope of following in his late father’s footsteps and becoming a storm chaser. His father’s legacy has now been turned into a storm-chasing tourist business, managed by the greedy and reckless Zane Rogers, who is now using William as the main attraction to lead a group of unsuspecting adventurers deep into the eye of the most dangerous supercell ever seen.
## 6 Inspired by a true story, an oddball group of cops, criminals, tourists and teens converge in a Georgia forest where a 500-pound black bear goes on a murderous rampage after unintentionally ingesting cocaine.
## crew
## 1 Michael B. Jordan, Adonis Creed, Tessa Thompson, Bianca Taylor, Jonathan Majors, Damien Anderson, Wood Harris, Tony 'Little Duke' Evers, Phylicia Rashād, Mary Anne Creed, Mila Davis-Kent, Amara Creed, Florian Munteanu, Viktor Drago, José Benavidez Jr., Felix Chavez, Selenis Leyva, Laura Chavez
## 2 Sam Worthington, Jake Sully, Zoe Saldaña, Neytiri, Sigourney Weaver, Kiri / Dr. Grace Augustine, Stephen Lang, Colonel Miles Quaritch, Kate Winslet, Ronal, Cliff Curtis, Tonowari, Joel David Moore, Norm Spellman, CCH Pounder, Mo'at, Edie Falco, General Frances Ardmore
## 3 Chris Pratt, Mario (voice), Anya Taylor-Joy, Princess Peach (voice), Charlie Day, Luigi (voice), Jack Black, Bowser (voice), Keegan-Michael Key, Toad (voice), Seth Rogen, Donkey Kong (voice), Fred Armisen, Cranky Kong (voice), Kevin Michael Richardson, Kamek (voice), Sebastian Maniscalco, Spike (voice)
## 4 Óscar Barberán, Thut (voice), Ana Esther Alborg, Nefer (voice), Luis Pérez Reina, Carnaby (voice), María Luisa Solá, Madre (voice), Jaume Solà, Sekhem (voice), José Luis Mediavilla, Ed (voice), José Javier Serrano Rodríguez, Danny (voice), Aleix Estadella, Dennis (voice), María Moscardó, Usi (voice)
## 5 Skeet Ulrich, Roy Cameron, Anne Heche, Dr Quinn Brody, Daniel Diemer, William Brody, Jordan Kristine Seamón, Harper Hunter, Alec Baldwin, Zane Rogers, Richard Gunn, Bill Brody, Praya Lundberg, Amy, Johnny Wactor, Martin, Anjul Nigam, Ramesh
## 6 Keri Russell, Sari, Alden Ehrenreich, Eddie, O'Shea Jackson Jr., Daveed, Ray Liotta, Syd, Kristofer Hivju, Olaf (Kristoffer), Margo Martindale, Ranger Liz, Christian Convery, Henry, Isiah Whitlock Jr., Bob, Jesse Tyler Ferguson, Peter
## orig_title status orig_lang budget_x revenue
## 1 Creed III Released English 7.50e+07 271616668
## 2 Avatar: The Way of Water Released English 4.60e+08 2316794914
## 3 The Super Mario Bros. Movie Released English 1.00e+08 724459031
## 4 Momias Released Spanish, Castilian 1.23e+07 34200000
## 5 Supercell Released English 7.70e+07 340941959
## 6 Cocaine Bear Released English 3.50e+07 80000000
## country success
## 1 AU 1
## 2 AU 1
## 3 AU 1
## 4 AU 0
## 5 US 1
## 6 AU 0
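The median split above can be checked on a small toy vector. This is a minimal sketch with made-up revenue values (not taken from the movie data). It shows that rows exactly at the median land in the “not successful” class, so the two classes need not be the same size. It also assumes there are no missing revenues; with `NA` values, `median()` would return `NA` unless `na.rm = TRUE` is passed.

```r
# Toy revenues (hypothetical values, not taken from the movie data set)
revenue <- c(10, 20, 30, 30, 40, 50, 60)

# Same rule as above: strictly greater than the median counts as a success
success <- ifelse(revenue > median(revenue), 1, 0)

# Rows equal to the median (30 here) fall in the "not successful" class
print(data.frame(revenue, success))

# Class counts: ties at the median make the split uneven
print(table(success))
```

Running `table(data$success)` on the real data frame gives the same kind of check for the full data set.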
library(dplyr)
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-8
# Checking if 'success' variable exists in the data frame
if (!"success" %in% colnames(data)) {
  data$success <- ifelse(data$revenue > median(data$revenue), 1, 0)
}
# Verifying the structure of 'success' variable
str(data$success) # Checking the structure
## num [1:10178] 1 1 1 0 1 0 1 1 1 1 ...
# Building a logistic regression model with 'budget_x' as the explanatory variable
model <- glm(success ~ budget_x,
             data = data, family = binomial(link = "logit"))
# Displaying model summary
summary(model)
##
## Call:
## glm(formula = success ~ budget_x, family = binomial(link = "logit"),
## data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.739e+00 5.353e-02 -51.16 <2e-16 ***
## budget_x 4.854e-08 8.928e-10 54.37 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 14109.7 on 10177 degrees of freedom
## Residual deviance: 7309.8 on 10176 degrees of freedom
## AIC: 7313.8
##
## Number of Fisher Scoring iterations: 6
Intercept: The intercept represents the log-odds of success when the budget is zero. In our context, this value is not practically meaningful, as budgets are typically positive values.
budget_x Coefficient: The coefficient for “budget_x” is approximately 4.854e-08. Because the budget is measured in dollars, a one-unit increase means $1, so each additional dollar raises the log-odds of success by 4.854e-08. The estimate is tiny only because of this scale: a $1 million increase raises the log-odds by about 0.0485.
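Log-odds are hard to read directly, so it can help to convert the estimates into an odds ratio and into predicted probabilities. The sketch below hard-codes the two estimates printed in the model summary instead of refitting the model. The budget values are hypothetical and chosen only for illustration.

```r
# Estimates copied from the summary(model) output above
b0 <- -2.739      # intercept: log-odds of success at a budget of 0
b1 <- 4.854e-08   # change in log-odds per $1 of budget

# Odds ratio for a $1 million increase in budget
or_per_million <- exp(b1 * 1e6)
print(round(or_per_million, 3))

# Predicted probability of success at a few illustrative budgets
budgets <- c(0, 25e6, 50e6, 100e6)   # hypothetical values, in dollars
p_hat <- plogis(b0 + b1 * budgets)   # inverse logit
print(data.frame(budget = budgets, p_success = round(p_hat, 3)))

# Budget at which the fitted probability crosses 0.5
print(-b0 / b1)
```

With the fitted model object available, `exp(coef(model))` and `predict(model, newdata = ..., type = "response")` give the same quantities without copying numbers by hand.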
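The logistic regression results indicate a statistically significant positive relationship between movie budget and the likelihood of success (z = 54.37, p < 2e-16). As the budget increases, the log-odds of success also increase. The drop from a null deviance of 14109.7 to a residual deviance of 7309.8 shows that budget accounts for a large share of the variation in the outcome.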
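The two deviances reported in the summary also support an overall test of the model. The sketch below is one common way to use them, not something run in the original analysis: a likelihood-ratio test of the budget model against the intercept-only model, plus McFadden's pseudo R-squared. All numbers are copied from the output above.

```r
# Deviances and degrees of freedom copied from summary(model)
null_dev  <- 14109.7   # intercept-only model, 10177 df
resid_dev <- 7309.8    # model with budget_x, 10176 df

# Likelihood-ratio statistic and its chi-squared p-value
lr_stat <- null_dev - resid_dev
df_diff <- 10177 - 10176
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# McFadden's pseudo R-squared (for 0/1 data the deviance equals -2 * log-likelihood)
mcfadden_r2 <- 1 - resid_dev / null_dev

print(c(lr_stat = lr_stat, df = df_diff, p_value = p_value,
        mcfadden_r2 = round(mcfadden_r2, 3)))
```

On the fitted object, `anova(model, test = "Chisq")` reports the same likelihood-ratio comparison.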
It’s important to note that the model assumes a linear relationship between the budget and the log-odds of success.
# Loading necessary libraries
library(ggplot2)
# Creating a scatter plot for 'budget_x'
ggplot(data, aes(x = budget_x, y = success)) +
  geom_point() +
  labs(x = "Budget (X)", y = "Success (0 or 1)")
Because “success” is binary, the points in this scatter plot fall on two horizontal bands, at 0 and at 1, so the plot cannot show a linear relationship between “budget_x” and “success” directly. What it can show is where each class sits along the budget axis: consistent with the positive coefficient above, higher budgets go with a greater likelihood of success. The linearity that the model assumes concerns the log-odds of success, which a raw 0/1 scatter plot can only hint at, and outliers or influential points at very large budgets may still affect the fit. Based on this initial inspection, no nonlinear pattern is evident, so a transformation of “budget_x” may not be necessary and the variable is left untransformed.
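A more direct check of the linearity assumption is to bin the budget and plot the empirical log-odds of success per bin. Roughly straight points suggest no transformation is needed. The sketch below uses simulated data as a stand-in for the movie data, so the data frame `sim`, its coefficients, and the choice of 10 bins are all assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for the movie data (hypothetical; not the real data set)
n <- 5000
budget_x <- runif(n, 1e6, 1.2e8)
success <- rbinom(n, 1, plogis(-2.7 + 4.8e-08 * budget_x))
sim <- data.frame(budget_x, success)

# Cut the budget into 10 bins with roughly equal counts
breaks <- quantile(sim$budget_x, probs = seq(0, 1, by = 0.1))
sim$bin <- cut(sim$budget_x, breaks = breaks, include.lowest = TRUE)

# Per-bin mean budget, number of films, and number of successes
bin_budget <- as.numeric(tapply(sim$budget_x, sim$bin, mean))
bin_n <- as.numeric(tapply(sim$success, sim$bin, length))
bin_k <- as.numeric(tapply(sim$success, sim$bin, sum))

# Empirical log-odds; the 0.5 adjustment keeps all-0 or all-1 bins finite
emp_logit <- log((bin_k + 0.5) / (bin_n - bin_k + 0.5))

# Roughly linear points support leaving budget_x untransformed
plot(bin_budget, emp_logit,
     xlab = "Mean budget in bin", ylab = "Empirical log-odds of success")
```

To apply this to the real data, replace `sim` with `data`. If the points bend, for example flattening at high budgets, a transformation such as `log(budget_x)` would be worth trying.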