data <- read.csv("C:\\Users\\91630\\OneDrive\\Desktop\\statistics\\age_gaps.CSV")
Create a column mean_ages as the mean of actor_1_age and actor_2_age.
Create a column movie_age giving how old each movie is (as of 2024), computed from release_year.
# Calculate mean ages
data$mean_ages <- (data$actor_1_age + data$actor_2_age) / 2
summary(data$mean_ages)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 30.00 34.50 35.42 39.75 69.00
# printing mean_ages
head(data[, c("actor_1_age", "actor_2_age", "mean_ages")])
## actor_1_age actor_2_age mean_ages
## 1 75 23 49.0
## 2 74 24 49.0
## 3 69 20 44.5
## 4 68 23 45.5
## 5 81 38 59.5
## 6 59 17 38.0
# Calculate movie age
data$movie_age <- 2024 - data$release_year
# printing the first few rows
head(data[, c("release_year", "movie_age")])
## release_year movie_age
## 1 1971 53
## 2 2006 18
## 3 2002 22
## 4 1998 26
## 5 2010 14
## 6 1992 32
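The reference year 2024 is hard-coded above, so movie_age goes stale as time passes. A minimal sketch of one alternative, assuming a hypothetical helper named movie_age_as_of (not part of the original analysis), takes the reference year as an argument and defaults to the current calendar year:

```r
# Hypothetical helper (an assumption, not from the original analysis):
# age of a film in whole years relative to a reference year.
movie_age_as_of <- function(release_year,
                            as_of_year = as.integer(format(Sys.Date(), "%Y"))) {
  as_of_year - release_year
}

# Release years from the first six rows printed above
release_year <- c(1971, 2006, 2002, 1998, 2010, 1992)

# Passing 2024 explicitly reproduces the movie_age column shown above
movie_age_2024 <- movie_age_as_of(release_year, as_of_year = 2024)
print(movie_age_2024)
```

Passing the year explicitly keeps the results reproducible; leaving it out uses the year in which the script is run.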
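# Visualization for actor_1_age and actor_2_age vs. mean age
# xlim spans both actors' ages so the Actor 2 points added below are not clipped
plot(data$actor_1_age, data$mean_ages,
xlab = "Actor Age", ylab = "Mean Age",
xlim = range(c(data$actor_1_age, data$actor_2_age), na.rm = TRUE),
main = "Actor Ages vs. Mean Age",
col = "blue", pch = 16)
points(data$actor_2_age, data$mean_ages, col = "red", pch = 16)
# The top-left corner (young actor, high mean age) is empty, so the legend hides no points
legend("topleft", legend = c("Actor 1", "Actor 2"), col = c("blue", "red"), pch = 16)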
# Visualization for release_year vs. movie_age
plot(data$release_year, data$movie_age,
xlab = "Release Year", ylab = "Movie Age",
main = "Release Year vs. Movie Age",
col = "green", pch = 16)
Insight: The plot shows how each actor's age relates to the pair's mean
age. Because mean_ages is computed directly from actor_1_age and
actor_2_age, a positive relationship is expected by construction; what
the plot adds is the spread. Where actor 1 is the older of the two, as
in the rows printed above, the Actor 1 point sits to the right of the
Actor 2 point at the same mean age, and the horizontal distance between
them is the age gap within the pair.
Significance: Knowing how individual actor ages relate to the mean age
of a pairing describes casting patterns in this dataset, for example
whether pairings skew older or younger. None of the columns used here
measure audience response, so any link to audience preferences is a
hypothesis for further study rather than something this plot shows.
Insights: This plot shows movie age against release year. Because
movie_age is defined as 2024 - release_year, the points fall on a
straight line with slope -1, so the plot confirms that the column was
computed correctly rather than revealing a new pattern. The density of
points along the line shows which release years are most heavily
represented.
Significance: Movie age is a convenient way to group films into older
and more recent releases for later comparisons, for example when asking
whether casting ages differ by era. The columns used here do not measure
audience engagement, so questions about how viewers respond to older
versus newer content need additional data.
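Since movie_age is derived from release_year by subtraction, the relationship between the two is exactly linear. A small sketch, using only the six release years printed earlier and base R's lm(), recovers the intercept and slope of that line:

```r
# Release years from the first six rows printed earlier in this document
release_year <- c(1971, 2006, 2002, 1998, 2010, 1992)
movie_age    <- 2024 - release_year

# Fit a straight line; the fit is exact because movie_age is a
# linear function of release_year
fit <- lm(movie_age ~ release_year)
line_coefs <- unname(coef(fit))   # intercept first, then slope
print(line_coefs)

# The correlation of an exactly decreasing linear relationship
r <- cor(release_year, movie_age)
print(r)
```

The intercept equals the reference year and the slope is -1, which is why the scatter plot of these two columns is a straight line.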
Further Questions:
1. How do box office performance and audience engagement in movies
relate to the age distribution of the main and supporting actors?
2. Are there any obvious patterns in the actors cast for various film
genres, and if so, how do these patterns affect the age distribution of
films as a whole?
3. What influences the age distribution of actors in movies, and how do
industry norms or casting preferences manifest themselves as outliers or
clusters within this distribution?
cor_actor1_mean <- cor(data$actor_1_age, data$mean_ages)
cor_actor2_mean <- cor(data$actor_2_age, data$mean_ages)
# Printing correlation coefficients
cat("Correlation coefficient between actor_1_age and mean_ages:", cor_actor1_mean, "\n")
## Correlation coefficient between actor_1_age and mean_ages: 0.9262646
cat("Correlation coefficient between actor_2_age and mean_ages:", cor_actor2_mean, "\n")
## Correlation coefficient between actor_2_age and mean_ages: 0.8516591
cor_release_movie <- cor(data$release_year, data$movie_age)
# Printing correlation coefficient
cat("Correlation coefficient between release_year and movie_age:", cor_release_movie, "\n")
## Correlation coefficient between release_year and movie_age: -1
Insights:
1. Both coefficients are high by construction, since mean_ages is the
average of actor_1_age and actor_2_age. The stronger correlation for
actor_1_age (0.93 vs. 0.85) indicates that actor_1_age varies more than
actor_2_age, so it accounts for more of the variation in the mean.
2. The gap between the two coefficients reflects differences in age
spread within pairs; it does not by itself show which actor is the lead
or the bigger star.
3. The correlation of -1 between release_year and movie_age is exact
because movie_age is a linear function of release_year. None of these
coefficients is evidence of intentional age-based casting; testing that
would require comparing ages across genres, eras, or roles.
Significance:
1. The coefficients quantify how closely each actor's age tracks the
pair's mean age, which is a starting point for describing casting
patterns.
2. If actor 1 is consistently the older actor, as in the rows printed
above, the higher coefficient for actor_1_age suggests that the older
partner's age drives most of the variation in mean age.
3. Linking these age relationships to audience appeal, marketability, or
box office success would require outcome data that this analysis does
not use.
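One point worth checking when reading these coefficients: when a variable is correlated with an average that includes it, the component with the larger spread has the larger correlation. The sketch below illustrates this with the six rows printed earlier in this document; it is not a rerun on the full dataset, so the values differ from the coefficients reported above.

```r
# Ages from the first six rows printed earlier in this document
actor_1_age <- c(75, 74, 69, 68, 81, 59)
actor_2_age <- c(23, 24, 20, 23, 38, 17)
mean_ages   <- (actor_1_age + actor_2_age) / 2

# Correlation of each actor's age with the pair mean
cor_1 <- cor(actor_1_age, mean_ages)
cor_2 <- cor(actor_2_age, mean_ages)

# Standard deviation of each actor's age
sd_1 <- sd(actor_1_age)
sd_2 <- sd(actor_2_age)

cat("sd of actor ages:", sd_1, sd_2, "\n")
cat("correlation with mean:", cor_1, cor_2, "\n")

# TRUE when the ordering of the spreads matches the ordering of the correlations
same_order <- (sd_1 > sd_2) == (cor_1 > cor_2)
print(same_order)
```

Running sd(data$actor_1_age) and sd(data$actor_2_age) on the full data would show whether the same explanation accounts for the 0.93 versus 0.85 difference.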
confidence_interval_mean_ages <- t.test(data$mean_ages)$conf.int
confidence_interval_movie_age <- t.test(data$movie_age)$conf.int
print("Confidence interval for mean_ages:")
## [1] "Confidence interval for mean_ages:"
print(confidence_interval_mean_ages)
## [1] 34.96038 35.88637
## attr(,"conf.level")
## [1] 0.95
print("Confidence interval for movie_age:")
## [1] "Confidence interval for movie_age:"
print(confidence_interval_movie_age)
## [1] 22.25604 24.14569
## attr(,"conf.level")
## [1] 0.95
Insights:
1. The 95% confidence interval for mean_ages (34.96 to 35.89) is a range
of plausible values for the true average of the pair mean age, assuming
the films are a representative sample. It describes the central tendency
of actor ages in the dataset, not the range of individual pair means,
which run from 18 to 69.
2. In a similar vein, the interval for movie_age (22.26 to 24.15 years)
estimates the average age of the films. It corresponds to an average
release year of roughly 2000 to 2002; it does not say that most films
were released in those years.
Significance:
1. The width of the interval reflects both the variability of the ages
and the number of films: more variable ages widen it, and a larger
sample narrows it. The interval here is less than one year wide, so the
average is estimated precisely even though individual values vary
widely.
2. The movie_age interval locates the average film age but not the
spread of release years. Describing how the films are distributed over
time calls for the range, quartiles, or a histogram of release_year.
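To make the interval less of a black box, the sketch below computes a 95% t-interval by hand as mean ± t × sd / sqrt(n) and compares it with t.test(). It uses only the six mean_ages values printed earlier in this document, so the interval is much wider than the one reported for the full dataset.

```r
# Pair mean ages from the first six rows printed earlier in this document
x <- c(49.0, 49.0, 44.5, 45.5, 59.5, 38.0)

n      <- length(x)
se     <- sd(x) / sqrt(n)          # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value

# Interval computed from the formula
manual_ci <- mean(x) + c(-1, 1) * t_crit * se

# Interval reported by t.test(), with the conf.level attribute dropped
builtin_ci <- as.numeric(t.test(x)$conf.int)

print(manual_ci)
print(builtin_ci)
```

Because the standard error divides by sqrt(n), the same spread of ages gives a narrower interval as the number of films grows.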
Further Questions:
Factors Affecting Mean Ages: What variables affect the actors’ mean ages across different movies? Are various movie genres or styles linked to particular actor age demographics?
Long-Term Patterns in Film Age: How have film ages evolved throughout the years? Do the release years of the movies in the dataset exhibit any noteworthy trends or patterns?
Relationship Between Actor and Film Age: Is there a relationship between actors’ ages and the years that films are released? Are casting selections and movie age influenced by other considerations, or do older actors typically feature in films released in a particular decade?