data <- read.csv("C:\\Users\\91630\\OneDrive\\Desktop\\statistics\\age_gaps.CSV")

Creating 2 new columns:

# Calculate mean ages
data$mean_ages <- (data$actor_1_age + data$actor_2_age) / 2
summary(data$mean_ages)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   30.00   34.50   35.42   39.75   69.00
# printing mean_ages
head(data[, c("actor_1_age", "actor_2_age", "mean_ages")])
##   actor_1_age actor_2_age mean_ages
## 1          75          23      49.0
## 2          74          24      49.0
## 3          69          20      44.5
## 4          68          23      45.5
## 5          81          38      59.5
## 6          59          17      38.0
# Calculate movie age
data$movie_age <- 2024 - data$release_year

# printing the first few rows
head(data[, c("release_year", "movie_age")])
##   release_year movie_age
## 1         1971        53
## 2         2006        18
## 3         2002        22
## 4         1998        26
## 5         2010        14
## 6         1992        32
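
The movie age above is measured against the hard-coded year 2024. A minimal sketch, assuming one would rather state that year once in a named variable (`reference_year` and the toy data frame `toy` are illustrative names, not part of the original analysis):

```r
# Toy data standing in for the real data frame; the years are the ones
# shown in the head() output above
toy <- data.frame(release_year = c(1971, 2006, 2002, 1998, 2010, 1992))

# Hypothetical variable: the year against which movie age is measured.
# Keeping it at 2024 reproduces the movie_age values printed above.
reference_year <- 2024
toy$movie_age <- reference_year - toy$release_year
print(toy)
```

Setting `reference_year` to `as.integer(format(Sys.Date(), "%Y"))` would instead track the current year, at the cost of results that change from one year to the next.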

Visualization:

# Visualization for actor_1_age and actor_2_age vs. mean age
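plot(data$actor_1_age, data$mean_ages,
     xlab = "Actor Age", ylab = "Mean Ages",
     main = "Actors Age vs. Mean Ages",
     # x-axis spans both actors' ages so the red points are not clipped
     xlim = range(data$actor_1_age, data$actor_2_age),
     col = "blue", pch = 16)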
points(data$actor_2_age, data$mean_ages, col = "red", pch = 16)
legend("topright", legend = c("Actor 1", "Actor 2"), col = c("blue", "red"), pch = 16)

# Visualization for release_year vs. movie_age
plot(data$release_year, data$movie_age, 
     xlab = "Release Year", ylab = "Movie Age",
     main = "Release Year vs. Movie Age",
     col = "green", pch = 16)

1. Actor Ages vs Mean Ages:

Insight: The plot shows how each actor's age relates to the mean age of the two actors in a film. Because mean_ages is calculated directly from actor_1_age and actor_2_age, an upward trend is expected by construction; the informative part is the spread around that trend, which reflects how far apart the two actors' ages are. The plot also shows whether films in the dataset tend to cast older or younger actors.

Significance: Knowing how actor ages relate to the mean age can help filmmakers make informed casting decisions for a target audience. The dataset does not measure audience preferences directly, so any conclusion about what audiences prefer is only suggestive.

2. Release Year vs Movie Age:

Insights: This plot shows how the age of a film depends on its release year. Because movie_age is defined as 2024 minus release_year, the points fall on a straight descending line. What the plot mainly reveals is which release years the dataset covers and how densely.

Significance: Combined with viewing or revenue data, the spread of release years could help filmmakers and distributors compare audience involvement with older versus newer content. It could also inform content acquisition for platforms that serve a wide range of age groups and audience preferences.

Further Questions:

1. How do box office performance and audience engagement in movies relate to the age distribution of the main and supporting actors?
2. Are there any obvious patterns in the actors cast for various film genres, and if so, how do these patterns affect the age distribution of films as a whole?
3. What influences the age distribution of actors in movies, and how do industry norms or casting preferences manifest themselves as outliers or clusters within this distribution?

cor_actor1_mean <- cor(data$actor_1_age, data$mean_ages)

cor_actor2_mean <- cor(data$actor_2_age, data$mean_ages)

# Printing correlation coefficients
cat("Correlation coefficient between actor_1_age and mean_ages:", cor_actor1_mean, "\n")
## Correlation coefficient between actor_1_age and mean_ages: 0.9262646
cat("Correlation coefficient between actor_2_age and mean_ages:", cor_actor2_mean, "\n")
## Correlation coefficient between actor_2_age and mean_ages: 0.8516591
cor_release_movie <- cor(data$release_year, data$movie_age)
# Printing correlation coefficient
cat("Correlation coefficient between release_year and movie_age:", cor_release_movie, "\n")
## Correlation coefficient between release_year and movie_age: -1
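
The coefficient of exactly -1 follows from how movie_age was defined: it is a constant minus release_year, which is a decreasing linear function of it, so the two columns carry the same information. A small check on a handful of release years (the vector `years` is illustrative):

```r
# A few release years, taken from the head() output earlier in the report
years <- c(1971, 2006, 2002, 1998, 2010, 1992)

# Any "constant minus year" variable is perfectly negatively correlated with year
r_2024 <- cor(years, 2024 - years)
r_other <- cor(years, 3000 - years)

cat("cor with 2024 - year:", r_2024, "\n")
cat("cor with 3000 - year:", r_other, "\n")
```

The choice of constant does not matter, so this correlation says nothing about the films themselves.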

Insights:

1. mean_ages correlates more strongly with actor_1_age (0.93) than with actor_2_age (0.85), so actor 1's age accounts for more of the variation in the mean. This may reflect older actors frequently playing lead roles.
2. Both correlations are below 1, which shows that the two actors' ages differ within films; the actor whose age varies more across films has the greater influence on the mean age.
3. The strong associations may suggest intentional age-based casting decisions that align with audience expectations and industry standards, although no significance test was run here.


Significance:

1. The correlation coefficients quantify how closely each actor's age tracks the mean age, giving a numerical summary of casting patterns that can inform discussion of audience engagement and character authenticity.
2. The relationships are consistent with older performers being common in lead roles and shaping cast demographics, mirroring industry standards.
3. Knowing the relationships between ages makes it easier to create casts that appeal to a wide range of audience demographics, which may increase the marketability and box office success of a film.
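
One caveat when reading these coefficients: mean_ages is computed from the two ages, so both correlations are positive partly by construction, and the age with the larger spread correlates more strongly with the mean. A sketch on simulated ages (the distributions and the names `older` and `younger` are assumptions for illustration, not estimates from the dataset):

```r
set.seed(1)
n <- 1000

# Simulated, independent ages; 'older' is given the larger standard deviation
older <- rnorm(n, mean = 40, sd = 10)
younger <- rnorm(n, mean = 30, sd = 5)
pair_mean <- (older + younger) / 2

# Correlation of each age with the mean of the pair
cor_older <- cor(older, pair_mean)
cor_younger <- cor(younger, pair_mean)

cat("sd older:", sd(older), " sd younger:", sd(younger), "\n")
cat("cor with mean, older:", cor_older, " younger:", cor_younger, "\n")
```

On the real data, comparing `sd(data$actor_1_age)` with `sd(data$actor_2_age)` would show whether the same mechanism accounts for 0.93 versus 0.85.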

Confidence interval for each response variable:

confidence_interval_mean_ages <- t.test(data$mean_ages)$conf.int

confidence_interval_movie_age <- t.test(data$movie_age)$conf.int

print("Confidence interval for mean_ages:")
## [1] "Confidence interval for mean_ages:"
print(confidence_interval_mean_ages)
## [1] 34.96038 35.88637
## attr(,"conf.level")
## [1] 0.95
print("Confidence interval for movie_age:")
## [1] "Confidence interval for movie_age:"
print(confidence_interval_movie_age)
## [1] 22.25604 24.14569
## attr(,"conf.level")
## [1] 0.95

Insights:

1. The confidence interval for mean_ages gives a range of plausible values for the true average of the films' mean actor ages: with 95% confidence it lies between 34.96 and 35.89 years. This describes the central tendency of actor ages in the dataset.

2. In a similar vein, the movie_age confidence interval indicates that the average film age lies between 22.26 and 24.15 years with 95% confidence, which corresponds to an average release year of roughly 2000 to 2002. It describes the mean, not the range of release years that the majority of films fall into.

Significance:

1. The width of the confidence interval for mean_ages shows how precisely the mean is estimated. It grows with the variability of actor ages and shrinks as the number of films increases. The interval here is under one year wide, which indicates a precise estimate of the mean rather than homogeneous actor ages.

2. The movie_age confidence interval locates the average film age, and therefore the average release year, but it does not give the range of release years in the dataset. For that, summary(data$release_year) or a histogram is more appropriate; together with the interval, these describe how the films are distributed over time.
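
A 95% confidence interval describes uncertainty about the mean, which is different from the range that contains most individual observations. A sketch on simulated values (the vector `x` is made up; with the real data, `data$mean_ages` would take its place):

```r
set.seed(42)
x <- rnorm(500, mean = 35, sd = 8)

# t-based 95% confidence interval for the mean, computed by hand
n <- length(x)
se <- sd(x) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
ci_manual <- c(mean(x) - t_crit * se, mean(x) + t_crit * se)

# The same interval as returned by t.test()
ci_ttest <- as.numeric(t.test(x)$conf.int)

# Range holding the middle 95% of individual observations
middle_95 <- as.numeric(quantile(x, c(0.025, 0.975)))

print(ci_manual)
print(ci_ttest)
print(middle_95)
```

The interval for the mean narrows as the sample size grows, while the quantile range stays roughly as wide as the data are spread out.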

Further Questions: