movie_data <- read.csv("C:\\Users\\varsh\\OneDrive\\Desktop\\Gitstuff\\age_gaps.CSV")
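mean_ages: the row-wise mean of actor_1_age and actor_2_age, i.e. the average age of the two actors in each couple.
std_dev_ages: the row-wise sample standard deviation of actor_1_age and actor_2_age. With only two values per row, this equals |actor_1_age - actor_2_age| / √2.
diff_mean_age_difference: age_difference minus mean(age_difference), i.e. how far each couple's age gap lies above (positive) or below (negative) the average gap.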
# Per-couple mean and sample standard deviation of the two actors' ages
movie_data$mean_ages <- rowMeans(movie_data[, c('actor_1_age', 'actor_2_age')])
movie_data$std_dev_ages <- apply(movie_data[, c('actor_1_age', 'actor_2_age')], 1, sd)
# Signed deviation of each couple's age gap from the overall mean gap
mean_age_difference <- mean(movie_data$age_difference)
movie_data$diff_mean_age_difference <- movie_data$age_difference - mean_age_difference
head(movie_data)
## movie_name release_year director age_difference couple_number
## 1 Harold and Maude 1971 Hal Ashby 52 1
## 2 Venus 2006 Roger Michell 50 1
## 3 The Quiet American 2002 Phillip Noyce 49 1
## 4 The Big Lebowski 1998 Joel Coen 45 1
## 5 Beginners 2010 Mike Mills 43 1
## 6 Poison Ivy 1992 Katt Shea 42 1
## actor_1_name actor_2_name character_1_gender character_2_gender
## 1 Ruth Gordon Bud Cort woman man
## 2 Peter O'Toole Jodie Whittaker man woman
## 3 Michael Caine Do Thi Hai Yen man woman
## 4 David Huddleston Tara Reid man woman
## 5 Christopher Plummer Goran Visnjic man man
## 6 Tom Skerritt Drew Barrymore man woman
## actor_1_birthdate actor_2_birthdate actor_1_age actor_2_age mean_ages
## 1 1896-10-30 29-03-1948 75 23 49.0
## 2 02-08-1932 03-06-1982 74 24 49.0
## 3 14-03-1933 01-10-1982 69 20 44.5
## 4 17-09-1930 08-11-1975 68 23 45.5
## 5 13-12-1929 09-09-1972 81 38 59.5
## 6 25-08-1933 22-02-1975 59 17 38.0
## std_dev_ages diff_mean_age_difference
## 1 36.76955 41.57576
## 2 35.35534 39.57576
## 3 34.64823 38.57576
## 4 31.81981 34.57576
## 5 30.40559 32.57576
## 6 29.69848 31.57576
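With only two ages per row, the row-wise sample standard deviation collapses to a simple formula: for ages a and b, sd(c(a, b)) = |a - b| / √2. The sketch below checks this on the first three rows printed by head(), assuming age_difference is the absolute gap between the two actors' ages (which the printed rows are consistent with).

```r
# Ages copied from the first three rows of head(movie_data) above
toy <- data.frame(
  actor_1_age = c(75, 74, 69),
  actor_2_age = c(23, 24, 20)
)

# Assumption: age_difference is the absolute gap between the two ages
toy$age_difference <- abs(toy$actor_1_age - toy$actor_2_age)

# Row-wise sample standard deviation, computed the same way as std_dev_ages
toy$std_dev_ages <- apply(toy[, c("actor_1_age", "actor_2_age")], 1, sd)

# For two values, sd() divides the summed squared deviations by n - 1 = 1,
# which leaves |a - b| / sqrt(2)
toy$sd_formula <- toy$age_difference / sqrt(2)

print(toy)
```

The std_dev_ages values reproduce the 36.76955, 35.35534 and 34.64823 shown by head(), so std_dev_ages carries the same information as age_difference, rescaled by 1/√2.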
library(ggplot2)

plot1 <- ggplot(movie_data, aes(x = age_difference, y = diff_mean_age_difference)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "age_difference vs diff_mean_age_difference",
       x = "age_difference",
       y = "diff_mean_age_difference")

plot2 <- ggplot(movie_data, aes(x = actor_1_age, y = mean_ages)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "actor_1_age vs mean_ages",
       x = "actor_1_age",
       y = "mean_ages")

plot3 <- ggplot(movie_data, aes(x = actor_2_age, y = std_dev_ages)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "actor_2_age vs std_dev_ages",
       x = "actor_2_age",
       y = "std_dev_ages")
print(plot1)
## `geom_smooth()` using formula = 'y ~ x'
print(plot2)
## `geom_smooth()` using formula = 'y ~ x'
print(plot3)
## `geom_smooth()` using formula = 'y ~ x'
The first plot (plot1) shows age_difference against diff_mean_age_difference, with a fitted linear regression line.
The points fall exactly on a straight line with slope 1. This follows from how the column was built rather than from anything in the films: diff_mean_age_difference is age_difference minus a single constant (the mean age gap), so the two variables always move together one-for-one. Note that diff_mean_age_difference is a signed deviation, not an absolute difference; couples with a below-average gap have negative values.
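Because diff_mean_age_difference is age_difference minus one constant, the line drawn in plot1 is fixed in advance: slope 1 and intercept -mean(age_difference). A small check on the six age_difference values printed by head() (toy data, not the full dataset):

```r
# Toy values: the six age_difference values shown by head(movie_data)
age_difference <- c(52, 50, 49, 45, 43, 42)

# Same construction as diff_mean_age_difference
diff_mean_age_difference <- age_difference - mean(age_difference)

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(diff_mean_age_difference ~ age_difference)

print(coef(fit))  # intercept = -mean(age_difference), slope = 1 (up to rounding)
print(cor(age_difference, diff_mean_age_difference))  # 1 (up to rounding)
```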
The second plot (plot2) shows actor_1_age against mean_ages, with a fitted linear regression line.
The plot shows a strong positive relationship: as the first actor's age (actor_1_age) rises, so does the couple's average age (mean_ages). Note that mean_ages is the average of the two actors in the couple, not of the whole cast, and it contains actor_1_age itself, so part of this relationship is built in.
The third plot (plot3) shows actor_2_age against std_dev_ages, with a fitted linear regression line.
The regression line slopes slightly downward, suggesting a weak negative linear relationship. The points are widely scattered around the line, so actor_2_age explains little of the variation in std_dev_ages. Because std_dev_ages equals age_difference / √2 for two actors, this is effectively a plot of the couple's age gap against the second actor's age.
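Correlation is unchanged when one variable is multiplied by a positive constant, so, assuming age_difference is the absolute gap between the two ages, the correlation behind plot3 is the same as the correlation between actor_2_age and age_difference. A check on the six rows printed by head():

```r
# Toy values from the six rows shown by head(movie_data)
actor_1_age <- c(75, 74, 69, 68, 81, 59)
actor_2_age <- c(23, 24, 20, 23, 38, 17)

# Assumption: age_difference is the absolute gap between the two ages
age_difference <- abs(actor_1_age - actor_2_age)

# Row-wise sample standard deviation, as in std_dev_ages
std_dev_ages <- apply(cbind(actor_1_age, actor_2_age), 1, sd)

r_sd  <- cor(actor_2_age, std_dev_ages)
r_gap <- cor(actor_2_age, age_difference)
print(c(r_sd = r_sd, r_gap = r_gap))  # the two correlations agree
```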
1. Because diff_mean_age_difference is only a shifted copy of age_difference, outliers or clusters in the first plot are outliers or clusters in age_difference itself. Examining those films (for example, the couples with the largest gaps) may reveal patterns in casting decisions.
2. Investigating the strong relationship between actor_1_age and mean_ages could show whether older actors tend to be cast in lead roles, which would raise a couple's mean age. Comparing age distributions across genres or movie types may add further insight, although the dataset has no genre column, so that information would have to come from another source.
3. Investigating what drives the variability in std_dev_ages, such as film characteristics or casting decisions, together with any outliers or clusters of data points, may reveal underlying trends in the ages of actors in films.
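One common way to flag outliers in age_difference (and therefore in the first plot, whose axes are shifts of each other) is Tukey's 1.5 × IQR rule. A minimal sketch; the helper name flag_outliers and the input vector are hypothetical, and on the real data you would pass movie_data$age_difference:

```r
# Flag values outside [Q1 - k * IQR, Q3 + k * IQR] (Tukey's fences)
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Hypothetical age gaps, not taken from the dataset
gaps <- c(2, 3, 5, 7, 8, 10, 12, 15, 52)
print(gaps[flag_outliers(gaps)])  # [1] 52
```

Here Q1 = 5 and Q3 = 12, so the upper fence is 22.5 and only the gap of 52 is flagged.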
cor_1 <- cor(movie_data$age_difference, movie_data$diff_mean_age_difference)
cor_2 <- cor(movie_data$actor_1_age, movie_data$mean_ages)
cor_3 <- cor(movie_data$actor_2_age, movie_data$std_dev_ages)
print(paste("Correlation coefficient for age_difference and diff_mean_age_difference:", cor_1))
## [1] "Correlation coefficient for age_difference and diff_mean_age_difference: 1"
print(paste("Correlation coefficient for actor_1_age and mean_ages:", cor_2))
## [1] "Correlation coefficient for actor_1_age and mean_ages: 0.926264555032481"
print(paste("Correlation coefficient for actor_2_age and std_dev_ages:", cor_3))
## [1] "Correlation coefficient for actor_2_age and std_dev_ages: -0.156464786475823"
Cor_1: The correlation of exactly 1 is guaranteed by construction. diff_mean_age_difference is age_difference minus a constant, and adding or subtracting a constant does not change a correlation, so this value says nothing about the films themselves.
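Cor_2: The strong positive correlation (about 0.93) is partly mechanical, because mean_ages is the average of actor_1_age and actor_2_age and therefore contains actor_1_age. On its own it does not show a trend toward casting older performers; testing for casting preferences or industry trends that favour older performers in lead roles would need a different comparison, for example how actor ages change with release_year.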
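To see how much of cor_2 is built in, the simulation below draws two independent, equally spread sets of hypothetical ages (not the movie data) and correlates one of them with their average. Even with no relationship at all between the two actors, the correlation comes out near 1/√2 ≈ 0.71, so the observed 0.93 is better read against that baseline than against zero.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 10000

# Hypothetical, independent ages with equal spread: no casting pattern at all
actor_1_age <- rnorm(n, mean = 40, sd = 10)
actor_2_age <- rnorm(n, mean = 40, sd = 10)
mean_ages <- (actor_1_age + actor_2_age) / 2

# Correlation of one component with the average it is part of
r_sim <- cor(actor_1_age, mean_ages)
print(r_sim)  # close to 1 / sqrt(2), about 0.707
```

Values above this baseline can arise when actor_1_age varies more than actor_2_age or when the two ages are positively correlated; comparing sd(movie_data$actor_1_age) with sd(movie_data$actor_2_age) would help tell which applies.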
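Cor_3: The weak negative correlation (about -0.16) indicates that the spread of ages within a couple is only loosely associated with the second actor's age. Since std_dev_ages is age_difference / √2, this equals the correlation between actor_2_age and age_difference: couples with a younger second actor tend to have slightly larger age gaps, but factors other than the second actor's age account for most of the variation.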
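Whether a correlation as small as cor_3 is distinguishable from zero depends on the number of rows, which cor() alone does not report. Base R's cor.test() returns the estimate together with a p-value and a confidence interval; on the real data the call would be cor.test(movie_data$actor_2_age, movie_data$std_dev_ages). The sketch below uses simulated data with a weak negative slope as a stand-in.

```r
set.seed(42)  # arbitrary seed
n <- 500

# Hypothetical data with a weak negative association (not the movie data)
x <- rnorm(n, mean = 30, sd = 8)
y <- 10 - 0.1 * x + rnorm(n, sd = 5)

res <- cor.test(x, y)   # Pearson correlation by default
print(res$estimate)     # same value as cor(x, y)
print(res$conf.int)     # 95% confidence interval for the correlation
print(res$p.value)      # two-sided test of zero correlation
```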
# Two-sided 95% t-interval for the mean of age_difference
sample_mean <- mean(movie_data$age_difference)
sample_sd <- sd(movie_data$age_difference)
n <- length(movie_data$age_difference)
confidence_level <- 0.95
# qt() at the lower tail is negative, so abs() gives the positive critical value
margin_of_error <- abs(qt((1 - confidence_level) / 2, df = n - 1)) * (sample_sd / sqrt(n))
lower_bound <- sample_mean - margin_of_error
upper_bound <- sample_mean + margin_of_error
print(paste("The", confidence_level * 100, "% confidence interval for age_difference is [", lower_bound, ",", upper_bound, "]"))
## [1] "The 95 % confidence interval for age_difference is [ 9.93288468570528 , 10.9156001627796 ]"
# Two-sided 95% t-interval for the mean of mean_ages
sample_mean <- mean(movie_data$mean_ages)
sample_sd <- sd(movie_data$mean_ages)
n <- length(movie_data$mean_ages)
confidence_level <- 0.95
# qt() at the lower tail is negative, so abs() gives the positive critical value
margin_of_error <- abs(qt((1 - confidence_level) / 2, df = n - 1)) * (sample_sd / sqrt(n))
lower_bound <- sample_mean - margin_of_error
upper_bound <- sample_mean + margin_of_error
print(paste("The", confidence_level * 100, "% confidence interval for mean_ages is [", lower_bound, ",", upper_bound, "]"))
## [1] "The 95 % confidence interval for mean_ages is [ 34.96038332554 , 35.8863699212133 ]"
# Two-sided 95% t-interval for the mean of std_dev_ages
sample_mean <- mean(movie_data$std_dev_ages)
sample_sd <- sd(movie_data$std_dev_ages)
n <- length(movie_data$std_dev_ages)
confidence_level <- 0.95
# qt() at the lower tail is negative, so abs() gives the positive critical value
margin_of_error <- abs(qt((1 - confidence_level) / 2, df = n - 1)) * (sample_sd / sqrt(n))
lower_bound <- sample_mean - margin_of_error
upper_bound <- sample_mean + margin_of_error
print(paste("The", confidence_level * 100, "% confidence interval for std_dev_ages is [", lower_bound, ",", upper_bound, "]"))
## [1] "The 95 % confidence interval for std_dev_ages is [ 7.02361011800621 , 7.71849489582241 ]"
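The three interval computations above repeat the same steps. A small helper (the name t_confidence_interval is hypothetical) removes the repetition, and base R's t.test() returns the same one-sample t-interval in its conf.int component, which gives an easy cross-check. Shown here on toy values; on the real data you would pass movie_data$age_difference, movie_data$mean_ages or movie_data$std_dev_ages.

```r
# Two-sided t-based confidence interval for a mean
t_confidence_interval <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)
  crit <- qt(1 - (1 - level) / 2, df = n - 1)  # positive critical value
  c(lower = mean(x) - crit * se, upper = mean(x) + crit * se)
}

# Toy values: the six age_difference values shown by head(movie_data)
x <- c(52, 50, 49, 45, 43, 42)
manual <- t_confidence_interval(x)
built_in <- t.test(x, conf.level = 0.95)$conf.int

print(manual)
print(as.numeric(built_in))  # matches the helper's bounds
```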
1. Breaking the mean age difference down by release_year (and by genre, if that information is added) may reveal genre-specific patterns in actor age gaps and shifts in casting preferences over time.
2. Exploring what drives variation in mean_ages (the average age of each couple) may show genre-specific age preferences and changes in casting practice over time.
3. Evaluating the factors associated with variation in std_dev_ages, the spread of ages within each couple, could reveal differences in casting practice across genres, groups and time periods in the film industry.
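For the suggested look at shifts over time, release_year is already in the data. The sketch groups films by decade and averages age_difference; it runs on the six rows printed by head(), so the numbers only illustrate the mechanics. On the full data, films with several couples (couple_number greater than 1) contribute several rows, which may be worth accounting for.

```r
# Toy rows: release_year and age_difference from the six rows shown by head()
toy <- data.frame(
  release_year   = c(1971, 2006, 2002, 1998, 2010, 1992),
  age_difference = c(52, 50, 49, 45, 43, 42)
)

# Bucket films by decade, e.g. 1998 -> 1990
toy$decade <- floor(toy$release_year / 10) * 10

# Mean age_difference per decade
by_decade <- aggregate(age_difference ~ decade, data = toy, FUN = mean)
print(by_decade)
```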