# Load the dataset; the relative path assumes age_gaps.CSV is in the working directory
age_gaps_df <- read.csv("age_gaps.CSV")


str(age_gaps_df)
## 'data.frame':    1155 obs. of  13 variables:
##  $ movie_name        : chr  "Harold and Maude" "Venus" "The Quiet American" "The Big Lebowski" ...
##  $ release_year      : int  1971 2006 2002 1998 2010 1992 2009 1999 1992 1999 ...
##  $ director          : chr  "Hal Ashby" "Roger Michell" "Phillip Noyce" "Joel Coen" ...
##  $ age_difference    : int  52 50 49 45 43 42 40 39 38 38 ...
##  $ couple_number     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ actor_1_name      : chr  "Ruth Gordon" "Peter O'Toole" "Michael Caine" "David Huddleston" ...
##  $ actor_2_name      : chr  "Bud Cort" "Jodie Whittaker" "Do Thi Hai Yen" "Tara Reid" ...
##  $ character_1_gender: chr  "woman" "man" "man" "man" ...
##  $ character_2_gender: chr  "man" "woman" "woman" "woman" ...
##  $ actor_1_birthdate : chr  "1896-10-30" "1932-08-02" "1933-03-14" "1930-09-17" ...
##  $ actor_2_birthdate : chr  "1948-03-29" "1982-06-03" "1982-10-01" "1975-11-08" ...
##  $ actor_1_age       : int  75 74 69 68 81 59 62 69 57 77 ...
##  $ actor_2_age       : int  23 24 20 23 38 17 22 30 19 39 ...
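
The structure above suggests that age_difference is derived from the two age columns. As a minimal sketch of how that could be checked, the block below rebuilds only the first ten values printed by str() into a small data frame (sample_df and recomputed are illustrative names, not part of the original analysis) and compares the stored gap with actor_1_age minus actor_2_age. It also shows that the birthdate columns, which are read as character, can be converted with as.Date().

```r
# First ten values as printed by str() above; the full file has 1155 rows
sample_df <- data.frame(
  age_difference = c(52, 50, 49, 45, 43, 42, 40, 39, 38, 38),
  actor_1_age    = c(75, 74, 69, 68, 81, 59, 62, 69, 57, 77),
  actor_2_age    = c(23, 24, 20, 23, 38, 17, 22, 30, 19, 39)
)

# Recompute the gap from the two age columns and compare with the stored value
recomputed <- sample_df$actor_1_age - sample_df$actor_2_age
gaps_match <- all(recomputed == sample_df$age_difference)
gaps_match

# Birthdates are stored as text in ISO "YYYY-MM-DD" form; as.Date() parses them
first_birthdate <- as.Date("1896-10-30")
class(first_birthdate)
```

On the full data the same comparison would use age_gaps_df in place of sample_df. Whether actor_1 is always the older actor is an assumption that holds in these ten rows and should be verified on the whole file.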

Three columns that are unclear:

Reasons for Data Encoding:

Unclear Element After Reading Documentation:

Documentation Gaps:

Visualization Highlighting the Issue:

library(ggplot2)
# Plotting histogram
ggplot(age_gaps_df, aes(x = age_difference)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Age Differences Between Characters",
       x = "Age Difference",
       y = "Frequency") +
  theme_minimal()

This histogram shows how age differences are distributed across the 1155 rows of the dataset, grouped into 5-year bins. It makes the most common gaps and the outliers easy to spot (the largest gap in the rows printed by str() is 52 years, for Harold and Maude), which gives a clearer picture of how age dynamics are represented in cinema.
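
To put numbers on the outliers the histogram suggests, one option is the 1.5 * IQR rule (Tukey's fences). This is a common convention rather than something the analysis above specifies, and flag_outliers is a hypothetical helper name. The example values are made up for illustration and are not taken from the dataset.

```r
# Flag values outside the 1.5 * IQR fences (Tukey's rule)
flag_outliers <- function(x) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Illustrative values only (hypothetical, not from age_gaps.CSV)
example_gaps <- c(1, 2, 3, 4, 100)
flagged <- example_gaps[flag_outliers(example_gaps)]
flagged

# With the real data loaded, the same check could be run as:
# summary(age_gaps_df$age_difference)
# age_gaps_df[flag_outliers(age_gaps_df$age_difference),
#             c("movie_name", "age_difference")]
```

A rule like this complements the plot: the histogram shows the shape of the distribution, while the flagged rows name the specific movies behind the long tail.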

Significant Risks and Mitigation:

Insight:

Further questions: