--- 
title: "Datadive2"
output: html_document
---

Load the data

data <- read.csv("C:\\Users\\varsh\\OneDrive\\Desktop\\Gitstuff\\age_gaps.CSV")

Numeric Summary for 1st Column

summary_C1 <- summary(data$actor_1_age)

print(summary_C1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   33.00   39.00   40.64   47.00   81.00

Numeric Summary for 2nd Column

summary_C2 <- summary(data$actor_2_age)

print(summary_C2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.00   25.00   29.00   30.21   34.00   68.00

Categorical summary for 3rd Column

unique_values_C3 <- unique(data$character_1_gender)

count_values_C3 <- table(data$character_1_gender)

cat("Categorical Summary for character_1_gender:\n")
## Categorical Summary for character_1_gender:
print(data.frame(Value = names(count_values_C3), Count = as.vector(count_values_C3)))
##   Value Count
## 1   man   941
## 2 woman   214

Hypotheses

  1. Is there a statistically significant relationship between Woody Allen’s directing involvement and the average age gap between the characters in his films?

  2. Can we see a pattern in which older male actors are regularly cast opposite female actors with a notable age difference in romantic movie couples?

  3. Has the average age gap between characters in romance films changed over time, and can these changes be linked to particular periods or decades of film production?

Aggregate Function

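Woody Allen’s movies have an average age difference of 20.15 years. This points to a large age gap between the characters in his films, although an average alone does not establish statistical significance; that would require a comparison against the other directors in the dataset.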

woody_allen_movies <- subset(data,director == "Woody Allen")


average_age_difference_woody_allen <- mean(woody_allen_movies$age_difference)


cat("Average Age Difference for Woody Allen's Movies:", round(average_age_difference_woody_allen, 2))
## Average Age Difference for Woody Allen's Movies: 20.15
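
A single average does not say whether Woody Allen's films differ from the rest of the dataset. The sketch below shows one way the first hypothesis could be checked, assuming a Welch two-sample t-test is an acceptable choice. The `toy` data frame and the `is_woody_allen` column are invented for illustration and are not taken from age_gaps.CSV.

```r
# Toy data, invented purely for illustration; not taken from age_gaps.CSV.
toy <- data.frame(
  director = c("Woody Allen", "Woody Allen", "Woody Allen",
               "Director A", "Director A",
               "Director B", "Director B", "Director B"),
  age_difference = c(22, 18, 25, 5, 9, 12, 3, 7)
)

# Hypothetical helper column flagging the director of interest.
toy$is_woody_allen <- toy$director == "Woody Allen"

# Group means, mirroring the aggregate computed above.
group_means <- tapply(toy$age_difference, toy$is_woody_allen, mean)
print(group_means)

# Welch two-sample t-test (t.test() does not assume equal variances by default).
test_result <- t.test(age_difference ~ is_woody_allen, data = toy)
print(test_result$p.value)
```

With the real data, `toy` would be replaced by `data`. A small p-value would support a difference in mean age gap, but the test assumes roughly independent observations, which is worth checking when the same actors appear in several films.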

Visual Summaries

library(ggplot2)


ggplot(data, aes(x = age_difference)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of Age Differences in Romantic Movies",
       x = "Age Difference",
       y = "Frequency") +
  theme_minimal()

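# One box per director; the fill legend would repeat the axis labels, so it is hidden.
ggplot(data, aes(x = director, y = age_difference, fill = director)) +
  geom_boxplot() +
  labs(title = "Boxplot of Age Differences by Director",
       x = "Director",
       y = "Age Difference") +
  coord_flip() +  # horizontal boxes keep long director names readable
  theme_minimal() +
  theme(legend.position = "none")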

ggplot(data, aes(x = character_1_gender)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "Distribution of Character 1 Gender", x = "Gender", y = "Count")

Numeric Summary for 1st Column

Insights

The summary statistics for actor_1_age show that actor 1 has a median age of 39 years. In other words, half of the couples in the sample have an actor 1 aged 39 or younger, and the middle 50% of values fall between 33 and 47, so a sizable share of the sample features middle-aged actors in this role.

Significance

The median is a measure of central tendency for the distribution of actor ages, and it suggests that the dataset tends to represent romantic relationships in which at least one of the protagonists is in their late thirties. This insight may influence perceptions of the age dynamics depicted in romance films, and it may reflect trends or preferences in casting decisions.

Further questions

  1. Are there any directors or genres that are connected with a greater median age for actor 1?

  2. How has the median age of actor 1 changed over the decades?

  3. Are there significant differences in median age depending on the gender of character 1?
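
The third question above can be explored with a grouped median. This is a minimal sketch using base R's `aggregate()`; the `toy` data frame is made up for illustration, and only the column names `actor_1_age` and `character_1_gender` come from the analysis above.

```r
# Toy data, invented for illustration; only the column names match the report.
toy <- data.frame(
  character_1_gender = c("man", "man", "man", "woman", "woman"),
  actor_1_age        = c(45, 39, 51, 30, 36)
)

# Median age of actor 1 within each character_1_gender group.
median_by_gender <- aggregate(actor_1_age ~ character_1_gender,
                              data = toy, FUN = median)
print(median_by_gender)
```

The same pattern would apply to the first question by grouping on `director` instead of `character_1_gender`.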

Numeric Summary for 2nd Column

Insight

The summary statistics for actor_2_age show that actor 2’s median age in the dataset is 29 years. This means that in 50% of the instances actor 2 is aged 29 or younger, which suggests a preference for younger actors in supporting parts in romance films.

Significance

The median age of actor 2 is 29, indicating a rather young age profile for supporting parts. This could represent an industry tendency of pairing younger performers in supporting parts, or it could correlate with public preferences for specific age dynamics in romantic relationships on film.

Further questions

  1. How does actor 2’s median age compare to that of actor 1?

  2. Are there any certain genres or directors that have a greater median age for actor 2?

  3. Does the median age of actor 2 vary over time or decades?
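
For the first question, comparing the two medians directly is not the same as looking at the gap within each couple. The sketch below illustrates the difference on made-up numbers and adds a paired Wilcoxon signed-rank test as one possible (assumed, not prescribed) way to test it; the `toy` data frame is hypothetical.

```r
# Toy couples, invented for illustration; each row is one on-screen pair.
toy <- data.frame(
  actor_1_age = c(40, 35, 52, 29, 47),
  actor_2_age = c(30, 33, 28, 26, 41)
)

# Difference between the two column medians.
difference_of_medians <- median(toy$actor_1_age) - median(toy$actor_2_age)

# Median of the within-couple differences (usually the more relevant figure).
median_of_differences <- median(toy$actor_1_age - toy$actor_2_age)

print(c(difference_of_medians = difference_of_medians,
        median_of_differences = median_of_differences))

# Paired Wilcoxon signed-rank test on the within-couple differences.
paired_test <- wilcox.test(toy$actor_1_age, toy$actor_2_age, paired = TRUE)
print(paired_test$p.value)
```

The two summaries need not agree, because the first ignores which actors were paired together.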

Categorical summary for 3rd Column

Insight

The categorical summary for character_1_gender shows that the dataset is dominated by relationships where the first character’s gender is “man” (941 instances), as opposed to instances where the first character’s gender is “woman” (214 instances). These counts come from table(), which pairs 941 with “man” and 214 with “woman”. This implies a significant gender imbalance in romance films, with male characters most often listed as character 1.

Significance

The large numerical discrepancy between the counts indicates a widespread tendency in the portrayal of gender dynamics in romance films. This could be due to industry practices, audience preferences, or storytelling conventions that frequently place male characters in the first-listed role of a romantic pairing.
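
One pitfall when tabulating a categorical column is that `unique()` returns values in order of first appearance while `table()` sorts them, so combining the two can attach counts to the wrong label. The sketch below builds an aligned summary from `table()` alone and adds proportions; `toy_gender` is a made-up vector used only for illustration.

```r
# Toy vector, invented for illustration.
toy_gender <- c("woman", "man", "man", "man", "woman", "man")

# unique() keeps order of appearance; table() sorts its names.
print(unique(toy_gender))        # order of appearance
counts <- table(toy_gender)
print(names(counts))             # sorted order

# Take labels and counts from the same table so they stay aligned.
summary_df <- data.frame(
  Value = names(counts),
  Count = as.vector(counts),
  Share = as.vector(prop.table(counts))
)
print(summary_df)
```

Reporting the share alongside the raw count makes the size of the imbalance easier to compare across subsets such as directors or decades.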

Further questions

  1. How does the distribution of gender pairs change among directors or genres?

  2. Are there any specific historical periods or decades in which the gender dynamics in love films change significantly?

  3. Is there a link between gender pairings and the general theme or tone of the films?

Visual Summaries

Insight

Histogram of age differences:

The histogram shows that the bulk of age gaps in romantic movies fall between 0 and 20 years, with a peak at around 10 years. This suggests that many films portray relationships with small to moderate age gaps.

A boxplot illustrating age differences by director:

The boxplot shows how different directors manage age differences in romance movies. Some directors have a greater range of ages, while others prefer to keep them more consistent. Woody Allen’s films, for example, exhibit a notable age disparity.

Character 1 Gender Distribution:

The bar chart shows that male characters (men) have a larger representation as Character 1 in romantic films. This could imply a recurring pattern or preference in how the characters in romantic relationships are presented.

Significance

Age dynamics in romantic movies:

The histogram suggests that romance films frequently feature couples with moderate age disparities. This could be a deliberate approach to appeal to a large audience and establish relatable interactions on screen, although the histogram alone cannot show intent.

Directorial Influence on Age Differences:

The boxplot illustrates that filmmakers may have different approaches to depicting age differences. Woody Allen’s films, for example, show a wider spread of age gaps, which may be a hallmark of his storytelling.

Gender Representation in Lead Positions:

The character gender distribution bar chart reflects a common pattern in romance films where male characters occupy the Character 1 position. Understanding this trend could provide insight into audience preferences or cinematic traditions.

Further questions

  1. The Impact of Age Disparities on Movie Success: Does the range of age disparities in romance movies correlate with the films’ overall success or reception? Are specific age gaps more appealing to audiences?

  2. Evolution Through Decades: How have age gaps in romantic movies changed over time? Are there any notable changes in trends or preferences?

  3. Gender Dynamics in Romantic Relationships: Beyond character one, how do gender dynamics play out in the relationships depicted in these films? Are there any patterns in the roles allotted to male and female characters in romantic films?
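
The second question above could be approached by grouping films into decades and averaging the age gap within each. This is a minimal sketch under the assumption that the data has a release-year column; the name `release_year` and the `toy` data frame are hypothetical and do not come from the analysis above.

```r
# Toy data, invented for illustration; release_year is an assumed column name.
toy <- data.frame(
  release_year   = c(1955, 1958, 1987, 1983, 2004, 2011, 2015),
  age_difference = c(20, 16, 12, 10, 8, 5, 9)
)

# Integer division drops the last digit of the year: 1987 becomes 1980.
toy$decade <- (toy$release_year %/% 10) * 10

# Mean age difference per decade.
mean_gap_by_decade <- aggregate(age_difference ~ decade, data = toy, FUN = mean)
print(mean_gap_by_decade)
```

Decades with only a handful of films will give noisy averages, so it may help to report the number of films per decade alongside the mean.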