Data Introduction & Description
Questions Explored:
- How are women represented in Nobel Prize recipient data, and in which
categories do they win? How does age relate to winning a Nobel
Prize?
Data Notes
This dataset looks at counts. Among both male and female Nobel Prize
recipients, several individuals have won the award twice; I did not
remove these multiple wins. I did remove any instances where the Nobel
Prize went to an organization instead of an individual.
Data Cleaning
Data Structure
Subject Matter: Nobel Prize Winners by Name
Number of Rows: 884
Number of Columns: 12
Columns I utilized: ID, firstname, surname, birth year, gender,
winner age
Winner Ages
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 17.00 50.00 60.00 59.43 69.00 90.00

Summary Statistics for Winner Age
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 17.00 50.00 60.00 59.43 69.00 90.00
Mean:
## [1] 59.4276
Standard deviation:
## [1] 12.3864
Variance:
## [1] 153.4228
Number of observations:
## [1] 884
Quartiles:
## 25% 50% 75%
## 50 60 69
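The summary figures above can be checked against one another. The following is a minimal Python sketch, not the code that produced the report; the variable names are illustrative, and the values are copied from the output above rather than recomputed from the dataset.

```python
# Values copied from the R output above (not recomputed from the raw data).
n = 884
mean_age = 59.4276
sd_age = 12.3864
var_age = 153.4228
q1, median_age, q3 = 50, 60, 69

# The variance should equal the squared standard deviation (up to rounding).
variance_from_sd = sd_age ** 2

# The interquartile range is the spread of the middle half of winner ages.
iqr = q3 - q1

print(f"sd^2 = {variance_from_sd:.4f} (reported variance: {var_age})")
print(f"IQR  = {iqr} years")
```

The mean (59.43) and median (60) are close to each other, which suggests the age distribution is not strongly skewed.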
Nobel Prize Ages Histogram

Bivariate Comparison of 2 Variables


Exploring Gender in Dataset
Gender Counts
| Gender | Count |
| female | 49    |
| male   | 835   |
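To put the counts in proportion, here is a small Python sketch that converts them to shares of all individual winners. The variable names are illustrative; the counts come from the table above.

```python
# Counts taken from the Gender Counts table.
gender_counts = {"female": 49, "male": 835}

total = sum(gender_counts.values())
female_share = gender_counts["female"] / total
male_share = gender_counts["male"] / total

print(f"total winners: {total}")
print(f"female share:  {female_share:.2%}")
print(f"male share:    {male_share:.2%}")
```

The total matches the 884 rows reported under Data Structure, and women account for roughly 5.5% of the individual winners in this dataset.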


Exploring Gender and Category Variables

Chi-Square Test
In exploring my data, I wanted to see whether there is a statistically
significant association between gender and the category in which a
Nobel Prize is won. Given the Chi-Square statistic and the small
p-value, there is evidence against the null hypothesis: gender and
Nobel Prize category are not independent of each other.
##
## Pearson's Chi-squared test
##
## data: cont_table
## X-squared = 43.898, df = 5, p-value = 2.429e-08
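As a cross-check on the reported p-value, the sketch below recomputes the upper-tail probability of a chi-square distribution with 5 degrees of freedom using only the Python standard library. The function name `chi2_sf_df5` is hypothetical, and the closed form used here applies only to df = 5; it is not the routine R uses internally.

```python
import math

def chi2_sf_df5(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 5 df.

    For odd degrees of freedom the tail has a closed form built from the
    complementary error function; this is the df = 5 case.
    """
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))

# Test statistic copied from the R output above.
x_squared = 43.898
p_value = chi2_sf_df5(x_squared)

print(f"p-value = {p_value:.3e}")
```

The result agrees with the reported p-value to the precision shown in the output, far below the 0.05 threshold.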
ANOVA for Age of Winner and Gender
I also ran an ANOVA on this data to see whether there was any
significant difference between the mean ages of winners in the female
and male groups. The p-value is 0.407, which is greater than the
typical significance level of 0.05. Therefore, we fail to reject the
null hypothesis: the data do not show a significant difference in the
mean ages of winners between the “female” and “male” groups.
##              Df Sum Sq Mean Sq F value Pr(>F)
## gender        1    106   105.7   0.689  0.407
## Residuals   882 135367   153.5
## Call:
## aov(formula = Winner.Age ~ gender, data = df)
##
## Terms:
##                 gender Residuals
## Sum of Squares  105.72 135366.64
## Deg. of Freedom      1       882
##
## Residual standard error: 12.38858
## Estimated effects may be unbalanced
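The quantities in the ANOVA output follow from the sums of squares and degrees of freedom. The Python sketch below reproduces them from the printed values; the variable names are illustrative, and the p-value line is an approximation that assumes the t distribution with 882 degrees of freedom is close to the standard normal.

```python
import math

# Sums of squares and degrees of freedom copied from the aov output above.
ss_gender, df_gender = 105.72, 1
ss_resid, df_resid = 135366.64, 882

# Mean squares are sums of squares divided by their degrees of freedom.
ms_gender = ss_gender / df_gender
ms_resid = ss_resid / df_resid

# The F statistic is the ratio of the two mean squares.
f_value = ms_gender / ms_resid

# The residual standard error is the square root of the residual mean square.
residual_se = math.sqrt(ms_resid)

# With 1 numerator df, F is the square of a t statistic. With 882 residual
# df that t is nearly standard normal, so a two-sided normal tail gives an
# approximate p-value (an approximation, not the exact F-distribution tail).
approx_p = math.erfc(math.sqrt(f_value) / math.sqrt(2))

print(f"F value           = {f_value:.3f}")
print(f"residual std. err = {residual_se:.5f}")
print(f"approx. p-value   = {approx_p:.3f}")
```

The F value and residual standard error match the output above, and the approximate p-value lands close to the reported 0.407.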
Conclusions
As expected, women are underrepresented across all categories of
Nobel Prizes, and there is a statistically significant association
between Nobel Prize category and gender. There is no significant
difference in the mean ages of winners between the “female” and “male”
groups.
If I continue with this dataset in the future, I’d like to look at
the emergence of female Nobel Prize winners over time.