Nobel Prize Data_Final Project

Questions Explored:

- How are women represented in Nobel Prize recipient data, and in which categories are they awarded?
- How does age factor into winning a Nobel Prize?

Data Sourcing

I found my data on Kaggle (http://www.kaggle.com/datasets/thedevastator/a-complete-history-of-nobel-prize-winners?resource=download) and used Wikipedia to fill in any missing dates in the dataset.

Data Notes

This analysis is based on counts. Several recipients, both female and male, have won a Nobel Prize twice. I did not remove these repeat wins, so each award is counted separately. I also removed every prize awarded to an organization rather than an individual.

Data Cleaning

I cleaned the data in Excel. I excluded every row that did not correspond to an individual (all organizational winners were removed), removed duplicate rows, joined birth year into the dataset, and filled in other missing information. While reviewing the summary statistics, I also found some errors in the dataset and corrected them.
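The cleaning steps above were done by hand in Excel. The minimal Python sketch below applies the same logic to a list of records. Several details are assumptions for illustration and are not taken from the Kaggle file's actual layout:

- the field names (`id`, `surname`, `gender`, `year`, `category`);
- the hypothetical `clean_laureates` helper;
- the rule that organizations are the rows whose gender is neither `"female"` nor `"male"`;
- computing winner age as award year minus birth year.

```python
from typing import Optional


def clean_laureates(rows: list[dict], birth_years: dict[int, int]) -> list[dict]:
    """Keep individual laureates, drop exact duplicate rows, join birth year, add winner age."""
    cleaned: list[dict] = []
    seen: set[tuple] = set()
    for row in rows:
        # Assumption: organizations are rows whose gender is neither "female" nor "male".
        if row["gender"] not in ("female", "male"):
            continue
        # Drop only exact duplicates. A second prize to the same person differs in
        # year or category, so repeat winners stay as separate rows.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Join birth year on the (hypothetical) "id" field; None marks a gap to fill by hand.
        born: Optional[int] = birth_years.get(row["id"])
        # Assumption: winner age = award year - birth year.
        age = row["year"] - born if born is not None else None
        cleaned.append({**row, "birth_year": born, "winner_age": age})
    return cleaned


# Hypothetical records: one person with two prizes (plus a duplicated row),
# one organization, and one person whose birth year is still missing.
rows = [
    {"id": 101, "surname": "Example", "gender": "female", "year": 1950, "category": "physics"},
    {"id": 101, "surname": "Example", "gender": "female", "year": 1960, "category": "chemistry"},
    {"id": 101, "surname": "Example", "gender": "female", "year": 1960, "category": "chemistry"},
    {"id": 900, "surname": "Example Society", "gender": "org", "year": 1955, "category": "peace"},
    {"id": 102, "surname": "Sample", "gender": "male", "year": 1970, "category": "medicine"},
]
laureates = clean_laureates(rows, birth_years={101: 1900})
for laureate in laureates:
    print(laureate["surname"], laureate["year"], laureate["winner_age"])
```

The sketch deduplicates on the whole row rather than on the person's ID. That choice keeps two-time winners: their second award differs in year and category, so it is not treated as a duplicate.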

Data Structure

Subject Matter: Nobel Prize Winners by Name

Number of Rows: 884

Number of Columns: 12

Columns used: ID, firstname, surname, birth year, gender, category, winner age

Winner Ages

Summary Statistics for Winner Age

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.00   50.00   60.00   59.43   69.00   90.00

Statistic	Value
Mean	59.4276
Standard deviation	12.3864
Variance	153.4228
Count (n)	884
25th percentile	50
Median (50th percentile)	60
75th percentile	69
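The output above comes from R, but the same summary statistics can be computed from a plain list of ages with Python's `statistics` module. Here `summarize_ages` is a hypothetical helper, and the example ages are made up rather than taken from the dataset.

Two details keep the results comparable to R:

- `method="inclusive"` interpolates quantiles the same way as R's default `quantile()` (type 7).
- `stdev` and `variance` divide by n - 1, as R's `sd()` and `var()` do.

```python
import statistics


def summarize_ages(ages: list[int]) -> dict[str, float]:
    """Summary statistics comparable to R's summary(), sd(), var(), length(), quantile()."""
    # method="inclusive" uses the same linear interpolation as R's default quantile() (type 7).
    q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    return {
        "min": min(ages),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(ages),
        "q3": q3,
        "max": max(ages),
        # stdev() and variance() divide by n - 1, as R's sd() and var() do.
        "sd": statistics.stdev(ages),
        "variance": statistics.variance(ages),
        "n": len(ages),
    }


# Hypothetical ages, not the 884 values from the dataset.
age_stats = summarize_ages([17, 50, 60, 69, 90])
for name, value in age_stats.items():
    print(f"{name}: {value}")
```

As a quick consistency check on the reported figures, 12.3864² ≈ 153.42, so the variance is the square of the standard deviation.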

Nobel Prize Ages Histogram
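The histogram image is not reproduced here. A text version can be sketched by counting ages into fixed-width bins. The 10-year width, the hypothetical `age_histogram` helper, and the sample ages are all illustrative assumptions. R's `hist()` uses right-closed bins by default, so an age that falls exactly on a break (for example 60) may land in a different bin than in this left-closed version.

```python
from collections import Counter


def age_histogram(ages: list[int], width: int = 10) -> list[str]:
    """Text histogram with one line per width-year bin, left-closed (e.g. 50-59)."""
    # Map each age to the lower edge of its bin, e.g. 57 -> 50 when width is 10.
    counts = Counter((age // width) * width for age in ages)
    low, high = min(counts), max(counts)
    # Walk every bin edge so that empty bins still appear as blank rows.
    return [
        f"{edge}-{edge + width - 1}: {'#' * counts[edge]}"
        for edge in range(low, high + width, width)
    ]


# Hypothetical ages, not values from the Nobel dataset.
histogram_lines = age_histogram([17, 38, 45, 52, 57, 61, 66, 74, 90])
print("\n".join(histogram_lines))
```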

Bivariate Comparisons

Exploring Gender in Dataset

Gender Counts

Gender	Count
female	49
male	835

Exploring Gender and Category Variables

Chi-Square Test

In exploring my data, I wanted to see whether the category in which a Nobel Prize is awarded is associated with the winner's gender. The chi-square statistic is large (X-squared = 43.898, df = 5) and the p-value is very small (2.429e-08). This is strong evidence against the null hypothesis of independence: gender and Nobel Prize category are not independent of each other.

## 
##  Pearson's Chi-squared test
## 
## data:  cont_table
## X-squared = 43.898, df = 5, p-value = 2.429e-08
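The chi-square test compares each observed count with the count expected if gender and category were independent, where expected = (row total × column total) / grand total. With two genders and six prize categories the table is 2 × 6, which gives df = (2 - 1)(6 - 1) = 5, matching the output above.

The Python sketch below computes the statistic from a contingency table, plus a closed-form upper-tail p-value. The helper names and the example counts are hypothetical; in practice `scipy.stats.chi2_contingency` does the same job. R's `chisq.test` applies Yates' continuity correction only to 2 × 2 tables, so for this 2 × 6 table the uncorrected statistic is the comparable one.

```python
import math


def chi_square_independence(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


# Hypothetical 2 x 3 table of counts (rows: gender, columns: category).
stat, df = chi_square_independence([[10, 20, 30], [20, 20, 20]])
p_value = chi_square_sf(stat, df)
print(f"X-squared = {stat:.3f}, df = {df}, p-value = {p_value:.4g}")
```

Plugging the reported statistic in, `chi_square_sf(43.898, 5)` gives about 2.43e-08, in line with the p-value R printed.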

ANOVA for Age of Winner and Gender

I also ran an ANOVA to test whether the mean ages of winners differ significantly between the female and male groups. The p-value is 0.407, which is greater than the conventional significance level of 0.05. Therefore, we fail to reject the null hypothesis: there is no significant difference in the mean ages of winners between the “female” and “male” groups.

##              Df Sum Sq Mean Sq F value Pr(>F)
## gender        1    106   105.7   0.689  0.407
## Residuals   882 135367   153.5

## Call:
##    aov(formula = Winner.Age ~ gender, data = df)
## 
## Terms:
##                    gender Residuals
## Sum of Squares     105.72 135366.64
## Deg. of Freedom         1       882
## 
## Residual standard error: 12.38858
## Estimated effects may be unbalanced
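The ANOVA table can be checked by hand:

- The residual mean square is 135366.64 / 882 ≈ 153.48.
- F = 105.72 / 153.48 ≈ 0.689.
- The residual standard error is √153.48 ≈ 12.389.

All three match the output. The two sums of squares also add up to 135472.36, and 135472.36 / 883 ≈ 153.42 is the age variance reported earlier. With only two groups, this one-way ANOVA is equivalent to a pooled two-sample t-test (F = t²).

The Python sketch below computes the same quantities from grouped ages. `one_way_anova` and the example ages are hypothetical, and `scipy.stats.f_oneway` is the usual library route when a p-value is also needed.

```python
import math
import statistics


def one_way_anova(groups: dict[str, list[float]]) -> dict[str, float]:
    """One-way ANOVA sums of squares, degrees of freedom, F statistic, and residual SE."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = statistics.fmean(all_values)
    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of each value around its own group mean
    for values in groups.values():
        group_mean = statistics.fmean(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_within = ss_within / df_within  # residual mean square
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": (ss_between / df_between) / ms_within,
        "residual_se": math.sqrt(ms_within),
    }


# Hypothetical ages, not values from the Nobel dataset.
anova = one_way_anova({"female": [50, 60, 70], "male": [55, 65, 75, 85]})
print(f"F = {anova['f']:.3f} on {anova['df_between']} and {anova['df_within']} df")
```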

Conclusions

As expected, women are underrepresented in every Nobel Prize category, and the chi-square test shows a statistically significant association between prize category and gender. There is no significant difference in the mean ages of winners between the “female” and “male” groups.

If I continued with this dataset in the future, I would like to look at the emergence of female Nobel Prize winners over time.

Allie Glover

2023-06-30