The Nobel Prizes are the best-known prizes awarded each year to those who have contributed the most to science and to the world. Prizes are awarded in six categories: Chemistry, Literature, Medicine, Physics, Economics and Peace. The first Nobel Prizes were awarded in 1901, and it is often claimed that in those early years they went almost exclusively to European men. Is that claim still true today?
To find out, we will use a Nobel dataset covering all prize winners from 1901 to 2016. Let's analyze it.
df<-read.csv("nobel.csv")
head(df)
## year category prize
## 1 1901 Chemistry The Nobel Prize in Chemistry 1901
## 2 1901 Literature The Nobel Prize in Literature 1901
## 3 1901 Medicine The Nobel Prize in Physiology or Medicine 1901
## 4 1901 Peace The Nobel Peace Prize 1901
## 5 1901 Peace The Nobel Peace Prize 1901
## 6 1901 Physics The Nobel Prize in Physics 1901
## motivation
## 1 "in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions"
## 2 "in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect"
## 3 "for his work on serum therapy, especially its application against diphtheria, by which he has opened a new road in the domain of medical science and thereby placed in the hands of the physician a victorious weapon against illness and deaths"
## 4 <NA>
## 5 <NA>
## 6 "in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him"
## prize_share laureate_id laureate_type full_name birth_date
## 1 1/1 160 Individual Jacobus Henricus van 't Hoff 1852-08-30
## 2 1/1 569 Individual Sully Prudhomme 1839-03-16
## 3 1/1 293 Individual Emil Adolf von Behring 1854-03-15
## 4 1/2 462 Individual Jean Henry Dunant 1828-05-08
## 5 1/2 463 Individual Frédéric Passy 1822-05-20
## 6 1/1 1 Individual Wilhelm Conrad Röntgen 1845-03-27
## birth_city birth_country sex organization_name
## 1 Rotterdam Netherlands Male Berlin University
## 2 Paris France Male <NA>
## 3 Hansdorf (Lawice) Prussia (Poland) Male Marburg University
## 4 Geneva Switzerland Male <NA>
## 5 Paris France Male <NA>
## 6 Lennep (Remscheid) Prussia (Germany) Male Munich University
## organization_city organization_country death_date death_city death_country
## 1 Berlin Germany 1911-03-01 Berlin Germany
## 2 <NA> <NA> 1907-09-07 Châtenay France
## 3 Marburg Germany 1917-03-31 Marburg Germany
## 4 <NA> <NA> 1910-10-30 Heiden Switzerland
## 5 <NA> <NA> 1912-06-12 Paris France
## 6 Munich Germany 1923-02-10 Munich Germany
#Dimension of the data
dim(df)
## [1] 911 18
The dataset contains 911 rows, one per laureate per prize, with 18 variables each. Now we will find which birth country has produced the most Nobel Prize winners so far.
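#Count the number of Nobel Prize winners by birth country
#(named country_counts so that the base function table() is not shadowed)
country_counts<-table(df$birth_country)
country_counts<-country_counts[order(country_counts,decreasing=TRUE)]
print(head(country_counts,10))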
##
## United States of America United Kingdom Germany
## 259 85 61
## France Sweden Japan
## 51 29 24
## Canada Netherlands Italy
## 18 18 17
## Russia
## 17
The United States has the most Nobel Prize winners, with 259 laureates born there, while the United Kingdom is second with 85. If the USA has the most winners, why do some people claim that the Nobel Prize used to be awarded mainly to Europeans? We will answer this question by looking for the decade in which the USA began to dominate the ranking.
#Round year down to the start of its decade (1905 -> 1900)
floor_decade=function(value){return(value-value %% 10)}
df$decade<-sapply(df$year,floor_decade)
#Number of USA-born winners in each decade
us_winner<-df[which(df$birth_country=="United States of America"),]
us_winner<-data.frame(table(us_winner$decade))
#Number of winners (rows) in each decade
a<-data.frame(table(df$decade))
#Proportion of USA-born winners in each decade
#(the division lines up only because the USA has at least one winner in every decade)
us_winner$prop<-us_winner$Freq/a$Freq
colnames(us_winner)<-c("decade","prizes","proportion")
print(us_winner)
## decade prizes proportion
## 1 1900 1 0.01754386
## 2 1910 3 0.07500000
## 3 1920 4 0.07407407
## 4 1930 14 0.25000000
## 5 1940 13 0.30232558
## 6 1950 21 0.29166667
## 7 1960 21 0.26582278
## 8 1970 33 0.31730769
## 9 1980 31 0.31958763
## 10 1990 42 0.40384615
## 11 2000 52 0.42276423
## 12 2010 24 0.29268293
#Plot
library(ggplot2)
ggplot(data=us_winner,aes(x=decade,y=proportion,group=1)) +geom_path()+
geom_point() +labs(x="Decade",y="Proportion of USA prizes",title="Proportion of prizes awarded to USA-born winners each decade")
From the plot, we can see that from the 1900s to the 1920s only a few American-born laureates received Nobel Prizes, with a share below 8%. The share then jumped to 25% in the 1930s and, apart from small dips in the 1950s and 1960s, kept rising to a peak of about 42% in the 2000s. The lower figure for the 2010s covers only the years up to 2016. One possible explanation, which this dataset alone cannot confirm, is that the First World War and its aftermath brought enormous immigration from Europe to the USA, which has enjoyed a scientific heyday ever since.
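The proportion per decade can also be computed in one step, without dividing two separately built tables. The following is only a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not taken from nobel.csv); it assumes the same `year` and `birth_country` columns as the dataset.

```r
# Toy data standing in for nobel.csv (hypothetical values, for illustration only)
toy <- data.frame(
  year = c(1901, 1905, 1912, 1913, 1925, 1931, 1934, 1938),
  birth_country = c("Germany", "United States of America", "France", NA,
                    "Sweden", "United States of America",
                    "United States of America", "Germany"),
  stringsAsFactors = FALSE
)

# Vectorized decade: 1905 -> 1900, 1931 -> 1930
toy$decade <- toy$year - toy$year %% 10

# TRUE when the laureate was born in the USA; a missing birth country counts
# as FALSE so the denominator stays "all rows in the decade"
toy$usa_born <- !is.na(toy$birth_country) &
  toy$birth_country == "United States of America"

# The mean of a logical vector is the proportion of TRUE values
us_share <- tapply(toy$usa_born, toy$decade, mean)
print(us_share)
```

Because `tapply()` groups both numerator and denominator by the same decade, a decade without any USA-born winner simply gets a share of 0 instead of misaligning the division.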
Prizes are awarded in six categories every year, and we want to know which countries have the strongest record in each one. Here we look at Chemistry and Physics.
#Countries with the most Chemistry prizes
a<-df[,c("category","birth_country")]
a<-data.frame(table(a))
chem<-a[which(a$category=="Chemistry"),]
chem<-chem[order(chem$Freq,decreasing = TRUE),]
head(chem)
## category birth_country Freq
## 691 Chemistry United States of America 52
## 685 Chemistry United Kingdom 22
## 259 Chemistry Germany 19
## 235 Chemistry France 9
## 355 Chemistry Japan 6
## 151 Chemistry Canada 4
For Chemistry, the USA leads with 52 laureates, while the United Kingdom (22) and Germany (19) also have strong records. Notably, Japan is the only Asian country in the top six, with 6 Nobel Prizes in Chemistry.
#Countries with the most Physics prizes
a<-df[,c("category","birth_country")]
a<-data.frame(table(a))
phy<-a[which(a$category=="Physics"),]
phy<-phy[order(phy$Freq,decreasing = TRUE),]
head(phy)
## category birth_country Freq
## 696 Physics United States of America 66
## 690 Physics United Kingdom 22
## 264 Physics Germany 16
## 360 Physics Japan 11
## 426 Physics Netherlands 9
## 240 Physics France 7
For Physics, the USA again leads the ranking with 66 laureates, while the United Kingdom and Germany remain second and third with 22 and 16 physicists. Japan ranks fourth with 11 physicists.
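The Chemistry and Physics rankings repeat the same steps, so they could be wrapped in a small helper. This is just a sketch: `top_countries` is a hypothetical name, and the `toy` data frame is made up for illustration, assuming the `category` and `birth_country` columns used above.

```r
# Hypothetical helper: the n most frequent birth countries for one category
top_countries <- function(data, category_name, n = 6) {
  rows <- data[data$category == category_name, ]
  counts <- sort(table(rows$birth_country), decreasing = TRUE)
  head(counts, n)
}

# Toy data standing in for nobel.csv (made-up rows)
toy <- data.frame(
  category = c("Physics", "Physics", "Physics", "Chemistry", "Chemistry"),
  birth_country = c("Japan", "Germany", "Japan", "France", "France"),
  stringsAsFactors = FALSE
)

physics_top <- top_countries(toy, "Physics")
print(physics_top)
```

With the real data, a call such as `top_countries(df, "Medicine")` would extend the comparison to the remaining categories without copying the block again.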
Next, we will examine how often women have won Nobel Prizes.
table(df$sex)
##
## Female Male
## 49 836
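There is a huge gap between men and women: 836 male laureates against only 49 female ones. The remaining rows have no recorded sex, for example prizes awarded to organizations. Next, we will find out in which categories women have won most often.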
a<-df[,c("sex","category")]
a<-data.frame(table(a))
print(a)
## sex category Freq
## 1 Female Chemistry 4
## 2 Male Chemistry 171
## 3 Female Economics 1
## 4 Male Economics 77
## 5 Female Literature 14
## 6 Male Literature 99
## 7 Female Medicine 12
## 8 Male Medicine 199
## 9 Female Peace 16
## 10 Male Peace 88
## 11 Female Physics 2
## 12 Male Physics 202
library(ggplot2)
ggplot(data=a,aes(x=category,y=Freq,fill=sex)) + geom_bar(position="dodge", stat="identity")+labs(x="Areas",y="Number of prizes",title="Number of prizes awarded to men and women in each field")
Women have won the most Nobel Prizes in Peace (16), Literature (14) and Medicine (12). By contrast, only four prizes in Chemistry, two in Physics and one in Economics have gone to women.
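To see whether women are better represented in recent decades, the share of female laureates can be computed per decade. The sketch below uses a made-up `toy` data frame (hypothetical values) with the same `year` and `sex` columns as the dataset.

```r
# Toy data standing in for nobel.csv (made-up rows; NA marks an organization)
toy <- data.frame(
  year = c(1903, 1909, 1911, 1964, 1966, 2009, 2014, 2015),
  sex = c("Female", "Male", "Female", "Female", "Male", "Female", "Female", NA),
  stringsAsFactors = FALSE
)
toy$decade <- toy$year - toy$year %% 10

# Keep only rows with a recorded sex, so organizations do not dilute the share
people <- toy[!is.na(toy$sex), ]

# Proportion of female laureates among individuals in each decade
female_share <- tapply(people$sex == "Female", people$decade, mean)
print(round(female_share, 2))
```

Decades with no individual laureates in the data are simply absent from the result.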
library(lubridate)
df$birth_date<-as.Date(df$birth_date)
#Age at the time of the award (stored in the column year_awarded)
df$year_awarded<-df$year-year(ymd(df$birth_date))
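#Age interval function: maps an age to a labelled interval
#Note: "<29" covers ages up to and including 29, "Over 70" covers ages 70 to 100,
#and missing ages are labelled "0" so that they can be dropped later
age_interval<-function(x){
  if(is.na(x)){
    return("0")
  }
  else{
    if(x<=29){
      return("<29")
    }
    if(x<=39){
      return("30-39")
    }
    if(x<=49){
      return("40-49")
    }
    if(x<=59){
      return("50-59")
    }
    if(x<=69){
      return("60-69")
    }
    if(x<=100){
      return("Over 70")
    }
  }
}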
df$age_interval<-sapply(df$year_awarded,age_interval)
age<-df[,c("sex","age_interval")]
age<-data.frame(table(age))
age<-age[-which(age$age_interval=="0"),]
print(age)
## sex age_interval Freq
## 1 Female <29 1
## 2 Male <29 1
## 5 Female 30-39 7
## 6 Male 30-39 41
## 7 Female 40-49 6
## 8 Male 40-49 150
## 9 Female 50-59 13
## 10 Male 50-59 213
## 11 Female 60-69 8
## 12 Male 60-69 243
## 13 Female Over 70 14
## 14 Male Over 70 186
library(ggplot2)
ggplot(data=age,aes(x=age_interval,y=Freq,fill=sex)) + geom_bar(position="dodge", stat="identity")+labs(x="Age interval",y="Number of prizes",title="Age distribution of male and female winners")
We see that most Nobel winners received their prizes when they were 40 or older. For men, the counts rise steadily with age and peak in the 60-69 interval. For women, the counts are much lower and fluctuate between 6 and 14 from the 30-39 interval onwards, with no clear peak. Only one man and one woman won a Nobel Prize at age 29 or younger. What an achievement!
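The same binning can be done without a hand-written chain of `if` statements by using base R's `cut()`. This is a sketch of an alternative, not the code that produced the table above; the `ages` vector and the interval labels are made up for illustration.

```r
# Made-up ages, including a missing value
ages <- c(17, 29, 30, 45, 69, 70, 90, NA)

# cut() uses right-closed intervals by default: (-Inf,29], (29,39], ..., (69,Inf]
age_group <- cut(
  ages,
  breaks = c(-Inf, 29, 39, 49, 59, 69, Inf),
  labels = c("29 or under", "30-39", "40-49", "50-59", "60-69", "70 or over")
)

# Missing ages stay NA instead of needing a placeholder label
print(table(age_group, useNA = "ifany"))
```

Because `cut()` is vectorized and returns an ordered set of factor levels, the intervals keep their natural order in tables and plots, and no `sapply()` call is needed.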
Some scientists, writers and activists have won not just once but several times. We will now see who has won more than one Nobel Prize.
a<-df[,c("laureate_id")]
a<-data.frame(table(a))
more_prize<-df[df$laureate_id %in% a$a[a$Freq > 1],]
head(more_prize[,c("full_name","sex","category","laureate_id","laureate_type")])
## full_name
## 20 Marie Curie, née Sklodowska
## 63 Marie Curie, née Sklodowska
## 90 Comité international de la Croix Rouge (International Committee of the Red Cross)
## 216 Comité international de la Croix Rouge (International Committee of the Red Cross)
## 279 Linus Carl Pauling
## 284 Office of the United Nations High Commissioner for Refugees (UNHCR)
## sex category laureate_id laureate_type
## 20 Female Physics 6 Individual
## 63 Female Chemistry 6 Individual
## 90 <NA> Peace 482 Organization
## 216 <NA> Peace 482 Organization
## 279 Male Chemistry 217 Individual
## 284 <NA> Peace 515 Organization
Marie Curie is the only woman to have won two Nobel Prizes, one in Physics and one in Chemistry.
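Since `head()` shows only the first six rows, a compact summary with one row per repeat winner can be easier to read. The sketch below works on a made-up `toy` data frame; the names are placeholders, and only the `laureate_id` and `full_name` columns are assumed to match the dataset.

```r
# Toy data standing in for nobel.csv (placeholder names)
toy <- data.frame(
  laureate_id = c(6, 6, 482, 482, 482, 217, 10),
  full_name = c("Laureate A", "Laureate A", "Laureate B", "Laureate B",
                "Laureate B", "Laureate C", "Laureate D"),
  stringsAsFactors = FALSE
)

# Number of rows (prizes) per laureate id
prize_counts <- table(toy$laureate_id)
repeat_ids <- as.numeric(names(prize_counts[prize_counts > 1]))

# One row per repeat winner, with the number of prizes attached
repeat_winners <- unique(
  toy[toy$laureate_id %in% repeat_ids, c("laureate_id", "full_name")]
)
repeat_winners$prizes <- as.integer(
  prize_counts[as.character(repeat_winners$laureate_id)]
)
print(repeat_winners)
```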
Next, we look for the first and the most recent woman to receive a Nobel Prize.
female<-df[which(df$sex=="Female"),]
#The first woman to win the Nobel Prize
head(female[order(female$year,decreasing = FALSE),],1)
## year category prize
## 20 1903 Physics The Nobel Prize in Physics 1903
## motivation
## 20 "in recognition of the extraordinary services they have rendered by their joint researches on the radiation phenomena discovered by Professor Henri Becquerel"
## prize_share laureate_id laureate_type full_name birth_date
## 20 1/4 6 Individual Marie Curie, née Sklodowska 1867-11-07
## birth_city birth_country sex organization_name
## 20 Warsaw Russian Empire (Poland) Female <NA>
## organization_city organization_country death_date death_city death_country
## 20 <NA> <NA> 1934-07-04 Sallanches France
## decade year_awarded age_interval
## 20 1900 36 30-39
#The latest woman to win the Nobel Prize
head(female[order(female$year,decreasing = TRUE),],1)
## year category prize
## 894 2015 Literature The Nobel Prize in Literature 2015
## motivation
## 894 "for her polyphonic writings, a monument to suffering and courage in our time"
## prize_share laureate_id laureate_type full_name birth_date
## 894 1/1 924 Individual Svetlana Alexievich 1948-05-31
## birth_city birth_country sex organization_name organization_city
## 894 Ivano-Frankivsk Ukraine Female <NA> <NA>
## organization_country death_date death_city death_country decade
## 894 <NA> <NA> <NA> <NA> 2010
## year_awarded age_interval
## 894 67 60-69
The first woman to win a Nobel Prize was Marie Curie, who received the Nobel Prize in Physics in 1903. The most recent woman returned by this query is Svetlana Alexievich, who received the Nobel Prize in Literature in 2015. Because head(..., 1) returns a single row, any other woman honored in 2015 is not shown.
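When several laureates share the same year, filtering on the maximum year returns all of them rather than just the first row. A minimal sketch, using a made-up `toy` data frame with placeholder names:

```r
# Toy data standing in for the female subset (placeholder names)
toy <- data.frame(
  year = c(1903, 2014, 2015, 2015),
  full_name = c("Laureate A", "Laureate B", "Laureate C", "Laureate D"),
  sex = c("Female", "Female", "Female", "Female"),
  stringsAsFactors = FALSE
)

# Keep every row from the most recent year, so ties are not hidden
latest_year <- max(toy$year)
latest_women <- toy[toy$year == latest_year, ]
print(latest_women)
```

The same pattern with `min()` returns every laureate from the earliest year.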
What about prize shares? A Nobel Prize can be shared by more than one person. We will work out what proportion of the 911 rows falls into each share size and which categories most often split a prize between several laureates.
b<-df[,c("prize_share","category")]
a<-data.frame(table(b))
prop.table(table(b))
## category
## prize_share Chemistry Economics Literature Medicine Peace
## 1/1 0.069154775 0.026344676 0.115257958 0.042810099 0.072447859
## 1/2 0.058177827 0.039517014 0.008781559 0.079034029 0.063666301
## 1/3 0.049396268 0.019758507 0.000000000 0.092206367 0.006586169
## 1/4 0.015367728 0.000000000 0.000000000 0.017563117 0.000000000
## category
## prize_share Physics
## 1/1 0.051591658
## 1/2 0.086717892
## 1/3 0.052689352
## 1/4 0.032930845
ggplot(data=a,aes(x=prize_share,y=Freq,fill=category,label=Freq)) + geom_bar(position="stack", stat="identity")+labs(x="Prize share",y="Number of Nobel prizes ",title="Prize shares in 6 areas")
The Nobel Prize in Literature is usually given to a single author: only 8 Literature laureates (about 0.9% of all rows) received a half share, and none received less. Likewise, no Economics or Peace laureate has ever received a quarter share.
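Each row of the dataset is a laureate, not a prize, so a shared prize appears several times. One way to count prizes instead is to turn the share text into a number and sum it. The sketch below assumes that the shares of a single prize always add up to 1; `share_to_number` is a hypothetical helper and `toy` is made-up data.

```r
# Toy data standing in for nobel.csv (made-up rows)
toy <- data.frame(
  category = c("Physics", "Physics", "Physics", "Literature", "Peace", "Peace"),
  prize_share = c("1/2", "1/4", "1/4", "1/1", "1/2", "1/2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert text such as "1/4" into 0.25
share_to_number <- function(share) {
  parts <- strsplit(as.character(share), "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}
toy$share_value <- share_to_number(toy$prize_share)

# Summing shares counts prizes; counting rows counts laureates
prizes_per_category <- tapply(toy$share_value, toy$category, sum)
laureates_per_category <- table(toy$category)
print(prizes_per_category)
print(laureates_per_category)
```

Comparing the two results shows how strongly a category tends to split its prizes: here the toy Physics prize has three laureates for a single prize.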
We now analyze the age at which laureates won their Nobel Prizes from 1901 to 2016.
ggplot(data=df,aes(x=year,y=year_awarded)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 28 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 28 rows containing missing values (`geom_point()`).
From the plot, we see that since about 1950 the typical age at which laureates win the Nobel Prize has risen gradually. But let's look deeper at the age distribution for each category.
ggplot(data=df,aes(x=year,y=year_awarded,color=category)) + geom_point() + geom_smooth() + facet_grid(rows = vars(category))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 28 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 28 rows containing missing values (`geom_point()`).
From the plot, we see that Economics is the most recently established prize category.
Moreover, the average age of winners in Chemistry and Physics has increased from around 50 to nearly 75 years.
On the other hand, Peace Prize laureates tend to be getting younger.
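The trends read off the smoothed curves can be backed by a number, for instance the slope of a straight-line fit of age on year within each category. This is only a rough sketch on made-up data: the `toy` values are invented, and the `age` column is a hypothetical stand-in for `year_awarded`.

```r
# Toy data standing in for nobel.csv (invented ages)
toy <- data.frame(
  year = c(1950, 1970, 1990, 2010, 1950, 1970, 1990, 2010),
  category = rep(c("Physics", "Peace"), each = 4),
  age = c(50, 55, 60, 65, 70, 65, 60, 55)
)

# Slope of age on year per category: years of age gained per calendar year
slopes <- sapply(split(toy, toy$category), function(d) {
  coef(lm(age ~ year, data = d))[["year"]]
})
print(round(slopes, 3))
```

A positive slope means winners are getting older over time and a negative slope means they are getting younger. A straight line is a cruder summary than the loess curve in the plot, so it should be read as a rough indication only.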
Finally, we will find the youngest and the oldest Nobel Prize winners.
#The oldest to win the Nobel Prize
head(df[order(df$year_awarded,decreasing = TRUE),],1)
## year category prize
## 794 2007 Economics The Sveriges Riksbank Prize in Economic Sciences 2007
## motivation prize_share
## 794 "for having laid the foundations of mechanism design theory" 1/3
## laureate_id laureate_type full_name birth_date birth_city
## 794 820 Individual Leonid Hurwicz 1917-08-21 Moscow
## birth_country sex organization_name organization_city
## 794 Russia Male University of Minnesota Minneapolis, MN
## organization_country death_date death_city
## 794 United States of America 2008-06-24 Minneapolis, MN
## death_country decade year_awarded age_interval
## 794 United States of America 2000 90 Over 70
#The youngest to win the Nobel Prize
head(df[order(df$year_awarded,decreasing = FALSE),],1)
## year category prize
## 886 2014 Peace The Nobel Peace Prize 2014
## motivation
## 886 "for their struggle against the suppression of children and young people and for the right of all children to education"
## prize_share laureate_id laureate_type full_name birth_date
## 886 1/2 914 Individual Malala Yousafzai 1997-07-12
## birth_city birth_country sex organization_name organization_city
## 886 Mingora Pakistan Female <NA> <NA>
## organization_country death_date death_city death_country decade
## 886 <NA> <NA> <NA> <NA> 2010
## year_awarded age_interval
## 886 17 <29
The oldest winner is Leonid Hurwicz, who received the Economics prize at the age of 90. Malala Yousafzai, by contrast, was exceptionally young when she won the Nobel Peace Prize, at 17 years old.
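The extremes can also be picked out directly with `which.max()` and `which.min()`, which skip missing ages and avoid sorting the whole data frame. A minimal sketch on made-up data (placeholder names, with a hypothetical `age` column standing in for `year_awarded`):

```r
# Toy data standing in for nobel.csv (placeholder names, invented ages)
toy <- data.frame(
  full_name = c("Laureate A", "Laureate B", "Laureate C", "Laureate D"),
  age = c(61, NA, 17, 90),
  stringsAsFactors = FALSE
)

# which.max()/which.min() ignore NA and return the first matching position
oldest <- toy[which.max(toy$age), ]
youngest <- toy[which.min(toy$age), ]
print(oldest)
print(youngest)
```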
After analyzing the dataset of Nobel Prize winners, we have gained deeper insight into the Nobel Prize. In particular, we can see the imbalance between men and women among the winners, and how the age at which the prize is won has changed in each category in recent years.