The Nobel Prizes are among the best-known awards in the world, given each year to those who have contributed the most to science and to humanity. They are awarded in six categories: Chemistry, Literature, Medicine, Physics, Economics, and Peace. The first Nobel Prizes were awarded in 1901, and at that time they went almost exclusively to European men. Is that still true today?

To find out, we will use the Nobel dataset of all prize winners from 1901 to 2016. Let's analyze it.

df<-read.csv("nobel.csv")
head(df)
##   year   category                                          prize
## 1 1901  Chemistry              The Nobel Prize in Chemistry 1901
## 2 1901 Literature             The Nobel Prize in Literature 1901
## 3 1901   Medicine The Nobel Prize in Physiology or Medicine 1901
## 4 1901      Peace                     The Nobel Peace Prize 1901
## 5 1901      Peace                     The Nobel Peace Prize 1901
## 6 1901    Physics                The Nobel Prize in Physics 1901
##                                                                                                                                                                                                                                           motivation
## 1                                                                                                 "in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions"
## 2                                                                "in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect"
## 3 "for his work on serum therapy, especially its application against diphtheria, by which he has opened a new road in the domain of medical science and thereby placed in the hands of the physician a victorious weapon against illness and deaths"
## 4                                                                                                                                                                                                                                               <NA>
## 5                                                                                                                                                                                                                                               <NA>
## 6                                                                                                                "in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him"
##   prize_share laureate_id laureate_type                    full_name birth_date
## 1         1/1         160    Individual Jacobus Henricus van 't Hoff 1852-08-30
## 2         1/1         569    Individual              Sully Prudhomme 1839-03-16
## 3         1/1         293    Individual       Emil Adolf von Behring 1854-03-15
## 4         1/2         462    Individual            Jean Henry Dunant 1828-05-08
## 5         1/2         463    Individual               Frédéric Passy 1822-05-20
## 6         1/1           1    Individual       Wilhelm Conrad Röntgen 1845-03-27
##           birth_city     birth_country  sex  organization_name
## 1          Rotterdam       Netherlands Male  Berlin University
## 2              Paris            France Male               <NA>
## 3  Hansdorf (Lawice)  Prussia (Poland) Male Marburg University
## 4             Geneva       Switzerland Male               <NA>
## 5              Paris            France Male               <NA>
## 6 Lennep (Remscheid) Prussia (Germany) Male  Munich University
##   organization_city organization_country death_date death_city death_country
## 1            Berlin              Germany 1911-03-01     Berlin       Germany
## 2              <NA>                 <NA> 1907-09-07   Châtenay        France
## 3           Marburg              Germany 1917-03-31    Marburg       Germany
## 4              <NA>                 <NA> 1910-10-30     Heiden   Switzerland
## 5              <NA>                 <NA> 1912-06-12      Paris        France
## 6            Munich              Germany 1923-02-10     Munich       Germany
#Dimension of the data
dim(df)
## [1] 911  18

The dataset contains 911 rows, one for each prize awarded to a laureate, with 18 variables per row. Next, we will find out which country has produced the most Nobel Prize winners so far.

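#Count the number of Nobel Prize winners from each country

#Use a name other than `table` so the base function is not shadowed
country_counts<-table(df$birth_country)
country_counts<-country_counts[order(country_counts,decreasing=TRUE)]
print(head(country_counts,10))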
## 
## United States of America           United Kingdom                  Germany 
##                      259                       85                       61 
##                   France                   Sweden                    Japan 
##                       51                       29                       24 
##                   Canada              Netherlands                    Italy 
##                       18                       18                       17 
##                   Russia 
##                       17

The United States has the most Nobel Prize winners by country of birth, with 259, well ahead of the United Kingdom in second place with 85. If the USA leads by such a margin, why do some people claim that the Nobel Prize used to be awarded mainly to Europeans? We will answer this question by looking for the decade when the USA began to dominate the ranking.

#Round year to the decade
floor_decade=function(value){return(value-value %% 10)}
df$decade<-sapply(df$year,floor_decade)
#Number of prizes won by US-born laureates each decade
us_winner<-df[which(df$birth_country=="United States of America"),]
us_winner<-data.frame(table(us_winner$decade))
#Number of prizes awarded each decade
a<-data.frame(table(df$decade))
#Proportion of US-born winners each decade
#(row-wise division is valid only because both tables contain every decade)
us_winner$prop<-us_winner$Freq/a$Freq
colnames(us_winner)<-c("decade","prizes","proportion")
print(us_winner)
##    decade prizes proportion
## 1    1900      1 0.01754386
## 2    1910      3 0.07500000
## 3    1920      4 0.07407407
## 4    1930     14 0.25000000
## 5    1940     13 0.30232558
## 6    1950     21 0.29166667
## 7    1960     21 0.26582278
## 8    1970     33 0.31730769
## 9    1980     31 0.31958763
## 10   1990     42 0.40384615
## 11   2000     52 0.42276423
## 12   2010     24 0.29268293
#Plot
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.2
ggplot(data=us_winner,aes(x=decade,y=proportion,group=1)) +geom_path()+
  geom_point() +labs(x="Decade",y="Proportion of USA prizes",title="Proportion of prizes awarded to US-born laureates each decade")

From the plot, we can see that from the 1900s to the 1920s only a few US-born laureates received Nobel Prizes. The US share then jumped sharply, from about 7% in the 1920s to 25% in the 1930s, and generally kept rising until it peaked at about 42% in the 2000s. The lower figure for the 2010s covers only 2010 to 2016. One possible explanation is the aftermath of World War I, which brought large-scale immigration from Europe to the USA and marked the start of a golden age for American science.
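
The proportion above is computed by dividing two frequency tables row by row, which only lines up when the USA has at least one winner in every decade. A minimal sketch of a version that does not depend on that, run on a small made-up data frame (`toy` and its values are assumptions for illustration, not rows from nobel.csv):

```r
# Toy data standing in for the Nobel data frame (values are made up)
toy <- data.frame(
  year = c(1901, 1905, 1912, 1918, 1923, 1925),
  birth_country = c("France", "United States of America", "Germany",
                    NA, "United States of America", "United States of America"),
  stringsAsFactors = FALSE
)

# Vectorised decade: arithmetic works on the whole column, no sapply() needed
toy$decade <- toy$year - toy$year %% 10

# TRUE for US-born laureates; a missing birth country counts as "not US",
# so the denominator is still every prize awarded in the decade
is_us <- !is.na(toy$birth_country) &
  toy$birth_country == "United States of America"

# The mean of a logical vector is the proportion of TRUE values
us_share <- tapply(is_us, toy$decade, mean)
print(us_share)
```

Because the share is computed within each decade group, a decade with no US-born winner simply gets a proportion of 0 instead of shifting the rows out of alignment.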

Prizes are awarded in six categories every year, so we also want to know which countries have been strongest in each one.

#Most Chemistry prize
a<-df[,c("category","birth_country")]
a<-data.frame(table(a))

chem<-a[which(a$category=="Chemistry"),]
chem<-chem[order(chem$Freq,decreasing = TRUE),]
head(chem)
##      category            birth_country Freq
## 691 Chemistry United States of America   52
## 685 Chemistry           United Kingdom   22
## 259 Chemistry                  Germany   19
## 235 Chemistry                   France    9
## 355 Chemistry                    Japan    6
## 151 Chemistry                   Canada    4

In Chemistry, the USA leads with 52 prizes, followed by the United Kingdom with 22 and Germany with 19. Notably, Japan, an Asian country, has 6 Nobel Prizes in Chemistry.

#Most Physics prize
a<-df[,c("category","birth_country")]
a<-data.frame(table(a))

phy<-a[which(a$category=="Physics"),]
phy<-phy[order(phy$Freq,decreasing = TRUE),]
head(phy)
##     category            birth_country Freq
## 696  Physics United States of America   66
## 690  Physics           United Kingdom   22
## 264  Physics                  Germany   16
## 360  Physics                    Japan   11
## 426  Physics              Netherlands    9
## 240  Physics                   France    7

In Physics, the USA again leads the ranking, while the United Kingdom and Germany take second and third place with 22 and 16 physicists. Japan ranks fourth with 11 physicists.
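
The Chemistry and Physics rankings repeat the same steps, so they could be wrapped in one function. The sketch below is only an illustration: `top_countries` is a hypothetical helper name, and the `toy` data frame is made up rather than taken from nobel.csv.

```r
# Hypothetical helper: the n most frequent birth countries in one category
top_countries <- function(data, category, n = 6) {
  # which() drops rows where the category is missing
  rows <- data[which(data$category == category), ]
  counts <- sort(table(rows$birth_country), decreasing = TRUE)
  head(counts, n)
}

# Toy data (made up) with the same two columns used in the analysis
toy <- data.frame(
  category = c("Physics", "Physics", "Physics",
               "Chemistry", "Chemistry", "Physics"),
  birth_country = c("Japan", "Germany", "Japan",
                    "France", "France", "Japan"),
  stringsAsFactors = FALSE
)

physics_top <- top_countries(toy, "Physics", n = 2)
print(physics_top)
```

With the real data, the same call could be repeated for each of the six categories without copying the filtering and sorting code.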

Next, we will find out whether more women have won Nobel Prizes in recent decades.

table(df$sex)
## 
## Female   Male 
##     49    836

There is a huge gap between men and women: 836 prizes went to men and only 49 to women. Next, we will find out in which categories women have won most often.

a<-df[,c("sex","category")]
a<-data.frame(table(a))
print(a)
##       sex   category Freq
## 1  Female  Chemistry    4
## 2    Male  Chemistry  171
## 3  Female  Economics    1
## 4    Male  Economics   77
## 5  Female Literature   14
## 6    Male Literature   99
## 7  Female   Medicine   12
## 8    Male   Medicine  199
## 9  Female      Peace   16
## 10   Male      Peace   88
## 11 Female    Physics    2
## 12   Male    Physics  202
library(ggplot2)
ggplot(data=a,aes(x=category,y=Freq,fill=sex)) + geom_bar(position="dodge", stat="identity")+labs(x="Areas",y="Number of prizes",title="Number of prizes awarded to men and women in each category")

Women have won most often in Peace (16), Literature (14), and Medicine (12). In contrast, only two women have won in Physics and only one in Economics.
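
The overall counts do not show whether the share of women has changed over time. One way to check this is to compute the proportion of female laureates per decade. The sketch below uses a small made-up data frame (`toy` is an assumption for illustration); applied to the real data it would use the `year` and `sex` columns in the same way.

```r
# Toy data (made up); NA in sex stands for an organization
toy <- data.frame(
  year = c(1903, 1909, 1911, 1995, 2004, 2009, 2015),
  sex  = c("Female", "Male", "Male", "Male", "Female", "Female", NA),
  stringsAsFactors = FALSE
)

# Keep individuals only: organizations have no recorded sex
people <- toy[!is.na(toy$sex), ]
people$decade <- people$year - people$year %% 10

# Proportion of female laureates within each decade
female_share <- tapply(people$sex == "Female", people$decade, mean)
print(round(female_share, 2))
```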

library(lubridate)
## Warning: package 'lubridate' was built under R version 4.2.2
## Loading required package: timechange
## Warning: package 'timechange' was built under R version 4.2.2
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
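df$birth_date<-as.Date(df$birth_date)
#Approximate age at the time of the award (award year minus birth year),
#stored under the column name year_awarded
df$year_awarded<-df$year-year(ymd(df$birth_date))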
#Age interval function
#Note: "<29" also includes age 29, and "Over 70" also includes age 70
age_interval<-function(x){
  #Missing ages get the placeholder "0", which is removed later
  if(is.na(x)){
    return("0")
  }
  if(x<=29){
    return("<29")
  }
  if(x<=39){
    return("30-39")
  }
  if(x<=49){
    return("40-49")
  }
  if(x<=59){
    return("50-59")
  }
  if(x<=69){
    return("60-69")
  }
  if(x<=100){
    return("Over 70")
  }
}
df$age_interval<-sapply(df$year_awarded,age_interval)
age<-df[,c("sex","age_interval")]
age<-data.frame(table(age))
age<-age[-which(age$age_interval=="0"),]
print(age)
##       sex age_interval Freq
## 1  Female          <29    1
## 2    Male          <29    1
## 5  Female        30-39    7
## 6    Male        30-39   41
## 7  Female        40-49    6
## 8    Male        40-49  150
## 9  Female        50-59   13
## 10   Male        50-59  213
## 11 Female        60-69    8
## 12   Male        60-69  243
## 13 Female      Over 70   14
## 14   Male      Over 70  186
library(ggplot2)
ggplot(data=age,aes(x=age_interval,y=Freq,fill=sex)) + geom_bar(position="dodge", stat="identity")+labs(x="Age distribution",y="Number of prizes ",title="Age distribution between male and female")

Most laureates were older than 40 when they won. For men there is a clear pattern: the count rises with each age interval up to 60-69 (243 laureates) and then drops for those aged 70 and over. For women there is no such peak: the count fluctuates between 6 and 14 across the intervals from 30-39 upward. Only one man and one woman won a Nobel Prize at age 29 or younger. What an achievement!
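
The age brackets can also be built with base R's `cut()`, which avoids the chain of `if` statements and keeps missing ages as `NA`. This is a sketch on made-up ages, with bracket labels chosen here for illustration (they differ slightly from the labels used above so that the boundary ages 29 and 70 are named explicitly).

```r
# Made-up ages, including boundary values and a missing value
ages <- c(17, 29, 30, 45, 70, 90, NA)

# Intervals are closed on the right by default: (-Inf,29], (29,39], ..., (69,Inf)
brackets <- cut(
  ages,
  breaks = c(-Inf, 29, 39, 49, 59, 69, Inf),
  labels = c("29 or under", "30-39", "40-49", "50-59", "60-69", "70 or over")
)

print(table(brackets, useNA = "ifany"))
```

Because `cut()` returns an ordered set of factor levels, the brackets also appear in the right order on a plot axis without extra work.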

Some scientists, writers, and activists have won not just once but several times. We will now see who has won the Nobel Prize more than once.

a<-df[,c("laureate_id")]
a<-data.frame(table(a))

more_prize<-df[df$laureate_id %in% a$a[a$Freq > 1],]
head(more_prize[,c("full_name","sex","category","laureate_id","laureate_type")])
##                                                                             full_name
## 20                                                        Marie Curie, née Sklodowska
## 63                                                        Marie Curie, née Sklodowska
## 90  Comité international de la Croix Rouge (International Committee of the Red Cross)
## 216 Comité international de la Croix Rouge (International Committee of the Red Cross)
## 279                                                                Linus Carl Pauling
## 284               Office of the United Nations High Commissioner for Refugees (UNHCR)
##        sex  category laureate_id laureate_type
## 20  Female   Physics           6    Individual
## 63  Female Chemistry           6    Individual
## 90    <NA>     Peace         482  Organization
## 216   <NA>     Peace         482  Organization
## 279   Male Chemistry         217    Individual
## 284   <NA>     Peace         515  Organization

Marie Curie is the only woman to have won two Nobel Prizes, one in Physics and one in Chemistry.
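
Repeat winners can also be found without building a frequency table, using `duplicated()`. The sketch below runs on a made-up data frame (`toy`, with placeholder names) and is only one possible way to do it.

```r
# Toy data (made up): ids 6 and 482 each appear twice
toy <- data.frame(
  laureate_id = c(6, 160, 6, 482, 482, 217),
  full_name   = c("A", "B", "A", "C", "C", "D"),
  stringsAsFactors = FALSE
)

# duplicated() marks the second and later occurrences of each id
repeat_ids <- unique(toy$laureate_id[duplicated(toy$laureate_id)])

# Keep every row that belongs to a repeat winner
repeat_rows <- toy[toy$laureate_id %in% repeat_ids, ]
print(repeat_rows)
```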

Next, we look for the first and the most recent woman to receive the Nobel Prize.

female<-df[which(df$sex=="Female"),]
#The first woman to win the Nobel Prize
head(female[order(female$year,decreasing = FALSE),],1)
##    year category                           prize
## 20 1903  Physics The Nobel Prize in Physics 1903
##                                                                                                                                                        motivation
## 20 "in recognition of the extraordinary services they have rendered by their joint researches on the radiation phenomena discovered by Professor Henri Becquerel"
##    prize_share laureate_id laureate_type                   full_name birth_date
## 20         1/4           6    Individual Marie Curie, née Sklodowska 1867-11-07
##    birth_city           birth_country    sex organization_name
## 20     Warsaw Russian Empire (Poland) Female              <NA>
##    organization_city organization_country death_date death_city death_country
## 20              <NA>                 <NA> 1934-07-04 Sallanches        France
##    decade year_awarded age_interval
## 20   1900           36        30-39
#The latest woman to win the Nobel Prize
head(female[order(female$year,decreasing = TRUE),],1)
##     year   category                              prize
## 894 2015 Literature The Nobel Prize in Literature 2015
##                                                                         motivation
## 894 "for her polyphonic writings, a monument to suffering and courage in our time"
##     prize_share laureate_id laureate_type           full_name birth_date
## 894         1/1         924    Individual Svetlana Alexievich 1948-05-31
##          birth_city birth_country    sex organization_name organization_city
## 894 Ivano-Frankivsk       Ukraine Female              <NA>              <NA>
##     organization_country death_date death_city death_country decade
## 894                 <NA>       <NA>       <NA>          <NA>   2010
##     year_awarded age_interval
## 894           67        60-69

The first woman ever to win the Nobel Prize was Marie Curie, who received the Nobel Prize in Physics in 1903. The most recent woman in the dataset to receive this honor is Svetlana Alexievich, who won the Nobel Prize in Literature in 2015.

What about prize shares? A Nobel Prize can be shared by more than one person. We will work out how the 911 awards in the dataset are split across prize shares, and which categories most often have more than one laureate sharing a prize.

b<-df[,c("prize_share","category")]
a<-data.frame(table(b))
prop.table(table(b))
##            category
## prize_share   Chemistry   Economics  Literature    Medicine       Peace
##         1/1 0.069154775 0.026344676 0.115257958 0.042810099 0.072447859
##         1/2 0.058177827 0.039517014 0.008781559 0.079034029 0.063666301
##         1/3 0.049396268 0.019758507 0.000000000 0.092206367 0.006586169
##         1/4 0.015367728 0.000000000 0.000000000 0.017563117 0.000000000
##            category
## prize_share     Physics
##         1/1 0.051591658
##         1/2 0.086717892
##         1/3 0.052689352
##         1/4 0.032930845
ggplot(data=a,aes(x=prize_share,y=Freq,fill=category,label=Freq)) + geom_bar(position="stack", stat="identity")+labs(x="Prize share",y="Number of Nobel prizes ",title="Prize shares in 6 areas")

The Nobel Prize in Literature is usually given to a single author: only 8 Literature laureates have shared a prize, each with a 1/2 share. On the other hand, no laureate in Economics or Peace has ever received a 1/4 share.
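
The proportions above are fractions of all 911 awards, so large categories look bigger in every row. To compare how often each category shares its prize, the proportions can instead be computed within each category by passing `margin = 2` to `prop.table()`. A sketch on made-up data (`toy` is an assumption for illustration):

```r
# Toy data (made up) with the two columns used above
toy <- data.frame(
  prize_share = c("1/1", "1/1", "1/2", "1/2", "1/1", "1/3", "1/3", "1/3"),
  category    = c("Literature", "Literature", "Physics", "Physics",
                  "Physics", "Medicine", "Medicine", "Medicine"),
  stringsAsFactors = FALSE
)

counts <- table(toy$prize_share, toy$category)

# margin = 2 normalises each column, so every category sums to 1
within_category <- prop.table(counts, margin = 2)
print(round(within_category, 2))
```

Each column then reads directly as "of all awards in this category, what fraction came with this share".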

We now analyze how the age at which laureates win the Nobel Prize has changed from 1901 to 2016.

ggplot(data=df,aes(x=year,y=year_awarded)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 28 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 28 rows containing missing values (`geom_point()`).

From the plot, we see that since about 1950 the age at which laureates win the Nobel Prize has been rising gradually. Let's look more closely at the age distribution within each category.

ggplot(data=df,aes(x=year,y=year_awarded,color=category)) + geom_point() + geom_smooth() + facet_grid(rows = vars(category))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 28 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 28 rows containing missing values (`geom_point()`).

From the plot, we see that Economics is the most recently established prize category.

Moreover, the average age of winners in Chemistry and Physics has increased from around 50 to nearly 75.

On the other hand, Peace Prize laureates tend to be getting younger.

Finally, we will find the youngest and the oldest winners of the Nobel Prize.

#The oldest to win the Nobel Prize
head(df[order(df$year_awarded,decreasing = TRUE),],1)
##     year  category                                                 prize
## 794 2007 Economics The Sveriges Riksbank Prize in Economic Sciences 2007
##                                                       motivation prize_share
## 794 "for having laid the foundations of mechanism design theory"         1/3
##     laureate_id laureate_type      full_name birth_date birth_city
## 794         820    Individual Leonid Hurwicz 1917-08-21     Moscow
##     birth_country  sex       organization_name organization_city
## 794        Russia Male University of Minnesota   Minneapolis, MN
##         organization_country death_date      death_city
## 794 United States of America 2008-06-24 Minneapolis, MN
##                death_country decade year_awarded age_interval
## 794 United States of America   2000           90      Over 70
#The youngest to win the Nobel Prize
head(df[order(df$year_awarded,decreasing = FALSE),],1)
##     year category                      prize
## 886 2014    Peace The Nobel Peace Prize 2014
##                                                                                                                   motivation
## 886 "for their struggle against the suppression of children and young people and for the right of all children to education"
##     prize_share laureate_id laureate_type        full_name birth_date
## 886         1/2         914    Individual Malala Yousafzai 1997-07-12
##     birth_city birth_country    sex organization_name organization_city
## 886    Mingora      Pakistan Female              <NA>              <NA>
##     organization_country death_date death_city death_country decade
## 886                 <NA>       <NA>       <NA>          <NA>   2010
##     year_awarded age_interval
## 886           17          <29

The oldest Nobel laureate is Leonid Hurwicz, who was honored for his achievements in Economics at age 90. Malala Yousafzai, by contrast, won the Nobel Peace Prize at the exceptionally young age of 17.
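
Sorting the whole data frame works, but the extremes can also be picked out directly with `which.min()` and `which.max()`, which skip missing values. A small sketch on made-up data (`toy` and its `age` column are assumptions for illustration; the analysis above stores age in `year_awarded`):

```r
# Toy data (made up); one age is missing
toy <- data.frame(
  full_name = c("A", "B", "C", "D"),
  age       = c(45, NA, 17, 90),
  stringsAsFactors = FALSE
)

# which.min()/which.max() ignore NA and return the first match on ties
youngest <- toy[which.min(toy$age), ]
oldest   <- toy[which.max(toy$age), ]

print(youngest)
print(oldest)
```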

After analyzing the dataset of Nobel Prize winners, we have gained a deeper understanding of the prize. In particular, we have seen the imbalance between men and women among the laureates, and how the age at which prizes are won has changed in each category in recent years.