Set up the working directory and load the dataset

setwd('C:/Users/nguye/ITEC4220/Project')
getwd()
## [1] "C:/Users/nguye/ITEC4220/Project"
nobel <- read.csv("nobel_prize.csv")
head(nobel, n=3)
##   awardYear          category
## 1      2001 Economic Sciences
## 2      1975           Physics
## 3      2004         Chemistry
##                                                             categoryFullName
## 1 The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel
## 2                                                 The Nobel Prize in Physics
## 3                                               The Nobel Prize in Chemistry
##   sortOrder portion prizeAmount prizeAmountAdjusted dateAwarded prizeStatus
## 1         2     1/3    10000000            12295082  2001-10-10    received
## 2         1     1/3      630000             3404179  1975-10-17    received
## 3         1     1/3    10000000            11762861  2004-10-06    received
##                                                                                                                                                                                             motivation
## 1                                                                                                                                            for their analyses of markets with asymmetric information
## 2 for the discovery of the connection between collective motion and particle motion in atomic nuclei and the development of the theory of the structure of the atomic nucleus based on this connection
## 3                                                                                                                                          for the discovery of ubiquitin-mediated protein degradation
##   categoryTopMotivation
## 1                      
## 2                      
## 3                      
##                                                   award_link  id
## 1 https://masterdataapi.nobelprize.org/2/nobelPrize/eco/2001 745
## 2 https://masterdataapi.nobelprize.org/2/nobelPrize/phy/1975 102
## 3 https://masterdataapi.nobelprize.org/2/nobelPrize/che/2004 779
##                name         knownName  givenName  familyName          fullName
## 1 A. Michael Spence A. Michael Spence A. Michael      Spence A. Michael Spence
## 2      Aage N. Bohr      Aage N. Bohr    Aage N.        Bohr   Aage Niels Bohr
## 3 Aaron Ciechanover Aaron Ciechanover      Aaron Ciechanover Aaron Ciechanover
##   penName gender                                      laureate_link birth_date
## 1           male http://masterdataapi.nobelprize.org/2/laureate/745 1943-00-00
## 2           male http://masterdataapi.nobelprize.org/2/laureate/102 1922-06-19
## 3           male http://masterdataapi.nobelprize.org/2/laureate/779 1947-10-01
##      birth_city birth_cityNow birth_continent                     birth_country
## 1 Montclair, NJ Montclair, NJ   North America                               USA
## 2    Copenhagen    Copenhagen          Europe                           Denmark
## 3         Haifa         Haifa            Asia British Protectorate of Palestine
##   birth_countryNow                                  birth_locationString
## 1              USA                                    Montclair, NJ, USA
## 2          Denmark                                   Copenhagen, Denmark
## 3           Israel Haifa, British Protectorate of Palestine (now Israel)
##   death_date death_city death_cityNow death_continent death_country
## 1                                                                  
## 2 2009-09-08 Copenhagen    Copenhagen          Europe       Denmark
## 3                                                                  
##   death_countryNow death_locationString orgName nativeName acronym
## 1                                                                 
## 2          Denmark  Copenhagen, Denmark                           
## 3                                                                 
##   org_founded_date org_founded_city org_founded_cityNow org_founded_continent
## 1                                                                            
## 2                                                                            
## 3                                                                            
##   org_founded_country org_founded_countryNow org_founded_locationString
## 1                                                                      
## 2                                                                      
## 3                                                                      
##   ind_or_org residence_1 residence_2
## 1 Individual                        
## 2 Individual                        
## 3 Individual                        
##                                              affiliation_1 affiliation_2
## 1                   Stanford University, Stanford, CA, USA              
## 2                Niels Bohr Institute, Copenhagen, Denmark              
## 3 Technion - Israel Institute of Technology, Haifa, Israel              
##   affiliation_3 affiliation_4
## 1                            
## 2                            
## 3
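Many fields in the preview above are blank, for example `death_date`, `orgName`, and `residence_1`. By default `read.csv()` treats the string `NA` as missing, and in text columns it keeps blank cells as empty strings, so `is.na()` will not count them. The minimal sketch below shows how the `na.strings` argument changes this. It uses a small hypothetical CSV extract: the values in `csv_text` are made up to mimic three of the columns. Reading blanks as `NA` is an assumption for illustration; the original analysis does not do this.

```r
# Hypothetical three-row extract mimicking a few CSV columns (not the real file)
csv_text <- "awardYear,prizeAmount,death_date
2001,10000000,
1975,630000,2009-09-08
2004,10000000,
"

# Default settings: blank cells in a text column are kept as empty strings
default_read <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With na.strings, blank cells are read as NA instead
na_read <- read.csv(text = csv_text, na.strings = c("", "NA"),
                    stringsAsFactors = FALSE)

print(default_read$death_date)
print(sum(is.na(default_read$death_date)))  # blanks are not counted as missing
print(sum(is.na(na_read$death_date)))       # blanks now count as missing
```

With the real file, the equivalent call would be `read.csv("nobel_prize.csv", na.strings = c("", "NA"))`.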

Plot the award money distribution

hist(nobel$prizeAmount/1000, breaks=5, main="Distribution of Nobel prize award amount",
     xlab="Prize amount (SEK in thousands)", ylab="Number of laureates")
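The `breaks=5` argument in `hist()` is only a suggestion. When it receives a single number, R passes it to `pretty()` and chooses round cut points, so the histogram may not have exactly five bins. To check what was actually drawn, call `hist()` with `plot = FALSE` and inspect the returned `breaks` and `counts`. The sketch below does this on synthetic amounts; the `amounts_k` values are invented and do not come from the dataset.

```r
# Synthetic prize amounts in thousands of SEK (illustrative values only)
set.seed(1)
amounts_k <- c(runif(40, 30, 200), runif(60, 500, 1500), runif(50, 6000, 10000))

# breaks = 5 is passed to pretty(), which picks "nice" cut points,
# so the number of bins actually used may differ from 5
h <- hist(amounts_k, breaks = 5, plot = FALSE)

print(h$breaks)          # cut points actually used
print(length(h$counts))  # number of bins actually drawn
print(sum(h$counts))     # every observation falls in exactly one bin
```

To force specific bins, pass a vector of cut points to `breaks` instead of a single number.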

Regression analysis of adjusted prize amount on award year

model <- lm(nobel$prizeAmountAdjusted ~ nobel$awardYear)
summary(model)
## 
## Call:
## lm(formula = nobel$prizeAmountAdjusted ~ nobel$awardYear)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3713005 -2821725    16254  2380253  5984762 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -93900372    5335282  -17.60   <2e-16 ***
## nobel$awardYear     50754       2706   18.75   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2800000 on 948 degrees of freedom
## Multiple R-squared:  0.2706, Adjusted R-squared:  0.2698 
## F-statistic: 351.7 on 1 and 948 DF,  p-value: < 2.2e-16
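The call above refers to columns as `nobel$...` inside the formula. This fits the same model, but it ties the formula to the global `nobel` object, so `predict()` cannot evaluate the model on new award years. R typically warns that `newdata` has a different number of rows and returns the original fitted values instead. The minimal sketch below uses the more common `data =` form. It runs on a hypothetical synthetic data frame `toy`, whose coefficients were chosen only to resemble the summary above. With the real data, the equivalent call would be `lm(prizeAmountAdjusted ~ awardYear, data = nobel)`. That form changes only the labels in the printed summary, not the estimates.

```r
# Synthetic stand-in for the Nobel columns (assumed values, not the real dataset)
set.seed(42)
toy <- data.frame(awardYear = sample(1901:2020, 200, replace = TRUE))
toy$prizeAmountAdjusted <- -9.4e7 + 5e4 * toy$awardYear + rnorm(200, sd = 2.8e6)

# Column names in the formula plus data = lets predict() use new rows
fit <- lm(prizeAmountAdjusted ~ awardYear, data = toy)

# Predicted adjusted amounts for two chosen award years
new_years <- data.frame(awardYear = c(1950, 2000))
pred <- predict(fit, newdata = new_years)
print(pred)
```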

Slope: According to the fitted line, the inflation-adjusted Nobel prize amount increases by about 50,754 SEK for each additional year, on average. The p-value for the slope (< 2e-16) is far smaller than 0.05. We therefore reject the null hypothesis that the slope is zero and conclude that there is a statistically significant linear relationship between award year and the prize amount adjusted for inflation.
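The quantities quoted above can also be read from the coefficient table in code. The t value is the estimate divided by its standard error: 50,754 / 2,706 ≈ 18.76, which agrees with the reported 18.75 up to rounding of the displayed figures. Over a decade, the fitted line rises by about 10 × 50,754 = 507,540 SEK. The sketch below repeats these steps on hypothetical synthetic data; `toy` and `fit` are illustrative names. It also computes a 95% confidence interval for the slope with `confint()`, which the original analysis does not report.

```r
# Synthetic data standing in for the Nobel columns (illustrative only)
set.seed(42)
toy <- data.frame(awardYear = sample(1901:2020, 200, replace = TRUE))
toy$prizeAmountAdjusted <- -9.4e7 + 5e4 * toy$awardYear + rnorm(200, sd = 2.8e6)
fit <- lm(prizeAmountAdjusted ~ awardYear, data = toy)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs   <- coef(summary(fit))
slope   <- coefs["awardYear", "Estimate"]
p_value <- coefs["awardYear", "Pr(>|t|)"]

# The t value is the estimate divided by its standard error
t_manual <- slope / coefs["awardYear", "Std. Error"]

# 95% confidence interval for the average yearly change
ci <- confint(fit, "awardYear", level = 0.95)
print(c(slope = slope, t = t_manual, p = p_value))
print(ci)
```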

In other words, award year is a statistically significant predictor of the inflation-adjusted prize amount.

The regression also shows that award year accounts for only part of the variation. The coefficient of determination (R-squared) is just 0.2706, so the linear model explains roughly 27% of the variation in the adjusted prize amount and leaves about 73% unexplained. This suggests that other factors influence the prize amount, or that the relationship is not purely linear. The large residuals point the same way: they range from about -3.7 million to +6.0 million SEK.
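The coefficient of determination can be computed directly as R-squared = 1 - SS_res / SS_tot. For a regression with a single predictor, it also equals the squared Pearson correlation between the two variables. For the model above, that implies a correlation of about sqrt(0.2706) ≈ 0.52. The sketch below checks both identities against `summary()` on synthetic data; the `x` and `y` vectors are invented for illustration.

```r
# Synthetic yearly data (invented values, not the Nobel dataset)
set.seed(7)
x <- 1901:2000
y <- 5e4 * x + rnorm(length(x), sd = 2e6)
fit <- lm(y ~ x)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)
r2_manual <- 1 - ss_res / ss_tot

# With one predictor, R-squared equals the squared Pearson correlation
r2_cor <- cor(x, y)^2

print(c(manual = r2_manual, from_cor = r2_cor, lm = summary(fit)$r.squared))
```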

Scatter plot to show the relationship between adjusted award amount and award year

plot(nobel$awardYear, nobel$prizeAmountAdjusted/1000, pch=19, col="yellow",
     main="Correlation between award year and adjusted award amount",
     xlab="Award year", ylab="Adjusted prize amount (SEK in thousands)")
mtext(paste("Coefficient of determination =", round(summary(model)$r.squared, 4)), side=3, col="blue")
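Because the plot title refers to correlation, it can help to draw the fitted regression line on the scatter plot and report r alongside R-squared. The y values are plotted in thousands of SEK, so both `lm` coefficients must also be divided by 1000 before they are passed to `abline()`. A minimal sketch on synthetic data follows. `years`, `amount`, and `fit` are hypothetical names, and `pdf(NULL)` is used only so the example runs without opening a graphics window. With the real data, the line could be added by calling `abline(coef = coef(model) / 1000)` after the plot call above.

```r
# Synthetic award years and adjusted amounts (illustrative only)
set.seed(3)
years  <- sample(1901:2020, 150, replace = TRUE)
amount <- -9.4e7 + 5e4 * years + rnorm(150, sd = 2.8e6)
fit <- lm(amount ~ years)
r <- cor(years, amount)

pdf(NULL)  # null graphics device: nothing is written or displayed

# The y axis is in thousands of SEK, so scale intercept and slope to match
plot(years, amount / 1000, pch = 19, col = "gray40",
     xlab = "Award year", ylab = "Adjusted prize amount (SEK in thousands)")
scaled_coef <- coef(fit) / 1000
abline(coef = scaled_coef, col = "red", lwd = 2)
mtext(sprintf("r = %.2f", r), side = 3)
invisible(dev.off())
```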