setwd('C:/Users/nguye/ITEC4220/Project')
getwd()
## [1] "C:/Users/nguye/ITEC4220/Project"
nobel <- read.csv("nobel_prize.csv")
head(nobel, n=3)
## awardYear category
## 1 2001 Economic Sciences
## 2 1975 Physics
## 3 2004 Chemistry
## categoryFullName
## 1 The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel
## 2 The Nobel Prize in Physics
## 3 The Nobel Prize in Chemistry
## sortOrder portion prizeAmount prizeAmountAdjusted dateAwarded prizeStatus
## 1 2 1/3 10000000 12295082 2001-10-10 received
## 2 1 1/3 630000 3404179 1975-10-17 received
## 3 1 1/3 10000000 11762861 2004-10-06 received
## motivation
## 1 for their analyses of markets with asymmetric information
## 2 for the discovery of the connection between collective motion and particle motion in atomic nuclei and the development of the theory of the structure of the atomic nucleus based on this connection
## 3 for the discovery of ubiquitin-mediated protein degradation
## categoryTopMotivation
## 1
## 2
## 3
## award_link id
## 1 https://masterdataapi.nobelprize.org/2/nobelPrize/eco/2001 745
## 2 https://masterdataapi.nobelprize.org/2/nobelPrize/phy/1975 102
## 3 https://masterdataapi.nobelprize.org/2/nobelPrize/che/2004 779
## name knownName givenName familyName fullName
## 1 A. Michael Spence A. Michael Spence A. Michael Spence A. Michael Spence
## 2 Aage N. Bohr Aage N. Bohr Aage N. Bohr Aage Niels Bohr
## 3 Aaron Ciechanover Aaron Ciechanover Aaron Ciechanover Aaron Ciechanover
## penName gender laureate_link birth_date
## 1 male http://masterdataapi.nobelprize.org/2/laureate/745 1943-00-00
## 2 male http://masterdataapi.nobelprize.org/2/laureate/102 1922-06-19
## 3 male http://masterdataapi.nobelprize.org/2/laureate/779 1947-10-01
## birth_city birth_cityNow birth_continent birth_country
## 1 Montclair, NJ Montclair, NJ North America USA
## 2 Copenhagen Copenhagen Europe Denmark
## 3 Haifa Haifa Asia British Protectorate of Palestine
## birth_countryNow birth_locationString
## 1 USA Montclair, NJ, USA
## 2 Denmark Copenhagen, Denmark
## 3 Israel Haifa, British Protectorate of Palestine (now Israel)
## death_date death_city death_cityNow death_continent death_country
## 1
## 2 2009-09-08 Copenhagen Copenhagen Europe Denmark
## 3
## death_countryNow death_locationString orgName nativeName acronym
## 1
## 2 Denmark Copenhagen, Denmark
## 3
## org_founded_date org_founded_city org_founded_cityNow org_founded_continent
## 1
## 2
## 3
## org_founded_country org_founded_countryNow org_founded_locationString
## 1
## 2
## 3
## ind_or_org residence_1 residence_2
## 1 Individual
## 2 Individual
## 3 Individual
## affiliation_1 affiliation_2
## 1 Stanford University, Stanford, CA, USA
## 2 Niels Bohr Institute, Copenhagen, Denmark
## 3 Technion - Israel Institute of Technology, Haifa, Israel
## affiliation_3 affiliation_4
## 1
## 2
## 3
hist(nobel$prizeAmount/1000, breaks=5, main="Distribution of Nobel prize award amount",
xlab="Prize amount (SEK in thousands)", ylab="Number of laureates")
model <- lm(nobel$prizeAmountAdjusted ~ nobel$awardYear)
summary(model)
##
## Call:
## lm(formula = nobel$prizeAmountAdjusted ~ nobel$awardYear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3713005 -2821725 16254 2380253 5984762
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -93900372 5335282 -17.60 <2e-16 ***
## nobel$awardYear 50754 2706 18.75 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2800000 on 948 degrees of freedom
## Multiple R-squared: 0.2706, Adjusted R-squared: 0.2698
## F-statistic: 351.7 on 1 and 948 DF, p-value: < 2.2e-16
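As a side note on the model call above, here is a minimal sketch of an equivalent spelling that passes the data frame through `data =` instead of using `$` inside the formula. It uses only the three rows printed by `head(nobel, n=3)`, so its coefficients will not match the full fit. The names `sample_rows`, `fit_dollar`, `fit_data` and `pred_1990` are illustrative and do not appear in the original analysis.

```r
# Three rows copied from the head(nobel, n=3) output; NOT the full data set.
sample_rows <- data.frame(
  awardYear           = c(2001, 1975, 2004),
  prizeAmountAdjusted = c(12295082, 3404179, 11762861)
)

# Same model, two spellings: columns via `$` versus a formula plus `data =`.
fit_dollar <- lm(sample_rows$prizeAmountAdjusted ~ sample_rows$awardYear)
fit_data   <- lm(prizeAmountAdjusted ~ awardYear, data = sample_rows)

# The estimates agree; only the coefficient names differ.
coefs_match <- isTRUE(all.equal(unname(coef(fit_dollar)), unname(coef(fit_data))))
print(coefs_match)

# With `data =`, predict() can score award years that are not in the data.
pred_1990 <- predict(fit_data, newdata = data.frame(awardYear = 1990))
print(pred_1990)
```

The `data =` form also gives shorter coefficient labels (`awardYear` rather than `nobel$awardYear`) in the summary table.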
Slope: For each additional year, the inflation-adjusted Nobel prize amount increases by SEK 50,754 on average. Since the p-value for the slope (< 2e-16) is far below 0.05, we reject the null hypothesis of a zero slope and conclude that there is a statistically significant linear relationship between the award year and the prize amount adjusted for inflation.
In other words, award year is a statistically significant predictor of the adjusted prize amount.
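To make the slope concrete, the sketch below plugs two award years into the fitted line and compares the result with the observed adjusted amounts. It uses the coefficients as printed (rounded) in the summary output, so the fitted values are approximate. The observed values are the first two rows shown by `head(nobel, n=3)`. The names `fitted_amount`, `fitted_sek` and `resid_sek` are illustrative, not part of the original analysis.

```r
# Coefficients as printed (rounded) in the summary(model) output.
intercept <- -93900372
slope     <- 50754

# Fitted adjusted prize amount (SEK) for a given award year.
fitted_amount <- function(year) intercept + slope * year

# Award years and adjusted amounts of the first two rows of head(nobel, n=3).
years    <- c(2001, 1975)
observed <- c(12295082, 3404179)

fitted_sek <- fitted_amount(years)
resid_sek  <- observed - fitted_sek

print(data.frame(years, observed, fitted_sek, resid_sek))
```

With these rounded coefficients, the fitted values are about SEK 7.66 million for 2001 and SEK 6.34 million for 1975. The two observations therefore sit roughly SEK 4.6 million above and SEK 2.9 million below the line, both within the residual range reported in the summary.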
At the same time, the regression shows that year is far from the only factor influencing the prize amount: R-squared (the coefficient of determination) is only 0.2706, so the linear model explains roughly 27% of the variation in the adjusted prize amount. The large residuals (a residual standard error of about SEK 2.8 million) point to the same conclusion.
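As a quick consistency check, R-squared for a one-predictor regression can be recovered from the F-statistic and the residual degrees of freedom in the summary output, using R² = F / (F + df). The sketch below does this arithmetic with the printed (rounded) values. The variable names are illustrative.

```r
# Values as printed in the summary(model) output.
f_stat   <- 351.7  # F-statistic on 1 and 948 DF
df_resid <- 948    # residual degrees of freedom

# For a single predictor: R^2 = F / (F + residual df).
r_squared <- f_stat / (f_stat + df_resid)

# Adjusted R^2 with n - 1 = df_resid + 1 (one predictor plus intercept).
adj_r_squared <- 1 - (1 - r_squared) * (df_resid + 1) / df_resid

# Pearson correlation; positive because the estimated slope is positive.
r <- sqrt(r_squared)

print(round(c(r_squared = r_squared, adj_r_squared = adj_r_squared, r = r), 4))
```

The results round to 0.2706 and 0.2698, matching the reported values. The implied correlation between award year and adjusted prize amount is about 0.52.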
plot(nobel$awardYear, nobel$prizeAmountAdjusted/1000, pch=19, main="Correlation between award year and adjusted award amount", xlab="Award year", ylab="Adjusted prize amount (SEK in thousands)", col="yellow")
mtext("Coefficient of determination = 0.2706", side=3, col="blue")