Dataset Selection
I selected a dataset from the “100+ Interesting Data Sets” list: Ecdat, an R package of data sets for econometrics: http://www.rdocumentation.org/packages/ecdat
From this package I chose the ‘University’ dataset: http://www.rdocumentation.org/packages/Ecdat/functions/University
Dataset Description
This dataset comes from a 1988 study of 62 universities in the UK. It records 17 variables for each university, such as the number of undergraduate and postgraduate students, net assets, student fees, technicians, administrative pay, research grants, academic pay, etc.
The source of the dataset: Glass, J.C., D.G. McKillop and N. Hyndman (1995) “Efficiency in the provision of university teaching and research: an empirical analysis of UK universities”, Journal of Applied Econometrics, 10(1), January-March, 61-72.
library(Ecdat) # load from the default library path rather than a machine-specific lib.loc
data(University)
dim(University) # 62 universities (rows), 17 variables (columns)
## [1] 62 17
summary(University)
## undstudents poststudents nassets acnumbers
## Min. : 0 Min. : 26.0 Min. : 2037 Min. : 48.0
## 1st Qu.: 2678 1st Qu.: 665.2 1st Qu.: 17707 1st Qu.: 383.2
## Median : 3828 Median : 958.5 Median : 32261 Median : 558.5
## Mean : 4373 Mean :1115.1 Mean : 55043 Mean : 753.9
## 3rd Qu.: 6336 3rd Qu.:1554.2 3rd Qu.: 59737 3rd Qu.:1008.8
## Max. :10035 Max. :3975.0 Max. :406564 Max. :2030.0
## acrelnum clernum compop techn
## Min. : 11.00 Min. : 17.0 Min. : 0.00 Min. : 10.0
## 1st Qu.: 91.25 1st Qu.:193.0 1st Qu.: 6.00 1st Qu.:111.6
## Median :140.00 Median :276.5 Median : 8.50 Median :172.0
## Mean :171.88 Mean :312.9 Mean : 13.17 Mean :227.0
## 3rd Qu.:229.75 3rd Qu.:401.6 3rd Qu.: 13.75 3rd Qu.:331.1
## Max. :658.00 Max. :845.5 Max. :200.00 Max. :639.5
## stfees acpay acrelpay secrpay
## Min. : 520 Min. : 806 Min. : 0 Min. : 189.0
## 1st Qu.: 4162 1st Qu.: 8243 1st Qu.: 1110 1st Qu.: 984.2
## Median : 6065 Median :10728 Median : 2248 Median :1648.5
## Mean : 7061 Mean :14390 Mean : 2937 Mean :1934.0
## 3rd Qu.: 9248 3rd Qu.:20360 3rd Qu.: 3885 3rd Qu.:2476.2
## Max. :18800 Max. :35253 Max. :10478 Max. :8667.0
## admpay agresrk furneq landbuild
## Min. : 221 Min. : 139.0 Min. : 89 Min. : 2583
## 1st Qu.:1086 1st Qu.: 972.5 1st Qu.:1166 1st Qu.: 16855
## Median :1538 Median :1718.1 Median :1764 Median : 24744
## Mean :1872 Mean :2406.6 Mean :2419 Mean : 38635
## 3rd Qu.:2586 3rd Qu.:3396.7 3rd Qu.:3338 3rd Qu.: 41737
## Max. :4705 Max. :9147.1 Max. :9400 Max. :362000
## resgr
## Min. : 121
## 1st Qu.: 3570
## Median : 6534
## Mean : 9237
## 3rd Qu.:12150
## Max. :40746
Selection of independent and dependent variables
I am interested in whether the value of research grants received by an institution (in pounds sterling) can predict academic pay at that institution (also in pounds).
Independent variable: Research Grants (pounds) “resgr”
Dependent variable: Academic pay (pounds) “acpay”
head(University[,c("resgr","acpay")]) # columns 17 and 10, selected by name
## resgr acpay
## 1 2176 4889
## 2 1502 993
## 3 40746 30705
## 4 30300 31840
## 5 2075 10292
## 6 11352 15636
Null hypothesis
The null hypothesis is that there is no linear association between research grants received (pounds) and academic pay (pounds) in UK universities, based on data obtained in 1988.
Linear model
The linear model tests whether variation in the research grants received by UK universities in 1988 (“resgr”, in pounds) can explain variation in academic pay (“acpay”, also in pounds) at these universities.
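Written out, the model and the hypothesis test can be stated as follows. The symbols \(\beta_0\), \(\beta_1\) and \(\varepsilon_i\) are notation introduced here for illustration: they denote the unknown population intercept, slope and error term, whose estimates are the \(b_0\) and \(b_1\) reported later in this document.

\[
\text{acpay}_i = \beta_0 + \beta_1 \, \text{resgr}_i + \varepsilon_i, \qquad i = 1, \dots, 62
\]

\[
H_0: \beta_1 = 0 \qquad \text{versus} \qquad H_1: \beta_1 \neq 0
\]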
attach(University)
lm<-lm(acpay~resgr)
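An alternative way to fit the same kind of model, without attach() and without reusing the name of the lm() function, might look like the sketch below. To keep it self-contained it uses only the six rows printed by head() in this document, so its coefficients will not match the full 62-row fit; the names `toy` and `fit` are hypothetical.

```r
# Toy data: the six rows printed by head() in this report (not the full dataset).
toy <- data.frame(
  resgr = c(2176, 1502, 40746, 30300, 2075, 11352),
  acpay = c(4889, 993, 30705, 31840, 10292, 15636)
)

# Passing data = ... lets lm() find the columns without attach().
# `fit` is a hypothetical name chosen so the lm() function is not shadowed.
fit <- lm(acpay ~ resgr, data = toy)

# Intercept and slope estimated from these six rows only.
print(coef(fit))
```

With the real data, the same call would be written with `data = University` in place of `data = toy`.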
Scattergram
plot(resgr,acpay,cex=1,pch=16,col="red", main="Academic pay vs. Research grants in UK universities in 1988", xlab="Research Grants received (pounds)", ylab="Academic pay (pounds)")
Regression line
plot(resgr,acpay,cex=1,pch=16,col="red", main="Academic pay vs. Research grants in UK universities in 1988", xlab="Research Grants received (pounds)", ylab="Academic pay (pounds)")
abline(lm$coef, lwd=2)
95% confidence intervals of the regression line, \(b_0\) and \(b_1\)
plot(resgr,acpay,cex=1,pch=16,col="red", main="Academic pay vs. Research grants in UK universities in 1988", xlab="Research Grants received (pounds)", ylab="Academic pay (pounds)")
abline(lm$coef, lwd=2)
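ci<-confint(lm,level=0.95) # 95% confidence intervals for b0 and b1
# Lines drawn from ci[,1] and ci[,2] pair the two lower (or upper) bounds and are not a confidence band,
# so the band for the fitted line is computed with predict() instead.
grid<-data.frame(resgr=seq(min(resgr),max(resgr),length.out=100))
band<-predict(lm,newdata=grid,interval="confidence",level=0.95)
matlines(grid$resgr,band[,c("lwr","upr")],lty=2,col="blue")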
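The intervals that confint() returns for \(b_0\) and \(b_1\) can be approximated by hand as estimate ± t critical value × standard error. The sketch below assumes the rounded estimates and standard errors printed in the model summary of this document, so it should agree with confint() only up to rounding; `ci_hand` is a hypothetical name.

```r
# Estimates and standard errors copied from the summary(lm) output in this report.
est <- c("(Intercept)" = 6072, resgr = 0.9004)
se  <- c("(Intercept)" = 836.4, resgr = 0.06663)
df_resid <- 60  # residual degrees of freedom (62 observations - 2 coefficients)

# Two-sided 95% critical value of the t distribution.
t_crit <- qt(0.975, df_resid)

# Lower and upper bounds, one row per coefficient.
ci_hand <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci_hand, 3))
```

Under these inputs the interval for the slope runs from roughly 0.77 to 1.03 and does not contain zero.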
Summary of the model
summary(lm)
##
## Call:
## lm(formula = acpay ~ resgr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12055.8 -2419.1 -72.8 2137.3 12660.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.072e+03 8.364e+02 7.26 9.02e-10 ***
## resgr 9.004e-01 6.663e-02 13.51 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4460 on 60 degrees of freedom
## Multiple R-squared: 0.7527, Adjusted R-squared: 0.7486
## F-statistic: 182.6 on 1 and 60 DF, p-value: < 2.2e-16
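Several numbers in this summary are linked by simple identities that hold for a regression with a single predictor: the t value is the estimate divided by its standard error, the F-statistic equals the squared t value of the slope, and R-squared can be recovered from F and the residual degrees of freedom. A minimal check, assuming only the rounded values printed above (variable names are hypothetical):

```r
# Values copied from the summary(lm) output in this report.
b1 <- 0.9004
se_b1 <- 0.06663
f_stat <- 182.6
df_resid <- 60
n <- 62

t_b1 <- b1 / se_b1                           # t value for the slope
r2 <- f_stat / (f_stat + df_resid)           # R-squared with one predictor
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid  # adjusted R-squared

print(c(t = t_b1, t_squared = t_b1^2, r2 = r2, adj_r2 = adj_r2))
```

Small differences from the printed summary are expected because the inputs are rounded.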
Interpretation of statistical analysis \(b_0\), \(b_1\) and \(r\)
\(b_0\): The intercept is 6072; the model predicts academic pay of 6072 pounds for a university that receives no research grants. This is a slight extrapolation, since the smallest observed value of “resgr” is 121.
\(b_1\): The slope of the regression line is 0.9004; each additional pound of research grants received is associated with an increase of about 0.90 pounds in academic pay.
\(r\): The R-squared value is 0.7527, so variation in the independent variable, research grants, explains 75.27% of the variation in the dependent variable, academic pay. The correlation coefficient \(r\) is the square root of this value, about 0.87, and is positive because the slope is positive.
The p-values for both coefficients, \(b_0\) and \(b_1\), are below an alpha of 0.05 (9.02e-10 and < 2e-16 respectively). Hence, at the 5% significance level we reject the null hypothesis: the data suggest an association between research grants received and academic pay.
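As a worked illustration of these interpretations, the fitted line can be used to predict academic pay at a given level of research grants, and the correlation coefficient can be recovered from R-squared. The sketch below assumes the rounded coefficients reported above; the function name `predict_acpay` and the example value of 10000 are hypothetical.

```r
# Rounded values copied from the model summary in this report.
b0 <- 6072
b1 <- 0.9004
r2 <- 0.7527

# Fitted line: predicted academic pay for a given value of research grants.
predict_acpay <- function(resgr) b0 + b1 * resgr

# Example: research grants of 10000, which lies inside the observed range (121 to 40746).
pred <- predict_acpay(10000)
print(pred)

# In simple linear regression, r has the sign of the slope and magnitude sqrt(R-squared).
r <- sign(b1) * sqrt(r2)
print(round(r, 3))
```

For research grants of 10000 the line gives 6072 + 0.9004 × 10000 = 15076, and the correlation coefficient comes out at about 0.87.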