DataM: Inclass Exercise 0316-1
input data
Read in a plain text file with variable names and assign a name to it.
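A minimal sketch of this step, assuming a whitespace-delimited text file whose first line holds the variable names. The exercise does not give the file name, so the sketch writes a small temporary file (a hypothetical stand-in holding the first five math2 and math1 values from the structure listing) and reads it back. `dta` is the name used by the later code.

```r
# Hypothetical stand-in for the exercise's data file; the real file name is not given.
tmp <- tempfile(fileext = ".txt")
writeLines(c("math2 math1",
             "28 18",
             "56 22",
             "51 44",
             "13 8",
             "39 20"), tmp)

# header = TRUE tells read.table() that the first line holds the variable names
dta <- read.table(tmp, header = TRUE)
print(dta)
```

With the real file, only the path passed to `read.table()` would change.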
checking data
Check the structure of data
'data.frame': 39 obs. of 3 variables:
$ math2: int 28 56 51 13 39 41 30 13 17 32 ...
$ math1: int 18 22 44 8 20 12 16 5 9 18 ...
$ cc : num 328 406 387 167 328 ...
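The listing above looks like the output of `str()`. As a rough sketch, the calls below show common ways to inspect a data frame. The toy `dta` here is an assumption: it holds only the first ten math2 and math1 values from the listing and leaves out cc, whose values are shown rounded.

```r
# Toy data frame: first ten math2/math1 values from the listing above (cc omitted)
dta <- data.frame(
  math2 = c(28L, 56L, 51L, 13L, 39L, 41L, 30L, 13L, 17L, 32L),
  math1 = c(18L, 22L, 44L, 8L, 20L, 12L, 16L, 5L, 9L, 18L)
)
str(dta)    # compact listing: dimensions, variable names, types, first values
head(dta)   # first six rows
dim(dta)    # number of rows and columns
```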
descriptive statistics
plot data
# specify square plot region
par(pty="s")
# scatter plot of math2 by math1
plot(math2 ~ math1, data=dta, xlim=c(0, 60), ylim=c(0, 60),
     xlab="Math score at Year 1", ylab="Math score at Year 2")
# add grid lines
grid()
regression analysis
regress math2 on math1
Call:
lm(formula = math2 ~ math1, data = dta)

Residuals:
    Min      1Q  Median      3Q     Max
-10.430  -5.521  -0.369   4.253  20.388

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   12.944      2.607   4.965 1.57e-05 ***
math1          1.030      0.152   6.780 5.57e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.255 on 37 degrees of freedom
Multiple R-squared:  0.5541,  Adjusted R-squared:  0.542
F-statistic: 45.97 on 1 and 37 DF,  p-value: 5.571e-08
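The numbers in this summary are tied together by simple arithmetic, which can be checked directly from the printed values. The sketch below uses only the rounded figures shown above, so the results agree with the printout only up to rounding. The variable names are illustrative.

```r
# Values copied from the printed summary above
est      <- 1.030   # slope estimate for math1
se       <- 0.152   # its standard error
tval     <- 6.780   # reported t value for the slope
Fval     <- 45.97   # reported F statistic
df_resid <- 37      # residual degrees of freedom
n        <- df_resid + 2   # 39 observations, two coefficients estimated

t_check <- est / se                  # estimate divided by standard error
F_check <- tval^2                    # with one predictor, F equals t squared
r2      <- Fval / (Fval + df_resid)  # R-squared recovered from F and its df
adj_r2  <- 1 - (1 - r2) * (n - 1) / df_resid   # adjusted R-squared
p_slope <- 2 * pt(-tval, df_resid)   # two-sided p-value for the slope

print(c(t = t_check, F = F_check, R2 = r2, adjR2 = adj_r2, p = p_slope))
```

In a simple regression the F test and the t test for the slope are the same test, which is why the two p-values in the summary coincide.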
par(pty="s")
plot(math2 ~ math1, data=dta, xlim=c(0, 60), ylim=c(0, 60),
     xlab="Math score at Year 1", ylab="Math score at Year 2")
grid()
abline(dta.lm, lty=2) # add regression line
title("Mathematics Attainment") # add plot title
diagnostics
# specify maximum plot region
par(pty="m")
plot(scale(resid(dta.lm)) ~ fitted(dta.lm),
     ylim=c(-3.5, 3.5), type="n",
     xlab="Fitted values", ylab="Standardized residuals")
text(fitted(dta.lm), scale(resid(dta.lm)), labels=rownames(dta), cex=0.5)
grid()
# add a horizontal red dashed line
abline(h=0, lty=2, col="red")
normality check
Shapiro-Wilk normality test
data: resid(dta.lm)
W = 0.9313, p-value = 0.01978
The Shapiro-Wilk test showed that the residuals depart from the assumption of normality (W = 0.9313, p = 0.01978). However, since the degrees of freedom are large enough (37), we can still retain the result of the analysis.
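A minimal sketch of how a test like this can be run on the residuals, with a normal Q-Q plot as a visual companion. The toy `dta` is an assumption: it holds only the first ten math2 and math1 values from the structure listing, so its W and p-value will differ from the full-data result reported above.

```r
# Toy sample: first ten math2/math1 values from the structure listing (not all 39 rows)
dta <- data.frame(
  math2 = c(28, 56, 51, 13, 39, 41, 30, 13, 17, 32),
  math1 = c(18, 22, 44, 8, 20, 12, 16, 5, 9, 18)
)
dta.lm <- lm(math2 ~ math1, data = dta)   # same formula as in the Call above

sw <- shapiro.test(resid(dta.lm))   # Shapiro-Wilk test on the residuals
print(sw$statistic)                 # W
print(sw$p.value)                   # p-value for this toy sample only

# Normal Q-Q plot of the residuals; points far from the line suggest non-normality
qqnorm(resid(dta.lm))
qqline(resid(dta.lm), lty = 2)
```

The Q-Q plot helps show where the departure occurs, for example in one tail, which the test statistic alone does not reveal.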