Question 1
Use the stat500 dataset from the package “faraway”, which records marks from Statistics 500 in one year at the University of Michigan, to complete the following questions.
1. Draw a scatter plot of midterm and final
library("faraway") #load the package
grade_data<-faraway::stat500 #assign the dataset to grade_data
plot(x=grade_data$midterm, y=grade_data$final,col="blue",main="The scatter of midterm and final test grade",xlab="Midterm grade",ylab="Final grade") #draw the scatter plot

2. Determine a line to represent the data by guessing the intercept and slope.
plot(x=grade_data$midterm, y=grade_data$final,col="blue",main="The scatter of midterm and final test grade",xlab="Midterm grade",ylab="Final grade")
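legends_coord <- locator(2) #click two positions on the plot and store their coordinates in legends_coord
x<-legends_coord$x #x coordinates of the two points
y<-legends_coord$y #y coordinates of the two points
segments(9.505664, 15.81806, 29.406350, 36.62465, col= 'blue') #connect the two points (coordinates recorded from the two clicks)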
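The guessed line can also be stated as an intercept and a slope, which is what the question asks for. A minimal sketch, assuming the two point coordinates used in the segments() call; line_through is a hypothetical helper written for illustration, not a function from any package:

```r
# Hypothetical helper: intercept and slope of the line through two points.
line_through <- function(x, y) {
  slope <- (y[2] - y[1]) / (x[2] - x[1])  # rise over run
  intercept <- y[1] - slope * x[1]        # solve y1 = intercept + slope * x1
  c(intercept = intercept, slope = slope)
}

# Coordinates of the two chosen points, copied from the segments() call.
guess <- line_through(x = c(9.505664, 29.406350),
                      y = c(15.81806, 36.62465))
round(guess, 3)  # roughly intercept 5.88 and slope 1.046
```

With these two values, abline(a = guess[["intercept"]], b = guess[["slope"]]) would draw the guessed line across the whole plot instead of only between the two points.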

3. Estimate the simple linear regression line.
result<-lm(final~midterm,data=grade_data) #fit the simple linear regression model
summary(result) #results of the model fit
##
## Call:
## lm(formula = final ~ midterm, data = grade_data)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -9.932 -2.657  0.527  2.984  9.286
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  15.0462     2.4822   6.062 1.44e-07 ***
## midterm       0.5633     0.1190   4.735 1.67e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.192 on 53 degrees of freedom
## Multiple R-squared: 0.2973, Adjusted R-squared: 0.284
## F-statistic: 22.42 on 1 and 53 DF, p-value: 1.675e-05
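The estimates that lm() reports for a simple linear regression agree with the textbook least-squares formulas, slope = Sxy/Sxx and intercept = mean(y) - slope*mean(x). A minimal sketch that checks this on made-up toy data; the vectors x and y below are hypothetical and are not the stat500 marks:

```r
# Toy data for illustration only (hypothetical values, not from stat500).
x <- c(10, 14, 18, 22, 26, 30)
y <- c(20, 25, 24, 30, 29, 35)

Sxx <- sum((x - mean(x))^2)                # spread of x around its mean
Sxy <- sum((x - mean(x)) * (y - mean(y)))  # co-variation of x and y
b1 <- Sxy / Sxx                            # least-squares slope
b0 <- mean(y) - b1 * mean(x)               # least-squares intercept

fit <- lm(y ~ x)
# Compare the hand-computed values with lm(), up to floating-point error.
all.equal(unname(coef(fit)), c(b0, b1))
```

The same check could be run on grade_data by replacing x with the midterm marks and y with the final marks.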
4. Superimpose your guessed and estimated lines on the plot.
dev.new() #open a new graphics device
plot(x=grade_data$midterm, y=grade_data$final,col="blue",main="The scatter of midterm and final test grade",xlab="Midterm grade",ylab="Final grade")
segments(9.505664, 15.81806, 29.406350, 36.62465, col= 'blue')
abline(result,col="red")
legend(x=c(8,18),y=c(35,38),legend=c("The line we determined","regression fitted line"),col=c("blue","red"),lty=c(1,1),cex=0.8)

5. Verify that the SSE computed from the line we chose intuitively is no smaller than the SSE computed from the OLS fit, since OLS minimizes the SSE.
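f_x<-function(x){15.81806-((36.62465-15.81806)/(29.406350-9.505664))*9.505664+((36.62465-15.81806)/(29.406350-9.505664))*x} #equation of the line through the two chosen points
estimate_final_grade<-f_x(grade_data$midterm) #treat this equation as the estimator of the final grade and plug in the midterm grades to get estimated final grades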
SSE_based_on_f_x<-sum((grade_data$final-f_x(grade_data$midterm))^2) #square each residual first, then sum
SSE_based_on_OLS<-sum((result$residuals)^2) #SSE of the OLS fit
SSE_based_on_OLS
## [1] 931.2854
SSE_based_on_f_x>=SSE_based_on_OLS #the guessed line cannot have a smaller SSE than OLS
## [1] TRUE
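Because OLS minimizes the sum of squared residuals, no other line can give a smaller SSE on the same data. A minimal sketch on made-up toy data that illustrates this, and that also shows why each residual has to be squared before summing; x, y, the helper sse and the guessed coefficients 12 and 0.8 are all hypothetical:

```r
# Toy data for illustration only (hypothetical values, not from stat500).
x <- c(10, 14, 18, 22, 26, 30)
y <- c(20, 25, 24, 30, 29, 35)

# Hypothetical helper: SSE of the line y = intercept + slope * x.
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

fit <- lm(y ~ x)
sse_ols <- sse(coef(fit)[[1]], coef(fit)[[2]])
sse_guess <- sse(12, 0.8)  # an arbitrary guessed line
sse_guess >= sse_ols       # holds for any guessed intercept and slope

# Squaring the sum is a different quantity: OLS residuals sum to zero,
# so sum(e)^2 is essentially 0 while sum(e^2) is the SSE.
e <- residuals(fit)
c(squared_sum = sum(e)^2, sum_of_squares = sum(e^2))
```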