2025-02-14
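The intercept a can be read off the graph (let's say 2). The slope b is the change in y divided by the change in x between two points on the line:
\[ b = \frac{\text{change in } y}{\text{change in } x} = \frac{9-4}{7.5-2.5} = 1 \]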
\[ y = 2 + x \]
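The same by-eye estimate can be written as a few lines of R. This is only a sketch: the two points are taken from the differences in the slope calculation, the way the coordinates pair up is an assumption, and `predict_by_eye` is a hypothetical helper name.

```r
# Two points read off the line by eye (pairing of coordinates is assumed)
x1 <- 2.5; y1 <- 4
x2 <- 7.5; y2 <- 9

# Slope: change in y divided by change in x
b <- (y2 - y1) / (x2 - x1)

# Intercept read off the graph ("let's say 2")
a <- 2

# Hypothetical helper: predicted y for a given x from the by-eye line
predict_by_eye <- function(x) a + b * x

print(b)
print(predict_by_eye(3))
```

An estimate made this way depends on which points are picked, which is why the next step searches for the slope that minimises the error instead.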
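# Try a range of candidate slopes and record the sum of squared errors (SSE) for each
b <- seq(-1.43, -1, 0.002)
sse <- numeric(length(b))
for (i in seq_along(b)) {
  # Intercept that makes the line pass through (mean tannin, mean growth)
  a <- mean(reg.data$growth) - b[i] * mean(reg.data$tannin)
  residual <- reg.data$growth - a - b[i] * reg.data$tannin
  sse[i] <- sum(residual^2)
}
# Plot SSE against slope and mark the minimum
plot(b, sse, type = "l", ylim = c(19, 24))
arrows(-1.216, 20.07225, -1.216, 19, col = "red")
abline(h = 20.07225, col = "green", lty = 2)
lines(b, sse)
# Slope with the smallest SSE
print(b[which.min(sse)])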
[1] -1.216
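The grid search can only locate the slope to the resolution of the grid (0.002). As a cross-check, the slope that minimises the SSE can also be computed directly from the sums of squares. The sketch below uses a stand-in data frame, because `reg.data` itself is not shown here; the nine observations are an assumption, chosen to have a mean tannin of 4 and a mean growth of about 6.9.

```r
# ASSUMPTION: reg.data is not shown in the notes; these nine observations
# are illustrative stand-ins, not necessarily the original data.
reg.data <- data.frame(
  tannin = 0:8,
  growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3)
)

x <- reg.data$tannin
y <- reg.data$growth

# Closed-form least-squares slope:
# sum of cross-products divided by the sum of squares of x
b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept that puts the line through the means, and the SSE at that slope
a_hat <- mean(y) - b_hat * mean(x)
sse_min <- sum((y - a_hat - b_hat * x)^2)

print(round(b_hat, 3))
print(round(sse_min, 3))
```

With these stand-in values the closed-form slope falls between two neighbouring grid points, so it agrees with the grid-search answer to within the grid step.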
\[ y = a + bx\\a = y - bx \]
The line has to go through the mean of y (6.9) and the mean of x (4).
\[ a = \bar y - b\bar x \]
We know everything on the right-hand side, so we can calculate a.
\[ a = 6.9-(-1.2 \times 4)\\ = 6.9 + 4.8\\= 11.7\]
Therefore we can write the equation of the line from the parameters we have calculated (a and b).
\[ y= 11.7 -1.2x \]
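The intercept calculation is short enough to check in R. The sketch below uses only the rounded numbers quoted above; `predict_growth` is a hypothetical helper name, not something defined elsewhere in these notes.

```r
# Rounded values quoted in the text
y_bar <- 6.9   # mean of y (growth)
x_bar <- 4     # mean of x (tannin)
b     <- -1.2  # slope, rounded from the grid search

# The line passes through (x_bar, y_bar), so a = y_bar - b * x_bar
a <- y_bar - b * x_bar

# Hypothetical helper: predicted growth for a given tannin level
predict_growth <- function(tannin) a + b * tannin

print(a)
print(predict_growth(x_bar))  # gives back the mean of y
```

Plugging the mean of x back into the fitted line returns the mean of y, which is a quick way to confirm the intercept was calculated correctly.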
Call:
lm(formula = reg.data$growth ~ reg.data$tannin)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4556 -0.8889 -0.2389  0.9778  2.8944
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      11.7556     1.0408  11.295 9.54e-06 ***
reg.data$tannin  -1.2167     0.2186  -5.565 0.000846 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.693 on 7 degrees of freedom
Multiple R-squared:  0.8157, Adjusted R-squared:  0.7893
F-statistic: 30.97 on 1 and 7 DF,  p-value: 0.0008461
\[ y = 11.76 - 1.22x \] Tannin levels affect growth (Regression: adjusted \(R^2\) = 0.79, \(F_{1,7}\) = 30.97, p = \(0.0008\))
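The numbers in a sentence like this can be pulled out of the fitted model rather than copied by hand. The sketch below again uses a stand-in data frame, since `reg.data` is not shown in these notes; the nine observations are an assumption. The components `r.squared`, `adj.r.squared` and `fstatistic` are standard parts of the object returned by `summary()` on an `lm` fit.

```r
# ASSUMPTION: illustrative stand-in for reg.data, which is not shown in the notes
reg.data <- data.frame(
  tannin = 0:8,
  growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3)
)

model <- lm(growth ~ tannin, data = reg.data)
s <- summary(model)

r2     <- s$r.squared      # multiple R-squared
adj_r2 <- s$adj.r.squared  # adjusted R-squared
f      <- s$fstatistic     # named vector: value, numdf, dendf

# p-value of the overall F test (upper tail of the F distribution)
p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

cat(sprintf("R2 = %.2f, adjusted R2 = %.2f, F(%.0f,%.0f) = %.2f, p = %.4f\n",
            r2, adj_r2, f[["numdf"]], f[["dendf"]], f[["value"]], p))
```

Printing both versions of R-squared makes it explicit which one is being reported.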
# Read the mock and real exam marks, then regress real marks on mock marks
bs1040marks <- read.csv("~/Dropbox/Teaching/first_year_stats/lectures/5.regressions/bs1040marks.csv")
bs1040_model <- lm(real ~ mock, data = bs1040marks)
summary(bs1040_model)
Call:
lm(formula = real ~ mock, data = bs1040marks)
Residuals:
    Min      1Q  Median      3Q     Max
-55.164  -7.702   0.017   7.620  39.091
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  33.8859     3.8308   8.846 4.56e-16 ***
mock          1.4186     0.3494   4.060 7.01e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.87 on 202 degrees of freedom
  (102 observations deleted due to missingness)
Multiple R-squared:  0.07545, Adjusted R-squared:  0.07087
F-statistic: 16.48 on 1 and 202 DF,  p-value: 7.012e-05
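To see what these coefficients mean in practice, the fitted line can be used to predict a real exam mark from a mock mark. The sketch below hard-codes the two estimates from the summary so that it runs without the CSV file; `predict_real` is a hypothetical helper name and the mock mark of 10 is an arbitrary example value.

```r
# Coefficient estimates copied from the summary output
intercept <- 33.8859
slope     <- 1.4186

# Hypothetical helper: predicted real mark for a given mock mark
predict_real <- function(mock) intercept + slope * mock

# Example: predicted real mark for an (arbitrary) mock mark of 10
print(predict_real(10))

# Each extra mock mark is associated with about 1.42 more real marks
print(predict_real(11) - predict_real(10))
```

With the fitted model available, `predict(bs1040_model, newdata = data.frame(mock = 10))` would give the same prediction from the unrounded coefficients. The relationship is significant, but the multiple R-squared of 0.075 means mock marks explain only about 7.5% of the variation in real marks, so individual predictions will be imprecise.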
library(ggplot2)
ggplot(bs1040marks, aes(x = mock, y = real)) +
  geom_point(color = "blue", alpha = 0.6) +               # Scatter plot points
  geom_smooth(method = "lm", color = "red", se = TRUE) +  # Regression line
  theme_minimal() +
  labs(title = "Mock scores predict BS1040 scores",
       x = "Mock scores",
       y = "Exam scores")
Regression