alumni <- read.csv("alumni.csv")
X <- alumni$percent_of_classes_under_20
Y <- alumni$alumni_giving_rate
plot(X, Y, xlab = "Percent of Classes Under 20", ylab = "Alumni Giving Rate",
     main = "Scatterplot of X vs Y")
fit <- lm(Y ~ X)
coef(fit)
## (Intercept) X
## -7.3860676 0.6577687
abline(fit, lwd = 2, col = "red2")

cor(X, Y)
## [1] 0.6456504
## X is the predictor variable and Y is the response variable. Both are quantitative and continuous.
## Based on visual inspection of the scatterplot, there do not appear to be any extreme outliers that distort the overall pattern. A few points have high giving rates, but they follow the positive linear trend and are not true outliers.
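## The visual check above can be complemented with a numeric one. The sketch below is only an illustration: flag_outliers is a hypothetical helper, and the cutoff of 3 for standardized residuals is a common rule of thumb assumed here, not something stated in the assignment.

```r
# Hypothetical helper (not part of the original analysis): return the
# observations whose standardized residual exceeds `cutoff` in absolute value.
# The default cutoff of 3 is an assumed rule of thumb.
flag_outliers <- function(fit, cutoff = 3) {
  std_res <- rstandard(fit)       # standardized residuals from the lm fit
  which(abs(std_res) > cutoff)    # indices of flagged observations (may be empty)
}

# Usage with the model fitted above:
# flag_outliers(fit)
```

## An empty result would agree with the conclusion that there are no extreme outliers.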
## The correlation is r = 0.6456504, which indicates a moderately strong positive linear relationship between X and Y.
fit <- lm(Y ~ X)
coef(fit)
## (Intercept) X
## -7.3860676 0.6577687
## The estimated regression equation is Y_hat = -7.3860676 + 0.6577687 X
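## As a sketch of how the estimated equation is used, the code below computes a predicted giving rate by hand. The coefficients are copied from the coef(fit) output; the value x_new = 50 is a hypothetical input chosen only for illustration.

```r
# Coefficients copied from the coef(fit) output above
b0 <- -7.3860676
b1 <- 0.6577687

# Hypothetical predictor value (percent of classes under 20), for illustration only
x_new <- 50

# Predicted alumni giving rate from Y_hat = b0 + b1 * X
y_hat <- b0 + b1 * x_new
y_hat
## [1] 25.50237
```

## With the fitted object available, predict(fit, newdata = data.frame(X = 50)) should give the same value.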
model <- lm(Y ~ X, data = alumni)
coef(model)
## (Intercept) X
## -7.3860676 0.6577687
summary(model)
##
## Call:
## lm(formula = Y ~ X, data = alumni)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.053 -7.158 -1.660 6.734 29.658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.3861 6.5655 -1.125 0.266
## X 0.6578 0.1147 5.734 7.23e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.38 on 46 degrees of freedom
## Multiple R-squared: 0.4169, Adjusted R-squared: 0.4042
## F-statistic: 32.88 on 1 and 46 DF, p-value: 7.228e-07
## The coefficient estimates can be read from the "Coefficients" table in the summary of "model". The intercept b0 = -7.3861 represents the predicted alumni giving rate when the percent of classes under 20 is zero. A negative giving rate is not meaningful in practice, and the intercept is not significantly different from zero (p = 0.266).
## The slope is b1 = 0.6578, which means that for every 1 percentage point increase in the percent of classes under 20, the alumni giving rate is expected to increase by about 0.658 percentage points, on average.
## For the test of H0: β1 = 0 against Ha: β1 ≠ 0, the t-statistic is 5.734 on 46 degrees of freedom and the p-value is 7.23×10⁻⁷. Since the p-value is far below 0.05, we reject H0. There is strong evidence of a statistically significant linear relationship between X and Y.
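## The slope test can be reproduced by hand from the summary table. This is a minimal sketch using the rounded estimate and standard error printed above, so the results may differ slightly in the last digit from what summary(model) reports. The 95% confidence interval is an addition for illustration and was not part of the original output.

```r
# Values copied (rounded) from the Coefficients table in summary(model)
b1    <- 0.6578   # slope estimate
se_b1 <- 0.1147   # standard error of the slope
df    <- 46       # residual degrees of freedom

# t-statistic for H0: beta1 = 0, and its two-sided p-value
t_stat  <- b1 / se_b1
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the slope (illustrative addition)
ci_95 <- b1 + c(-1, 1) * qt(0.975, df) * se_b1

t_stat
p_value
ci_95
```

## With the fitted object available, confint(model) gives the interval from the unrounded values.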
anova(model)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X 1 3539.8 3539.8 32.884 7.228e-07 ***
## Residuals 46 4951.7 107.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## The F-statistic is 32.88 on 1 and 46 degrees of freedom, with p-value 7.23×10⁻⁷, which is < 0.05. We reject the null hypothesis that the model has no explanatory power (H0: β1 = 0). The regression model is statistically significant overall.
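## The F-statistic can be rebuilt from the sums of squares in the ANOVA table. The sketch below uses the rounded table values, so small rounding differences are expected. It also checks that, in simple linear regression, F equals the square of the slope's t-statistic, which is why the two p-values match.

```r
# Values copied from the anova(model) table above
ss_reg <- 3539.8; df_reg <- 1    # regression (X) sum of squares and df
ss_res <- 4951.7; df_res <- 46   # residual sum of squares and df

# Mean squares and F-statistic
ms_reg <- ss_reg / df_reg
ms_res <- ss_res / df_res
f_stat <- ms_reg / ms_res

# Upper-tail p-value from the F distribution
p_value <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)

# Slope t-statistic copied from summary(model); its square should be close to F
t_stat <- 5.734

f_stat
p_value
t_stat^2
```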
## R^2 = 0.417, so 41.7% of the variability in alumni giving rate is explained by the percent of classes under 20.
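## The fit statistics in the summary can be recovered from the ANOVA sums of squares. This is a sketch using the rounded values printed above; n = 48 is inferred from the 46 residual degrees of freedom plus the 2 estimated coefficients.

```r
# Values copied from the anova(model) table and cor(X, Y) output above
ss_reg <- 3539.8      # regression sum of squares
ss_res <- 4951.7      # residual sum of squares
r      <- 0.6456504   # sample correlation between X and Y
n      <- 48          # inferred: 46 residual df + 2 coefficients

# R^2 as the share of total variability explained by the regression
r_squared <- ss_reg / (ss_reg + ss_res)

# Adjusted R^2 and residual standard error for a one-predictor model
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - 2)
rse <- sqrt(ss_res / (n - 2))

r_squared       # about 0.4169, matching Multiple R-squared
r^2             # in simple linear regression, r^2 equals R^2
adj_r_squared   # about 0.4042
rse             # about 10.38
```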