summary_X <- summary(alumni$'percent_of_classes_under_20')
summary_X
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 44.75 59.50 55.73 66.25 77.00
summary_Y <- summary(alumni$'alumni_giving_rate')
summary_Y
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 18.75 29.00 29.27 38.50 67.00
The boxplots show no outliers in either variable: X (percent_of_classes_under_20) or Y (alumni_giving_rate).
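The visual check can be backed up numerically. The sketch below applies the usual 1.5 * IQR fences to the quartiles printed by summary() above. The helper names `iqr_fences` and `no_outliers` are made up for illustration, and boxplot() itself works from hinges, which can differ slightly from these quartiles, so treat this as an approximation of what the plot shows.

```r
# Min, quartiles and max copied from the summary() output above
x_stats <- c(min = 29.00, q1 = 44.75, q3 = 66.25, max = 77.00)
y_stats <- c(min = 7.00,  q1 = 18.75, q3 = 38.50, max = 67.00)

# Hypothetical helper: 1.5 * IQR fences around the box
iqr_fences <- function(s) {
  iqr <- s[["q3"]] - s[["q1"]]
  c(lower = s[["q1"]] - 1.5 * iqr, upper = s[["q3"]] + 1.5 * iqr)
}

# TRUE when both extremes lie inside the fences, i.e. no point is flagged
no_outliers <- function(s) {
  f <- iqr_fences(s)
  s[["min"]] >= f[["lower"]] && s[["max"]] <= f[["upper"]]
}

iqr_fences(x_stats)   # lower 12.5,    upper 98.5
iqr_fences(y_stats)   # lower -10.875, upper 68.125
no_outliers(x_stats)  # TRUE
no_outliers(y_stats)  # TRUE
```

The largest giving rate (67) sits just below the upper fence of 68.125, which is why it is not flagged.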
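The correlation coefficient is approximately 0.646 (the square root of the R-squared value, 0.4169, taken with the positive sign of the slope); the positive coefficient indicates that as the percentage of “classes with fewer than 20 students” increases, the “Alumni Giving Rate” tends to increase.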
As the p-value is very low (7.23e-07), the coefficient is statistically significant.
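As a cross-check, the correlation and its significance test can be recovered from figures in the regression summary. In simple linear regression, r is the square root of R-squared carrying the sign of the slope, and the t statistic for r equals the t statistic for the slope. This is a minimal sketch using only the printed, rounded values, so results agree only up to rounding; with the data frame loaded, cor.test() on the two columns would report these quantities directly.

```r
r_squared <- 0.4169   # Multiple R-squared from summary(fit)
slope     <- 0.6578   # estimated slope, used here only for its sign
df        <- 46       # residual degrees of freedom (n - 2)

# Pearson correlation: square root of R-squared with the sign of the slope
r <- sign(slope) * sqrt(r_squared)

# t statistic and two-sided p-value for H0: correlation = 0
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df)

round(r, 3)       # 0.646
round(t_stat, 2)  # 5.73
p_value           # on the order of 7e-07
```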
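# Two panels per figure: boxplot on the left, histogram on the right
par(mfrow=c(1,2), mar=c(2,2,2,2), oma=c(0,0,0,0))
# Distribution of X (percentage of classes with fewer than 20 students)
boxplot(alumni$'percent_of_classes_under_20')
hist(alumni$'percent_of_classes_under_20', main = 'percent_of_classes_under_20')
# Distribution of Y (alumni giving rate)
boxplot(alumni$'alumni_giving_rate')
hist(alumni$'alumni_giving_rate', main = 'alumni_giving_rate')
# Scatterplot of Y against X
plot(alumni$`percent_of_classes_under_20`, alumni$`alumni_giving_rate`,
     xlab="percent_of_classes_under_20", ylab="alumni_giving_rate", pch = 20)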
fit <- lm(alumni_giving_rate ~ percent_of_classes_under_20, data = alumni)
#print(fit)
summary(fit)
##
## Call:
## lm(formula = alumni_giving_rate ~ percent_of_classes_under_20,
## data = alumni)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.053 -7.158 -1.660 6.734 29.658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.3861 6.5655 -1.125 0.266
## percent_of_classes_under_20 0.6578 0.1147 5.734 7.23e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.38 on 46 degrees of freedom
## Multiple R-squared: 0.4169, Adjusted R-squared: 0.4042
## F-statistic: 32.88 on 1 and 46 DF, p-value: 7.228e-07
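From the output, the estimated intercept b0 is -7.3861 and the estimated slope b1 is 0.6578. Hence the estimated regression equation is Y = -7.3861 + 0.6578 * X, where X is the percentage of classes under 20 and Y is the predicted alumni giving rate. Each additional percentage point of small classes is associated with an increase of about 0.66 percentage points in the giving rate.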
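To illustrate how the equation is used, the sketch below wraps it in a small function. The name `predict_giving_rate` is invented here, and the coefficients are the rounded values printed in the output, so results may differ in the last digits from what the fitted model returns.

```r
b0 <- -7.3861  # intercept from summary(fit)
b1 <-  0.6578  # slope from summary(fit)

# Hypothetical helper: predicted giving rate for a given percentage of small classes
predict_giving_rate <- function(pct_under_20) {
  b0 + b1 * pct_under_20
}

predict_giving_rate(50)         # 25.5039
predict_giving_rate(c(30, 60))  # 12.3479 32.0819

# With the fitted model in memory, the equivalent call (using unrounded
# coefficients) is:
# predict(fit, newdata = data.frame(percent_of_classes_under_20 = 50))
```

Predictions are most trustworthy inside the observed range of X (29 to 77); the negative intercept corresponds to X = 0, which lies far outside that range.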
In addition to the analysis above, it is worth noting the following:
The R-squared value (0.4169) suggests that approximately 41.69% of the variability in Alumni Giving Rate can be explained by the percentage of classes under 20.
The F-statistic (32.88) and its associated p-value (7.23e-07) suggest that the model as a whole is statistically significant.
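These summary figures are tied together by simple identities, which can be checked from the printed values alone. The sketch below assumes n = 48 observations (46 residual degrees of freedom plus 2 estimated coefficients) and uses the rounded output, so agreement is only up to rounding.

```r
r_squared <- 0.4169  # Multiple R-squared
t_slope   <- 5.734   # t value of the slope
n         <- 48      # observations: 46 residual df + 2 estimated coefficients
p         <- 1       # number of predictors

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# With a single predictor, the F statistic is the square of the slope's t value
f_stat  <- t_slope^2
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

round(adj_r_squared, 4)  # 0.4042
round(f_stat, 2)         # 32.88
p_value                  # on the order of 7e-07
```

Because there is only one predictor, the overall F-test and the t-test on the slope are the same test, which is why both report the same p-value.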