tuition = read.csv("tuition_final.csv")
Creating an acceptance-rate variable (as a percentage) and extracting the
UNC-CH row from the dataset.
tuition$Acc.Rate = (tuition$Accepted/tuition$Applied)*100
tuition[tuition$Name == "University of North Carolina at Chapel Hill",]
## ID Name State Public Avg.SAT
## 682 2974 University of North Carolina at Chapel Hill NC 1 1121
## Avg.ACT Applied Accepted Size Out.Tuition Spending Acc.Rate
## 682 NA 14596 5985 14609 8400 15893 41.00438
Plotting tuition price against SAT score, the two variables used in the
simple linear regression of tuition on SAT score.
plot(tuition$Avg.SAT, tuition$Out.Tuition,
     main = "College Tuition Based on SAT Score",
     xlab = "SAT Score", ylab = "Tuition Price",
     pch = 20, cex = 1, col = "blue")

Now that we have a y-intercept (b0) and a slope (b1), we can add our
regression line to the graph from before by using the abline()
function in R. This is the least-squares line: it minimizes the sum of
squared residuals over the data points.
plot(tuition$Avg.SAT, tuition$Out.Tuition,
     main = "College Tuition Based on SAT Score",
     xlab = "SAT Score", ylab = "Tuition Price",
     pch = 20, cex = 1, col = "blue")
# abline(a, b) draws the line y = a + b*x, so the intercept comes first
abline(a = b0, b = b1)
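
The values of b0 and b1 passed to abline() are computed outside this section. As a minimal sketch, assuming they come from the standard least-squares formulas, the calculation might look like the following. The vectors `sat` and `price` are made-up toy values, not rows from tuition_final.csv.

```r
# Toy data (hypothetical values, for illustration only)
sat   <- c(900, 1000, 1100, 1200, 1300)
price <- c(6000, 8000, 9500, 12000, 14500)

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations
b1 <- sum((sat - mean(sat)) * (price - mean(price))) / sum((sat - mean(sat))^2)

# Least-squares intercept: the fitted line passes through (mean(x), mean(y))
b0 <- mean(price) - b1 * mean(sat)

# Cross-check against R's built-in fit; coef() returns intercept, then slope
fit <- lm(price ~ sat)
matches_lm <- isTRUE(all.equal(unname(coef(fit)), c(b0, b1)))
```

Checking the hand-computed coefficients against lm() is a cheap way to confirm the formulas were typed correctly before drawing the line.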

Using the function just created, we can see whether UNC provides a
good education for the price by comparing its actual tuition with the
tuition the model predicts for its average SAT score:
CH = tuition[tuition$Name == "University of North Carolina at Chapel Hill",]
CHTuition = CH$Out.Tuition
CHTuitionPredict = predict_yval(EntireTuitionNoNA$Avg.SAT, EntireTuitionNoNA$Out.Tuition, CH$Avg.SAT)
#Is the actual UNC tuition cheaper than the predicted?
CHTuitionPredict
## [1] 13019.43
CHTuition
## [1] 8400
CHTuition < CHTuitionPredict
## [1] TRUE
#UNC's actual tuition (8400) is well below the predicted 13019.43, so by
#this measure UNC provides a great education at a great price.
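
The definition of predict_yval() does not appear in this section. The sketch below shows one way a function with the same argument order (predictor values, response values, new predictor value) could be written, assuming it fits a least-squares line and evaluates it at the new point. The name `predict_yval_sketch` and the toy data are hypothetical and may differ from the original function.

```r
# Hypothetical stand-in for predict_yval(); not the original definition
predict_yval_sketch <- function(x, y, x_new) {
  # Fit the least-squares line y = b0 + b1 * x
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0 <- mean(y) - b1 * mean(x)
  # Evaluate the fitted line at the new x value
  b0 + b1 * x_new
}

# Toy data (made-up values, not from tuition_final.csv)
sat   <- c(900, 1000, 1100, 1200, 1300)
price <- c(6000, 8000, 9500, 12000, 14500)

predicted <- predict_yval_sketch(sat, price, 1121)
actual    <- 8400

# TRUE when the actual price is below what the line predicts
is_cheaper <- actual < predicted
```

A school whose actual tuition falls below the fitted line has a negative residual, which is the comparison `CHTuition < CHTuitionPredict` makes for UNC.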
We’ve seen how to create a linear regression manually. R also has
the lm() function, which speeds up this process and lets us include
other variables in the model. Here, we have a multiple linear
regression model where we convert the Public variable into a categorical
variable using the factor() function.
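
As a rough illustration of the lm() and factor() pattern, the sketch below fits a two-predictor model on a made-up data frame. The data frame `toy`, its values, and the choice of Avg.SAT as the second predictor are assumptions for demonstration; the column names mirror those in the tuition dataset, and Public is assumed to be coded 0/1.

```r
# Toy data frame (hypothetical values; column names follow the tuition data)
toy <- data.frame(
  Out.Tuition = c(6000, 8000, 9500, 12000, 14500, 7000),
  Avg.SAT     = c(900, 1000, 1100, 1200, 1300, 1050),
  Public      = c(1, 1, 0, 0, 0, 1)
)

# factor(Public) makes lm() treat 0/1 as categories rather than a number,
# so the model estimates a separate offset for Public == 1
fit <- lm(Out.Tuition ~ Avg.SAT + factor(Public), data = toy)

# Named coefficients: intercept, SAT slope, and the offset for Public == 1
coefs <- coef(fit)
```

With a two-level variable, wrapping it in factor() gives the same fit as leaving it numeric, but it labels the coefficient by level and generalizes cleanly to variables with more than two categories.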