The alumni data set contains information on 48 colleges, including each college's alumni giving rate (the alumni donation ratio). We fit a linear regression model to these data to predict the alumni giving rate from the other variables. The data set has 48 rows, one per college, and 5 variables: school name, percent of classes under 20, student-faculty ratio, alumni giving rate, and private.
We plot the response variable (alumni giving rate) against each predictor variable (percent of classes under 20 and student-faculty ratio), coloring the points by the qualitative variable (private). These plots help show the relationship between the response and the predictors. The R code for loading and plotting the data set is shown below:
options(warn=-1)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.5 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.0.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
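# Load the alumni data and give the first column a short name
url <- "https://bgreenwell.github.io/uc-bana7052/data/alumni.csv"
alumni <- read.csv(url)
colnames(alumni)[1] <- "school"
# Treat private as a categorical (factor) variable
alumni$private <- as.factor(alumni$private)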
head(alumni)
##                               school percent_of_classes_under_20
## 1                     Boston College                          39
## 2                Brandeis University                          68
## 3                   Brown University                          60
## 4 California Institute of Technology                          65
## 5         Carnegie Mellon University                          67
## 6         Case Western Reserve Univ.                          52
##   student_faculty_ratio alumni_giving_rate private
## 1                    13                 25       1
## 2                     8                 33       1
## 3                     8                 40       1
## 4                     3                 46       1
## 5                    10                 28       1
## 6                     8                 31       1
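As a quick sanity check on the structure described above, the sketch below verifies the column names, the factor coding of private, and the absence of missing values or duplicate schools. To stay runnable without a network connection it uses only the six rows printed by head(alumni); the names first_six and check_alumni() are hypothetical and are not part of the original analysis.

```r
# The six rows printed by head(alumni) above, typed in by hand so the
# snippet runs offline. The full data set has 48 rows.
first_six <- data.frame(
  school = c("Boston College", "Brandeis University", "Brown University",
             "California Institute of Technology",
             "Carnegie Mellon University", "Case Western Reserve Univ."),
  percent_of_classes_under_20 = c(39, 68, 60, 65, 67, 52),
  student_faculty_ratio = c(13, 8, 8, 3, 10, 8),
  alumni_giving_rate = c(25, 33, 40, 46, 28, 31),
  private = as.factor(c(1, 1, 1, 1, 1, 1))
)

# check_alumni() is a hypothetical helper: it stops with an error if the
# data frame does not have the layout described in the text.
check_alumni <- function(df) {
  expected <- c("school", "percent_of_classes_under_20",
                "student_faculty_ratio", "alumni_giving_rate", "private")
  stopifnot(
    identical(names(df), expected),   # the 5 variables, in order
    is.factor(df$private),            # private is categorical
    !any(is.na(df)),                  # no missing values
    !any(duplicated(df$school))       # one row per unique college
  )
  invisible(df)
}

check_alumni(first_six)
str(first_six)
```

On the full data, check_alumni(alumni) would run the same checks, and nrow(alumni) would be expected to return 48.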
# Scatter plots of giving rate against each predictor, colored by private status
p1 <- ggplot(alumni, aes(x = student_faculty_ratio, y = alumni_giving_rate, color = private)) + geom_point(size = 2, shape = 17, alpha = 1)
p2 <- ggplot(alumni, aes(x = percent_of_classes_under_20, y = alumni_giving_rate, color = private)) + geom_point(size = 2, shape = 15, alpha = 1)
# Arrange the two plots side by side
gridExtra::grid.arrange(p1, p2, ncol = 2)
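The scatter plots lead into the linear regression mentioned at the start. The following is a minimal sketch of such a fit with lm(), assuming a simple model of giving rate on student-faculty ratio. It uses only the six rows shown in the head(alumni) output so that it runs offline, so its coefficients are illustrative and are not the results of the full 48-college analysis. The name first_six is hypothetical, and the commented-out full model is an assumed form, not one confirmed by the text.

```r
# Numeric columns of the six rows printed by head(alumni) above.
first_six <- data.frame(
  percent_of_classes_under_20 = c(39, 68, 60, 65, 67, 52),
  student_faculty_ratio = c(13, 8, 8, 3, 10, 8),
  alumni_giving_rate = c(25, 33, 40, 46, 28, 31)
)

# Simple linear regression of giving rate on student-faculty ratio.
fit <- lm(alumni_giving_rate ~ student_faculty_ratio, data = first_six)
coef(fit)  # intercept and slope estimated from these six rows only

# Assumed form of a model using all predictors on the complete data
# (not run here, since it needs the downloaded alumni data frame):
# lm(alumni_giving_rate ~ percent_of_classes_under_20 +
#      student_faculty_ratio + private, data = alumni)
```

Calling summary(fit) would additionally report standard errors and R-squared, which are more meaningful once the model is fit on all 48 rows.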