A. Which of the predictors are quantitative, and which are qualitative?
auto2<-read.csv("Auto (2).csv",
header=TRUE,
na.strings = "?")
auto3<-na.omit(auto2)
str(auto3)
## 'data.frame': 392 obs. of 9 variables:
## $ mpg : num 18 15 18 16 17 15 14 14 14 15 ...
## $ cylinders : int 8 8 8 8 8 8 8 8 8 8 ...
## $ displacement: num 307 350 318 304 302 429 454 440 455 390 ...
## $ horsepower : int 130 165 150 150 140 198 220 215 225 190 ...
## $ weight : int 3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
## $ acceleration: num 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
## $ year : int 70 70 70 70 70 70 70 70 70 70 ...
## $ origin : int 1 1 1 1 1 1 1 1 1 1 ...
## $ name : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
## - attr(*, "na.action")= 'omit' Named int 33 127 331 337 355
## ..- attr(*, "names")= chr "33" "127" "331" "337" ...
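Quantitative predictors: mpg, cylinders, displacement, horsepower, weight, acceleration, year. Qualitative predictors: origin, name. The split follows what each variable measures rather than how R stores it: horsepower, weight, and year are stored as integers but are measurements, whereas origin is an integer code for a category. (cylinders takes only a few distinct values and could also be treated as qualitative.) The calculations in Parts B through D below are shown for mpg, displacement, and acceleration; the same calls apply to the remaining quantitative predictors.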
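Storage types can be checked programmatically, as sketched below. This is a minimal illustration on a hypothetical stand-in data frame, auto_demo, built from the first five values that str() prints above; the name values are placeholders, since the output shows only level codes. Note that is.numeric() reports how a column is stored, not whether it is conceptually quantitative, so the sketch first converts the coded origin column to a factor.

```r
# Stand-in data: first five values of selected columns from the str() output.
auto_demo <- data.frame(
  mpg        = c(18, 15, 18, 16, 17),
  cylinders  = c(8L, 8L, 8L, 8L, 8L),
  horsepower = c(130L, 165L, 150L, 150L, 140L),
  origin     = c(1L, 1L, 1L, 1L, 1L),
  name       = factor(c("car_a", "car_b", "car_c", "car_d", "car_e"))  # placeholder names
)

# origin is a numeric code for a category, so store it as a factor.
auto_demo$origin <- factor(auto_demo$origin)

# Split column names by storage type.
is_quant   <- sapply(auto_demo, is.numeric)
quant_vars <- names(auto_demo)[is_quant]
qual_vars  <- names(auto_demo)[!is_quant]

quant_vars
qual_vars
```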
B. What is the range of each quantitative predictor?
range(auto3$mpg)
## [1] 9.0 46.6
range(auto3$displacement)
## [1] 68 455
range(auto3$acceleration)
## [1] 8.0 24.8
C. What is the mean and standard deviation of each quantitative predictor?
mean(auto3$mpg)
## [1] 23.44592
sd(auto3$mpg)
## [1] 7.805007
mean(auto3$displacement)
## [1] 194.412
sd(auto3$displacement)
## [1] 104.644
mean(auto3$acceleration)
## [1] 15.54133
sd(auto3$acceleration)
## [1] 2.758864
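The per-column calls above can be condensed with sapply(). A minimal sketch, assuming a stand-in data frame auto_demo that holds only the first ten observations shown in the str() output, so its numbers will not match the full-data results above:

```r
# Stand-in data: first ten observations of three columns from the str() output.
auto_demo <- data.frame(
  mpg          = c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15),
  displacement = c(307, 350, 318, 304, 302, 429, 454, 440, 455, 390),
  acceleration = c(12, 11.5, 11, 12, 10.5, 10, 9, 8.5, 10, 8.5)
)

# Apply the same four summaries to every column; the result is a matrix
# with one row per statistic and one column per variable.
summary_stats <- sapply(auto_demo, function(x) {
  c(min = min(x), max = max(x), mean = mean(x), sd = sd(x))
})

round(summary_stats, 2)
```

With the real data, the same call could be applied to auto3 restricted to its quantitative columns.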
D. Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?
auto4<-auto3[-c(10:85),]
range(auto4$mpg)
## [1] 11.0 46.6
mean(auto4$mpg)
## [1] 24.40443
sd(auto4$mpg)
## [1] 7.867283
range(auto4$displacement)
## [1] 68 455
mean(auto4$displacement)
## [1] 187.2405
sd(auto4$displacement)
## [1] 99.67837
range(auto4$acceleration)
## [1] 8.5 24.8
mean(auto4$acceleration)
## [1] 15.7269
sd(auto4$acceleration)
## [1] 2.693721
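Negative indices drop rows instead of selecting them, and they are positional: in auto3 they refer to the 10th through 85th rows as they stand after na.omit(), not to the original row names. Removing those 76 rows from the 392 observations leaves 316. The sketch below shows the same indexing pattern on a hypothetical ten-row data frame, toy:

```r
# Hypothetical ten-row data frame used only to illustrate the indexing.
toy <- data.frame(id = 1:10, value = (1:10)^2)

# Drop rows 3 through 5; the trailing comma keeps all columns.
toy_subset <- toy[-c(3:5), ]

nrow(toy_subset)
toy_subset$id
```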
E. Using the full data set, investigate the predictors graphically, using scatterplots and other tools of your choice. Create some plots (at least 3) highlighting the relationships among the predictors. Comment on your findings.
plot(auto3$mpg, auto3$displacement)
There seems to be evidence for a strong, negative association between mpg and displacement that could potentially be linear.
plot(auto3$acceleration, auto3$name)
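Because name is a factor with 304 levels, R plots its integer level codes (in alphabetical order) on the vertical axis, so this scatterplot cannot show a meaningful trend. The points look random, with no visible association between acceleration and name.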
plot(auto3$weight, auto3$horsepower)
There seems to be evidence for a strong, positive, linear association between weight and horsepower.
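The strength of a linear association seen in a scatterplot can be quantified with the Pearson correlation, cor(). A minimal sketch using only the first ten weight and horsepower values printed by str() above, so the result describes those ten cars and not the full data set:

```r
# First ten values of each variable, copied from the str() output.
weight     <- c(3504, 3693, 3436, 3433, 3449, 4341, 4354, 4312, 4425, 3850)
horsepower <- c(130, 165, 150, 150, 140, 198, 220, 215, 225, 190)

# Pearson correlation: values near +1 indicate a strong positive
# linear association.
r <- cor(weight, horsepower)
r
```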
F. Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg?
plot(auto3$displacement, auto3$mpg)
Since the scatterplot (with x = displacement and y = mpg) shows a relatively strong, negative association, we could fit a linear regression model to these two variables to predict mpg for a given displacement. We would then examine the residuals (observed minus predicted mpg) to check that a linear model is appropriate.
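The regression and residual check described above might look like the following. This is a sketch under stated assumptions, not the fitted model for the full data: the data frame demo holds only the first ten mpg and displacement values from the str() output, so the coefficients are illustrative.

```r
# Stand-in data: first ten observations from the str() output.
demo <- data.frame(
  mpg          = c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15),
  displacement = c(307, 350, 318, 304, 302, 429, 454, 440, 455, 390)
)

# Simple linear regression of mpg on displacement.
fit <- lm(mpg ~ displacement, data = demo)
coef(fit)

# Residuals vs. fitted values: a curved pattern here would suggest
# that a straight line is not an adequate model.
plot(fitted(fit), resid(fit),
     xlab = "Fitted mpg", ylab = "Residual")
abline(h = 0, lty = 2)
```

On the real data, replacing demo with auto3 fits the same model to all 392 observations.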
A. Construct a matrix, where rows represent each movie. Name this matrix starWars and output it.
# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.9)
return_jedi <- c(309.306, 165.8)
starWars <- matrix(data=c(new_hope, empire_strikes, return_jedi), nrow=3, ncol=2, byrow=TRUE)
starWars
## [,1] [,2]
## [1,] 460.998 314.4
## [2,] 290.475 247.9
## [3,] 309.306 165.8
B. Rename the rows and columns of the matrix you created in Part A with the vector ‘region’ for columns and the vector ‘titles’ for rows.
# Vectors regions and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
dimnames(starWars) <- list(titles, region)
starWars
## US non-US
## A New Hope 460.998 314.4
## The Empire Strikes Back 290.475 247.9
## Return of the Jedi 309.306 165.8
C. Calculate the worldwide box office figures for each movie using the rowSums() function. Name and output this vector.
worldwide<-rowSums(starWars)
worldwide
## A New Hope The Empire Strikes Back Return of the Jedi
## 775.398 538.375 475.106
D. Now we want to add a column to our matrix for worldwide sales. You can do this by using the cbind() function. This function binds columns together.
world_sales<-cbind(starWars, worldwide)
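cbind() binds its arguments column-wise, so passing the matrix together with the worldwide vector appends the totals as a third column, whereas passing two character strings would only produce a 1 x 2 character matrix. A self-contained sketch that rebuilds starWars from the figures above; the name world_sales_demo is introduced here for illustration only:

```r
# Rebuild the original-trilogy matrix (figures in millions, as above).
starWars <- matrix(
  c(460.998, 314.4,
    290.475, 247.9,
    309.306, 165.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
    c("US", "non-US")
  )
)

# Row totals, then append them as a new column named after the vector.
worldwide <- rowSums(starWars)
world_sales_demo <- cbind(starWars, worldwide)
world_sales_demo
```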
E. Create another matrix for the prequels and name it starWars2. Don’t forget to name the rows and the columns (similar to above)
# Prequels
phantom_menace <- c(474.5, 552.5)
attack_clones <- c(310.7, 338.7)
revenge_sith <- c(380.3, 468.5)
starWars2 <- matrix(data=c(phantom_menace, attack_clones, revenge_sith), nrow=3, ncol=2, byrow=TRUE)
starWars2
## [,1] [,2]
## [1,] 474.5 552.5
## [2,] 310.7 338.7
## [3,] 380.3 468.5
titles2 <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
dimnames(starWars2) <- list(titles2, region)
starWars2
## US non-US
## The Phantom Menace 474.5 552.5
## Attack of the Clones 310.7 338.7
## Revenge of the Sith 380.3 468.5
F. Make one big matrix that combines all the movies (from starWars and starWars2) using rbind(). This binds rows or in this case can be used to combine two matrices. Name this new matrix allStarWars.
allStarWars<-rbind(starWars, starWars2)
allStarWars
## US non-US
## A New Hope 460.998 314.4
## The Empire Strikes Back 290.475 247.9
## Return of the Jedi 309.306 165.8
## The Phantom Menace 474.500 552.5
## Attack of the Clones 310.700 338.7
## Revenge of the Sith 380.300 468.5
G. Find the total non-US revenue for all the movies using the colSums() function.
colSums(allStarWars)
## US non-US
## 2226.279 2087.800
The total non-US revenue for all the movies is $2,087.8 million.
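Since colSums() returns a named vector, the non-US total can be pulled out by name instead of being read off the printed output. A minimal sketch that rebuilds the combined matrix from the figures above (row names are omitted for brevity):

```r
# All six films, figures in millions; columns are US and non-US revenue.
allStarWars <- matrix(
  c(460.998, 314.4,
    290.475, 247.9,
    309.306, 165.8,
    474.5,   552.5,
    310.7,   338.7,
    380.3,   468.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("US", "non-US"))
)

# Extract the single column total by name.
non_us_total <- colSums(allStarWars)[["non-US"]]
non_us_total

# Equivalent: sum just the non-US column.
sum(allStarWars[, "non-US"])
```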
A. Use the read.csv() function to read the data into R.
college<-read.csv("College.csv", header=TRUE, na.strings="?")
B. Use the View() function to look at the data.
rownames(college) <- college[,1]
college <- college[,-1]
head(college)
## Private Apps Accept Enroll Top10perc
## Abilene Christian University Yes 1660 1232 721 23
## Adelphi University Yes 2186 1924 512 16
## Adrian College Yes 1428 1097 336 22
## Agnes Scott College Yes 417 349 137 60
## Alaska Pacific University Yes 193 146 55 16
## Albertson College Yes 587 479 158 38
## Top25perc F.Undergrad P.Undergrad Outstate
## Abilene Christian University 52 2885 537 7440
## Adelphi University 29 2683 1227 12280
## Adrian College 50 1036 99 11250
## Agnes Scott College 89 510 63 12960
## Alaska Pacific University 44 249 869 7560
## Albertson College 62 678 41 13500
## Room.Board Books Personal PhD Terminal
## Abilene Christian University 3300 450 2200 70 78
## Adelphi University 6450 750 1500 29 30
## Adrian College 3750 400 1165 53 66
## Agnes Scott College 5450 450 875 92 97
## Alaska Pacific University 4120 800 1500 76 72
## Albertson College 3335 500 675 67 73
## S.F.Ratio perc.alumni Expend Grad.Rate
## Abilene Christian University 18.1 12 7041 60
## Adelphi University 12.2 16 10527 56
## Adrian College 12.9 30 8735 54
## Agnes Scott College 7.7 37 19016 59
## Alaska Pacific University 11.9 2 10922 15
## Albertson College 9.4 11 9727 55
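The two-step row-name assignment above can also be done at read time with the row.names argument of read.csv(). A minimal sketch, assuming an inline three-row excerpt in place of College.csv; the header label Name for the first column is hypothetical, since the file's actual header is not shown here:

```r
# Inline excerpt standing in for College.csv (values from the head() output).
csv_text <- "Name,Private,Apps,Accept
Abilene Christian University,Yes,1660,1232
Adelphi University,Yes,2186,1924
Adrian College,Yes,1428,1097"

# row.names = 1 uses the first column as row names and drops it
# from the data columns in one step.
college_demo <- read.csv(text = csv_text, header = TRUE, row.names = 1)
college_demo
```

In an interactive session, View(college) opens the data in a spreadsheet-style viewer; head() is the usual substitute in a knitted report.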
C. Perform the following tasks and provide the code: a. Use the summary() function to produce a numerical summary of the variables in the data set.
summary(college)
## Private Apps Accept Enroll Top10perc
## No :212 Min. : 81 Min. : 72 Min. : 35 Min. : 1.00
## Yes:565 1st Qu.: 776 1st Qu.: 604 1st Qu.: 242 1st Qu.:15.00
## Median : 1558 Median : 1110 Median : 434 Median :23.00
## Mean : 3002 Mean : 2019 Mean : 780 Mean :27.56
## 3rd Qu.: 3624 3rd Qu.: 2424 3rd Qu.: 902 3rd Qu.:35.00
## Max. :48094 Max. :26330 Max. :6392 Max. :96.00
## Top25perc F.Undergrad P.Undergrad Outstate
## Min. : 9.0 Min. : 139 Min. : 1.0 Min. : 2340
## 1st Qu.: 41.0 1st Qu.: 992 1st Qu.: 95.0 1st Qu.: 7320
## Median : 54.0 Median : 1707 Median : 353.0 Median : 9990
## Mean : 55.8 Mean : 3700 Mean : 855.3 Mean :10441
## 3rd Qu.: 69.0 3rd Qu.: 4005 3rd Qu.: 967.0 3rd Qu.:12925
## Max. :100.0 Max. :31643 Max. :21836.0 Max. :21700
## Room.Board Books Personal PhD
## Min. :1780 Min. : 96.0 Min. : 250 Min. : 8.00
## 1st Qu.:3597 1st Qu.: 470.0 1st Qu.: 850 1st Qu.: 62.00
## Median :4200 Median : 500.0 Median :1200 Median : 75.00
## Mean :4358 Mean : 549.4 Mean :1341 Mean : 72.66
## 3rd Qu.:5050 3rd Qu.: 600.0 3rd Qu.:1700 3rd Qu.: 85.00
## Max. :8124 Max. :2340.0 Max. :6800 Max. :103.00
## Terminal S.F.Ratio perc.alumni Expend
## Min. : 24.0 Min. : 2.50 Min. : 0.00 Min. : 3186
## 1st Qu.: 71.0 1st Qu.:11.50 1st Qu.:13.00 1st Qu.: 6751
## Median : 82.0 Median :13.60 Median :21.00 Median : 8377
## Mean : 79.7 Mean :14.09 Mean :22.74 Mean : 9660
## 3rd Qu.: 92.0 3rd Qu.:16.50 3rd Qu.:31.00 3rd Qu.:10830
## Max. :100.0 Max. :39.80 Max. :64.00 Max. :56233
## Grad.Rate
## Min. : 10.00
## 1st Qu.: 53.00
## Median : 65.00
## Mean : 65.46
## 3rd Qu.: 78.00
## Max. :118.00
pairs(college[,1:10])
plot(college$Private, college$Outstate)
Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)
summary(college)
## Private Apps Accept Enroll Top10perc
## No :212 Min. : 81 Min. : 72 Min. : 35 Min. : 1.00
## Yes:565 1st Qu.: 776 1st Qu.: 604 1st Qu.: 242 1st Qu.:15.00
## Median : 1558 Median : 1110 Median : 434 Median :23.00
## Mean : 3002 Mean : 2019 Mean : 780 Mean :27.56
## 3rd Qu.: 3624 3rd Qu.: 2424 3rd Qu.: 902 3rd Qu.:35.00
## Max. :48094 Max. :26330 Max. :6392 Max. :96.00
## Top25perc F.Undergrad P.Undergrad Outstate
## Min. : 9.0 Min. : 139 Min. : 1.0 Min. : 2340
## 1st Qu.: 41.0 1st Qu.: 992 1st Qu.: 95.0 1st Qu.: 7320
## Median : 54.0 Median : 1707 Median : 353.0 Median : 9990
## Mean : 55.8 Mean : 3700 Mean : 855.3 Mean :10441
## 3rd Qu.: 69.0 3rd Qu.: 4005 3rd Qu.: 967.0 3rd Qu.:12925
## Max. :100.0 Max. :31643 Max. :21836.0 Max. :21700
## Room.Board Books Personal PhD
## Min. :1780 Min. : 96.0 Min. : 250 Min. : 8.00
## 1st Qu.:3597 1st Qu.: 470.0 1st Qu.: 850 1st Qu.: 62.00
## Median :4200 Median : 500.0 Median :1200 Median : 75.00
## Mean :4358 Mean : 549.4 Mean :1341 Mean : 72.66
## 3rd Qu.:5050 3rd Qu.: 600.0 3rd Qu.:1700 3rd Qu.: 85.00
## Max. :8124 Max. :2340.0 Max. :6800 Max. :103.00
## Terminal S.F.Ratio perc.alumni Expend
## Min. : 24.0 Min. : 2.50 Min. : 0.00 Min. : 3186
## 1st Qu.: 71.0 1st Qu.:11.50 1st Qu.:13.00 1st Qu.: 6751
## Median : 82.0 Median :13.60 Median :21.00 Median : 8377
## Mean : 79.7 Mean :14.09 Mean :22.74 Mean : 9660
## 3rd Qu.: 92.0 3rd Qu.:16.50 3rd Qu.:31.00 3rd Qu.:10830
## Max. :100.0 Max. :39.80 Max. :64.00 Max. :56233
## Grad.Rate Elite
## Min. : 10.00 No :699
## 1st Qu.: 53.00 Yes: 78
## Median : 65.00
## Mean : 65.46
## 3rd Qu.: 78.00
## Max. :118.00
There are 78 elite universities (those with Top10perc > 50).
plot(college$Elite, college$Outstate)
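The Elite indicator built above with rep() and subset assignment can also be written in one step with ifelse(), and table() gives the group counts directly. A minimal sketch using only the Top10perc values of the six colleges shown by head(); the names top10perc and elite_demo are introduced here for illustration:

```r
# Top10perc for the first six colleges, copied from the head() output.
top10perc <- c(23, 16, 22, 60, 16, 38)

# Label each college, fixing the level order so "No" comes first.
elite_demo <- factor(ifelse(top10perc > 50, "Yes", "No"),
                     levels = c("No", "Yes"))

# Counts per group.
table(elite_demo)
```

Applied to college$Top10perc, the same expression should reproduce the No/Yes counts reported by summary().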