1. Auto Data: 1A) Which predictors are qualitative, and which are quantitative?
auto<-read.csv("Auto.csv",
header=TRUE,
na.strings = "?")
str(auto)
## 'data.frame': 397 obs. of 9 variables:
## $ mpg : num 18 15 18 16 17 15 14 14 14 15 ...
## $ cylinders : int 8 8 8 8 8 8 8 8 8 8 ...
## $ displacement: num 307 350 318 304 302 429 454 440 455 390 ...
## $ horsepower : int 130 165 150 150 140 198 220 215 225 190 ...
## $ weight : int 3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
## $ acceleration: num 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
## $ year : int 70 70 70 70 70 70 70 70 70 70 ...
## $ origin : int 1 1 1 1 1 1 1 1 1 1 ...
## $ name : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
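#Quantitative predictors: mpg, cylinders, displacement, horsepower, weight, acceleration, year, origin
#Qualitative predictors: name
#Note: origin is stored as an integer, but it is a code for region of manufacture (1 = American, 2 = European, 3 = Japanese), so it could reasonably be treated as qualitative instead.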
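One way to double-check this split is to ask R which columns are stored as numbers. The sketch below uses a small hypothetical data frame, `toy_auto`, in place of `auto`. Numeric storage does not by itself make a column quantitative (`origin` is a numeric code), so the result is only a starting point for the classification.

```r
# Hypothetical miniature stand-in for the Auto data (values are illustrative only)
toy_auto <- data.frame(
  mpg    = c(18, 15, 36),
  origin = c(1L, 1L, 3L),
  name   = c("car a", "car b", "car c"),
  stringsAsFactors = TRUE
)

# TRUE for columns stored as numbers (integer or double), FALSE otherwise
is_num <- sapply(toy_auto, is.numeric)
names(toy_auto)[is_num]   # numerically stored columns
names(toy_auto)[!is_num]  # everything else, e.g. factors
```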
1B) What is the range of each quantitative predictor?
range(auto$mpg)
## [1] 9.0 46.6
range(auto$cylinders)
## [1] 3 8
range(auto$displacement)
## [1] 68 455
range(auto$horsepower) #NA appears because horsepower contains missing values (the "?" entries that na.strings = "?" converts to NA); range(), mean() and sd() return NA for this column here and below unless called with na.rm = TRUE.
## [1] NA NA
range(auto$weight)
## [1] 1613 5140
range(auto$acceleration)
## [1] 8.0 24.8
range(auto$year)
## [1] 70 82
range(auto$origin)
## [1] 1 3
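The `NA NA` result for horsepower comes from R's default of propagating missing values through summary functions. A minimal sketch on a hypothetical vector `hp` shows the `na.rm = TRUE` fix and how to count the missing entries; with the Auto data loaded, `range(auto$horsepower, na.rm = TRUE)` applies the same fix.

```r
# Hypothetical horsepower-like vector with one missing value
hp <- c(130L, 165L, NA, 150L)

range(hp)                # NA NA: the missing value propagates
range(hp, na.rm = TRUE)  # 130 165: NA is dropped before computing
sum(is.na(hp))           # number of missing entries
```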
1C) What is the mean and standard deviation of each quantitative predictor?
mean(auto$mpg)
## [1] 23.51587
sd(auto$mpg)
## [1] 7.825804
mean(auto$cylinders)
## [1] 5.458438
sd(auto$cylinders)
## [1] 1.701577
mean(auto$displacement)
## [1] 193.5327
sd(auto$displacement)
## [1] 104.3796
mean(auto$horsepower)
## [1] NA
sd(auto$horsepower)
## [1] NA
mean(auto$weight)
## [1] 2970.262
sd(auto$weight)
## [1] 847.9041
mean(auto$acceleration)
## [1] 15.55567
sd(auto$acceleration)
## [1] 2.749995
mean(auto$year)
## [1] 75.99496
sd(auto$year)
## [1] 3.690005
mean(auto$origin)
## [1] 1.574307
sd(auto$origin)
## [1] 0.8025495
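Calling `mean()` and `sd()` once per column works, but `sapply()` can compute a statistic for every column in one call and pass `na.rm = TRUE` through to it. The sketch below uses a hypothetical two-column data frame, `toy`; with the real data, the same pattern could be applied to `auto[, 1:8]`.

```r
# Hypothetical numeric columns, one of which contains a missing value
toy <- data.frame(
  mpg        = c(18, 15, 36, 24),
  horsepower = c(130, 165, NA, 95)
)

# One row per statistic, one column per variable; na.rm = TRUE is
# forwarded by sapply() to mean() and sd()
stats <- rbind(
  mean = sapply(toy, mean, na.rm = TRUE),
  sd   = sapply(toy, sd,   na.rm = TRUE)
)
round(stats, 3)
```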
1D) Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?
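#Note: -c(10:85,1) drops observation 1 as well as observations 10 through 85, so each subset below has 320 values; -(10:85) would drop only the requested observations (leaving 321).
#mpg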
Automatmpg<-matrix(auto$mpg, 397, 1)
Automatmpg2<-Automatmpg[-c(10:85,1),drop = FALSE]
range(Automatmpg2)
## [1] 11.0 46.6
mean(Automatmpg2)
## [1] 24.45875
sd(Automatmpg2)
## [1] 7.912336
#cylinders
Automatcyl<-matrix(auto$cylinders,397,1)
Automatcyl2<-Automatcyl[-c(10:85,1),drop = FALSE]
range(Automatcyl2)
## [1] 3 8
mean(Automatcyl2)
## [1] 5.3625
sd(Automatcyl2)
## [1] 1.649499
#displacement
Automatdis<-matrix(auto$displacement,397,1)
Automatdis2<-Automatdis[-c(10:85,1),drop = FALSE]
range(Automatdis2)
## [1] 68 455
mean(Automatdis2)
## [1] 186.675
sd(Automatdis2)
## [1] 99.56448
#horsepower
Automathor<-matrix(auto$horsepower,397,1)
Automathor2<-Automathor[-c(10:85,1),drop = FALSE]
range(Automathor2)
## [1] NA NA
mean(Automathor2)
## [1] NA
sd(Automathor2)
## [1] NA
#weight
Automatwei<-matrix(auto$weight,397,1)
Automatwei2<-Automatwei[-c(10:85,1),drop = FALSE]
range(Automatwei2)
## [1] 1649 4997
mean(Automatwei2)
## [1] 2932.181
sd(Automatwei2)
## [1] 811.283
#acceleration
Automatacc<-matrix(auto$acceleration,397,1)
Automatacc2<-Automatacc[-c(10:85,1),drop = FALSE]
range(Automatacc2)
## [1] 8.5 24.8
mean(Automatacc2)
## [1] 15.73469
sd(Automatacc2)
## [1] 2.676582
#year
Automatyea<-matrix(auto$year,397,1)
Automatyea2<-Automatyea[-c(10:85,1),drop = FALSE]
range(Automatyea2)
## [1] 70 82
mean(Automatyea2)
## [1] 77.175
sd(Automatyea2)
## [1] 3.090181
#origin
Automatori<-matrix(auto$origin,397,1)
Automatori2<-Automatori[-c(10:85,1),drop = FALSE]
range(Automatori2)
## [1] 1 3
mean(Automatori2)
## [1] 1.6
sd(Automatori2)
## [1] 0.8167525
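Negative indices drop rows, so `-(10:85)` removes exactly the 76 observations 10 through 85, while `-c(10:85, 1)` also removes observation 1. The sketch below compares the two on a hypothetical 100-row data frame, `toy`, and then computes range, mean and standard deviation for every column of the subset at once. With the real data, subsetting `auto[-(10:85), ]` and running the same `sapply()` over its first eight columns would follow the same pattern.

```r
# Hypothetical 100-row data frame standing in for auto
toy <- data.frame(id = 1:100, x = seq(0.5, 50, by = 0.5))

kept       <- toy[-(10:85), ]      # drops rows 10 through 85 only
kept_extra <- toy[-c(10:85, 1), ]  # also drops row 1

nrow(kept)        # 24
nrow(kept_extra)  # 23

# Min, max, mean and sd of every column in the subset, ignoring NAs
sapply(kept, function(col) c(min  = min(col, na.rm = TRUE),
                             max  = max(col, na.rm = TRUE),
                             mean = mean(col, na.rm = TRUE),
                             sd   = sd(col, na.rm = TRUE)))
```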
1F) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.
#Based on my plots above, horsepower and year could serve as predictors of mpg: horsepower appeared to have an inverse (negative) relationship with mpg, and year appeared to be positively correlated with it.
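Correlation coefficients can back up the visual impression from the plots. Because `Auto.csv` may not be at hand, this sketch uses R's built-in `mtcars` data as a stand-in (its `hp` column plays the role of horsepower). With the Auto data, the analogous call would be `cor(auto[, 1:8], use = "complete.obs")`, where `use = "complete.obs"` drops rows with missing horsepower. Keep in mind that a correlation only measures linear association.

```r
# mtcars ships with base R; mpg, hp (horsepower) and wt (weight) are numeric
vars <- mtcars[, c("mpg", "hp", "wt")]

# Correlation of each variable with mpg, rounded for readability
mpg_cor <- round(cor(vars)[, "mpg"], 2)
mpg_cor
```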
2. Working with vectors and matrices: 2A) Construct a matrix, where rows represent each movie. Name this matrix starWars and output it.
# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)
# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
starWars<-matrix(data=c(new_hope,empire_strikes,return_jedi),2,3)
starWars<-t(starWars)
starWars
## [,1] [,2]
## [1,] 460.998 314.4
## [2,] 290.475 247.9
## [3,] 309.306 165.8
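Building the 2 x 3 matrix column by column and transposing it with `t()` gives the same result as filling a 3 x 2 matrix row by row with `byrow = TRUE`. The sketch below checks that equivalence; the names `box`, `by_row` and `via_t` are illustrative.

```r
new_hope       <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi    <- c(309.306, 165.8)
box <- c(new_hope, empire_strikes, return_jedi)

# Fill a 3 x 2 matrix row by row: each movie's (US, non-US) pair becomes a row
by_row <- matrix(box, nrow = 3, byrow = TRUE)

# Fill a 2 x 3 matrix column by column, then transpose
via_t <- t(matrix(box, nrow = 2, ncol = 3))

identical(by_row, via_t)  # TRUE
```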
2B) Rename the rows and columns of the matrix you created in Part A with the vector region for columns and the vector titles for rows. Then print the matrix.
rownames(starWars)<-titles
colnames(starWars)<-region
starWars
## US non-US
## A New Hope 460.998 314.4
## The Empire Strikes Back 290.475 247.9
## Return of the Jedi 309.306 165.8
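The row and column names can also be attached when the matrix is created, via the `dimnames` argument of `matrix()`, which takes a list of row names followed by column names. A sketch (the name `starWars_named` is illustrative):

```r
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
box <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)

# dimnames = list(row names, column names)
starWars_named <- matrix(box, nrow = 3, byrow = TRUE,
                         dimnames = list(titles, region))

# Named dimensions allow lookup by label instead of position
starWars_named["The Empire Strikes Back", "non-US"]  # 247.9
```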
2D) Now we want to add a column to our matrix for worldwide sales. You can do this by using the cbind() function. This function binds columns together.
Worldwide<-rowSums(starWars) #US + non-US revenue for each movie
starWars<-cbind(starWars,Worldwide)
starWars
## US non-US Worldwide
## A New Hope 460.998 314.4 775.398
## The Empire Strikes Back 290.475 247.9 538.375
## Return of the Jedi 309.306 165.8 475.106
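`cbind()` takes a new column's header from the argument's name, so the column can also be computed and labelled inline with the `name = value` form. This is a self-contained sketch on a copy of the matrix called `sw` (an illustrative name).

```r
sw <- matrix(c(460.998, 314.4, 290.475, 247.9, 309.306, 165.8),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("A New Hope", "The Empire Strikes Back",
                               "Return of the Jedi"),
                             c("US", "non-US")))

# rowSums() adds US and non-US for each movie; Worldwide = ... sets the header
sw <- cbind(sw, Worldwide = rowSums(sw))
sw[, "Worldwide"]
```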
2E)Create another matrix for the prequels and name it starWars2. Don’t forget to name the rows and the columns (similar to above)
phantom_menace <- c(474.5, 552.5)
attack_clones <- c(310.7, 338.7)
revenge_sith <- c(380.3, 468.5)
titles2<- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
starWars2<-matrix(data=c(phantom_menace,attack_clones,revenge_sith),2,3)
starWars2<-t(starWars2)
rownames(starWars2)<-titles2
colnames(starWars2)<-region
Worldwide2<-rowSums(starWars2)
starWars2<-cbind(starWars2,Worldwide2)
starWars2
## US non-US Worldwide2
## The Phantom Menace 474.5 552.5 1027.0
## Attack of the Clones 310.7 338.7 649.4
## Revenge of the Sith 380.3 468.5 848.8
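Since parts A through D and part E repeat the same steps, they could be wrapped in a function. The sketch below uses a hypothetical helper, `make_box_matrix()`, which builds the matrix, names its rows and columns, and appends a `Worldwide` column. Because the new column always gets the same name, matrices built this way line up cleanly when combined later.

```r
# Hypothetical helper: named box-office matrix with a Worldwide column
make_box_matrix <- function(movies, titles, region = c("US", "non-US")) {
  # One movie per row, in the order given
  m <- matrix(unlist(movies), nrow = length(movies), byrow = TRUE,
              dimnames = list(titles, region))
  cbind(m, Worldwide = rowSums(m))
}

prequels <- list(c(474.5, 552.5), c(310.7, 338.7), c(380.3, 468.5))
starWars2_alt <- make_box_matrix(
  prequels,
  c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
)
starWars2_alt
```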
2F) Make one big matrix that combines all the movies (from starWars and starWars2) using rbind(). This function binds rows together; here it combines two matrices. Name this new matrix allStarWars.
allStarWars<-rbind(starWars, starWars2)
allStarWars
## US non-US Worldwide
## A New Hope 460.998 314.4 775.398
## The Empire Strikes Back 290.475 247.9 538.375
## Return of the Jedi 309.306 165.8 475.106
## The Phantom Menace 474.500 552.5 1027.000
## Attack of the Clones 310.700 338.7 649.400
## Revenge of the Sith 380.300 468.5 848.800
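In the output above, the last column is labelled `Worldwide` even though starWars2 called it `Worldwide2`: `rbind()` takes column names from the first matrix and does not check that the others match. A small sketch with illustrative matrices `a` and `b` shows this behavior, which is a reason to give both matrices identical column names before stacking them.

```r
a <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("x", "Worldwide")))
b <- matrix(5:8, nrow = 2, dimnames = list(c("r3", "r4"), c("x", "Worldwide2")))

# rbind() stacks rows; column names come from the first matrix argument
ab <- rbind(a, b)
colnames(ab)  # "x" "Worldwide"
rownames(ab)  # "r1" "r2" "r3" "r4"
```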
2G) Find the total non-US revenue for all the movies using the colSums() function.
colSums(allStarWars) #the non-US entry is the total non-US revenue (in millions)
## US non-US Worldwide
## 2226.279 2087.800 4314.079
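To return only the non-US total, index the result of `colSums()` by column name, or sum that single column directly. A self-contained sketch on a copy of the US and non-US figures, named `all_sw` here for illustration:

```r
all_sw <- matrix(c(460.998, 314.4, 290.475, 247.9, 309.306, 165.8,
                   474.5, 552.5, 310.7, 338.7, 380.3, 468.5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(NULL, c("US", "non-US")))

# Pick one entry of colSums() by name
total_non_us <- colSums(all_sw)[["non-US"]]
total_non_us

# Equivalent: sum the single column
sum(all_sw[, "non-US"])
```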
3. College: 3A) Use the read.csv() function to read the data into R. You can download the data from the book’s website (don’t forget to set the working directory) or you can use the URL
college<-read.csv("College.csv", header=TRUE)
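Before reading the real file, `getwd()` shows the current working directory and `file.exists("College.csv")` confirms the file is there. `read.csv()` also accepts inline data through its `text` argument, which is handy for checking how a layout will be parsed. The sketch below reads a hypothetical two-row excerpt, `csv_text`, whose column names and values are illustrative and not taken from College.csv.

```r
# Hypothetical excerpt with the institution name in the first column
csv_text <- 'name,Private,Apps
"Univ A",Yes,1660
"Univ B",No,2186'

# header = TRUE uses the first line as column names
college_toy <- read.csv(text = csv_text, header = TRUE)
str(college_toy)
```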
3B) Use the View() function to look at the data. You should notice that the first column is just the name of each university. We don’t really want R to treat this as a variable. However, it may be handy to have these names for later. Try the following commands: rownames(college) <- college[,1], then View(college).
View(college)
rownames(college) <- college[,1]
View(college)
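After the names are copied into the row names, the first column is redundant, and a common follow-up is to drop it with a negative column index. The sketch below shows both steps on a hypothetical stand-in, `college_toy`, with illustrative values. Row names then allow lookup by institution.

```r
# Hypothetical stand-in for college: the first column holds the names
college_toy <- data.frame(name    = c("Univ A", "Univ B"),
                          Private = c("Yes", "No"),
                          Apps    = c(1660, 2186))

rownames(college_toy) <- college_toy[, 1]  # use the names as row labels
college_toy <- college_toy[, -1]           # drop the now-redundant column

college_toy["Univ B", "Apps"]  # 2186
```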