##Problem 1: Auto Data

#Part A

Auto <- read.table("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.data", 
                   header=TRUE,
                   na.strings = "?")
str(Auto)
## 'data.frame':    397 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : int  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...

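The predictors mpg, cylinders, displacement, horsepower, weight, acceleration, year, and origin are stored as numeric and are treated as quantitative here, although origin is a numeric code for the region of manufacture and is arguably qualitative. The predictor name is qualitative.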
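
The classification above can be checked programmatically. The following is a minimal sketch, assuming a small made-up data frame (`auto_toy`, with illustrative values) in place of the downloaded Auto data. It lists the columns R stores as numeric, then recodes `origin` as a labelled factor. The labels follow the ISLR description of the variable (1 = American, 2 = European, 3 = Japanese).

```r
# Toy stand-in for a few Auto columns (values are illustrative only)
auto_toy <- data.frame(
  mpg    = c(18, 15, 27.5),
  origin = c(1L, 1L, 3L),
  name   = c("car a", "car b", "car c"),
  stringsAsFactors = TRUE
)

# Which columns does R store as numeric?
is_num <- sapply(auto_toy, is.numeric)
names(auto_toy)[is_num]

# Optionally recode the numeric origin code as a labelled factor
auto_toy$origin <- factor(auto_toy$origin,
                          levels = 1:3,
                          labels = c("American", "European", "Japanese"))
levels(auto_toy$origin)
```

A numeric storage type only says how R holds the values; whether a variable is quantitative still depends on what the numbers mean.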

#Part B

#Range of each quantitative predictor:
range(Auto$mpg)
## [1]  9.0 46.6
range(Auto$cylinders)
## [1] 3 8
range(Auto$displacement)
## [1]  68 455
range(Auto$horsepower, na.rm=TRUE)
## [1]  46 230
range(Auto$weight)
## [1] 1613 5140
range(Auto$acceleration)
## [1]  8.0 24.8
range(Auto$year)
## [1] 70 82
range(Auto$origin)
## [1] 1 3

#Part C

#Mean of each quantitative predictor:
mean(Auto$mpg)
## [1] 23.51587
mean(Auto$cylinders)
## [1] 5.458438
mean(Auto$displacement)
## [1] 193.5327
mean(Auto$horsepower, na.rm=TRUE)
## [1] 104.4694
mean(Auto$weight)
## [1] 2970.262
mean(Auto$acceleration)
## [1] 15.55567
mean(Auto$year)
## [1] 75.99496
mean(Auto$origin)
## [1] 1.574307
#Standard deviation of each quantitative predictor:
sd(Auto$mpg)
## [1] 7.825804
sd(Auto$cylinders)
## [1] 1.701577
sd(Auto$displacement)
## [1] 104.3796
sd(Auto$horsepower, na.rm=TRUE)
## [1] 38.49116
sd(Auto$weight)
## [1] 847.9041
sd(Auto$acceleration)
## [1] 2.749995
sd(Auto$year)
## [1] 3.690005
sd(Auto$origin)
## [1] 0.8025495
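
The per-column calls in Parts B and C can also be collapsed into one table. This is a sketch under assumptions: `auto_toy` is a made-up data frame with one missing horsepower value, not the real Auto data, and `stats` is a name introduced here for illustration.

```r
# Toy data with a missing value, mimicking the NA entries in horsepower
auto_toy <- data.frame(
  mpg        = c(18, 15, 27.5, 31),
  horsepower = c(130, NA, 95, 68),
  name       = c("a", "b", "c", "d")
)

# Keep only the numeric columns
num_cols <- auto_toy[sapply(auto_toy, is.numeric)]

# One column of results per variable; na.rm = TRUE skips missing values
stats <- sapply(num_cols, function(x) c(
  min  = min(x, na.rm = TRUE),
  max  = max(x, na.rm = TRUE),
  mean = mean(x, na.rm = TRUE),
  sd   = sd(x, na.rm = TRUE)
))
round(stats, 2)
```

Passing `na.rm = TRUE` to every column is harmless for columns without missing values, so the same function works for the whole data set.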

#Part D

#Subset of data with the 10th through 85th observations removed.
Auto2<-Auto[-c(10:85),]
#Range of each quantitative predictor in the data subset:
range(Auto2$mpg)
## [1] 11.0 46.6
range(Auto2$cylinders)
## [1] 3 8
range(Auto2$displacement)
## [1]  68 455
range(Auto2$horsepower, na.rm=TRUE)
## [1]  46 230
range(Auto2$weight)
## [1] 1649 4997
range(Auto2$acceleration)
## [1]  8.5 24.8
range(Auto2$year)
## [1] 70 82
range(Auto2$origin)
## [1] 1 3
#Mean of each quantitative predictor in the data subset:
mean(Auto2$mpg)
## [1] 24.43863
mean(Auto2$cylinders)
## [1] 5.370717
mean(Auto2$displacement)
## [1] 187.0498
mean(Auto2$horsepower, na.rm=TRUE)
## [1] 100.9558
mean(Auto2$weight)
## [1] 2933.963
mean(Auto2$acceleration)
## [1] 15.72305
mean(Auto2$year)
## [1] 77.15265
mean(Auto2$origin)
## [1] 1.598131
#Standard deviation of each quantitative predictor in the data subset:
sd(Auto2$mpg)
## [1] 7.908184
sd(Auto2$cylinders)
## [1] 1.653486
sd(Auto2$displacement)
## [1] 99.63539
sd(Auto2$horsepower, na.rm=TRUE)
## [1] 35.89557
sd(Auto2$weight)
## [1] 810.6429
sd(Auto2$acceleration)
## [1] 2.680514
sd(Auto2$year)
## [1] 3.11123
sd(Auto2$origin)
## [1] 0.8161627
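
Before comparing statistics between the full data and the subset, it can help to confirm that negative indexing removed the intended rows. A minimal sketch, assuming a made-up 100-row data frame (`toy`) in place of Auto:

```r
# Toy data frame with an explicit id column so removed rows are easy to track
toy <- data.frame(id = 1:100, value = seq(0.5, 50, by = 0.5))

# Negative indices drop rows 10 through 85, as in Part D
toy_subset <- toy[-(10:85), ]

# Number of rows removed
nrow(toy) - nrow(toy_subset)

# None of the removed ids should remain
any(toy_subset$id %in% 10:85)
```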

#Part E

plot(Auto$mpg, Auto$weight)

The scatterplot of mpg against weight suggests that lighter cars get more miles per gallon. This makes sense because a heavier car requires more fuel to accelerate its mass than a lighter car does.

plot(Auto$year, Auto$mpg)

The scatterplot of model year against mpg suggests that mpg has increased over the years, plausibly because newer, more fuel-efficient designs and manufacturing methods were adopted.

plot(Auto$cylinders, Auto$horsepower)

The scatterplot of the number of cylinders against horsepower shows that cars with more cylinders tend to have greater horsepower.

#Part F

pairs(Auto)

In the pairwise scatterplots of mpg against the other variables, the plots of mpg against displacement, horsepower, and weight each show a trend in which mpg decreases as the other variable increases. These findings suggest that knowing a car's displacement, horsepower, and weight could help predict its gas mileage. The plots of mpg against cylinders and mpg against origin have similar structures, suggesting that these variables may also carry some information about mpg. Even so, predictions based only on these variables would be imprecise, because other confounding variables not recorded here can also affect gas mileage.
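
Visual trends like these can be backed up with correlation coefficients. The sketch below uses the built-in `mtcars` data as a stand-in, since it ships with R and has comparable columns (`mpg`, `disp`, `hp`, `wt`). It is not the Auto data, so the numbers will differ. With Auto, `use = "complete.obs"` would drop the rows with missing horsepower.

```r
# mtcars ships with R; it is a stand-in here, not the Auto data
vars <- c("mpg", "disp", "hp", "wt")

# Correlation of mpg with each other variable (first column is mpg itself)
cor_with_mpg <- cor(mtcars[, vars], use = "complete.obs")["mpg", -1]
round(cor_with_mpg, 2)
```

Negative coefficients correspond to the downward trends seen in the scatterplots. Correlation measures only linear association, so a curved relationship would be understated.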

##Problem 2: Working with vectors and matrices

#Data

#Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

#Vectors region and titles, used for naming
region <-c("US", "non-US")
titles <- c("A New Hope", "The Empire Stikes Back", "Return of the Jedi")

#Part A

#Construct a matrix
starWars <- matrix(c(new_hope, empire_strikes, return_jedi), nrow=3, byrow = TRUE)
starWars
##         [,1]  [,2]
## [1,] 460.998 314.4
## [2,] 290.475 247.9
## [3,] 309.306 165.8

#Part B

#Rename the row and columns of the matrix
colnames(starWars) <- region
rownames(starWars) <- titles
starWars
##                             US non-US
## A New Hope             460.998  314.4
## The Empire Stikes Back 290.475  247.9
## Return of the Jedi     309.306  165.8
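
Parts A and B can also be done in a single call, because `matrix()` accepts a `dimnames` argument. A minimal sketch using the same box office figures; the object name `star_wars_named` is introduced here for illustration.

```r
# Box office figures (in millions), as given in the Data section
new_hope       <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi    <- c(309.306, 165.8)

region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")

# dimnames takes a list: row names first, then column names
star_wars_named <- matrix(c(new_hope, empire_strikes, return_jedi),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(titles, region))

# Named dimensions allow lookup by label instead of position
star_wars_named["A New Hope", "US"]
```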

#Part C

#Worldwide box office figures for each movie
worldwide_figures<- rowSums(starWars)
worldwide_figures
##             A New Hope The Empire Stikes Back     Return of the Jedi 
##                775.398                538.375                475.106

#Part D

#Add column to matrix for worldwide sales
worldwide_wars <- cbind(starWars, worldwide_figures)

#Prequel Data

phantom_menace <- c(474.5, 552.5)
attack_clones <- c(310.7, 338.7)
revenge_sith <- c(380.3, 468.5)
titles2<-c("The Phantom Menace", "Attack of the Clones", "Revent of the Sith")

#Part E

#Create a matrix for the prequels with row and column names
starWars2 <- matrix(c(phantom_menace, attack_clones, revenge_sith), 
                    nrow=3, byrow = TRUE)

colnames(starWars2) <- region
rownames(starWars2) <- titles2
starWars2
##                         US non-US
## The Phantom Menace   474.5  552.5
## Attack of the Clones 310.7  338.7
## Revent of the Sith   380.3  468.5

#Part F

#Combine starWars and starWars2 into one matrix
allStarWars <- rbind(starWars,starWars2)
allStarWars
##                             US non-US
## A New Hope             460.998  314.4
## The Empire Stikes Back 290.475  247.9
## Return of the Jedi     309.306  165.8
## The Phantom Menace     474.500  552.5
## Attack of the Clones   310.700  338.7
## Revent of the Sith     380.300  468.5

#Part G

#Total non-US revenue for all movies
colSums(allStarWars)
##       US   non-US 
## 2226.279 2087.800

The total non-US revenue for all six movies is $2,087.8 million.
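
Because `colSums()` returns a named vector, the non-US total can be pulled out by name rather than read off the printed output. A short sketch that rebuilds the two columns from the figures above; `all_films` and `totals` are names introduced here.

```r
# US and non-US figures (in millions) for the six films, in the order shown above
us     <- c(460.998, 290.475, 309.306, 474.5, 310.7, 380.3)
non_us <- c(314.4, 247.9, 165.8, 552.5, 338.7, 468.5)

all_films <- cbind(US = us, "non-US" = non_us)

# Column totals, then extract the single value of interest by name
totals <- colSums(all_films)
totals[["non-US"]]
```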

##Problem 3: College data

#Part A

#Reading data into R:
college<-read.csv("http://faculty.marshall.usc.edu/gareth-james/ISL/College.csv",header=TRUE)

#Part B

#Use the first column (the university names) as row names
rownames(college) <- college[,1]
#Drop the name column so that the first data column is 'Private'
college <- college[,-1]
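
The two steps above can also be folded into the read itself, since `read.csv()` accepts `row.names = 1` to use the first column as row names. The sketch below assumes a tiny inline CSV with made-up rows (`csv_text`) rather than the real College.csv, so it runs without a network connection.

```r
# Inline CSV with made-up rows; the real file has many more columns
csv_text <- "Name,Private,Apps
College A,Yes,1660
College B,No,2186
"

# row.names = 1 turns the first column into row names and drops it from the data
college_toy <- read.csv(text = csv_text, header = TRUE, row.names = 1,
                        stringsAsFactors = TRUE)

rownames(college_toy)
names(college_toy)
```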

#Part C

#Numerical summary of the variables in the data set
summary(college)
##  Private        Apps           Accept          Enroll       Top10perc    
##  No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00  
##  Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00  
##            Median : 1558   Median : 1110   Median : 434   Median :23.00  
##            Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56  
##            3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00  
##            Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00  
##    Top25perc      F.Undergrad     P.Undergrad         Outstate    
##  Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340  
##  1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320  
##  Median : 54.0   Median : 1707   Median :  353.0   Median : 9990  
##  Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441  
##  3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925  
##  Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700  
##    Room.Board       Books           Personal         PhD        
##  Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00  
##  1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00  
##  Median :4200   Median : 500.0   Median :1200   Median : 75.00  
##  Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66  
##  3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00  
##  Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00  
##     Terminal       S.F.Ratio      perc.alumni        Expend     
##  Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186  
##  1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751  
##  Median : 82.0   Median :13.60   Median :21.00   Median : 8377  
##  Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660  
##  3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830  
##  Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233  
##    Grad.Rate     
##  Min.   : 10.00  
##  1st Qu.: 53.00  
##  Median : 65.00  
##  Mean   : 65.46  
##  3rd Qu.: 78.00  
##  Max.   :118.00
#Scatterplot matrix of the first ten variables in the data
pairs(college[,1:10])

#Boxplot of Outstate vs Private
plot(college$Private, college$Outstate, main = "Outstate vs Private", xlab = "Private", ylab = "Outstate")

#Create Elite variable based on whether the proportion of students coming from the top 10% of their high school class exceeds 50%
Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)
summary(college)
##  Private        Apps           Accept          Enroll       Top10perc    
##  No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00  
##  Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00  
##            Median : 1558   Median : 1110   Median : 434   Median :23.00  
##            Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56  
##            3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00  
##            Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00  
##    Top25perc      F.Undergrad     P.Undergrad         Outstate    
##  Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340  
##  1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320  
##  Median : 54.0   Median : 1707   Median :  353.0   Median : 9990  
##  Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441  
##  3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925  
##  Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700  
##    Room.Board       Books           Personal         PhD        
##  Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00  
##  1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00  
##  Median :4200   Median : 500.0   Median :1200   Median : 75.00  
##  Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66  
##  3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00  
##  Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00  
##     Terminal       S.F.Ratio      perc.alumni        Expend     
##  Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186  
##  1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751  
##  Median : 82.0   Median :13.60   Median :21.00   Median : 8377  
##  Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660  
##  3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830  
##  Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233  
##    Grad.Rate      Elite    
##  Min.   : 10.00   No :699  
##  1st Qu.: 53.00   Yes: 78  
##  Median : 65.00            
##  Mean   : 65.46            
##  3rd Qu.: 78.00            
##  Max.   :118.00

There are 78 elite universities.
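
The count can also be obtained directly with `table()` instead of being read from the summary. This is a minimal sketch, assuming a made-up five-row data frame (`college_toy`) in place of the real data. It also builds the factor with `ifelse()`, a common one-step alternative to filling a vector of "No" values.

```r
# Toy stand-in for the College data (values are illustrative only)
college_toy <- data.frame(Top10perc = c(23, 60, 50, 89, 12))

# "Yes" only when strictly more than 50% come from the top 10% of their class
college_toy$Elite <- factor(ifelse(college_toy$Top10perc > 50, "Yes", "No"))

# Count of schools in each group
elite_counts <- table(college_toy$Elite)
elite_counts
```

Note that a school with exactly 50% is not counted as elite, because the comparison is strict.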

#Side-by-side boxplots of Outstate vs Elite
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
ggplot(college, aes(x = Elite, y = Outstate, fill = Elite)) +
  geom_boxplot()
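
The center line of each box is the group median, which can be computed directly as a numeric companion to the plot. This is a sketch using made-up values (`college_toy`), not the real College data.

```r
# Toy stand-in: out-of-state tuition and elite status (illustrative values)
college_toy <- data.frame(
  Outstate = c(7440, 12280, 11250, 19840, 7560),
  Elite    = factor(c("No", "Yes", "No", "Yes", "No"))
)

# Median out-of-state tuition within each Elite group
group_medians <- tapply(college_toy$Outstate, college_toy$Elite, median)
group_medians
```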