Problem 1

Part A

Using the str() function, I identified the quantitative predictors: mpg, cylinders, displacement, horsepower, weight, and acceleration.

Auto <- read.table("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.data", 
                   header=TRUE,
                   na.strings = "?")
str(Auto)
## 'data.frame':    397 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : int  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
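As a rough sketch, the numeric columns could also be picked out programmatically instead of by reading the str() output. The data frame `toy` below is a made-up stand-in for Auto (its values are illustrative only, not taken from the data set). Note that numeric storage does not make a variable quantitative: origin is an integer code for a category, so it still has to be dropped by hand.

```r
# Hypothetical stand-in for Auto; the values are illustrative only.
toy <- data.frame(
  mpg    = c(18, 15, 36),
  weight = c(3504, 3693, 2130),
  origin = c(1L, 1L, 3L),              # integer code for a category
  name   = c("car a", "car b", "car c"),
  stringsAsFactors = TRUE
)

# TRUE for every column stored as integer or double
is_num <- sapply(toy, is.numeric)
numeric_cols <- names(toy)[is_num]

# origin is numeric in storage but qualitative in meaning, so exclude it
quantitative <- setdiff(numeric_cols, "origin")
quantitative
```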

Part B

Using a matrix, I found the range of each quantitative predictor.

Auto <- na.omit(Auto)
ranges <- matrix(c(range(Auto$mpg),
                range(Auto$cylinders),
                range(Auto$displacement),
                range(Auto$horsepower),
                range(Auto$weight), 
                range(Auto$acceleration)), 
                nrow=2, ncol=6, byrow=FALSE)
colnames(ranges)<- c("mpg","cylinders","displacement","horsepower","weight","acceleration")
rownames(ranges)<- c("low", "high")
ranges
##       mpg cylinders displacement horsepower weight acceleration
## low   9.0         3           68         46   1613          8.0
## high 46.6         8          455        230   5140         24.8
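A more compact way to build the same kind of table is to apply range() to every column at once. The sketch below uses `mtcars`, which ships with R, purely as a stand-in for Auto; the column selection and the name `ranges_alt` are my own assumptions.

```r
# mtcars is built into R and is used here only as a stand-in for Auto
cars <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt", "qsec")]

# range() returns c(min, max); sapply() binds one column per variable
ranges_alt <- sapply(cars, range)
rownames(ranges_alt) <- c("low", "high")
ranges_alt
```

The column names carry over from the data frame, so no separate colnames() call is needed.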

Part C

Using matrices, I found the mean and standard deviation of each quantitative predictor.

means<- matrix(c(mean(Auto$mpg),
                 mean(Auto$cylinders),
                 mean(Auto$displacement),
                 mean(Auto$horsepower),
                 mean(Auto$weight),
                 mean(Auto$acceleration)),
               nrow=6, ncol=1, byrow=TRUE)
SDs<- matrix(c(sd(Auto$mpg),
               sd(Auto$cylinders),
               sd(Auto$displacement),
               sd(Auto$horsepower),
               sd(Auto$weight),
               sd(Auto$acceleration)),
             nrow=6, ncol=1, byrow =TRUE)
Mean_SD<-cbind(means,SDs)
rownames(Mean_SD)<- c("mpg","cylinders","displacement","horsepower","weight","acceleration")
colnames(Mean_SD)<- c("Mean","SD")
Mean_SD
##                     Mean         SD
## mpg            23.445918   7.805007
## cylinders       5.471939   1.705783
## displacement  194.411990 104.644004
## horsepower    104.469388  38.491160
## weight       2977.584184 849.402560
## acceleration   15.541327   2.758864
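The mean and standard deviation table could also be assembled without typing each column name twice. This is only a sketch on the built-in `mtcars` data as a stand-in for Auto; `mean_sd_alt` is a name I made up.

```r
# Stand-in data: six numeric columns from the built-in mtcars data set
cars <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt", "qsec")]

# One named vector per statistic, bound together as columns
mean_sd_alt <- cbind(Mean = sapply(cars, mean),
                     SD   = sapply(cars, sd))
mean_sd_alt
```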

Part D

Ignoring the 10th through 85th observations, I found the range, mean, and standard deviation of each quantitative predictor in the remaining observations.

Range

apply(Auto[-c(10:85), 1:6], 2 , range)
##       mpg cylinders displacement horsepower weight acceleration
## [1,] 11.0         3           68         46   1649          8.5
## [2,] 46.6         8          455        230   4997         24.8

Mean

apply(Auto[-c(10:85), 1:6], 2 , mean)
##          mpg    cylinders displacement   horsepower       weight acceleration 
##    24.404430     5.373418   187.240506   100.721519  2935.971519    15.726899

SD

apply(Auto[-c(10:85), 1:6], 2 , sd)
##          mpg    cylinders displacement   horsepower       weight acceleration 
##     7.867283     1.654179    99.678367    35.708853   811.300208     2.693721
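The negative index used above drops rows by position. A small sketch on a made-up data frame `x` shows what `-c(10:85)` keeps. One thing to keep in mind is that positions refer to the rows of the data frame as it currently stands, so if rows were removed earlier (for example by na.omit), position 10 may not be the 10th row of the original file.

```r
# Made-up data frame with 100 rows, numbered so the kept rows are easy to see
x <- data.frame(id = 1:100, value = (1:100)^2)

# A negative index drops rows by position; 10:85 covers 76 rows
kept <- x[-(10:85), ]

nrow(kept)   # 100 - 76 rows remain
kept$id      # rows 1 to 9 and 86 to 100
```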

Part E

Using the complete data set, I created pairwise scatterplots.

pairs(Auto)

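The scatterplots above show a negative correlation between horsepower and acceleration: as acceleration increases, horsepower decreases. This looks unexpected at first, but acceleration in this data set is the time in seconds to go from 0 to 60 mph, so more powerful cars have lower values.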

plot(y=Auto$horsepower, x=Auto$acceleration, xlab = "Acceleration", ylab = "Horsepower")

There is a positive correlation between weight and displacement: as weight increases, displacement increases.

plot(y=Auto$displacement, x=Auto$weight, xlab = "Weight", ylab = "displacement")

There also appears to be a positive correlation between weight and horsepower: as horsepower increases, so does weight.

plot(y=Auto$weight, x=Auto$horsepower, xlab = "Horsepower", ylab = "Weight")
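The direction of each relationship read off the plots could be checked numerically with cor(). The sketch below uses the built-in `mtcars` data as a stand-in for Auto, with `qsec` (quarter-mile time) playing a role similar to acceleration. The variable names `r_hp_qsec` and `r_wt_disp` are my own.

```r
# Stand-in data: horsepower, weight, displacement and quarter-mile time
cars <- mtcars[, c("hp", "wt", "disp", "qsec")]

# Full correlation matrix, rounded for readability
round(cor(cars), 2)

# Individual pairs: the sign gives the direction of the linear relationship
r_hp_qsec <- cor(cars$hp, cars$qsec)   # power vs. time taken
r_wt_disp <- cor(cars$wt, cars$disp)   # weight vs. displacement
```

In this stand-in data the first correlation is negative and the second is positive, which matches the pattern seen in the Auto scatterplots.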

Part F

Supposing we want to predict mpg from the other variables, we might look at the relationship between mpg and horsepower.

plot(y=Auto$mpg, x=Auto$horsepower, xlab = "Horsepower", ylab = "mpg")

There is a clear negative relationship between the two variables: as horsepower increases, mpg decreases. Weight has a similar relationship to mpg.

plot(y=Auto$mpg, x=Auto$weight, xlab = "weight", ylab = "mpg")

When weight increases, mpg decreases. By understanding these relationships, we might be able to predict a car's mpg from its weight and horsepower.
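One way such a prediction might be set up is a linear regression of mpg on weight and horsepower. This is only a sketch on the built-in `mtcars` data (where weight is `wt`, in thousands of pounds, and horsepower is `hp`), not a fitted model for Auto. The car described by `new_car` is hypothetical.

```r
# Linear model on the stand-in data: mpg explained by weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The sign of each slope gives the direction of the relationship
coef(fit)

# Predicted mpg for a hypothetical car (3,000 lbs, 120 hp)
new_car <- data.frame(wt = 3.0, hp = 120)
pred <- predict(fit, newdata = new_car)
pred
```

In this stand-in data both slopes come out negative, meaning heavier and more powerful cars get fewer miles per gallon.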

Problem 2

Part A

I created a matrix of the Star Wars films, where each row represents a movie.

new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")

StarWars <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)

Part B

I named the rows and columns.

region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
rownames(StarWars) <- titles
colnames(StarWars) <- region
StarWars
##                              US non-US
## A New Hope              460.998  314.4
## The Empire Strikes Back 290.475  247.9
## Return of the Jedi      309.306  165.8
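As an alternative sketch, the row and column names could be supplied when the matrix is created, through the dimnames argument of matrix(). The figures are the ones used above; the names `box_office` and `star_wars_alt` are my own.

```r
# Box office figures, one film per row: US, non-US
box_office <- c(460.998, 314.4,
                290.475, 247.9,
                309.306, 165.8)

# dimnames takes a list: row names first, then column names
star_wars_alt <- matrix(
  box_office, nrow = 3, byrow = TRUE,
  dimnames = list(
    c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
    c("US", "non-US")
  )
)

# Named indexing now works directly
star_wars_alt["Return of the Jedi", "non-US"]
```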

Part C

Using the rowSums() function, I found the worldwide box office figures.

worldwide_vector <- rowSums(StarWars)

Part D

Using cbind(), I added a column of worldwide sales to the matrix.

all_wars_matrix <- cbind(StarWars, worldwide_vector)
all_wars_matrix
##                              US non-US worldwide_vector
## A New Hope              460.998  314.4          775.398
## The Empire Strikes Back 290.475  247.9          538.375
## Return of the Jedi      309.306  165.8          475.106
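The new column takes its header from the variable name, which is why it shows up as worldwide_vector. A possible variation is to name the column inside cbind(). The sketch below rebuilds the matrix so that it runs on its own; `star_wars`, `with_total`, and the label `Worldwide` are names I chose.

```r
# Rebuild the original-trilogy matrix so the example is self-contained
star_wars <- matrix(
  c(460.998, 314.4, 290.475, 247.9, 309.306, 165.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
    c("US", "non-US")
  )
)

# A named argument to cbind() becomes the column header
with_total <- cbind(star_wars, Worldwide = rowSums(star_wars))
with_total
```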

Part E

I created a second matrix for the prequel movies.

phantom_menace <- c(474.5, 552.5)
attack_clones <- c(310.7, 338.7)
revenge_sith <- c(380.3, 468.5)
titles2 <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")

StarWars2 <- matrix(c(phantom_menace, attack_clones, revenge_sith), nrow=3, byrow = TRUE)
StarWars2
##       [,1]  [,2]
## [1,] 474.5 552.5
## [2,] 310.7 338.7
## [3,] 380.3 468.5
rownames(StarWars2) <- titles2
colnames(StarWars2) <- region

Part F

I constructed a new matrix that includes all the Star Wars films.

AllWars <- rbind(StarWars, StarWars2)

Part G

I found the total revenue of all Star Wars films by region.

Total_revenue <- colSums(AllWars)
Total_revenue
##       US   non-US 
## 2226.279 2087.800
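colSums() gives one total per region. If a single worldwide figure is wanted, the two regional totals can be added together, as in the sketch below. It starts from the totals printed above; `totals`, `grand_total`, and `share` are names I made up.

```r
# Regional totals as printed by colSums(AllWars)
totals <- c(US = 2226.279, `non-US` = 2087.8)

# Single worldwide figure across all six films
grand_total <- sum(totals)
grand_total   # 4314.079

# Each region's share of the worldwide total, in percent
share <- round(100 * totals / grand_total, 1)
share
```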

Problem 3

Part A

I used the summary() function to produce a summary of the data.

summary(college)
##  Private        Apps           Accept          Enroll       Top10perc    
##  No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00  
##  Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00  
##            Median : 1558   Median : 1110   Median : 434   Median :23.00  
##            Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56  
##            3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00  
##            Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00  
##    Top25perc      F.Undergrad     P.Undergrad         Outstate    
##  Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340  
##  1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320  
##  Median : 54.0   Median : 1707   Median :  353.0   Median : 9990  
##  Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441  
##  3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925  
##  Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700  
##    Room.Board       Books           Personal         PhD        
##  Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00  
##  1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00  
##  Median :4200   Median : 500.0   Median :1200   Median : 75.00  
##  Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66  
##  3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00  
##  Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00  
##     Terminal       S.F.Ratio      perc.alumni        Expend     
##  Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186  
##  1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751  
##  Median : 82.0   Median :13.60   Median :21.00   Median : 8377  
##  Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660  
##  3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830  
##  Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233  
##    Grad.Rate     
##  Min.   : 10.00  
##  1st Qu.: 53.00  
##  Median : 65.00  
##  Mean   : 65.46  
##  3rd Qu.: 78.00  
##  Max.   :118.00

Part B

I used the pairs function to create scatterplots of the data

pairs(college[,1:10])

Part C

I used ggplot to produce side-by-side boxplots of Outstate vs Private.

library(tidyverse)
ggplot(college, aes(x = Private, y = Outstate, fill = Private)) + geom_boxplot() +
  theme_minimal()

Part D

I binned Top10perc to create a new variable, Elite, which flags colleges where more than 50% of new students came from the top 10% of their high school class.

Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)

summary(Elite)
##  No Yes 
## 699  78
ggplot(college, aes(y=Outstate, fill= Elite))+ geom_boxplot()+
  theme_minimal()
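The same binning could be written in one step with ifelse() and factor(). The sketch below runs on a short made-up vector `top10` rather than the college data, and the lowercase `elite` is my own name to keep it separate from the variable above. A value of exactly 50 counts as "No" because the comparison is strict.

```r
# Made-up Top10perc values, for illustration only
top10 <- c(12, 55, 50, 78, 23)

# "Yes" when strictly above 50, "No" otherwise; levels fix the display order
elite <- factor(ifelse(top10 > 50, "Yes", "No"), levels = c("No", "Yes"))

table(elite)
```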