HW#1_AlexMAtteson

HW 1 - Alex Matteson

This is my first R homework for Stats 239.

1.) a

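Auto <- read.table("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.data", 
                   header = TRUE,
                   na.strings = "?",
                   # on R >= 4.0 strings are no longer factors by default;
                   # this keeps name as a factor, as in the output below
                   stringsAsFactors = TRUE)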
#head(Auto)
str(Auto)

## 'data.frame':    397 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : int  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...

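As the output shows, mpg, cylinders, displacement, horsepower, weight, acceleration, and year are quantitative, while origin and name are qualitative. (origin is stored as an integer, but it is a code for where the car was made, not a measurement.)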

range(Auto$mpg)

## [1]  9.0 46.6

range(Auto$displacement)

## [1]  68 455

range(Auto$cylinders)

## [1] 3 8

range(Auto$horsepower)

## [1] NA NA
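The NA NA result comes from the missing horsepower values that na.strings = "?" turned into NA. Here is a minimal sketch of the usual workaround, using a small made-up vector (hp_example is hypothetical, not the real column):

```r
# Hypothetical stand-in for a column with missing values (not the real Auto$horsepower)
hp_example <- c(130, 165, NA, 150, 140)

# By default a single NA makes the whole result NA
range(hp_example)

# na.rm = TRUE drops the missing values before computing
hp_range <- range(hp_example, na.rm = TRUE)
hp_mean  <- mean(hp_example, na.rm = TRUE)
hp_sd    <- sd(hp_example, na.rm = TRUE)
hp_range
```

The same argument should carry over to range(Auto$horsepower, na.rm = TRUE), and likewise to mean() and sd().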

range(Auto$weight)

## [1] 1613 5140

range(Auto$acceleration)

## [1]  8.0 24.8

range(Auto$year)

## [1] 70 82

mean(Auto$mpg)

## [1] 23.51587

sd(Auto$mpg)

## [1] 7.825804

mean(Auto$cylinders)

## [1] 5.458438

sd(Auto$cylinders)

## [1] 1.701577

mean(Auto$displacement)

## [1] 193.5327

sd(Auto$displacement)

## [1] 104.3796

mean(Auto$horsepower)

## [1] NA

sd(Auto$horsepower)

## [1] NA

mean(Auto$weight)

## [1] 2970.262

sd(Auto$weight)

## [1] 847.9041

mean(Auto$acceleration)

## [1] 15.55567

sd(Auto$acceleration)

## [1] 2.749995

mean(Auto$year)

## [1] 75.99496

sd(Auto$year)

## [1] 3.690005
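The repeated range(), mean(), and sd() calls could probably be collapsed into one pass over the columns. This is a sketch only, on a made-up data frame (toy and its values are hypothetical stand-ins for the quantitative Auto columns):

```r
# Hypothetical toy data frame standing in for the quantitative Auto columns
toy <- data.frame(
  mpg        = c(18, 15, 18, 16),
  horsepower = c(130, 165, NA, 150),
  weight     = c(3504, 3693, 3436, 3433)
)

# One row per statistic, one column per variable; NAs are dropped per column
stats <- sapply(toy, function(x) {
  c(min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE))
})
stats
```

On the real data this would be applied to the quantitative columns only, for example Auto[, 1:7], assuming the column order shown by str(Auto).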

I do not remember how to do this, and I can’t figure it out from the class notes or the internet. Is it something like this: Auto[c(10, 85)] ?
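Assuming the task is to drop a block of rows, say the 10th through the 85th (an assumption, since the question is not restated here), Auto[c(10, 85)] would not do it: with no comma it selects columns, and c(10, 85) names only two positions, not a range. Here is a sketch of negative row indexing on a made-up data frame (toy is hypothetical):

```r
# Hypothetical data frame with 100 rows, standing in for Auto
toy <- data.frame(id = 1:100, value = (1:100) * 2)

# 10:85 expands to every row number from 10 to 85; the minus sign drops them.
# The comma matters: rows go before it, columns after it.
subset_toy <- toy[-(10:85), ]

nrow(subset_toy)
range(subset_toy$value)
```

Under the same assumption, the Auto version would be Auto[-(10:85), ], after which range(), mean(), and sd() can be rerun on the result.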

Acceleration does not seem like an effective predictor of mpg, but horsepower and weight do: as either one increases, mpg goes down. This makes sense because a heavier car and a more powerful car probably both need more gas. I don’t know much about cars, though, so that explanation may be wrong.
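One way to put numbers on this kind of claim is a correlation with mpg. The sketch below uses mtcars, a different car data set that ships with R, purely as a stand-in so the code runs without a download. Its columns hp, wt, and qsec are only rough analogues of horsepower, weight, and acceleration in Auto.

```r
# mtcars ships with R; used here only as a stand-in for Auto
# hp = horsepower, wt = weight, qsec = quarter-mile time (a rough proxy for acceleration)
cors <- cor(mtcars$mpg, mtcars[, c("hp", "wt", "qsec")])
cors
```

On Auto the call would likely need use = "complete.obs", because horsepower contains NA values.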

2.) a. b.

# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)
# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")

starWars <- matrix(data = c(new_hope, empire_strikes, return_jedi),
                   nrow = 3,
                   byrow = TRUE)
colnames(starWars) <- region
rownames(starWars) <- titles
starWars

##                              US non-US
## A New Hope              460.998  314.4
## The Empire Strikes Back 290.475  247.9
## Return of the Jedi      309.306  165.8

worldWide <- rowSums(starWars)
worldWide

##              A New Hope The Empire Strikes Back      Return of the Jedi 
##                 775.398                 538.375                 475.106

BigStarWar <- cbind(starWars, worldWide)
BigStarWar

##                              US non-US worldWide
## A New Hope              460.998  314.4   775.398
## The Empire Strikes Back 290.475  247.9   538.375
## Return of the Jedi      309.306  165.8   475.106

# Prequels
phantom_menace <- c(474.5, 552.5)
attack_clones <- c(310.7, 338.7)
revenge_sith <- c(380.3, 468.5)
titles2 <- c("phantom_menace", "attack_clones", "revenge_sith")
region2 <- c("US", "non-US")

starWars2 <- matrix(data = c(phantom_menace, attack_clones, revenge_sith),
                    nrow = 3,
                    byrow = TRUE)

colnames(starWars2) <- region2
rownames(starWars2) <- titles2
starWars2

##                   US non-US
## phantom_menace 474.5  552.5
## attack_clones  310.7  338.7
## revenge_sith   380.3  468.5

allStarWars <- rbind(starWars, starWars2)
allStarWars

##                              US non-US
## A New Hope              460.998  314.4
## The Empire Strikes Back 290.475  247.9
## Return of the Jedi      309.306  165.8
## phantom_menace          474.500  552.5
## attack_clones           310.700  338.7
## revenge_sith            380.300  468.5

colSums(allStarWars)

##       US   non-US 
## 2226.279 2087.800
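The separate colnames() and rownames() steps could also be folded into the matrix() call through its dimnames argument. This is a sketch using the original-trilogy figures from this section. The names starWarsNamed, worldWideNamed, and shares are made up for the example so that nothing above is overwritten.

```r
# Box office figures (in millions), copied from the vectors above
new_hope       <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi    <- c(309.306, 165.8)

# dimnames takes a list: row names first, then column names
starWarsNamed <- matrix(
  c(new_hope, empire_strikes, return_jedi),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
    c("US", "non-US")
  )
)

# Worldwide total per film, and each region's share of that total
worldWideNamed <- rowSums(starWarsNamed)
shares <- prop.table(starWarsNamed, margin = 1)
round(shares, 3)
```

Writing each title on a single line also avoids a stray newline ending up inside a row name.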

3.)

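# stringsAsFactors = TRUE keeps Private as a factor on R >= 4.0, matching the summary below
college <- read.csv("http://faculty.marshall.usc.edu/gareth-james/ISL/College.csv",
                    header = TRUE,
                    stringsAsFactors = TRUE)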

rownames(college) <- college[,1]

college <- college[,-1]
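The two steps above (copy the first column into the row names, then drop it) can probably be done at read time with the row.names argument. This is a sketch on a made-up three-row CSV (csv_text and toy_college are hypothetical, not the real College.csv):

```r
# Hypothetical three-row CSV standing in for College.csv
csv_text <- "name,Private,Apps
College A,Yes,1500
College B,No,2200
College C,Yes,900"

# row.names = 1 uses the first column as row names and leaves it out of the data
toy_college <- read.csv(text = csv_text, row.names = 1, stringsAsFactors = TRUE)

rownames(toy_college)
names(toy_college)
```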
#summary(college)

C a

summary(college)

##  Private        Apps           Accept          Enroll       Top10perc    
##  No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00  
##  Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00  
##            Median : 1558   Median : 1110   Median : 434   Median :23.00  
##            Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56  
##            3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00  
##            Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00  
##    Top25perc      F.Undergrad     P.Undergrad         Outstate    
##  Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340  
##  1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320  
##  Median : 54.0   Median : 1707   Median :  353.0   Median : 9990  
##  Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441  
##  3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925  
##  Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700  
##    Room.Board       Books           Personal         PhD        
##  Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00  
##  1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00  
##  Median :4200   Median : 500.0   Median :1200   Median : 75.00  
##  Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66  
##  3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00  
##  Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00  
##     Terminal       S.F.Ratio      perc.alumni        Expend     
##  Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186  
##  1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751  
##  Median : 82.0   Median :13.60   Median :21.00   Median : 8377  
##  Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660  
##  3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830  
##  Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233  
##    Grad.Rate     
##  Min.   : 10.00  
##  1st Qu.: 53.00  
##  Median : 65.00  
##  Mean   : 65.46  
##  3rd Qu.: 78.00  
##  Max.   :118.00

pairs(college[,1:10])

plot(college$Private, college$Outstate)

Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] = "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)        

summary(Elite)

##  No Yes 
## 699  78
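The rep() and assignment steps for Elite can also be written as a single ifelse() call. This is a sketch on made-up percentages (top10 and elite_example are hypothetical, not the real Top10perc column):

```r
# Hypothetical Top10perc values, not the real College column
top10 <- c(23, 16, 60, 89, 50, 51)

# ifelse() picks "Yes" or "No" element by element; levels fixes the order No, Yes
elite_example <- factor(ifelse(top10 > 50, "Yes", "No"), levels = c("No", "Yes"))

table(elite_example)
```

A value of exactly 50 counts as "No" here, the same as with the > 50 comparison used above.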

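# factor first, so plot() draws side-by-side boxplots (same argument order as the Private plot above)
plot(college$Elite, college$Outstate)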
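plot() draws side-by-side boxplots only when the factor is the first (x) argument. The formula interface of boxplot() makes the roles explicit. This is a sketch on made-up numbers (toy and its values are hypothetical, not the real College data):

```r
# Hypothetical out-of-state tuition and Elite labels, not the real College data
toy <- data.frame(
  Outstate = c(7440, 12280, 11250, 12960, 18000, 21700),
  Elite    = factor(c("No", "No", "No", "No", "Yes", "Yes"))
)

# Numeric response on the left of ~, grouping factor on the right.
# plot = FALSE returns the box statistics without drawing; drop it to see the plot.
bp <- boxplot(Outstate ~ Elite, data = toy, plot = FALSE)
bp$names
bp$n
```

On the real data the call would be boxplot(Outstate ~ Elite, data = college).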