#Victor Ramos Hw 2

Question 2)

2a)

Regression, inference. n = 500 firms, p = 3 predictors (profit, number of employees, industry). The response is CEO salary.

2b)

Classification, prediction. n = 20 products, p = 13 predictors (price charged, marketing budget, competition price, and ten other variables). The response is success or failure.

2c)

Regression, prediction. n = 52 weeks (weekly data for 2012), p = 3 predictors (% change in the US, British, and German markets). The response is the % change in the USD/Euro exchange rate.

Question 5)

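A less flexible (inflexible) approach has lower variance and is easier to interpret, which makes it a good choice when the goal is inference, when the sample is small, or when the noise is high. Its disadvantage is higher bias: if the true f is far from the assumed shape, the fit will be poor. A more flexible approach can fit a much wider range of shapes for f, so it has lower bias, but the model is more complicated and harder to interpret, it has higher variance, and it can overfit unless there are many observations.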
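
The trade-off can be seen in a small simulation. The sketch below is illustrative only: the true function sin(2x), the noise level, the sample sizes, and the polynomial degree are all made-up assumptions, not part of the assignment. It fits an inflexible model (a straight line) and a flexible one (a degree-10 polynomial) to the same training data and compares training and test mean squared error.

```r
set.seed(1)

# Assumed true function and noise level (hypothetical, for illustration only)
f <- function(x) sin(2 * x)
n <- 50
train <- data.frame(x = runif(n, 0, 3))
train$y <- f(train$x) + rnorm(n, sd = 0.3)
test <- data.frame(x = runif(1000, 0, 3))
test$y <- f(test$x) + rnorm(1000, sd = 0.3)

# Mean squared error of a fitted model on a data set
mse <- function(fit, data) mean((data$y - predict(fit, newdata = data))^2)

inflexible <- lm(y ~ x, data = train)            # straight line
flexible   <- lm(y ~ poly(x, 10), data = train)  # degree-10 polynomial

results <- data.frame(
  model     = c("linear", "degree-10 polynomial"),
  train_mse = c(mse(inflexible, train), mse(flexible, train)),
  test_mse  = c(mse(inflexible, test),  mse(flexible, test))
)
print(results)
```

The flexible fit always has a training MSE no larger than the linear fit, because the straight line is a special case of the polynomial. Whether its test MSE is also lower depends on how nonlinear f is and on how much data is available.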

Question 6)

A parametric approach assumes a functional form for f, which reduces the problem of estimating f to estimating a set of parameters; this is easier than estimating an arbitrary function and needs fewer observations. Parametric models are usually less flexible, so they are easier to interpret. The disadvantage is that the assumed form may not match the true f, in which case the model will fit the data poorly.

A non-parametric approach makes no assumption about the form of f, so it can accurately fit a much wider range of shapes by following the data points closely. The disadvantage is that a very large number of observations is required to obtain an accurate estimate of f; without many observations the fit is inaccurate and tends to overfit.
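
As a rough illustration of the difference, the sketch below fits a parametric model (lm, which here assumes a straight line and estimates two coefficients) and a non-parametric smoother (loess, which assumes no global formula for f) to the same simulated data. The data-generating curve and the sample size are made up for the example.

```r
set.seed(2)

# Simulated data from a curved relationship (hypothetical example data)
n <- 200
x <- runif(n, 0, 10)
y <- log(x + 1) + rnorm(n, sd = 0.2)
dat <- data.frame(x = x, y = y)

# Parametric: assume f(x) = b0 + b1 * x, so only two numbers are estimated
param_fit <- lm(y ~ x, data = dat)
print(coef(param_fit))

# Non-parametric: local regression, no global formula for f is assumed
nonparam_fit <- loess(y ~ x, data = dat)

# Compare the two estimates of f with the truth on a grid of x values
grid <- data.frame(x = seq(1, 9, by = 2))
comparison <- data.frame(
  x      = grid$x,
  truth  = log(grid$x + 1),
  linear = predict(param_fit, newdata = grid),
  loess  = predict(nonparam_fit, newdata = grid)
)
print(round(comparison, 3))
```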

Question 8)

8a)

college <- read.csv("https://www.statlearning.com/s/College.csv")
fix(college)
rownames(college)=college[,1]
fix(college)

8b)

college=college[,-1]
fix(college)
summary(college)
##    Private               Apps           Accept          Enroll    
##  Length:777         Min.   :   81   Min.   :   72   Min.   :  35  
##  Class :character   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242  
##  Mode  :character   Median : 1558   Median : 1110   Median : 434  
##                     Mean   : 3002   Mean   : 2019   Mean   : 780  
##                     3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902  
##                     Max.   :48094   Max.   :26330   Max.   :6392  
##    Top10perc       Top25perc      F.Undergrad     P.Undergrad     
##  Min.   : 1.00   Min.   :  9.0   Min.   :  139   Min.   :    1.0  
##  1st Qu.:15.00   1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0  
##  Median :23.00   Median : 54.0   Median : 1707   Median :  353.0  
##  Mean   :27.56   Mean   : 55.8   Mean   : 3700   Mean   :  855.3  
##  3rd Qu.:35.00   3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0  
##  Max.   :96.00   Max.   :100.0   Max.   :31643   Max.   :21836.0  
##     Outstate       Room.Board       Books           Personal   
##  Min.   : 2340   Min.   :1780   Min.   :  96.0   Min.   : 250  
##  1st Qu.: 7320   1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850  
##  Median : 9990   Median :4200   Median : 500.0   Median :1200  
##  Mean   :10441   Mean   :4358   Mean   : 549.4   Mean   :1341  
##  3rd Qu.:12925   3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700  
##  Max.   :21700   Max.   :8124   Max.   :2340.0   Max.   :6800  
##       PhD            Terminal       S.F.Ratio      perc.alumni   
##  Min.   :  8.00   Min.   : 24.0   Min.   : 2.50   Min.   : 0.00  
##  1st Qu.: 62.00   1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00  
##  Median : 75.00   Median : 82.0   Median :13.60   Median :21.00  
##  Mean   : 72.66   Mean   : 79.7   Mean   :14.09   Mean   :22.74  
##  3rd Qu.: 85.00   3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00  
##  Max.   :103.00   Max.   :100.0   Max.   :39.80   Max.   :64.00  
##      Expend        Grad.Rate     
##  Min.   : 3186   Min.   : 10.00  
##  1st Qu.: 6751   1st Qu.: 53.00  
##  Median : 8377   Median : 65.00  
##  Mean   : 9660   Mean   : 65.46  
##  3rd Qu.:10830   3rd Qu.: 78.00  
##  Max.   :56233   Max.   :118.00

8c)

pairs(college[,2:11])

attach(college)

Calling plot(Private, Outstate) directly failed with this error:

Error in plot.window(…) : need finite ‘ylim’ values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

Private is read in as a character column (see the summary above), so plot() cannot turn it into numbers. Converting it to a factor with as.factor fixes this, and plot() then draws side-by-side boxplots.

Private=as.factor(Private)
plot(Private, Outstate)
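
The same conversion can be done when the file is read. A minimal sketch, using a tiny made-up data set in place of College.csv: stringsAsFactors = TRUE asks read.csv to convert character columns to factors, which is what plot() needs in order to draw boxplots.

```r
# Tiny made-up CSV standing in for College.csv (values are not real)
csv_text <- "Private,Outstate
Yes,12000
No,6500
Yes,15000
No,7200"

as_char   <- read.csv(text = csv_text)   # character column under R >= 4.0
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_char$Private)
class(as_factor$Private)

# With a factor on the x axis, plot() draws one boxplot per level
plot(as_factor$Private, as_factor$Outstate,
     xlab = "Private", ylab = "Outstate")
```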

Elite = rep("No", nrow(college))
Elite[college$Top10perc > 50] = "Yes"
Elite = as.factor(Elite)
college = data.frame(college, Elite)
summary(Elite)
##  No Yes 
## 699  78
plot(Elite, Outstate)
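
An equivalent one-step construction, sketched here on a made-up vector rather than the College data, uses ifelse() and sets the factor levels explicitly so that their order does not depend on how the labels happen to sort.

```r
# Made-up Top10perc values for illustration
top10perc <- c(12, 55, 78, 30, 51, 50)

# Label schools with more than 50% of students from the top 10% as elite
elite <- factor(ifelse(top10perc > 50, "Yes", "No"), levels = c("No", "Yes"))

table(elite)
```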

par(mfrow=c(2,2))
hist(Top10perc)
hist(Top25perc)
hist(Room.Board)
hist(Grad.Rate)

detach(college)

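Room and board costs for most colleges cluster around 4,000 (median 4,200). Graduation rates for many colleges are around 65% (median 65). The maximum Grad.Rate of 118 is above 100%, which looks like a data error.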

Question 9)

auto <- read.csv("https://www.statlearning.com/s/Auto.csv")
auto=na.omit(auto)
attach(auto)
horsepower = as.numeric(horsepower)
## Warning: NAs introduced by coercion
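
The coercion warning arises because missing horsepower values are coded as "?" in the file, so the column is read as text and na.omit() finds nothing to drop. A minimal sketch of one way to handle this, using a few made-up rows rather than the real file: na.strings = "?" tells read.csv to treat "?" as missing.

```r
# A few made-up rows in the same shape as Auto.csv (values are illustrative)
csv_text <- "mpg,horsepower,weight
18,130,3504
25,?,2046
21,?,2875
27,88,2130"

raw <- read.csv(text = csv_text)
class(raw$horsepower)   # not numeric: "?" forces the column to text
nrow(na.omit(raw))      # all 4 rows survive, nothing is dropped

clean <- na.omit(read.csv(text = csv_text, na.strings = "?"))
class(clean$horsepower) # numeric now that "?" is read as NA
nrow(clean)             # rows with missing horsepower are removed
range(clean$horsepower)
```

Applied to the real file, this changes the number of rows, so any statistics computed afterwards would differ from those computed on the unfiltered data.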

The following uses the auto data

9a)Which are Quantitative?

mpg, displacement, weight, acceleration, horsepower, year

9b)Range of each Quantitative?

Mpg range 9, 46.6
Weight range 1613, 5140
Acceleration range 8, 24.8
Displacement range 68, 455
Horsepower range is NA, NA (the "?" entries become NA under as.numeric; range(horsepower, na.rm = TRUE) would ignore them)
Year range is 70, 82

9c)Mean and Std dev of each quantitative

mean of mpg is 23.515869 and deviation 7.8258039
mean of weight is 2970.2619647 and deviation 847.9041195
mean of accel is 15.5556675 and deviation 2.7499953
mean of displ is 193.5327456 and deviation 104.3795833
mean of horsepower is NA and deviation NA (NA values are present; pass na.rm = TRUE to mean() and sd() to ignore them)
mean of year is 75.9949622 and deviation 3.6900049
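
These summaries can be computed for all columns at once. A sketch under assumptions: the data frame below is made up (it is not the Auto data), and na.rm = TRUE is used so that a single missing value does not turn the whole result into NA.

```r
# Made-up data frame with one missing value (not the Auto data)
toy <- data.frame(
  mpg        = c(18, 25, 21, 27),
  horsepower = c(130, NA, 110, 88),
  year       = c(70, 76, 79, 82)
)

# One row per variable: min, max, mean, standard deviation
stats <- t(sapply(toy, function(v) {
  c(min  = min(v, na.rm = TRUE),
    max  = max(v, na.rm = TRUE),
    mean = mean(v, na.rm = TRUE),
    sd   = sd(v, na.rm = TRUE))
}))
print(round(stats, 2))

# Without na.rm, a single NA propagates to the result
range(toy$horsepower)
```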

9d)remove the tenth through 85th row

auto.r<-auto[-c(10:85),]
range(auto.r$mpg); mean(auto.r$mpg); sd(auto.r$mpg)
## [1] 11.0 46.6
## [1] 24.43863
## [1] 7.908184
range(auto.r$weight); mean(auto.r$weight); sd(auto.r$weight)
## [1] 1649 4997
## [1] 2933.963
## [1] 810.6429
range(auto.r$acceleration); mean(auto.r$acceleration); sd(auto.r$acceleration)
## [1]  8.5 24.8
## [1] 15.72305
## [1] 2.680514
range(auto.r$displacement); mean(auto.r$displacement); sd(auto.r$displacement)
## [1]  68 455
## [1] 187.0498
## [1] 99.63539
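# auto.r$horsepower is still character: as.numeric() above changed only the attached copy, not the data frame
range(auto.r$horsepower); mean(auto.r$horsepower); sd(auto.r$horsepower)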
## [1] "?"  "98"
## Warning in mean.default(auto.r$horsepower): argument is not numeric or logical:
## returning NA
## [1] NA
## Warning in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm =
## na.rm): NAs introduced by coercion
## [1] NA
range(auto.r$year); mean(auto.r$year); sd(auto.r$year)
## [1] 70 82
## [1] 77.15265
## [1] 3.11123

9e)using the full data set, create plots and comment findings

# The heavier the car, the lower its mpg, as shown below
plot(mpg, weight)

# In general, the more cylinders, the lower the mpg, as shown below
plot(mpg, cylinders)

# In general, the newer the car, the higher its mpg, as shown below
plot(mpg, year)
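
The direction of relationships like these can also be checked numerically with cor(). The sketch below uses R's built-in mtcars data as a stand-in (an assumption made so the example runs without downloading Auto.csv); its columns mpg, wt, and cyl play the roles of mpg, weight, and cylinders.

```r
# mtcars ships with R; it is a different data set from Auto
vars <- mtcars[, c("mpg", "wt", "cyl")]

# Pairwise Pearson correlations
cor_mat <- cor(vars)
print(round(cor_mat, 2))

# Negative values mean mpg falls as the other variable rises
cor_mat["mpg", c("wt", "cyl")]
```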

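9f) I think there is enough data to predict mpg: the plots show that mpg falls as weight and cylinders increase and rises with model year, so these variables should be useful predictors.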

Question 10)

10a)

library(MASS)
boston <- Boston
summary(boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          black       
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00
detach(auto)
attach(boston)

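The data set has 506 rows and 14 columns. Each row is a Boston suburb, and the columns are qualitative and quantitative measurements, ranging from the per-capita crime rate and the proportion of residential land zoned for large lots to the proportion of owner-occupied units built prior to 1940.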

10b) make pairwise scatterplot and describe findings

pairs(boston)

pairs(~ crim + zn + indus + dis + tax + rad + medv, boston)

10c)Any related to crime as a good predictor?

Suburbs with a higher proportion of older homes (age) tend to have higher crime rates.

plot(age, crim)

There appears to be a strong connection between the distance to employment centers and the crime rate: crime is higher in suburbs closer to the employment centers (low dis).

plot(dis, crim)

Crime rates tend to be higher where property tax rates are high.

plot(tax, crim)

A higher proportion of lower-status population (lstat) goes with a higher crime rate.

plot(lstat, crim)

The lower the median value of owner-occupied homes (medv), the higher the crime rate tends to be.

plot(medv, crim)
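
The strength of each association can be ranked with a correlation matrix. A minimal sketch, assuming the MASS package is installed; note that correlation only captures linear association, so it complements the scatterplots rather than replacing them.

```r
library(MASS)

# Correlation of every other column with the per-capita crime rate
crim_cor <- cor(Boston)["crim", ]
crim_cor <- crim_cor[names(crim_cor) != "crim"]

# Sort by absolute size so the strongest associations come first
print(round(crim_cor[order(abs(crim_cor), decreasing = TRUE)], 2))
```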

10d)Do any of the suburbs of Boston appear to have particularly high crime rates? Tax rates? Pupil-teacher ratios? Comment on the range of each predictor.

A few suburbs have very high crime rates, while most are below 10 per capita. The range is 0.00632 to 88.98.

hist(crim, breaks=15)

Many suburbs have low property tax rates, but some have high rates, up to 711. The range is 187 to 711.

hist(tax, breaks=35)

Some suburbs have a higher pupil-teacher ratio, up to 22. The range is 12.6 to 22. This is probably linked to property taxes.

hist(ptratio, breaks=10)

10e) How many suburbs bound the Charles River?

dim(subset(boston, chas == 1))
## [1] 35 14

35 suburbs border the Charles River.
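
The same count can be obtained without building a subset, for example with sum() on the logical condition or with table(). A short sketch, assuming the MASS package is installed:

```r
library(MASS)

# TRUE counts as 1, so sum() counts the suburbs with chas == 1
n_river <- sum(Boston$chas == 1)
n_river

# table() shows both groups at once
table(Boston$chas)
```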

10f) median student teacher ratio among towns in this data set

summary(ptratio)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.60   17.40   19.05   18.46   20.20   22.00

Median is 19.05

10g)

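Two suburbs (rows 399 and 406) share the lowest median value, medv = 5. Neither is by the river (chas = 0). Both have a high tax rate (666, the third quartile) and crime rates (38.35 and 67.92) far above the third quartile of 3.68.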

t(subset(boston, medv == min(boston$medv)))
##              399      406
## crim     38.3518  67.9208
## zn        0.0000   0.0000
## indus    18.1000  18.1000
## chas      0.0000   0.0000
## nox       0.6930   0.6930
## rm        5.4530   5.6830
## age     100.0000 100.0000
## dis       1.4896   1.4254
## rad      24.0000  24.0000
## tax     666.0000 666.0000
## ptratio  20.2000  20.2000
## black   396.9000 384.9700
## lstat    30.5900  22.9800
## medv      5.0000   5.0000

10h)In this data set, how many of the suburbs average more than seven rooms per dwelling? More than eight rooms per dwelling? Comment on the suburbs that average more than eight rooms per dwelling.

dim(subset(boston, rm > 7))
## [1] 64 14

64 suburbs average more than 7 rooms per dwelling.

dim(subset(boston, rm > 8))
## [1] 13 14

13 suburbs average more than 8 rooms per dwelling.

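Below, the suburbs averaging more than 8 rooms are compared with all of Boston.

The median crime rate for the suburbs averaging more than 8 rooms (0.52) is higher than the median for all of Boston (0.26), although their mean crime rate is much lower (0.72 vs 3.61). They have a lower proportion of non-retail business (median indus 6.2 vs 9.69), a much lower lstat (median 4.14 vs 11.36), and a much higher medv (median 48.3 vs 21.2).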

summary(subset(boston, rm > 8))
##       crim               zn            indus             chas       
##  Min.   :0.02009   Min.   : 0.00   Min.   : 2.680   Min.   :0.0000  
##  1st Qu.:0.33147   1st Qu.: 0.00   1st Qu.: 3.970   1st Qu.:0.0000  
##  Median :0.52014   Median : 0.00   Median : 6.200   Median :0.0000  
##  Mean   :0.71879   Mean   :13.62   Mean   : 7.078   Mean   :0.1538  
##  3rd Qu.:0.57834   3rd Qu.:20.00   3rd Qu.: 6.200   3rd Qu.:0.0000  
##  Max.   :3.47428   Max.   :95.00   Max.   :19.580   Max.   :1.0000  
##       nox               rm             age             dis       
##  Min.   :0.4161   Min.   :8.034   Min.   : 8.40   Min.   :1.801  
##  1st Qu.:0.5040   1st Qu.:8.247   1st Qu.:70.40   1st Qu.:2.288  
##  Median :0.5070   Median :8.297   Median :78.30   Median :2.894  
##  Mean   :0.5392   Mean   :8.349   Mean   :71.54   Mean   :3.430  
##  3rd Qu.:0.6050   3rd Qu.:8.398   3rd Qu.:86.50   3rd Qu.:3.652  
##  Max.   :0.7180   Max.   :8.780   Max.   :93.90   Max.   :8.907  
##       rad              tax           ptratio          black      
##  Min.   : 2.000   Min.   :224.0   Min.   :13.00   Min.   :354.6  
##  1st Qu.: 5.000   1st Qu.:264.0   1st Qu.:14.70   1st Qu.:384.5  
##  Median : 7.000   Median :307.0   Median :17.40   Median :386.9  
##  Mean   : 7.462   Mean   :325.1   Mean   :16.36   Mean   :385.2  
##  3rd Qu.: 8.000   3rd Qu.:307.0   3rd Qu.:17.40   3rd Qu.:389.7  
##  Max.   :24.000   Max.   :666.0   Max.   :20.20   Max.   :396.9  
##      lstat           medv     
##  Min.   :2.47   Min.   :21.9  
##  1st Qu.:3.32   1st Qu.:41.7  
##  Median :4.14   Median :48.3  
##  Mean   :4.31   Mean   :44.2  
##  3rd Qu.:5.12   3rd Qu.:50.0  
##  Max.   :7.44   Max.   :50.0
summary(boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          black       
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00
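
Reading two full summaries side by side is error-prone, so a compact comparison may help. A minimal sketch, assuming the MASS package is installed; the choice of columns and the use of medians are illustrative, not part of the exercise.

```r
library(MASS)

big_rooms <- subset(Boston, rm > 8)

# Medians of selected columns for the rm > 8 suburbs and for all suburbs
cols <- c("crim", "indus", "lstat", "medv", "ptratio")
cmp <- rbind(
  rm_gt_8 = sapply(big_rooms[, cols], median),
  all     = sapply(Boston[, cols], median)
)
print(round(cmp, 2))
```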

#The End