An Introduction to Statistical Learning

2. Statistical Learning - Exercises

Conceptual

1. For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer.

The sample size n is extremely large, and the number of predictors p is small.
better, with a large sample size the model is less likely to overfit when we increase flexibility
The number of predictors p is extremely large, and the number of observations n is small.
worse, with a small sample size a more flexible model will overfit the training data very fast
The relationship between the predictors and response is highly non-linear.
better, since less flexible models like linear regression perform poorly on highly non-linear data.
The variance of the error terms, i.e. σ² = Var(ε), is extremely high.
worse, highly flexible models will follow the noise too closely -> overfitting

2. Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n and p.

We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interested in understanding which factors affect CEO salary.

n = 500, p = 3 (profit, number of employees, industry). CEO salary is a continuous variable, so this is a regression problem. Since we are interested in understanding which factors influence CEO salary, it is an inference problem.

We are considering launching a new product and wish to know whether it will be a success or a failure. We collect data on 20 similar products that were previously launched. For each product we have recorded whether it was a success or failure, price charged for the product, marketing budget, competition price, and ten other variables.

n = 20, p = 13, success or failure -> categorical -> classification. Since we want to know whether a new product will be a success or a failure, it is a prediction problem.

We are interested in predicting the % change in the USD/Euro exchange rate in relation to the weekly changes in the world stock markets. Hence we collect weekly data for all of 2012. For each week we record the % change in the USD/Euro, the % change in the US market, the % change in the British market, and the % change in the German market.

n = 52, p = 3, since y is continuous -> regression, prediction

3. We now revisit the bias-variance decomposition.

Provide a sketch of typical (squared) bias, variance, training error, test error, and Bayes (or irreducible) error curves, on a single plot, as we go from less flexible statistical learning methods towards more flexible approaches. The x-axis should represent the amount of flexibility in the method, and the y-axis should represent the values for each curve. There should be five curves. Make sure to label each one.

Explain why each of the five curves has the shape displayed in part (a).
- train-MSE: decreases monotonically with increasing flexibility, up to the point where the model fits the training data perfectly
- test-MSE: U-shaped; declines sharply at first as flexibility increases, but starts to rise again once the model begins to overfit the training data (the model can't generalize to new data)
- Var(ε): constant, the irreducible error; the lowest value the expected test MSE can achieve
- squared bias: the error introduced by approximating a complicated real-world problem with a simple model; decreases with increasing flexibility
- variance: the amount by which \(\hat{f}\) would change if we estimated it using different training data; increases with increasing flexibility
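
The shapes of these curves are tied together by the usual decomposition of the expected test MSE at a point \(x_0\). It is written here in the same notation as \(\hat{f}\) above, with \(y_0\) and \(x_0\) denoting a test observation (symbols introduced only for this formula):

```latex
\[
E\left(y_0 - \hat{f}(x_0)\right)^2
  = \mathrm{Var}\left(\hat{f}(x_0)\right)
  + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2
  + \mathrm{Var}(\epsilon)
\]
```

With increasing flexibility the first term grows and the second shrinks, while the third stays constant. Their sum, the expected test MSE, is therefore U-shaped and never falls below \(\mathrm{Var}(\epsilon)\).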

4. You will now think of some real-life applications for statistical learning.

Describe three real-life applications in which classification might be useful. Describe the response, as well as the predictors. Is the goal of each application inference or prediction? Explain your answer.
- Image classification of plants: predictors = 3-dimensional RGB matrices, response = plant name, prediction
- Predict the gender of a person based on different predictors like salary, education, etc., response = gender, prediction
- credit card fraud detection, a ton of different predictors, fraud = response, prediction
Describe three real-life applications in which regression might be useful. Describe the response, as well as the predictors. Is the goal of each application inference or prediction? Explain your answer.
- predict the success chance of candies: predictors = ingredients as well as several other features, response = success chance, prediction
- which features make a good candy: analyse ingredients and other features to infer which are the most important ones for a candy's success -> inference
- predict housing prices in certain areas based on several features such as number of rooms, ocean proximity, lat, long, etc, response = price, prediction
Describe three real-life applications in which cluster analysis might be useful.
- discover customer segments
- discover similar music
- discover similar movies

5. What are the advantages and disadvantages of a very flexible (versus a less flexible) approach for regression or classification? Under what circumstances might a more flexible approach be preferred to a less flexible approach? When might a less flexible approach be preferred?

The advantages of a flexible approach are that it reduces bias and fits non-linear relationships better. The disadvantages are that it requires estimating a greater number of parameters, increases the variance of the model and will overfit at some point. A more flexible approach is preferred if we are interested in prediction rather than the interpretability of the results. A less flexible approach is preferred if we are interested in inference and the interpretability of the results.
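
The trade-off can be illustrated with a small simulation. This is only a sketch under assumed settings: the true function `sin(2 * x)`, the noise level, the sample sizes and the use of polynomial degree as a stand-in for flexibility are all made up for illustration and are not part of the exercise.

```r
# Simulated illustration; every setting below is an assumption.
set.seed(1)
f <- function(x) sin(2 * x)                 # assumed "true" non-linear f
n <- 50
train <- data.frame(x = runif(n, 0, 3))
train$y <- f(train$x) + rnorm(n, sd = 0.3)
test <- data.frame(x = runif(1000, 0, 3))
test$y <- f(test$x) + rnorm(1000, sd = 0.3)

# Polynomial degree plays the role of flexibility.
degrees <- 1:10
mse <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d,
    train = mean((train$y - predict(fit, train))^2),
    test  = mean((test$y - predict(fit, test))^2))
}))
print(round(mse, 3))
```

Because the models are nested, the training MSE cannot increase with the degree. The test MSE is the column to watch when choosing how flexible the model should be.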

6. Describe the differences between a parametric and a non-parametric statistical learning approach. What are the advantages of a parametric approach to regression or classification (as opposed to a nonparametric approach)? What are its disadvantages?

A parametric approach reduces the problem of estimating f to one of estimating a set of parameters, because it assumes a functional form for f. A non-parametric approach does not assume a particular form for f, but it requires a very large sample size to estimate f accurately. Advantages: modelling f is simplified to estimating a set of parameters. Disadvantages: the model we choose will usually not match the true form of f, and if it is too far from the truth the estimate will be poor. We can address this by choosing a more flexible model, but the more flexible our model is, the more it tends to overfit.
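
As a rough sketch of the difference, the snippet below fits a linear model (parametric) and a local regression (non-parametric) to the same simulated data. The data-generating function, the noise level and the `span` value are assumptions chosen only for illustration.

```r
set.seed(2)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)    # assumed non-linear truth

# Parametric: assume f(x) = b0 + b1 * x, so only two numbers are estimated.
fit_param <- lm(y ~ x)
n_params <- length(coef(fit_param))

# Non-parametric: local regression makes no global assumption about the form of f.
fit_nonparam <- loess(y ~ x, span = 0.3)

train_mse <- c(parametric = mean(residuals(fit_param)^2),
               nonparametric = mean(residuals(fit_nonparam)^2))
print(n_params)
print(round(train_mse, 3))
```

The linear fit is summarised by its two coefficients, whereas the local regression has no comparably compact description and relies on having enough observations near each point.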

7. The table below provides a training data set containing six observations, three predictors, and one qualitative response variable. Suppose we wish to use this data set to make a prediction for Y when X1 = X2 = X3 = 0 using K-nearest neighbors.

Obs.	X₁	X₂	X₃	Y
1	0	3	0	Red
2	2	0	0	Red
3	0	1	3	Red
4	0	1	2	Green
5	-1	0	1	Green
6	1	1	1	Red

Compute the Euclidean distance between each observation and the test point, X1 = X2 = X3 = 0

obs1 = c(0, 3, 0)
obs2 = c(2, 0, 0)
obs3 = c(0, 1, 3)
obs4 = c(0, 1, 2)
obs5 = c(-1, 0, 1)
obs6 = c(1, 1, 1)
obs0 = c(0, 0, 0)
sqrt(sum((obs1-obs0)^2))

## [1] 3

sqrt(sum((obs2-obs0)^2))

## [1] 2

sqrt(sum((obs3-obs0)^2))

## [1] 3.162278

sqrt(sum((obs4-obs0)^2))

## [1] 2.236068

sqrt(sum((obs5-obs0)^2))

## [1] 1.414214

sqrt(sum((obs6-obs0)^2))

## [1] 1.732051

What is our prediction with K = 1? Why?
nearest point is obs5 –> prediction is green
What is our prediction with K = 3? Why?
nearest points are obs5, obs6 and obs2 (Green, Red, Red) -> prediction is Red with estimated probability 2/3
If the Bayes decision boundary in this problem is highly nonlinear, then would we expect the best value for K to be large or small? Why?
When K becomes larger, the decision boundary becomes inflexible (almost linear). Therefore K should be small
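
The distance calculations and both predictions can also be done in one pass. The following is a minimal sketch using the table from this exercise; `knn_predict` is a hypothetical helper written for this illustration, not a function from a package.

```r
# Training data from the table in this exercise.
X <- rbind(c(0, 3, 0), c(2, 0, 0), c(0, 1, 3),
           c(0, 1, 2), c(-1, 0, 1), c(1, 1, 1))
Y <- c("Red", "Red", "Red", "Green", "Green", "Red")
x0 <- c(0, 0, 0)

# Euclidean distance from every row of X to the test point x0.
dists <- sqrt(rowSums(sweep(X, 2, x0)^2))

# Hypothetical helper: majority vote among the k nearest observations.
knn_predict <- function(k) {
  nearest <- order(dists)[1:k]
  votes <- table(Y[nearest])
  names(votes)[which.max(votes)]
}

print(knn_predict(1))  # "Green"
print(knn_predict(3))  # "Red"
```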

Applied

8. College Data

Use the read.csv() function to read the data into R. Call the loaded data “college”. Make sure that you have the directory set to the correct location for the data.

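library(ISLR)   # the package also ships the same data as the data frame College
data(College)
# stringsAsFactors = TRUE keeps Private as a factor in R >= 4.0, which the summary and boxplots below rely on
college = read.csv("College.csv", stringsAsFactors = TRUE)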

Look at the data using the fix() function. You should notice that the first column is just the name of each university. We don’t really want R to treat this as data. However, it may be handy to have these names for later.

rownames(college) = college[, 1]
college = college[, -1]
fix(college)

Use the summary() function to produce a numerical summary of the variables in the data set.

summary(college)

##  Private        Apps           Accept          Enroll       Top10perc    
##  No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00  
##  Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00  
##            Median : 1558   Median : 1110   Median : 434   Median :23.00  
##            Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56  
##            3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00  
##            Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00  
##    Top25perc      F.Undergrad     P.Undergrad         Outstate    
##  Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340  
##  1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320  
##  Median : 54.0   Median : 1707   Median :  353.0   Median : 9990  
##  Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441  
##  3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925  
##  Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700  
##    Room.Board       Books           Personal         PhD        
##  Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00  
##  1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00  
##  Median :4200   Median : 500.0   Median :1200   Median : 75.00  
##  Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66  
##  3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00  
##  Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00  
##     Terminal       S.F.Ratio      perc.alumni        Expend     
##  Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186  
##  1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751  
##  Median : 82.0   Median :13.60   Median :21.00   Median : 8377  
##  Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660  
##  3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830  
##  Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233  
##    Grad.Rate     
##  Min.   : 10.00  
##  1st Qu.: 53.00  
##  Median : 65.00  
##  Mean   : 65.46  
##  3rd Qu.: 78.00  
##  Max.   :118.00

Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data.

pairs(college[, 1:10])

Use the plot() function to produce side-by-side boxplots of “Outstate” versus “Private”.

plot(college$Private, college$Outstate, xlab = "Private University", ylab ="Out of State tuition in USD", main = "Outstate Tuition Plot", col=c('powderblue', 'mistyrose'))

Create a new qualitative variable, called Elite, by binning the Top10perc variable.

Elite=rep("No",nrow(college))
Elite[college$Top10perc > 50]="Yes"   # label without a leading space
Elite=as.factor(Elite)
college=data.frame(college, Elite)
summary(college$Elite)

##  No Yes 
## 699  78

plot(college$Elite, college$Outstate, xlab = "Elite University", ylab ="Out of State tuition in USD", main = "Outstate Tuition Plot", col=c('powderblue', 'mistyrose'))
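
The binning step can also be written as a single expression. This is a small sketch on a made-up vector; `top10perc` below is a stand-in for `college$Top10perc`, not real data.

```r
# Made-up values standing in for college$Top10perc.
top10perc <- c(12, 51, 50, 96, 23)

# Explicit levels fix the order of the categories and keep the labels free of stray whitespace.
elite <- factor(ifelse(top10perc > 50, "Yes", "No"), levels = c("No", "Yes"))
print(table(elite))
```

Note that a value of exactly 50 falls into "No", because the rule is a strict inequality.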

Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables.

par(mfrow = c(2,2))
hist(college$Apps, breaks = 50, col = 5, xlab = "Applications Received", ylab = "Count")
hist(college$PhD, breaks = 20, col = 2, xlab = "Percent of Faculty with PhD", ylab = "Count")
hist(college$Grad.Rate, breaks = 10, col = 3, xlab = "Graduation Rate", ylab = "Count")
hist(college$Personal, breaks = 30, col = 4, xlab = "Estimated Personal Spending", ylab = "Count")

Continue exploring the data, and provide a brief summary of what you discover.

summary(college$Grad.Rate)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.00   53.00   65.00   65.46   78.00  118.00

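A graduation rate of 118% is impossible, so at least one record must contain an error. The maximum PhD value of 103% in the summary above has the same problem.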

9. This exercise involves the “Auto” data set studied in the lab. Make sure the missing values have been removed from the data.

Which of the predictors are quantitative, and which are qualitative?

Auto = read.csv("Auto.csv", na.strings = "?")
Auto = na.omit(Auto)
str(Auto)

## 'data.frame':    392 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : int  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : int  3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : int  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
##  - attr(*, "na.action")= 'omit' Named int  33 127 331 337 355
##   ..- attr(*, "names")= chr  "33" "127" "331" "337" ...

All but origin and name are quantitative. origin is stored as an integer, but it is a code for the region of origin, so it is qualitative.

What is the range of each quantitative predictor?

range(Auto$mpg)

## [1]  9.0 46.6

range(Auto$cylinders)

## [1] 3 8

range(Auto$displacement)

## [1]  68 455

range(Auto$horsepower)

## [1]  46 230

range(Auto$weight)

## [1] 1613 5140

range(Auto$acceleration)

## [1]  8.0 24.8

range(Auto$year)

## [1] 70 82

What is the mean and standard deviation of each quantitative predictor?

sapply(Auto[,1:7], mean)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##    23.445918     5.471939   194.411990   104.469388  2977.584184    15.541327 
##         year 
##    75.979592

sapply(Auto[,1:7], sd)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##     7.805007     1.705783   104.644004    38.491160   849.402560     2.758864 
##         year 
##     3.683737

Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?

Auto_dropped = Auto[-(10:85),-(8:9)]
sapply(Auto_dropped, range)

##       mpg cylinders displacement horsepower weight acceleration year
## [1,] 11.0         3           68         46   1649          8.5   70
## [2,] 46.6         8          455        230   4997         24.8   82

sapply(Auto_dropped, mean)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##    24.404430     5.373418   187.240506   100.721519  2935.971519    15.726899 
##         year 
##    77.145570

sapply(Auto_dropped, sd)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##     7.867283     1.654179    99.678367    35.708853   811.300208     2.693721 
##         year 
##     3.106217

Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings.

pairs(Auto[,1:7])

cor(Auto[,1:7], method = "pearson")

##                     mpg  cylinders displacement horsepower     weight
## mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442
## cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273
## displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944
## horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377
## weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000
## acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392
## year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199
##              acceleration       year
## mpg             0.4233285  0.5805410
## cylinders      -0.5046834 -0.3456474
## displacement   -0.5438005 -0.3698552
## horsepower     -0.6891955 -0.4163615
## weight         -0.4168392 -0.3091199
## acceleration    1.0000000  0.2903161
## year            0.2903161  1.0000000

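Several predictors are heavily correlated with each other: cylinders, displacement, horsepower and weight all have pairwise correlations of at least 0.84. acceleration and year are only moderately correlated with the other variables.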

Suppose that we wish to predict gas mileage (“mpg”) on the basis of other variables. Do your plots suggest that any of the other variables might be useful in predicting “mpg”?

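Yes, since every other variable is at least moderately correlated with mpg: weight (-0.83), displacement (-0.81), horsepower (-0.78) and cylinders (-0.78) strongly and negatively, year (0.58) and acceleration (0.42) moderately and positively.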
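
To see the ranking at a glance, the correlations with mpg can be sorted by absolute value. The sketch below copies the mpg column of the correlation matrix above (rounded to four decimals) so that it runs without the CSV file.

```r
# Correlations with mpg, copied from the cor() output above (rounded to four decimals).
mpg_cor <- c(cylinders = -0.7776, displacement = -0.8051, horsepower = -0.7784,
             weight = -0.8322, acceleration = 0.4233, year = 0.5805)

# Strongest linear association first, regardless of sign.
ranked <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(ranked)
```

With the data loaded, the same vector should be obtainable directly as `cor(Auto[, 1:7])[-1, "mpg"]`.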

10. This exercise involves the “Boston” housing data set.

To begin, load in the “Boston” data set.

require(MASS)
data(Boston)
str(Boston)

## 'data.frame':    506 obs. of  14 variables:
##  $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ rm     : num  6.58 6.42 7.18 7 7.15 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
##  $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ black  : num  397 397 393 395 397 ...
##  $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

Make some pairwise scatterplots of the predictors in this data set.

pairs(Boston)

Are any of the predictors associated with per capita crime rate?

cor(Boston, method = "pearson")

##                crim          zn       indus         chas         nox
## crim     1.00000000 -0.20046922  0.40658341 -0.055891582  0.42097171
## zn      -0.20046922  1.00000000 -0.53382819 -0.042696719 -0.51660371
## indus    0.40658341 -0.53382819  1.00000000  0.062938027  0.76365145
## chas    -0.05589158 -0.04269672  0.06293803  1.000000000  0.09120281
## nox      0.42097171 -0.51660371  0.76365145  0.091202807  1.00000000
## rm      -0.21924670  0.31199059 -0.39167585  0.091251225 -0.30218819
## age      0.35273425 -0.56953734  0.64477851  0.086517774  0.73147010
## dis     -0.37967009  0.66440822 -0.70802699 -0.099175780 -0.76923011
## rad      0.62550515 -0.31194783  0.59512927 -0.007368241  0.61144056
## tax      0.58276431 -0.31456332  0.72076018 -0.035586518  0.66802320
## ptratio  0.28994558 -0.39167855  0.38324756 -0.121515174  0.18893268
## black   -0.38506394  0.17552032 -0.35697654  0.048788485 -0.38005064
## lstat    0.45562148 -0.41299457  0.60379972 -0.053929298  0.59087892
## medv    -0.38830461  0.36044534 -0.48372516  0.175260177 -0.42732077
##                  rm         age         dis          rad         tax    ptratio
## crim    -0.21924670  0.35273425 -0.37967009  0.625505145  0.58276431  0.2899456
## zn       0.31199059 -0.56953734  0.66440822 -0.311947826 -0.31456332 -0.3916785
## indus   -0.39167585  0.64477851 -0.70802699  0.595129275  0.72076018  0.3832476
## chas     0.09125123  0.08651777 -0.09917578 -0.007368241 -0.03558652 -0.1215152
## nox     -0.30218819  0.73147010 -0.76923011  0.611440563  0.66802320  0.1889327
## rm       1.00000000 -0.24026493  0.20524621 -0.209846668 -0.29204783 -0.3555015
## age     -0.24026493  1.00000000 -0.74788054  0.456022452  0.50645559  0.2615150
## dis      0.20524621 -0.74788054  1.00000000 -0.494587930 -0.53443158 -0.2324705
## rad     -0.20984667  0.45602245 -0.49458793  1.000000000  0.91022819  0.4647412
## tax     -0.29204783  0.50645559 -0.53443158  0.910228189  1.00000000  0.4608530
## ptratio -0.35550149  0.26151501 -0.23247054  0.464741179  0.46085304  1.0000000
## black    0.12806864 -0.27353398  0.29151167 -0.444412816 -0.44180801 -0.1773833
## lstat   -0.61380827  0.60233853 -0.49699583  0.488676335  0.54399341  0.3740443
## medv     0.69535995 -0.37695457  0.24992873 -0.381626231 -0.46853593 -0.5077867
##               black      lstat       medv
## crim    -0.38506394  0.4556215 -0.3883046
## zn       0.17552032 -0.4129946  0.3604453
## indus   -0.35697654  0.6037997 -0.4837252
## chas     0.04878848 -0.0539293  0.1752602
## nox     -0.38005064  0.5908789 -0.4273208
## rm       0.12806864 -0.6138083  0.6953599
## age     -0.27353398  0.6023385 -0.3769546
## dis      0.29151167 -0.4969958  0.2499287
## rad     -0.44441282  0.4886763 -0.3816262
## tax     -0.44180801  0.5439934 -0.4685359
## ptratio -0.17738330  0.3740443 -0.5077867
## black    1.00000000 -0.3660869  0.3334608
## lstat   -0.36608690  1.0000000 -0.7376627
## medv     0.33346082 -0.7376627  1.0000000

The variables rad (0.63) and tax (0.58) have the strongest linear relationship with crime per capita, followed by lstat (0.46) and nox (0.42).
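
Rather than reading the ranking off the full matrix, the crim column can be extracted and sorted. This is a minimal sketch that assumes the MASS package is installed, as in the code above.

```r
library(MASS)  # provides the Boston data frame

crim_cor <- cor(Boston)[, "crim"]
crim_cor <- crim_cor[names(crim_cor) != "crim"]   # drop the self-correlation of 1
ranked <- crim_cor[order(abs(crim_cor), decreasing = TRUE)]
print(round(ranked, 3))
```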

Do any of the suburbs of Boston appear to have particularly high crime rates? Tax rates? Pupil-teacher ratios?

hist(Boston$crim, breaks = 25)

nrow(Boston[Boston$crim > 20, ])

## [1] 18

hist(Boston$tax, breaks = 25)

hist(Boston$ptratio, breaks = 25)

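The crime rate is heavily right-skewed: the median is only 0.26, but 18 suburbs have a rate above 20 and the maximum is 88.98. For tax, at least a quarter of the suburbs sit at 666 or above (the third quartile is 666, the maximum 711), which looks more like a separate high-tax group than isolated outliers. The pupil-teacher ratio only ranges from 12.6 to 22.0, so there are no extreme values on the high side.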
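
One way to make "outlier" concrete is the 1.5 * IQR rule that boxplots commonly use. The sketch below applies it to the three variables; `count_outliers` is a hypothetical helper written for this illustration, and the rule itself is just one common convention.

```r
library(MASS)  # provides the Boston data frame

# Hypothetical helper: count values beyond 1.5 * IQR from the quartiles.
count_outliers <- function(v) {
  q <- quantile(v, c(0.25, 0.75))
  fence <- 1.5 * (q[2] - q[1])
  sum(v < q[1] - fence | v > q[2] + fence)
}

outlier_counts <- sapply(Boston[, c("crim", "tax", "ptratio")], count_outliers)
print(outlier_counts)
```

The count depends entirely on the chosen rule, so it is best read alongside the histograms rather than instead of them.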

How many of the suburbs in this data set bound the Charles river?

nrow(Boston[Boston$chas == 1, ])

## [1] 35

What is the median pupil-teacher ratio among the towns in this data set?

median(Boston$ptratio)

## [1] 19.05

Which suburb of Boston has lowest median value of owner-occupied homes? What are the values of the other predictors for that suburb, and how do those values compare to the overall ranges for those predictors?

t(Boston[Boston$medv == min(Boston$medv),])

##              399      406
## crim     38.3518  67.9208
## zn        0.0000   0.0000
## indus    18.1000  18.1000
## chas      0.0000   0.0000
## nox       0.6930   0.6930
## rm        5.4530   5.6830
## age     100.0000 100.0000
## dis       1.4896   1.4254
## rad      24.0000  24.0000
## tax     666.0000 666.0000
## ptratio  20.2000  20.2000
## black   396.9000 384.9700
## lstat    30.5900  22.9800
## medv      5.0000   5.0000

sapply(Boston, quantile)

##           crim    zn indus chas   nox     rm     age       dis rad tax ptratio
## 0%    0.006320   0.0  0.46    0 0.385 3.5610   2.900  1.129600   1 187   12.60
## 25%   0.082045   0.0  5.19    0 0.449 5.8855  45.025  2.100175   4 279   17.40
## 50%   0.256510   0.0  9.69    0 0.538 6.2085  77.500  3.207450   5 330   19.05
## 75%   3.677083  12.5 18.10    0 0.624 6.6235  94.075  5.188425  24 666   20.20
## 100% 88.976200 100.0 27.74    1 0.871 8.7800 100.000 12.126500  24 711   22.00
##         black  lstat   medv
## 0%     0.3200  1.730  5.000
## 25%  375.3775  6.950 17.025
## 50%  391.4400 11.360 21.200
## 75%  396.2250 16.955 25.000
## 100% 396.9000 37.970 50.000

Suburbs 399 and 406 tie for the lowest medv (5.0). Compared with the quantiles above, both have a crime rate far above the third quartile (38.35 and 67.92 vs. 3.68), age at its maximum (100), rad at its maximum (24), tax and ptratio at their third quartiles (666 and 20.2), lstat well above its third quartile and rm below its first quartile.

In this data set, how many of the suburbs average more than seven rooms per dwelling? More than eight rooms per dwelling?

nrow(Boston[Boston$rm > 7,])

## [1] 64

nrow(Boston[Boston$rm > 8,])

## [1] 13
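
The same counts can be obtained without subsetting the data frame, since summing a logical vector counts its TRUE values. This is a short sketch that assumes the MASS package is installed.

```r
library(MASS)  # provides the Boston data frame

more_than_7 <- sum(Boston$rm > 7)    # number of suburbs averaging more than seven rooms
more_than_8 <- sum(Boston$rm > 8)    # number of suburbs averaging more than eight rooms
share_8 <- mean(Boston$rm > 8)       # the same condition as a proportion of all suburbs
print(c(more_than_7 = more_than_7, more_than_8 = more_than_8))
print(round(share_8, 3))
```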

summary(subset(Boston, rm > 8))

##       crim               zn            indus             chas       
##  Min.   :0.02009   Min.   : 0.00   Min.   : 2.680   Min.   :0.0000  
##  1st Qu.:0.33147   1st Qu.: 0.00   1st Qu.: 3.970   1st Qu.:0.0000  
##  Median :0.52014   Median : 0.00   Median : 6.200   Median :0.0000  
##  Mean   :0.71879   Mean   :13.62   Mean   : 7.078   Mean   :0.1538  
##  3rd Qu.:0.57834   3rd Qu.:20.00   3rd Qu.: 6.200   3rd Qu.:0.0000  
##  Max.   :3.47428   Max.   :95.00   Max.   :19.580   Max.   :1.0000  
##       nox               rm             age             dis       
##  Min.   :0.4161   Min.   :8.034   Min.   : 8.40   Min.   :1.801  
##  1st Qu.:0.5040   1st Qu.:8.247   1st Qu.:70.40   1st Qu.:2.288  
##  Median :0.5070   Median :8.297   Median :78.30   Median :2.894  
##  Mean   :0.5392   Mean   :8.349   Mean   :71.54   Mean   :3.430  
##  3rd Qu.:0.6050   3rd Qu.:8.398   3rd Qu.:86.50   3rd Qu.:3.652  
##  Max.   :0.7180   Max.   :8.780   Max.   :93.90   Max.   :8.907  
##       rad              tax           ptratio          black      
##  Min.   : 2.000   Min.   :224.0   Min.   :13.00   Min.   :354.6  
##  1st Qu.: 5.000   1st Qu.:264.0   1st Qu.:14.70   1st Qu.:384.5  
##  Median : 7.000   Median :307.0   Median :17.40   Median :386.9  
##  Mean   : 7.462   Mean   :325.1   Mean   :16.36   Mean   :385.2  
##  3rd Qu.: 8.000   3rd Qu.:307.0   3rd Qu.:17.40   3rd Qu.:389.7  
##  Max.   :24.000   Max.   :666.0   Max.   :20.20   Max.   :396.9  
##      lstat           medv     
##  Min.   :2.47   Min.   :21.9  
##  1st Qu.:3.32   1st Qu.:41.7  
##  Median :4.14   Median :48.3  
##  Mean   :4.31   Mean   :44.2  
##  3rd Qu.:5.12   3rd Qu.:50.0  
##  Max.   :7.44   Max.   :50.0

summary(Boston)

##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08204   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          black       
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00