library('plotly')
library('dplyr')
library('ggplot2')
James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. Springer. 7th
The table below provides a training data set containing six observations, three predictors, and one qualitative response variable.

Obs.  X1  X2  X3  Y
1      0   3   0  Red
2      2   0   0  Red
3      0   1   3  Red
4      0   1   2  Green
5     -1   0   1  Green
6      1   1   1  Red

Suppose we wish to use this data set to make a prediction for Y when X1 = X2 = X3 = 0 using K-nearest neighbors.
(a) Compute the Euclidean distance between each observation and the test point, X1 = X2 = X3 = 0.
(b) What is our prediction with K = 1? Why?
(c) What is our prediction with K = 3? Why?
(d) If the Bayes decision boundary in this problem is highly nonlinear, then would we expect the best value for K to be large or small? Why?
df <- data.frame(X1=c(0,2,0,0,-1,1),
X2=c(3,0,1,1,0,1),
X3=c(0,0,3,2,1,1),
Y=c('Red','Red','Red','Green','Green','Red'))
df <- df %>% mutate(d=sqrt(X1^2 +X2*X2 + X3*X3)) %>% arrange(d)
df
## X1 X2 X3 Y d
## 1 -1 0 1 Green 1.414214
## 2 1 1 1 Red 1.732051
## 3 2 0 0 Red 2.000000
## 4 0 1 2 Green 2.236068
## 5 0 3 0 Red 3.000000
## 6 0 1 3 Red 3.162278
In the df table, d is the Euclidean distance from the test point X0 = (0, 0, 0) to each of the six training points, sorted in ascending order for the K-nearest-neighbors prediction.
If K = 1, the nearest observation is (-1, 0, 1) with Y = Green, so we predict Green for the test point X0.
If K = 3, the three nearest neighbors of X0 have Y values Green, Red, and Red, so by majority vote we predict Red.
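If the Bayes decision boundary is highly nonlinear, we expect the best K to be small. A small K gives a flexible classifier (low bias, higher variance) that can follow a nonlinear boundary, whereas a large K averages over many neighbors and produces a smoother, closer-to-linear boundary.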
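The K = 1 and K = 3 predictions can be reproduced with a small majority-vote helper. This is a minimal sketch in base R; the function name knn_predict and its arguments are illustrative and not part of the original solution. Ties between classes are broken alphabetically by which.max, which is an arbitrary choice.

```r
# Training data from the exercise
train <- data.frame(
  X1 = c(0, 2, 0, 0, -1, 1),
  X2 = c(3, 0, 1, 1, 0, 1),
  X3 = c(0, 0, 3, 2, 1, 1),
  Y  = c('Red', 'Red', 'Red', 'Green', 'Green', 'Red')
)

# Hypothetical helper: majority vote among the k nearest training points
knn_predict <- function(train, x0, k) {
  X <- as.matrix(train[, c('X1', 'X2', 'X3')])
  # Euclidean distance from every training row to the test point x0
  d <- sqrt(rowSums(sweep(X, 2, x0)^2))
  nearest <- order(d)[seq_len(k)]
  votes <- table(train$Y[nearest])
  names(votes)[which.max(votes)]
}

knn_predict(train, c(0, 0, 0), k = 1)  # "Green"
knn_predict(train, c(0, 0, 0), k = 3)  # "Red"
```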
p <- plot_ly(df, x=~X1, y=~X2, z=~X3, color=~Y, colors=c('green','red'))
p
## No trace type specified:
## Based on info supplied, a 'scatter3d' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#scatter3d
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
College dataset, page 54
This installs (if needed) and loads the ISLR package, which contains the sample data sets from the book.
# require() loads ISLR when it is available; install it only if loading fails
if (!require('ISLR')) {
  install.packages('ISLR')
  library('ISLR')
}
## Loading required package: ISLR
Summarize the variables in the data set.
summary(College)
## Private Apps Accept Enroll Top10perc
## No :212 Min. : 81 Min. : 72 Min. : 35 Min. : 1.00
## Yes:565 1st Qu.: 776 1st Qu.: 604 1st Qu.: 242 1st Qu.:15.00
## Median : 1558 Median : 1110 Median : 434 Median :23.00
## Mean : 3002 Mean : 2019 Mean : 780 Mean :27.56
## 3rd Qu.: 3624 3rd Qu.: 2424 3rd Qu.: 902 3rd Qu.:35.00
## Max. :48094 Max. :26330 Max. :6392 Max. :96.00
## Top25perc F.Undergrad P.Undergrad Outstate
## Min. : 9.0 Min. : 139 Min. : 1.0 Min. : 2340
## 1st Qu.: 41.0 1st Qu.: 992 1st Qu.: 95.0 1st Qu.: 7320
## Median : 54.0 Median : 1707 Median : 353.0 Median : 9990
## Mean : 55.8 Mean : 3700 Mean : 855.3 Mean :10441
## 3rd Qu.: 69.0 3rd Qu.: 4005 3rd Qu.: 967.0 3rd Qu.:12925
## Max. :100.0 Max. :31643 Max. :21836.0 Max. :21700
## Room.Board Books Personal PhD
## Min. :1780 Min. : 96.0 Min. : 250 Min. : 8.00
## 1st Qu.:3597 1st Qu.: 470.0 1st Qu.: 850 1st Qu.: 62.00
## Median :4200 Median : 500.0 Median :1200 Median : 75.00
## Mean :4358 Mean : 549.4 Mean :1341 Mean : 72.66
## 3rd Qu.:5050 3rd Qu.: 600.0 3rd Qu.:1700 3rd Qu.: 85.00
## Max. :8124 Max. :2340.0 Max. :6800 Max. :103.00
## Terminal S.F.Ratio perc.alumni Expend
## Min. : 24.0 Min. : 2.50 Min. : 0.00 Min. : 3186
## 1st Qu.: 71.0 1st Qu.:11.50 1st Qu.:13.00 1st Qu.: 6751
## Median : 82.0 Median :13.60 Median :21.00 Median : 8377
## Mean : 79.7 Mean :14.09 Mean :22.74 Mean : 9660
## 3rd Qu.: 92.0 3rd Qu.:16.50 3rd Qu.:31.00 3rd Qu.:10830
## Max. :100.0 Max. :39.80 Max. :64.00 Max. :56233
## Grad.Rate
## Min. : 10.00
## 1st Qu.: 53.00
## Median : 65.00
## Mean : 65.46
## 3rd Qu.: 78.00
## Max. :118.00
Produce a scatterplot matrix of the first ten variables.
pairs(College[,1:10])
Boxplot of Outstate by Private: as we can see, private universities have higher out-of-state tuition.
boxplot(College$Outstate~College$Private)
ggplot(College, aes(x=Private, y = Outstate, fill=Private)) +
geom_boxplot()
Create a new qualitative variable, called Elite, by binning the Top10perc variable. We are going to divide universities into two groups based on whether or not the proportion of students coming from the top 10% of their high school classes exceeds 50%.
College$Elite <- as.factor(ifelse(College$Top10perc > 50,'Yes','No'))
summary(College$Elite)
## No Yes
## 699 78
ggplot(College, aes(x=Elite, y = Outstate, fill=Elite)) +
geom_boxplot()
Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables.
par(mfrow=c(2,3))
hist(College$Accept)
hist(College$Outstate)
hist(College$Top10perc)
hist(College$PhD)
hist(College$Grad.Rate)
hist(College$Enroll)
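The calls above all use the default binning. To see the effect of differing numbers of bins, the breaks argument of hist() can be varied. The helper below is a sketch; its name, hist_with_bins, and the bin counts 5, 20, and 50 are illustrative choices, not part of the original solution.

```r
# Hypothetical helper: draw one histogram per requested bin count
hist_with_bins <- function(x, bins, label = 'x') {
  old <- par(mfrow = c(1, length(bins)))
  on.exit(par(old))  # restore the previous plot layout on exit
  for (b in bins) {
    # breaks = b is a suggestion; hist() may adjust it to "pretty" cut points
    hist(x, breaks = b, main = paste('breaks =', b), xlab = label)
  }
  invisible(NULL)
}

# Usage with the College data (runs only if ISLR is installed)
if (requireNamespace('ISLR', quietly = TRUE)) {
  hist_with_bins(ISLR::College$Outstate, bins = c(5, 20, 50), label = 'Outstate')
}
```

With few bins the distribution looks smooth and unimodal; with many bins more local structure (and more noise) becomes visible.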
Continue exploring the data, and provide a brief summary of what you discover.
College$AcceptRate <- College$Accept / College$Apps
ggplot(College, aes(x=Elite, y = AcceptRate, fill=Private)) +
geom_boxplot()+
ggtitle('Acceptance Rate break down by Elite and Private uni')
From the boxplot above, acceptance rates at Elite and/or Private universities are lower (that is, admission is harder) than at non-Elite universities.
ggplot(College, aes(x=Elite, y = Grad.Rate, fill=Private)) +
geom_boxplot()+
ggtitle('Graduation Rate break down by Elite and Private uni')
The graph above shows graduation rates: Elite private universities have the highest graduation rate (nearly 90%), followed by Elite public universities, then non-Elite private universities; non-Elite public universities have the lowest, at about 50%.
Auto data set, page 56
From the summary table:
Quantitative predictors: mpg, displacement, horsepower, weight, acceleration, and year (which could also be treated as categorical). The range of each runs from its Min. to its Max. value in the summary.
Qualitative predictors: name, origin (coded as numbers), and cylinders.
No missing values are reported.
summary(Auto)
## mpg cylinders displacement horsepower weight
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140
##
## acceleration year origin name
## Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
## 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
## Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
## Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
## 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
## Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
## (Other) :365
Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?
summary(Auto[-c(10:85),])
## mpg cylinders displacement horsepower weight
## Min. :11.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1649
## 1st Qu.:18.00 1st Qu.:4.000 1st Qu.:100.2 1st Qu.: 75.0 1st Qu.:2214
## Median :23.95 Median :4.000 Median :145.5 Median : 90.0 Median :2792
## Mean :24.40 Mean :5.373 Mean :187.2 Mean :100.7 Mean :2936
## 3rd Qu.:30.55 3rd Qu.:6.000 3rd Qu.:250.0 3rd Qu.:115.0 3rd Qu.:3508
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :4997
##
## acceleration year origin
## Min. : 8.50 Min. :70.00 Min. :1.000
## 1st Qu.:14.00 1st Qu.:75.00 1st Qu.:1.000
## Median :15.50 Median :77.00 Median :1.000
## Mean :15.73 Mean :77.15 Mean :1.601
## 3rd Qu.:17.30 3rd Qu.:80.00 3rd Qu.:2.000
## Max. :24.80 Max. :82.00 Max. :3.000
##
## name
## ford pinto : 5
## toyota corolla : 5
## amc matador : 4
## chevrolet chevette : 4
## amc hornet : 3
## chevrolet caprice classic: 3
## (Other) :292
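summary() reports the range and mean but not the standard deviation that the question asks for. One way to get all three is a small helper applied to the numeric columns. This is a sketch in base R; the name describe_numeric is hypothetical.

```r
# Hypothetical helper: range, mean, and standard deviation per numeric column
describe_numeric <- function(df) {
  num <- df[sapply(df, is.numeric)]  # keep numeric columns only (drops name)
  data.frame(
    min  = sapply(num, min),
    max  = sapply(num, max),
    mean = sapply(num, mean),
    sd   = sapply(num, sd)
  )
}

# Usage with the Auto data (runs only if ISLR is installed)
if (requireNamespace('ISLR', quietly = TRUE)) {
  auto_subset <- ISLR::Auto[-c(10:85), ]  # drop observations 10 through 85
  print(round(describe_numeric(auto_subset), 2))
}
```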
Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings
pairs(Auto %>% select(-c(name,year, origin, cylinders)) )
The scatterplot matrix shows a positive relationship between horsepower and weight and a negative relationship between mpg and weight, among others.
round(cor(Auto %>% select(-c(name))),2)
## mpg cylinders displacement horsepower weight acceleration year
## mpg 1.00 -0.78 -0.81 -0.78 -0.83 0.42 0.58
## cylinders -0.78 1.00 0.95 0.84 0.90 -0.50 -0.35
## displacement -0.81 0.95 1.00 0.90 0.93 -0.54 -0.37
## horsepower -0.78 0.84 0.90 1.00 0.86 -0.69 -0.42
## weight -0.83 0.90 0.93 0.86 1.00 -0.42 -0.31
## acceleration 0.42 -0.50 -0.54 -0.69 -0.42 1.00 0.29
## year 0.58 -0.35 -0.37 -0.42 -0.31 0.29 1.00
## origin 0.57 -0.57 -0.61 -0.46 -0.59 0.21 0.18
## origin
## mpg 0.57
## cylinders -0.57
## displacement -0.61
## horsepower -0.46
## weight -0.59
## acceleration 0.21
## year 0.18
## origin 1.00
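# Full model with all numeric predictors, stored under its own name so it is not overwritten
model_full <- lm(mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = Auto)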
model <- lm(mpg~ weight + year + origin, data=Auto)
model
##
## Call:
## lm(formula = mpg ~ weight + year + origin, data = Auto)
##
## Coefficients:
## (Intercept) weight year origin
## -18.045850 -0.005994 0.757126 1.150391
summary(model)
##
## Call:
## lm(formula = mpg ~ weight + year + origin, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.9440 -2.0948 -0.0389 1.7255 13.2722
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.805e+01 4.001e+00 -4.510 8.60e-06 ***
## weight -5.994e-03 2.541e-04 -23.588 < 2e-16 ***
## year 7.571e-01 4.832e-02 15.668 < 2e-16 ***
## origin 1.150e+00 2.591e-01 4.439 1.18e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.348 on 388 degrees of freedom
## Multiple R-squared: 0.8175, Adjusted R-squared: 0.816
## F-statistic: 579.2 on 3 and 388 DF, p-value: < 2.2e-16
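The model with weight, year, and origin has an adjusted R-squared of 0.816 (multiple R-squared 0.8175). The overall F-test p-value is below 2.2e-16, and every coefficient has a p-value below 0.05.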
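The two R-squared figures in the summary are linked by the standard adjustment for the number of predictors. As a quick arithmetic check, using only numbers printed in the output above (the variable names here are illustrative):

```r
r2 <- 0.8175        # Multiple R-squared from summary(model)
p  <- 3             # predictors: weight, year, origin
n  <- 388 + p + 1   # residual df plus estimated coefficients = 392 observations

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 3)    # 0.816, matching the summary output
```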
Boston data set, page 56
library('MASS')
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## The following object is masked from 'package:plotly':
##
## select
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
pairs(Boston)
cor(Boston)
## crim zn indus chas nox
## crim 1.00000000 -0.20046922 0.40658341 -0.055891582 0.42097171
## zn -0.20046922 1.00000000 -0.53382819 -0.042696719 -0.51660371
## indus 0.40658341 -0.53382819 1.00000000 0.062938027 0.76365145
## chas -0.05589158 -0.04269672 0.06293803 1.000000000 0.09120281
## nox 0.42097171 -0.51660371 0.76365145 0.091202807 1.00000000
## rm -0.21924670 0.31199059 -0.39167585 0.091251225 -0.30218819
## age 0.35273425 -0.56953734 0.64477851 0.086517774 0.73147010
## dis -0.37967009 0.66440822 -0.70802699 -0.099175780 -0.76923011
## rad 0.62550515 -0.31194783 0.59512927 -0.007368241 0.61144056
## tax 0.58276431 -0.31456332 0.72076018 -0.035586518 0.66802320
## ptratio 0.28994558 -0.39167855 0.38324756 -0.121515174 0.18893268
## black -0.38506394 0.17552032 -0.35697654 0.048788485 -0.38005064
## lstat 0.45562148 -0.41299457 0.60379972 -0.053929298 0.59087892
## medv -0.38830461 0.36044534 -0.48372516 0.175260177 -0.42732077
## rm age dis rad tax ptratio
## crim -0.21924670 0.35273425 -0.37967009 0.625505145 0.58276431 0.2899456
## zn 0.31199059 -0.56953734 0.66440822 -0.311947826 -0.31456332 -0.3916785
## indus -0.39167585 0.64477851 -0.70802699 0.595129275 0.72076018 0.3832476
## chas 0.09125123 0.08651777 -0.09917578 -0.007368241 -0.03558652 -0.1215152
## nox -0.30218819 0.73147010 -0.76923011 0.611440563 0.66802320 0.1889327
## rm 1.00000000 -0.24026493 0.20524621 -0.209846668 -0.29204783 -0.3555015
## age -0.24026493 1.00000000 -0.74788054 0.456022452 0.50645559 0.2615150
## dis 0.20524621 -0.74788054 1.00000000 -0.494587930 -0.53443158 -0.2324705
## rad -0.20984667 0.45602245 -0.49458793 1.000000000 0.91022819 0.4647412
## tax -0.29204783 0.50645559 -0.53443158 0.910228189 1.00000000 0.4608530
## ptratio -0.35550149 0.26151501 -0.23247054 0.464741179 0.46085304 1.0000000
## black 0.12806864 -0.27353398 0.29151167 -0.444412816 -0.44180801 -0.1773833
## lstat -0.61380827 0.60233853 -0.49699583 0.488676335 0.54399341 0.3740443
## medv 0.69535995 -0.37695457 0.24992873 -0.381626231 -0.46853593 -0.5077867
## black lstat medv
## crim -0.38506394 0.4556215 -0.3883046
## zn 0.17552032 -0.4129946 0.3604453
## indus -0.35697654 0.6037997 -0.4837252
## chas 0.04878848 -0.0539293 0.1752602
## nox -0.38005064 0.5908789 -0.4273208
## rm 0.12806864 -0.6138083 0.6953599
## age -0.27353398 0.6023385 -0.3769546
## dis 0.29151167 -0.4969958 0.2499287
## rad -0.44441282 0.4886763 -0.3816262
## tax -0.44180801 0.5439934 -0.4685359
## ptratio -0.17738330 0.3740443 -0.5077867
## black 1.00000000 -0.3660869 0.3334608
## lstat -0.36608690 1.0000000 -0.7376627
## medv 0.33346082 -0.7376627 1.0000000
Do any of the suburbs of Boston appear to have particularly high crime rates? Tax rates? Pupil-teacher ratios? Comment on the range of each predictor.
summary(Boston[,c('crim','tax','ptratio')])
## crim tax ptratio
## Min. : 0.00632 Min. :187.0 Min. :12.60
## 1st Qu.: 0.08205 1st Qu.:279.0 1st Qu.:17.40
## Median : 0.25651 Median :330.0 Median :19.05
## Mean : 3.61352 Mean :408.2 Mean :18.46
## 3rd Qu.: 3.67708 3rd Qu.:666.0 3rd Qu.:20.20
## Max. :88.97620 Max. :711.0 Max. :22.00
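Yes. The maximum crim (88.98) is far above both its mean (3.61) and its third quartile (3.68), so a few suburbs have extremely high crime rates. The maximum tax (711) is also well above the mean (408.2) and the median (330). ptratio ranges only from 12.6 to 22.0, so no suburb stands out as strongly on that predictor.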
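To put a number on "particularly high", one option is to count the suburbs above an upper fence. The sketch below uses Tukey's rule (third quartile plus 1.5 times the interquartile range); that threshold is an assumption chosen for illustration, not something the exercise prescribes, and the helper name count_high_outliers is hypothetical.

```r
# Hypothetical helper: count values above Q3 + 1.5 * IQR (Tukey's rule)
count_high_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  upper <- q[2] + 1.5 * (q[2] - q[1])
  sum(x > upper)
}

# Usage with the Boston data (runs only if MASS is installed)
if (requireNamespace('MASS', quietly = TRUE)) {
  b <- MASS::Boston
  print(sapply(b[c('crim', 'tax', 'ptratio')], count_high_outliers))
}
```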
How many of the suburbs in this data set bound the Charles river?
sum(Boston$chas==1)
## [1] 35
What is the median pupil-teacher ratio among the towns in this data set?
median(Boston$ptratio)
## [1] 19.05
Which suburb of Boston has the lowest median value of owner-occupied homes? What are the values of the other predictors for that suburb, and how do those values compare to the overall ranges for those predictors? Comment on your findings.
Boston[Boston$medv==min(Boston$medv),]
## crim zn indus chas nox rm age dis rad tax ptratio black lstat
## 399 38.3518 0 18.1 0 0.693 5.453 100 1.4896 24 666 20.2 396.90 30.59
## 406 67.9208 0 18.1 0 0.693 5.683 100 1.4254 24 666 20.2 384.97 22.98
## medv
## 399 5
## 406 5
Boston$min_medv <- ifelse(Boston$medv==min(Boston$medv),'lowest medv', 'non lowest')
ggplot(Boston, aes(x=min_medv, y=crim, color=min_medv))+
geom_boxplot()
ggplot(Boston, aes(x=min_medv, y=indus, color=min_medv))+
geom_boxplot()
ggplot(Boston, aes(x=min_medv, y=zn, color=min_medv))+
geom_boxplot()
ggplot(Boston, aes(x=min_medv, y=nox, color=min_medv))+
geom_boxplot()
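Some comments: two suburbs (rows 399 and 406) share the lowest medv of 5. Their crime rates (38.35 and 67.92) are far above the overall third quartile (3.68), and their nitrogen oxides concentration (0.693) is above the overall third quartile (0.624). Their age value of 100 is the overall maximum. The other predictors could be plotted and compared in the same way. Given the high crime rate, these suburbs do not look like desirable places to live.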
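Instead of one boxplot per predictor, the comparison with the overall ranges can be summarized as a percentile rank: the share of all suburbs whose value is at or below that of the lowest-medv suburb. This is a sketch; percentile_rank is a hypothetical helper, and the comparison starts from a fresh copy of the data rather than the modified Boston object.

```r
# Hypothetical helper: share of values in x that are <= value
percentile_rank <- function(x, value) mean(x <= value)

# Usage with the Boston data (runs only if MASS is installed)
if (requireNamespace('MASS', quietly = TRUE)) {
  b <- MASS::Boston
  lowest <- b[b$medv == min(b$medv), ]
  predictors <- setdiff(names(b), 'medv')
  # One row per lowest-medv suburb, one column per predictor
  ranks <- sapply(predictors, function(v) {
    sapply(lowest[[v]], function(val) percentile_rank(b[[v]], val))
  })
  rownames(ranks) <- rownames(lowest)
  print(round(ranks, 2))
}
```

Values near 1 mean the suburb sits at the top of the overall range for that predictor; values near 0 mean it sits at the bottom.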
In this data set, how many of the suburbs average more than
seven rooms per dwelling? More than eight rooms per dwelling?
Comment on the suburbs that average more than eight rooms
per dwelling
cat('\n number of suburbs have more than 7 rooms per dwelling ', sum(Boston$rm > 7))
##
## number of suburbs have more than 7 rooms per dwelling 64
cat('\n number of suburbs have more than 8 rooms per dwelling ', sum(Boston$rm > 8))
##
## number of suburbs have more than 8 rooms per dwelling 13
cat('\n List of more than 8 rooms')
##
## List of more than 8 rooms
Boston[which(Boston$rm > 8),]
## crim zn indus chas nox rm age dis rad tax ptratio black lstat
## 98 0.12083 0 2.89 0 0.4450 8.069 76.0 3.4952 2 276 18.0 396.90 4.21
## 164 1.51902 0 19.58 1 0.6050 8.375 93.9 2.1620 5 403 14.7 388.45 3.32
## 205 0.02009 95 2.68 0 0.4161 8.034 31.9 5.1180 4 224 14.7 390.55 2.88
## 225 0.31533 0 6.20 0 0.5040 8.266 78.3 2.8944 8 307 17.4 385.05 4.14
## 226 0.52693 0 6.20 0 0.5040 8.725 83.0 2.8944 8 307 17.4 382.00 4.63
## 227 0.38214 0 6.20 0 0.5040 8.040 86.5 3.2157 8 307 17.4 387.38 3.13
## 233 0.57529 0 6.20 0 0.5070 8.337 73.3 3.8384 8 307 17.4 385.91 2.47
## 234 0.33147 0 6.20 0 0.5070 8.247 70.4 3.6519 8 307 17.4 378.95 3.95
## 254 0.36894 22 5.86 0 0.4310 8.259 8.4 8.9067 7 330 19.1 396.90 3.54
## 258 0.61154 20 3.97 0 0.6470 8.704 86.9 1.8010 5 264 13.0 389.70 5.12
## 263 0.52014 20 3.97 0 0.6470 8.398 91.5 2.2885 5 264 13.0 386.86 5.91
## 268 0.57834 20 3.97 0 0.5750 8.297 67.0 2.4216 5 264 13.0 384.54 7.44
## 365 3.47428 0 18.10 1 0.7180 8.780 82.9 1.9047 24 666 20.2 354.55 5.29
## medv min_medv
## 98 38.7 non lowest
## 164 50.0 non lowest
## 205 50.0 non lowest
## 225 44.8 non lowest
## 226 50.0 non lowest
## 227 37.6 non lowest
## 233 41.7 non lowest
## 234 48.3 non lowest
## 254 42.8 non lowest
## 258 50.0 non lowest
## 263 48.8 non lowest
## 268 50.0 non lowest
## 365 21.9 non lowest
summary(Boston[which(Boston$rm > 8),])
## crim zn indus chas
## Min. :0.02009 Min. : 0.00 Min. : 2.680 Min. :0.0000
## 1st Qu.:0.33147 1st Qu.: 0.00 1st Qu.: 3.970 1st Qu.:0.0000
## Median :0.52014 Median : 0.00 Median : 6.200 Median :0.0000
## Mean :0.71879 Mean :13.62 Mean : 7.078 Mean :0.1538
## 3rd Qu.:0.57834 3rd Qu.:20.00 3rd Qu.: 6.200 3rd Qu.:0.0000
## Max. :3.47428 Max. :95.00 Max. :19.580 Max. :1.0000
## nox rm age dis
## Min. :0.4161 Min. :8.034 Min. : 8.40 Min. :1.801
## 1st Qu.:0.5040 1st Qu.:8.247 1st Qu.:70.40 1st Qu.:2.288
## Median :0.5070 Median :8.297 Median :78.30 Median :2.894
## Mean :0.5392 Mean :8.349 Mean :71.54 Mean :3.430
## 3rd Qu.:0.6050 3rd Qu.:8.398 3rd Qu.:86.50 3rd Qu.:3.652
## Max. :0.7180 Max. :8.780 Max. :93.90 Max. :8.907
## rad tax ptratio black
## Min. : 2.000 Min. :224.0 Min. :13.00 Min. :354.6
## 1st Qu.: 5.000 1st Qu.:264.0 1st Qu.:14.70 1st Qu.:384.5
## Median : 7.000 Median :307.0 Median :17.40 Median :386.9
## Mean : 7.462 Mean :325.1 Mean :16.36 Mean :385.2
## 3rd Qu.: 8.000 3rd Qu.:307.0 3rd Qu.:17.40 3rd Qu.:389.7
## Max. :24.000 Max. :666.0 Max. :20.20 Max. :396.9
## lstat medv min_medv
## Min. :2.47 Min. :21.9 Length:13
## 1st Qu.:3.32 1st Qu.:41.7 Class :character
## Median :4.14 Median :48.3 Mode :character
## Mean :4.31 Mean :44.2
## 3rd Qu.:5.12 3rd Qu.:50.0
## Max. :7.44 Max. :50.0
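Some comments: the 13 suburbs averaging more than eight rooms per dwelling have a mean crime rate (0.72) well below the overall mean (3.61), and none exceeds 3.47, although their median (0.52) is above the overall median (0.26). Their lstat is low (at most 7.44, against an overall median of 11.36) and their home values are high (median medv 48.3, against an overall median of 21.2).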