Assignment #1

Rudy Martinez

6/8/2021


Libraries

library(ISLR)
library(tidyverse)
library(MASS)

Exercise 2

Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n and p.
  • (a) We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interested in understanding which factors affect CEO salary.

    • Because this scenario has a quantitative response - CEO Salary - this is a regression problem.
    • Because we’re trying to obtain an understanding of the factors that affect CEO Salary, we are most interested in inference.
    • n = 500 Firms and p = 3 Predictors (profit, number of employees, and industry)


  • (b) We are considering launching a new product and wish to know whether it will be a success or a failure. We collect data on 20 similar products that were previously launched. For each product we have recorded whether it was a success or failure, price charged for the product, marketing budget, competition price, and ten other variables.

    • Because this scenario has a qualitative response - result (whether a product is a success or failure) - this is a classification problem.
    • We are not interested in obtaining a deep understanding of the relationships between each individual predictor and the response. Instead, we simply want an accurate model to predict the response using the predictors. Therefore, we are interested in prediction.
    • n = 20 Products and p = 13 Predictors (price charged, marketing budget, competition price, and 10 other variables)


  • (c) We are interested in predicting the % change in the USD/Euro exchange rate in relation to the weekly changes in the world stock markets. Hence we collect weekly data for all of 2012. For each week we record the % change in the USD/Euro, the % change in the US market, the % change in the British market, and the % change in the German market.

    • Because this scenario has a quantitative response - % change in the USD/Euro exchange rate - this is a regression problem.
    • Because the stated goal is to predict the % change in the USD/Euro exchange rate from the weekly changes in the stock markets, we are most interested in prediction.
    • n = 52 Weeks and p = 3 Predictors (% change in the US market, % change in the British market, and % change in the German market)

Exercise 5

What are the advantages and disadvantages of a very flexible (versus a less flexible) approach for regression or classification? Under what circumstances might a more flexible approach be preferred to a less flexible approach? When might a less flexible approach be preferred?
  • Advantages / Disadvantages of Very Flexible Approach
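    • Advantage: Can generate a much wider range of possible shapes to estimate f, which reduces bias when the true f is non-linear.
    • Disadvantage: Can overfit by following the noise in the training data too closely, which raises the variance of the estimate. It can also lead to such complicated estimates of f that it is difficult to understand how any individual predictor is associated with the response.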


  • Circumstances for More / Less Flexible Approach
    • More Flexible: Preferred when we are mainly interested in prediction, when the number of observations is large, and when the relationship between the predictors and the response is likely to be non-linear. The trade-off is that interpretability decreases as flexibility increases.
    • Less Flexible: Preferred when we are mainly interested in inference, or when the sample is small or noisy enough that a flexible method would overfit. For instance, when inference is the goal, the linear model may be a good choice since it will be quite easy to understand the relationship between Y and X1, X2,…,Xp.
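
  • The trade-off described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated (not from the assignment), the true f is taken to be sin(x), and polynomial degree is used as a stand-in for flexibility.

```r
# Simulated data (an assumption for illustration; not part of the assignment data)
set.seed(1)
n <- 100
sim <- data.frame(x = runif(n, 0, 10))
sim$y <- sin(sim$x) + rnorm(n, sd = 0.5)   # true f is non-linear

# Split the observations into a training half and a test half
train_rows <- sample(n, n / 2)
train_df <- sim[train_rows, ]
test_df <- sim[-train_rows, ]

# Fit polynomials of increasing degree (higher degree = more flexible)
degrees <- c(1, 3, 10)
mse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train_df)
  c(train = mean(residuals(fit)^2),
    test  = mean((test_df$y - predict(fit, newdata = test_df))^2))
})
colnames(mse) <- paste0("degree_", degrees)
round(mse, 3)
```

  • Because the polynomial models are nested, the training MSE cannot increase as the degree rises. The test MSE carries no such guarantee, which is why a more flexible fit is not automatically the better predictor.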

Exercise 6

Describe the differences between a parametric and a non-parametric statistical learning approach. What are the advantages of a parametric approach to regression or classification (as opposed to a nonparametric approach)? What are its disadvantages?
  • Differences Between a Parametric and Non-Parametric Statistical Learning Approach
    • Parametric: We make an assumption about the functional form, or shape, of f. For example, one very simple assumption is that f is linear in X. Once we have assumed that f is linear, the problem of estimating f is greatly simplified.
    • Non-Parametric: We do not make explicit assumptions about the functional form of f. Instead we seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. By avoiding the assumption of a particular functional form for f, we have the potential to accurately fit a wider range of possible shapes for f.


  • Advantages / Disadvantages of Parametric Approach
    • Advantage: Reduces the problem of estimating f down to one of estimating a set of parameters. Assuming a parametric form for f simplifies the problem of estimating f because it is generally much easier to estimate a set of parameters, such as β0, β1,…, βp in the linear model (2.4), than it is to fit an entirely arbitrary function f.
    • Disadvantage: The model we choose will usually not match the true unknown form of f. If the chosen functional form is very different from the true f, the resulting model will not fit the data well and our estimate will be poor.
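
  • As a rough illustration of the contrast above, the sketch below fits a parametric model (lm(), which assumes f is linear) and a non-parametric smoother (loess()) to the same simulated data. The data and the choice of loess() as the non-parametric method are assumptions made for the example.

```r
# Simulated data with a non-linear true f (an assumption for illustration)
set.seed(2)
x <- seq(0, 2 * pi, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

# Parametric: assume f is linear, so only two parameters are estimated
fit_lm <- lm(y ~ x)
coef(fit_lm)

# Non-parametric: loess() makes no global assumption about the shape of f
fit_loess <- loess(y ~ x)

# Training MSE for each approach
mse_lm <- mean(residuals(fit_lm)^2)
mse_loess <- mean(residuals(fit_loess)^2)
c(linear = mse_lm, loess = mse_loess)
```

  • The linear fit is summarized by an intercept and a slope, which is easy to interpret but cannot follow the curve. The smoother follows the curve more closely but offers no comparably simple summary.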

Exercise 8

This exercise relates to the College data set, which can be found in the file College.csv. It contains a number of variables for 777 different universities and colleges in the US.
# Exercise 8a
college = read.csv('College.csv')


  • Look at the data using the fix() function. You should notice that the first column is just the name of each university. We don’t really want R to treat this as data. However, it may be handy to have these names for later.
# Exercise 8b-1
rownames(college) = college[, 1]
fix(college)


  • You should see that there is now a row.names column with the name of each university recorded. This means that R has given each row a name corresponding to the appropriate university. R will not try to perform calculations on the row names. However, we still need to eliminate the first column in the data where the names are stored.
# Exercise 8b-2
college = college [, -1]
fix(college)


  • Use the summary() function to produce a numerical summary of the variables in the data set.
# Exercise 8c-1
summary(college)
##    Private               Apps           Accept          Enroll    
##  Length:777         Min.   :   81   Min.   :   72   Min.   :  35  
##  Class :character   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242  
##  Mode  :character   Median : 1558   Median : 1110   Median : 434  
##                     Mean   : 3002   Mean   : 2019   Mean   : 780  
##                     3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902  
##                     Max.   :48094   Max.   :26330   Max.   :6392  
##    Top10perc       Top25perc      F.Undergrad     P.Undergrad     
##  Min.   : 1.00   Min.   :  9.0   Min.   :  139   Min.   :    1.0  
##  1st Qu.:15.00   1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0  
##  Median :23.00   Median : 54.0   Median : 1707   Median :  353.0  
##  Mean   :27.56   Mean   : 55.8   Mean   : 3700   Mean   :  855.3  
##  3rd Qu.:35.00   3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0  
##  Max.   :96.00   Max.   :100.0   Max.   :31643   Max.   :21836.0  
##     Outstate       Room.Board       Books           Personal   
##  Min.   : 2340   Min.   :1780   Min.   :  96.0   Min.   : 250  
##  1st Qu.: 7320   1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850  
##  Median : 9990   Median :4200   Median : 500.0   Median :1200  
##  Mean   :10441   Mean   :4358   Mean   : 549.4   Mean   :1341  
##  3rd Qu.:12925   3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700  
##  Max.   :21700   Max.   :8124   Max.   :2340.0   Max.   :6800  
##       PhD            Terminal       S.F.Ratio      perc.alumni   
##  Min.   :  8.00   Min.   : 24.0   Min.   : 2.50   Min.   : 0.00  
##  1st Qu.: 62.00   1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00  
##  Median : 75.00   Median : 82.0   Median :13.60   Median :21.00  
##  Mean   : 72.66   Mean   : 79.7   Mean   :14.09   Mean   :22.74  
##  3rd Qu.: 85.00   3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00  
##  Max.   :103.00   Max.   :100.0   Max.   :39.80   Max.   :64.00  
##      Expend        Grad.Rate     
##  Min.   : 3186   Min.   : 10.00  
##  1st Qu.: 6751   1st Qu.: 53.00  
##  Median : 8377   Median : 65.00  
##  Mean   : 9660   Mean   : 65.46  
##  3rd Qu.:10830   3rd Qu.: 78.00  
##  Max.   :56233   Max.   :118.00


  • Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data.
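# Exercise 8c-2
# Private is read from the CSV as character, so convert it to a factor before plotting
college$Private = as.factor(college$Private)
pairs(college[, 1:10])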


  • Use the plot() function to produce side-by-side boxplots of Outstate versus Private.
# Exercise 8c-3
plot(as.factor(college$Private), college$Outstate, xlim = c(0, 3), col = c('lightsteelblue', 'lightgrey'), xlab = 'Private', ylab = 'Outstate')


  • Create a new qualitative variable, called Elite, by binning the Top10perc variable. Use the summary() function to see how many elite universities there are. Now use the plot() function to produce side-by-side boxplots of Outstate versus Elite
# Exercise 8c-4
Elite = rep("No",nrow(college))
Elite[college$Top10perc >50] ="Yes"
Elite = as.factor(Elite)
college = data.frame(college, Elite)

summary(Elite)
##  No Yes 
## 699  78
plot(college$Elite, college$Outstate, xlab = 'Elite', ylab = 'Outstate', xlim = c(0, 3), col = c('lightsteelblue', 'lightgrey'))
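
  • The binning step can also be written in one line with ifelse(). The sketch below uses a short hypothetical vector in place of college$Top10perc, so the counts are for illustration only.

```r
# Hypothetical Top10perc values, used only to illustrate the binning step
top10 <- c(12, 55, 50, 96, 23, 51)

# One-step alternative to rep() followed by logical assignment
elite <- factor(ifelse(top10 > 50, "Yes", "No"), levels = c("No", "Yes"))
table(elite)
```

  • Note that a value of exactly 50 is binned as "No", because the condition is strictly greater than 50.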


  • Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables.
# Exercise 8c-5
par(mfrow=c(2,2))
hist(college$Accept, main="Number of Applications Accepted", col="lightsteelblue", xlab = 'Accepted')
hist(college$Enroll, main="Number of New Students Enrolled", col="lightgrey", xlab = 'Enroll')
hist(college$PhD, main="Percent of Faculty with a PhD", col="lightsteelblue", xlab = 'PhD' )
hist(college$Grad.Rate, main="Graduation Rate", col="lightgrey", xlab = 'Grad.Rate')
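
  • The exercise asks for differing numbers of bins, which hist() controls through its breaks argument. The sketch below uses simulated right-skewed values as a stand-in for a variable such as Accept; the data are an assumption for illustration.

```r
# Simulated right-skewed values (not the College data)
set.seed(3)
x <- rexp(500, rate = 1 / 2000)

# plot = FALSE returns the binning without drawing, so bin counts can be compared
h_coarse <- hist(x, breaks = 5, plot = FALSE)
h_fine <- hist(x, breaks = 50, plot = FALSE)
c(coarse = length(h_coarse$counts), fine = length(h_fine$counts))
```

  • When breaks is a single number, R treats it as a suggestion and picks nearby "pretty" cut points, so the actual number of bins may differ from the value supplied. Dropping plot = FALSE draws the histograms as usual.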


  • Continue exploring the data, and provide a brief summary of what you discover.
# Exercise 8c-6
plot(college$Accept, college$Enroll,
     xlab = 'Number of Applicants Accepted', 
     ylab = 'Number of New Students Enrolled', 
     col = 'steelblue')

plot(college$PhD, college$Grad.Rate,
     xlab = 'Percent of Faculty with a PhD', 
     ylab = 'Graduation Rate', 
     col = 'black')

  • The above scatterplots show the relationships between two pairs of variables: Accept vs. Enroll and PhD vs. Grad.Rate. Based on these visuals, there appears to be a positive association between the Number of Applications Accepted and the Number of New Students Enrolled. There also appears to be a positive association between the Percent of Faculty with a PhD and the Graduation Rate, although a scatterplot alone does not establish how strong either relationship is.


  • Which schools have a graduation rate above 99%?
#Exercise 8c-6 Question
grad_focus = college[college$Grad.Rate > 99, ]
rownames(grad_focus)
##  [1] "Amherst College"                 "Cazenovia College"              
##  [3] "College of Mount St. Joseph"     "Grove City College"             
##  [5] "Harvard University"              "Harvey Mudd College"            
##  [7] "Lindenwood College"              "Missouri Southern State College"
##  [9] "Santa Clara University"          "Siena College"                  
## [11] "University of Richmond"


Exercise 9

This exercise involves the Auto data set studied in the lab. Make sure that the missing values have been removed from the data.
#Exercise 9a
auto = read.csv('Auto.csv', header = TRUE, na.strings = "?")
auto = na.omit(auto)
names(auto)
## [1] "mpg"          "cylinders"    "displacement" "horsepower"   "weight"      
## [6] "acceleration" "year"         "origin"       "name"
  • Which of the predictors are quantitative, and which are qualitative?
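    • Quantitative Predictors: mpg, cylinders, displacement, horsepower, weight, acceleration, and year.
    • Qualitative Predictors: name and origin. The origin variable is stored as a number, but its codes (1, 2, 3) label the region a car comes from rather than measure a quantity. Because it is stored as numeric, it still appears in the numeric summaries below.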


  • What is the range of each quantitative predictor? You can answer this using the range() function.
# Exercise 9b
quant_pred = sapply(auto, is.numeric)
sapply(auto[, quant_pred], range)
##       mpg cylinders displacement horsepower weight acceleration year origin
## [1,]  9.0         3           68         46   1613          8.0   70      1
## [2,] 46.6         8          455        230   5140         24.8   82      3


  • What is the mean and standard deviation of each quantitative predictor?
# Exercise 9c
# Note: signif() treats digits = 0 as 1, so each value below is rounded to one significant digit
stats_pred = sapply(auto[, quant_pred], function(x) signif(c(mean(x), sd(x)), 0))
rownames(stats_pred) <- c("mean", "sd")
stats_pred
##      mpg cylinders displacement horsepower weight acceleration year origin
## mean  20         5          200        100   3000           20   80    2.0
## sd     8         2          100         40    800            3    4    0.8
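
  • The very coarse values above follow from how signif() handles its digits argument. The sketch below uses two illustrative numbers (not computed from the Auto data here) to compare a few rounding choices.

```r
# Illustrative values only; not computed from the Auto data in this block
vals <- c(mean = 23.45, sd = 7.805)

signif(vals, 0)   # digits below 1 are treated as 1 significant digit
signif(vals, 1)   # same result as the line above
signif(vals, 3)   # keeps three significant digits
round(vals, 2)    # keeps two decimal places instead
```

  • For reporting means and standard deviations, signif(x, 3) or round(x, 2) would likely preserve more useful detail than one significant digit.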


  • Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?
# Exercise 9d
observation_subset = sapply(auto[-(10:85), quant_pred], function(x) round(c(range(x), mean(x), sd(x)), 0))
rownames(observation_subset) <- c("min", "max", "mean", "sd")
observation_subset
##      mpg cylinders displacement horsepower weight acceleration year origin
## min   11         3           68         46   1649            8   70      1
## max   47         8          455        230   4997           25   82      3
## mean  24         5          187        101   2936           16   77      2
## sd     8         2          100         36    811            3    3      1
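
  • Removing a block of rows relies on negative indexing. The sketch below uses a toy vector of 100 positions (an assumption for illustration) to show that -10:-85 and -(10:85) drop the same elements.

```r
x <- 1:100            # toy vector standing in for row positions

a <- x[-10:-85]       # parsed as (-10):(-85)
b <- x[-(10:85)]      # equivalent, and arguably easier to read

identical(a, b)
length(b)             # 100 minus the 76 removed positions
```

  • The parentheses matter when writing the second form: -10:85 would run from -10 up to 85 and mix negative with positive subscripts, which R does not allow.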


  • Using the full data set, investigate the predictors graphically using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings.
# Exercise 9e
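# name is a character column in the CSV, so plot only the numeric columns
pairs(auto[, sapply(auto, is.numeric)])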

plot(auto$acceleration, auto$mpg,
     xlab = 'Acceleration', 
     ylab = 'mpg', 
     col = 'steelblue')

plot(auto$horsepower, auto$weight,
     xlab = 'Horsepower', 
     ylab = 'Weight', 
     col = 'black')

  • The pairs() plot shows the pairwise relationships between the variables in the Auto data set. From this matrix, we can see associations between variables such as weight and horsepower.
  • Finding 1: Focusing in on weight and horsepower, the third plot (horsepower vs. weight) shows a strong positive association between the two variables. As horsepower increases, the weight of the vehicle also tends to increase. This is consistent with heavier or larger vehicles - like trucks - having more powerful engines.
  • Finding 2: Focusing in on acceleration and mpg, the second plot (acceleration vs. mpg) shows a moderate positive association between the two variables. In this data set, acceleration is recorded as the time in seconds to go from 0 to 60 mph, so a higher value means a slower vehicle. Vehicles that take longer to reach 60 mph therefore tend to have a higher miles per gallon, which is consistent with smaller, less powerful engines using less fuel.


  • Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables.
# Exercise 9f

plot(auto$weight, auto$mpg,
     xlab = 'Weight', 
     ylab = 'Miles per Gallon', 
     col = 'black')

  • Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.
    • Yes: Focusing in on weight and mpg, the scatterplot above shows a negative association between the two variables. As weight increases, the mpg of the vehicle decreases. Therefore, a vehicle that weighs more typically has a lower miles per gallon, so weight is likely to be useful in predicting mpg.
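
  • A numerical check can back up what the scatterplot suggests. The sketch below uses the built-in mtcars data as a stand-in, since it also records mpg and weight (wt); it is not the Auto data, so the numbers do not carry over to this exercise.

```r
# mtcars ships with R and is used here only as a stand-in for the Auto data
cars <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Correlation of each candidate predictor with mpg
sort(cor(cars)[-1, "mpg"])

# A simple linear fit of mpg on weight
fit <- lm(mpg ~ wt, data = cars)
coef(fit)
```

  • The same two steps could be applied to the auto data frame by swapping in its column names, for example weight, horsepower, and displacement.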


Exercise 10

This exercise involves the Boston housing data set.
  • To begin, load in the Boston data set. The Boston data set is part of the MASS library in R. How many rows are in this data set? How many columns? What do the rows and columns represent?
dim(Boston)
## [1] 506  14
  • 506 Rows and 14 Columns
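  • Each row represents one Boston suburb (town). Each column is a variable recorded for that suburb, such as the per capita crime rate (crim), the average number of rooms per dwelling (rm), and the median value of owner-occupied homes (medv).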


  • Make some pairwise scatterplots of the predictors (columns) in this data set. Describe your findings.
# Exercise 10b
pairs(Boston)

par(mfrow = c(2, 2))
plot(Boston$crim, Boston$medv,
     xlab = 'Per Capita Crime Rate', 
     ylab = 'medv', 
     col = 'steelblue')

plot(Boston$rm, Boston$medv,
     xlab = 'Average number of rooms per dwelling', 
     ylab = 'medv', 
     col = 'steelblue')

plot(Boston$lstat, Boston$medv,
     xlab = 'Lower status of the population (percent)', 
     ylab = 'medv', 
     col = 'black')

plot(Boston$ptratio, Boston$medv,
     xlab = 'Pupil-teacher ratio by town', 
     ylab = 'medv', 
     col = 'black')

  • Finding 1: Focusing in on medv and crim, the scatterplot shows a negative association between the two variables. As crim increases, medv decreases. Therefore, as the Per Capita Crime Rate worsens, the median value of owner-occupied homes drops. This makes sense, as lower demand for homes in more dangerous areas would push down home prices there.
  • Finding 2: Focusing in on medv and rm, the scatterplot showcases a correlation between the two variables. As rm increases, the medv also increases. Therefore, as the average number of rooms per dwelling grows, the median value of owner-occupied homes rises. This makes sense as homes with more square footage/space are valued higher than homes with less space.
  • Finding 3: Focusing in on medv and lstat, the scatterplot showcases a correlation between the two variables. As lstat increases, the medv decreases. Therefore, as the lower status of population (percent) increases, the median value of owner-occupied homes drops.


  • Are any of the predictors associated with per capita crime rate? If so, explain the relationship.
# Exercise 10c
par(mfrow = c(2, 2))
# zn contains zeros, which cannot be shown on a log scale, so only the y-axis is logged
plot(Boston$crim ~ Boston$zn,
     log = 'y',
     col = 'steelblue')

plot(Boston$crim ~ Boston$age,
     log = 'xy',
     col = 'steelblue')

plot(Boston$crim ~ Boston$dis,
     log = 'xy',
     col = 'black')

plot(Boston$crim ~ Boston$lstat,
     log = 'xy',
     col = 'black')

  • Based on visual observation, it appears that the following predictors have some sort of association with crim.
    • age: As the proportion of owner-occupied units built prior to 1940 increases, the Per Capita Crime Rate increases.
    • dis: As the weighted mean of distances to five Boston employment centres increases, the Per Capita Crime Rate decreases.
    • lstat: As the lower status of the population (percent) increases, the Per Capita Crime Rate increases.
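
  • The visual impressions above can be checked against correlations with crim. The sketch below is one possible check; the choice of Spearman rank correlation is an assumption made here because crim is heavily right-skewed.

```r
library(MASS)   # provides the Boston data set

# Rank correlation of every variable with crim
crim_cor <- cor(Boston, method = "spearman")["crim", ]

# Drop the trivial correlation of crim with itself and sort
round(sort(crim_cor[names(crim_cor) != "crim"]), 2)
```

  • A positive value means the variable tends to rise with the crime rate and a negative value means it tends to fall, which can be compared with the directions described for age, dis, and lstat.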


  • Do any of the suburbs of Boston appear to have particularly high crime rates? Tax rates? Pupil-teacher ratios? Comment on the range of each predictor.
# Exercise 10d

hist(Boston$crim, breaks=25, col = "steelblue", main = "Histogram of Per Capita Crime Rate")

hist(Boston$tax, breaks=25, col = "black", main = "Histogram of Full-value Property-tax Rate per $10,000")

hist(Boston$ptratio, breaks=25, col = "darkgrey", main = "Histogram of Pupil-teacher Ratio by Town")

  • Most Boston suburbs have a very low crime rate: in the Histogram of Per Capita Crime Rate above, the majority of observations fall in the bin nearest zero. However, the distribution has a long right tail, so a small number of suburbs have particularly high rates. The values range from 0.00632 to 88.97620.
  • In the Histogram of Full-value Property-tax Rate per $10,000, there is a large group of observations in the high 600s on the x-axis, and a moderate number of observations between 200 and 400. The values range from 187 to 711.
  • The Histogram of Pupil-teacher Ratio by Town shows observations spread fairly evenly across the range of Pupil-teacher Ratios, with a spike near 20 on the x-axis that indicates a higher frequency of observations. The values range from 12.60 to 22.00.
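
  • The ranges read off the histograms can also be computed directly. The sketch below does so for the three variables and counts the suburbs above the 95th percentile of each; the 95% cutoff is an arbitrary choice made for illustration.

```r
library(MASS)   # provides the Boston data set

vars <- c("crim", "tax", "ptratio")

# Minimum and maximum of each variable
sapply(Boston[, vars], range)

# Number of suburbs above the 95th percentile of each variable
sapply(Boston[, vars], function(x) sum(x > quantile(x, 0.95)))
```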


  • How many of the suburbs in this data set bound the Charles river?
# Exercise 10e
nrow(subset(Boston, chas ==1))
## [1] 35
  • There are 35 suburbs that bound the Charles river.


  • What is the median pupil-teacher ratio among the towns in this data set?
# Exercise 10f
summary(Boston$ptratio)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.60   17.40   19.05   18.46   20.20   22.00
  • The median pupil-teacher ratio is 19.05


  • Which suburb of Boston has lowest median value of owner-occupied homes? What are the values of the other predictors for that suburb, and how do those values compare to the overall ranges for those predictors? Comment on your findings.
# Exercise 10g
selection = Boston[order(Boston$medv), ]
selection[1, ]
##        crim zn indus chas   nox    rm age    dis rad tax ptratio black lstat
## 399 38.3518  0  18.1    0 0.693 5.453 100 1.4896  24 666    20.2 396.9 30.59
##     medv
## 399    5
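  • Suburb #399 has the lowest median value of owner-occupied homes, at $5,000. Note that selection[1, ] returns only the first row after sorting, so any other suburb tied at this minimum would not be displayed.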
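
  • To see every suburb that shares the minimum, rather than only the first row after sorting, one option is to filter on the minimum value directly. This is a sketch of an alternative, not the approach used above.

```r
library(MASS)   # provides the Boston data set

# Keep all rows whose medv equals the overall minimum
lowest <- Boston[Boston$medv == min(Boston$medv), ]
rownames(lowest)
```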


summary(Boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          black       
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00
  • By comparing Suburb #399 to the summary of Boston as a whole:
    • crim: Suburb #399 (38.3518) - This suburb has a Per Capita Crime Rate that is approximately 10 times the average of Boston suburbs.
    • zn: Suburb #399 (0) - This suburb is below the average of Boston suburbs in regards to the proportion of residential land zoned for lots over 25,000 sq.ft.
    • indus: Suburb #399 (18.1) - This suburb is above the average of Boston suburbs in regards to the proportion of non-retail business acres per town.
    • chas: Suburb #399 (0) - This suburb does not bound the Charles River, which is in line with most Boston suburbs.
    • nox: Suburb #399 (0.693) - This suburb is above the third quartile (0.6240) of Boston suburbs in regards to the nitrogen oxides concentration (parts per 10 million).
    • rm: Suburb #399 (5.453) - This suburb is below the average of Boston suburbs in regards to the average number of rooms per dwelling.
    • age: Suburb #399 (100) - This suburb is above the average of Boston suburbs in regards to the proportion of owner-occupied units built prior to 1940.
    • dis: Suburb #399 (1.4896) - This suburb is below the average of Boston suburbs in regards to the weighted mean of distances to five Boston employment centres.
    • rad: Suburb #399 (24) - This suburb is above the average of Boston suburbs in regards to the index of accessibility to radial highways.
    • tax: Suburb #399 (666) - This suburb is above the average of Boston suburbs in regards to the full-value property-tax rate per $10,000.
    • ptratio: Suburb #399 (20.2) - This suburb is above the average of Boston suburbs in regards to the pupil-teacher ratio by town.
    • black: Suburb #399 (396.9) - This suburb is at the maximum of the black variable. This variable is the index 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town, rather than the proportion itself.
    • lstat: Suburb #399 (30.59) - This suburb is above the average of Boston suburbs in regards to the lower status of the population (percent).
    • medv: Suburb #399 (5) - This suburb is below the average of Boston suburbs in regards to the median value of owner-occupied homes in $1000s.
  • Suburb #399 Highlights: More Crime, Fewer Rooms per Dwelling, a High Percentage of Lower-Status Population, and a Low Median Value of Owner-Occupied Homes.
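
  • Comparing against the mean alone can hide where a value sits in the overall range. The sketch below computes, for each variable, the share of suburbs with a value at or below that of row 399; it is one possible way to summarize the comparison.

```r
library(MASS)   # provides the Boston data set

# Share of suburbs with a value at or below that of row 399, per variable
pct <- sapply(Boston, function(x) mean(x <= x[399]))
round(pct, 2)
```

  • A share near 1 means the suburb is at the top of the range for that variable, and a share near 0 means it is at the bottom.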


  • In this data set, how many of the suburbs average more than seven rooms per dwelling? More than eight rooms per dwelling? Comment on the suburbs that average more than eight rooms per dwelling.
# Exercise 10h-1
nrow(subset(Boston, rm  > 7))
## [1] 64
nrow(subset(Boston, rm  > 8))
## [1] 13
  • 64 suburbs average more than seven rooms per dwelling
  • 13 suburbs average more than eight rooms per dwelling


# Exercise 10h-2
dwelling_selection = subset(Boston, rm > 8)
summary(dwelling_selection)
##       crim               zn            indus             chas       
##  Min.   :0.02009   Min.   : 0.00   Min.   : 2.680   Min.   :0.0000  
##  1st Qu.:0.33147   1st Qu.: 0.00   1st Qu.: 3.970   1st Qu.:0.0000  
##  Median :0.52014   Median : 0.00   Median : 6.200   Median :0.0000  
##  Mean   :0.71879   Mean   :13.62   Mean   : 7.078   Mean   :0.1538  
##  3rd Qu.:0.57834   3rd Qu.:20.00   3rd Qu.: 6.200   3rd Qu.:0.0000  
##  Max.   :3.47428   Max.   :95.00   Max.   :19.580   Max.   :1.0000  
##       nox               rm             age             dis       
##  Min.   :0.4161   Min.   :8.034   Min.   : 8.40   Min.   :1.801  
##  1st Qu.:0.5040   1st Qu.:8.247   1st Qu.:70.40   1st Qu.:2.288  
##  Median :0.5070   Median :8.297   Median :78.30   Median :2.894  
##  Mean   :0.5392   Mean   :8.349   Mean   :71.54   Mean   :3.430  
##  3rd Qu.:0.6050   3rd Qu.:8.398   3rd Qu.:86.50   3rd Qu.:3.652  
##  Max.   :0.7180   Max.   :8.780   Max.   :93.90   Max.   :8.907  
##       rad              tax           ptratio          black      
##  Min.   : 2.000   Min.   :224.0   Min.   :13.00   Min.   :354.6  
##  1st Qu.: 5.000   1st Qu.:264.0   1st Qu.:14.70   1st Qu.:384.5  
##  Median : 7.000   Median :307.0   Median :17.40   Median :386.9  
##  Mean   : 7.462   Mean   :325.1   Mean   :16.36   Mean   :385.2  
##  3rd Qu.: 8.000   3rd Qu.:307.0   3rd Qu.:17.40   3rd Qu.:389.7  
##  Max.   :24.000   Max.   :666.0   Max.   :20.20   Max.   :396.9  
##      lstat           medv     
##  Min.   :2.47   Min.   :21.9  
##  1st Qu.:3.32   1st Qu.:41.7  
##  Median :4.14   Median :48.3  
##  Mean   :4.31   Mean   :44.2  
##  3rd Qu.:5.12   3rd Qu.:50.0  
##  Max.   :7.44   Max.   :50.0
  • The 13 suburbs that average more than 8 rooms per dwelling exhibit the following characteristics:
    • A low average Per Capita Crime Rate of 0.71879 (below the Boston average of 3.61352)
    • An average proportion of owner-occupied units built prior to 1940 of 71.54 (close to the Boston average of 68.57)
    • An average median value of owner-occupied homes of $44,200 (above the Boston average of $22,530)
    • An average lower status of the population (percent) of 4.31 (below the Boston average of 12.65)
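
  • The comparison above can be laid out side by side. The sketch below stacks the subset means on top of the overall means for the four variables discussed; the selection of variables simply mirrors the bullets above.

```r
library(MASS)   # provides the Boston data set

big <- subset(Boston, rm > 8)
vars <- c("crim", "age", "medv", "lstat")

# Means for the rm > 8 suburbs next to the means for all suburbs
rbind(rm_over_8 = colMeans(big[, vars]),
      all_suburbs = colMeans(Boston[, vars]))
```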