Lab Exercise 1

Question 2
Question 5
Question 6
Question 8
Question 9
Question 10

Question 2

For each of the following scenarios, indicate whether it is a regression or classification problem, whether the goal is inference or prediction, and the values of n and p.

(a) CEO Salary Analysis:

Type: Regression (CEO salary is a continuous variable)
Goal: Inference (To understand which factors affect salary)
n (observations): 500 firms
p (predictors): Number of employees, industry, and profit (3 predictors)

(b) New Product Success Prediction:

Type: Classification (success or failure is a categorical outcome)
Goal: Prediction (Predict success for a new product)
n (observations): 20 past product launches
p (predictors): Price, marketing budget, competition price, and ten other variables (total of 13 predictors)

(c) USD/Euro Exchange Rate Prediction:

Type: Regression (Percentage change in the exchange rate is continuous)
Goal: Prediction (Predict the % change in the USD/Euro exchange rate)
n (observations): 52 weeks (data for one year)
p (predictors): % change in the US, British, and German markets (3 predictors)

Question 5

Explain the difference between a flexible and inflexible statistical learning approach. In what situations would a flexible approach be preferred over a less flexible approach? In what situations would a less flexible approach be preferred?

A very flexible approach in regression or classification refers to models that can adapt to complex, non-linear relationships in the data.

Advantages:
- Lower Bias: Better approximates the true relationship, reducing bias.
- Better Fit for Complex Data: Captures non-linear patterns effectively.
- Higher Accuracy (With Sufficient Data): Learns intricate patterns for more accurate predictions.
Disadvantages:
- Higher Variance & Overfitting Risk: Sensitive to training data, may not generalize well.
- Increased Complexity & Cost: More parameters, computationally expensive, and harder to interpret.
- Reduced Interpretability: Complex relationships make understanding predictors difficult.
When to Use a Flexible Approach:
- Prioritize prediction accuracy with complex, non-linear data.
- Large dataset available to manage variance.
- Interpretability isn’t a major concern.
When to Use a Less Flexible Approach:
- Focus on interpretability and inference.
- Small dataset or high noise, where simpler models avoid overfitting.
- Need for computational efficiency and fewer resources.

Question 6

Describe the differences between parametric and non-parametric statistical learning approaches. What are the advantages of a parametric approach to regression or classification (as opposed to a non-parametric approach)? What are its disadvantages?

Differences Between Parametric and Non-Parametric Approaches:
- Parametric: Assumes a specific form for the relationship between predictors and response (e.g., linear regression). Use a fixed number of parameters.
- Non-Parametric: Learns relationships directly from data (e.g., KNN, decision trees). Complexity increases with more data.
Advantages of Parametric Models:
- Simplicity & Interpretability: Easier to understand due to fixed relationships.
- Efficiency: Faster and less resource intensive.
- Lower Overfitting Risk (if assumptions hold): Fixed structure reduces overfitting.
- Needs Less Data: Requires fewer data points for accuracy.
Disadvantages of Parametric Models:
- Incorrect Assumptions Hurt Performance: Performance drops if assumptions (e.g., linearity) are wrong.
- Less Flexibility: Rigid model structure limits adaptability.
- Risk of Underfitting: May miss complex patterns in the data.
- Sensitive to Outliers: Extreme values can distort the model’s results.

Question 8

a) Load the College dataset

college <- read.csv("data/College.csv")

b) Set the row names to be the college names

rownames(college) <- college[, 1]
View(college)

college <- college [, -1]
View(college)

c) Exploratory Data Analysis

i) Summary of the data set

summary(college)

##    Private               Apps           Accept          Enroll    
##  Length:777         Min.   :   81   Min.   :   72   Min.   :  35  
##  Class :character   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242  
##  Mode  :character   Median : 1558   Median : 1110   Median : 434  
##                     Mean   : 3002   Mean   : 2019   Mean   : 780  
##                     3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902  
##                     Max.   :48094   Max.   :26330   Max.   :6392  
##    Top10perc       Top25perc      F.Undergrad     P.Undergrad     
##  Min.   : 1.00   Min.   :  9.0   Min.   :  139   Min.   :    1.0  
##  1st Qu.:15.00   1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0  
##  Median :23.00   Median : 54.0   Median : 1707   Median :  353.0  
##  Mean   :27.56   Mean   : 55.8   Mean   : 3700   Mean   :  855.3  
##  3rd Qu.:35.00   3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0  
##  Max.   :96.00   Max.   :100.0   Max.   :31643   Max.   :21836.0  
##     Outstate       Room.Board       Books           Personal   
##  Min.   : 2340   Min.   :1780   Min.   :  96.0   Min.   : 250  
##  1st Qu.: 7320   1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850  
##  Median : 9990   Median :4200   Median : 500.0   Median :1200  
##  Mean   :10441   Mean   :4358   Mean   : 549.4   Mean   :1341  
##  3rd Qu.:12925   3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700  
##  Max.   :21700   Max.   :8124   Max.   :2340.0   Max.   :6800  
##       PhD            Terminal       S.F.Ratio      perc.alumni   
##  Min.   :  8.00   Min.   : 24.0   Min.   : 2.50   Min.   : 0.00  
##  1st Qu.: 62.00   1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00  
##  Median : 75.00   Median : 82.0   Median :13.60   Median :21.00  
##  Mean   : 72.66   Mean   : 79.7   Mean   :14.09   Mean   :22.74  
##  3rd Qu.: 85.00   3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00  
##  Max.   :103.00   Max.   :100.0   Max.   :39.80   Max.   :64.00  
##      Expend        Grad.Rate     
##  Min.   : 3186   Min.   : 10.00  
##  1st Qu.: 6751   1st Qu.: 53.00  
##  Median : 8377   Median : 65.00  
##  Mean   : 9660   Mean   : 65.46  
##  3rd Qu.:10830   3rd Qu.: 78.00  
##  Max.   :56233   Max.   :118.00

ii) Create a pairs plot of the first 10 columns

college$Private <- college$Private == "Yes"
pairs(college[, 1:10], cex = 0.2)

iii) Side-by-side boxplots of Outstate vs Private

# Side-by-side boxplots of Outstate vs Private
plot(factor(college$Private), college$Outstate, main="Outstate Tuition by Private/Public Universities",
     xlab="Private", ylab="Outstate Tuition", col=c("lightblue", "pink"))

Inference: Universities classified as “Elite” (those admitting more than 50% of students from the top 10% of their high school class) tend to have significantly higher tuition costs.

iv) Boxplot of Outstate vs Elite

Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)
summary(college$Elite)

##  No Yes 
## 699  78

plot(college$Elite, college$Outstate, main="Outstate Tuition by Elite Status",
     xlab="Elite Status", ylab="Outstate Tuition", col=c("lightgreen", "orange"))

Inference:
- Universities classified as “Elite” (those admitting more than 50% of students from the top 10% of their high school class) tend to have significantly higher tuition costs.

v) Histograms of few Numeric Variables varying bins

par(mfrow = c(2, 2))  

hist(college$Apps, breaks = 30, main = "Histogram of Applications", col = "skyblue", xlab = "Applications")
hist(college$Accept, breaks = 25, main = "Histogram of Acceptances", col = "lightgreen", xlab = "Acceptances")
hist(college$Enroll, breaks = 20, main = "Histogram of Enrollments", col = "lightpink", xlab = "Enrollments")
hist(college$Top10perc, breaks = 10, main = "Histogram of Top 10% Students", col = "lightyellow", xlab = "Top 10%")

Inference:
- Applications: The distribution of applications is right-skewed, meaning that most universities receive a moderate number of applications. However, a few elite institutions stand out with an exceptionally high number of applications.
- Acceptances: Similarly, the distribution of acceptances is right-skewed. This indicates that while most universities accept a reasonable number of students, a few institutions have very high acceptance numbers, which might point to lower selectivity in those cases.
- Enrollments: The distribution of enrollments is also right-skewed, but it shows fewer extreme outliers compared to applications and acceptances. Most universities have a moderate number of enrollments, with a few universities having significantly higher enrollments.
- Top 10% Students: The distribution of the percentage of students from the top 10% of their high school class is lightly right-skewed, with most universities admitting a low percentage of these top students. However, a few highly selective institutions admit a higher proportion of students from the top 10%.

(vi) Further exploration:

Boxplots for comparison of quantitative variables by Private Status

par(mfrow = c(2, 2))  
boxplot(Outstate ~ Private, data = college, 
        xlab = "Private", ylab = "Top10perc", col = c("lightblue", "lightgreen"))
boxplot(Enroll ~ Private, data = college, 
        xlab = "Private", ylab = "Enrollments", col = c("lightblue", "lightgreen"))
boxplot(Accept ~ Private, data = college, 
        xlab = "Private", ylab = "Acceptances", col = c("lightblue", "lightgreen"))
boxplot(Apps ~ Private, data = college,
        xlab = "Private", ylab = "Applications", col = c("lightblue", "lightgreen"))

# Reset the plotting window
par(mfrow = c(1, 1))

Inference: Public universities generally have more variable enrollments, acceptances, and applications, while private institutions maintain a narrower range in these metrics.

Correlation analysis between key quantitative variables

cor_matrix <- cor(college[, sapply(college, is.numeric)], use = "complete.obs")
heatmap(cor_matrix, col = colorRampPalette(c("blue", "white", "red"))(256), scale = "none", margins = c(5,5),cexRow = 0.8, cexCol = 0.8)

Key Findings:
- High Correlations:
  - Applications - Acceptance (0.94): More applications correlate with more acceptances.
  - Applications - Enrollment (0.85): More applications lead to higher enrollments.
  - F.Undergrad - Enrollment (0.96): More full-time undergraduates mean more enrollments.
  - Outstate Tuition - Expenditure (0.67): Higher out-of-state tuition is linked to higher expenditures.
- Moderate Correlations:
  - Top10perc High school students - Top25perc High school students (0.89): More top 10% students correlate with more top 25% students.
  - Top10perc High school students - Expenditure (0.66): More top 10% students generally mean higher spending.
- Negative Correlations:
  - Outstate Tuition↔︎ S.F.Ratio (-0.55): Higher out-of-state tuition correlates with lower student-faculty ratios.
  - S.F.Ratio ↔︎ Expenditure (-0.58): Higher student-faculty ratios link to lower per-student expenditures.
- Weak Correlations:
  - Books ↔︎ Personal (0.18): Low correlation between spending on books and personal expenses.
- Overall Summary: Universities with more applications and full-time undergraduates tend to have higher enrollments. Higher tuition (Outstate) is associated with higher spending and lower student-faculty ratios. Selective schools (Top10perc) tend to have higher spending.

Question 9

auto_data <- read.table("data/Auto.data", header = TRUE, na.strings = "?")
auto_data <- na.omit(auto_data)

(a) Identify which of the predictors are quantitative, and which are qualitative

sapply(auto_data, class)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##    "numeric"    "integer"    "numeric"    "numeric"    "numeric"    "numeric" 
##         year       origin         name 
##    "integer"    "integer"  "character"

numeric <- which(sapply(auto_data, class) == "numeric")
names(numeric)

## [1] "mpg"          "displacement" "horsepower"   "weight"       "acceleration"

Quantitative Predictors (Numeric):
- mpg,
- displacement,
- horsepower,
- weight,
- acceleration.
Qualitative Predictors (Categorical):
- name,
- cylinders (integer but can be treated categorically),
- year (integer can be treated categorically),
- origin(based on data description - Origin of car (1. American, 2. European, 3. Japanese))

auto_data$cylinders = as.factor(auto_data$cylinders)
auto_data$year = as.factor(auto_data$year)
auto_data$origin = as.factor(auto_data$origin)

(b) Range of quantitative predictors:

sapply(auto_data[, numeric], function(auto_data) diff(range(auto_data)))

##          mpg displacement   horsepower       weight acceleration 
##         37.6        387.0        184.0       3527.0         16.8

(c) Mean and standard deviation of quantitative predictors:

Mean

sapply(auto_data[, c("mpg", "displacement", "horsepower", "weight", "acceleration")], mean)

##          mpg displacement   horsepower       weight acceleration 
##     23.44592    194.41199    104.46939   2977.58418     15.54133

Standard deviation

sapply(auto_data[, c("mpg", "displacement", "horsepower", "weight", "acceleration")], sd)

##          mpg displacement   horsepower       weight acceleration 
##     7.805007   104.644004    38.491160   849.402560     2.758864

(d) Remove few observations (10:85) and recalculate range , mean and standard deviation:

subset_data <- auto_data[-(10:85), ]  # Remove rows 10 to 85
print("Range")

## [1] "Range"

sapply(subset_data[, c("mpg", "displacement", "horsepower", "weight", "acceleration")], range)

##       mpg displacement horsepower weight acceleration
## [1,] 11.0           68         46   1649          8.5
## [2,] 46.6          455        230   4997         24.8

print("Mean")

## [1] "Mean"

sapply(subset_data[, c("mpg", "displacement", "horsepower", "weight", "acceleration")], mean)

##          mpg displacement   horsepower       weight acceleration 
##     24.40443    187.24051    100.72152   2935.97152     15.72690

print("Standard deviation")

## [1] "Standard deviation"

sapply(subset_data[, c("mpg", "displacement", "horsepower", "weight", "acceleration")], sd)

##          mpg displacement   horsepower       weight acceleration 
##     7.867283    99.678367    35.708853   811.300208     2.693721

(e) Exploratory Data Analysis through visualizations:

numeric_vars <- sapply(auto_data, is.numeric)
pairs(auto_data[, numeric_vars], main = "Scatterplot Matrix")

Inference:
- mpg and weight: There’s a negative relationship between mpg and weight. Heavier cars typically consume more fuel.
- horsepower and mpg: A negative correlation can be seen; cars with more horsepower are generally less fuel-efficient.
- acceleration and mpg: observed a slight positive correlation; cars with higher acceleration may tend to be more efficient.
- displacement and mpg: Similar to weight and horsepower, displacement tends to negatively correlate with mpg, indicating that larger engines are less fuel-efficient.
- Weight and horsepower: Likely to be positively correlated, as larger and more powerful cars are often heavier.
- Weight and displacement: Typically positively correlated, as larger cars with bigger engines tend to be heavier.

# Boxplot of mpg by cylinders
boxplot(mpg ~ cylinders, data = auto_data, main = "mpg by Cylinders",xlab = "Cylinders", ylab = "mpg", col = "lightblue")

# Boxplot of mpg by year
boxplot(mpg ~ year, data = auto_data, main = "mpg by Year", xlab = "Year", ylab = "mpg", col = "lightcoral")

# Boxplot of mpg by origin
boxplot(mpg ~ origin, data = auto_data, main = "mpg by Origin", xlab = "Origin", ylab = "mpg", col = c("lightgreen", "lightpink", "lightyellow"))

# Scatter plot of horsepower vs mpg, colored by origin
ggplot(auto_data, aes(x = horsepower, y = mpg, color = factor(origin))) +
  geom_point() +
  labs(title = "Horsepower vs mpg by Origin", 
       x = "Horsepower", y = "mpg")

Inference
- Cylinders: Fewer cylinders generally lead to higher mpg.
- Year: As time progresses, mpg has improved in newer cars, possibly due to advances in fuel-efficient technology.
- Origin: Japanese (3) cars tend to have higher mpg than American (1) or European (2) cars, indicating their focus on fuel efficiency.
- Horsepower: Cars with higher horsepower typically have lower mpg, but the pattern might differ across different origins.

(f) Predictors of mpg:

Based on the relationships observed in the plots:

- Strong Predictors of mpg: Displacement, Horsepower, Weight, and Cylinders are the most significant predictors of mpg due to their strong correlations (either negative or positive).
- Moderate Predictors of mpg: Acceleration and Year may be moderately useful, with acceleration having a weak positive correlation and year showing a clear trend over time.
- Categorical Variable: Origin is another useful predictor, especially given that cars from different regions (e.g., Japanese cars) exhibit varying fuel efficiencies.

Question 10

(a) Load the data and check dimensions:

data(Boston)

dim(Boston)

## [1] 506  13

-There are 506 Rows and 13 columns in this dataset

(b) Pairwise scatter plots and correlation analysis:

pairs(Boston)

Key Insights from Pairwise Scatterplot:
- medv and rm: Strong positive relationship. As the number of rooms increases, the median home value (medv) also increases.
- medv and lstat: Strong negative relationship. Higher percentage of lower-status population (lstat) is associated with lower median home values.
- medv and tax: Weak negative relationship. Higher property tax rates tend to be associated with slightly lower home values.
- medv and age: Weak negative trend. Older homes (age) tend to have lower median values.
- nox and indus: Moderate positive correlation. More industrial areas tend to have higher nitrogen oxide levels.
- zn and medv: Weak positive relationship. More residential land zoned for large lots correlates with higher home values.
- crim, rad, and dis: Weaker relationships with medv. Crime rate (crim) shows a slight negative trend, while dis and rad show weak positive trends.

(c) Predictors associated with per capita crime rate

# Correlation between crime rate and other predictors
cor(Boston$crim, Boston[, -13])  # Exclude MEDV from correlation calculation

##      crim         zn     indus        chas       nox         rm       age
## [1,]    1 -0.2004692 0.4065834 -0.05589158 0.4209717 -0.2192467 0.3527343
##             dis       rad       tax   ptratio     lstat
## [1,] -0.3796701 0.6255051 0.5827643 0.2899456 0.4556215

Positive Correlations:
- Industrial areas (INDUS), pollution (NOX), older neighborhoods (AGE), proximity to highways (RAD), and lower-income areas (LSTAT) are positively correlated with crime.
- Property tax rates (TAX) are also positively correlated with crime, possibly reflecting the challenges faced by economically s
Negative Correlations:
- Larger residential areas (ZN), larger homes (RM), and greater distances from employment centers (DIS) tend to have lower crime rates.
Weak or No Significant Correlation:
- Proximity to the Charles River (CHAS) does not have a strong correlation with crime rates.truggling areas with high taxes.

(d) Census tracts with particularly high crime rates, tax rates, pupil-teacher ratios

Boston |>
  pivot_longer(cols = 1:13) |>
  filter(name %in% c("crim", "tax", "ptratio")) |>
  ggplot(aes(value)) +
  geom_histogram(bins = 20) +
  facet_wrap(~name, scales = "free", ncol = 1)

print("Summary statistics of crime rate")

## [1] "Summary statistics of crime rate"

summary(Boston$crim)  # Summary statistics of crime rate

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.00632  0.08204  0.25651  3.61352  3.67708 88.97620

print("Summary statistics of pupil-teacher ratio")

## [1] "Summary statistics of pupil-teacher ratio"

summary(Boston$ptratio)  # Summary statistics of pupil-teacher ratio

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.60   17.40   19.05   18.46   20.20   22.00

print("Summary statistics of property tax")

## [1] "Summary statistics of property tax"

summary(Boston$tax)  # Summary statistics of property tax

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   187.0   279.0   330.0   408.2   666.0   711.0

Key Descriptive Statistics and Observations:

Crime Rates (crim): Significant variation.
Tax Rates (tax): Considerable range.
Pupil-Teacher Ratios (ptratio): Narrower range compared to crime and tax rates.

(e) Census tracts bounding the Charles River

print("count of census tracts bound charles river")

## [1] "count of census tracts bound charles river"

sum(Boston$chas)

## [1] 35

(f) Median pupil-teacher ratio

print(" median pupil-teacher ratio")

## [1] " median pupil-teacher ratio"

median(Boston$ptratio)

## [1] 19.05

(g) Census tract with the lowest median home value

min_medv_row <- Boston[which.min(Boston$medv), ]  # Row with the minimum median home value
str(min_medv_row)  # View the row

## 'data.frame':    1 obs. of  13 variables:
##  $ crim   : num 38.4
##  $ zn     : num 0
##  $ indus  : num 18.1
##  $ chas   : int 0
##  $ nox    : num 0.693
##  $ rm     : num 5.45
##  $ age    : num 100
##  $ dis    : num 1.49
##  $ rad    : int 24
##  $ tax    : num 666
##  $ ptratio: num 20.2
##  $ lstat  : num 30.6
##  $ medv   : num 5

summary(Boston)  # summary statistics for the entire dataset

##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          lstat      
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   : 1.73  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.: 6.95  
##  Median : 5.000   Median :330.0   Median :19.05   Median :11.36  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :12.65  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:16.95  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :37.97  
##       medv      
##  Min.   : 5.00  
##  1st Qu.:17.02  
##  Median :21.20  
##  Mean   :22.53  
##  3rd Qu.:25.00  
##  Max.   :50.00

Analysis of Census Tract with Lowest Median Home Value:

Compared to Overall Ranges:
- Crime Rate (crim): Extremely high.
- Industrial Proportion (indus): Above the median.
- Rooms (rm): Below the median.
- Tax Rate (tax): Among the highest.
- Lower Status Population (lstat): Markedly high.
Inferences: Low home values likely influenced by high crime rates, a predominance of industrial land use, socioeconomic challenges, a combination of small homes, and high taxes.

(h) Census tracts averaging more than seven rooms per dwelling, and more than eight rooms per dwelling

print("count of census tract average more than 7 rooms per dwelling")

## [1] "count of census tract average more than 7 rooms per dwelling"

sum(Boston$rm > 7)

## [1] 64

print("count of census tract average more than 8 rooms per dwelling")

## [1] "count of census tract average more than 8 rooms per dwelling"

sum(Boston$rm > 8)

## [1] 13

large_homes <- Boston[Boston$rm > 8, ]
summary(large_homes)

##       crim               zn            indus             chas       
##  Min.   :0.02009   Min.   : 0.00   Min.   : 2.680   Min.   :0.0000  
##  1st Qu.:0.33147   1st Qu.: 0.00   1st Qu.: 3.970   1st Qu.:0.0000  
##  Median :0.52014   Median : 0.00   Median : 6.200   Median :0.0000  
##  Mean   :0.71879   Mean   :13.62   Mean   : 7.078   Mean   :0.1538  
##  3rd Qu.:0.57834   3rd Qu.:20.00   3rd Qu.: 6.200   3rd Qu.:0.0000  
##  Max.   :3.47428   Max.   :95.00   Max.   :19.580   Max.   :1.0000  
##       nox               rm             age             dis       
##  Min.   :0.4161   Min.   :8.034   Min.   : 8.40   Min.   :1.801  
##  1st Qu.:0.5040   1st Qu.:8.247   1st Qu.:70.40   1st Qu.:2.288  
##  Median :0.5070   Median :8.297   Median :78.30   Median :2.894  
##  Mean   :0.5392   Mean   :8.349   Mean   :71.54   Mean   :3.430  
##  3rd Qu.:0.6050   3rd Qu.:8.398   3rd Qu.:86.50   3rd Qu.:3.652  
##  Max.   :0.7180   Max.   :8.780   Max.   :93.90   Max.   :8.907  
##       rad              tax           ptratio          lstat           medv     
##  Min.   : 2.000   Min.   :224.0   Min.   :13.00   Min.   :2.47   Min.   :21.9  
##  1st Qu.: 5.000   1st Qu.:264.0   1st Qu.:14.70   1st Qu.:3.32   1st Qu.:41.7  
##  Median : 7.000   Median :307.0   Median :17.40   Median :4.14   Median :48.3  
##  Mean   : 7.462   Mean   :325.1   Mean   :16.36   Mean   :4.31   Mean   :44.2  
##  3rd Qu.: 8.000   3rd Qu.:307.0   3rd Qu.:17.40   3rd Qu.:5.12   3rd Qu.:50.0  
##  Max.   :24.000   Max.   :666.0   Max.   :20.20   Max.   :7.44   Max.   :50.0

Analysis of Census Tracts with More Than Eight Rooms per Dwelling:

High-Level Traits: Typically characterize census tracts with large, upscale residences.
Notable Data Points (Compared to Entire Dataset):
- Crime Rate (crim): Lower average crime rates.
- Distance to Employment Centers (dis): Tendency to be farther from employment centers.
- Property Tax Rates (tax): Generally, lower property tax rates.
- Socioeconomic Status (lstat): Show lower percentages of lower-status population.
- Median Home Value (medv): Substantially elevated.