My project relates to real estate. I proposed to study the median value of owner-occupied homes and to investigate how factors such as the per capita crime rate, the full-value property-tax rate per $10,000, and the index of accessibility to radial highways affect that median value.
Number of data columns: 14
Number of rows: 506
There are 14 attributes in each case of the dataset. They are:
CRIM - per capita crime rate by town
ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - proportion of non-retail business acres per town.
CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
AGE - proportion of owner-occupied units built prior to 1940
DIS - weighted distances to five employment centers
RAD - index of accessibility to radial highways
TAX - full-value property-tax rate per $10,000
PTRATIO - pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT - % lower status of the population
MEDV - Median value of owner-occupied homes in $1000’s
house.df <- read.csv("housingdata.csv")
View(house.df)
dim(house.df)
## [1] 506 14
str(house.df)
## 'data.frame': 506 obs. of 14 variables:
## $ CRIM : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ ZN : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ INDUS : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ CHAS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ NOX : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ RM : num 6.58 6.42 7.18 7 7.15 ...
## $ AGE : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ DIS : num 4.09 4.97 4.97 6.06 6.06 ...
## $ RAD : int 1 2 2 3 3 3 5 5 5 5 ...
## $ TAX : int 296 242 242 222 222 222 311 311 311 311 ...
## $ PTRATIO: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ B : num 397 397 393 395 397 ...
## $ LSTAT : num 4.98 9.14 4.03 2.94 5.33 ...
## $ MEDV : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
attach(house.df)
summary(house.df)
## CRIM ZN INDUS CHAS
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## NOX RM AGE DIS
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## RAD TAX PTRATIO B
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## LSTAT MEDV
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
library(psych)
describe(house.df)
## vars n mean sd median trimmed mad min max range
## CRIM 1 506 3.61 8.60 0.26 1.68 0.33 0.01 88.98 88.97
## ZN 2 506 11.36 23.32 0.00 5.08 0.00 0.00 100.00 100.00
## INDUS 3 506 11.14 6.86 9.69 10.93 9.37 0.46 27.74 27.28
## CHAS 4 506 0.07 0.25 0.00 0.00 0.00 0.00 1.00 1.00
## NOX 5 506 0.55 0.12 0.54 0.55 0.13 0.38 0.87 0.49
## RM 6 506 6.28 0.70 6.21 6.25 0.51 3.56 8.78 5.22
## AGE 7 506 68.57 28.15 77.50 71.20 28.98 2.90 100.00 97.10
## DIS 8 506 3.80 2.11 3.21 3.54 1.91 1.13 12.13 11.00
## RAD 9 506 9.55 8.71 5.00 8.73 2.97 1.00 24.00 23.00
## TAX 10 506 408.24 168.54 330.00 400.04 108.23 187.00 711.00 524.00
## PTRATIO 11 506 18.46 2.16 19.05 18.66 1.70 12.60 22.00 9.40
## B 12 506 356.67 91.29 391.44 383.17 8.09 0.32 396.90 396.58
## LSTAT 13 506 12.65 7.14 11.36 11.90 7.11 1.73 37.97 36.24
## MEDV 14 506 22.53 9.20 21.20 21.56 5.93 5.00 50.00 45.00
## skew kurtosis se
## CRIM 5.19 36.60 0.38
## ZN 2.21 3.95 1.04
## INDUS 0.29 -1.24 0.30
## CHAS 3.39 9.48 0.01
## NOX 0.72 -0.09 0.01
## RM 0.40 1.84 0.03
## AGE -0.60 -0.98 1.25
## DIS 1.01 0.46 0.09
## RAD 1.00 -0.88 0.39
## TAX 0.67 -1.15 7.49
## PTRATIO -0.80 -0.30 0.10
## B -2.87 7.10 4.06
## LSTAT 0.90 0.46 0.32
## MEDV 1.10 1.45 0.41
table(house.df$CHAS)
##
## 0 1
## 471 35
table(house.df$RAD)
##
## 1 2 3 4 5 6 7 8 24
## 20 24 38 110 115 26 17 24 132
table(house.df$ZN)
##
## 0 12.5 17.5 18 20 21 22 25 28 30 33 34 35 40 45
## 372 10 1 1 21 4 10 10 3 6 4 3 3 7 6
## 52.5 55 60 70 75 80 82.5 85 90 95 100
## 3 3 4 3 3 15 2 2 5 4 1
table(house.df$TAX)
##
## 187 188 193 198 216 222 223 224 226 233 241 242 243 244 245 247 252 254
## 1 7 8 1 5 7 5 10 1 9 1 2 4 1 3 4 2 5
## 255 256 264 265 270 273 276 277 279 280 281 284 285 287 289 293 296 300
## 1 1 12 2 7 5 9 11 4 1 4 7 1 8 5 3 8 7
## 304 305 307 311 313 315 329 330 334 335 337 345 348 351 352 358 370 384
## 14 4 40 7 1 2 6 10 2 2 2 3 2 1 2 3 2 11
## 391 398 402 403 411 422 430 432 437 469 666 711
## 8 12 2 30 2 1 3 9 15 1 132 5
table(house.df$CHAS,house.df$RAD)
##
## 1 2 3 4 5 6 7 8 24
## 0 19 24 36 102 104 26 17 19 124
## 1 1 0 2 8 11 0 0 5 8
table(house.df$CHAS,house.df$ZN)
##
## 0 12.5 17.5 18 20 21 22 25 28 30 33 34 35 40 45 52.5 55
## 0 344 10 1 1 18 4 10 10 3 6 4 3 3 4 6 3 3
## 1 28 0 0 0 3 0 0 0 0 0 0 0 0 3 0 0 0
##
## 60 70 75 80 82.5 85 90 95 100
## 0 4 3 3 15 2 2 4 4 1
## 1 0 0 0 0 0 0 1 0 0
table(house.df$RAD, house.df$ZN)
##
## 0 12.5 17.5 18 20 21 22 25 28 30 33 34 35 40 45 52.5
## 1 6 0 0 1 0 0 0 0 0 0 0 0 3 2 0 0
## 2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 26 0 1 0 5 0 0 0 0 0 0 0 0 0 0 0
## 4 77 3 0 0 0 4 0 4 3 0 0 0 0 5 0 0
## 5 78 7 0 0 16 0 0 0 0 0 0 0 0 0 6 0
## 6 17 0 0 0 0 0 0 0 0 6 0 0 0 0 0 3
## 7 0 0 0 0 0 0 10 0 0 0 4 3 0 0 0 0
## 8 18 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0
## 24 132 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## 55 60 70 75 80 82.5 85 90 95 100
## 1 1 2 0 0 3 0 0 2 0 0
## 2 0 0 0 0 3 2 1 0 0 0
## 3 0 0 0 3 0 0 0 1 2 0
## 4 0 2 0 0 9 0 1 0 2 0
## 5 2 0 3 0 0 0 0 2 0 1
## 6 0 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0
## 24 0 0 0 0 0 0 0 0 0 0
boxplot(house.df$CRIM, main="per capita crime rate by town", xlab="CRIM",
horizontal = TRUE, col="light blue")
boxplot(house.df$INDUS,main="proportion of non-retail business acres per town",
xlab="INDUS", col="light blue")
boxplot(house.df$NOX, main="nitric oxides concentration (parts per 10 million)",
xlab="NOX", col="light blue")
boxplot(house.df$RM, main="average number of rooms per dwelling", xlab="RM",
col="light blue")
boxplot(house.df$DIS, main="weighted distances to five employment centres",
xlab="DIS", col="light blue")
boxplot(house.df$TAX, main="full-value property-tax rate per $10,000",
xlab="TAX", col="light blue")
boxplot(house.df$PTRATIO, main="pupil-teacher ratio by town ",
xlab="PTRATIO", col="light blue")
boxplot(house.df$LSTAT, main="lower status of the population(%)",
xlab="LSTAT", col="light blue")
boxplot(house.df$MEDV, main="Median value of owner-occupied homes in $1000's",
xlab="MEDV", col="light blue")
library(lattice)
histogram(house.df$ZN, main="proportion of residential land zoned
for lots over 25,000 sq.ft", xlab="ZN", col="maroon")
charles=factor(house.df$CHAS, levels=c(1,0), labels=c("tract bounds river","Otherwise"))
histogram(charles,col="maroon", main="Charles River dummy variable")
histogram(house.df$AGE, main="proportion of owner-occupied units built prior to 1940",
xlab="AGE", col="maroon")
histogram(house.df$RAD, main="index of accessibility to radial highways",
xlab="RAD", col="maroon")
plot(house.df$MEDV,house.df$CRIM, main="plot of CRIM v/s MEDV",
ylab = "per capita crime rate by town",
xlab="Median value of owner-occupied homes in $1000's")
plot(house.df$MEDV,house.df$INDUS, main="plot of INDUS v/s MEDV",
xlab = "Median value of owner-occupied homes in $1000's",
ylab="proportion of non-retail business acres per town")
plot(house.df$MEDV,house.df$TAX, main="plot of TAX v/s MEDV",
xlab = "Median value of owner-occupied homes in $1000's",
ylab="full-value property-tax rate per $10,000")
plot(house.df$MEDV,house.df$RAD, main="plot of RAD v/s MEDV",
xlab = "Median value of owner-occupied homes in $1000's",
ylab="Index of accessibility to radial highways")
hnum<-house.df[,c(1,2,3,5,6,7,8,11,12,13,14)]
cor(hnum)
## CRIM ZN INDUS NOX RM AGE
## CRIM 1.0000000 -0.2004692 0.4065834 0.4209717 -0.2192467 0.3527343
## ZN -0.2004692 1.0000000 -0.5338282 -0.5166037 0.3119906 -0.5695373
## INDUS 0.4065834 -0.5338282 1.0000000 0.7636514 -0.3916759 0.6447785
## NOX 0.4209717 -0.5166037 0.7636514 1.0000000 -0.3021882 0.7314701
## RM -0.2192467 0.3119906 -0.3916759 -0.3021882 1.0000000 -0.2402649
## AGE 0.3527343 -0.5695373 0.6447785 0.7314701 -0.2402649 1.0000000
## DIS -0.3796701 0.6644082 -0.7080270 -0.7692301 0.2052462 -0.7478805
## PTRATIO 0.2899456 -0.3916785 0.3832476 0.1889327 -0.3555015 0.2615150
## B -0.3850639 0.1755203 -0.3569765 -0.3800506 0.1280686 -0.2735340
## LSTAT 0.4556215 -0.4129946 0.6037997 0.5908789 -0.6138083 0.6023385
## MEDV -0.3883046 0.3604453 -0.4837252 -0.4273208 0.6953599 -0.3769546
## DIS PTRATIO B LSTAT MEDV
## CRIM -0.3796701 0.2899456 -0.3850639 0.4556215 -0.3883046
## ZN 0.6644082 -0.3916785 0.1755203 -0.4129946 0.3604453
## INDUS -0.7080270 0.3832476 -0.3569765 0.6037997 -0.4837252
## NOX -0.7692301 0.1889327 -0.3800506 0.5908789 -0.4273208
## RM 0.2052462 -0.3555015 0.1280686 -0.6138083 0.6953599
## AGE -0.7478805 0.2615150 -0.2735340 0.6023385 -0.3769546
## DIS 1.0000000 -0.2324705 0.2915117 -0.4969958 0.2499287
## PTRATIO -0.2324705 1.0000000 -0.1773833 0.3740443 -0.5077867
## B 0.2915117 -0.1773833 1.0000000 -0.3660869 0.3334608
## LSTAT -0.4969958 0.3740443 -0.3660869 1.0000000 -0.7376627
## MEDV 0.2499287 -0.5077867 0.3334608 -0.7376627 1.0000000
library(corrgram)
corrgram(house.df, order=FALSE, lower.panel=panel.shade, upper.panel=panel.pie,
text.panel=panel.txt, main="Corrgram of housing dataset")
library(corrgram)
corrgram(hnum, order=FALSE, lower.panel=panel.shade, upper.panel=panel.pie,
text.panel=panel.txt, main="Corrgram of housing dataset (numeric type)")
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplotMatrix(~MEDV+CRIM+INDUS+RAD+TAX, data=house.df, main="Scatterplot matrix Median value of owner-occupied homes in $1000's v/s other factors")
Null Hypothesis 1: There is no relationship between CRIM (per capita crime rate by town) and MEDV (Median value of owner-occupied homes in $1000’s)
cor.test(house.df$CRIM,house.df$MEDV)
##
## Pearson's product-moment correlation
##
## data: house.df$CRIM and house.df$MEDV
## t = -9.4597, df = 504, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4599064 -0.3116859
## sample estimates:
## cor
## -0.3883046
The p-value is less than 0.05; hence, we reject the null hypothesis and conclude that there is a statistically significant negative correlation (r = -0.39) between CRIM and MEDV.
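As a quick check on the output above, the reported t statistic can be recovered from the correlation coefficient and the sample size with t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below simply plugs in the values printed by cor.test; the variable names (r, n, t_stat, p_value) are illustrative and not part of the original analysis.

```r
# Values copied from the cor.test output above
r <- -0.3883046   # sample correlation between CRIM and MEDV
n <- 506          # number of rows in the dataset

# t statistic for testing H0: true correlation = 0, on n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(t_stat, 4))
print(p_value < 0.05)
```

The same formula applies to the other cor.test calls in this report, since they all use 506 rows and therefore 504 degrees of freedom.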
Null Hypothesis 2: There is no relationship between CHAS (Charles River dummy variable) and MEDV (Median value of owner-occupied homes in $1000’s)
cor.test(house.df$CHAS, house.df$MEDV)
##
## Pearson's product-moment correlation
##
## data: house.df$CHAS and house.df$MEDV
## t = 3.9964, df = 504, p-value = 7.391e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08945816 0.25848001
## sample estimates:
## cor
## 0.1752602
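The p-value (7.391e-05) is less than 0.05; hence, we reject the null hypothesis and conclude that there is a significant, though weak (r = 0.18), positive relationship between CHAS and MEDV.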
Null Hypothesis 3: There is no relationship between TAX(full-value property-tax rate per $10,000) and MEDV(Median value of owner-occupied homes in $1000’s)
cor.test(house.df$TAX, house.df$MEDV)
##
## Pearson's product-moment correlation
##
## data: house.df$TAX and house.df$MEDV
## t = -11.906, df = 504, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5338993 -0.3976061
## sample estimates:
## cor
## -0.4685359
The p-value is less than 0.05, hence, we reject the null hypothesis and establish that there is a significant relationship between TAX and MEDV.
Null Hypothesis 4: There is no relationship between CRIM(per capita crime rate by town) and RAD(index of accessibility to radial highways)
cor.test(house.df$CRIM,house.df$RAD)
##
## Pearson's product-moment correlation
##
## data: house.df$CRIM and house.df$RAD
## t = 17.998, df = 504, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5693817 0.6758248
## sample estimates:
## cor
## 0.6255051
The p-value is less than 0.05, hence, we reject the null hypothesis and establish that there is a significant relationship between CRIM and RAD.
Null Hypothesis 5: There is no relationship between AGE(proportion of owner-occupied units built prior to 1940) and TAX(full-value property-tax rate per $10,000)
t.test(house.df$AGE, house.df$TAX)
##
## Welch Two Sample t-test
##
## data: house.df$AGE and house.df$TAX
## t = -44.715, df = 533.15, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -354.5843 -324.7402
## sample estimates:
## mean of x mean of y
## 68.5749 408.2372
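The p-value is less than 0.05, so the Welch t-test rejects the hypothesis that AGE and TAX have equal means (68.57 versus 408.24). This test compares the means of two variables measured on different scales, so it does not by itself establish a relationship between AGE and TAX; a correlation test such as cor.test would address the null hypothesis as stated.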
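To illustrate the difference between comparing means and testing association, the following sketch uses two small made-up vectors (x and y are hypothetical and are not taken from the housing data). They are constructed so that their means differ greatly while their correlation is exactly zero, so t.test and cor.test give opposite verdicts.

```r
# Hypothetical toy vectors: very different means, zero correlation by construction
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- 400 + c(1, -1, -1, 1, 1, -1, -1, 1)

# Welch two-sample t-test: asks whether mean(x) equals mean(y)
means_test <- t.test(x, y)

# Pearson correlation test: asks whether x and y vary together
assoc_test <- cor.test(x, y)

print(means_test$p.value)  # small: the means clearly differ
print(assoc_test$p.value)  # large: no evidence of association
```

A significant two-sample t-test between two columns therefore says nothing about whether they are related; for hypotheses phrased as "no relationship", cor.test (or a regression) is the matching tool.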
Null Hypothesis 6: There is no relationship between DIS(weighted distances to five employment centers) and MEDV(Median value of owner-occupied homes in $1000’s)
t.test(house.df$DIS, house.df$MEDV)
##
## Welch Two Sample t-test
##
## data: house.df$DIS and house.df$MEDV
## t = -44.673, df = 557.8, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -19.56164 -17.91389
## sample estimates:
## mean of x mean of y
## 3.795043 22.532806
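The p-value is less than 0.05, so the Welch t-test rejects the hypothesis that DIS and MEDV have equal means (3.80 versus 22.53). This does not test the relationship stated in the null hypothesis; the correlation matrix above gives r = 0.25 between DIS and MEDV, and cor.test would be the appropriate check of its significance.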
HYPOTHESIS 1: If the house’s tract bounds the river, then the median value of the house is affected.
scatterplot(house.df$CHAS,house.df$MEDV, ylab= "Median value of
owner-occupied homes in $1000's", xlab="Charles River dummy variable")
t.test(house.df$CHAS,house.df$MEDV)
##
## Welch Two Sample t-test
##
## data: house.df$CHAS and house.df$MEDV
## t = -54.921, df = 505.77, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -23.26722 -21.66005
## sample estimates:
## mean of x mean of y
## 0.06916996 22.53280632
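The p-value is less than 0.05, but this t-test only shows that the mean of the CHAS dummy variable (0.07) differs from the mean of MEDV (22.53). The association between CHAS and MEDV is what cor.test examined under Null Hypothesis 2 (r = 0.175, p-value = 7.391e-05).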
fit1<- lm(CHAS~MEDV, data=house.df)
summary(fit1)
##
## Call:
## lm(formula = CHAS ~ MEDV, data = house.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.20211 -0.07869 -0.05860 -0.03223 0.97503
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.039891 0.029471 -1.354 0.176
## MEDV 0.004840 0.001211 3.996 7.39e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2503 on 504 degrees of freedom
## Multiple R-squared: 0.03072, Adjusted R-squared: 0.02879
## F-statistic: 15.97 on 1 and 504 DF, p-value: 7.391e-05
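The p-value (7.39e-05) is less than 0.05, confirming that CHAS and MEDV are correlated. Note that the model regresses CHAS on MEDV rather than MEDV on CHAS; the positive slope (0.00484) indicates that tracts with higher median home values are more likely to bound the river, which is consistent with higher median values for tracts on the river.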
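Because the hypothesis is about how bounding the river affects home value, a natural alternative is to put the value on the left-hand side of the formula. The sketch below uses a made-up data frame (toy, with columns river and value standing in for CHAS and MEDV; none of these numbers come from the housing data) to show that, with a 0/1 dummy predictor, the regression slope equals the difference between the two group means.

```r
# Hypothetical toy data: 'river' plays the role of CHAS, 'value' the role of MEDV
toy <- data.frame(
  river = c(0, 0, 0, 0, 1, 1, 1, 1),
  value = c(20, 22, 19, 23, 27, 30, 26, 29)
)

# Regress the outcome on the dummy variable
fit <- lm(value ~ river, data = toy)

# Mean of 'value' within each level of the dummy
group_means <- tapply(toy$value, toy$river, mean)

slope      <- unname(coef(fit)["river"])
diff_means <- unname(group_means["1"] - group_means["0"])

print(c(slope = slope, diff_means = diff_means))
```

On the real data, the analogous call would be lm(MEDV ~ CHAS, data = house.df), whose slope could be read directly as the estimated difference in median value (in $1000's) between tracts that bound the river and those that do not.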
HYPOTHESIS 2: Accessibility to radial highways affects the median value of houses.
scatterplot(house.df$RAD, house.df$MEDV, ylab="Median value of owner-occupied homes in $1000's",xlab="index of accessibility to radial highways")
t.test(house.df$RAD, house.df$MEDV)
##
## Welch Two Sample t-test
##
## data: house.df$RAD and house.df$MEDV
## t = -23.06, df = 1007, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -14.08824 -11.87855
## sample estimates:
## mean of x mean of y
## 9.549407 22.532806
The p-value is less than 0.05, but this t-test only shows that the means of RAD (9.55) and MEDV (22.53) differ; it does not measure how the two variables move together.
fit2<-lm(RAD~MEDV, data=house.df)
summary(fit2)
##
## Call:
## lm(formula = RAD ~ MEDV, data = house.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.391 -5.862 -3.658 8.893 24.375
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.69052 0.94853 18.650 <2e-16 ***
## MEDV -0.36130 0.03898 -9.269 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.056 on 504 degrees of freedom
## Multiple R-squared: 0.1456, Adjusted R-squared: 0.1439
## F-statistic: 85.91 on 1 and 504 DF, p-value: < 2.2e-16
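The p-value is less than 0.05, indicating that RAD and MEDV are correlated. The negative slope (-0.36) shows that median home values tend to be lower in tracts with a higher radial-highway accessibility index. As with fit1, the model regresses RAD on MEDV, and the result describes an association rather than a causal effect.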
lr.df<-house.df[,c(1,2,3,4,5,6,7,8,9,10,11,13,14)]
corrgram(lr.df, order=FALSE, lower.panel=panel.shade, upper.panel=panel.pie,
text.panel=panel.txt, main="Corrgram of housing dataset (linear regression)")
fit3<- lm(MEDV~CRIM+ZN+INDUS+CHAS+NOX+RM+AGE+DIS+RAD+TAX+PTRATIO+
LSTAT,data=house.df)
summary(fit3)
##
## Call:
## lm(formula = MEDV ~ CRIM + ZN + INDUS + CHAS + NOX + RM + AGE +
## DIS + RAD + TAX + PTRATIO + LSTAT, data = house.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.1304 -2.7673 -0.5814 1.9414 26.2526
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.617270 4.936039 8.431 3.79e-16 ***
## CRIM -0.121389 0.033000 -3.678 0.000261 ***
## ZN 0.046963 0.013879 3.384 0.000772 ***
## INDUS 0.013468 0.062145 0.217 0.828520
## CHAS 2.839993 0.870007 3.264 0.001173 **
## NOX -18.758022 3.851355 -4.870 1.50e-06 ***
## RM 3.658119 0.420246 8.705 < 2e-16 ***
## AGE 0.003611 0.013329 0.271 0.786595
## DIS -1.490754 0.201623 -7.394 6.17e-13 ***
## RAD 0.289405 0.066908 4.325 1.84e-05 ***
## TAX -0.012682 0.003801 -3.337 0.000912 ***
## PTRATIO -0.937533 0.132206 -7.091 4.63e-12 ***
## LSTAT -0.552019 0.050659 -10.897 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.798 on 493 degrees of freedom
## Multiple R-squared: 0.7343, Adjusted R-squared: 0.7278
## F-statistic: 113.5 on 12 and 493 DF, p-value: < 2.2e-16
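The overall regression is statistically significant (F = 113.5 on 12 and 493 degrees of freedom, p-value < 2.2e-16), and the model explains about 73% of the variance in MEDV (adjusted R-squared 0.7278).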
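The F-statistic and adjusted R-squared in the summary above follow directly from the multiple R-squared, the number of rows, and the number of predictors. The sketch below recomputes both from the printed values; the variable names are illustrative, and because R-squared is entered rounded to four decimals the results agree with the printed output only up to rounding.

```r
# Values copied from the summary(fit3) output above
r2 <- 0.7343   # multiple R-squared
n  <- 506      # number of rows
k  <- 12       # number of predictors (excluding the intercept)

# Overall F-statistic on k and n - k - 1 degrees of freedom
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(f_stat, 1))
print(round(adj_r2, 4))
```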
Through the Linear regression, we can decipher that:
CRIM (per capita crime rate by town), NOX (nitric oxides concentration), DIS (weighted distances to five employment centers), TAX (full-value property-tax rate per $10,000), PTRATIO (pupil-teacher ratio by town) and LSTAT (% lower status of the population) have a significant negative effect on MEDV (median value of owner-occupied homes in $1000's).
INDUS (proportion of non-retail business acres per town) and AGE (proportion of owner-occupied units built prior to 1940) have no statistically significant effect on MEDV (p-values 0.83 and 0.79).
ZN (proportion of residential land zoned for lots over 25,000 sq.ft.), CHAS (Charles River dummy variable), RM (average number of rooms per dwelling) and RAD (index of accessibility to radial highways) have a significant positive effect on MEDV. For RAD, the effect is positive once the other predictors are held fixed, even though its simple relationship with MEDV is negative.
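The grouping of predictors above can be reproduced mechanically from the coefficient table. The sketch below re-enters the estimates and p-values printed by summary(fit3) into a data frame (coefs, a name introduced here for illustration) and sorts each term by sign and by significance at the 0.05 level. The two p-values reported as "< 2e-16" are entered as 2e-16, which is an upper bound rather than the exact value.

```r
# Estimates and p-values copied from the summary(fit3) output above
coefs <- data.frame(
  term = c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM",
           "AGE", "DIS", "RAD", "TAX", "PTRATIO", "LSTAT"),
  estimate = c(-0.121389, 0.046963, 0.013468, 2.839993, -18.758022, 3.658119,
               0.003611, -1.490754, 0.289405, -0.012682, -0.937533, -0.552019),
  # 2e-16 is a placeholder upper bound for the entries printed as "< 2e-16"
  p_value = c(0.000261, 0.000772, 0.828520, 0.001173, 1.50e-06, 2e-16,
              0.786595, 6.17e-13, 1.84e-05, 0.000912, 4.63e-12, 2e-16),
  stringsAsFactors = FALSE
)

# Label each term by significance first, then by the sign of its estimate
effect <- ifelse(coefs$p_value >= 0.05, "not significant",
                 ifelse(coefs$estimate > 0, "positive", "negative"))

# List of term names per effect label
groups <- split(coefs$term, effect)
print(groups)
```

With the fitted model in the session, the same table should be available without retyping through coef(summary(fit3)), using its "Estimate" and "Pr(>|t|)" columns.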