library(rmarkdown)
library(knitr)
library(corrplot)
library(psych)
propertyValues <-read.csv(url("https://www.dropbox.com/s/kbzr00qy4b9kks3/STAT_360_-_Property_Values.csv?dl=1"))
attach(propertyValues)
Use lm() to construct a multiple linear regression model of the effects of all 9 predictor variables on SellingPrice, assign it to mod1, and use summary() to display the resulting model.
mod1<-lm(SellingPrice~., data=propertyValues)
summary(mod1)
##
## Call:
## lm(formula = SellingPrice ~ ., data = propertyValues)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1649401  -126821   -13161   100197  3960635
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    5.761e+06  1.507e+05  38.219  < 2e-16 ***
## Bedrooms      -6.180e+04  2.178e+03 -28.374  < 2e-16 ***
## Bathrooms      6.214e+04  3.756e+03  16.542  < 2e-16 ***
## TotalSqFt      2.922e+02  2.899e+00 100.777  < 2e-16 ***
## LotSize       -2.956e-01  3.970e-02  -7.445 1.01e-13 ***
## Floors         5.563e+04  3.687e+03  15.090  < 2e-16 ***
## Waterfront     7.324e+05  1.880e+04  38.956  < 2e-16 ***
## Condition      1.842e+04  2.732e+03   6.745 1.57e-11 ***
## YearBuilt     -3.311e+03  1.328e+02 -24.933  < 2e-16 ***
## YearRenovated  3.175e+02  1.411e+02   2.251   0.0244 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 236300 on 21603 degrees of freedom
## Multiple R-squared: 0.5864, Adjusted R-squared: 0.5863
## F-statistic: 3404 on 9 and 21603 DF, p-value: < 2.2e-16
Use principal() to reduce the 9 predictor variables into 9 orthogonal components, assign it to pca.full, and display the results. Hint: the argument nfactors can be used to designate the number of orthogonal components that you want the analysis to construct.
propertyval1<-propertyValues[2:10]
pca.full<-principal(propertyval1, nfactors = 9)
pca.full
## Principal Components Analysis
## Call: principal(r = propertyval1, nfactors = 9)
## Standardized loadings (pattern matrix) based upon correlation matrix
##                 RC1   RC7   RC2   RC6   RC5   RC4   RC3   RC8   RC9 h2
## Bedrooms       0.06  0.27  0.95  0.05  0.03  0.00 -0.01  0.12  0.00  1
## Bathrooms      0.34  0.48  0.28  0.24 -0.02  0.03  0.03  0.72  0.00  1
## TotalSqFt      0.17  0.90  0.31  0.14 -0.01  0.10  0.06  0.19  0.00  1
## LotSize        0.03  0.07  0.01 -0.01  0.00  1.00  0.01  0.02  0.00  1
## Floors         0.29  0.15  0.06  0.93 -0.11 -0.02  0.01  0.12  0.00  1
## Waterfront    -0.01  0.05 -0.01  0.01  0.01  0.01  1.00  0.02  0.00  1
## Condition     -0.22 -0.01  0.03 -0.10  0.97  0.00  0.01 -0.01  0.00  1
## YearBuilt      0.94  0.11  0.05  0.18 -0.14  0.02 -0.02  0.10 -0.20  1
## YearRenovated  0.92  0.13  0.05  0.19 -0.18  0.02  0.00  0.13  0.22  1
##                     u2 com
## Bedrooms       1.1e-16 1.2
## Bathrooms     -4.4e-16 3.0
## TotalSqFt     -4.4e-16 1.5
## LotSize       -6.7e-16 1.0
## Floors        -6.7e-16 1.3
## Waterfront    -4.4e-16 1.0
## Condition      0.0e+00 1.1
## YearBuilt     -8.9e-16 1.3
## YearRenovated -1.3e-15 1.4
##
##                        RC1  RC7  RC2  RC6  RC5  RC4  RC3  RC8  RC9
## SS loadings           2.01 1.16 1.09 1.02 1.01 1.01 1.00 0.61 0.09
## Proportion Var        0.22 0.13 0.12 0.11 0.11 0.11 0.11 0.07 0.01
## Cumulative Var        0.22 0.35 0.47 0.59 0.70 0.81 0.92 0.99 1.00
## Proportion Explained  0.22 0.13 0.12 0.11 0.11 0.11 0.11 0.07 0.01
## Cumulative Proportion 0.22 0.35 0.47 0.59 0.70 0.81 0.92 0.99 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 9 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0
## with the empirical chi square 0 with prob < NA
##
## Fit based upon off diagonal values = 1
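The variance rows in the output above follow directly from the SS loadings: with 9 standardized variables the total variance is 9, so each proportion is an SS loading divided by 9. A minimal sketch of that arithmetic, with the SS loadings typed in by hand from the printed table (the names ss_loadings, n_vars, prop_var, and cum_var are illustrative choices, not part of the assignment):

```r
# SS loadings copied from the printed pca.full output (rounded to 2 decimals)
ss_loadings <- c(RC1 = 2.01, RC7 = 1.16, RC2 = 1.09, RC6 = 1.02, RC5 = 1.01,
                 RC4 = 1.01, RC3 = 1.00, RC8 = 0.61, RC9 = 0.09)

# Each standardized variable contributes 1 unit of variance, so the total is 9
n_vars <- 9

prop_var <- ss_loadings / n_vars          # corresponds to the "Proportion Var" row
cum_var  <- cumsum(ss_loadings) / n_vars  # corresponds to the "Cumulative Var" row

print(round(prop_var, 2))
print(round(cum_var, 2))
```

Because the inputs are already rounded, the results agree with the printed rows only to two decimals.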
Use corrplot() to display the loadings table of this analysis (essentially a correlation table of the 9 predictor variables against the 9 orthogonal components generated by principal()) in a more visually appealing manner. Hint: the $ operator can be used to extract a specific component of the results, such as the loadings table (loadings) or the root mean square of the residuals (rms).
corrplot(pca.full$loadings)
Do any of the 9 orthogonal components generated by principal() appear to be unrelated to the original 9 predictor variables?
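Yes, the 9th orthogonal component (RC9) appears to be essentially unrelated to the original 9 predictor variables: its largest loadings are only -0.20 (YearBuilt) and 0.22 (YearRenovated), and it accounts for about 1% of the variance.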
Use plot() to create a scree plot of this principal component analysis (i.e., a scatterplot of the eigenvalues stored in $values). Hint: The “type” argument in plot() can help you create a scatterplot where the dots are connected by lines.
plot(pca.full$values, ylab="Eigenvalues", xlab= "Component Number", type="o")
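As a self-contained illustration of where scree-plot values come from, the sketch below computes the eigenvalues of a correlation matrix with base R's eigen() and plots them. The data are simulated and the names toy, shared, and eigenvalues are hypothetical. The square plotting region (par(pty = "s")) and the dashed reference line at an eigenvalue of 1 are optional additions, not something the exercise requires.

```r
set.seed(360)  # arbitrary seed so the simulated example is reproducible

# Simulated predictors (hypothetical): two driven by a shared factor, two pure noise
n <- 500
shared <- rnorm(n)
toy <- data.frame(
  x1 = shared + rnorm(n, sd = 0.5),
  x2 = shared + rnorm(n, sd = 0.5),
  x3 = rnorm(n),
  x4 = rnorm(n)
)

# Eigenvalues of the correlation matrix, returned largest first
eigenvalues <- eigen(cor(toy))$values

# Square plotting region so the bend in the curve is not distorted by the aspect ratio
old_par <- par(pty = "s")
plot(eigenvalues, type = "b", xlab = "Component Number", ylab = "Eigenvalues")
abline(h = 1, lty = 2)  # reference line: variance contributed by one standardized variable
par(old_par)
```

The eigenvalues of a correlation matrix sum to the number of variables (4 here), which is a quick sanity check on any scree plot.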
Identify the point of inflection in the scree plot. Hint: make sure that the height and width of your scree plot are similar before making a conclusion.
The point of inflection is at the third component.
Use the point of inflection to determine how many orthogonal components the original 9 predictor variables should be reduced to.
Because the point of inflection occurs at the third component, the original 9 predictor variables should be reduced to the 2 orthogonal components that precede it.
Use principal() to reduce the 9 predictor variables into the number of orthogonal components identified in Exercise 7, assign it to pca.new, and display the results. Hint: the argument nfactors can be used to designate the number of orthogonal components that you want the analysis to construct.
pca.new<-principal(propertyval1, nfactors = 2)
pca.new
## Principal Components Analysis
## Call: principal(r = propertyval1, nfactors = 2)
## Standardized loadings (pattern matrix) based upon correlation matrix
##                 RC1  RC2    h2   u2 com
## Bedrooms       0.07 0.80 0.644 0.36 1.0
## Bathrooms      0.53 0.73 0.813 0.19 1.8
## TotalSqFt      0.30 0.85 0.807 0.19 1.3
## LotSize        0.00 0.22 0.048 0.95 1.0
## Floors         0.67 0.23 0.501 0.50 1.2
## Waterfront    -0.06 0.17 0.032 0.97 1.2
## Condition     -0.63 0.26 0.466 0.53 1.3
## YearBuilt      0.90 0.10 0.818 0.18 1.0
## YearRenovated  0.91 0.12 0.848 0.15 1.0
##
##                        RC1  RC2
## SS loadings           2.86 2.11
## Proportion Var        0.32 0.23
## Cumulative Var        0.32 0.55
## Proportion Explained  0.58 0.42
## Cumulative Proportion 0.58 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 2 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.08
## with the empirical chi square 10014.93 with prob < 0
##
## Fit based upon off diagonal values = 0.94
Use corrplot() to display the loadings table of this new analysis in a more visually appealing manner.
corrplot(pca.new$loadings)
One of the orthogonal components is constructed, in part, from Bedrooms - describe the association between Bedrooms and the other predictor variables that played a significant role in the construction of this orthogonal component.
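This orthogonal component (RC2) is constructed primarily from Bedrooms, Bathrooms, and TotalSqFt. All three load strongly and positively on it (0.80, 0.73, and 0.85, respectively), so Bedrooms is strongly and positively associated with Bathrooms and TotalSqFt.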
What latent variable do you believe is being represented by this orthogonal component?
The maximum number of people who can live in the house (i.e., its living capacity).
Use the loadings table to calculate the values of the orthogonal components for the first property (i.e., the first row) using the values of the corresponding predictor variables. Hint: each orthogonal component can be thought of as a linear combination (i.e., a sum) of all 9 predictor variables.
rowSums(pca.new$loadings*propertyValues[1,2:10])
## 1
## 3917.373
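The output above shows a single value, while pca.new has two components. One way to obtain a value for each component at once is to multiply the row's values down each column of the loadings matrix and sum the columns. The sketch below uses a made-up three-variable loadings matrix and a made-up row (toy_loadings and first_row are hypothetical and are not taken from pca.new or the data set):

```r
# Hypothetical loadings: 3 variables (rows) by 2 components (columns)
toy_loadings <- matrix(
  c(0.1, 0.8,
    0.5, 0.7,
    0.9, 0.1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Bedrooms", "Bathrooms", "YearBuilt"), c("RC1", "RC2"))
)

# Hypothetical first property
first_row <- data.frame(Bedrooms = 3, Bathrooms = 1, YearBuilt = 1955)

# unlist() turns the one-row data frame into a plain numeric vector, which is
# recycled down each column of the loadings matrix before the column sums
x <- unlist(first_row)
component_values <- colSums(toy_loadings * x)
print(component_values)

# Assumed equivalent for the assignment's objects (not run here):
# colSums(unclass(pca.new$loadings) * unlist(propertyValues[1, 2:10]))
```

With these toy numbers, RC1 is 0.1*3 + 0.5*1 + 0.9*1955 = 1760.3 and RC2 is 0.8*3 + 0.7*1 + 0.1*1955 = 198.6.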
Use the loadings table to calculate the values of the orthogonal components for all 21,613 properties and assign them to RC1, RC2, RC3, etc., respectively, depending on the number of orthogonal components. Hint: the command rowSums may come in handy.
RC1<-colSums(pca.new$loadings[,1]*t(propertyValues[,2:10]))
RC2<-colSums(pca.new$loadings[,2]*t(propertyValues[,2:10]))
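The colSums() calls above compute, for every property, the sum of each predictor value times its loading. The same linear combination can be written as a matrix product, which returns all components in one step. A small sketch on made-up data (toy_X and toy_L are hypothetical stand-ins for propertyValues[, 2:10] and the loadings table):

```r
# Hypothetical predictor values: 3 properties by 3 variables
toy_X <- data.frame(
  Bedrooms  = c(3, 4, 2),
  Bathrooms = c(1, 2.5, 1),
  YearBuilt = c(1955, 1990, 1930)
)

# Hypothetical loadings: 3 variables by 2 components
toy_L <- matrix(
  c(0.1, 0.8,
    0.5, 0.7,
    0.9, 0.1),
  ncol = 2, byrow = TRUE,
  dimnames = list(names(toy_X), c("RC1", "RC2"))
)

# Approach used above: weight each transposed column (one property) by the loadings
rc1_colsums <- colSums(toy_L[, 1] * t(toy_X))
rc2_colsums <- colSums(toy_L[, 2] * t(toy_X))

# Matrix product: one row per property, one column per component
rc_matrix <- as.matrix(toy_X) %*% toy_L

# Both approaches give the same numbers
stopifnot(isTRUE(all.equal(unname(rc1_colsums), unname(rc_matrix[, "RC1"]))))
stopifnot(isTRUE(all.equal(unname(rc2_colsums), unname(rc_matrix[, "RC2"]))))
print(rc_matrix)
```

The matrix form scales to any number of components without writing one line per component.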
Use lm() to construct a multiple linear regression model of the effects of the orthogonal components on SellingPrice, assign it to mod.pca, and use summary() to display the resulting model.
mod.pca<-lm(SellingPrice~RC1+RC2)
summary(mod.pca)
##
## Call:
## lm(formula = SellingPrice ~ RC1 + RC2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1081449  -159970   -33942   105597  4735295
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.992e+06  2.754e+04 -108.65   <2e-16 ***
## RC1          8.467e+02  6.641e+00  127.49   <2e-16 ***
## RC2         -1.192e+01  2.451e-01  -48.64   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 274500 on 21610 degrees of freedom
## Multiple R-squared: 0.4416, Adjusted R-squared: 0.4416
## F-statistic: 8546 on 2 and 21610 DF, p-value: < 2.2e-16