West Roxbury Housing Analysis

Filip Dragicevic

Data visualization of West Roxbury Housing Prices

2022-03-09

## Loading required package: splines
## Loading required package: RcmdrMisc
## Loading required package: car
## Loading required package: carData
## Loading required package: sandwich
## Warning in register(): Can't find generic `scale_type` in package ggplot2 to
## register S3 method.
## Loading required package: effects
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
## The Commander GUI is launched only in interactive sessions
## 
## Attaching package: 'Rcmdr'
## The following object is masked from 'package:base':
## 
##     errorCondition
> WestRoxburyHousing <- 
+   read.table("C:/Users/filip/OneDrive/Desktop/MSCI 3230/westRoxbury (1).csv", 
+   header=TRUE, stringsAsFactors=TRUE, sep=",", na.strings="NA", dec=".", 
+   strip.white=TRUE)
> Boxplot( ~ TOTAL.VALUE, data=WestRoxburyHousing, id=list(method="y"))

 [1] "1"    "2"    "4346" "4345" "4344" "4343" "5802" "4342" "4341" "5801"
[11] "5800" "4340"
> with(WestRoxburyHousing, Hist(TOTAL.VALUE, scale="frequency", 
+   breaks="Sturges", col="darkgray")) 

> ## the ungrouped histogram of total value is right skewed: the right tail is slightly longer than the left.
> ## the boxplot of total value above shows the same skew.
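The visual impression of right skew can also be checked numerically. The sketch below is not part of the original analysis. It uses simulated log-normal values as a stand-in for `TOTAL.VALUE` (an assumption, since the CSV is not reproduced here), and `sample_skewness` is a hypothetical helper name introduced for illustration.

```r
# Simulated stand-in for TOTAL.VALUE (assumption: the real data are not used here).
set.seed(1)
total_value <- rlnorm(5000, meanlog = 6, sdlog = 0.25)

# Moment-based sample skewness: positive values indicate a longer right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew <- sample_skewness(total_value)
mean_minus_median <- mean(total_value) - median(total_value)

print(skew)               # positive for right-skewed data
print(mean_minus_median)  # positive when the right tail pulls the mean above the median
```

On the real data, the same two quantities could be computed from `WestRoxburyHousing$TOTAL.VALUE` to back up what the histogram suggests.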
> with(WestRoxburyHousing, Hist(TOTAL.VALUE, groups=REMODEL, 
+   scale="frequency", breaks="Sturges", col="darkgray")) 

> ## when grouping total value by remodel status, all three groups are right skewed.
> ## homes that have not been remodeled are also far more frequent than remodeled homes (old or recent).

> cor(WestRoxburyHousing[,c("GROSS.AREA","LIVING.AREA","LOT.SQFT","TAX",
+   "TOTAL.VALUE","YR.BUILT")], use="complete")
> ## the correlation matrix shows how strongly each pair of variables is linearly related. values closer to 1 or -1 indicate a stronger relationship, and the sign gives its direction.
> ## the higher the tax value, the higher the total value.
> ## living area and gross area are also strongly correlated with total value.
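As a rough illustration of how such a matrix can be read, the sketch below builds a small simulated data frame and ranks the variables by the strength of their correlation with `TOTAL.VALUE`. The data and coefficients are assumptions made up for the example, so the numbers it prints say nothing about the actual West Roxbury data. Only the column names are taken from the original.

```r
# Simulated stand-in data (assumption); column names follow the original data set.
set.seed(2)
n <- 500
living_area <- rnorm(n, mean = 1650, sd = 500)
housing <- data.frame(
  LIVING.AREA = living_area,
  GROSS.AREA  = 1.8 * living_area + rnorm(n, sd = 200),
  YR.BUILT    = sample(1900:2010, n, replace = TRUE)
)
housing$TOTAL.VALUE <- 0.2 * housing$LIVING.AREA + rnorm(n, sd = 40)
housing$TOTAL.VALUE[1:5] <- NA  # a few missing values, to show the effect of `use`

# Pearson correlations computed only on rows with no missing values.
corr <- cor(housing, use = "complete.obs")
print(round(corr, 2))

# Correlation of each variable with TOTAL.VALUE, ordered by absolute strength.
with_value <- corr[, "TOTAL.VALUE"]
print(sort(abs(with_value), decreasing = TRUE))
```

Sorting by absolute value matters because a strongly negative correlation is as informative as a strongly positive one.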

> with(WestRoxburyHousing, Hist(LOT.SQFT, scale="frequency", breaks="Sturges", col="darkgray"))

> ## similar to before, lot size (in square footage) is also slightly right skewed, with a majority of the houses falling under 20,000 sqft.
> with(WestRoxburyHousing, Hist(LIVING.AREA, scale="frequency", breaks="Sturges", col="darkgray"))

> Boxplot(TOTAL.VALUE~REMODEL, data=WestRoxburyHousing, id=list(method="y"))

 [1] "1"    "2"    "4346" "4345" "4344" "4343" "4342" "4341" "4340" "4339"
[11] "4338" "4337" "4927" "4926" "4925" "4924" "4923" "4922" "4921" "4920"
[21] "4919" "4918" "5802" "5801" "5800" "5799" "5798" "5797" "5796" "5795"
[31] "5794" "5793"
> ## the boxplot grouped by remodel status shows that recently remodeled homes have a higher median value than homes with an old remodel or none at all.
> ## homes that have not been remodeled also show many more outliers in value.
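The group comparison can be made concrete by tabulating per-group medians and outlier counts. The following is a minimal sketch on simulated data, not the original analysis. The group sizes, the level names `None`, `Old`, and `Recent`, and the value distributions are all assumptions. Outliers are counted with `boxplot.stats()`, which applies the 1.5 x IQR rule that base R's `boxplot()` uses by default.

```r
# Simulated stand-in data (assumption): group sizes and level names are illustrative.
set.seed(3)
housing <- data.frame(
  REMODEL = factor(rep(c("None", "Old", "Recent"), times = c(400, 60, 90))),
  TOTAL.VALUE = c(rlnorm(400, 5.9, 0.25),
                  rlnorm(60, 5.95, 0.25),
                  rlnorm(90, 6.05, 0.25))
)

# Median and mean per group; the centre line of a boxplot is the median.
group_summary <- aggregate(TOTAL.VALUE ~ REMODEL, data = housing,
                           FUN = function(v) c(median = median(v), mean = mean(v)))
print(group_summary)

# Number of points beyond 1.5 * IQR from the hinges in each group.
outlier_counts <- sapply(split(housing$TOTAL.VALUE, housing$REMODEL),
                         function(v) length(boxplot.stats(v)$out))
print(outlier_counts)

# Group sizes, for context when comparing outlier counts.
print(table(housing$REMODEL))
```

Printing the group sizes next to the outlier counts is a deliberate choice: a much larger group will usually have more flagged points simply because it has more observations.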
> with(WestRoxburyHousing, Hist(BEDROOMS, scale="frequency", breaks="Sturges", col="darkgray"))

> with(WestRoxburyHousing, Hist(FLOORS, scale="frequency", breaks="Sturges", col="darkgray"))

> with(WestRoxburyHousing, Hist(ROOMS, scale="frequency", breaks="Sturges", col="darkgray"))

> with(WestRoxburyHousing, Hist(FULL.BATH, scale="frequency", breaks="Sturges", col="darkgray"))

> with(WestRoxburyHousing, Hist(TAX, scale="frequency", breaks="Sturges", 
+   col="darkgray")) 

> ## histograms of bedrooms, floors, rooms, full baths, and tax are also all right skewed, which makes sense: homes with few rooms, floors, or baths are more common than homes with many.
> scatterplot(TOTAL.VALUE~BEDROOMS, regLine=FALSE, smooth=FALSE, boxplots=FALSE, 
+   data=WestRoxburyHousing)

> ## the scatterplot of number of bedrooms vs. total value indicates that total value tends to increase with the number of bedrooms. there are fewer and fewer observations as the number of bedrooms increases.
> scatterplot(TOTAL.VALUE~LIVING.AREA, regLine=FALSE, smooth=FALSE, boxplots=FALSE, 
+   data=WestRoxburyHousing)

> ## as before, the plot of living area vs. total value shows a positive relationship: as living area increases, so does value. no regression line is drawn here (regLine=FALSE); fitting one and reporting its R^2 value would quantify how strong the relationship is.
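One way to obtain the R^2 value mentioned above is to fit a simple linear regression with `lm()` and read it from the model summary. The sketch below does this on simulated data. The intercept, slope, and noise level are assumptions chosen for illustration, so the printed values do not describe the real data set.

```r
# Simulated stand-in data (assumption); only the column names match the original.
set.seed(4)
n <- 500
housing <- data.frame(LIVING.AREA = runif(n, 800, 3500))
housing$TOTAL.VALUE <- 100 + 0.15 * housing$LIVING.AREA + rnorm(n, sd = 50)

# Fit total value as a linear function of living area.
fit <- lm(TOTAL.VALUE ~ LIVING.AREA, data = housing)
r_squared <- summary(fit)$r.squared

# Pearson correlation between the two variables.
r <- cor(housing$LIVING.AREA, housing$TOTAL.VALUE)

print(coef(fit))                  # intercept and slope of the fitted line
print(r_squared)                  # share of variance in value explained by living area
print(all.equal(r_squared, r^2))  # in simple linear regression, R^2 equals r squared
```

With the real data, replacing `housing` with `WestRoxburyHousing` in the `lm()` call should give the corresponding fit, and setting `regLine=TRUE` in `scatterplot()` would draw the line on the plot.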
> library(colorspace, pos=18)