Analyse the Athens Real Estate Market

Visualization

Make any plot(s) you think are informative.

To explore whether price and square footage are linearly related, I plot square footage against price. I am also interested in homes where some bedrooms do not have their own bathroom, for three reasons: such a room can serve as (i) an office or (ii) a storage room, or (iii) several bedrooms can share a bathroom.

library(ggplot2)

# Distribution of the number of bedrooms
plot.beds <- ggplot(data = Final_dataframe, aes(bedrooms)) +
  geom_histogram() +
  labs(title = "Histogram for Bedrooms", x = "Number of bedrooms", y = "Count")
plot.beds

# Distribution of the number of bathrooms
plot.baths <- ggplot(data = Final_dataframe, aes(bathrooms)) +
  geom_histogram() +
  labs(title = "Histogram for Bathrooms", x = "Number of bathrooms", y = "Count")
plot.baths

# Distribution of home prices
plot.price <- ggplot(data = Final_dataframe, aes(price)) +
  geom_histogram() +
  labs(title = "Histogram for Price", x = "Price of homes", y = "Count")
plot.price

# Scatter plot of price against square footage, colored by bedroom count
graph1 <- ggplot(data = Dataset, aes(x = sqft, y = price, color = bedrooms)) +
  geom_point() +
  labs(y = "Price ($)",
       x = "Square footage (square feet)",
       title = "Square footage vs. price of houses in Athens, GA")
graph1

We see a positive relationship between price and square footage per bedroom. As the number of bedrooms goes up, price and square footage per bedroom also tend to go up. However, the points cluster at the lower bedroom counts, and there is an outlier where the number of bedrooms is high but the square footage per bedroom is small.
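Square footage per bedroom is not a column in the plot itself, so it may help to compute it explicitly. The following is a minimal sketch in base R, assuming a small made-up data frame called `listings` in place of the real data; the cutoffs used to flag many-bedroom, small-room homes are arbitrary assumptions, not values taken from the analysis.

```r
# Hypothetical listings standing in for the real data (values are made up).
listings <- data.frame(
  price    = c(150000, 220000, 310000, 450000, 520000, 400000),
  sqft     = c(900, 1400, 1800, 2600, 3000, 1500),
  bedrooms = c(2, 3, 3, 4, 4, 6)
)

# Square footage per bedroom, the quantity discussed above.
listings$sqft_per_bedroom <- listings$sqft / listings$bedrooms

# Flag homes with many bedrooms but little space per bedroom.
# The cutoffs (at least 5 bedrooms, under 300 sq ft each) are assumptions.
listings$crowded <- listings$bedrooms >= 5 & listings$sqft_per_bedroom < 300

print(listings[listings$crowded, ])
```

Applied to the real data frame, the same two lines would single out listings like the outlier described above.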

Regression analysis and goodness of fit

Run a simple OLS command, as follows (with your variable names, of course). I estimated two models: a linear model and a log-log model. The linear model reports the higher R-squared (0.8184, against 0.7422 for the log-log model).

#model 1
#linear model 
linear_model <- lm(price~bedrooms+sqft, data=Final_dataframe)
summary(linear_model)
## 
## Call:
## lm(formula = price ~ bedrooms + sqft, data = Final_dataframe)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -372710  -71528   23765   70501  367317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  75159.74   31868.41   2.358   0.0203 *  
## bedrooms    -18649.12   10406.62  -1.792   0.0761 .  
## sqft           194.79      11.48  16.968   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 140000 on 102 degrees of freedom
## Multiple R-squared:  0.8184, Adjusted R-squared:  0.8149 
## F-statistic: 229.9 on 2 and 102 DF,  p-value: < 2.2e-16
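
To make the linear coefficients concrete, here is a small worked example that plugs them into the fitted equation. The coefficients are copied from the summary above; the house itself (3 bedrooms, 2,000 square feet) is a hypothetical input chosen only for illustration.

```r
# Coefficients copied from the linear model summary above.
b0     <- 75159.74   # intercept
b_beds <- -18649.12  # change in price per extra bedroom, holding sqft fixed
b_sqft <- 194.79     # change in price per extra square foot, holding bedrooms fixed

# Hypothetical house (assumed values, not taken from the data).
beds <- 3
sqft <- 2000

predicted_price <- b0 + b_beds * beds + b_sqft * sqft
print(predicted_price)
```

With these assumed inputs the prediction comes to roughly $408,792. Note that the bedroom coefficient is only marginally significant (p = 0.0761), so its contribution is the least certain part of the estimate.
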
#model 2
#log-log model (non-linear in the original variables)
nonlinear_model <- lm(log(price)~log(bedrooms)+log(sqft), data=Final_dataframe)
summary(nonlinear_model)
## 
## Call:
## lm(formula = log(price) ~ log(bedrooms) + log(sqft), data = Final_dataframe)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26260 -0.19347  0.08338  0.20092  0.54001 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    5.14201    0.53111   9.682 4.13e-16 ***
## log(bedrooms) -0.15733    0.13178  -1.194    0.235    
## log(sqft)      1.04069    0.08322  12.505  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3217 on 102 degrees of freedom
## Multiple R-squared:  0.7422, Adjusted R-squared:  0.7372 
## F-statistic: 146.9 on 2 and 102 DF,  p-value: < 2.2e-16
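
Because the two models have different dependent variables (price and log(price)), their R-squared values are not directly comparable. One possible way to compare them is to back-transform the log-log predictions to dollars and compute the same error measure for both. The sketch below does this on simulated data; `sim` and its coefficients are made up for illustration and stand in for `Final_dataframe`.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for the real listings (made-up values).
n <- 100
sim <- data.frame(
  bedrooms = sample(1:5, n, replace = TRUE),
  sqft     = runif(n, 800, 4000)
)
# Multiplicative noise keeps every simulated price positive.
sim$price <- 150 * sim$sqft * exp(rnorm(n, sd = 0.3))

fit_linear <- lm(price ~ bedrooms + sqft, data = sim)
fit_loglog <- lm(log(price) ~ log(bedrooms) + log(sqft), data = sim)

# Put both sets of predictions on the dollar scale.
# exp() of a log-scale fit tends to understate the mean price;
# no retransformation correction is applied in this sketch.
pred_linear <- fitted(fit_linear)
pred_loglog <- exp(fitted(fit_loglog))

# Root mean squared error in dollars.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

comparison <- c(
  linear = rmse(sim$price, pred_linear),
  loglog = rmse(sim$price, pred_loglog)
)
print(comparison)
```

Running the same steps on the two fitted models above would give a dollar-scale error for each, which is a fairer basis for choosing between them than the raw R-squared values.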

Suggestions

Given the preceding analysis, I have some suggestions:

  • I would advise choosing houses with a median price of $500,000 and an average of 2-3 bedrooms, because adding bedrooms without adding square footage may actually be associated with a lower price. This depends entirely on the area in which one chooses to buy.

  • I would also like to include additional variables, such as a safety index, the crime rate of the area, and a happiness index.
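
The suggestion to add variables could be tested by fitting an extended model and comparing it with the base model. The sketch below uses simulated data; `crime_rate` is a hypothetical column that does not exist in the original dataset, and the coefficients used to generate prices are invented.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated data; crime_rate is a hypothetical extra variable.
n <- 80
sim <- data.frame(
  bedrooms   = sample(1:5, n, replace = TRUE),
  sqft       = runif(n, 800, 4000),
  crime_rate = runif(n, 1, 10)
)
sim$price <- 100000 + 180 * sim$sqft - 8000 * sim$crime_rate +
  rnorm(n, sd = 40000)

base_model     <- lm(price ~ bedrooms + sqft, data = sim)
extended_model <- lm(price ~ bedrooms + sqft + crime_rate, data = sim)

# F-test of whether the added variable improves on the base model.
print(anova(base_model, extended_model))

# Adjusted R-squared penalizes the extra parameter.
adj_r2 <- c(
  base     = summary(base_model)$adj.r.squared,
  extended = summary(extended_model)$adj.r.squared
)
print(adj_r2)
```

Both models share the same dependent variable here, so the F-test and adjusted R-squared are directly comparable between them.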