Ames Housing Regression Homework

Intro:

The purpose of this analysis is to find predictors of housing prices. We’re given 79 variables, but only five are of interest to us: “SalePrice”, “GrLivArea”, “YearBuilt”, “Neighborhood”, and “OverallQual”. With “SalePrice” as the dependent variable and the other four as independent variables, we can run our tests. My initial hypothesis was that these four variables would be strong predictors of SalePrice. The results support this hypothesis: the regression has an R^2 of about 0.8, meaning the four variables together explain roughly 80% of the variation in sale price.

Data Information:

Ames housing data has 2,930 observations of 79 total variables. The five used for this analysis are:

- SalePrice: Numeric field giving the final sale price of the property in USD.
- OverallQual: Ordinal rating of the overall material and finish of the house, recorded on a 1-10 scale where 1 is the worst and 10 is the best.
- Neighborhood: Categorical field naming the neighborhood (within Ames city limits) where the property is located.
- YearBuilt: Integer field giving the year the house was originally built.
- GrLivArea: Numeric field giving the above-ground living area of the house in square feet.
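Because Neighborhood is a categorical field, a regression cannot use it directly; it is typically expanded into 0/1 indicator columns, one per neighborhood except a baseline level that is absorbed into the intercept. The sketch below illustrates that encoding in plain Python. The function name `dummy_encode` and the sample neighborhood codes are hypothetical, and using the alphabetically first level as the baseline is an assumption: it matches R's default treatment coding for factors, but the analysis does not say how the encoding was done.

```python
def dummy_encode(values):
    """Turn a categorical column into 0/1 indicator columns.

    Assumption: the alphabetically first level is the baseline and gets no
    column, which avoids perfect collinearity with the model's intercept.
    """
    levels = sorted(set(values))
    baseline, others = levels[0], levels[1:]
    # One row per observation, one 0/1 entry per non-baseline level.
    rows = [[1 if v == level else 0 for level in others] for v in values]
    return baseline, others, rows


# Hypothetical neighborhood codes, for illustration only.
baseline, columns, rows = dummy_encode(["NAmes", "CollgCr", "NAmes", "OldTown"])
print(baseline)  # CollgCr
print(columns)   # ['NAmes', 'OldTown']
print(rows)      # [[1, 0], [0, 0], [1, 0], [0, 1]]
```

Each coefficient on an indicator column is then read as the price difference between that neighborhood and the baseline neighborhood, holding the other predictors fixed.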


Methods:

A regression model was fit to predict sale price (SalePrice). The independent variables were overall quality (OverallQual), above-ground living area (GrLivArea), neighborhood (Neighborhood), and year built (YearBuilt). The model has an R^2 of about 0.8.
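The fit described above can be sketched as ordinary least squares: prepend an intercept column, solve the normal equations (X^T X) b = X^T y, and compute R^2 = 1 - SS_res / SS_tot from the residuals. This is a minimal, dependency-free illustration that assumes an ordinary linear regression; the helper names and the toy rows are hypothetical and are not the Ames results. Neighborhood would first need to be expanded into indicator columns before being passed in.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(X, y, beta):
    """R^2 = 1 - SS_res / SS_tot for the fitted coefficients beta."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy rows: [living area in 1000s of sq ft, quality rating] (hypothetical).
X = [[1.0, 5], [1.5, 6], [2.0, 6], [2.5, 8], [3.0, 7]]
y = [160, 205, 230, 295, 300]  # built exactly as 10 + 50*area + 20*quality
beta = fit_ols(X, y)
print(beta, r_squared(X, y, beta))
```

Solving the normal equations is the simplest route to show; library routines such as R's `lm()` use a QR decomposition instead, which is numerically more stable.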

Limitations:

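The assignment limited us to four independent variables. Adding more of the remaining variables could explain more of the variation in SalePrice. However, in-sample R^2 never decreases when predictors are added, so any gain should be confirmed with adjusted R^2 or on held-out data to guard against overfitting.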
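Adjusted R^2, 1 - (1 - R^2)(n - 1)/(n - p - 1), penalizes each extra slope, so it only rises when a new predictor improves the fit by more than chance would. A minimal sketch follows, where n is the number of observations and p the number of estimated slopes excluding the intercept (each Neighborhood indicator column counts separately). The function name and the numbers in the example are hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p slopes (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: a fifth predictor nudges R^2 from 0.800 to 0.801 with n = 50.
before = adjusted_r_squared(0.800, n=50, p=4)
after = adjusted_r_squared(0.801, n=50, p=5)
print(before, after)  # after is lower: the gain does not justify the extra slope
```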