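The goal of my regression analysis is to find variables that help predict the sale price of homes. I expected lot area, above-ground living area (gr_liv_area), and year built (year_built) to be good predictors of sale price. The model ended up with an R-squared of 0.664, meaning these three variables explain about 66% of the variation in sale price, and an RMSE of 46,112, meaning a typical prediction is off by roughly $46,112. Because RMSE is the square root of the average squared error, it weights large misses more heavily than a plain average of the errors would.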
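To make the two metrics concrete, here is a minimal sketch of how R-squared and RMSE are computed from actual and predicted prices. The function names and the four price/prediction pairs are hypothetical, chosen only for illustration; they are not the model or data from this analysis. One caveat: if a model is fit on log(sale price), its predictions need to be converted back with exp() before an RMSE can be read in dollars.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical sale prices and model predictions, for illustration only.
actual = [200000, 150000, 300000, 250000]
predicted = [210000, 140000, 280000, 260000]

print(round(rmse(actual, predicted), 2))       # typical size of a miss, in dollars
print(round(r_squared(actual, predicted), 4))  # fraction of variance explained
```

Here the residuals are 10,000, 10,000, 20,000 and 10,000 dollars, so the single 20,000 miss pulls the RMSE (about 13,229) above the plain average error of 12,500.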
I noticed that the sale price column was right-skewed, which would distort my correlations. I fixed this by applying a log transform, which made its distribution approximately normal. I also tried converting the street column into flag (0/1) values to check for a correlation, but decided not to include it as a variable.
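The two preprocessing steps above can be sketched as follows: measure skewness before and after taking the log of sale price, and turn the street column into a 0/1 flag. This is a minimal sketch under stated assumptions: the price list is made up, and the street labels "Pave" and "Grvl" are assumed (they are the values used in the Ames Housing data, but the write-up does not say which dataset or labels it used).

```python
import math


def skewness(values):
    """Population skewness: mean cubed deviation divided by std**3 (0 = symmetric)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5


# Hypothetical right-skewed sale prices: most homes moderate, a few very expensive.
prices = [120000, 135000, 140000, 150000, 160000, 175000, 190000, 450000, 600000]
log_prices = [math.log(p) for p in prices]  # natural log compresses the long right tail

# Assumed street labels; "Pave" becomes 1 and anything else ("Grvl") becomes 0.
streets = ["Pave", "Pave", "Grvl", "Pave", "Pave", "Pave", "Grvl", "Pave", "Pave"]
street_flag = [1 if s == "Pave" else 0 for s in streets]

print(round(skewness(prices), 2), round(skewness(log_prices), 2))
print(street_flag)
```

The log transform lowers the skewness but does not guarantee a perfectly normal result; a histogram of the log prices is still the easiest way to judge it.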
The correlations between these variables and sale price aren't especially strong, but each relationship shows a visible trend. Below is a visual showing the shape of these relationships. There is also a correlation matrix giving the correlation coefficient for each pair of variables.
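A correlation matrix like the one described above is just the Pearson correlation computed for every pair of columns. The sketch below builds one with the standard library; the column names lot_area and log_sale_price are assumptions for illustration, and every value in the table is made up rather than taken from the real data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Return {name: {other_name: r}} for every pair of columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical rows; column names are assumed and all values are made up.
data = {
    "lot_area": [8000, 9500, 11000, 9000, 14000],
    "gr_liv_area": [1500, 1200, 1800, 1600, 2200],
    "year_built": [1995, 1970, 2005, 1920, 2001],
    "log_sale_price": [12.1, 11.9, 12.4, 11.8, 12.6],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    # Print one row of the matrix, aligned for readability.
    print(f"{name:>15}", " ".join(f"{row[other]:6.2f}" for other in data))
```

The diagonal is always 1 and the matrix is symmetric, so in practice only the row for the target (log_sale_price) is needed to rank the predictors.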