This project uses R to analyze housing data and identify the factors that most strongly influence house prices. It covers data cleaning, exploratory data analysis, visualization, and predictive modeling.
Load Packages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.1 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.3 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ggplot(housing, aes(x = Sale_Price)) +
  geom_histogram(fill = "skyblue", bins = 20) +
  labs(
    title = "Distribution of House Prices",
    x = "Sale Price",
    y = "Count"
  )
Living Area vs Sale Price
ggplot(housing, aes(x = Gr_Liv_Area, y = Sale_Price)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Living Area vs Sale Price",
    x = "Living Area",
    y = "Sale Price"
  )
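The red line in the plot above is a least-squares fit of Sale_Price on Gr_Liv_Area. The sketch below shows one way such a fit could be built and checked with base R. It is not necessarily the model used in this project: the `housing_demo` data frame is synthetic and hypothetical (only the column names match the plot), and the 80/20 split and the RMSE metric are illustrative assumptions.

```r
# Synthetic stand-in for the housing data (an assumption: the real data set
# is not reproduced here); only the column names match the plot above.
set.seed(42)
n <- 200
housing_demo <- data.frame(
  Gr_Liv_Area = round(runif(n, min = 600, max = 3500))
)
housing_demo$Sale_Price <- 20000 + 110 * housing_demo$Gr_Liv_Area +
  rnorm(n, mean = 0, sd = 30000)

# Hold out 20% of the rows for evaluation (illustrative split).
test_idx <- sample(seq_len(n), size = 0.2 * n)
train <- housing_demo[-test_idx, ]
test  <- housing_demo[test_idx, ]

# Fit the same kind of straight line that geom_smooth(method = "lm") draws.
fit <- lm(Sale_Price ~ Gr_Liv_Area, data = train)
print(coef(fit))

# Root mean squared error on the held-out rows.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$Sale_Price - pred)^2))
print(rmse)
```

The slope coefficient estimates the change in sale price per additional unit of living area. Evaluating on held-out rows rather than the training rows gives a more honest picture of how the model would predict unseen houses.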
This project explored housing data in R and used linear regression to predict house prices. It demonstrated data cleaning, exploratory analysis, visualization, and predictive modeling.