library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ames = read_csv("ames (1).csv")
## Rows: 2930 Columns: 72
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (35): MS.Zoning, Street, Lot.Shape, Land.Contour, Utilities, Lot.Config,...
## dbl (37): Order, PID, area, price, MS.SubClass, Lot.Area, Overall.Qual, Over...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(ames)
## Rows: 2,930
## Columns: 72
## $ Order <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ PID <dbl> 526301100, 526350040, 526351010, 526353030, 527105010,…
## $ area <dbl> 1656, 896, 1329, 2110, 1629, 1604, 1338, 1280, 1616, 1…
## $ price <dbl> 215000, 105000, 172000, 244000, 189900, 195500, 213500…
## $ MS.SubClass <dbl> 20, 20, 20, 20, 60, 60, 120, 120, 120, 60, 60, 20, 60,…
## $ MS.Zoning <chr> "RL", "RH", "RL", "RL", "RL", "RL", "RL", "RL", "RL", …
## $ Lot.Area <dbl> 31770, 11622, 14267, 11160, 13830, 9978, 4920, 5005, 5…
## $ Street <chr> "Pave", "Pave", "Pave", "Pave", "Pave", "Pave", "Pave"…
## $ Lot.Shape <chr> "IR1", "Reg", "IR1", "Reg", "IR1", "IR1", "Reg", "IR1"…
## $ Land.Contour <chr> "Lvl", "Lvl", "Lvl", "Lvl", "Lvl", "Lvl", "Lvl", "HLS"…
## $ Utilities <chr> "AllPub", "AllPub", "AllPub", "AllPub", "AllPub", "All…
## $ Lot.Config <chr> "Corner", "Inside", "Corner", "Corner", "Inside", "Ins…
## $ Land.Slope <chr> "Gtl", "Gtl", "Gtl", "Gtl", "Gtl", "Gtl", "Gtl", "Gtl"…
## $ Neighborhood <chr> "NAmes", "NAmes", "NAmes", "NAmes", "Gilbert", "Gilber…
## $ Condition.1 <chr> "Norm", "Feedr", "Norm", "Norm", "Norm", "Norm", "Norm…
## $ Condition.2 <chr> "Norm", "Norm", "Norm", "Norm", "Norm", "Norm", "Norm"…
## $ Bldg.Type <chr> "1Fam", "1Fam", "1Fam", "1Fam", "1Fam", "1Fam", "Twnhs…
## $ House.Style <chr> "1Story", "1Story", "1Story", "1Story", "2Story", "2St…
## $ Overall.Qual <dbl> 6, 5, 6, 7, 5, 6, 8, 8, 8, 7, 6, 6, 6, 7, 8, 8, 8, 9, …
## $ Overall.Cond <dbl> 5, 6, 6, 5, 5, 6, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 7, 2, …
## $ Year.Built <dbl> 1960, 1961, 1958, 1968, 1997, 1998, 2001, 1992, 1995, …
## $ Year.Remod.Add <dbl> 1960, 1961, 1958, 1968, 1998, 1998, 2001, 1992, 1996, …
## $ Roof.Style <chr> "Hip", "Gable", "Hip", "Hip", "Gable", "Gable", "Gable…
## $ Roof.Matl <chr> "CompShg", "CompShg", "CompShg", "CompShg", "CompShg",…
## $ Exterior.1st <chr> "BrkFace", "VinylSd", "Wd Sdng", "BrkFace", "VinylSd",…
## $ Exterior.2nd <chr> "Plywood", "VinylSd", "Wd Sdng", "BrkFace", "VinylSd",…
## $ Mas.Vnr.Type <chr> "Stone", "None", "BrkFace", "None", "None", "BrkFace",…
## $ Mas.Vnr.Area <dbl> 112, 0, 108, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 603,…
## $ Exter.Qual <chr> "TA", "TA", "TA", "Gd", "TA", "TA", "Gd", "Gd", "Gd", …
## $ Exter.Cond <chr> "TA", "TA", "TA", "TA", "TA", "TA", "TA", "TA", "TA", …
## $ Foundation <chr> "CBlock", "CBlock", "CBlock", "CBlock", "PConc", "PCon…
## $ Bsmt.Qual <chr> "TA", "TA", "TA", "TA", "Gd", "TA", "Gd", "Gd", "Gd", …
## $ Bsmt.Cond <chr> "Gd", "TA", "TA", "TA", "TA", "TA", "TA", "TA", "TA", …
## $ Bsmt.Exposure <chr> "Gd", "No", "No", "No", "No", "No", "Mn", "No", "No", …
## $ BsmtFin.Type.1 <chr> "BLQ", "Rec", "ALQ", "ALQ", "GLQ", "GLQ", "GLQ", "ALQ"…
## $ BsmtFin.SF.1 <dbl> 639, 468, 923, 1065, 791, 602, 616, 263, 1180, 0, 0, 9…
## $ BsmtFin.Type.2 <chr> "Unf", "LwQ", "Unf", "Unf", "Unf", "Unf", "Unf", "Unf"…
## $ BsmtFin.SF.2 <dbl> 0, 144, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1120, 0, 0…
## $ Bsmt.Unf.SF <dbl> 441, 270, 406, 1045, 137, 324, 722, 1017, 415, 994, 76…
## $ Total.Bsmt.SF <dbl> 1080, 882, 1329, 2110, 928, 926, 1338, 1280, 1595, 994…
## $ Heating <chr> "GasA", "GasA", "GasA", "GasA", "GasA", "GasA", "GasA"…
## $ Heating.QC <chr> "Fa", "TA", "TA", "Ex", "Gd", "Ex", "Ex", "Ex", "Ex", …
## $ Central.Air <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
## $ Electrical <chr> "SBrkr", "SBrkr", "SBrkr", "SBrkr", "SBrkr", "SBrkr", …
## $ X1st.Flr.SF <dbl> 1656, 896, 1329, 2110, 928, 926, 1338, 1280, 1616, 102…
## $ X2nd.Flr.SF <dbl> 0, 0, 0, 0, 701, 678, 0, 0, 0, 776, 892, 0, 676, 0, 0,…
## $ Low.Qual.Fin.SF <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Bsmt.Full.Bath <dbl> 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, …
## $ Bsmt.Half.Bath <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Full.Bath <dbl> 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 3, 2, 1, …
## $ Half.Bath <dbl> 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
## $ Bedroom.AbvGr <dbl> 3, 2, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 3, 2, 1, 4, 4, 1, …
## $ Kitchen.AbvGr <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Kitchen.Qual <chr> "TA", "TA", "Gd", "Ex", "TA", "Gd", "Gd", "Gd", "Gd", …
## $ TotRms.AbvGrd <dbl> 7, 5, 6, 8, 6, 7, 6, 5, 5, 7, 7, 6, 7, 5, 4, 12, 8, 8,…
## $ Functional <chr> "Typ", "Typ", "Typ", "Typ", "Typ", "Typ", "Typ", "Typ"…
## $ Fireplaces <dbl> 2, 0, 0, 2, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, …
## $ Garage.Type <chr> "Attchd", "Attchd", "Attchd", "Attchd", "Attchd", "Att…
## $ Garage.Cars <dbl> 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, …
## $ Garage.Area <dbl> 528, 730, 312, 522, 482, 470, 582, 506, 608, 442, 440,…
## $ Paved.Drive <chr> "P", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
## $ Wood.Deck.SF <dbl> 210, 140, 393, 0, 212, 360, 0, 0, 237, 140, 157, 483, …
## $ Open.Porch.SF <dbl> 62, 0, 36, 0, 34, 36, 0, 82, 152, 60, 84, 21, 75, 0, 5…
## $ Enclosed.Porch <dbl> 0, 0, 0, 0, 0, 0, 170, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ X3Ssn.Porch <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Screen.Porch <dbl> 0, 120, 0, 0, 0, 0, 0, 144, 0, 0, 0, 0, 0, 0, 140, 210…
## $ Pool.Area <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Misc.Val <dbl> 0, 0, 12500, 0, 0, 0, 0, 0, 0, 0, 0, 500, 0, 0, 0, 0, …
## $ Mo.Sold <dbl> 5, 6, 6, 4, 3, 6, 4, 1, 3, 6, 4, 3, 5, 2, 6, 6, 6, 6, …
## $ Yr.Sold <dbl> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, …
## $ Sale.Type <chr> "WD", "WD", "WD", "WD", "WD", "WD", "WD", "WD", "WD", …
## $ Sale.Condition <chr> "Normal", "Normal", "Normal", "Normal", "Normal", "Nor…
There are 2,930 rows in my dataset.
There are 72 variables in my dataset.
anyNA(ames)
## [1] TRUE
There are missing values in my dataset.
colSums(is.na(ames))
## Order PID area price MS.SubClass
## 0 0 0 0 0
## MS.Zoning Lot.Area Street Lot.Shape Land.Contour
## 0 0 0 0 0
## Utilities Lot.Config Land.Slope Neighborhood Condition.1
## 0 0 0 0 0
## Condition.2 Bldg.Type House.Style Overall.Qual Overall.Cond
## 0 0 0 0 0
## Year.Built Year.Remod.Add Roof.Style Roof.Matl Exterior.1st
## 0 0 0 0 0
## Exterior.2nd Mas.Vnr.Type Mas.Vnr.Area Exter.Qual Exter.Cond
## 0 23 23 0 0
## Foundation Bsmt.Qual Bsmt.Cond Bsmt.Exposure BsmtFin.Type.1
## 0 80 80 83 80
## BsmtFin.SF.1 BsmtFin.Type.2 BsmtFin.SF.2 Bsmt.Unf.SF Total.Bsmt.SF
## 1 81 1 1 1
## Heating Heating.QC Central.Air Electrical X1st.Flr.SF
## 0 0 0 1 0
## X2nd.Flr.SF Low.Qual.Fin.SF Bsmt.Full.Bath Bsmt.Half.Bath Full.Bath
## 0 0 2 2 0
## Half.Bath Bedroom.AbvGr Kitchen.AbvGr Kitchen.Qual TotRms.AbvGrd
## 0 0 0 0 0
## Functional Fireplaces Garage.Type Garage.Cars Garage.Area
## 0 0 157 1 1
## Paved.Drive Wood.Deck.SF Open.Porch.SF Enclosed.Porch X3Ssn.Porch
## 0 0 0 0 0
## Screen.Porch Pool.Area Misc.Val Mo.Sold Yr.Sold
## 0 0 0 0 0
## Sale.Type Sale.Condition
## 0 0
The variables with missing values are Mas.Vnr.Type, Mas.Vnr.Area, Bsmt.Qual, Bsmt.Cond, Bsmt.Exposure, BsmtFin.Type.1, BsmtFin.SF.1, BsmtFin.Type.2, BsmtFin.SF.2, Bsmt.Unf.SF, Total.Bsmt.SF, Electrical, Bsmt.Full.Bath, Bsmt.Half.Bath, Garage.Type, Garage.Cars, and Garage.Area.
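To list only the affected columns instead of scanning the full `colSums()` output, the counts can be filtered to those greater than zero. The sketch below uses a small made-up data frame (`toy`, a hypothetical stand-in for `ames`) so that it runs on its own; the same two lines should work on `ames`.

```r
# Toy data frame standing in for `ames` (values are made up for illustration)
toy <- data.frame(
  price       = c(215000, 105000, 172000, 244000),
  Garage.Type = c("Attchd", NA, "Detchd", NA),
  Garage.Area = c(528, NA, 312, 522),
  Street      = c("Pave", "Pave", "Pave", "Pave")
)

# Count missing values per column, then keep only columns that have any
na_counts <- colSums(is.na(toy))
na_counts[na_counts > 0]
```

Only `Garage.Type` and `Garage.Area` are returned here, because they are the only toy columns that contain `NA`.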
RQ1: Are Mas.Vnr.Type, Mas.Vnr.Area, Bsmt.Cond, BsmtFin.SF.1, Bsmt.Exposure, BsmtFin.Type.1, Bsmt.Qual, BsmtFin.Type.2, BsmtFin.SF.2, Bsmt.Unf.SF, Total.Bsmt.SF, Bsmt.Full.Bath, Bsmt.Half.Bath, Garage.Type, Garage.Cars, and Garage.Area associated with the price of the house?
H0: These variables are not associated with the price of the house.
H1: These variables are associated with the price of the house.
The relevant variable I will be using is foundation.
My main response variable or target variable is price of a house.
ames %>%
  ggplot(aes(x = price)) +
  geom_histogram(color = "black", fill = "pink") +
  labs(x = "Price of a House", y = "Frequency")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
### Categorical (independent variable) vs Numeric (response)
ames %>%
  ggplot(aes(x = price)) +
  geom_density(color = "black", fill = "purple") +
  labs(x = "Price (USD)",
       y = "Density")
ames %>%
  ggplot(aes(x = Garage.Type, y = price, fill = Garage.Type)) +
  geom_boxplot() +
  labs(x = "Garage Type",
       y = "Price (USD)")
1. (6 points) Compute and report summary statistics (e.g., mean,
standard deviation, and five number summary) for summarizing the
distribution of the response variable identified in Part I 2(c).
ames %>%
filter(!is.na(price))%>%
summarise(Mean = mean(price),
SD = sd(price),
Min = min(price),
Q1 = quantile(price, 0.25),
Median = median(price),
Q3 = quantile(price, 0.75),
Max = max(price))
The average house price is about $180,796, with a minimum of $12,789 and a maximum of $755,000.
ames %>%
filter(!is.na(price))%>%
ggplot(aes(x=price))+geom_histogram(color = "black", fill = "pink")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
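Because the sample has 2,930 observations (far more than 30), the Central Limit Theorem (CLT) applies: the sampling distribution of the mean price is approximately normal, even if the individual prices are skewed.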
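The CLT argument above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is a made-up right-skewed exponential distribution (mean 1, standard deviation 1), not the actual house prices, and the seed and number of repetitions are arbitrary.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n    <- 2930   # same sample size as the ames data
reps <- 1000   # number of simulated samples (arbitrary choice)

# Draw `reps` samples from a right-skewed population (exponential, mean 1, sd 1)
# and record the mean of each sample
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# CLT prediction: the sample means centre on 1 with sd close to 1 / sqrt(n)
c(mean_of_means = mean(sample_means),
  observed_sd   = sd(sample_means),
  clt_sd        = 1 / sqrt(n))
```

Even though single draws are strongly skewed, the spread of the simulated means should sit close to the value the CLT predicts.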
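library(infer)
# mu is set to the sample mean of price, so the confidence interval is the informative output here
ames %>%
  filter(!is.na(price)) %>%
  t_test(
    response = price,
    mu = 180796.1,
    conf_int = TRUE,
    conf_level = 0.95)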
I am 95% confident that the true population mean price of houses is between $177,902.3 and $183,689.9.
From the results above, the hypothesized mean of $180,796.1 lies inside the confidence interval (it equals the sample mean, so the interval is centered on it). Hence, we fail to reject the null hypothesis: there is not sufficient evidence that the population mean price differs from $180,796.1.
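A minimal sketch of why a test against the sample mean can never reject. It uses base R's `t.test()` rather than `infer::t_test()` so that it runs without extra packages, and only the first seven prices shown in the `glimpse()` output, so the numbers will not match the full-data results.

```r
# First seven prices from the glimpse() output (not the full data set)
price <- c(215000, 105000, 172000, 244000, 189900, 195500, 213500)

# Testing against the sample mean itself: the t statistic is 0 by construction
res <- t.test(price, mu = mean(price), conf.level = 0.95)
res$statistic   # 0
res$p.value     # 1

# The same interval computed by hand: mean +/- t* x sd / sqrt(n)
n <- length(price)
half_width <- qt(0.975, df = n - 1) * sd(price) / sqrt(n)
c(lower = mean(price) - half_width, upper = mean(price) + half_width)
```

The hand-computed bounds agree with `res$conf.int`, which shows that the interval is simply the sample mean plus or minus a margin of error.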
ames %>%
group_by(House.Style) %>%
summarise(Mean = mean(price),
SD = sd(price))
library(infer)
ames %>%
  filter(!is.na(price)) %>%
  t_test(response = price,
         explanatory = House.Style,
         order = c("1.5Fin", "1.5Unf"),
         mu = 0,  # null hypothesis: no difference in mean price between the two styles
         conf_int = TRUE,
         conf_level = 0.90)
The results above show that the difference in mean price between 1.5Fin and 1.5Unf houses is significant. The 90% confidence interval for the difference (1.5Fin minus 1.5Unf) runs from $18,679 to $37,054.53 and does not contain zero. Hence, we conclude that the mean prices of these two house styles differ, with 1.5Fin houses selling for more on average.
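The two-group comparison can be sketched with base R's `t.test()` (swapped in for `infer::t_test()` so the example is self-contained). The prices below are made up for illustration; only the two style labels come from the data.

```r
# Made-up prices for two house styles (illustrative values only)
toy <- data.frame(
  price = c(130000, 140000, 150000, 135000, 145000,
            100000, 110000, 105000, 115000, 108000),
  House.Style = factor(rep(c("1.5Fin", "1.5Unf"), each = 5),
                       levels = c("1.5Fin", "1.5Unf"))
)

# Welch two-sample t-test; mu = 0 is the null difference in means
res <- t.test(price ~ House.Style, data = toy, mu = 0, conf.level = 0.90)

diff_means <- unname(res$estimate[1] - res$estimate[2])  # 1.5Fin minus 1.5Unf
diff_means
res$conf.int   # 90% interval for the difference; here it lies above zero
```

When the interval excludes zero, as it does for these toy values, the difference in means is significant at the matching level (10% for a 90% interval).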
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
ames %>%
dplyr::select(price, Fireplaces, Garage.Cars, Garage.Area, Enclosed.Porch, Overall.Cond, Overall.Qual) %>%
ggpairs()
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removing 1 row that contained a missing value
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_density()`).
There is a significant, strong positive correlation between Garage.Area and price.
full.model = lm(price~ Fireplaces + Garage.Cars + Garage.Area + Enclosed.Porch + Overall.Cond + Overall.Qual, data = ames)
summary(full.model)
##
## Call:
## lm(formula = price ~ Fireplaces + Garage.Cars + Garage.Area +
## Enclosed.Porch + Overall.Cond + Overall.Qual, data = ames)
##
## Residuals:
## Min 1Q Median 3Q Max
## -276134 -26062 -2473 19403 381278
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -79252.791 5621.943 -14.097 <2e-16 ***
## Fireplaces 20594.247 1334.240 15.435 <2e-16 ***
## Garage.Cars 5485.027 2370.718 2.314 0.0208 *
## Garage.Area 81.040 8.053 10.063 <2e-16 ***
## Enclosed.Porch -22.599 12.491 -1.809 0.0705 .
## Overall.Cond 186.610 721.622 0.259 0.7960
## Overall.Qual 32678.606 728.625 44.850 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42610 on 2922 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.7161, Adjusted R-squared: 0.7155
## F-statistic: 1229 on 6 and 2922 DF, p-value: < 2.2e-16
\(\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p\)
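\(\widehat{\text{price}} = -79252.791 + 20594.247 \times \text{Fireplaces} + 5485.027 \times \text{Garage.Cars} + 81.040 \times \text{Garage.Area} - 22.599 \times \text{Enclosed.Porch} + 186.610 \times \text{Overall.Cond} + 32678.606 \times \text{Overall.Qual}\)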
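As a worked example of the fitted equation, the sketch below plugs in the first house from the `glimpse()` output (Fireplaces = 2, Garage.Cars = 2, Garage.Area = 528, Enclosed.Porch = 0, Overall.Cond = 5, Overall.Qual = 6, observed price 215000). The coefficients are copied by hand from the summary table; with the model object available, `predict(full.model, newdata = ...)` would do the same job.

```r
# Coefficients copied from the summary(full.model) output
coefs <- c(
  "(Intercept)"  = -79252.791,
  Fireplaces     = 20594.247,
  Garage.Cars    = 5485.027,
  Garage.Area    = 81.040,
  Enclosed.Porch = -22.599,
  Overall.Cond   = 186.610,
  Overall.Qual   = 32678.606
)

# Predictor values for the first house in the glimpse() output;
# the leading 1 multiplies the intercept
house1 <- c(1, Fireplaces = 2, Garage.Cars = 2, Garage.Area = 528,
            Enclosed.Porch = 0, Overall.Cond = 5, Overall.Qual = 6)

predicted <- sum(coefs * house1)   # about 212,700
residual  <- 215000 - predicted    # observed minus predicted, about 2,300
c(predicted = predicted, residual = residual)
```

For this house the model predicts roughly $212,700, about $2,300 below the observed sale price.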
From the summary table above, only Enclosed.Porch has a negative estimate (-22.599). Fireplaces, Garage.Cars, Garage.Area, Overall.Cond, and Overall.Qual have positive estimates. The overall model is statistically significant (F-statistic: 1229 on 6 and 2922 DF, p-value < 2.2e-16).
Fireplaces, Garage.Area, and Overall.Qual are all statistically significant (p < 0.001), indicating that each has a strong positive association with house price.
Garage.Cars is also significant at the 5% level (p = 0.0208). Enclosed.Porch (p = 0.0705) and Overall.Cond (p = 0.7960) show no statistically significant relationship with price at the 5% level.
Two of the six predictors are insignificant individually while the overall model is significant. Garage.Cars and Garage.Area both measure garage size, so some multicollinearity between them is possible.
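One common way to check for multicollinearity is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R^2). The sketch below uses simulated predictors (`x1`, `x2`, `x3`, all hypothetical) and a small helper, `vif_one()`, written for this illustration; for the real model the data frame would hold the six predictors from `ames`.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# VIF for one predictor: regress it on the others, then 1 / (1 - R^2)
vif_one <- function(var, data) {
  others <- setdiff(names(data), var)
  fit <- lm(reformulate(others, response = var), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vifs <- sapply(names(toy), vif_one, data = toy)
vifs  # x1 and x2 should be large, x3 close to 1
```

A frequently quoted rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity.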
Multiple R-squared: 0.7161. About 71.61% of the total variation in price is explained by Fireplaces, Garage.Cars, Garage.Area, Enclosed.Porch, Overall.Cond, and Overall.Qual.
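The adjusted R-squared in the summary output can be reproduced from the multiple R-squared, the number of predictors, and the residual degrees of freedom. A short check, using only numbers from the output above:

```r
r2 <- 0.7161         # Multiple R-squared from the summary output
p  <- 6              # number of predictors
n  <- 2922 + p + 1   # residual df + predictors + intercept = 2929 rows used

# Adjusted R-squared penalises the fit for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)     # 0.7155, matching the reported Adjusted R-squared
```

The penalty is tiny here because the sample is large relative to the six predictors.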
Output from R (mean and standard deviation of price by House.Style):
House.Style      Mean         SD
1.5Fin       137529.9   47225.67
1.5Unf       109663.2   20569.59
1Story       178699.9   81066.94
2.5Fin       220000.0  118211.98
2.5Unf       177158.3   76114.76
2Story       206990.2   85349.91
SFoyer       143472.7   31220.08
SLvl         165527.4   34348.13
## Inference of the results
I am 95% confident that the true population mean price of houses is between $177,902.3 and $183,689.9.
# Possible improvement to Project
Keep raw data, scripts, and outputs in clearly defined folders to maintain clarity. Also, use functions and scripts to streamline repetitive tasks, reducing manual intervention.
TA Abiodun Joseph helped me during this DAP.