Load the necessary packages:
library(devtools)
library(dplyr)
library(statsr)
library(gtools)
library(plotly)
First, let us load the data:
load("ames_train.RData")
Misc.Feature, Fence, Pool.QC
Misc.Feature, Alley, Pool.QC
Pool.QC, Alley, Fence
Fireplace.Qu, Pool.QC, Lot.Frontage
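The missing-value tally behind these options can be sketched on a toy data frame. This is an illustration only: `toy` and its values are hypothetical stand-ins for `ames_train`, and it uses base R's `colSums(is.na(...))` as one possible way to count and rank missing values.

```r
# Hypothetical toy data standing in for ames_train (not the real data).
toy <- data.frame(
  Pool.QC      = c(NA, NA, NA, "Gd"),
  Alley        = c(NA, "Grvl", NA, "Pave"),
  Lot.Frontage = c(60, NA, 80, 70),
  price        = c(126000, 139500, 124900, 114000)
)

# Count missing values per column, then sort from most to fewest.
na_count <- sort(colSums(is.na(toy)), decreasing = TRUE)

# The three columns with the most missing values.
top3 <- head(na_count, 3)
print(top3)
```

Sorting the counts makes the top three variables visible at a glance, which is harder to read off an unsorted table with 81 rows.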
# type your code for Question 1 here, and Knit
na_count <- data.frame(sapply(ames_train, function(y) sum(is.na(y))))
int? Change them to factors when conducting your analysis.
# type your code for Question 2 here, and Knit
str(ames_train)
## tibble [1,000 × 81] (S3: tbl_df/tbl/data.frame)
## $ PID : int [1:1000] 909176150 905476230 911128020 535377150 534177230 908128060 902135020 528228540 923426010 908186050 ...
## $ area : int [1:1000] 856 1049 1001 1039 1665 1922 936 1246 889 1072 ...
## $ price : int [1:1000] 126000 139500 124900 114000 227000 198500 93000 187687 137500 140000 ...
## $ MS.SubClass : int [1:1000] 30 120 30 70 60 85 20 20 20 180 ...
## $ MS.Zoning : Factor w/ 7 levels "A (agr)","C (all)",..: 6 6 2 6 6 6 7 6 6 7 ...
## $ Lot.Frontage : int [1:1000] NA 42 60 80 70 64 60 53 74 35 ...
## $ Lot.Area : int [1:1000] 7890 4235 6060 8146 8400 7301 6000 3710 12395 3675 ...
## $ Street : Factor w/ 2 levels "Grvl","Pave": 2 2 2 2 2 2 2 2 2 2 ...
## $ Alley : Factor w/ 2 levels "Grvl","Pave": NA NA NA NA NA NA 2 NA NA NA ...
## $ Lot.Shape : Factor w/ 4 levels "IR1","IR2","IR3",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Land.Contour : Factor w/ 4 levels "Bnk","HLS","Low",..: 4 4 4 4 4 4 1 4 4 4 ...
## $ Utilities : Factor w/ 3 levels "AllPub","NoSeWa",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Lot.Config : Factor w/ 5 levels "Corner","CulDSac",..: 1 5 5 1 5 1 5 5 1 5 ...
## $ Land.Slope : Factor w/ 3 levels "Gtl","Mod","Sev": 1 1 1 1 1 1 2 1 1 1 ...
## $ Neighborhood : Factor w/ 28 levels "Blmngtn","Blueste",..: 26 8 12 21 20 8 21 1 15 8 ...
## $ Condition.1 : Factor w/ 9 levels "Artery","Feedr",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Condition.2 : Factor w/ 8 levels "Artery","Feedr",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Bldg.Type : Factor w/ 5 levels "1Fam","2fmCon",..: 1 5 1 1 1 1 2 1 1 5 ...
## $ House.Style : Factor w/ 8 levels "1.5Fin","1.5Unf",..: 3 3 3 6 6 7 3 3 3 7 ...
## $ Overall.Qual : int [1:1000] 6 5 5 4 8 7 4 7 5 6 ...
## $ Overall.Cond : int [1:1000] 6 5 9 8 6 5 4 5 6 5 ...
## $ Year.Built : int [1:1000] 1939 1984 1930 1900 2001 2003 1953 2007 1984 2005 ...
## $ Year.Remod.Add : int [1:1000] 1950 1984 2007 2003 2001 2003 1953 2008 1984 2005 ...
## $ Roof.Style : Factor w/ 6 levels "Flat","Gable",..: 2 2 4 2 2 2 2 2 2 2 ...
## $ Roof.Matl : Factor w/ 8 levels "ClyTile","CompShg",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Exterior.1st : Factor w/ 16 levels "AsbShng","AsphShn",..: 15 7 9 9 14 7 9 16 7 14 ...
## $ Exterior.2nd : Factor w/ 17 levels "AsbShng","AsphShn",..: 16 7 9 9 15 7 9 17 11 15 ...
## $ Mas.Vnr.Type : Factor w/ 6 levels "","BrkCmn","BrkFace",..: 5 3 5 5 5 3 5 3 5 6 ...
## $ Mas.Vnr.Area : int [1:1000] 0 149 0 0 0 500 0 20 0 76 ...
## $ Exter.Qual : Factor w/ 4 levels "Ex","Fa","Gd",..: 4 3 3 3 3 3 2 3 4 4 ...
## $ Exter.Cond : Factor w/ 5 levels "Ex","Fa","Gd",..: 5 5 5 3 5 5 5 5 5 5 ...
## $ Foundation : Factor w/ 6 levels "BrkTil","CBlock",..: 2 2 1 1 3 4 2 3 2 3 ...
## $ Bsmt.Qual : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 4 6 3 4 NA 3 4 6 4 ...
## $ Bsmt.Cond : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 6 6 6 NA 6 6 6 6 ...
## $ Bsmt.Exposure : Factor w/ 5 levels "","Av","Gd","Mn",..: 5 4 5 5 5 NA 5 3 5 3 ...
## $ BsmtFin.Type.1 : Factor w/ 7 levels "","ALQ","BLQ",..: 6 4 2 7 4 NA 7 7 2 4 ...
## $ BsmtFin.SF.1 : int [1:1000] 238 552 737 0 643 0 0 0 647 467 ...
## $ BsmtFin.Type.2 : Factor w/ 7 levels "","ALQ","BLQ",..: 7 2 7 7 7 NA 7 7 7 7 ...
## $ BsmtFin.SF.2 : int [1:1000] 0 393 0 0 0 0 0 0 0 0 ...
## $ Bsmt.Unf.SF : int [1:1000] 618 104 100 405 167 0 936 1146 217 80 ...
## $ Total.Bsmt.SF : int [1:1000] 856 1049 837 405 810 0 936 1146 864 547 ...
## $ Heating : Factor w/ 6 levels "Floor","GasA",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Heating.QC : Factor w/ 5 levels "Ex","Fa","Gd",..: 5 5 1 3 1 1 5 1 5 1 ...
## $ Central.Air : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 1 2 2 2 ...
## $ Electrical : Factor w/ 6 levels "","FuseA","FuseF",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ X1st.Flr.SF : int [1:1000] 856 1049 1001 717 810 495 936 1246 889 1072 ...
## $ X2nd.Flr.SF : int [1:1000] 0 0 0 322 855 1427 0 0 0 0 ...
## $ Low.Qual.Fin.SF: int [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
## $ Bsmt.Full.Bath : int [1:1000] 1 1 0 0 1 0 0 0 0 1 ...
## $ Bsmt.Half.Bath : int [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
## $ Full.Bath : int [1:1000] 1 2 1 1 2 3 1 2 1 1 ...
## $ Half.Bath : int [1:1000] 0 0 0 0 1 0 0 0 0 0 ...
## $ Bedroom.AbvGr : int [1:1000] 2 2 2 2 3 4 2 2 3 2 ...
## $ Kitchen.AbvGr : int [1:1000] 1 1 1 1 1 1 1 1 1 1 ...
## $ Kitchen.Qual : Factor w/ 5 levels "Ex","Fa","Gd",..: 5 3 3 5 3 3 5 3 5 3 ...
## $ TotRms.AbvGrd : int [1:1000] 4 5 5 6 6 7 4 5 6 5 ...
## $ Functional : Factor w/ 8 levels "Maj1","Maj2",..: 8 8 8 8 8 8 4 8 8 8 ...
## $ Fireplaces : int [1:1000] 1 0 0 0 0 1 0 1 0 0 ...
## $ Fireplace.Qu : Factor w/ 5 levels "Ex","Fa","Gd",..: 3 NA NA NA NA 1 NA 3 NA NA ...
## $ Garage.Type : Factor w/ 6 levels "2Types","Attchd",..: 6 2 6 6 2 4 6 2 2 3 ...
## $ Garage.Yr.Blt : int [1:1000] 1939 1984 1930 1940 2001 2003 1974 2007 1984 2005 ...
## $ Garage.Finish : Factor w/ 4 levels "","Fin","RFn",..: 4 2 4 4 2 3 4 2 4 2 ...
## $ Garage.Cars : int [1:1000] 2 1 1 1 2 2 2 2 2 2 ...
## $ Garage.Area : int [1:1000] 399 266 216 281 528 672 576 428 484 525 ...
## $ Garage.Qual : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ Garage.Cond : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 5 6 6 6 6 6 6 6 ...
## $ Paved.Drive : Factor w/ 3 levels "N","P","Y": 3 3 1 1 3 3 3 3 3 3 ...
## $ Wood.Deck.SF : int [1:1000] 0 0 154 0 0 0 0 100 0 0 ...
## $ Open.Porch.SF : int [1:1000] 0 105 0 0 45 0 32 24 0 44 ...
## $ Enclosed.Porch : int [1:1000] 0 0 42 168 0 177 112 0 0 0 ...
## $ X3Ssn.Porch : int [1:1000] 0 0 86 0 0 0 0 0 0 0 ...
## $ Screen.Porch : int [1:1000] 166 0 0 111 0 0 0 0 0 0 ...
## $ Pool.Area : int [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pool.QC : Factor w/ 4 levels "Ex","Fa","Gd",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Fence : Factor w/ 4 levels "GdPrv","GdWo",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Misc.Feature : Factor w/ 5 levels "Elev","Gar2",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Misc.Val : int [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
## $ Mo.Sold : int [1:1000] 3 2 11 5 11 7 2 3 4 5 ...
## $ Yr.Sold : int [1:1000] 2010 2009 2007 2009 2009 2009 2009 2008 2008 2007 ...
## $ Sale.Type : Factor w/ 10 levels "COD","Con","ConLD",..: 10 10 10 10 10 3 10 7 10 10 ...
## $ Sale.Condition : Factor w/ 6 levels "Abnorml","AdjLand",..: 5 5 5 5 5 5 5 6 5 5 ...
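Once `str()` shows which columns are stored as `int`, those that are really category codes can be converted with `factor()`. The sketch below uses a hypothetical toy data frame; treating `MS.SubClass` as categorical is an assumption made for illustration, so check the data dictionary before converting any column in the real data.

```r
# Hypothetical toy data: MS.SubClass holds integer dwelling-type codes.
toy <- data.frame(
  MS.SubClass = c(30L, 120L, 30L, 70L, 60L),
  area        = c(856L, 1049L, 1001L, 1039L, 1665L)
)

# Columns assumed (for this illustration) to be categorical codes.
categorical_cols <- c("MS.SubClass")

# Convert each listed column to a factor; other columns are left alone.
toy[categorical_cols] <- lapply(toy[categorical_cols], factor)

# Count the columns that are still stored as integers.
n_int <- sum(sapply(toy, is.integer))
str(toy)
```

Keeping the column names in one vector means the same conversion can be reapplied to any other data set that shares those columns.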
StoneBr
Timber
Veenker
NridgHt
# type your code for Question 3 here, and Knit
std_dev_neighborhood <- ames_train %>% group_by(Neighborhood) %>%
  summarise(std_devprice=sd(price))
price?
Lot.Area
Bedroom.AbvGr
Overall.Qual
Year.Built
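The scatterplots that follow can be complemented with a numeric summary. As a rough sketch, the code below computes the Pearson correlation of each candidate predictor with `price`. The data frame `toy` is simulated and hypothetical, so its numbers say nothing about the real Ames data.

```r
set.seed(1)
n <- 200

# Simulated stand-in data: price depends on Overall.Qual but not on bedrooms.
toy <- data.frame(
  Overall.Qual  = sample(1:10, n, replace = TRUE),
  Bedroom.AbvGr = sample(1:5, n, replace = TRUE)
)
toy$price <- 20000 * toy$Overall.Qual + rnorm(n, sd = 30000)

# Correlation of each candidate predictor with price.
candidates <- c("Overall.Qual", "Bedroom.AbvGr")
cors <- sapply(toy[candidates],
               function(v) cor(v, toy$price, use = "complete.obs"))

# Rank candidates by the strength of the linear association.
print(sort(abs(cors), decreasing = TRUE))
```

Correlation only measures linear association, so it should be read alongside the plots, not in place of them.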
# type your code for Question 4 here, and Knit
fig1 <- plot_ly(data=ames_train,x=~Lot.Area,y=~price,type="scatter",mode="markers")
fig1
fig2 <- plot_ly(data=ames_train,x=~Bedroom.AbvGr,y=~price,type="scatter",mode="markers")
fig2
fig3 <- plot_ly(data=ames_train,x=~Overall.Qual,y=~price,type="scatter",mode="markers")
fig3
fig4 <- plot_ly(data=ames_train,x=~Year.Built,y=~price,type="scatter",mode="markers")
fig4
price and area. Which of the following variable transformations makes the relationship appear to be the most linear?
price or area
price but not area
area but not price
price and area
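One hedged way to compare the four transformation choices numerically is to fit each as a simple linear regression and inspect \(R^2\). Because \(R^2\) for a `price` response and a `log(price)` response are measured on different scales, treat this only as a rough guide next to the plots. The data below are simulated and hypothetical, not the Ames data.

```r
set.seed(42)
n <- 300

# Simulated stand-in data with a log-log relationship built in.
area  <- exp(rnorm(n, mean = 7.2, sd = 0.35))
price <- exp(5 + 0.9 * log(area) + rnorm(n, sd = 0.25))
toy <- data.frame(area = area, price = price)

# One simple regression per transformation choice.
fits <- list(
  "neither"    = lm(price ~ area, data = toy),
  "price only" = lm(log(price) ~ area, data = toy),
  "area only"  = lm(price ~ log(area), data = toy),
  "both"       = lm(log(price) ~ log(area), data = toy)
)

# Extract R-squared from each fit.
r2 <- sapply(fits, function(f) summary(f)$r.squared)
print(round(r2, 3))
```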
# type your code for Question 5 here, and Knit
fig1 <- plot_ly(data=ames_train,x=~log(area),y=~log(price),type="scatter",mode="markers")
fig1
# type your code for Question 6 here, and Knit
x <- nrow(ames_train[ames_train$Garage.Area>0,])
n <- 1000
alpha <- 9
beta <- 1
alpha+x
## [1] 963
beta+n-x
## [1] 47
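The two printed values are the parameters of the posterior distribution. As a sketch of the reasoning, assume (as the code does) a Beta(9, 1) prior on the proportion \(p\) of homes with a garage and a Binomial likelihood for \(x\) homes with a garage among \(n = 1000\). The value \(x = 954\) follows from the printed result \(\alpha + x = 963\).

```latex
\begin{aligned}
p &\sim \mathrm{Beta}(\alpha, \beta), \qquad x \mid p \sim \mathrm{Binomial}(n, p) \\
p \mid x &\sim \mathrm{Beta}(\alpha + x,\ \beta + n - x) \\
&= \mathrm{Beta}(9 + 954,\ 1 + 1000 - 954) = \mathrm{Beta}(963,\ 47)
\end{aligned}
```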
# type your code for Question 7 here, and Knit
nrow(ames_train[ames_train$Year.Built>1999,])/nrow(ames_train) # not > 30%
## [1] 0.272
median(ames_train$price)>mean(ames_train$price)
## [1] FALSE
nrow(ames_train[ames_train$Street=="Grvl",])
## [1] 3
# type your code for Question 8 here, and Knit
garage <- ames_train %>% filter(Garage.Area>0)
no_garage <- ames_train %>% filter(Garage.Area==0)
t.test(x=garage$area, y = no_garage$area,
       alternative = "two.sided",
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: garage$area and no_garage$area
## t = 5.134, df = 50.702, p-value = 4.535e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 211.4183 482.9963
## sample estimates:
## mean of x mean of y
## 1492.251 1145.043
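The object returned by `t.test()` can also be queried directly instead of reading values off the printout. A minimal sketch on simulated data follows; the vectors `with_garage` and `without_garage` are hypothetical, not the Ames samples.

```r
set.seed(7)

# Simulated living areas for two hypothetical groups of unequal size.
with_garage    <- rnorm(200, mean = 1500, sd = 500)
without_garage <- rnorm(40,  mean = 1150, sd = 400)

# Welch two-sample t-test (var.equal = FALSE), two-sided.
res <- t.test(with_garage, without_garage,
              alternative = "two.sided", var.equal = FALSE)

res$conf.int   # confidence interval for the difference in means
res$p.value    # p-value of the test

# Decision at the 5% level.
reject_null <- res$p.value < 0.05
```

Working with `res$p.value` and `res$conf.int` avoids transcription errors when the result feeds into later calculations.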
Hint: Since the Gamma distribution is conjugate to the Poisson distribution, the posterior will be Gamma with shape \(k + \sum x_i\) and scale \(\theta/(n\theta+1)\), where \(k\) and \(\theta\) are the shape and scale of the prior distribution and \(n\) is the number of observations. Based on the prior mean and standard deviation, elicit the prior values of \(k\) and \(\theta\).
This question refers to the following learning objective(s): Make inferences about data coming from a Poisson likelihood using a conjugate Gamma prior. Elicit prior beliefs about a parameter in terms of a Beta, Gamma, or Normal distribution.
# type your code for Question 9 here, and Knit
# First, find number of homes > 2000 sq ft.
nrow(ames_train[ames_train$area>2000,])
## [1] 138
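The hint's two steps, eliciting \(k\) and \(\theta\) from a prior mean and standard deviation and then applying the conjugate update, can be sketched as a small helper. The function name `gamma_poisson_update` and every numeric input below are hypothetical and chosen only for illustration. The sketch assumes the shape-scale Gamma parameterisation, where the mean is \(k\theta\) and the variance is \(k\theta^2\).

```r
gamma_poisson_update <- function(prior_mean, prior_sd, sum_x, n) {
  # Elicit the prior from mean = k * theta and sd^2 = k * theta^2.
  theta <- prior_sd^2 / prior_mean
  k     <- (prior_mean / prior_sd)^2

  # Conjugate update from the hint: shape k + sum(x), scale theta / (n*theta + 1).
  list(
    prior     = c(k = k, theta = theta),
    posterior = c(k = k + sum_x, theta = theta / (n * theta + 1))
  )
}

# Illustrative numbers only (not taken from the question).
out <- gamma_poisson_update(prior_mean = 4, prior_sd = 2, sum_x = 10, n = 5)

# Posterior mean is shape times scale.
post_mean <- unname(out$posterior["k"] * out$posterior["theta"])
print(out)
```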
price) on \(\log\)(area), there are some outliers. Which of the following do the three most outlying points have in common?
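As an alternative to exporting the data for manual inspection, the largest squared residuals can be located directly in R. A minimal sketch on simulated data follows; the data frame `toy` and the three planted outliers are hypothetical.

```r
set.seed(3)
n <- 100

# Simulated stand-in data with a log-log relationship.
toy <- data.frame(area = exp(rnorm(n, 7.2, 0.3)))
toy$price <- exp(5 + 0.9 * log(toy$area) + rnorm(n, sd = 0.2))

# Plant three artificially cheap homes so they stand out.
toy$price[c(10, 20, 30)] <- toy$price[c(10, 20, 30)] / 10

# Same model form as the question: log(price) regressed on log(area).
model <- lm(log(price) ~ log(area), data = toy)
toy$sq_residuals <- residuals(model)^2

# Row indices of the three largest squared residuals.
worst <- order(toy$sq_residuals, decreasing = TRUE)[1:3]
toy[worst, ]
```

Printing the full rows for the worst-fitting points makes it easy to scan their other columns for a shared characteristic.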
# type your code for Question 10 here, and Knit
model <- lm(log(price)~log(area),data=ames_train)
ames_train$sq_residuals <- (residuals(model))^2
write.csv(ames_train, "data.csv")
price if used as a dependent variable in a linear regression?
price is right-skewed.
price cannot take on negative values.
price can only take on integer values.
# type your code for Question 11 here, and Knit
fig <- plot_ly(data=ames_train,x=~price,type="histogram",nbinsx=60)
fig
Bldg.Type = 1Fam)
# type your code for Question 12 here, and Knit
sf_homes_neighborhood <- ames_train %>%
  group_by(Neighborhood) %>% summarise(mean(Bldg.Type == "1Fam"))
area) and the number of bedrooms above ground (Bedroom.AbvGr)?
# type your code for Question 13 here, and Knit
fig <- plot_ly(data=ames_train,x=~Bedroom.AbvGr,y=~log(area),type="scatter",
               mode="markers")
fig
cor(ames_train$Bedroom.AbvGr, log(ames_train$area), method = "pearson")
## [1] 0.5457625
# type your code for Question 14 here, and Knit
answer <- ames_train[complete.cases(ames_train$Bsmt.Unf.SF),] %>%
filter(Bsmt.Unf.SF!=0) %>% summarise(mean(Bsmt.Unf.SF))
answer
## # A tibble: 1 x 1
## `mean(Bsmt.Unf.SF)`
## <dbl>
## 1 595.