Claim: Avg. house price is above $150K
Null Hypothesis: Avg. price is less than or equal to $150K (mu <= 150)
Alternate Hypothesis: Avg. price is above $150K (mu > 150)
houseprice = read.csv("C:\\Users\\Perumalsamy\\Downloads\\houseprices.csv")
xbar = (mean(houseprice$Price))/1000
xbar
## [1] 163.8621
s = (sd(houseprice$Price)/1000)
s
## [1] 67.65156
n = nrow(houseprice)  # 1047 houses in the dataset
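Here the p-value is the probability of observing a sample mean of $163.9K or more when H0 is true (evaluated at the boundary value mu = 150):
P(Xbar >= xbar ; mu <= 150)
We first find P(Xbar >= 163.9 ; mu = 150) by standardizing the sample mean.
The population standard deviation is unknown and is estimated by s, so the test statistic is t = (xbar - mu0)/(s/sqrt(n)). It follows a t distribution with n - 1 degrees of freedom.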
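As a worked check, here are the house price statistics printed above plugged into the t formula. The values are in $K with mu0 = 150, rounded for display:

```latex
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
  = \frac{163.8621 - 150}{67.65156/\sqrt{1047}}
  \approx \frac{13.8621}{2.0908}
  \approx 6.63,
\qquad df = n - 1 = 1046
```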
t = (xbar-150)/(s/sqrt(n))
t
## [1] 6.63018
t = 6.63 with df = 1046
pvalue = pt(6.63,1046, lower.tail = FALSE)
pvalue
## [1] 2.684535e-11
Since the p-value is less than the significance level (alpha = 0.05), we reject the null hypothesis. The data support the claim that the average house price in the area is above $150K.
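The same right-tailed test can be sketched as a small helper that works from summary statistics alone. It is useful for checking the hand calculation without the CSV file. The function name `upper_t_from_summary` is hypothetical and not part of the original analysis. The inputs are the values printed above.

```r
# Right-tailed one-sample t-test computed from summary statistics.
# H0: mu <= mu0 vs H1: mu > mu0; the p-value is evaluated at the boundary mu = mu0.
# upper_t_from_summary is a hypothetical helper name used for illustration.
upper_t_from_summary <- function(xbar, s, n, mu0) {
  se <- s / sqrt(n)                                    # standard error of the sample mean
  t_stat <- (xbar - mu0) / se                          # standardized distance from mu0
  df <- n - 1                                          # degrees of freedom
  p_value <- pt(t_stat, df = df, lower.tail = FALSE)   # P(T >= t_stat)
  list(t = t_stat, df = df, p_value = p_value)
}

# Summary statistics printed above (prices in $K).
price_test <- upper_t_from_summary(xbar = 163.8621, s = 67.65156, n = 1047, mu0 = 150)
price_test$t        # about 6.63
price_test$p_value  # far below 0.05

# With the raw data loaded, base R's t.test() runs the same test:
# t.test(houseprice$Price / 1000, mu = 150, alternative = "greater")
```

Using the exact t rather than the rounded 6.63 changes the p-value only in its trailing digits.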
Claim: Average living area is more than 1800 sq. ft
Null Hypothesis: Average living area is less than or equal to 1800 sq. ft (mu <= 1800)
Alternate Hypothesis: Average living area is greater than 1800 sq. ft (mu > 1800)
xbar = mean(houseprice$Living.Area)
s = sd(houseprice$Living.Area)
xbar
## [1] 1807.303
s
## [1] 641.4609
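Here the p-value is the probability of observing a sample mean of 1807.3 sq. ft or more when H0 is true (evaluated at the boundary value mu = 1800):
P(Xbar >= xbar ; mu <= 1800)
We first find P(Xbar >= 1807.3 ; mu = 1800) by standardizing the sample mean.
As before, s estimates the unknown population standard deviation, so the test statistic is t = (xbar - mu0)/(s/sqrt(n)) with n - 1 degrees of freedom.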
t = (xbar-1800)/(s/sqrt(n))
t
## [1] 0.3683755
t = 0.368 with df = 1046
pvalue = pt(0.368,1046,lower.tail = FALSE)
pvalue
## [1] 0.3564738
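As the p-value (0.356) is greater than alpha = 0.05, we fail to reject the null hypothesis. The sample does not provide enough evidence to support the claim that the average living area in the area is greater than 1800 sq. ft. Failing to reject H0 does not prove that mu <= 1800.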
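The p-value decisions in the two one-sample tests can also be expressed with a critical value. H0 is rejected when the observed t exceeds the (1 - alpha) quantile of the t distribution. The sketch below reuses the rounded t values reported above. The helper `decision` is a hypothetical name.

```r
# Critical-value form of the right-tailed decision rule:
# reject H0 when the observed t exceeds the (1 - alpha) quantile of t(df).
alpha <- 0.05
df <- 1046
t_crit <- qt(1 - alpha, df = df)   # about 1.646

t_price <- 6.63     # house price test statistic reported above
t_area  <- 0.368    # living area test statistic reported above

# Hypothetical helper that phrases the outcome correctly.
decision <- function(t_obs, crit) {
  if (t_obs > crit) "reject H0" else "fail to reject H0"
}

decision(t_price, t_crit)  # "reject H0"
decision(t_area, t_crit)   # "fail to reject H0"
```

With df this large the critical value is close to the normal value 1.645. That is why z and t give practically the same answer here.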
houseprice_inK = houseprice
houseprice_inK$Price = houseprice_inK$Price/1000
boxplot(Price~Fireplace, data=houseprice_inK, xlab = "Fireplace Availability", ylab = "Houseprice in Thousands",names = c("Without Fireplace", "With Fireplace") ,main = "Fireplace availability vs Houseprices")
The boxplot shows that the median price of houses with a fireplace is higher than that of houses without one. Houses with a fireplace also show fewer outliers than houses without a fireplace.
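To put numbers on what the boxplot shows, the sketch below reports the per-group median and the count of points drawn as outliers. It uses `boxplot.stats()`, the helper that `boxplot()` relies on, which flags points beyond 1.5 times the hinge spread. The function name `group_box_summary` is hypothetical. The toy values are made up for illustration and are NOT from houseprices.csv.

```r
# Per-group median and number of boxplot outliers.
# boxplot.stats() applies the default 1.5 * IQR rule (based on Tukey hinges).
# group_box_summary is a hypothetical helper name.
group_box_summary <- function(value, group) {
  groups <- split(value, group)
  data.frame(
    group      = names(groups),
    median     = vapply(groups, function(v) boxplot.stats(v)$stats[3], numeric(1)),
    n_outliers = vapply(groups, function(v) length(boxplot.stats(v)$out), integer(1)),
    row.names  = NULL
  )
}

# Hypothetical toy values (in $K), NOT taken from houseprices.csv.
toy_price     <- c(100, 110, 120, 130, 400, 150, 160, 170, 180, 190)
toy_fireplace <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
res <- group_box_summary(toy_price, toy_fireplace)
res

# On the real data (not run here):
# group_box_summary(houseprice_inK$Price, houseprice_inK$Fireplace)
```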
Claim: The average price of houses with a fireplace (WF) is higher than the average price of houses without a fireplace (WOF), i.e. mu(WF) - mu(WOF) > 0
Null Hypothesis: mu(WF) - mu(WOF) <= 0
Alternate Hypothesis: mu(WF) - mu(WOF) > 0
wfdata <- data.frame(subset(houseprice,Fireplace == 1))
wofdata <- data.frame(subset(houseprice, Fireplace == 0))
xbar_wf = mean(wfdata$Price)/1000
xbar_wof= mean(wofdata$Price)/1000
s_wf = sd(wfdata$Price)/1000
s_wof = sd(wofdata$Price)/1000
n_wf = nrow(wfdata)
n_wof = nrow(wofdata)
xbar_wf
## [1] 189.6378
xbar_wof
## [1] 126.2877
s_wf
## [1] 66.29643
s_wof
## [1] 49.66239
n_wf
## [1] 621
n_wof
## [1] 426
xbar = xbar_wf - xbar_wof
xbar
## [1] 63.35019
#P(xbar >= 63.35; (mu(wf) - mu(wof)) <= 0)
sp = sqrt((((n_wf-1)*s_wf*s_wf) + ((n_wof-1)*s_wof*s_wof))/(n_wf+n_wof-2))
sp
## [1] 60.08952
df = n_wf+n_wof-2
df
## [1] 1045
t = (xbar-0)/(sp*sqrt((1/n_wf)+(1/n_wof)))
t
## [1] 16.75816
t = 16.76 with df = 1045
pval = pt(16.76, 1045, lower.tail = FALSE)
pval
## [1] 2.532178e-56
Since the p-value (about 2.5e-56) is far below alpha = 0.05, we reject the null hypothesis. The data strongly support the claim that the average price of houses with a fireplace is higher than that of houses without one. Note that the p-value is not the probability of a Type I error; that risk is fixed by the chosen alpha.
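The pooled two-sample calculation above can be wrapped in a summary-statistics helper and checked against the printed sp and t. The name `pooled_t_from_summary` is hypothetical. This pooled test assumes equal population variances. The sample SDs (66.3 vs 49.7) differ somewhat, so Welch's unequal-variance test (as used in the lot-size comparison) is a common alternative.

```r
# Pooled (equal-variance) two-sample t-test from summary statistics.
# H0: mu1 - mu2 <= 0 vs H1: mu1 - mu2 > 0.
# pooled_t_from_summary is a hypothetical helper name.
pooled_t_from_summary <- function(xbar1, s1, n1, xbar2, s2, n2) {
  df <- n1 + n2 - 2
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)   # pooled SD
  se <- sp * sqrt(1 / n1 + 1 / n2)                        # SE of the difference
  t_stat <- (xbar1 - xbar2) / se
  list(sp = sp, t = t_stat, df = df,
       p_value = pt(t_stat, df = df, lower.tail = FALSE)) # right-tailed
}

# Fireplace (WF) vs no fireplace (WOF), prices in $K, values printed above.
fp <- pooled_t_from_summary(189.6378, 66.29643, 621, 126.2877, 49.66239, 426)
fp$sp  # about 60.09
fp$t   # about 16.76

# With the raw data, t.test() runs the same pooled test:
# t.test(wfdata$Price, wofdata$Price, var.equal = TRUE, alternative = "greater")
```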
Claim: The average lot size of old houses (Age > 30) is larger than the average lot size of new houses (Age <= 30).
Null Hypothesis: mu(oh) - mu(nh) <= 0
Alternate Hypothesis: mu(oh) - mu(nh) > 0
oldh_data = data.frame(subset(houseprice, Age>30))
newh_data = data.frame(subset(houseprice, Age<=30))
xbar_oh = mean(oldh_data$Lot.Size)
xbar_nh= mean(newh_data$Lot.Size)
s_oh = sd(oldh_data$Lot.Size)
s_nh = sd(newh_data$Lot.Size)
n_oh = nrow(oldh_data)
n_nh = nrow(newh_data)
xbar_oh
## [1] 0.5481788
xbar_nh
## [1] 0.578255
s_oh
## [1] 0.7249367
s_nh
## [1] 0.7986463
n_oh
## [1] 302
n_nh
## [1] 745
xbar = xbar_oh - xbar_nh
xbar
## [1] -0.03007623
#P(xbar >= -0.03; (mu(oh) - mu(nh)) <= 0)
f1 = (s_oh*s_oh)/n_oh
f2 =(s_nh*s_nh)/n_nh
f = f1+f2
t = (xbar-0)/(sqrt(f))
t
## [1] -0.5902598
df = (f*f)/(((f1*f1)/(n_oh-1))+((f2*f2)/(n_nh-1)))
df
## [1] 610.2757
t = -0.59 with df = 610
pval = pt(-0.59, 610, lower.tail = FALSE)
pval
## [1] 0.7222954
As the p-value (0.72) is greater than alpha = 0.05, we fail to reject the null hypothesis. The data do not support the claim that old houses have larger lot sizes than new houses; in this sample, the mean lot size of old houses is in fact slightly smaller.
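The Welch (unequal-variance) calculation above, including the Welch-Satterthwaite degrees of freedom, can be sketched as a helper and checked against the printed t and df. The name `welch_t_from_summary` is hypothetical. The inputs are the lot-size statistics printed above.

```r
# Welch two-sample t-test from summary statistics, with the
# Welch-Satterthwaite approximation for the degrees of freedom.
# welch_t_from_summary is a hypothetical helper name.
welch_t_from_summary <- function(xbar1, s1, n1, xbar2, s2, n2) {
  f1 <- s1^2 / n1
  f2 <- s2^2 / n2
  t_stat <- (xbar1 - xbar2) / sqrt(f1 + f2)
  df <- (f1 + f2)^2 / (f1^2 / (n1 - 1) + f2^2 / (n2 - 1))
  list(t = t_stat, df = df,
       p_value = pt(t_stat, df = df, lower.tail = FALSE))  # right-tailed
}

# Old (Age > 30) vs new (Age <= 30) lot sizes, values printed above.
lot <- welch_t_from_summary(0.5481788, 0.7249367, 302, 0.578255, 0.7986463, 745)
lot$t        # about -0.59
lot$df       # about 610.3 (pt() accepts non-integer df)
lot$p_value  # about 0.72

# With the raw data (Welch is t.test()'s default):
# t.test(oldh_data$Lot.Size, newh_data$Lot.Size, alternative = "greater")
```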
Claim: A larger proportion of new houses have a fireplace than old houses.
Null Hypothesis: pi(nh) - pi(oh) <= 0
Alternate Hypothesis: pi(nh) - pi(oh) > 0
Here pi is the proportion of houses with a fireplace in each group.
n_oh = nrow(oldh_data)
n_nh = nrow(newh_data)
a = nrow(subset(oldh_data, Fireplace == 1))
pi_oh = a/n_oh
b = nrow(subset(newh_data, Fireplace == 1))
pi_nh = b/n_nh
n_oh
## [1] 302
n_nh
## [1] 745
pi_oh
## [1] 0.4470199
pi_nh
## [1] 0.652349
P = pi_nh-pi_oh
P
## [1] 0.2053291
#P(P >= 0.205; (pi(nh) - pi(oh)) <= 0)
# Pooled sample proportion
pbar = ((pi_nh*n_nh)+(pi_oh*n_oh))/(n_nh+n_oh)
z = (pi_nh-pi_oh)/(sqrt(pbar*(1-pbar)*((1/n_nh)+(1/n_oh))))
# Right-tailed p-value; 6.12 is the z value above truncated to two decimals
pval = pnorm(6.12, lower.tail = FALSE)
pval
## [1] 4.678768e-10
Since the p-value is less than the significance level (alpha = 0.1), we reject the null hypothesis. The data support the claim that a larger proportion of new houses have a fireplace than old houses.
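The pooled two-proportion z-test above can be sketched as a helper that takes the group proportions and sizes. The name `two_prop_z` is hypothetical. Running it also shows the unrounded z behind the 6.12 used in the code above.

```r
# Two-proportion z-test with a pooled proportion under H0: pi1 - pi2 <= 0.
# two_prop_z is a hypothetical helper name.
two_prop_z <- function(p1, n1, p2, n2) {
  pbar <- (p1 * n1 + p2 * n2) / (n1 + n2)               # pooled proportion
  se <- sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))     # SE under H0
  z <- (p1 - p2) / se
  list(pbar = pbar, z = z, p_value = pnorm(z, lower.tail = FALSE))
}

# New (Age <= 30) vs old (Age > 30) houses: share with a fireplace, printed above.
fp_share <- two_prop_z(0.652349, 745, 0.4470199, 302)
fp_share$z        # about 6.13
fp_share$p_value  # well below 0.1

# With the counts a and b computed earlier, prop.test() without continuity
# correction gives the same one-sided p-value (its X-squared equals z^2):
# prop.test(c(b, a), c(n_nh, n_oh), alternative = "greater", correct = FALSE)
```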
Claim: The average prices of Small (1-2 bedrooms), Medium (3-4 bedrooms) and Big (5-6 bedrooms) houses are not all the same.
Null Hypothesis: The average prices of small, medium and big houses are equal.
Alternate Hypothesis: At least one category's average price differs from the others.
Level of significance: 1% or 0.01
houseprice$Category[houseprice$Bedrooms==1 | houseprice$Bedrooms==2] <- 'Small'
houseprice$Category[houseprice$Bedrooms==3 | houseprice$Bedrooms==4] <- 'Medium'
houseprice$Category[houseprice$Bedrooms==5 | houseprice$Bedrooms==6] <- 'Big'
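The three assignments above can also be written as a single `cut()` call. The sketch below is an alternative, not the original method. `cut()` maps counts into the right-closed intervals (0,2], (2,4] and (4,6]. Integer counts outside 1-6 become NA, as in the original code. It returns a factor ordered Small/Medium/Big rather than a character column, which does not change the ANOVA result. The name `bedrooms_to_category` is hypothetical.

```r
# Hypothetical helper: map bedroom counts to size categories with cut().
# Intervals are right-closed: (0,2] -> Small, (2,4] -> Medium, (4,6] -> Big.
# Counts outside these intervals (e.g. 0 or 7) become NA.
bedrooms_to_category <- function(bedrooms) {
  cut(bedrooms,
      breaks = c(0, 2, 4, 6),
      labels = c("Small", "Medium", "Big"))
}

cats <- bedrooms_to_category(c(1, 2, 3, 4, 5, 6, 7))
cats

# On the real data (not run here):
# houseprice$Category <- bedrooms_to_category(houseprice$Bedrooms)
```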
price_anova = aov(houseprice$Price~houseprice$Category)
summary(price_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## houseprice$Category 2 4.840e+11 2.420e+11 58.71 <2e-16 ***
## Residuals 1044 4.303e+12 4.122e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value (< 2e-16) is less than alpha = 0.01, we reject the null hypothesis. We conclude that average house prices differ across the small, medium and big categories. ANOVA indicates only that at least one mean differs, not which pairs differ.
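As a check on the ANOVA table, the F statistic is the ratio of the between-group and within-group mean squares. The sketch below rebuilds F and its p-value from the rounded sums of squares printed above, so the last digit may differ slightly from the table.

```r
# Rebuild the F statistic and p-value from the printed ANOVA table:
# F = (SS_between / df_between) / (SS_within / df_within).
ss_between <- 4.840e11; df_between <- 2
ss_within  <- 4.303e12; df_within  <- 1044

ms_between <- ss_between / df_between   # 2.42e11
ms_within  <- ss_within / df_within     # about 4.12e9
f_stat <- ms_between / ms_within        # about 58.7
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
p_value < 0.01
```

To see which categories differ, a post-hoc comparison such as Tukey's HSD (base R's `TukeyHSD()`) would be the usual next step.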