1. Introduction

Using the “HomesForSale” data from https://www.lock5stat.com/datapage3e.html, we explore home prices in California, and compare prices across four states, by answering the following questions:

Q1: Using only the California data, how much does the size of a home influence its price?
Q2: Using only the California data, how does the number of bedrooms of a home influence its price?
Q3: Using only the California data, how does the number of bathrooms of a home influence its price?
Q4: Using only the California data, how do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This indicates whether the state in which a home is located has a significant impact on its price. All data are used.

2. Data

Loading the lessR package

library(lessR)
## 
## lessR 4.3.8                         feedback: gerbing@pdx.edu 
## --------------------------------------------------------------
## > d <- Read("")   Read text, Excel, SPSS, SAS, or R data file
##   d is default data frame, data= in analysis routines optional
## 
## Many examples of reading, writing, and manipulating data, 
## graphics, testing means and proportions, regression, factor analysis,
## customization, and descriptive statistics from pivot tables
##   Enter: browseVignettes("lessR")
## 
## View lessR updates, now including time series forecasting
##   Enter: news(package="lessR")
## 
## Interactive data analysis
##   Enter: interact()
## 
## Attaching package: 'lessR'
## The following object is masked from 'package:base':
## 
##     sort_by

Reading the data

homes = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(homes)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Subsetting the California data

california_data = subset(homes, State == "CA")
head(california_data)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

3. Analysis

Q1: Using only the California data, how much does the size of a home influence its price?

question1 = lm(Price ~ Size, data = california_data)
summary(question1)
## 
## Call:
## lm(formula = Price ~ Size, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
plot(california_data$Size, california_data$Price, main = "Price vs Size", xlab = "Size", ylab = "Price")
abline(question1, col = "blue", lwd=2)

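interpretation: Slope estimate = 0.33919 (p-value = 0.000463): each additional square foot of size is associated with an average increase of 0.33919 in Price. Assuming Price is recorded in thousands of dollars and Size in square feet, as the values in the data suggest, that is about $339 per square foot, or roughly $339,000 per 1,000 sq. ft. Size explains about 36% of the variation in price (R-squared = 0.3594).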
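
To make the slope concrete, the fitted line can be evaluated by hand. The sketch below hard-codes the two coefficients reported in the output above, so it runs without downloading the data. It assumes Price is in thousands of dollars and Size in square feet, and the helper name predict_price is hypothetical, introduced only for illustration.

```r
# Coefficients copied from the summary() output for Price ~ Size
intercept <- -56.81675
slope     <- 0.33919

# Hypothetical helper: predicted Price for a given Size
predict_price <- function(size) intercept + slope * size

# Predicted price for a 2,000 sq ft home
predict_price(2000)

# Difference in predicted price between two homes 100 sq ft apart
predict_price(2100) - predict_price(2000)
```

The first call gives a predicted Price of about 621.6 and the second a difference of about 33.9, which is 100 times the slope.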

Q2: Using only the California data, how does the number of bedrooms of a home influence its price?

question1 = lm(Price ~ Beds, data = california_data)
summary(question1)
## 
## Call:
## lm(formula = Price ~ Beds, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548
plot(california_data$Beds, california_data$Price, main = "Price vs NumBeds", xlab = "NumBeds", ylab = "Price")
abline(question1, col = "blue", lwd=2)

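interpretation: p-value = 0.2548: the effect of the number of bedrooms is not statistically significant at the 0.05 level. The estimated slope (84.77 per additional bedroom) cannot be distinguished from zero, and the model explains under 5% of the variation in price (R-squared = 0.04605).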
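
A 95% confidence interval for the Beds slope gives another way to read the same output. The sketch below rebuilds the interval from the reported estimate, standard error, and residual degrees of freedom. The variable names are illustrative, and the values are the rounded ones printed above, so the result may differ slightly from what confint() would report on the fitted model.

```r
# Values copied from the summary() output for Price ~ Beds
estimate  <- 84.77
std_error <- 72.91
df_resid  <- 28

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus t_crit standard errors
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)
ci
```

The interval runs from roughly -65 to 234, so it contains zero.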

Q3: Using only the California data, how does the number of bathrooms of a home influence its price?

question1 = lm(Price ~ Baths, data = california_data)
summary(question1)
## 
## Call:
## lm(formula = Price ~ Baths, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092
plot(california_data$Baths, california_data$Price, main = "Price vs NumBaths", xlab = "NumBaths", ylab = "Price")
abline(question1, col = "blue", lwd=2)

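interpretation: p-value = 0.004092: the effect of the number of bathrooms is statistically significant at the 0.05 level. Each additional bathroom is associated with an average increase of 194.74 in Price, and the model explains about 26% of the variation in price (R-squared = 0.2588).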
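
The t value and p-value in the Baths row can be recomputed from the estimate and its standard error, which shows where the reported significance comes from. This is a minimal sketch using the rounded values printed above; the variable names are illustrative.

```r
# Values copied from the summary() output for Price ~ Baths
estimate  <- 194.74
std_error <- 62.28
df_resid  <- 28

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

round(c(t = t_value, p = p_value), 5)
```

Up to rounding, the results agree with the t value of 3.127 and p-value of 0.00409 in the output.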

Q4: Using only the California data, how do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

question1 = lm(Price ~ Size + Beds + Baths, data = california_data)
summary(question1)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353
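
The adjusted R-squared and the overall F-statistic in the output above both follow from the multiple R-squared, the sample size, and the number of predictors. The sketch below recomputes them from the printed values. The sample size of 30 is inferred from the 26 residual degrees of freedom, and the variable names are illustrative.

```r
# Values copied or inferred from the summary() output for the joint model
r2 <- 0.3912   # Multiple R-squared
p  <- 3        # predictors: Size, Beds, Baths
n  <- 30       # 26 residual df + 3 predictors + 1 intercept

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic and its p-value on (p, n - p - 1) degrees of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat, p = p_val), 4)
```

Up to rounding, these agree with the reported adjusted R-squared of 0.3209, F-statistic of 5.568, and p-value of 0.004353.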

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This indicates whether the state in which a home is located has a significant impact on its price. All data are used.

anova_model = aov(Price ~ State, data = homes)
summary(anova_model)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
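
The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares. The sketch below recomputes them from the printed table. The variable names are illustrative, and share_state is an extra summary not shown in the output.

```r
# Values copied from the aov() summary table
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Mean squares: sum of squares divided by degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F value and its upper-tail p-value
f_value <- ms_state / ms_resid
p_value <- pf(f_value, df_state, df_resid, lower.tail = FALSE)

# Share of the total sum of squares attributable to State
share_state <- ss_state / (ss_state + ss_resid)

c(F = f_value, p = p_value, share = share_state)
```

Up to rounding, the F value and p-value agree with the table (7.355 and 0.000148), and State accounts for about 16% of the total variation in Price.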

4. Summary

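In conclusion, using the “HomesForSale” data from https://www.lock5stat.com/datapage3e.html, we answered the questions listed in the introduction. For California homes, size (p-value = 0.000463) and number of bathrooms (p-value = 0.004092) are each significantly related to price when used alone, while number of bedrooms is not (p-value = 0.2548). In the joint model, the three predictors together are significant (F = 5.568, p-value = 0.004353), but only size remains individually significant (p-value = 0.0259). Across all four states, the one-way ANOVA shows significant differences in mean home price (F = 7.355, p-value = 0.000148).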

5. References

Dr. Shiju Zhang, Project 3: Exploring Homes in CA, NJ, NY, and PA. https://stcloudstate.learn.minnstate.edu/d2l/lms/dropbox/user/folder_submit_files.d2l?db=14485775&grpid=0&isprv=0&bp=0&ou=6740825