Referring to the “HomesForSale” data from https://www.lock5stat.com/datapage3e.html, we explore homes in California by answering the following questions:
1. Using the data only for California, how much does the size of a home influence its price?
2. Using the data only for California, how does the number of bedrooms of a home influence its price?
3. Using the data only for California, how does the number of bathrooms of a home influence its price?
4. Using the data only for California, how do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This helps determine whether the state in which a home is located has a significant impact on its price. All data should be used.
library(lessR)
homes = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(homes)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
california_data = subset(homes, State == "CA")
head(california_data)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
question1 = lm(Price ~ Size, data = california_data)
summary(question1)
##
## Call:
## lm(formula = Price ~ Size, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
plot(california_data$Size, california_data$Price, main = "Price vs Size", xlab = "Size", ylab = "Price")
abline(question1, col = "blue", lwd=2)
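Interpretation: slope estimate = 0.33919 (p-value = 0.000463), so size has a statistically significant effect on price. Assuming Price is recorded in thousands of dollars and Size in square feet, as the values in the data suggest, each additional square foot is associated with an average price increase of about $339 (about $339,190 per 1,000 sq. ft.). Size explains about 36% of the variation in price (R-squared = 0.3594).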
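As a rough check on how precisely this slope is estimated, the estimate and its standard error from the output above can be turned into a 95% confidence interval. This is a minimal sketch that retypes the reported numbers by hand; the names `slope`, `se`, and `df_resid` are illustrative, and the dollar conversion assumes Price is in thousands of dollars and Size is in square feet, which the data values suggest but the output does not state.

```r
# Values copied from the summary output of lm(Price ~ Size) above
slope    <- 0.33919   # estimated change in Price per one unit of Size
se       <- 0.08558   # standard error of the slope
df_resid <- 28        # residual degrees of freedom

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df_resid)
ci <- slope + c(-1, 1) * t_crit * se

# Assumption: Price is in $1,000s and Size in sq. ft., so multiplying by 1000
# expresses the interval in dollars per square foot
ci_dollars <- ci * 1000

print(round(ci, 4))
print(round(ci_dollars, 2))
```

With the fitted model still in memory, `confint(question1)` should give the same interval without retyping any numbers.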
question1 = lm(Price ~ Beds, data = california_data)
summary(question1)
##
## Call:
## lm(formula = Price ~ Beds, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
plot(california_data$Beds, california_data$Price, main = "Price vs NumBeds", xlab = "NumBeds", ylab = "Price")
abline(question1, col = "blue", lwd=2)
Interpretation: p-value = 0.2548, which is greater than 0.05, so the effect of the number of bedrooms on price is not statistically significant.
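The p-value in this output can be reproduced from the estimate and its standard error, which makes the decision rule explicit. The sketch below retypes the reported numbers; `alpha = 0.05` is an assumed significance level that the report does not state, and the variable names are illustrative.

```r
# Values copied from the summary output of lm(Price ~ Beds) above
estimate <- 84.77
se       <- 72.91
df_resid <- 28
alpha    <- 0.05   # assumed significance level, not stated in the report

# t statistic and two-sided p-value for the null hypothesis slope = 0
t_value <- estimate / se
p_value <- 2 * pt(-abs(t_value), df_resid)

# TRUE only if the slope is significantly different from zero at level alpha
significant <- p_value < alpha

print(round(c(t = t_value, p = p_value), 4))
print(significant)
```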
question1 = lm(Price ~ Baths, data = california_data)
summary(question1)
##
## Call:
## lm(formula = Price ~ Baths, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
plot(california_data$Baths, california_data$Price, main = "Price vs NumBaths", xlab = "NumBaths", ylab = "Price")
abline(question1, col = "blue", lwd=2)
Interpretation: p-value = 0.004092, which is less than 0.05, so the effect of the number of bathrooms on price is statistically significant.
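In a regression with a single predictor, the t value, the F-statistic, and R-squared in the output are three views of the same test. The following sketch checks that relationship using the rounded numbers reported above, so small discrepancies are expected; the variable names are illustrative.

```r
# Values copied from the summary output of lm(Price ~ Baths) above
t_value   <- 3.127
r_squared <- 0.2588
df_resid  <- 28

# With one predictor, the overall F statistic equals the squared t statistic
f_from_t <- t_value^2

# The same F statistic can be recovered from R-squared
f_from_r2 <- (r_squared / (1 - r_squared)) * df_resid

# Correlation between Baths and Price; positive because the slope is positive
r <- sqrt(r_squared)

print(round(c(f_from_t = f_from_t, f_from_r2 = f_from_r2, r = r), 3))
```

Both F values should land close to the reported 9.779, which is why the slope's p-value and the model's overall p-value agree in this output.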
question1 = lm(Price ~ Size + Beds + Baths, data = california_data)
summary(question1)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
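One way to read the joint model is to ask whether Beds and Baths add anything once Size is already in the model. The sketch below computes a partial F-test from the two reported R-squared values (0.3594 for Size alone, 0.3912 for all three predictors). Because it relies on rounded figures, the result is approximate, and the variable names are illustrative.

```r
# R-squared values copied from the two summary outputs above
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths
df_added   <- 2        # number of predictors added (Beds and Baths)
df_resid   <- 26       # residual degrees of freedom of the full model

# Partial F statistic: gain in R-squared per added predictor,
# relative to the unexplained variation per residual degree of freedom
f_partial <- ((r2_full - r2_reduced) / df_added) / ((1 - r2_full) / df_resid)

# Upper-tail p-value from the F distribution
p_partial <- pf(f_partial, df_added, df_resid, lower.tail = FALSE)

print(round(c(F = f_partial, p = p_partial), 3))
```

A large p-value here would be consistent with the coefficient table, where only Size is individually significant. With the data loaded, fitting both models and comparing them with `anova()` would perform the same test without rounding.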
anova_model = aov(Price ~ State, data = homes)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
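The ANOVA table can be unpacked by recomputing the F value from its sums of squares and adding an effect size. This is a minimal sketch using the numbers reported above; eta-squared is not part of the original output and is included here only as an illustrative measure of how much of the price variation is associated with State.

```r
# Values copied from the ANOVA table above
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Mean squares and the F statistic
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid
f_value  <- ms_state / ms_resid

# Upper-tail p-value from the F distribution
p_value <- pf(f_value, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: share of total variation in Price associated with State
eta_squared <- ss_state / (ss_state + ss_resid)

print(round(c(F = f_value, eta_squared = eta_squared), 4))
print(signif(p_value, 3))
```

A significant F only indicates that at least one state mean differs from the others. Running `TukeyHSD()` on the fitted `aov` object (with `State` stored as a factor) is one common way to see which pairs of states differ.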
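In conclusion, referring to the “HomesForSale” data from https://www.lock5stat.com/datapage3e.html, we explored homes in California by answering the questions listed in the introduction. Considered one at a time, size (p-value = 0.000463) and the number of bathrooms (p-value = 0.004092) are significantly related to price, while the number of bedrooms is not (p-value = 0.2548). In the joint model, the three predictors together are significant (F = 5.568, p-value = 0.004353, R-squared = 0.3912), but only size remains individually significant (p-value = 0.0259). Across all four states, the ANOVA shows significant differences in mean home price (F = 7.355, p-value = 0.000148).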
Dr. Shiju Zhang, Project 3: Exploring Homes in CA, NJ, NY, and PA. https://stcloudstate.learn.minnstate.edu/d2l/lms/dropbox/user/folder_submit_files.d2l?db=14485775&grpid=0&isprv=0&bp=0&ou=6740825