library(readxl)
library(mosaic)
One of the biggest purchases we make in our lives is a home. As we buy a home, we ask ourselves many questions, such as:
How much should I spend for a home?
How many bathrooms are there?
What is the cost per square foot?
Suppose you are looking for a house near Charleston in Mount Pleasant, SC, and you have narrowed your search to three subdivisions: Carolina Park, Dunes West, and Park West.
Download the Mount Pleasant Real Estate data set from Canvas and import the data into R.
library(readxl)
load("Mount_Pleasant_Real_Estate_Data.RData")
Carolina <- subset(Mount_Pleasant_Real_Estate_Data, Subdivision == "Carolina Park")
mean(Carolina$List.Price)
## [1] 560014.9
sd(Carolina$List.Price)
## [1] 94018.37
nrow(Carolina)
## [1] 49
Dunes <- subset(Mount_Pleasant_Real_Estate_Data, Subdivision == "Dunes West")
mean(Dunes$List.Price)
## [1] 734867.6
sd(Dunes$List.Price)
## [1] 362152
nrow(Dunes)
## [1] 100
Park <- subset(Mount_Pleasant_Real_Estate_Data, Subdivision == "Park West")
mean(Park$List.Price)
## [1] 573756.3
sd(Park$List.Price)
## [1] 303392.1
nrow(Park)
## [1] 96
Summary:
| Subdivision | Mean of List Price | Standard Deviation of List Price | Sample Size |
|---|---|---|---|
| Carolina Park | 560014.9 | 94018.37 | 49 |
| Dunes West | 734867.6 | 362152 | 100 |
| Park West | 573756.3 | 303392.1 | 96 |
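The per-subdivision means, standard deviations, and counts in the table can also be produced in one pass rather than one subset at a time. A minimal sketch in base R, using a small made-up data frame (the `toy` values and the subdivision labels "A" and "B" are hypothetical, not the Canvas data):

```r
# Hypothetical toy data standing in for the real data set
toy <- data.frame(
  Subdivision = c("A", "A", "B", "B", "B"),
  List.Price  = c(100, 200, 300, 400, 500)
)

# Split by subdivision, then compute mean, SD, and sample size for each group
summary_table <- do.call(rbind, lapply(split(toy, toy$Subdivision), function(d) {
  data.frame(
    Subdivision = d$Subdivision[1],
    Mean        = mean(d$List.Price),
    SD          = sd(d$List.Price),
    N           = nrow(d)
  )
}))
print(summary_table)
```

With the real data, replacing `toy` with `Mount_Pleasant_Real_Estate_Data` should give the same figures as the table, assuming the column names match those used in the code earlier.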
We should use a t interval in this situation because the population standard deviations are unknown; only the sample standard deviations are available.
t.test(~List.Price, data = Carolina, conf.level = 0.95)
##
## One Sample t-test
##
## data: List.Price
## t = 41.695, df = 48, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 533009.6 587020.1
## sample estimates:
## mean of x
## 560014.9
t.test(~List.Price, data = Dunes, conf.level = 0.95)
##
## One Sample t-test
##
## data: List.Price
## t = 20.292, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 663008.8 806726.4
## sample estimates:
## mean of x
## 734867.6
t.test(~List.Price, data = Park, conf.level = 0.95)
##
## One Sample t-test
##
## data: List.Price
## t = 18.529, df = 95, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 512283.4 635229.3
## sample estimates:
## mean of x
## 573756.3
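The intervals reported by `t.test` can be checked by hand from the summary statistics, using \(\bar{x} \pm t^* \cdot s/\sqrt{n}\) with \(n - 1\) degrees of freedom. A sketch that rebuilds all three from the rounded values in the summary table (the helper name `t_interval` is introduced here for illustration only); small differences in the last digit can arise from that rounding:

```r
# Illustrative helper: t interval for a mean from summary statistics
t_interval <- function(xbar, s, n, conf.level = 0.95) {
  se    <- s / sqrt(n)                               # standard error of the mean
  tstar <- qt(1 - (1 - conf.level) / 2, df = n - 1)  # two-sided critical value
  c(lower = xbar - tstar * se, upper = xbar + tstar * se)
}

carolina_ci <- t_interval(560014.9, 94018.37, 49)
dunes_ci    <- t_interval(734867.6, 362152,   100)
park_ci     <- t_interval(573756.3, 303392.1, 96)

print(round(rbind(carolina_ci, dunes_ci, park_ci), 1))
```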
It is not plausible that Carolina Park and Dunes West have the same average list price because their 95% confidence intervals do not overlap: the Carolina Park interval ends at 587,020.1, well below the Dunes West lower limit of 663,008.8. It is plausible that Carolina Park and Park West have the same average list price because their confidence intervals do overlap.
No, because $520,000 falls below the Carolina Park confidence interval (533,009.6 to 587,020.1).
Yes, $670,000 is a reasonable value for the Dunes West subdivision because it lies within the confidence interval (663,008.8 to 806,726.4).
Yes, $568,000 is a reasonable value for both the Carolina Park and Park West subdivisions because it lies within the confidence intervals of both subdivisions.
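The comparisons above come down to two checks: whether two intervals overlap, and whether a proposed value lies inside an interval. A small sketch using the interval endpoints printed by `t.test` (the helper names `overlaps` and `contains` are made up for this example). Overlap is only a rough guide for comparing two means; a two-sample test would be the more direct tool.

```r
# Interval endpoints copied from the t.test output
ci <- list(
  carolina = c(533009.6, 587020.1),
  dunes    = c(663008.8, 806726.4),
  park     = c(512283.4, 635229.3)
)

# TRUE when intervals a and b share at least one point
overlaps <- function(a, b) a[1] <= b[2] && b[1] <= a[2]

# TRUE when the value x lies inside the interval
contains <- function(interval, x) x >= interval[1] && x <= interval[2]

carolina_vs_dunes <- overlaps(ci$carolina, ci$dunes)   # FALSE
carolina_vs_park  <- overlaps(ci$carolina, ci$park)    # TRUE

in_carolina_520 <- contains(ci$carolina, 520000)       # FALSE
in_dunes_670    <- contains(ci$dunes, 670000)          # TRUE
in_both_568     <- contains(ci$carolina, 568000) && contains(ci$park, 568000)  # TRUE
```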
Download the Employee Satisfaction data set from Canvas and import the data into R. These are survey data obtained by the human resources department of a large company.
If the level of employee satisfaction drops below 0.60 overall, then there is a belief that there may be a serious problem with morale in that department. There have been rumors that the Human Resources department (hr in the data file) may be having just that issue. Using R, test to determine if the mean employee satisfaction level in the Human Resources department is less than 0.60. Note that you first need to subset the data set according to the “department” variable and only consider the data from the Human Resources department.
library(readxl)
load("Employee_Satisfaction.RData")
The satisfaction level is a quantitative variable.
hr <- subset(Employee_Satisfaction, department == "hr")
gf_histogram(~satisfaction_level, data = hr, breaks = seq(0, 1, by = 0.1), color = "blue", fill = "black", ylab = "Number of employees", title = "Histogram of HR employee satisfaction levels")
summary(hr$satisfaction_level)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0900 0.4300 0.6100 0.5988 0.8050 1.0000
sd(hr$satisfaction_level)
## [1] 0.247929
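The null hypothesis is \(H_0: \mu = 0.60\). The alternative hypothesis is \(H_a: \mu < 0.60\).
A t test is appropriate because the sample size of 739 is well above 30, so the sampling distribution of the sample mean is approximately normal.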
nrow(hr)
## [1] 739
t.test(hr$satisfaction_level, mu = 0.60, alternative = "less")
##
## One Sample t-test
##
## data: hr$satisfaction_level
## t = -0.13057, df = 738, p-value = 0.4481
## alternative hypothesis: true mean is less than 0.6
## 95 percent confidence interval:
## -Inf 0.6138295
## sample estimates:
## mean of x
## 0.5988092
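The statistic, p-value, and upper confidence bound in this output can be reproduced from the summary values printed earlier, using \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\). A sketch working from the rounded figures (the variable names are illustrative):

```r
xbar <- 0.5988092   # sample mean from the t.test output
s    <- 0.247929    # sample standard deviation
n    <- 739         # number of HR employees
mu0  <- 0.60        # hypothesized mean

se     <- s / sqrt(n)                          # standard error of the mean
t_stat <- (xbar - mu0) / se                    # one-sample t statistic
p_val  <- pt(t_stat, df = n - 1)               # lower-tail area, since H_a is mu < 0.60
upper  <- xbar + qt(0.95, df = n - 1) * se     # one-sided 95% upper bound

print(c(t = t_stat, p = p_val, upper = upper))
```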
The test statistic is -0.13057, and the p-value is 0.4481.
We fail to reject \(H_0\) because the p-value of 0.4481 is greater than any conventional significance level (for example, 0.05).
After testing whether the mean employee satisfaction level in the Human Resources department is less than 0.60, we do not have sufficient evidence to conclude that it is less than 0.60.
We could have made a Type II error. The practical implication of this error is that we failed to reject \(H_0\) when the mean satisfaction level is actually below 0.60, so a morale problem in the department would go unnoticed.
No, because a confidence interval gives a range of plausible values for the mean, whereas a hypothesis test gives a direct decision about whether the mean differs from the hypothesized value (greater than, less than, or not equal to).