library(readxl)
library(mosaic)

Part 1

Introduction


One of the biggest purchases we make in our lives is a home. As we buy a home we ask ourselves many questions such as:

How much should I spend for a home?

How many bathrooms are there?

What is the cost per square foot?

Suppose you are looking for a house near Charleston in Mount Pleasant, SC, and you have narrowed your search to three subdivisions: Carolina Park, Dunes West, and Park West.

Download the Mount Pleasant Real Estate data set from Canvas and import the data into R.

library(readxl)

load("Mount_Pleasant_Real_Estate_Data.RData")


1. For the variable “List Price”, calculate the sample mean, the sample standard deviation, and the sample size for the three different subdivisions.

Carolina<-subset(Mount_Pleasant_Real_Estate_Data,Subdivision == "Carolina Park")
mean(Carolina$List.Price)
## [1] 560014.9
sd(Carolina$List.Price)
## [1] 94018.37
nrow(Carolina)
## [1] 49
Dunes<-subset(Mount_Pleasant_Real_Estate_Data,Subdivision == "Dunes West")
mean(Dunes$List.Price)
## [1] 734867.6
sd(Dunes$List.Price)
## [1] 362152
nrow(Dunes)
## [1] 100
Park<-subset(Mount_Pleasant_Real_Estate_Data,Subdivision == "Park West")
mean(Park$List.Price)
## [1] 573756.3
sd(Park$List.Price)
## [1] 303392.1
nrow(Park)
## [1] 96

Summary:

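Subdivision   | Mean of List Price ($) | Standard Deviation of List Price ($) | Sample Size
------------- | ---------------------- | ------------------------------------ | -----------
Carolina Park | 560014.9               | 94018.37                             | 49
Dunes West    | 734867.6               | 362152                               | 100
Park West     | 573756.3               | 303392.1                             | 96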
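
The three separate subset/mean/sd/nrow blocks can also be collapsed into one grouped summary. The sketch below shows one way to do this in base R. The `toy` data frame and its prices are made up for illustration; it only stands in for `Mount_Pleasant_Real_Estate_Data`, which is assumed to have the same `Subdivision` and `List.Price` columns.

```r
# Toy stand-in for the real data set (these six prices are made up).
toy <- data.frame(
  Subdivision = c("Carolina Park", "Carolina Park", "Dunes West",
                  "Dunes West", "Park West", "Park West"),
  List.Price  = c(500000, 600000, 700000, 900000, 450000, 650000)
)

# One pass over the groups: mean, standard deviation, and sample size.
price_summary <- aggregate(
  List.Price ~ Subdivision,
  data = toy,
  FUN  = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)

# aggregate() stores the three statistics in a matrix column;
# flatten it into ordinary columns for printing.
price_summary <- do.call(data.frame, price_summary)
print(price_summary)
```

Replacing `toy` with the real data frame should reproduce the values in the summary table, provided the column names match.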

2. Based on the data set and the information we have, which confidence interval should we use here, a z or a t interval? Why?

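We should use a t interval in this situation. The population standard deviations are unknown; the data set only gives us sample standard deviations, so the interval should be based on the t distribution with \(n - 1\) degrees of freedom.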
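
A quick way to see how the two interval types differ is to compare their critical values at 95% confidence. The sketch below uses only the sample sizes reported in the summary above; the object names are illustrative.

```r
# Sample sizes taken from the summary table.
n <- c("Carolina Park" = 49, "Dunes West" = 100, "Park West" = 96)

# 95% two-sided critical values: standard normal (z) versus
# Student's t with n - 1 degrees of freedom.
z_star <- qnorm(0.975)
t_star <- qt(0.975, df = n - 1)

# Side-by-side comparison, one column per subdivision.
print(round(rbind(z = z_star, t = t_star), 3))
```

The t critical value is always the larger of the two, so a t interval is slightly wider than a z interval built from the same data; with samples of this size the gap is small.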


3. Construct an interval to estimate the true average “List Price” for each subdivision with 95% confidence. Based on these confidence intervals, is it possible that Carolina Park and Dunes West have the same average List Price? How about Carolina Park and Park West? Discuss.

t.test(~List.Price,data = Carolina,conf.level=0.95)
## 
##  One Sample t-test
## 
## data:  List.Price
## t = 41.695, df = 48, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  533009.6 587020.1
## sample estimates:
## mean of x 
##  560014.9
t.test(~List.Price,data = Dunes,conf.level=0.95)
## 
##  One Sample t-test
## 
## data:  List.Price
## t = 20.292, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  663008.8 806726.4
## sample estimates:
## mean of x 
##  734867.6
t.test(~List.Price,data = Park,conf.level=0.95)
## 
##  One Sample t-test
## 
## data:  List.Price
## t = 18.529, df = 95, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  512283.4 635229.3
## sample estimates:
## mean of x 
##  573756.3

It is unlikely that Carolina Park and Dunes West have the same average list price, because their 95% confidence intervals ($533,009.60 to $587,020.10 and $663,008.80 to $806,726.40) do not overlap. Carolina Park and Park West could plausibly have the same average list price, because their intervals ($533,009.60 to $587,020.10 and $512,283.40 to $635,229.30) do overlap.
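
The intervals printed by `t.test` can be reproduced from the summary statistics alone, which makes the overlap comparison explicit. This is a sketch that uses the rounded means, standard deviations, and sample sizes from question 1, so the last digit may differ slightly from the `t.test` output; the helper `overlaps` is a hypothetical name introduced here.

```r
# Summary statistics from question 1 (rounded as reported).
stats <- data.frame(
  subdivision = c("Carolina Park", "Dunes West", "Park West"),
  mean = c(560014.9, 734867.6, 573756.3),
  sd   = c(94018.37, 362152, 303392.1),
  n    = c(49, 100, 96)
)

# 95% t interval: mean +/- t* x sd / sqrt(n), with n - 1 degrees of freedom.
margin <- qt(0.975, df = stats$n - 1) * stats$sd / sqrt(stats$n)
stats$lower <- stats$mean - margin
stats$upper <- stats$mean + margin
print(stats)

# Two intervals overlap when each one starts before the other ends.
overlaps <- function(i, j) {
  stats$lower[i] <= stats$upper[j] && stats$lower[j] <= stats$upper[i]
}

overlaps(1, 2)  # Carolina Park vs Dunes West
overlaps(1, 3)  # Carolina Park vs Park West
```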


4. Do you think a List Price of $520,000 is a reasonable value for the Carolina Park subdivision? Why?

No. $520,000 is below the lower limit of the 95% confidence interval for Carolina Park ($533,009.60 to $587,020.10), so it is not a plausible value for the average list price in that subdivision.


5. Do you think a List Price of $670,000 is a reasonable value for the Dunes West subdivision? Why?

Yes. $670,000 is a reasonable value for the Dunes West subdivision because it lies inside the 95% confidence interval for the average list price ($663,008.80 to $806,726.40).


6. Do you think a List Price of $568,000 is a reasonable value for both the Carolina Park and Park West subdivisions? Why?

Yes. $568,000 is a reasonable value for both the Carolina Park and Park West subdivisions because it lies inside the 95% confidence interval of each: $533,009.60 to $587,020.10 for Carolina Park and $512,283.40 to $635,229.30 for Park West.
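
The checks in questions 4 to 6 all ask whether a candidate value falls between the limits of an interval. A minimal sketch of that check follows, using the interval limits copied from the `t.test` output above; `ci` and `in_ci` are hypothetical names introduced for illustration.

```r
# 95% confidence limits copied from the t.test output in question 3.
ci <- list(
  "Carolina Park" = c(533009.6, 587020.1),
  "Dunes West"    = c(663008.8, 806726.4),
  "Park West"     = c(512283.4, 635229.3)
)

# TRUE when the value lies between the lower and upper limits (inclusive).
in_ci <- function(value, subdivision) {
  bounds <- ci[[subdivision]]
  value >= bounds[1] && value <= bounds[2]
}

in_ci(520000, "Carolina Park")  # question 4
in_ci(670000, "Dunes West")     # question 5
in_ci(568000, "Carolina Park") && in_ci(568000, "Park West")  # question 6
```

Note that these intervals describe plausible values for the mean list price of a subdivision, not the range of prices for an individual home.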


Part 2

Introduction

Download the Employee Satisfaction data set from Canvas and import the data into R. These are survey data obtained by the human resources department of a large company.

If the level of employee satisfaction drops below 0.60 overall, then there is a belief that there may be a serious problem with morale in that department. There have been rumors that the Human Resources department (hr in the data file) may be having just that issue. Using R, test to determine if the mean employee satisfaction level in the Human Resources department is less than 0.60. Note that you first need to subset the data set according to the “department” variable and only consider the data from the Human Resources department.

library(readxl)
load("Employee_Satisfaction.RData")

1. Is “satisfaction_level” a qualitative or quantitative variable?

The satisfaction level is a quantitative variable.


2. Graph the employee satisfaction level for the Human Resources department with an appropriate graph and calculate statistics appropriate for this type of data.

hr<-subset(Employee_Satisfaction,department=="hr")
# breaks fixes the bin edges at 0, 0.1, ..., 1, so a separate binwidth is not needed
gf_histogram(~satisfaction_level, data = hr,
             breaks = seq(0, 1, by = 0.1),
             color = "blue", fill = "black",
             ylab = "Number of employees",
             title = "Employee satisfaction levels in the Human Resources department")

summary(hr$satisfaction_level)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0900  0.4300  0.6100  0.5988  0.8050  1.0000
sd(hr$satisfaction_level)
## [1] 0.247929

3. Conduct the appropriate hypothesis test using the following steps.

(a) Determine the null and alternative hypotheses. Use a significance level of \(\alpha=0.05.\)

The null hypothesis is \(H_0: \mu = 0.60\). The alternative hypothesis is \(H_a: \mu < 0.60\).


(b) Validate the assumptions of the hypothesis test you plan to use.

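A one-sample t test is appropriate because the population standard deviation is unknown and the sample size of 739 is well above 30, so the sampling distribution of the sample mean is approximately normal by the Central Limit Theorem. We also assume that the survey responses are independent observations from the department.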

nrow(hr)
## [1] 739

(c) Use R to determine the test statistic and the P-value.

t.test(hr$satisfaction_level, mu = 0.60, alternative = "less")
## 
##  One Sample t-test
## 
## data:  hr$satisfaction_level
## t = -0.13057, df = 738, p-value = 0.4481
## alternative hypothesis: true mean is less than 0.6
## 95 percent confidence interval:
##       -Inf 0.6138295
## sample estimates:
## mean of x 
## 0.5988092

The test statistic is t = -0.13057, and the P-value is 0.4481.
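
As a cross-check, the test statistic and P-value can be recomputed by hand from the sample mean, standard deviation, and sample size reported above. This is a sketch using those rounded values, so the results should agree with `t.test` only to about four or five significant digits; the variable names are illustrative.

```r
# Values reported earlier for the hr subset.
x_bar <- 0.5988092   # sample mean from the t.test output
s     <- 0.247929    # sample standard deviation
n     <- 739         # sample size
mu0   <- 0.60        # hypothesized mean under H0
alpha <- 0.05        # significance level

# One-sample t statistic: (sample mean - hypothesized mean) / standard error.
t_stat <- (x_bar - mu0) / (s / sqrt(n))

# Left-tailed P-value, because the alternative is mu < 0.60.
p_value <- pt(t_stat, df = n - 1)

print(c(t = t_stat, p = p_value))

# Decision rule: reject H0 only when the P-value is below alpha.
p_value < alpha
```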


(d) Make a decision to reject or fail to reject \(H_0\).

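We fail to reject \(H_0\) because the P-value of 0.4481 is greater than \(\alpha = 0.05\).


(e) State the conclusion in terms of the original problem.

At the 0.05 significance level, there is not sufficient evidence to conclude that the mean employee satisfaction level in the Human Resources department is less than 0.60. The sample mean of 0.5988 is only slightly below 0.60, and a difference that small is consistent with ordinary sampling variation.


4. Based on our conclusion from the previous step, what type of error could we have just made (Type I or Type II)? State the practical implications of this error.

We could have made a Type II error, because we failed to reject \(H_0\) and a Type II error is failing to reject \(H_0\) when it is actually false. The practical implication is that the true mean satisfaction level in the Human Resources department would really be below 0.60, so a genuine morale problem would go undetected and unaddressed.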
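
The two error risks can be put on a numerical footing. The Type I risk is fixed by the significance level, while the Type II risk depends on how far the true mean is from 0.60. The sketch below uses `power.t.test` from base R's stats package with an assumed true mean of 0.58; that value is purely hypothetical and is not estimated from the data.

```r
# Assumed scenario (hypothetical): the true mean satisfaction is 0.58,
# that is, 0.02 below the hypothesized 0.60.
assumed_shift <- 0.02

res <- power.t.test(
  n           = 739,        # sample size of the hr subset
  delta       = assumed_shift,
  sd          = 0.247929,   # sample standard deviation, used as a plug-in
  sig.level   = 0.05,       # Type I error rate
  type        = "one.sample",
  alternative = "one.sided"
)

# Power is the chance of correctly rejecting H0 under the assumed shift;
# the Type II error rate is its complement.
type_2_rate <- 1 - res$power
print(c(power = res$power, type_2 = type_2_rate))
```

A smaller assumed shift would give lower power and therefore a higher Type II error rate.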

5. Would it be appropriate to compare this test to a confidence interval for the mean? Why or why not?

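Only if the interval matches the test. Our test is one-sided at \(\alpha = 0.05\), so the matching interval is the one-sided 95% confidence bound that t.test reports, \((-\infty, 0.6138295)\). That interval contains 0.60, which agrees with the P-value of 0.4481 being greater than 0.05. Comparing the test to the usual two-sided 95% confidence interval would not be appropriate, because a two-sided 95% interval corresponds to a two-sided test at \(\alpha = 0.05\).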
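
The one-sided bound printed by `t.test` in part 3(c) can be recomputed from the summary statistics, which shows how it relates to the left-tailed test. This is a sketch using the rounded values reported above; the variable names are illustrative.

```r
# Values reported earlier for the hr subset.
x_bar <- 0.5988092   # sample mean
s     <- 0.247929    # sample standard deviation
n     <- 739         # sample size
mu0   <- 0.60        # hypothesized mean under H0

# Upper 95% confidence bound for a left-tailed test:
# sample mean + t(0.95, n - 1) x standard error.
upper_bound <- x_bar + qt(0.95, df = n - 1) * s / sqrt(n)
print(upper_bound)

# The left-tailed test at alpha = 0.05 rejects H0 exactly when
# the hypothesized mean lies above this bound.
mu0 > upper_bound
```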