When \(\alpha\) is not given, use the p-value approach to make your conclusions. When it’s difficult to conclude, use \(\alpha\) = 0.05. For two-sample problems, use the F-test to decide which t-test to use.
- For this question, I use the one-sample t-test because it tests the mean of a single group against a hypothesized mean. Here, 2160 is the sample mean of the 20 selected light bulbs and 2000 is the hypothesized population mean. I am not using the z-test because it requires the population standard deviation, which is not given; only the sample standard deviation is available, so I chose the t-test.
- Let H0: \(\mu \le 2000\) (the average life of the light bulbs produced by the factory is at most 2000 hours) and Ha: \(\mu > 2000\) (supporting the manufacturer's claim that the average life of the light bulbs produced in the factory is at least 2000 hours). By convention the equality goes in the null hypothesis. This is a one-sided (upper-tail) t-test with \(\alpha\) = 0.05 and degrees of freedom n - 1 = 20 - 1 = 19. According to the information given, the sample mean is 2160 and the sample standard deviation is 142. We use the t-statistic to conduct the statistical inference.
T-stat approach: - The test statistic is \(t = (\bar{x} - \mu_0)/(s/\sqrt{n}) = (2160 - 2000)/(142/\sqrt{20}) \approx 5.04\). The critical value for \(\alpha\) = 0.05 at df = 19 is 1.729 according to the t-distribution table. Since 5.04 > 1.729, the test statistic falls in the rejection region. Therefore, we reject the null hypothesis in favor of the manufacturer's claim.
P-value approach: - Our t-stat is 5.04. With 19 degrees of freedom, the critical value for a significance level of 0.0001 is 4.590. Since 5.04 > 4.590, the p-value for 5.04 is even smaller than 0.0001, which confirms the rejection of the null hypothesis in favor of the manufacturer's claim.
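The two approaches above can be checked numerically. The following is a minimal sketch that works only from the summary statistics given in the question; the variable names are chosen here for illustration.

```r
# Summary statistics taken from the question
n     <- 20     # sample size
xbar  <- 2160   # sample mean (hours)
s     <- 142    # sample standard deviation (hours)
mu0   <- 2000   # hypothesized mean (hours)
alpha <- 0.05   # significance level

df      <- n - 1
t_stat  <- (xbar - mu0) / (s / sqrt(n))        # one-sample t statistic
t_crit  <- qt(alpha, df, lower.tail = FALSE)   # upper-tail critical value
p_value <- pt(t_stat, df, lower.tail = FALSE)  # upper-tail p-value

round(t_stat, 2)   # 5.04
round(t_crit, 3)   # 1.729
p_value < 1e-4     # TRUE
```

Both decision rules agree: the statistic exceeds the critical value, and the p-value is below \(\alpha\).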
The firm is interested in switching to the new process only if it can be demonstrated convincingly that the new process reduces the defect rate. Is there significant evidence of that? Use \(\alpha\) = 5%; assume that the collected data represent two random samples from Normal distributions. Use the method of testing that is appropriate for this situation.
# p: the tail probability (here, the significance level)
# df: the degrees of freedom
# lower.tail: if TRUE (the default), qt() returns the quantile with probability p to its left;
# if FALSE, it returns the quantile with probability p to its right.
qt(0.05, 32, lower.tail = FALSE)
## [1] 1.693889
# read the dataset
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ dplyr 1.0.6
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
homeSales <- read_csv("./data/HOME_SALES(1).csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## ID = col_double(),
## SALES_PRICE = col_double(),
## FINISHED_AREA = col_double(),
## BEDROOMS = col_double(),
## BATHROOMS = col_double(),
## GARAGE_SIZE = col_double(),
## YEAR_BUILT = col_double(),
## STYLE = col_double(),
## LOT_SIZE = col_double(),
## AIR_CONDITIONER = col_character(),
## POOL = col_character(),
## QUALITY = col_character(),
## HIGHWAY = col_character()
## )
# evaluate the F-test inside homeSales instead of attaching the data frame
with(homeSales, var.test(x = SALES_PRICE[AIR_CONDITIONER == "YES"], y = SALES_PRICE[AIR_CONDITIONER == "NO"]))
##
## F test to compare two variances
##
## data: SALES_PRICE[AIR_CONDITIONER == "YES"] and SALES_PRICE[AIR_CONDITIONER == "NO"]
## F = 3.749, num df = 433, denom df = 87, p-value = 1.017e-11
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 2.654162 5.108169
## sample estimates:
## ratio of variances
## 3.748998
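To make the F-test output easier to read, here is a small sketch of what `var.test()` computes: the F statistic is the ratio of the two sample variances, and the two-sided p-value comes from the F distribution with (n1 - 1, n2 - 1) degrees of freedom. The data below are simulated for illustration only and are not the home sales data.

```r
# Simulated data (an assumption for illustration, not the HOME_SALES file)
set.seed(1)
x <- rnorm(40, mean = 300, sd = 20)
y <- rnorm(25, mean = 190, sd = 10)

# F statistic by hand: ratio of sample variances
f_manual <- var(x) / var(y)

# Two-sided p-value from the F distribution
p_lower  <- pf(f_manual, df1 = length(x) - 1, df2 = length(y) - 1)
p_manual <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test
ft <- var.test(x, y)
all.equal(unname(ft$statistic), f_manual)
all.equal(ft$p.value, p_manual)
```

A small p-value, as in the air conditioner comparison above, indicates unequal variances, which is the case where the Welch t-test is the appropriate choice.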
# use the two-sided t.test function in R
toutA <- t.test(x = homeSales$SALES_PRICE[homeSales$AIR_CONDITIONER == "YES"],
y = homeSales$SALES_PRICE[homeSales$AIR_CONDITIONER == "NO"],
alternative = "two.sided")
toutA
##
## Welch Two Sample t-test
##
## data: homeSales$SALES_PRICE[homeSales$AIR_CONDITIONER == "YES"] and homeSales$SALES_PRICE[homeSales$AIR_CONDITIONER == "NO"]
## t = 10.304, df = 241.5, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 85.91273 126.52249
## sample estimates:
## mean of x mean of y
## 295.8006 189.5830
# use the one-sided t.test function in R
toutB <- t.test(x = homeSales$SALES_PRICE[homeSales$AIR_CONDITIONER == "YES"],
y = homeSales$SALES_PRICE[homeSales$AIR_CONDITIONER == "NO"],
alternative = "greater")
toutB
##
## Welch Two Sample t-test
##
## data: homeSales$SALES_PRICE[homeSales$AIR_CONDITIONER == "YES"] and homeSales$SALES_PRICE[homeSales$AIR_CONDITIONER == "NO"]
## t = 10.304, df = 241.5, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 89.19732 Inf
## sample estimates:
## mean of x mean of y
## 295.8006 189.5830
# use the one-sided t.test function in R
toutC <- t.test(x = homeSales$FINISHED_AREA[homeSales$AIR_CONDITIONER == "YES"],
y = homeSales$FINISHED_AREA[homeSales$AIR_CONDITIONER == "NO"],
alternative = "greater")
toutC
##
## Welch Two Sample t-test
##
## data: homeSales$FINISHED_AREA[homeSales$AIR_CONDITIONER == "YES"] and homeSales$FINISHED_AREA[homeSales$AIR_CONDITIONER == "NO"]
## t = 7.756, df = 160.14, p-value = 4.817e-13
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 399.9772 Inf
## sample estimates:
## mean of x mean of y
## 2346.339 1837.909
# evaluate the F-test inside homeSales; attaching the data frame a second time is unnecessary
with(homeSales, var.test(x = SALES_PRICE[HIGHWAY == "YES"], y = SALES_PRICE[HIGHWAY == "NO"]))
##
## F test to compare two variances
##
## data: SALES_PRICE[HIGHWAY == "YES"] and SALES_PRICE[HIGHWAY == "NO"]
## F = 0.37562, num df = 10, denom df = 510, p-value = 0.08588
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1811525 1.1622045
## sample estimates:
## ratio of variances
## 0.3756203
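The F-test here gives p = 0.08588 > 0.05, so equal variances are not rejected. Under the rule "use the F-test to decide which t-test to use", that result points to the pooled t-test (`var.equal = TRUE`), whereas `t.test()` runs the Welch test by default. The sketch below shows one way to encode that rule; `choose_t_test` is a hypothetical helper written for this illustration, and the data are simulated rather than taken from the home sales file.

```r
# Hypothetical helper: pick pooled vs. Welch t-test from the F-test p-value
choose_t_test <- function(x, y, alpha = 0.05, ...) {
  # Fail to reject equal variances -> pooled test; otherwise Welch
  equal_var <- var.test(x, y)$p.value >= alpha
  t.test(x, y, var.equal = equal_var, ...)
}

# Simulated groups (assumption for illustration only)
set.seed(2)
near_highway <- rnorm(15, mean = 230, sd = 40)
far_highway  <- rnorm(200, mean = 280, sd = 40)

res <- choose_t_test(near_highway, far_highway, alternative = "less")
res$method    # reports which test was run
res$p.value
```

With the real data, the equivalent call would pass the two `SALES_PRICE` subsets and the desired `alternative`.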
# use the two-sided t.test function in R
toutD <- t.test(x = homeSales$SALES_PRICE[homeSales$HIGHWAY == "YES"],
y = homeSales$SALES_PRICE[homeSales$HIGHWAY == "NO"],
alternative = "two.sided")
toutD
##
## Welch Two Sample t-test
##
## data: homeSales$SALES_PRICE[homeSales$HIGHWAY == "YES"] and homeSales$SALES_PRICE[homeSales$HIGHWAY == "NO"]
## t = -1.8552, df = 11.178, p-value = 0.09011
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -106.795497 9.000943
## sample estimates:
## mean of x mean of y
## 230.0273 278.9245
# use the one-sided t.test function in R
toutE <- t.test(x = homeSales$SALES_PRICE[homeSales$HIGHWAY == "YES"],
y = homeSales$SALES_PRICE[homeSales$HIGHWAY == "NO"],
alternative = "less")
toutE
##
## Welch Two Sample t-test
##
## data: homeSales$SALES_PRICE[homeSales$HIGHWAY == "YES"] and homeSales$SALES_PRICE[homeSales$HIGHWAY == "NO"]
## t = -1.8552, df = 11.178, p-value = 0.04506
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -1.632504
## sample estimates:
## mean of x mean of y
## 230.0273 278.9245
# use the one-sided t.test function in R
toutF <- t.test(x = homeSales$SALES_PRICE[homeSales$HIGHWAY == "YES"],
y = homeSales$SALES_PRICE[homeSales$HIGHWAY == "NO"],
alternative = "greater")
toutF
##
## Welch Two Sample t-test
##
## data: homeSales$SALES_PRICE[homeSales$HIGHWAY == "YES"] and homeSales$SALES_PRICE[homeSales$HIGHWAY == "NO"]
## t = -1.8552, df = 11.178, p-value = 0.9549
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -96.16205 Inf
## sample estimates:
## mean of x mean of y
## 230.0273 278.9245
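The three highway tests share the same t statistic and degrees of freedom; only the tail used for the p-value differs. As a rough check, the sketch below recomputes the three p-values from the rounded values printed in the output (t = -1.8552, df = 11.178), so small rounding differences from the reported p-values are expected.

```r
# Rounded values copied from the Welch test output
t_stat <- -1.8552
df     <- 11.178

p_less    <- pt(t_stat, df)                      # alternative = "less"
p_greater <- pt(t_stat, df, lower.tail = FALSE)  # alternative = "greater"
p_two     <- 2 * min(p_less, p_greater)          # alternative = "two.sided"

round(c(less = p_less, greater = p_greater, two.sided = p_two), 4)
```

The one-sided p-values sum to 1, and the two-sided p-value is twice the smaller one, which is why the "less" test is significant at \(\alpha\) = 0.05 while the two-sided test is not.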
I did a similar exploratory data analysis (EDA) project as an R Shiny app for the STAT-613 class; please refer to it for more details.