Statistical Inference

1-sample tests

Tests for 1 Mean

Exercise 1. Known variance, small sample.

Assume that you would like to test whether the proper amount of cola is inside Coca-Cola 500 ml bottles. For this purpose, you bought 10 bottles and measured the amount of cola inside each of them. You are given the information that the standard deviation of the production line is 10 ml. Would you say that the mean amount of cola in your sample is 500 ml at the alpha = 0.05 level of significance?

Please notice that in this case we know the population sigma - a very rare, almost purely theoretical situation. That is why we use the so-called Z-test here (a test for the mean with known variance):

## [1] -0.3794733
## [1] 0.3565676
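The chunk that produced the two numbers above (the standardized test statistic and a p-value) is not shown in this handout. A minimal sketch of how such a test with known sigma can be run in R, assuming the ten measurements are stored in a vector cola (a name we introduce here; it is not defined in the visible code):

# cola: vector with the ten measured amounts in ml (data not shown in this handout)
sigma <- 10                                  # known SD of the production line (ml)
mu0   <- 500                                 # hypothesized mean (ml)

z <- (mean(cola) - mu0) / (sigma / sqrt(length(cola)))  # standardized test statistic
z
2 * pnorm(-abs(z))                           # two-sided p-value from N(0, 1)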

Exercise 2. Unknown variance, small sample.

It is known that the average lifetime of Philips LEDs is 5000 hours. Your claim is that the lifetime of Philips LEDs is less than 5000 hours. For this purpose, you bought 21 Philips LEDs, switched them on, and recorded the time at which each of them stopped working. Please test this claim at the alpha = 0.05 level of significance.
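The chunks below again show only output. A sketch of the calls that could produce it, assuming the 21 observed lifetimes are stored in a vector leds (the same t.test() call is shown explicitly later in this section):

# leds: vector with the 21 observed lifetimes in hours (data not shown in this handout)
mean(leds)                # sample mean

shapiro.test(leds)        # check normality before relying on the t-test

results <- t.test(leds, mu = 5000, conf.level = 0.95, alternative = "less")
results
library(report)           # if not already loaded
report(results)           # verbal summary of the test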

## [1] 4965.238

## 
##  Shapiro-Wilk normality test
## 
## data:  leds
## W = 0.9772, p-value = 0.8803
## 
##  One Sample t-test
## 
## data:  leds
## t = -0.6114, df = 20, p-value = 0.2739
## alternative hypothesis: true mean is less than 5000
## 95 percent confidence interval:
##    -Inf 5063.3
## sample estimates:
## mean of x 
##  4965.238
## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The One Sample t-test testing the difference between leds (mean = 4965.24) and
## mu = 5000 suggests that the effect is negative, statistically not significant,
## and very small (difference = -34.76, 95% CI [-Inf, 5063.30], t(20) = -0.61, p =
## 0.274; Cohen's d = -0.13, 95% CI [-Inf, 0.23])

What is the final conclusion here?

Let’s try to perform the power analysis.

For 1-sample t-tests, use the following function:

pwr.t.test(n = , d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired"))

d is the effect size (Cohen's d): the difference between the observed mean and the hypothesized mean, divided by the standard deviation.

library(pwr)   # provides pwr.t.test(); skip if already loaded

std.deviation <- sd(leds)
d <- (mean(leds) - 5000) / std.deviation
p <- pwr.t.test(n = 21, d = d, sig.level = 0.05, type = "one.sample", alternative = "less")
plot(p)

OK, we can see the really poor power of this test. So what do we need to change to increase the power substantially (say, to 90%)?

p<-pwr.t.test(power=0.9, d=d, sig.level=0.05,type="one.sample",alternative="less")
plot(p)

If you would like an impressive graphical summary of statistical tests, you may use the 'ggstatsplot' package. It is a whole collection of tests and plots.

See more here: https://indrajeetpatil.github.io/ggstatsplot/

For 1-sample tests, only the test for 1 mean is available here:

library(ggstatsplot)   # if not already loaded

leds <- as.data.frame(leds) # must be a df object!

gghistostats(
  data       = leds,
  x          = leds,
  title      = "Life time of Philips LEDs",
  test.value = 5000,
  type       = "parametric",
  conf.level = 0.95
)

Our p-value was 0.2739; note that ggstatsplot does not allow alternative = "less"!

Nonparametric approach

In the case of a small sample, unknown variance, and a clearly skewed / non-normal distribution of the variable (QQ plots and the Shapiro test suggest rejecting normality), we should use the less powerful substitute for the t-test: the one-sample Wilcoxon signed-rank test.

# t-test
t.test(leds, mu=5000, conf.level = 0.95,alternative = "less")
## 
##  One Sample t-test
## 
## data:  leds
## t = -0.6114, df = 20, p-value = 0.2739
## alternative hypothesis: true mean is less than 5000
## 95 percent confidence interval:
##    -Inf 5063.3
## sample estimates:
## mean of x 
##  4965.238
# Wilcoxon test
wilcox.test(leds$leds,alternative="less",mu=5000,conf.level=0.95,exact=FALSE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  leds$leds
## V = 98.5, p-value = 0.2831
## alternative hypothesis: true location is less than 5000

As you can see, in our case both tests give essentially the same result (our distribution of "leds" was close to normal, which is why).

leds<-as.data.frame(leds) # must be a df object!

gghistostats(
  data       = leds,
  x          = leds,
  title      = "Life time of Philips LEDs",
  test.value = 5000,
  type = "nonparametric",
  conf.level=0.95
)

Our p-value is now 0.2831; again, ggstatsplot does not allow alternative = "less"!

Your turn

The dataset contains data from a pizza delivery service in London, delivering pizzas to three areas. Every record describes one order/delivery and its properties. A pizza is supposed to taste good if its temperature is high enough, say 45 degrees Celsius, so it is in the delivery service's interest to minimize the delivery time.

Please verify whether the mean delivery time for medium-quality pizzas is significantly lower than 25 minutes. Assume a 0.01 level of significance. Check whether the power of this test is satisfactory.

library(tigerstats)
data(d.pizza)
help(d.pizza)
signif.level <- 0.01
mu <- mean(d.pizza$delivery_min, na.rm = TRUE)
s  <- sd(d.pizza$delivery_min, na.rm = TRUE)
d  <- (mu - 25) / s   # Cohen's d: (observed mean - hypothesized mean) / SD

#H0 mu = 25
#Ha mu < 25

qqnorm(d.pizza$delivery_min)
qqline(d.pizza$delivery_min)

# check normality - Shapiro Test, what to do if not normal?
# t-test
results<-t.test(d.pizza$delivery_min, mu=25, conf.level = 0.99,alternative = "less")
results
## 
##  One Sample t-test
## 
## data:  d.pizza$delivery_min
## t = 2.0933, df = 1208, p-value = 0.9817
## alternative hypothesis: true mean is less than 25
## 99 percent confidence interval:
##      -Inf 26.37918
## sample estimates:
## mean of x 
##  25.65277
report(results)
## Effect sizes were labelled following Cohen's (1988) recommendations.
## 
## The One Sample t-test testing the difference between d.pizza$delivery_min (mean
## = 25.65) and mu = 25 suggests that the effect is positive, statistically not
## significant, and very small (difference = 0.65, 99% CI [-Inf, 26.38], t(1208) =
## 2.09, p = 0.982; Cohen's d = 0.06, 99% CI [-Inf, 0.13])
p<-pwr.t.test(n = length(d.pizza$delivery_min), d=d, sig.level=0.01,type="one.sample",alternative="less")
plot(p)
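Note that the chunk above tests all delivery times, while the task asks about medium-quality pizzas only. A sketch of that restricted version (med and d_med are names we introduce; its output is not shown above):

# restrict to medium-quality pizzas, dropping missing delivery times
med <- subset(d.pizza, quality == "medium" & !is.na(delivery_min))$delivery_min

t.test(med, mu = 25, conf.level = 0.99, alternative = "less")

# power of this test, following the same approach as before
d_med <- (mean(med) - 25) / sd(med)
pwr.t.test(n = length(med), d = d_med, sig.level = 0.01,
           type = "one.sample", alternative = "less")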

Tests for 1 Variance and SD

1-sample variance and SD tests are not so popular, and there is no built-in base-R function for them. We will use the TeachingDemos package, whose sigma.test() function tests a single SD/variance.

library(TeachingDemos)   # provides sigma.test(); skip if already loaded
?sigma.test
students <- c(12, 15, 4, 9, 6, 7, 11, 10, 13, 10)
sd(students)
## [1] 3.335
sigma.test(students,sigma=3,alternative="less",conf.level=0.99)
## 
##  One sample Chi-squared test for variance
## 
## data:  students
## X-squared = 11.122, df = 9, p-value = 0.7326
## alternative hypothesis: true variance is less than 9
## 99 percent confidence interval:
##   0.00000 47.94289
## sample estimates:
## var of students 
##        11.12222
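As a sanity check, the chi-squared statistic reported above is just (n - 1) * s^2 / sigma0^2, and the "less" p-value is the lower tail of the chi-squared distribution with n - 1 degrees of freedom:

n    <- length(students)
chi2 <- (n - 1) * var(students) / 3^2   # (n - 1) * s^2 / sigma0^2
chi2                                    # matches X-squared = 11.122
pchisq(chi2, df = n - 1)                # lower-tail p-value, matches 0.7326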

Tests for 1 Proportion

Exercise 3.

In the case of large samples (np > 5 and n(1 - p) > 5) we may use a Z-test for 1 proportion, but there is no built-in function for that (we may write our own…).
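A minimal sketch of such a hand-rolled Z-test (prop.z.test is our own illustrative name, not an existing function):

# one-sample Z-test for a proportion: x successes out of n trials, null value p0
prop.z.test <- function(x, n, p0, alternative = "greater") {
  p_hat <- x / n
  z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
  p_value <- switch(alternative,
                    greater   = 1 - pnorm(z),
                    less      = pnorm(z),
                    two.sided = 2 * pnorm(-abs(z)))
  list(statistic = z, p.value = p_value)
}

For the exercise below, prop.z.test(1150, 1500, 0.75) gives z of about 1.49 and a one-sided p-value of about 0.068, close to the continuity-corrected result from prop.test shown further down.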

An auditor for the Online Service wants to examine its special two-hour priority order delivery to determine the proportion of the orders that actually arrive within the promised two-hour period. A randomly selected sample of 1500 such orders is found to contain 1150 that were delivered on time.

Does the sample data provide evidence to conclude that the percentage of on-time orders is more than 75%? Test at 5% level of significance.

prop.test(1150,1500,0.75,alternative='greater',conf.level=0.95)
## 
##  1-sample proportions test with continuity correction
## 
## data:  1150 out of 1500
## X-squared = 2.1342, df = 1, p-value = 0.07202
## alternative hypothesis: true p is greater than 0.75
## 95 percent confidence interval:
##  0.7478919 1.0000000
## sample estimates:
##         p 
## 0.7666667

If the sample is too small (np < 5), please use the exact binomial test instead:

binom.test(1150, 1500, p = 0.75, alternative = "greater", conf.level = 0.95)
## 
##  Exact binomial test
## 
## data:  1150 out of 1500
## number of successes = 1150, number of trials = 1500, p-value = 0.07122
## alternative hypothesis: true probability of success is greater than 0.75
## 95 percent confidence interval:
##  0.7479911 1.0000000
## sample estimates:
## probability of success 
##              0.7666667

For 1-proportion tests there is a power function available:

pwr.p.test(h = NULL, n = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less", "greater"))

These calculations use the arcsine transformation of the proportion (see Cohen, 1988).

Exactly one of the parameters 'h', 'n', 'power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has a non-NULL default, so NULL must be explicitly passed if you want to compute it.
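For reference, Cohen's h is defined as h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2)), which is easy to check by hand:

2 * asin(sqrt(1150/1500)) - 2 * asin(sqrt(0.75))   # about 0.0389, matching ES.h() below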

h<-ES.h(1150/1500,0.75) #Compute effect size h for two proportions
h
## [1] 0.03893747
p<-pwr.p.test(h=h,n=1500,sig.level=0.05,alternative="greater")
plot(p)

How many orders should we inspect to increase the power to 90%?

p<-pwr.p.test(h=h,power=0.9,sig.level=0.05,alternative="greater")
plot(p)

Your turn

We keep working with the same pizza delivery dataset as in the previous exercise.

Please test whether the proportion of high-quality pizzas (when delivered) is significantly different from 0.5. Assume a significance level of 0.05.

data(d.pizza)
help(d.pizza)
d.pizza %>%
 group_by(quality)%>%
  summarize(counts = n()) 
## # A tibble: 4 × 2
##   quality counts
##   <ord>    <int>
## 1 low        156
## 2 medium     356
## 3 high       496
## 4 <NA>       201
prop.test(496,512,0.5,alternative='greater',conf.level=0.95)
## 
##  1-sample proportions test with continuity correction
## 
## data:  496 out of 512
## X-squared = 448.13, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.9522768 1.0000000
## sample estimates:
##       p 
## 0.96875
h <- ES.h(496/512, 0.5)
p<-pwr.p.test(h=h,n=1008,sig.level=0.05,alternative="greater")
plot(p)
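Note that the chunk above tests 496 successes out of 512 with a one-sided alternative, while the task asks whether the proportion among pizzas with a recorded quality (156 + 356 + 496 = 1008) differs from 0.5 in either direction. A sketch of that version (x, n2 and h2 are names we introduce):

x  <- 496                       # high-quality pizzas
n2 <- 156 + 356 + 496           # pizzas with a recorded quality (= 1008)

prop.test(x, n2, p = 0.5, alternative = "two.sided", conf.level = 0.95)

h2 <- ES.h(x / n2, 0.5)         # effect size for the power analysis
pwr.p.test(h = h2, n = n2, sig.level = 0.05, alternative = "two.sided")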

Please write down your conclusions here. Perform the power analysis for this test and provide the sample size needed for a power of 90% if it is not achieved here.
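If the power turns out to be insufficient, the required sample size can be obtained in the same way as in Exercise 3 (using the effect size h2 from the sketch above, or your own):

pwr.p.test(h = h2, power = 0.9, sig.level = 0.05, alternative = "two.sided")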