An R Learner’s Diary: Estimation and Hypothesis Testing

Confidence Intervals

Let’s generate some fake data and create confidence interval estimates for the population mean.

set.seed(12345)
x <- rnorm(100)
(avg_x <- mean(x))

## [1] 0.2451972

(n_x  <- length(x))

## [1] 100

(sd_x <- sd(x))

## [1] 1.114731

(se_x <- sd_x/sqrt(n_x))

## [1] 0.1114731

alpha5 <- 0.05        # Significance level
(t5_x <- qt(1-(alpha5/2), n_x - 1))   # t value for 95 percent confidence interval

## [1] 1.984217

(CI5_x <- c( avg_x - t5_x*se_x, avg_x + t5_x*se_x ))   # 95 percent Confidence interval

## [1] 0.02401042 0.46638398

(CI5b_x <- avg_x + t5_x * c(-se_x,+se_x))    # Same as above; more concise

## [1] 0.02401042 0.46638398

alpha1 <- 0.01        # Significance level
(t1_x <- qt(1-(alpha1/2), n_x - 1))   # t value for 99 percent confidence interval

## [1] 2.626405

(CI1_x <- c( avg_x - t1_x*se_x, avg_x + t1_x*se_x ))   # 99 percent Confidence interval

## [1] -0.04757632  0.53797071
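The steps above repeat for every confidence level, so they can be wrapped in a small function. This is only a sketch: the name `t_conf_int` and its `conf_level` argument are my own inventions for illustration, not something R provides.

```r
# Hypothetical helper (not part of base R): t-based confidence interval
# for the mean of a numeric vector at a chosen confidence level.
t_conf_int <- function(x, conf_level = 0.95) {
  n      <- length(x)
  se     <- sd(x) / sqrt(n)                  # standard error of the mean
  alpha  <- 1 - conf_level                   # significance level
  t_crit <- qt(1 - alpha / 2, df = n - 1)    # two-sided critical value
  mean(x) + t_crit * c(-se, se)              # lower and upper limits
}

set.seed(12345)
x <- rnorm(100)
(ci95 <- t_conf_int(x, 0.95))
(ci99 <- t_conf_int(x, 0.99))
```

The 99 percent interval is wider than the 95 percent one because its critical t value is larger.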

One could use the same procedure to build a confidence interval for \(\mu \equiv \mu_1-\mu_2\): take two equal-length samples, \(x_1\) from population 1 and \(x_2\) from population 2, define \(x\equiv x_1-x_2\) element by element, and apply the steps above to \(x\).
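Here is a minimal sketch of that idea. The two samples are made up for illustration (the means 1 and 0.5 are arbitrary assumptions), and the variable names are mine.

```r
set.seed(12345)
# Made-up data: two samples of equal length
x1 <- rnorm(100, mean = 1)
x2 <- rnorm(100, mean = 0.5)

d    <- x1 - x2                       # element-wise differences
n_d  <- length(d)
se_d <- sd(d) / sqrt(n_d)             # standard error of the mean difference
t_d  <- qt(1 - 0.05 / 2, n_d - 1)     # t value for 95 percent confidence
(CI_d <- mean(d) + t_d * c(-se_d, se_d))

# Built-in check: a one-sample t-test on the differences
ci_builtin <- as.numeric(t.test(d)$conf.int)
all.equal(CI_d, ci_builtin)
```

Calling `t.test(x1, x2, paired = TRUE)` should report the same interval, since a paired test is a one-sample test on the differences.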

Hypothesis Testing

We want to test the null hypothesis \(H_0: \mu=\mu_0\) against three alternative hypotheses:

\(H_1\) two-tailed: \(\mu\neq\mu_0\)
\(H_1\) one-tailed, greater than: \(\mu>\mu_0\)
\(H_1\) one-tailed, less than: \(\mu<\mu_0\)

mu0 <- 0     # Hypothesized value of the population mean
(tstat_x <- (avg_x-mu0)/se_x)    # t-statistic obtained from the data

## [1] 2.199609

# degrees of freedom = n-1:
(df_x <- n_x - 1)

## [1] 99

Two-tailed test

Keep in mind that the critical value of the t-statistic would be the same for:

a two-tailed test with ten percent significance, and
a one-tailed test with five percent significance
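A quick check of this equivalence, using 99 degrees of freedom as in the running example. Both cases leave five percent of the probability in the upper tail, so they ask `qt()` for the same quantile.

```r
df <- 99                               # degrees of freedom, n - 1 in the example
crit_two_10 <- qt(1 - 0.10 / 2, df)    # two-tailed test, 10 percent significance
crit_one_05 <- qt(1 - 0.05, df)        # one-tailed test, 5 percent significance
all.equal(crit_two_10, crit_one_05)
```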

alpha <- 0.05
(critical_two <- qt(1 - (alpha/2), df_x))

## [1] 1.984217

abs(tstat_x) > critical_two  # Reject Null Hypothesis if TRUE

## [1] TRUE

Here’s the short-cut method:

t.test(x, mu = mu0, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  x
## t = 2.1996, df = 99, p-value = 0.03016
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.02401042 0.46638398
## sample estimates:
## mean of x 
## 0.2451972

Note that t.test() prints the p-value, which can be compared directly with the significance level: the null hypothesis is rejected if the p-value is smaller. Here 0.03016 < 0.05, so the null hypothesis is rejected at the five percent level, in agreement with the critical-value comparison above.

Here’s a way to calculate the p-value:

(p <- 2 * (1 - pt(abs(tstat_x), n_x - 1)))

## [1] 0.03016145
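The two decision rules, comparing the absolute t-statistic with the critical value and comparing the p-value with the significance level, should always agree. The sketch below checks this on the same simulated data. It uses `lower.tail = FALSE`, which is just another way of writing `1 - pt(...)`; the variable names are mine.

```r
set.seed(12345)
x     <- rnorm(100)
mu0   <- 0
alpha <- 0.05
n     <- length(x)

tstat <- (mean(x) - mu0) / (sd(x) / sqrt(n))             # t-statistic
p_two <- 2 * pt(abs(tstat), n - 1, lower.tail = FALSE)   # two-tailed p-value

reject_by_p    <- p_two < alpha                          # p-value rule
reject_by_crit <- abs(tstat) > qt(1 - alpha / 2, n - 1)  # critical-value rule
c(reject_by_p, reject_by_crit)
```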

Also, note the 95 percent confidence interval generated by the t.test() command. It is the same as the 95 percent confidence interval that was generated laboriously earlier on this page.

The output produced by the t.test() command can be extracted component by component as follows:

t2tail <- t.test(x, mu = mu0, conf.level = 0.95)
names(t2tail)

## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
## [6] "null.value"  "alternative" "method"      "data.name"

t2tail$statistic

##        t 
## 2.199609

t2tail$p.value

## [1] 0.03016145

t2tail$conf.int

## [1] 0.02401042 0.46638398
## attr(,"conf.level")
## [1] 0.95

t2tail$null.value

## mean 
##    0

This could be useful in writing reports.
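For example, the components can be pasted into a one-line summary with `sprintf()`. This is only a sketch; the layout of the sentence is my own choice, not a standard.

```r
set.seed(12345)
x <- rnorm(100)
t2tail <- t.test(x, mu = 0, conf.level = 0.95)

# Build a one-line summary from the pieces of the test result
report <- sprintf(
  "t(%d) = %.2f, p = %.3f, 95%% CI [%.3f, %.3f]",
  as.integer(t2tail$parameter),   # degrees of freedom
  t2tail$statistic,               # t-statistic
  t2tail$p.value,                 # p-value
  t2tail$conf.int[1],             # lower confidence limit
  t2tail$conf.int[2]              # upper confidence limit
)
cat(report, "\n")
```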

One-tailed test; greater than

(critical_one <- qt(1 - alpha, df_x))

## [1] 1.660391

tstat_x > critical_one  # Reject Null Hypothesis if TRUE

## [1] TRUE

Here’s the short-cut method:

t.test(x, mu = mu0, alternative = "greater", conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  x
## t = 2.1996, df = 99, p-value = 0.01508
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  0.06010828        Inf
## sample estimates:
## mean of x 
## 0.2451972

As before, t.test() prints the p-value, and the null hypothesis is rejected if it is less than the significance level. Here 0.01508 < 0.05, so the null hypothesis is rejected in favor of \(\mu>\mu_0\).

Here’s a way to calculate the p-value:

(p <- 1 - pt(tstat_x, n_x - 1))   # use the signed statistic; abs() would give the wrong answer for a negative t

## [1] 0.01508072
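The one-sided confidence interval that t.test() reports for the "greater" alternative can also be reproduced by hand. A minimal sketch, assuming the same simulated data: the whole five percent goes into one tail, so the critical value is `qt(1 - alpha, df)` rather than `qt(1 - alpha/2, df)`.

```r
set.seed(12345)
x     <- rnorm(100)
n     <- length(x)
se    <- sd(x) / sqrt(n)     # standard error of the mean
alpha <- 0.05

# Lower limit of the one-sided interval; the upper limit is Inf
lower_bound <- mean(x) - qt(1 - alpha, n - 1) * se
c(lower_bound, Inf)
```

For the "less" alternative the same amount is added to the mean instead, and the lower limit becomes -Inf.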

One-tailed test; less than

tstat_x < -critical_one  # Reject Null Hypothesis if TRUE

## [1] FALSE

Here’s the short-cut method:

t.test(x, mu = mu0, alternative = "less", conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  x
## t = 2.1996, df = 99, p-value = 0.9849
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
##       -Inf 0.4302861
## sample estimates:
## mean of x 
## 0.2451972

Again, t.test() prints the p-value. Here 0.9849 is far above 0.05, so the null hypothesis is not rejected in favor of \(\mu<\mu_0\).

Here’s a way to calculate the p-value:

(p <- pt(tstat_x, n_x - 1))   # lower-tail probability of the signed statistic

## [1] 0.9849193
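The three p-value calculations can be gathered into one function. This is a sketch, not something built into R: the name `t_p_value` is hypothetical, and its `alternative` argument simply mimics the one in t.test(). The one-tailed cases use the signed t-statistic, while the two-tailed case uses its absolute value.

```r
# Hypothetical helper: p-value of a one-sample t-test for each alternative
t_p_value <- function(tstat, df,
                      alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * pt(abs(tstat), df, lower.tail = FALSE),  # both tails
    greater   = pt(tstat, df, lower.tail = FALSE),           # upper tail
    less      = pt(tstat, df)                                # lower tail
  )
}

set.seed(12345)
x     <- rnorm(100)
n     <- length(x)
tstat <- (mean(x) - 0) / (sd(x) / sqrt(n))

# p-values for all three alternatives, named by alternative
(p_all <- sapply(c("two.sided", "greater", "less"),
                 function(a) t_p_value(tstat, n - 1, a)))
```

The "greater" and "less" p-values add up to one, which is a handy sanity check.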

Udayan Roy

February 18, 2019
