Student's t-test

Notes taken while reading the Wikipedia article on Student's t-test:

library(ggplot2)
library(reshape2)
set.seed(481)

Uses

One-sample location test

coin <- sample(c(0, 1), size = 1)  # flip a coin to decide whether the population mean is 0 or 1
sample <- data.frame(value = rnorm(100, mean = coin, sd = 1))
ggplot(data = sample, aes(x = value)) + geom_histogram(binwidth = 0.5)


We now have a sample drawn from a population whose mean is either \( 0 \) or \( 1 \). We don't know which (though the answer is stored in the variable coin).

A reasonable guess for the mean of the population the sample comes from is the sample mean, which here is -0.0364.

The null hypothesis of the test is that the population mean is \( 0 \).
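
A minimal sketch of what the one-sample test computes under this null hypothesis. It assumes a stand-in vector x (a hypothetical name, drawn here with mean \( 0 \) instead of reusing the sample above). The statistic is \( t = \bar{x} / (s / \sqrt{n}) \) with \( n - 1 \) degrees of freedom, and base R's t.test should agree with the hand computation:

```r
set.seed(481)  # assumption: same seed as in the notes; any seed works

# Hypothetical stand-in for the sample: 100 draws with true mean 0
x <- rnorm(100, mean = 0, sd = 1)
n <- length(x)

# t statistic by hand: (sample mean - hypothesised mean) / standard error
t.manual <- (mean(x) - 0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p.manual <- 2 * pt(-abs(t.manual), df = n - 1)

# The same test via base R
res <- t.test(x, mu = 0)

print(c(t.manual = t.manual, t.builtin = unname(res$statistic)))
print(c(p.manual = p.manual, p.builtin = res$p.value))
```

A small p-value would count as evidence against a population mean of \( 0 \). A large one only means the sample is compatible with it.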

The following histograms show how strongly the reliability of the sample mean as an estimator of the population mean depends on the sample size.

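# Draw sample.no samples of size sample.size from N(0, 1) and plot a
# histogram of their means on a fixed x-axis, so the plots are comparable.
sample.draw <- function(sample.no = 1000, sample.size = 100) {
    rep.exper <- replicate(n = sample.no, expr = mean(rnorm(sample.size, mean = 0, sd = 1)))
    # xlim() drops means outside [-0.1, 0.1]; ggplot2 warns when this happens.
    ggplot(data = data.frame(x = rep.exper), aes(x = x)) +
        geom_histogram(binwidth = 0.005) +
        xlim(-0.1, 0.1)
}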
sample.draw(sample.size = 100)


sample.draw(sample.size = 10000)


sample.draw(sample.size = 1e+05)

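
The narrowing of the histograms is what the standard error \( \sigma / \sqrt{n} \) of the sample mean predicts. A rough numerical check, assuming only two sample sizes to keep it quick (sizes, emp.sd and theory are names made up for this sketch):

```r
set.seed(481)  # assumption: same seed as in the notes; any seed works

sizes <- c(100, 10000)

# Empirical standard deviation of 1000 simulated sample means per size
emp.sd <- sapply(sizes, function(n) {
    sd(replicate(n = 1000, expr = mean(rnorm(n, mean = 0, sd = 1))))
})

# Theoretical standard error sigma / sqrt(n), with sigma = 1
theory <- 1 / sqrt(sizes)

print(data.frame(sample.size = sizes, empirical = emp.sd, theoretical = theory))
```

The empirical and theoretical columns should be close. Multiplying the sample size by 100 shrinks the spread of the sample mean by a factor of about 10.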

Two-sample location test

coin <- sample(c(0, 1), size = 1)  # flip a coin to decide whether the two means are equal
samples <- data.frame(sample = as.factor(c(rep(1, 100), rep(2, 100))), value = c(rnorm(100, 
    mean = 0, sd = 1), rnorm(100, mean = coin, sd = 1)))
ggplot(samples, aes(x = value, fill = sample)) + geom_histogram(binwidth = 0.2, 
    alpha = 0.7, position = "identity")


The null hypothesis is that the two population means are equal.
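
A sketch of how this hypothesis could be tested, assuming two hypothetical vectors a and b whose means differ by 1 (as if the coin had come up 1). Note that t.test defaults to Welch's unequal-variance version, so the classic pooled-variance Student's test needs var.equal = TRUE:

```r
set.seed(481)  # assumption: same seed as in the notes; any seed works

# Hypothetical stand-ins for the two samples; the true means differ by 1
a <- rnorm(100, mean = 0, sd = 1)
b <- rnorm(100, mean = 1, sd = 1)

# Welch's test (the default) and Student's pooled-variance test
welch <- t.test(a, b)
student <- t.test(a, b, var.equal = TRUE)

# Student's statistic by hand, using the pooled variance estimate
n1 <- length(a)
n2 <- length(b)
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)
t.pooled <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

print(c(t.pooled = t.pooled, t.student = unname(student$statistic),
        t.welch = unname(welch$statistic)))
print(c(p.student = student$p.value, p.welch = welch$p.value))
```

With equal group sizes and similar variances the two versions give nearly the same result. They diverge when the variances or the sample sizes differ markedly.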

Paired difference test

coin <- sample(c(0, 1), size = 1)  # flip a coin to decide whether the mean difference is zero
first <- runif(100, min = 0, max = 20)
second <- first + rnorm(100, mean = coin, sd = 1)
samples <- data.frame(first, second)
ggplot(melt(samples), aes(x = value, fill = variable)) + geom_histogram(binwidth = 1, 
    alpha = 0.7, position = "identity")

## Using as id variables

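
The two histograms overlap almost completely, because the spread between pairs is far larger than the shift within each pair. A sketch of why pairing matters here, assuming a shift of 1 (as if the coin had come up 1). The paired test is the same as a one-sample test on the differences:

```r
set.seed(481)  # assumption: same seed as in the notes; any seed works

# Same construction as in the notes, with the shift fixed at 1 (assumption)
first <- runif(100, min = 0, max = 20)
second <- first + rnorm(100, mean = 1, sd = 1)

# Paired test, and the equivalent one-sample test on the differences
paired <- t.test(second, first, paired = TRUE)
diffs <- t.test(second - first, mu = 0)

# Ignoring the pairing: the between-pair spread hides the shift
unpaired <- t.test(second, first)

print(c(t.paired = unname(paired$statistic), t.diffs = unname(diffs$statistic),
        t.unpaired = unname(unpaired$statistic)))
print(c(p.paired = paired$p.value, p.unpaired = unpaired$p.value))
```

The paired statistic should be much larger than the unpaired one, since taking differences removes the variation that first contributes to both columns.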

Slope of regression line

coin <- sample(c(0, 1), size = 1)  # flip a coin to decide whether the true slope is 0 or 1
x <- seq(from = 0, to = 4, by = 0.2)
y <- coin * x + rnorm(length(x), mean = 1, sd = 2)
sample <- data.frame(x, y)
line <- lm(y ~ x, sample)
ggplot(sample, aes(x = x, y = y)) + geom_point() + geom_smooth(method = "lm", 
    se = FALSE)

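
Here the null hypothesis would be that the slope of the regression line is \( 0 \). A sketch of the corresponding t-test, assuming a true slope of 1 (as if the coin had come up 1) and a hypothetical model object fit. The statistic is the estimated slope divided by its standard error, with \( n - 2 \) degrees of freedom, and it should match what summary reports for lm:

```r
set.seed(481)  # assumption: same seed as in the notes; any seed works

# Same construction as in the notes, with the true slope fixed at 1 (assumption)
x <- seq(from = 0, to = 4, by = 0.2)
y <- 1 * x + rnorm(length(x), mean = 1, sd = 2)

fit <- lm(y ~ x)
coefs <- summary(fit)$coefficients

# t statistic for the slope: estimate / standard error
t.slope <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]

# Two-sided p-value with n - 2 residual degrees of freedom
p.slope <- 2 * pt(-abs(t.slope), df = fit$df.residual)

print(coefs)
print(c(t.slope = t.slope, p.slope = p.slope))
```

With only 21 points and a noise standard deviation of 2, the test may well fail to reject a zero slope even though the true slope is 1.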