Notes while reading the Wikipedia article on Student's t-test:
library(ggplot2)
library(reshape2)
set.seed(481)
coin <- sample(c(0, 1), size = 1) # flip a coin: the population mean is 0 or 1
# named one.sample rather than sample so the base function sample() is not shadowed
one.sample <- data.frame(value = rnorm(100, mean = coin, sd = 1))
ggplot(data = one.sample, aes(x = value)) + geom_histogram(binwidth = 0.5)
Now we have a sample of 100 values from a normal population with standard deviation \( 1 \) and mean either \( 0 \) or \( 1 \). We treat the mean as unknown, although the true value is stored in the variable coin.
A reasonable estimate of the population mean is the sample mean, here \( \bar{x} = -0.0364 \).
The null hypothesis of the test is that the population mean is \( 0 \), i.e. \( H_0 : \mu = 0 \).
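As a minimal sketch of what the test computes: the one-sample t-statistic is \( t = (\bar{x} - \mu_0) / (s / \sqrt{n}) \), where \( s \) is the sample standard deviation, and it is compared against a t-distribution with \( n - 1 \) degrees of freedom. The code regenerates an illustrative sample (the seed and the true mean of \( 0 \) are assumptions, so this is not the sample above) and checks the hand computation against R's t.test().

```r
set.seed(1)                                    # illustrative seed; a fresh sample
x <- rnorm(100, mean = 0, sd = 1)
n <- length(x)
mu0 <- 0                                       # population mean under H0
t.stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # one-sample t-statistic
p.value <- 2 * pt(-abs(t.stat), df = n - 1)    # two-sided p-value
res <- t.test(x, mu = mu0)                     # R's built-in one-sample t-test
c(by.hand = t.stat, t.test = unname(res$statistic))
```

A large \( |t| \) (small p-value) is evidence against \( H_0 \); with the coin set up above, a p-value below the usual \( 0.05 \) would suggest the coin came up \( 1 \).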
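The following histograms show how strongly the reliability of the sample mean as an estimator of the population mean depends on the sample size. Each histogram shows the means of 1000 samples of the given size drawn from a population with mean \( 0 \) and standard deviation \( 1 \); the larger the samples, the more tightly their means cluster around \( 0 \).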
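# Draw sample.no samples of size sample.size from N(0, 1) and plot a histogram
# of their means on a fixed x-range so the plots can be compared directly.
sample.draw <- function(sample.no = 1000, sample.size = 100) {
  rep.exper <- replicate(n = sample.no,
                         expr = mean(rnorm(sample.size, mean = 0, sd = 1)))
  ggplot(data = data.frame(x = rep.exper), aes(x = x)) +
    geom_histogram(binwidth = 0.005) +
    coord_cartesian(xlim = c(-0.1, 0.1)) +  # zoom without dropping means outside the window
    ggtitle(paste("sample size:", sample.size))
}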
sample.draw(sample.size = 100)
sample.draw(sample.size = 10000)
sample.draw(sample.size = 1e+05)
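The narrowing follows from the standard error of the mean, \( \sigma / \sqrt{n} \): with \( \sigma = 1 \) it is \( 0.1 \) for \( n = 100 \) and \( 0.01 \) for \( n = 10000 \). This is also why the t-statistic divides by the estimate \( s / \sqrt{n} \). A rough check under the same simulation setup (the seed and the 1000 replications are illustrative choices) compares the spread of simulated means with the formula.

```r
set.seed(2)                    # illustrative seed
sizes <- c(100, 10000)
# standard deviation of 1000 simulated sample means, for each sample size
se.empirical <- sapply(sizes, function(n) sd(replicate(1000, mean(rnorm(n)))))
se.theory <- 1 / sqrt(sizes)   # sigma / sqrt(n) with sigma = 1
round(cbind(n = sizes, empirical = se.empirical, theory = se.theory), 4)
```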
coin <- sample(c(0, 1), size = 1) # flip a coin to determine whether the means are equal
# two independent samples of size 100; the second has mean 0 or 1
samples <- data.frame(sample = as.factor(c(rep(1, 100), rep(2, 100))),
                      value = c(rnorm(100, mean = 0, sd = 1),
                                rnorm(100, mean = coin, sd = 1)))
ggplot(samples, aes(x = value, fill = sample)) +
  geom_histogram(binwidth = 0.2, alpha = 0.7, position = "identity")
For these two independent samples, the null hypothesis is that the means of the two populations are equal.
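Student's two-sample test assumes equal population variances and pools them: \( s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \) and \( t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}} \), with \( n_1 + n_2 - 2 \) degrees of freedom. The sketch below uses freshly simulated data (the seed and the true means are illustrative) and checks the formula against t.test(..., var.equal = TRUE). Note that t.test() defaults to Welch's variant, which does not pool the variances.

```r
set.seed(3)                         # illustrative seed
a <- rnorm(100, mean = 0, sd = 1)
b <- rnorm(100, mean = 1, sd = 1)   # illustrative: true means differ by 1
n1 <- length(a); n2 <- length(b)
dof <- n1 + n2 - 2
# pooled variance, assuming equal population variances (Student's version)
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / dof
t.stat <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p.value <- 2 * pt(-abs(t.stat), df = dof)
res <- t.test(a, b, var.equal = TRUE)
c(by.hand = t.stat, t.test = unname(res$statistic))
```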
coin <- sample(c(0, 1), size = 1) # flip a coin to determine whether the means are equal
first <- runif(100, min = 0, max = 20)
# paired design: each second value is the matching first value plus noise
second <- first + rnorm(100, mean = coin, sd = 1)
samples <- data.frame(first, second)
ggplot(melt(samples, measure.vars = c("first", "second")),
       aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 1, alpha = 0.7, position = "identity")
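Because each value of second is paired with a value of first, the natural test here is the paired t-test: it is the one-sample test applied to the differences \( d_i = \text{second}_i - \text{first}_i \), with \( H_0 : \mu_d = 0 \). The sketch below (illustrative seed, with the shift fixed at \( 1 \)) checks this and contrasts it with an unpaired test, which has to contend with the large spread of the runif() values and is therefore far less sensitive.

```r
set.seed(4)                                       # illustrative seed
first <- runif(100, min = 0, max = 20)
second <- first + rnorm(100, mean = 1, sd = 1)    # illustrative: shift of 1
d <- second - first                               # within-pair differences
t.stat <- mean(d) / (sd(d) / sqrt(length(d)))     # one-sample t on the differences
paired <- t.test(second, first, paired = TRUE)
unpaired <- t.test(second, first, var.equal = TRUE)
c(paired = paired$p.value, unpaired = unpaired$p.value)
```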
coin <- sample(c(0, 1), size = 1) # flip a coin to determine whether the slope is zero
x <- seq(from = 0, to = 4, by = 0.2)
y <- coin * x + rnorm(length(x), mean = 1, sd = 2)
reg.data <- data.frame(x, y)
fit <- lm(y ~ x, data = reg.data)  # least-squares line; the t-test concerns its slope
ggplot(reg.data, aes(x = x, y = y)) + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
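Here the null hypothesis is that the slope of the regression line is \( 0 \). The test statistic is the estimated slope divided by its standard error, \( t = \hat{\beta}_1 / \left( s / \sqrt{\sum_i (x_i - \bar{x})^2} \right) \), where \( s \) is the residual standard error, with \( n - 2 \) degrees of freedom. The sketch below (illustrative seed and a true slope of \( 1 \)) computes it by hand and compares it with the t value that summary() reports for an lm() fit.

```r
set.seed(5)                                        # illustrative seed
x <- seq(from = 0, to = 4, by = 0.2)
y <- 1 * x + rnorm(length(x), mean = 1, sd = 2)    # illustrative: true slope 1
fit <- lm(y ~ x)
n <- length(x)
s <- sqrt(sum(residuals(fit)^2) / (n - 2))         # residual standard error
se.slope <- s / sqrt(sum((x - mean(x))^2))         # standard error of the slope
t.slope <- coef(fit)[["x"]] / se.slope
p.slope <- 2 * pt(-abs(t.slope), df = n - 2)       # two-sided p-value
coefs <- summary(fit)$coefficients
coefs["x", ]
```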