| iris |
|---|
| This data set gives the measurements in centimeters of the variables sepal length, sepal width, petal length and petal width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica. |
data("iris")
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
# ? iris
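A quick way to confirm the balanced design described above is to count the rows per species and summarise the measurements by group. This is a minimal sketch using base R only; the object names `species_counts` and `species_means` are chosen here for illustration.

```r
data("iris")

# Count the flowers recorded for each species
species_counts <- table(iris$Species)
print(species_counts)

# Mean of each measurement (in cm) by species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```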
| ToothGrowth |
|---|
| The ToothGrowth data set contains the results from an experiment studying the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC). |
len: Tooth length
supp: Supplement type (VC or OJ).
dose: Dose in milligrams/day (numeric)
data("ToothGrowth")
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
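The design described above (three doses crossed with two delivery methods) can be checked with a cross-tabulation. A small sketch, assuming only base R; `design` is an illustrative name.

```r
data("ToothGrowth")

# Cross-tabulate supplement type against dose level
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)

# Total number of animals in the experiment
sum(design)
```

Each cell of the table is the number of guinea pigs that received that combination of supplement and dose.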
| PlantGrowth |
|---|
| Results obtained from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. |
data("PlantGrowth")
head(PlantGrowth)
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
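Since `head()` only shows the control rows, it can help to list the group labels and compare the mean dried weight per group. A minimal sketch in base R; `group_means` is an illustrative name.

```r
data("PlantGrowth")

# The control and the two treatment conditions
levels(PlantGrowth$group)

# Mean dried weight in each group
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
print(group_means)
```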
| Many More Real Datasets in R |
|---|
| The R Datasets Package |
This package contains a variety of datasets. For a complete list, use library(help = "datasets").
Author(s): R Core Team and contributors worldwide. Reference:
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
# library(datasets)  # part of base R and attached by default, so no installation is needed
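The list of available datasets can also be retrieved programmatically. The sketch below uses `data(package = "datasets")`, whose result holds a character matrix with one row per data set; `ds` is an illustrative name.

```r
# List the data sets that ship with the base 'datasets' package
ds <- data(package = "datasets")$results

# Each row holds the data set name (Item) and a short description (Title)
head(ds[, c("Item", "Title")])

# Number of entries in the listing
nrow(ds)
```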
==================================================================
| One Sample T-test |
|---|
| The One-Sample T-Test is used to test the statistical difference between a sample mean and a known or assumed/hypothesized value of the mean in the population. |
| To perform a one-sample t-test in R, use the syntax t.test(y, mu = 0), where y is the name of the variable of interest and mu is set equal to the mean specified by the null hypothesis. |
sweetSold <- c(rnorm(50, mean = 10, sd = 3))
t.test(sweetSold, mu = 15) # H0: mu = 15
##
## One Sample t-test
##
## data: sweetSold
## t = -11.362, df = 49, p-value = 2.458e-15
## alternative hypothesis: true mean is not equal to 15
## 95 percent confidence interval:
## 9.217681 10.955745
## sample estimates:
## mean of x
## 10.08671
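To see what t.test() computes here, the statistic can be reproduced by hand as t = (mean(y) - mu0) / (sd(y) / sqrt(n)) with n - 1 degrees of freedom. The sketch below is illustrative: the seed is an assumption added only for reproducibility, so the numbers will not match the output printed above.

```r
set.seed(123)  # assumed seed, used only so this sketch is reproducible
sweetSold <- rnorm(50, mean = 10, sd = 3)
mu0 <- 15      # mean specified by the null hypothesis

n <- length(sweetSold)
t_manual <- (mean(sweetSold) - mu0) / (sd(sweetSold) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)  # two-sided p-value

res <- t.test(sweetSold, mu = mu0)

# Manual values next to those reported by t.test()
c(manual = t_manual, t.test = unname(res$statistic))
c(manual = p_manual, t.test = res$p.value)
```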
Two Sample T-test
-----------------
The two-sample t-test helps us judge whether the difference between two means is real or simply due to chance. The general form of the test is t.test(y1, y2, paired=FALSE). By default, R assumes that the variances of y1 and y2 are unequal and therefore runs Welch’s test. To assume equal variances instead, set the flag var.equal=TRUE.
shopOne <- rnorm(50, mean = 140, sd = 4.5)
shopTwo <- rnorm(50, mean = 150, sd = 4)
t.test(shopOne, shopTwo, var.equal = TRUE)
##
## Two Sample t-test
##
## data: shopOne and shopTwo
## t = -13.206, df = 98, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.258761 -8.317024
## sample estimates:
## mean of x mean of y
## 140.4616 150.2495
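The effect of var.equal can be seen by running both versions on the same data. This is a sketch under an assumed seed (not part of the original example): the pooled test uses n1 + n2 - 2 degrees of freedom, while Welch’s test uses an approximate, usually fractional, value.

```r
set.seed(123)  # assumed seed, used only so this sketch is reproducible
shopOne <- rnorm(50, mean = 140, sd = 4.5)
shopTwo <- rnorm(50, mean = 150, sd = 4)

welch  <- t.test(shopOne, shopTwo)                    # default: var.equal = FALSE
pooled <- t.test(shopOne, shopTwo, var.equal = TRUE)  # pooled-variance test

# Degrees of freedom and p-values under each assumption
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
c(welch = welch$p.value, pooled = pooled$p.value)
```

When the two sample variances are similar, as here, the two tests give nearly identical conclusions.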
Paired Sample T-test
--------------------
This is a statistical procedure that is used to determine whether the mean difference between two sets of observations is zero. In a paired sample t-test, each subject is measured two times, resulting in pairs of observations.
The test is run using the syntax t.test(y1, y2, paired=TRUE)
sweetOne <- c(rnorm(100, mean = 14, sd = 0.3))
sweetTwo <- c(rnorm(100, mean = 13, sd = 0.2))
t.test(sweetOne, sweetTwo, paired = TRUE)
##
## Paired t-test
##
## data: sweetOne and sweetTwo
## t = 28.778, df = 99, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.9656321 1.1086519
## sample estimates:
## mean of the differences
## 1.037142
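A paired t-test is equivalent to a one-sample t-test on the pairwise differences with mu = 0. The sketch below checks this numerically; the seed is an assumption added for reproducibility, so the values differ from the output above.

```r
set.seed(123)  # assumed seed, used only so this sketch is reproducible
sweetOne <- rnorm(100, mean = 14, sd = 0.3)
sweetTwo <- rnorm(100, mean = 13, sd = 0.2)

paired_res <- t.test(sweetOne, sweetTwo, paired = TRUE)
diff_res   <- t.test(sweetOne - sweetTwo, mu = 0)  # one-sample test on differences

# Both approaches give the same statistic and p-value
c(paired = unname(paired_res$statistic), differences = unname(diff_res$statistic))
c(paired = paired_res$p.value, differences = diff_res$p.value)
```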
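The following example uses simulated weights (in grams) of 10 mice and tests whether their mean weight differs from 25 g.
set.seed(101)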
my_data <- data.frame(
name = paste0(rep("M_", 10), 1:10),
weight = round(rnorm(10, 20, 2), 1)
)
summary(my_data$weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.70 19.65 20.50 20.48 21.18 22.30
library(ggpubr)
## Loading required package: ggplot2
ggboxplot(my_data$weight,
ylab = "Weight (g)", xlab = FALSE,
ggtheme = theme_minimal())
To check whether the data follow a normal distribution, use the Shapiro-Wilk normality test and look at the normality plot.
Shapiro-Wilk test:
Null hypothesis: the data are normally distributed
Alternative hypothesis: the data are not normally distributed
shapiro.test(my_data$weight)
##
## Shapiro-Wilk normality test
##
## data: my_data$weight
## W = 0.98074, p-value = 0.969
Normality can also be inspected visually with a Q-Q plot (quantile-quantile plot), which plots the sample quantiles against the quantiles of a normal distribution. Points lying close to the reference line suggest approximately normal data.
library("ggpubr")
ggqqplot(my_data$weight, ylab = "Weight (g)",
ggtheme = theme_minimal())
# One-sample t-test
res <- t.test(my_data$weight, mu = 25)
# Printing the results
res
##
## One Sample t-test
##
## data: my_data$weight
## t = -12.496, df = 9, p-value = 5.45e-07
## alternative hypothesis: true mean is not equal to 25
## 95 percent confidence interval:
## 19.66172 21.29828
## sample estimates:
## mean of x
## 20.48
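The 95 percent confidence interval reported above is mean(y) ± t(0.975, n - 1) × sd(y) / sqrt(n). A minimal sketch that rebuilds it by hand and compares it with the `conf.int` component of the test result; `weight` and `ci_manual` are illustrative names.

```r
set.seed(101)
weight <- round(rnorm(10, 20, 2), 1)  # same simulated weights as my_data$weight

res <- t.test(weight, mu = 25)

n  <- length(weight)
se <- sd(weight) / sqrt(n)            # standard error of the mean
ci_manual <- mean(weight) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Manual interval next to the one stored in the test result
ci_manual
res$conf.int
```

Other components, such as `res$statistic`, `res$p.value` and `res$estimate`, can be extracted in the same way.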
t.test(my_data$weight, mu = 25,
alternative = "less")
##
## One Sample t-test
##
## data: my_data$weight
## t = -12.496, df = 9, p-value = 2.725e-07
## alternative hypothesis: true mean is less than 25
## 95 percent confidence interval:
## -Inf 21.14308
## sample estimates:
## mean of x
## 20.48
t.test(my_data$weight, mu = 25,
alternative = "greater")
##
## One Sample t-test
##
## data: my_data$weight
## t = -12.496, df = 9, p-value = 1
## alternative hypothesis: true mean is greater than 25
## 95 percent confidence interval:
## 19.81692 Inf
## sample estimates:
## mean of x
## 20.48
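The three p-values above are related. Because the sample mean lies below 25, the "less" p-value is half of the two-sided one, and the "less" and "greater" p-values add up to 1. A small sketch that checks both relations; `weight` is an illustrative name for the same simulated data.

```r
set.seed(101)
weight <- round(rnorm(10, 20, 2), 1)  # same simulated weights as my_data$weight

p_two     <- t.test(weight, mu = 25)$p.value
p_less    <- t.test(weight, mu = 25, alternative = "less")$p.value
p_greater <- t.test(weight, mu = 25, alternative = "greater")$p.value

# One-sided p-value in the direction of the data is half the two-sided one
all.equal(p_less, p_two / 2)

# The two one-sided p-values are complementary
all.equal(p_less + p_greater, 1)
```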
The p-value of the two-sided test (5.45e-07) is less than the significance level alpha = 0.05, so we can conclude that the mean weight of the mice is significantly different from 25 g.
========================================================
Simulated data Example: Paired T Test
-------------------------------------
Suppose a training program was conducted to improve the participants’ knowledge of statistics. Data were collected from a selected sample of 10 individuals before and after the training program. Test the hypothesis that the training is effective in improving the participants’ knowledge of statistics at the 5% level of significance (95% confidence level).
H0: there is no difference in the participants’ knowledge before and after the statistics training
H1: the statistics training improved the participants’ knowledge
We shall test the null hypothesis against this alternative hypothesis.
Statistics training data
------------------------
Let’s create this data set in R. First, create before and after as objects containing the scores of statistics training.
before <- c(11.2, 14.5, 12.4, 12.2, 12.7, 10.4, 15.8, 13.8, 8.5, 14.1)
after <- c(12.5, 15.3, 13.6, 12.7, 13.9, 11.3, 16.6, 14.5, 8.2, 14.7)
stdata <- data.frame(subject = rep(c(1:10), 2),
time = rep(c("before", "after"), each = 10),
score = c(before, after))
print(stdata)
## subject time score
## 1 1 before 11.2
## 2 2 before 14.5
## 3 3 before 12.4
## 4 4 before 12.2
## 5 5 before 12.7
## 6 6 before 10.4
## 7 7 before 15.8
## 8 8 before 13.8
## 9 9 before 8.5
## 10 10 before 14.1
## 11 1 after 12.5
## 12 2 after 15.3
## 13 3 after 13.6
## 14 4 after 12.7
## 15 5 after 13.9
## 16 6 after 11.3
## 17 7 after 16.6
## 18 8 after 14.5
## 19 9 after 8.2
## 20 10 after 14.7
Visualization of the Data:
par(pty = "s")
boxplot(stdata$score ~ stdata$time)
Association or correlation test:
Now let’s examine the association between the paired samples using the cor.test() function. Pass the before and after scores as the x and y arguments. For the method argument we use Pearson, the most commonly used method, and we test the relationship at the 0.95 confidence level.
library(stats)
cor.test(x = before, y = after,
method = c("pearson"),
conf.level = 0.95)
##
## Pearson's product-moment correlation
##
## data: before and after
## t = 15.187, df = 8, p-value = 3.5e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9277019 0.9961329
## sample estimates:
## cor
## 0.9830963
Interpretation of the Result:
The results show a statistically significant, strong positive relationship between the before and after training scores. The correlation coefficient is 0.983, which is very close to one, and the p-value (3.5e-07) is far below 0.05.
========================================================
Paired sample t-test
--------------------
t.test(formula = stdata$score ~ stdata$time,
alternative = "greater",
mu = 0,
paired = TRUE,
var.equal = TRUE,
conf.level = 0.95)
##
## Paired t-test
##
## data: stdata$score by stdata$time
## t = 5.2705, df = 9, p-value = 0.0002568
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.502187 Inf
## sample estimates:
## mean of the differences
## 0.77
Interpretation of the result:
The results show that the p-value (0.0002568) is lower than 0.05. The lower the p-value, the stronger the evidence against the null hypothesis. Based on this result, we reject the null hypothesis of no difference. It means the statistics training significantly improved the participants’ knowledge.
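The same test can also be written with the two score vectors passed directly, which makes the direction of the difference (after minus before) explicit. This form is offered as a sketch: some recent R releases may no longer accept paired = TRUE together with a formula, in which case the vector form is the safer choice. The name `res_paired` is illustrative.

```r
before <- c(11.2, 14.5, 12.4, 12.2, 12.7, 10.4, 15.8, 13.8, 8.5, 14.1)
after  <- c(12.5, 15.3, 13.6, 12.7, 13.9, 11.3, 16.6, 14.5, 8.2, 14.7)

# The difference is after - before, so "greater" tests for an improvement
res_paired <- t.test(after, before,
                     paired = TRUE,
                     alternative = "greater",
                     conf.level = 0.95)
res_paired

# Mean improvement per participant
mean(after - before)
```

The var.equal argument is omitted because it has no effect on a paired test.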
EXERCISES / H.W
-----------------
For the above real-life datasets, perform hypothesis tests of significance and compute confidence intervals for a single mean, for the difference of two means, and for paired samples in R.
| References: |
http://www.sthda.com/english/wiki/one-sample-t-test-in-r
https://www.r-bloggers.com/2021/10/paired-sample-t-test-using-r/
http://www.sthda.com/english/wiki/f-test-compare-two-variances-in-r
https://www.geeksforgeeks.org/t-test-approach-in-r-programming/
http://www.sthda.com/english/wiki/r-built-in-data-sets
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html