Set up RStudio

Setting up R Markdown in RStudio lets you create dynamic, reproducible reports, presentations, and documents that communicate your data analysis and research findings more effectively.

Z scores (Critical values)

To get critical values of the z-score in R for one-tailed and two-tailed tests, you can use the qnorm() function.

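For a right-tailed test with a significance level of α, the critical value is the z-score that corresponds to a cumulative probability of 1-α in the standard normal distribution. For a left-tailed test, it is the z-score that corresponds to a cumulative probability of α. For a two-tailed test, α is split evenly between the tails, so the critical values correspond to cumulative probabilities of α/2 and 1-α/2.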

For example, to get the critical value of the z-score for a one-tailed test with a significance level of 0.05 (i.e., α=0.05) in R, you can use the following code:

One tail

# One-tailed test with significance level of 0.05 (to the right)
critical_value <- qnorm(1-0.05, lower.tail=TRUE)
critical_value
[1] 1.644854
# One-tailed test with significance level of 0.05 (to the left)
critical_value <- qnorm(0.05, lower.tail=TRUE)
critical_value
[1] -1.644854
## Alternatively
critical_value <- qnorm(1-0.05, lower.tail=FALSE)
critical_value
[1] -1.644854

Two tail

# Two-tailed test with significance level of 0.05
critical_value1 <- qnorm(0.025, lower.tail=TRUE)
critical_value2 <- qnorm(0.025, lower.tail=FALSE)

critical_value1
[1] -1.959964
critical_value2
[1] 1.959964
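The one-tailed and two-tailed calls above can be wrapped in a small helper. This is only a sketch: the function name z_critical and its tail argument are made up for this example and are not part of base R.

```r
# Hypothetical helper: z critical value(s) for a given alpha and tail type
z_critical <- function(alpha, tail = c("two", "left", "right")) {
  tail <- match.arg(tail)
  switch(tail,
    left  = qnorm(alpha),                       # lower-tail cutoff
    right = qnorm(alpha, lower.tail = FALSE),   # upper-tail cutoff
    two   = c(qnorm(alpha / 2),                 # lower cutoff
              qnorm(alpha / 2, lower.tail = FALSE))  # upper cutoff
  )
}

z_critical(0.05, "right")   # same value as qnorm(1-0.05)
z_critical(0.05, "left")    # same value as qnorm(0.05)
z_critical(0.05, "two")     # lower and upper two-tailed cutoffs

# Round trip: pnorm() undoes qnorm(), so this recovers alpha
pnorm(z_critical(0.05, "right"), lower.tail = FALSE)
```

The round trip at the end is a quick sanity check that the cutoff really leaves a probability of α in the tail.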

Calculating the standardized z-score (for two proportions)

p1 = 0.232 
n1 = 3000
p2 = 0.267
n2 = 3000

# Pooled proportion: p_hat = (x1 + x2) / (n1 + n2), where x = p * n
p_hat = (p1*n1 + p2*n2) / (n1 + n2)

z = (p1 - p2) / sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2))
z
[1] -3.132586
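One way to cross-check the manual z-score is the built-in prop.test() function. For two groups, and with the continuity correction switched off, its chi-squared statistic equals the square of the pooled z-score. The sketch below assumes the success counts are the whole numbers implied by the proportions (p * n); the names x1, x2, and res are introduced only for this example.

```r
# Values from the example above
p1 <- 0.232; n1 <- 3000
p2 <- 0.267; n2 <- 3000

# Manual pooled z-score
p_hat <- (p1 * n1 + p2 * n2) / (n1 + n2)
z <- (p1 - p2) / sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

# Success counts implied by the proportions (assumed to be whole numbers)
x1 <- round(p1 * n1)
x2 <- round(p2 * n2)

# correct = FALSE turns off the continuity correction
res <- prop.test(x = c(x1, x2), n = c(n1, n2), correct = FALSE)

z^2                      # squared z-score
unname(res$statistic)    # chi-squared statistic, should equal z^2
res$p.value              # two-sided p-value from prop.test()
2 * pnorm(-abs(z))       # the same p-value from the normal distribution
```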

Calculating the standardized z-score (for a difference in means)

x <- c(34,45,30,50,36,45)
y <- c(30,33,40,34,40,54)
# Calculate the z-score standardized for the difference in means
z_score <- (mean(x) - mean(y)) / sqrt(var(x)/length(x) + var(y)/length(y))
z_score
[1] 0.317524
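The statistic above has the same form as the one reported by R's t.test() with its default setting (var.equal = FALSE, the Welch test), so the two can be compared directly. With only six observations per group, a t reference distribution is the more usual choice than the normal; the sketch below shows only that the statistic itself agrees. The names stat and res are introduced for this example.

```r
x <- c(34, 45, 30, 50, 36, 45)
y <- c(30, 33, 40, 34, 40, 54)

# Manual statistic, as computed above
stat <- (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))

# Welch two-sample t-test (the default for t.test with two samples)
res <- t.test(x, y)

stat                     # manual value
unname(res$statistic)    # value reported by t.test(), should match
res$parameter            # Welch degrees of freedom used for the p-value
```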

Z-score for one sample

x_bar <- 45
mu <- 48
n <- 50
sig <- 12


zscore <- ((x_bar - mu)/(sig/sqrt(n)))
zscore
[1] -1.767767

Getting the p-value for a z-score

# One-tailed test with z-score of 1.5
p_value <- pnorm(1.5, lower.tail=FALSE)
p_value
[1] 0.0668072
# Two-tailed test with z-score of -2.0
p_value <- 2*pnorm(-2.0, lower.tail=TRUE)
p_value
[1] 0.04550026
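The same pnorm() calls can be applied to the one-sample z-score computed earlier. The helper below is a sketch; the name z_p_value and its tail argument are assumptions made for this example, not base R functions.

```r
# Hypothetical helper: p-value for a z-score and a given tail type
z_p_value <- function(z, tail = c("two", "left", "right")) {
  tail <- match.arg(tail)
  switch(tail,
    left  = pnorm(z),                        # P(Z <= z)
    right = pnorm(z, lower.tail = FALSE),    # P(Z >= z)
    two   = 2 * pnorm(-abs(z))               # both tails, sign-independent
  )
}

# One-sample z-score from the earlier example
zscore <- (45 - 48) / (12 / sqrt(50))

z_p_value(zscore, "left")   # left-tailed p-value
z_p_value(zscore, "two")    # two-tailed p-value

# Reproduces the two results shown above
z_p_value(1.5, "right")
z_p_value(-2.0, "two")
```

Using -abs(z) in the two-tailed branch means the helper gives the same answer whether the z-score is positive or negative.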

Critical value for t-test

To get the critical values for a t-test in R, we can use the qt() function. The three arguments used here are p, df, and lower.tail. The p argument specifies the tail probability, df specifies the degrees of freedom, and lower.tail specifies whether p is measured from the lower tail (TRUE, the default) or the upper tail (FALSE) of the t-distribution.

If we want to find the t critical value for a left-tailed test with a significance level of .05 and degrees of freedom = 22, we can use the following example code:

#find t critical value
qt(p=.05, df=22, lower.tail=TRUE)
[1] -1.717144

The t critical value for a left-tailed test is -1.7171. Similarly, if we want to find the t critical value for a right-tailed test with a significance level of .05 and degrees of freedom = 22, we can use the following example code:

#find t critical value
qt(p=.05, df=22, lower.tail=FALSE)
[1] 1.717144

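If we want to find the t critical values for a two-tailed test with a significance level of .05 and degrees of freedom = 22, we can use the following example code. It returns the upper critical value; because the t-distribution is symmetric, the lower critical value is its negative: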

#find two-tailed t critical values
qt(p=.05/2, df=22, lower.tail=FALSE)
[1] 2.073873
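As a short illustration, both two-tailed critical values can be requested at once, and the t critical value can be compared with the z critical value as the degrees of freedom grow. The variable names alpha, dof, and dfs are chosen for this sketch only.

```r
alpha <- 0.05
dof <- 22

# Lower and upper two-tailed critical values in one call
c(lower = qt(alpha / 2, df = dof),
  upper = qt(1 - alpha / 2, df = dof))

# The upper t critical value shrinks toward the z critical value as df grows
dfs <- c(5, 22, 100, 1000)
data.frame(df = dfs, t_crit = qt(1 - alpha / 2, df = dfs))

# z critical value for comparison
qnorm(1 - alpha / 2)
```

This is why the z critical value of about 1.96 is a reasonable stand-in for the t critical value only when the sample is large.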

P-value for t-score

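To get the p-value for a t-score in R, we can use the pt() function, which takes the t-score (q), the degrees of freedom (df), and lower.tail. For example, if we want to find the p-value associated with a t-score of -0.77 and df = 15 in a left-tailed hypothesis test, we can use the following code: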

#find p-value
pt(q=-.77, df=15, lower.tail=TRUE)
[1] 0.2266283
# find two-tailed p-value
p_value <- 2 * pt(q = abs(2.5), df = 10, lower.tail = FALSE)
p_value
[1] 0.03144684
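The abs() call in the two-tailed calculation matters when the t-score is negative. The sketch below uses a t-score of -2.5 with 10 degrees of freedom (the mirror image of the example above) to show what happens without it; the names t_neg, dof, wrong, right, and same are introduced only for this illustration.

```r
t_neg <- -2.5
dof <- 10

# Without abs(): the upper tail of a negative t-score is most of the
# distribution, so doubling it gives a value above 1 (not a valid p-value)
wrong <- 2 * pt(q = t_neg, df = dof, lower.tail = FALSE)

# With abs(): the same result as for t = 2.5
right <- 2 * pt(q = abs(t_neg), df = dof, lower.tail = FALSE)

# Equivalent form that uses the lower tail of -|t|
same <- 2 * pt(q = -abs(t_neg), df = dof)

wrong
right
same
```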

Empirical Example

A researcher wants to test whether there is a significant difference in the mean scores of two groups of students (Group A and Group B) on a math test. The mean score for Group A is 85 with a standard deviation of 6, while the mean score for Group B is 90 with a standard deviation of 7. The sample size for both groups is 30. The researcher decides to use a significance level of α = 0.05.

What is the calculated t-value and p-value for this experiment?

x1 = 85
x2 = 90
s1 = 6
s2 = 7
n1 = 30 
n2 = 30

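Get the t-statistic

# Standard error of the difference in means
se = sqrt(s1^2/n1 + s2^2/n2)
t = (x1 - x2) / se
t
[1] -2.970443

Get the critical value (two-tailed test)

# df = n1 + n2 - 2 = 58
qt(p=.05/2, df=58, lower.tail=FALSE)
[1] 2.001717

Get the p-value

p_value <- 2 * pt(q = abs(t), df = 58, lower.tail = FALSE)
p_value

Since |t| = 2.970443 exceeds the critical value of 2.001717, the p-value is below 0.05 (roughly 0.004). Both approaches therefore lead to the same decision: reject the null hypothesis that the two group means are equal.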

Manual calculation of dependent t-test

# Create a vector of IDs
id <- 1:20

# Create a vector of weights before
weight_before <- c(60, 65, 70, 72, 68, 74, 80, 76, 62, 68, 75, 79, 72, 70, 67, 63, 65, 71, 73, 69)

# Create a vector of weights after
weight_after <- c(61, 64, 72, 70, 70, 75, 81, 78, 63, 67, 74, 80, 73, 71, 68, 64, 66, 72, 75, 71)

# Create a data frame with the three vectors
weight_data <- data.frame(id, weight_before, weight_after)
head(weight_data,5)
  id weight_before weight_after
1  1            60           61
2  2            65           64
3  3            70           72
4  4            72           70
5  5            68           70
attach(weight_data)

To calculate the dependent t-test manually, you can follow these steps:

Calculate the differences between the two samples:

weight_data$dif <-weight_before-weight_after
head(weight_data,5)
  id weight_before weight_after dif
1  1            60           61  -1
2  2            65           64   1
3  3            70           72  -2
4  4            72           70   2
5  5            68           70  -2

Calculate the mean and standard deviation of the differences:

mean_dif <- mean(weight_data$dif)
sd_dif <- sd(weight_data$dif)

Calculate the standard error of the mean difference:

se_dif <- sd_dif / sqrt(length(weight_data$dif))
se_dif
[1] 0.2575185

Calculate the t-statistic:

t_stat <- mean_dif / se_dif
t_stat
[1] -3.106573

Calculate the degrees of freedom:

dof <- length(weight_data$dif) - 1
dof
[1] 19

Get the critical value (two tailed test)

qt(p=.05/2, df=19, lower.tail=FALSE)
[1] 2.093024

Calculate the p-value using the cumulative distribution function of the t-distribution:

# find two-tailed p-value
p_value <- 2 * pt(q = abs(-3.106573), df = 19, lower.tail = FALSE)
p_value
[1] 0.005809366

Calculate the 95% confidence interval:

# qt(0.975, ...) is the positive critical value, so lower < upper
lower <- mean_dif - qt(0.975, df = dof) * se_dif
upper <- mean_dif + qt(0.975, df = dof) * se_dif

lower
[1] -1.338992
upper
[1] -0.2610075
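The manual steps above can be cross-checked against R's built-in t.test() with paired = TRUE. This is a sketch for verification, assuming the same two weight vectors; the name res is introduced only for this example.

```r
weight_before <- c(60, 65, 70, 72, 68, 74, 80, 76, 62, 68,
                   75, 79, 72, 70, 67, 63, 65, 71, 73, 69)
weight_after  <- c(61, 64, 72, 70, 70, 75, 81, 78, 63, 67,
                   74, 80, 73, 71, 68, 64, 66, 72, 75, 71)

# Paired (dependent) t-test on before minus after
res <- t.test(weight_before, weight_after, paired = TRUE)

unname(res$statistic)    # t-statistic, compare with t_stat
unname(res$parameter)    # degrees of freedom, compare with dof
res$p.value              # two-tailed p-value
as.numeric(res$conf.int) # 95% confidence interval for the mean difference
```

The interval returned by t.test() lists the smaller endpoint first, which is a handy check on the manual lower and upper bounds.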