You are investigating the proportion of customers satisfied with a product. From a sample of 100 customers, 70 express satisfaction. Calculate a 90% confidence interval for the true proportion of satisfied customers.
prop.test(n = 100, x = 70, conf.level = 0.90)
##
## 1-sample proportions test with continuity correction
##
## data: 70 out of 100, null probability 0.5
## X-squared = 15.21, df = 1, p-value = 9.619e-05
## alternative hypothesis: true p is not equal to 0.5
## 90 percent confidence interval:
## 0.6149607 0.7738142
## sample estimates:
## p
## 0.7
A 90% confidence interval is 0.615 to 0.774, meaning that you can be 90% confident that between 61.5% and 77.4% of customers are satisfied with the product.
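As a rough cross-check, the interval can be pulled out of the prop.test() result and set beside the simpler Wald (normal-approximation) interval. This is only a sketch with illustrative variable names; the Wald interval is a different method from the score interval with continuity correction that prop.test() reports, so the two differ slightly.

```r
# Sample data from the problem
x <- 70   # satisfied customers
n <- 100  # sample size

# Interval reported by prop.test() (score interval with continuity correction)
res <- prop.test(x = x, n = n, conf.level = 0.90)
ci_score <- as.numeric(res$conf.int)

# Wald interval for comparison: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
p_hat <- x / n
z <- qnorm(0.95)  # 90% two-sided interval leaves 5% in each tail
ci_wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)

round(ci_score, 3)
round(ci_wald, 3)
```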
In a study comparing the effectiveness of two advertising strategies, you collect individual responses from 80 people exposed to Strategy A and 90 people exposed to Strategy B. Among those exposed to Strategy A, 30 purchase the product, while among those exposed to Strategy B, 45 purchase the product. Test whether there is a significant difference in the proportion of people making a purchase between the two strategies.
n1 = 80
x1 = 30
n2 = 90
x2 = 45
prop.test(n = c(n1, n2), x = c(x1, x2), conf.level = 0.95)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(x1, x2) out of c(n1, n2)
## X-squared = 2.2011, df = 1, p-value = 0.1379
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.28487646 0.03487646
## sample estimates:
## prop 1 prop 2
## 0.375 0.500
The 95% confidence interval for the difference in the proportion of people making a purchase between Strategy A and Strategy B (A minus B) is -0.285 to 0.035.
Since 0 is in the interval, there is no evidence of a difference between the two strategies. The p-value of 0.1379 (larger than commonly used significance levels such as 0.05) also indicates that there is no significant difference between the two strategies.
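To see where the chi-squared statistic comes from, the sketch below computes the pooled two-proportion z statistic by hand and compares its square with prop.test(). It assumes the continuity correction is switched off (correct = FALSE), which is not what the output above uses, so the statistic and p-value here are not the same as the 2.2011 and 0.1379 reported above.

```r
n1 <- 80; x1 <- 30   # Strategy A
n2 <- 90; x2 <- 45   # Strategy B

p1 <- x1 / n1
p2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 = p2

# Pooled two-proportion z statistic and two-sided p-value
z <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value_z <- 2 * pnorm(-abs(z))

# Same test via prop.test(), without the continuity correction
res <- prop.test(x = c(x1, x2), n = c(n1, n2), correct = FALSE)

c(z_squared = z^2, chi_squared = unname(res$statistic))
c(p_value_z = p_value_z, p_value_prop_test = res$p.value)
```

The squared z statistic and the uncorrected chi-squared statistic agree, which is why the two approaches lead to the same decision.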
Suppose a pharmaceutical company has developed a new drug that they claim is effective in treating a particular condition. They claim that more than 70% of patients who take the drug will experience an improvement. To test this claim, they take a sample of 200 patients and find that 150 of them show improvement. Show the test details.
no_successes = 150 # Number of successes (patients who show improvement)
sample_size = 200 # Sample size
# Null hypothesis
p0 = 0.70 # Null hypothesis H0: p = 0.70
# Perform the one-sample proportion test with alternative hypothesis Ha: p > 0.70
prop.test(n = sample_size, x = no_successes, p = p0, alternative = "greater")
##
## 1-sample proportions test with continuity correction
##
## data: no_successes out of sample_size, null probability p0
## X-squared = 2.1488, df = 1, p-value = 0.07134
## alternative hypothesis: true p is greater than 0.7
## 95 percent confidence interval:
## 0.6938964 1.0000000
## sample estimates:
## p
## 0.75
# Don't reject H0
The p-value (0.07134) is larger than 0.05, so the data do not provide significant evidence that more than 70% of patients who take the drug experience an improvement.
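The test statistic and p-value above can be reproduced by hand. The following is a minimal sketch, assuming a significance level of 0.05 (the problem does not state one) and using illustrative variable names.

```r
x <- 150    # patients who improved
n <- 200    # sample size
p0 <- 0.70  # proportion under H0

# Chi-squared statistic with continuity correction (subtract 0.5 from |x - n * p0|)
chi_sq <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# One-sided p-value for Ha: p > p0, from the signed square root of the statistic
z <- sign(x / n - p0) * sqrt(chi_sq)
p_value <- pnorm(z, lower.tail = FALSE)

alpha <- 0.05  # assumed significance level
decision <- if (p_value < alpha) "Reject H0" else "Don't reject H0"

round(c(chi_sq = chi_sq, p_value = p_value), 4)
decision
```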
A company has two sales teams, Team A and Team B, and they want to know if there is a significant difference in the success rates of closing deals between the two teams. Team A had 100 sales attempts with 60 successes, while Team B had 120 sales attempts with 72 successes. Show the test details.
# Sample data: Numbers of successes and sample sizes
no_successes_1 = 60 # Number of successes in sample 1
sample_size_1 = 100 # Sample size in sample 1
no_successes_2 = 72 # Number of successes in sample 2
sample_size_2 = 120 # Sample size in sample 2
# Perform the 2-sample proportion test with alternative hypothesis Ha: p1 not equal to p2
prop.test(n = c(sample_size_1, sample_size_2), x = c(no_successes_1, no_successes_2), alternative = "two.sided")
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(no_successes_1, no_successes_2) out of c(sample_size_1, sample_size_2)
## X-squared = 0, df = 1, p-value = 1
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.1300093 0.1300093
## sample estimates:
## prop 1 prop 2
## 0.6 0.6
# Don't reject H0
The p-value (1) indicates that there is no evidence of a difference in the success rates of closing deals between the two teams; both sample proportions are exactly 0.6.
You want to estimate the average time students spend commuting to campus. From a random sample of 9 students, you collect the following data on their daily commuting times (in minutes):
[20, 25, 22, 18, 24, 21, 23, 19, 20].
Calculate a 95% confidence interval for the true mean commuting time.
time = c(20, 25, 22, 18, 24, 21, 23, 19, 20)
t.test(time, conf.level = 0.95)
##
## One Sample t-test
##
## data: time
## t = 27.29, df = 8, p-value = 3.504e-09
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 19.53065 23.13602
## sample estimates:
## mean of x
## 21.33333
The 95% confidence interval for the mean commuting time is 19.53 to 23.14 minutes.
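The same interval follows from the usual formula, sample mean plus or minus the t critical value times the standard error. A short sketch with illustrative variable names:

```r
time <- c(20, 25, 22, 18, 24, 21, 23, 19, 20)

n <- length(time)
x_bar <- mean(time)
se <- sd(time) / sqrt(n)         # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value

ci <- x_bar + c(-1, 1) * t_crit * se
round(ci, 2)
```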
In a study comparing the effectiveness of two diets on weight loss, you collect individual data on weight loss for each participant.
Diet X group: [2, 3, 1, 2, 1, 3, 2, 4, 1]
Diet Y group: [3, 2, 4, 1, 3, 2, 4, 2, 3]
Test whether there is a significant difference in the mean weight loss between the two diets.
x = c(2, 3, 1, 2, 1, 3, 2, 4, 1)
y = c(3, 2, 4, 1, 3, 2, 4, 2, 3)
t.test(x, y, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: x and y
## t = -1.1471, df = 15.956, p-value = 0.2683
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.5825037 0.4713926
## sample estimates:
## mean of x mean of y
## 2.111111 2.666667
There is no significant difference (p-value = 0.2683) in the mean weight loss between the two diets.
Also, 0 is in the 95% confidence interval, which shows no evidence of a difference in the mean weight loss between the two diets.
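The Welch statistic and its fractional degrees of freedom can be worked out directly. This sketch uses illustrative variable names and the Welch-Satterthwaite approximation, which is what t.test() applies by default when var.equal is not set.

```r
x <- c(2, 3, 1, 2, 1, 3, 2, 4, 1)  # Diet X
y <- c(3, 2, 4, 1, 3, 2, 4, 2, 3)  # Diet Y

# Squared standard errors of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic (variances are not pooled)
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite degrees of freedom
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df)

round(c(t = t_stat, df = df, p_value = p_value), 4)
```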
A company claims that the average response time for their customer service is less than 5 minutes. You collect individual data from a sample of 10 customer service responses, and the response times are:
[4, 5, 6, 4, 5, 6, 4, 5, 6, 4].
Test the company’s claim at a 5% significance level.
x = c(4, 5, 6, 4, 5, 6, 4, 5, 6, 4)
hypothesized_mean = 5
# Perform a one-sample t-test with alternative hypothesis Ha: mu < 5
t.test(x, mu = hypothesized_mean, alternative = "less")
##
## One Sample t-test
##
## data: x
## t = -0.36116, df = 9, p-value = 0.3632
## alternative hypothesis: true mean is less than 5
## 95 percent confidence interval:
## -Inf 5.407566
## sample estimates:
## mean of x
## 4.9
# Don't reject H0
The p-value (0.3632) suggests there is no evidence that the average response time for their customer service is less than 5 minutes.
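For a lower-tailed test the p-value is the area to the left of the observed t statistic. A brief sketch with illustrative variable names:

```r
x <- c(4, 5, 6, 4, 5, 6, 4, 5, 6, 4)
mu0 <- 5  # hypothesized mean under H0

n <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Ha: mu < mu0, so use the lower tail of the t distribution
p_value <- pt(t_stat, df = n - 1)

round(c(t = t_stat, p_value = p_value), 4)
```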
Consider a hypothetical biological study where researchers are investigating the effect of a new drug on the average lifespan of two different species of laboratory mice. The study involves two independent groups: one group treated with the new drug (Group A) and another group receiving a placebo (Group B).
Group A (Drug Treatment):
Lifespans of mice: 800, 820, 810, 825, 830
Group B (Placebo):
Lifespans of mice: 790, 805, 800, 795, 810
Test whether there is a significant increase in the average lifespan of mice in the treatment group compared to the placebo group.
group1 = c(800, 820, 810, 825, 830)
group2 = c(790, 805, 800, 795, 810)
# Perform a two-sample t-test with alternative hypothesis Ha: mu1 > mu2
t.test(group1, group2, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: group1 and group2
## t = 2.6389, df = 6.908, p-value = 0.01694
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 4.770531 Inf
## sample estimates:
## mean of x mean of y
## 817 800
The p-value (0.0169) suggests that there is a significant increase in the average lifespan of mice in the treatment group compared to the placebo group.
Let’s say we are investigating the effectiveness of a training program designed to improve exam scores. We have a group of 10 students, and we measure their scores before and after the training. The data are:
before_training: 75, 68, 82, 90, 78, 65, 88, 72, 95, 80
after_training: 82, 75, 88, 92, 85, 70, 92, 78, 98, 86
Test whether there is a significant difference in scores before and after training.
Give a 95% confidence interval for difference in scores before and after training (after - before).
before_training = c(75, 68, 82, 90, 78, 65, 88, 72, 95, 80)
after_training = c(82, 75, 88, 92, 85, 70, 92, 78, 98, 86)
d = after_training - before_training
t.test(d, conf.level = 0.95)
##
## One Sample t-test
##
## data: d
## t = 9.4851, df = 9, p-value = 5.546e-06
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 4.035978 6.564022
## sample estimates:
## mean of x
## 5.3
The p-value (about 5.5e-06, almost 0) suggests that there is a significant difference in scores before and after training.
The 95% confidence interval for the mean difference in scores (after - before) is 4.04 to 6.56.
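Running a one-sample t-test on the differences is equivalent to a paired t-test on the two score vectors. The sketch below checks this with the paired argument of t.test(); the object names are illustrative.

```r
before_training <- c(75, 68, 82, 90, 78, 65, 88, 72, 95, 80)
after_training <- c(82, 75, 88, 92, 85, 70, 92, 78, 98, 86)

# Paired t-test on the raw scores (difference taken as after - before)
paired <- t.test(after_training, before_training, paired = TRUE)

# One-sample t-test on the differences
one_sample <- t.test(after_training - before_training)

# Both give the same statistic, p-value, and confidence interval
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
round(as.numeric(paired$conf.int), 2)
```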
You are analyzing survey data to examine whether there is an association between occupation and preferred mode of transportation (Car, Public Transit, Bicycle). You collect data from 150 individuals, and the observed frequencies are as follows:
# Make a data matrix of 2 rows (occupations) and 3 columns (transportation mode)
chisq.test(x=matrix(c(50, 20, 30, 40, 10, 20), 2, 3))
##
## Pearson's Chi-squared test
##
## data: matrix(c(50, 20, 30, 40, 10, 20), 2, 3)
## X-squared = 17.09, df = 2, p-value = 0.0001945
The small p-value (0.00019) provides strong evidence that there is an association between occupation and preferred mode of transportation.
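The statistic compares each observed count with the count expected under independence, (row total x column total) / grand total. A sketch using the same matrix as the call above (filled column by column, as matrix() does by default); the variable names are illustrative.

```r
observed <- matrix(c(50, 20, 30, 40, 10, 20), nrow = 2, ncol = 3)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic, degrees of freedom, and p-value
chi_sq <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(expected, 2)
c(chi_sq = chi_sq, df = df, p_value = p_value)
```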
Regarding favorite ice cream flavors, you expect 50% of all individuals like vanilla, 30% like chocolate, and 20% like strawberry (the instructor added this missing info). You collect data on favorite ice cream flavors in a group of 60 individuals. The observed frequencies are as follows:
Conduct a chi-square goodness-of-fit test to determine if the observed distribution of ice cream flavors matches the expected distribution.
chisq.test(x = c(25, 20, 15), p = c(0.50, 0.30, 0.20))
##
## Chi-squared test for given probabilities
##
## data: c(25, 20, 15)
## X-squared = 1.8056, df = 2, p-value = 0.4054
The large p-value (0.4054) suggests that there is no evidence that the observed distribution of ice cream flavors differs from the expected distribution.
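The goodness-of-fit statistic can be checked by hand from the expected counts n * p. A minimal sketch with illustrative variable names, taking the counts in the same order as the call above:

```r
observed <- c(25, 20, 15)
p_expected <- c(0.50, 0.30, 0.20)

# Expected counts under H0
expected <- sum(observed) * p_expected

# Chi-squared statistic and p-value with (number of categories - 1) degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

expected
round(c(chi_sq = chi_sq, p_value = p_value), 4)
```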
You are conducting a study to investigate the relationship between the number of hours individuals spend jogging per week and their cardiovascular fitness levels, measured in terms of the maximum oxygen consumption (VO2 max). Collect data from 8 participants and record both the weekly jogging hours and their corresponding VO2 max levels.
Perform a simple linear regression analysis to predict cardiovascular fitness levels (VO2 max) based on the number of hours spent jogging per week. What is the regression equation? Is the slope significantly different from 0 at the 0.05 significance level?
jogging = c(3, 4, 2, 5, 3, 6, 4, 5)
vo2 = c(40, 45, 38, 50, 42, 55, 48, 52)
mydata = data.frame(V02_Max = vo2, Jogging = jogging)
model = lm(V02_Max ~ Jogging, data = mydata)
summary(model)
##
## Call:
## lm(formula = V02_Max ~ Jogging, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.750 -0.875 0.000 0.875 1.750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.2500 1.5975 17.68 2.10e-06 ***
## Jogging 4.5000 0.3819 11.78 2.26e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.323 on 6 degrees of freedom
## Multiple R-squared: 0.9586, Adjusted R-squared: 0.9517
## F-statistic: 138.9 on 1 and 6 DF, p-value: 2.256e-05
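The regression equation is V02_Max = 28.25 + 4.5 * Jogging, so each additional hour of jogging per week is associated with an estimated increase of 4.5 in VO2 max. The p-value for the slope (2.26e-05, basically 0) shows that the slope is significantly different from 0 at the 0.05 significance level.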
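The least-squares coefficients can be recovered from the sample covariance and variance, and the fitted model can then be used for prediction. In this sketch the variable names are illustrative and the 4 hours per week used for the prediction is a hypothetical input, not part of the original problem.

```r
jogging <- c(3, 4, 2, 5, 3, 6, 4, 5)
vo2 <- c(40, 45, 38, 50, 42, 55, 48, 52)

# Least-squares estimates by hand
slope <- cov(jogging, vo2) / var(jogging)
intercept <- mean(vo2) - slope * mean(jogging)

# Same fit with lm(), then a prediction for a hypothetical 4 hours per week
fit <- lm(vo2 ~ jogging)
predicted <- predict(fit, newdata = data.frame(jogging = 4))

c(intercept = intercept, slope = slope)
unname(predicted)
```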
You want to test whether there is a significant difference in test scores among three different teaching methods. Collect data from 3 groups, with 10 students in each group:
Method A: 75, 78, 80, 82, 85, 88, 90, 92, 95, 98
Method B: 72, 75, 78, 80, 82, 85, 88, 90, 92, 95
Method C: 70, 72, 75, 78, 80, 82, 85, 88, 90, 92
Perform an ANOVA to test for a significant difference in test scores among the three teaching methods.
a = c(75, 78, 80, 82, 85, 88, 90, 92, 95, 98)
b = c(72, 75, 78, 80, 82, 85, 88, 90, 92, 95)
c = c(70, 72, 75, 78, 80, 82, 85, 88, 90, 92)
teach = data.frame(Method = rep(c('A', 'B', 'C'), each = 10),
Scores = c(a, b, c))
anova_result = aov(Scores~Method, data = teach)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## Method 2 130.1 65.03 1.132 0.337
## Residuals 27 1551.8 57.47
The p-value (0.337) suggests that there is no evidence of a difference in mean test scores among the three teaching methods.
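The F value in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. The sketch below rebuilds it from the raw scores; the variable names are illustrative.

```r
scores <- c(75, 78, 80, 82, 85, 88, 90, 92, 95, 98,   # Method A
            72, 75, 78, 80, 82, 85, 88, 90, 92, 95,   # Method B
            70, 72, 75, 78, 80, 82, 85, 88, 90, 92)   # Method C
method <- factor(rep(c("A", "B", "C"), each = 10))

group_means <- tapply(scores, method, mean)
group_sizes <- tapply(scores, method, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - mean(scores))^2)
ss_within <- sum((scores - ave(scores, method))^2)  # ave() gives each score's group mean

df_between <- nlevels(method) - 1
df_within <- length(scores) - nlevels(method)

f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(ss_between = ss_between, ss_within = ss_within, F = f_stat, p_value = p_value), 3)
```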
TukeyHSD(anova_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Scores ~ Method, data = teach)
##
## $Method
## diff lwr upr p adj
## B-A -2.6 -11.00622 5.806219 0.7261563
## C-A -5.1 -13.50622 3.306219 0.3048325
## C-B -2.5 -10.90622 5.906219 0.7436839
None of the adjusted p-values for the pairwise comparisons is small, and every confidence interval contains 0, so the post-hoc test is consistent with the conclusion just made.