data <- data.frame(
Temperature = factor(
c(rep("24C", 8), rep("30C", 8)),
levels = c("24C", "30C")
),
LoafVolume_mL = c(
812, 775, 840, 798, 821, 790, 835, 806,
742, 760, 735, 718, 755, 729, 748, 739
)
)
STAT 4160 — Homework #1
1 Honor Policy
Homework 1 is to be submitted individually. You may work with other students in the class, but you must write up your own responses (and submit the HW individually).
You MUST include the names of the students you worked with.
Worked with: None
2 Problem 1
2.1 Background
A food scientist is studying how dough resting temperature affects the final loaf volume of a yeasted bread. After mixing, each dough batch is rested (proofed) at one of two temperatures before baking. The response is the loaf volume (in mL). Assume all runs were conducted in a random order.
Temperature LoafVolume_mL
1 24C 812
2 24C 775
3 24C 840
4 24C 798
5 24C 821
6 24C 790
7 24C 835
8 24C 806
9 30C 742
10 30C 760
11 30C 735
12 30C 718
13 30C 755
14 30C 729
15 30C 748
16 30C 739
Is there significant evidence that the mean loaf volume is greater in the lower temperature group? We will perform a two-sample t-test to determine.
2.2 1(a)
What is the appropriate statement of hypotheses for this problem?
H0: the mean loaf volume is the same at both resting temperatures (μ_24C = μ_30C).
Ha: the mean loaf volume is greater for dough rested at the lower temperature (μ_24C > μ_30C).
2.3 1(b)
Which t-test should we perform? Carry out an F-test for equal variance to determine.
var.test(LoafVolume_mL ~ Temperature, data = data)
The p-value of this test is 0.23, so we fail to reject the null hypothesis of equal variances between the groups. We therefore assume that the groups have equal variances, and when testing for a difference in group means we use the pooled two-sample t-test.
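The F statistic behind var.test can be reproduced by hand as a check. This is a minimal sketch, assuming the loaf volumes are copied into two plain vectors; the names vol_24 and vol_30 are introduced here for illustration and do not appear in the original analysis.

```r
# Loaf volumes (mL), copied from the data frame defined at the top of the document
vol_24 <- c(812, 775, 840, 798, 821, 790, 835, 806)
vol_30 <- c(742, 760, 735, 718, 755, 729, 748, 739)

# F statistic: ratio of the sample variances (24C over 30C, matching the factor level order)
f_stat <- var(vol_24) / var(vol_30)
df1 <- length(vol_24) - 1
df2 <- length(vol_30) - 1

# Two-sided p-value: twice the smaller of the two tail areas
upper_tail <- pf(f_stat, df1, df2, lower.tail = FALSE)
p_two_sided <- 2 * min(upper_tail, 1 - upper_tail)

f_stat
p_two_sided
```

The printed p-value should agree with the one reported by var.test.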
2.4 1(c)
Solve for the appropriate t-statistic.
t.test(LoafVolume_mL ~ Temperature, data = data, var.equal = TRUE)
The t value for a two-sample pooled t-test in this situation is the difference between the mean loaf volume at 24C (group a) and the mean loaf volume at 30C (group b), divided by the product of the pooled standard deviation and the square root of (1/Na + 1/Nb).
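Written out symbolically, the verbal formula reads as follows. The symbols are notation chosen here: \bar{y}_a and \bar{y}_b for the group means, s_a^2 and s_b^2 for the sample variances, and n_a and n_b for the group sizes (Na and Nb in the text).

```latex
t = \frac{\bar{y}_a - \bar{y}_b}{s_p \sqrt{\dfrac{1}{n_a} + \dfrac{1}{n_b}}},
\qquad
s_p^2 = \frac{(n_a - 1)\, s_a^2 + (n_b - 1)\, s_b^2}{n_a + n_b - 2},
\qquad
\text{df} = n_a + n_b - 2
```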
I use the following commands to calculate the mean loaf volume for dough rested at 24C and at 30C:
mean(data$LoafVolume_mL[as.character(data$Temperature) == "24C"], na.rm = TRUE) = 809.625
mean(data$LoafVolume_mL[as.character(data$Temperature) == "30C"], na.rm = TRUE) = 740.75
I also calculate the sample variance at 24C and at 30C:
var(data$LoafVolume_mL[as.character(data$Temperature) == "24C"], na.rm = TRUE) = 490.55
var(data$LoafVolume_mL[as.character(data$Temperature) == "30C"], na.rm = TRUE) = 188.5
These give the pooled variance: ((8-1)(490.55) + (8-1)(188.5)) / 14 = 339.525
Finally, we can calculate the t statistic: t = (809.625 - 740.75) / (sqrt(339.525) * sqrt(1/8 + 1/8)) ≈ 7.476. The t statistic for these two samples, with 14 degrees of freedom, is approximately 7.476.
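The hand calculation can be repeated in R without rounding the intermediate values. This is a minimal sketch, assuming the loaf volumes are copied into two vectors; vol_24, vol_30, sp2 and t_stat are illustrative names, not objects from the original analysis.

```r
# Loaf volumes (mL), copied from the data frame defined at the top of the document
vol_24 <- c(812, 775, 840, 798, 821, 790, 835, 806)
vol_30 <- c(742, 760, 735, 718, 755, 729, 748, 739)

n_a <- length(vol_24)
n_b <- length(vol_30)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n_a - 1) * var(vol_24) + (n_b - 1) * var(vol_30)) / (n_a + n_b - 2)

# Pooled two-sample t statistic with n_a + n_b - 2 degrees of freedom
t_stat <- (mean(vol_24) - mean(vol_30)) / (sqrt(sp2) * sqrt(1 / n_a + 1 / n_b))

sp2
t_stat
```

Because nothing is rounded along the way, t_stat should agree with the t value printed by t.test with var.equal = TRUE.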
2.5 1(d)
What is the p-value of this test? Interpret the probability in the context of the problem.
The two-sided p-value (using the command pt(7.47567, 14, lower.tail = FALSE, log.p = FALSE) * 2, or a t table) is 2.987 x 10^-6. If the null hypothesis were true, that is, if the mean loaf volume of dough rested at 24C equaled the mean loaf volume of dough rested at 30C, the probability of generating a t value at least as extreme as 7.4757 with 14 degrees of freedom would be about 3 in a million (far less than 0.001). Because the research question is one-sided (is the mean loaf volume greater at the lower temperature?) and the observed t is positive, the one-sided p-value is half of this, about 1.49 x 10^-6. Either way, this provides very strong evidence against the null hypothesis and in favor of the alternative: the mean loaf volume is greater when the dough is rested at 24C than at 30C.
We can confirm the t statistic and the p-value with the following command and its output:
t.test(LoafVolume_mL ~ Temperature, data = data, var.equal = TRUE)
Two Sample t-test
data: LoafVolume_mL by Temperature
t = 7.4757, df = 14, p-value = 2.987e-06
alternative hypothesis: true difference in means between group 24C and group 30C is not equal to 0
95 percent confidence interval:
 49.11481 88.63519
sample estimates:
mean in group 24C mean in group 30C
          809.625           740.750
Readable interpretation: t = 7.4757, df = 14, p-value = 2.987 x 10^-6; mean in 24C = 809.625, mean in 30C = 740.750.
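Since the question asks whether the mean is greater in the lower temperature group, a one-sided version of the same test is also worth a look. This is a minimal sketch, assuming the data are rebuilt in a data frame named bread (a name introduced here to keep the example self-contained). With the formula interface, alternative = "greater" tests whether the first factor level (24C) has the larger mean.

```r
# Rebuild the bread data so the example runs on its own
bread <- data.frame(
  Temperature = factor(rep(c("24C", "30C"), each = 8), levels = c("24C", "30C")),
  LoafVolume_mL = c(
    812, 775, 840, 798, 821, 790, 835, 806,
    742, 760, 735, 718, 755, 729, 748, 739
  )
)

# Two-sided pooled t-test (the default alternative)
two_sided <- t.test(LoafVolume_mL ~ Temperature, data = bread, var.equal = TRUE)

# One-sided pooled t-test: is the 24C mean greater than the 30C mean?
one_sided <- t.test(LoafVolume_mL ~ Temperature, data = bread,
                    var.equal = TRUE, alternative = "greater")

two_sided$p.value
one_sided$p.value
```

Because the observed t statistic is positive, the one-sided p-value is exactly half of the two-sided one.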
3 Problem 2
3.1 Background
A randomized, double-blind, placebo-controlled study was conducted to examine whether the daily duration of a guided breathing intervention affects reductions in systolic blood pressure. Participants with mildly elevated blood pressure were randomly assigned to one of four groups: a placebo audio track (Control) or an active guided breathing session lasting 5, 10, or 20 minutes per day. The intervention was followed for 10 weeks.
The response variable is the reduction in systolic blood pressure (in mmHg) measured at the end of the study. Larger values indicate greater improvement.
bp_data <- data.frame(
Group = factor(
rep(
c("Control",
"Breathing_5min",
"Breathing_10min",
"Breathing_20min"),
each = 20
),
levels = c(
"Control",
"Breathing_5min",
"Breathing_10min",
"Breathing_20min"
)
),
SBP_Reduction = c(
## Control (placebo)
3.2, 4.7, 2.9, 5.1, 4.0, 3.8, 5.6, 4.9, 6.1, 3.5,
4.3, 5.0, 3.9, 4.6, 2.8, 6.4, 4.1, 5.3, 3.7, 4.8,
## Breathing – 5 minutes/day
5.1, 6.4, 4.9, 6.7, 5.8, 6.2, 4.6, 7.1, 6.9, 5.3,
7.5, 4.8, 5.6, 6.0, 5.9, 6.8, 7.3, 5.4, 6.6, 5.7,
## Breathing – 10 minutes/day
6.8, 7.2, 6.5, 7.9, 8.1, 7.4, 6.1, 8.5, 7.6, 6.9,
9.0, 6.7, 7.8, 8.2, 7.1, 8.4, 6.3, 7.5, 8.7, 7.0,
## Breathing – 20 minutes/day
7.4, 8.9, 7.8, 9.6, 8.1, 9.2, 7.6, 10.1, 8.7, 9.4,
10.3, 7.9, 8.5, 9.0, 8.8, 9.7, 10.0, 8.3, 9.5, 9.1
)
)
str(bp_data)
'data.frame': 80 obs. of 2 variables:
$ Group : Factor w/ 4 levels "Control","Breathing_5min",..: 1 1 1 1 1 1 1 1 1 1 ...
$ SBP_Reduction: num 3.2 4.7 2.9 5.1 4 3.8 5.6 4.9 6.1 3.5 ...
head(bp_data)
Group SBP_Reduction
1 Control 3.2
2 Control 4.7
3 Control 2.9
4 Control 5.1
5 Control 4.0
6 Control 3.8
Is there significant evidence that the mean reduction in systolic blood pressure differs between any of the treatment groups? We will perform an analysis of variance (ANOVA) to determine.
Was told to disregard this part
The data are available in HW1_PEMF.csv posted in the Modules tab under the “Homework Data Files” section.
Is there significant evidence that the mean BMD differs between any treatment groups? We will perform an analysis of variance (ANOVA) to determine.
3.2 2(a)
What is the appropriate statement of hypotheses for this problem?
H0: the mean reduction in systolic blood pressure is the same in all four groups (μ_Control = μ_5min = μ_10min = μ_20min).
Ha: at least one group's mean reduction differs from the others.
3.3 2(b)
Complete the ANOVA table: sums of squares, degrees of freedom, mean squares, the F-statistic, and the p-value.
aov(SBP_Reduction ~ Group, data = bp_data)
Call: aov(formula = SBP_Reduction ~ Group, data = bp_data)
Terms:
                   Group Residuals
Sum of Squares  220.2574   59.9825
Deg. of Freedom        3        76
Residual standard error: 0.8883937
Readable interpretation: SSGroup (Treatment) = 220.2574, dfGroup (Treatment) = 3, SSResiduals = 59.9825, dfResiduals = 76.
The output above gives the treatment and residual sums of squares, along with the treatment degrees of freedom (k - 1 = 4 - 1 = 3) and the residual degrees of freedom (n - k = 80 - 4 = 76). The F statistic is the ratio of the mean square between treatment groups (SStreatment / df treatment) to the mean squared error (SSresiduals / df residuals): (220.2574/3) / (59.9825/76) = 93.02. We can then calculate the p-value with the command pf(93.02, 3, 76, lower.tail=FALSE, log.p=FALSE), which gives 2.26 x 10^-25, a value extremely close to zero. We can confirm this by storing the fitted model as an object and using the summary command.
anova <- aov(formula = SBP_Reduction ~ Group, data = bp_data)
summary(anova)
            Df Sum Sq Mean Sq F value Pr(>F)
Group        3 220.26   73.42   93.02 <2e-16 ***
Residuals   76  59.98    0.79
Readable interpretation: SSGroup (Treatment), dfGroup, SSResiduals, and dfResiduals are all the same as above. The F value is 93.02, and the p-value is reported as less than 2 x 10^-16. This is consistent with the value calculated using the pf command.
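Each entry of the ANOVA table can also be rebuilt from its definition. This is a minimal sketch, assuming the 80 responses are copied into a vector sbp with a matching factor group; these names, and ss_treat, ss_resid, f_stat and p_value, are introduced here for illustration.

```r
# SBP reductions (mmHg), copied from bp_data in group order
sbp <- c(
  3.2, 4.7, 2.9, 5.1, 4.0, 3.8, 5.6, 4.9, 6.1, 3.5,   # Control
  4.3, 5.0, 3.9, 4.6, 2.8, 6.4, 4.1, 5.3, 3.7, 4.8,
  5.1, 6.4, 4.9, 6.7, 5.8, 6.2, 4.6, 7.1, 6.9, 5.3,   # 5 minutes/day
  7.5, 4.8, 5.6, 6.0, 5.9, 6.8, 7.3, 5.4, 6.6, 5.7,
  6.8, 7.2, 6.5, 7.9, 8.1, 7.4, 6.1, 8.5, 7.6, 6.9,   # 10 minutes/day
  9.0, 6.7, 7.8, 8.2, 7.1, 8.4, 6.3, 7.5, 8.7, 7.0,
  7.4, 8.9, 7.8, 9.6, 8.1, 9.2, 7.6, 10.1, 8.7, 9.4,  # 20 minutes/day
  10.3, 7.9, 8.5, 9.0, 8.8, 9.7, 10.0, 8.3, 9.5, 9.1
)
lv <- c("Control", "Breathing_5min", "Breathing_10min", "Breathing_20min")
group <- factor(rep(lv, each = 20), levels = lv)

k <- nlevels(group)   # number of groups
n <- length(sbp)      # total sample size

# Between-group (treatment) sum of squares
group_means <- tapply(sbp, group, mean)
group_sizes <- tapply(sbp, group, length)
ss_treat <- sum(group_sizes * (group_means - mean(sbp))^2)

# Within-group (residual) sum of squares; ave() returns each observation's group mean
ss_resid <- sum((sbp - ave(sbp, group))^2)

# Mean squares, F statistic, and upper-tail p-value
ms_treat <- ss_treat / (k - 1)
ms_resid <- ss_resid / (n - k)
f_stat <- ms_treat / ms_resid
p_value <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

c(SS_treatment = ss_treat, SS_residual = ss_resid, F = f_stat, p = p_value)
```

The sums of squares should match the aov output (220.2574 and 59.9825), and the F statistic follows directly from them.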
3.4 2(c)
Interpret the p-value in the context of the problem.
The p-value indicates that if the null hypothesis is true, that is, if there is no difference in mean systolic blood pressure reduction between any of the treatment groups (the placebo and the three durations of guided breathing), the probability of generating an F value as extreme as or more extreme than the one we observed (the ratio of the mean square between treatments to the mean squared error) is 2.26 x 10^-25, i.e. essentially zero. This is very strong evidence that at least one group's mean reduction differs from the others.