STAT 4160 — Homework #1

1 Honor Policy

Homework 1 is to be submitted individually. You may work with other students in the class, but you must write up your own responses (and submit the HW individually).

You MUST include the names of the students you worked with.

Worked with: None

2 Problem 1

2.1 Background

A food scientist is studying how dough resting temperature affects the final loaf volume of a yeasted bread. After mixing, each dough batch is rested (proofed) at one of two temperatures before baking. The response is the loaf volume (in mL). Assume all runs were conducted in a random order.

data <- data.frame(
  Temperature = factor(
    c(rep("24C", 8), rep("30C", 8)),
    levels = c("24C", "30C")
  ),
  LoafVolume_mL = c(
    812, 775, 840, 798, 821, 790, 835, 806,
    742, 760, 735, 718, 755, 729, 748, 739
  )
)

   Temperature LoafVolume_mL
1          24C           812
2          24C           775
3          24C           840
4          24C           798
5          24C           821
6          24C           790
7          24C           835
8          24C           806
9          30C           742
10         30C           760
11         30C           735
12         30C           718
13         30C           755
14         30C           729
15         30C           748
16         30C           739

Is there significant evidence that the mean loaf volume is greater in the lower temperature group? We will perform a two-sample t-test to determine.

2.2 1(a)

What is the appropriate statement of hypotheses for this problem?

Answer (Problem 1a)

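Let μ_24 and μ_30 be the true mean loaf volumes (mL) for dough rested at 24C and at 30C. The hypotheses are H0: μ_24 = μ_30 (equivalently, μ_24 − μ_30 = 0) versus Ha: μ_24 > μ_30, i.e., the mean loaf volume is greater at the lower resting temperature.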

2.3 1(b)

Which t-test should we perform? Carry out an F-test for equal variance to determine.

var.test(LoafVolume_mL ~ Temperature, data = data)

Answer (Problem 1b)

The p-value of this F-test is 0.23, so at α = 0.05 we fail to reject the null hypothesis that the two groups have equal variances. We therefore assume equal variances and compare the group means with the pooled two-sample t-test.
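As a cross-check on the F-test, here is a minimal R sketch that computes the variance ratio and its two-sided p-value directly from the F(7, 7) distribution and compares them with var.test(). The vector names vol_24 and vol_30 are illustrative and are not part of the assignment.

```r
# Minimal sketch: F-test for equal variances by hand vs. var.test().
# vol_24 / vol_30 are illustrative names for the two Problem 1 groups.
vol_24 <- c(812, 775, 840, 798, 821, 790, 835, 806)
vol_30 <- c(742, 760, 735, 718, 755, 729, 748, 739)

# F statistic: ratio of sample variances (24C group in the numerator)
F_stat <- var(vol_24) / var(vol_30)

# Two-sided p-value: double the smaller tail of F(7, 7)
p_lower <- pf(F_stat, df1 = length(vol_24) - 1, df2 = length(vol_30) - 1)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

# Built-in version for comparison
vt <- var.test(vol_24, vol_30)
print(round(c(F = F_stat, p_by_hand = p_two_sided, p_var_test = vt$p.value), 4))
```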

2.4 1(c)

Solve for the appropriate t-statistic.

t.test(LoafVolume_mL ~ Temperature, data = data, var.equal = TRUE)

Answer (Problem 1c)

For a pooled two-sample t-test, the t statistic is the difference between the mean loaf volume at 24C (group A) and the mean loaf volume at 30C (group B), divided by the pooled standard deviation times the square root of 1/n_A + 1/n_B:

t = (ȳ_A − ȳ_B) / (s_p × sqrt(1/n_A + 1/n_B)), with n_A + n_B − 2 = 14 degrees of freedom.

I used the following command to calculate the mean loaf volume for dough rested at 24C:

mean(data$LoafVolume_mL[as.character(data$Temperature) == "24C"], na.rm = TRUE) = 809.625

and at 30C:

mean(data$LoafVolume_mL[as.character(data$Temperature) == "30C"], na.rm = TRUE) = 740.75

I also calculated the sample variance at 24C:

var(data$LoafVolume_mL[as.character(data$Temperature) == "24C"], na.rm = TRUE) = 490.55

and at 30C:

var(data$LoafVolume_mL[as.character(data$Temperature) == "30C"], na.rm = TRUE) = 188.5

These give the pooled variance:

s_p^2 = ((8 − 1)(490.55) + (8 − 1)(188.5)) / 14 ≈ 339.53

Finally, t = (809.625 − 740.75) / (sqrt(339.53) × sqrt(1/8 + 1/8)) ≈ 7.476. With 8 + 8 − 2 = 14 degrees of freedom, the t statistic for these two samples is 7.476.
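The separate mean() and var() calls above can be collapsed with tapply(). The sketch below (the object names n, m, s2, sp2, and t_stat are illustrative) recomputes the pooled variance and t statistic without rounding the intermediate values.

```r
# Sketch: pooled two-sample t statistic by hand, using tapply() per group.
data <- data.frame(
  Temperature = factor(rep(c("24C", "30C"), each = 8), levels = c("24C", "30C")),
  LoafVolume_mL = c(812, 775, 840, 798, 821, 790, 835, 806,
                    742, 760, 735, 718, 755, 729, 748, 739)
)

# Group sizes, means, and variances (named by factor level)
n  <- tapply(data$LoafVolume_mL, data$Temperature, length)
m  <- tapply(data$LoafVolume_mL, data$Temperature, mean)
s2 <- tapply(data$LoafVolume_mL, data$Temperature, var)

# Pooled variance: df-weighted average of the two sample variances
sp2 <- sum((n - 1) * s2) / (sum(n) - 2)

# Pooled t statistic for mean(24C) - mean(30C)
t_stat <- unname((m["24C"] - m["30C"]) / (sqrt(sp2) * sqrt(1 / n["24C"] + 1 / n["30C"])))
print(round(c(pooled_var = sp2, t = t_stat), 4))
```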

2.5 1(d)

What is the p-value of this test? Interpret the probability in the context of the problem.

Answer (Problem 1d)

Using pt(7.47567, 14, lower.tail = FALSE) * 2 (or a t table), the two-sided p-value is 2.987 × 10^-6. This is the probability, assuming the null hypothesis that the mean loaf volumes at 24C and 30C are equal, of obtaining a t statistic at least as extreme as 7.4757 with 14 degrees of freedom. Because the question asks whether the mean is greater at the lower temperature, the matching one-sided p-value is half of this, about 1.49 × 10^-6. Either way the p-value is far below 0.001, so we reject the null hypothesis: there is very strong evidence that the mean loaf volume is greater when dough rests at 24C than at 30C (the sample means differ by about 69 mL). Assuming the runs were conducted in random order as stated, it is reasonable to attribute this difference to the resting temperature, i.e., raising the resting temperature from 24C to 30C decreased loaf volume.
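The sketch below shows where both p-values come from: the two-sided value that t.test() reports by default, and the one-sided value for the alternative that the 24C mean is greater than the 30C mean, obtained with alternative = "greater". Because 24C is the first factor level, t.test() works with mean(24C) − mean(30C), so "greater" is the direction the question asks about. Object names other than data are illustrative.

```r
# Sketch: two-sided vs. one-sided p-values for the pooled t-test.
data <- data.frame(
  Temperature = factor(rep(c("24C", "30C"), each = 8), levels = c("24C", "30C")),
  LoafVolume_mL = c(812, 775, 840, 798, 821, 790, 835, 806,
                    742, 760, 735, 718, 755, 729, 748, 739)
)

# Default alternative is two-sided
two_sided <- t.test(LoafVolume_mL ~ Temperature, data = data, var.equal = TRUE)
# One-sided: mean(24C) - mean(30C) > 0
one_sided <- t.test(LoafVolume_mL ~ Temperature, data = data, var.equal = TRUE,
                    alternative = "greater")

# The same p-values straight from the t distribution with 14 df
t_obs <- unname(two_sided$statistic)
p_two <- 2 * pt(t_obs, df = 14, lower.tail = FALSE)
p_one <- pt(t_obs, df = 14, lower.tail = FALSE)

print(signif(c(two_sided = p_two, one_sided = p_one), 4))
```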

We can confirm the t statistic and p-value above with the following command and its output:

t.test(LoafVolume_mL ~ Temperature, data = data, var.equal = TRUE)

     Two Sample t-test

data:  LoafVolume_mL by Temperature
t = 7.4757, df = 14, p-value = 2.987e-06
alternative hypothesis: true difference in means between group 24C and group 30C is not equal to 0
95 percent confidence interval:
 49.11481 88.63519
sample estimates:
mean in group 24C mean in group 30C 
          809.625           740.750 
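The 95 percent confidence interval in this output is the difference in sample means plus or minus the t critical value times the standard error of the difference. A short sketch reproducing it (the names vol_24, vol_30, sp, se, and ci are illustrative):

```r
# Sketch: pooled 95% confidence interval for mean(24C) - mean(30C) by hand.
vol_24 <- c(812, 775, 840, 798, 821, 790, 835, 806)
vol_30 <- c(742, 760, 735, 718, 755, 729, 748, 739)
n_24 <- length(vol_24)
n_30 <- length(vol_30)

diff_means <- mean(vol_24) - mean(vol_30)
# Pooled SD, then standard error of the difference in means
sp <- sqrt(((n_24 - 1) * var(vol_24) + (n_30 - 1) * var(vol_30)) / (n_24 + n_30 - 2))
se <- sp * sqrt(1 / n_24 + 1 / n_30)
# Two-sided 95% critical value with 14 df
t_crit <- qt(0.975, df = n_24 + n_30 - 2)
ci <- diff_means + c(-1, 1) * t_crit * se
print(round(ci, 5))
```

The interval lies entirely above 0, which agrees with the conclusion that the 24C mean is larger, by roughly 49 to 89 mL.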

Readable interpretation: t = 7.4757, df = 14, p-value = 2.987 × 10^-6; mean in 24C = 809.625 mL, mean in 30C = 740.750 mL.

3 Problem 2

3.1 Background

A randomized, double-blind, placebo-controlled study was conducted to examine whether the daily duration of a guided breathing intervention affects reductions in systolic blood pressure. Participants with mildly elevated blood pressure were randomly assigned to one of four groups: a placebo audio track (Control) or an active guided breathing session lasting 5, 10, or 20 minutes per day. The intervention was followed for 10 weeks.

The response variable is the reduction in systolic blood pressure (in mmHg) measured at the end of the study. Larger values indicate greater improvement.

bp_data <- data.frame(
  Group = factor(
    rep(
      c("Control",
        "Breathing_5min",
        "Breathing_10min",
        "Breathing_20min"),
      each = 20
    ),
    levels = c(
      "Control",
      "Breathing_5min",
      "Breathing_10min",
      "Breathing_20min"
    )
  ),
  SBP_Reduction = c(
    ## Control (placebo)
    3.2, 4.7, 2.9, 5.1, 4.0, 3.8, 5.6, 4.9, 6.1, 3.5,
    4.3, 5.0, 3.9, 4.6, 2.8, 6.4, 4.1, 5.3, 3.7, 4.8,

    ## Breathing – 5 minutes/day
    5.1, 6.4, 4.9, 6.7, 5.8, 6.2, 4.6, 7.1, 6.9, 5.3,
    7.5, 4.8, 5.6, 6.0, 5.9, 6.8, 7.3, 5.4, 6.6, 5.7,

    ## Breathing – 10 minutes/day
    6.8, 7.2, 6.5, 7.9, 8.1, 7.4, 6.1, 8.5, 7.6, 6.9,
    9.0, 6.7, 7.8, 8.2, 7.1, 8.4, 6.3, 7.5, 8.7, 7.0,

    ## Breathing – 20 minutes/day
    7.4, 8.9, 7.8, 9.6, 8.1, 9.2, 7.6, 10.1, 8.7, 9.4,
    10.3, 7.9, 8.5, 9.0, 8.8, 9.7, 10.0, 8.3, 9.5, 9.1
  )
)

str(bp_data)

'data.frame':   80 obs. of  2 variables:
 $ Group        : Factor w/ 4 levels "Control","Breathing_5min",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ SBP_Reduction: num  3.2 4.7 2.9 5.1 4 3.8 5.6 4.9 6.1 3.5 ...

head(bp_data)

    Group SBP_Reduction
1 Control           3.2
2 Control           4.7
3 Control           2.9
4 Control           5.1
5 Control           4.0
6 Control           3.8

Is there significant evidence that the mean reduction in systolic blood pressure differs between any of the treatment groups? We will perform an analysis of variance (ANOVA) to determine.
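Before fitting the ANOVA, it can help to look at the group means and standard deviations. A minimal sketch using tapply(), rebuilding bp_data from the values above (grp_means and grp_sds are illustrative names):

```r
# Sketch: descriptive statistics by group for the blood pressure data.
lv <- c("Control", "Breathing_5min", "Breathing_10min", "Breathing_20min")
bp_data <- data.frame(
  Group = factor(rep(lv, each = 20), levels = lv),
  SBP_Reduction = c(
    3.2, 4.7, 2.9, 5.1, 4.0, 3.8, 5.6, 4.9, 6.1, 3.5,
    4.3, 5.0, 3.9, 4.6, 2.8, 6.4, 4.1, 5.3, 3.7, 4.8,
    5.1, 6.4, 4.9, 6.7, 5.8, 6.2, 4.6, 7.1, 6.9, 5.3,
    7.5, 4.8, 5.6, 6.0, 5.9, 6.8, 7.3, 5.4, 6.6, 5.7,
    6.8, 7.2, 6.5, 7.9, 8.1, 7.4, 6.1, 8.5, 7.6, 6.9,
    9.0, 6.7, 7.8, 8.2, 7.1, 8.4, 6.3, 7.5, 8.7, 7.0,
    7.4, 8.9, 7.8, 9.6, 8.1, 9.2, 7.6, 10.1, 8.7, 9.4,
    10.3, 7.9, 8.5, 9.0, 8.8, 9.7, 10.0, 8.3, 9.5, 9.1
  )
)

# Mean and SD of the SBP reduction (mmHg) within each group
grp_means <- tapply(bp_data$SBP_Reduction, bp_data$Group, mean)
grp_sds   <- tapply(bp_data$SBP_Reduction, bp_data$Group, sd)
print(round(rbind(mean = grp_means, sd = grp_sds), 3))
```

The sample means rise steadily from Control (about 4.4 mmHg) to the 20-minute group (about 8.9 mmHg); the ANOVA tests whether differences this large could plausibly arise by chance.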

I was told to disregard the following part:

The data are available in HW1_PEMF.csv posted in the Modules tab under the “Homework Data Files” section.

Is there significant evidence that the mean BMD differs between any treatment groups? We will perform an analysis of variance (ANOVA) to determine.

3.2 2(a)

What is the appropriate statement of hypotheses for this problem?

Answer (Problem 2a)

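Let μ_C, μ_5, μ_10, and μ_20 be the true mean reductions in systolic blood pressure (mmHg) for the Control, 5-minute, 10-minute, and 20-minute groups. The hypotheses are H0: μ_C = μ_5 = μ_10 = μ_20 versus Ha: at least one group mean differs from the others.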

3.3 2(b)

Complete the ANOVA table: sums of squares, degrees of freedom, mean squares, the F-statistic, and the p-value.

aov(SBP_Reduction ~ Group, data = bp_data)

Answer (Problem 2b)

Call:
   aov(formula = SBP_Reduction ~ Group, data = bp_data)

Terms:
                   Group Residuals
Sum of Squares  220.2574   59.9825
Deg. of Freedom        3        76

Residual standard error: 0.8883937

Readable interpretation: SS_Group (treatment) = 220.2574, df_Group (treatment) = 3, SS_Residuals = 59.9825, df_Residuals = 76.

The output above gives SS_treatment and SS_residuals, along with the treatment degrees of freedom (k − 1 = 4 − 1 = 3) and the residual degrees of freedom (n − k = 80 − 4 = 76). The F statistic is the ratio of the mean square for treatments (SS_treatment / df_treatment) to the mean square error (SS_residuals / df_residuals):

F = (220.2574 / 3) / (59.9825 / 76) = 73.42 / 0.789 ≈ 93.02

The p-value is then pf(93.02, 3, 76, lower.tail = FALSE, log.p = FALSE), which gives about 2.26 × 10^-25, essentially zero. We can confirm this by storing the fitted model as an object and calling summary() on it.
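The hand calculation above can be scripted. This sketch (object names are illustrative) computes the between-group and within-group sums of squares directly, then compares the resulting F statistic and p-value with the unrounded values stored in summary(aov(...)), which prints its p-value only as <2e-16.

```r
# Sketch: one-way ANOVA quantities by hand vs. the values stored by summary.aov().
lv <- c("Control", "Breathing_5min", "Breathing_10min", "Breathing_20min")
bp_data <- data.frame(
  Group = factor(rep(lv, each = 20), levels = lv),
  SBP_Reduction = c(
    3.2, 4.7, 2.9, 5.1, 4.0, 3.8, 5.6, 4.9, 6.1, 3.5,
    4.3, 5.0, 3.9, 4.6, 2.8, 6.4, 4.1, 5.3, 3.7, 4.8,
    5.1, 6.4, 4.9, 6.7, 5.8, 6.2, 4.6, 7.1, 6.9, 5.3,
    7.5, 4.8, 5.6, 6.0, 5.9, 6.8, 7.3, 5.4, 6.6, 5.7,
    6.8, 7.2, 6.5, 7.9, 8.1, 7.4, 6.1, 8.5, 7.6, 6.9,
    9.0, 6.7, 7.8, 8.2, 7.1, 8.4, 6.3, 7.5, 8.7, 7.0,
    7.4, 8.9, 7.8, 9.6, 8.1, 9.2, 7.6, 10.1, 8.7, 9.4,
    10.3, 7.9, 8.5, 9.0, 8.8, 9.7, 10.0, 8.3, 9.5, 9.1
  )
)
y <- bp_data$SBP_Reduction
g <- bp_data$Group

# Between-group (treatment) and within-group (error) sums of squares
ss_trt <- sum(tapply(y, g, length) * (tapply(y, g, mean) - mean(y))^2)
ss_err <- sum((y - ave(y, g))^2)  # ave() returns each observation's group mean

df_trt <- nlevels(g) - 1          # k - 1 = 3
df_err <- length(y) - nlevels(g)  # n - k = 76

F_stat <- (ss_trt / df_trt) / (ss_err / df_err)
p_val  <- pf(F_stat, df_trt, df_err, lower.tail = FALSE)

# Unrounded values stored by summary.aov() for comparison
tab <- summary(aov(SBP_Reduction ~ Group, data = bp_data))[[1]]
print(c(F_by_hand = F_stat, F_aov = tab[["F value"]][1],
        p_by_hand = p_val, p_aov = tab[["Pr(>F)"]][1]))
```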

anova <- aov(formula = SBP_Reduction ~ Group, data = bp_data)
summary(anova)

            Df Sum Sq Mean Sq F value Pr(>F)    
Group        3 220.26   73.42   93.03 <2e-16 ***
Residuals   76  59.98    0.79                   

Readable interpretation: SS_Group, df_Group, SS_Residuals, and df_Residuals are the same as above. The F value is 93.03, essentially the same as the hand calculation (93.02). The p-value is reported only as < 2 × 10^-16 because summary() does not print p-values below that threshold exactly; this is consistent with the value of about 2.26 × 10^-25 obtained from pf().

3.4 2(c)

Interpret the p-value in the context of the problem.

Answer (Problem 2c)

If the null hypothesis is true, i.e., there is no difference in mean systolic blood pressure reduction among the four treatment groups (Control and 5, 10, or 20 minutes of guided breathing per day), the probability of obtaining an F statistic (the ratio of the treatment mean square to the error mean square) at least as large as 93.02 is about 2.26 × 10^-25, essentially zero. We therefore reject the null hypothesis and conclude that the mean reduction in systolic blood pressure differs for at least one of the treatment groups.