- TO DO:

Double check degrees of freedom vs. sample size in equations

- The difference between 2 means is used to calculate the t-statistic
- Another component is the standard error (SE) for the difference between 2 means
- The difference between 2 means is the effect size for a t-test (and any comparison of two groups; technically, an unstandardized effect size)
- The SE of the difference tells us about the uncertainty around the estimate of the effect size
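
As a brief sketch of how these two components fit together: the t-statistic for an unpaired t-test is the difference between the means divided by the SE of that difference. The symbols \(\bar{x}_1\) and \(\bar{x}_2\) for the two sample means are notation assumed here for illustration; \(SE_d\) is the SE of the difference.

\[t = \frac{\bar{x}_1 - \bar{x}_2}{SE_d}\]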

- 1st, I’ll build up the components needed to calculate the difference between 2 means using data from Motulsky’s Intuitive Biostats, Chapter 30.
- 2nd, I’ll make some functions that will help me do the calculations
- 3rd, I’ll calculate the SE of the difference

Motulsky 2nd Ed, Chapter 30, page 220, Table 30.1. Maximal relaxation of muscle strips of old and young rat bladders stimulated w/ high concentrations of norepinephrine (Frazier et al 2006). Response variable is %E.max

This is not a standard R format; it is used here only for illustration and for displaying the data

```
# The data (young has only 8 observations, so it is padded with NA)
df.rats <- data.frame(old   = c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3),
                      young = c(45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5, NA))
# Means
## Calculate means (na.rm = TRUE skips the NA padding)
mean.old   <- with(df.rats, mean(old))
mean.young <- with(df.rats, mean(young, na.rm = TRUE))
## Put means into a vector
means.rats <- c(mean.old, mean.young)
## Add means to data frame
df.rats2 <- rbind(df.rats, means.rats)
## Name the row that the means are in
row.names(df.rats2)[nrow(df.rats2)] <- "means"
# Standard deviation
## Calculate SD and add it to the data frame
sd.old   <- with(df.rats, sd(old))
sd.young <- with(df.rats, sd(young, na.rm = TRUE))
sd.rats  <- c(sd.old, sd.young)
df.rats2 <- rbind(df.rats2, sd.rats)
row.names(df.rats2)[nrow(df.rats2)] <- "SD"
# Sample size
## Count the non-missing observations in each group (9 old, 8 young)
n.old   <- sum(!is.na(df.rats$old))
n.young <- sum(!is.na(df.rats$young))
n.rats  <- c(n.old, n.young)
df.rats2 <- rbind(df.rats2, n.rats)
row.names(df.rats2)[nrow(df.rats2)] <- "N"
# Difference between means
## Calculate difference between the means (young - old)
diff.means <- df.rats2["means", 2] -
  df.rats2["means", 1]
## Add difference to data frame
df.rats2 <- rbind(df.rats2, c(diff.means, NA))
## Label row
row.names(df.rats2)[nrow(df.rats2)] <- "Diff. means"
```

- Here’s the finished data table
- Again, this is NOT a standard R format for working with data; this was just done to display the data like one might do in a spreadsheet

```
#pander package makes nice tables
library(pander)
```


`pander(df.rats2)`

| | old | young |
|---|---|---|
| 1 | 20.8 | 45.5 |
| 2 | 2.8 | 55 |
| 3 | 50 | 60.7 |
| 4 | 33.3 | 61.5 |
| 5 | 29.4 | 61.1 |
| 6 | 38.9 | 65.5 |
| 7 | 29.4 | 42.9 |
| 8 | 52.6 | 37.5 |
| 9 | 14.3 | NA |
| means | 30.17 | 53.71 |
| SD | 16.09 | 10.36 |
| N | 9 | 8 |
| Diff. means | 23.55 | NA |
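
For contrast with the spreadsheet-style layout, here is a minimal sketch of what a more conventional "long" layout could look like, with one row per observation. The names `rats.long`, `age`, and `Emax` are hypothetical and chosen only for this illustration; the numbers are the same values as in Table 30.1.

```r
# Hypothetical "long" layout: one row per muscle strip,
# with a grouping column (age) and a response column (Emax)
rats.long <- data.frame(
  age  = rep(c("old", "young"), times = c(9, 8)),
  Emax = c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3,
           45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5)
)

# Group summaries; no NA padding is needed in this layout
group.means <- tapply(rats.long$Emax, rats.long$age, mean)
group.sds   <- tapply(rats.long$Emax, rats.long$age, sd)
group.ns    <- tapply(rats.long$Emax, rats.long$age, length)

group.means
group.sds
group.ns
```

In this layout the unequal group sizes are handled naturally, because each group simply contributes as many rows as it has observations.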

**Steps**

- Pool the variances of each sample
- Calc. SE of the difference using pooled variance

I’ll make some functions that do this

(I need to double check this; I am not very good at writing code for equations yet)

In words, in terms of variance: \[var_p = \frac{df_1 \times var_1 + df_2 \times var_2}{df_1 + df_2}\]

In words, in terms of standard deviation (SD): \[var_p = \frac{df_1 \times SD^2_1 + df_2 \times SD^2_2}{df_1 + df_2}\]

In symbols, in terms of the variance (sigma^2): \[\sigma^2_p = \frac{df_1 \times \sigma^2_1 + df_2 \times \sigma^2_2}{df_1 + df_2}\]
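
As a worked check of the formula, here is the arithmetic for the rat data, using the sample SDs of the two groups (about 16.09464 for old and 10.36373 for young) and degrees of freedom of 9 - 1 = 8 and 8 - 1 = 7. The intermediate values are rounded, so treat this as approximate.

\[var_p = \frac{8 \times 16.09464^2 + 7 \times 10.36373^2}{8 + 7} \approx \frac{2072.3 + 751.8}{15} \approx 188.3\]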

Give this function the two sets of degrees of freedom and the two standard deviations, and it returns the pooled variance.

```
#Formula for POOLED variance
## Note the formula squares each SD to get the variance
var.pooled <- function(df1,df2,SD1,SD2){
  (df1*SD1^2 + df2*SD2^2)/(df1+df2)
}
```

### Calculate pooled variance using function

Using data from our data frame df.rats2

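- 1st, get the sample sizes (N) into an object w/ 2 elements (called dfs below); the degrees of freedom, df = N - 1, are calculated when the function is called
- 2nd, get the standard deviations (SD) into an object w/ 2 elements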

```
# Calculate pooled variance
## extract N and SD for easy access
## (note: despite its name, dfs holds the sample sizes, N; df = N - 1 is applied later)
dfs <- df.rats2["N", ]
SDs <- df.rats2["SD",]
dfs
```

```
## old young
## N 9 8
```

`SDs`

```
## old young
## SD 16.09464 10.36373
```

```
## Apply function for pooled variance
### Note: df = sample size - 1
### [[ ]] pulls each value out of the one-row data frame as a plain number
var.pool <- var.pooled(df1 = dfs[[1]] - 1,
                       df2 = dfs[[2]] - 1,
                       SD1 = SDs[[1]],
                       SD2 = SDs[[2]])
## Add pooled variance to the data frame
df.rats2["var.pooled",] <- c(var.pool,NA)
```

\[SE_d = \sqrt{var_p \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

\[SE_d = \sqrt{\sigma_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

- SE of the difference aka SE of the effect size
- Based on pooled variance, var.p
- Weighted by each sample size
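
As a worked check, here is the arithmetic for the rat data, using a pooled variance of about 188.3 and sample sizes of 9 and 8. The intermediate values are rounded, so the result is approximate.

\[SE_d = \sqrt{188.3 \left(\frac{1}{9} + \frac{1}{8}\right)} \approx \sqrt{188.3 \times 0.2361} \approx \sqrt{44.46} \approx 6.67\]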

This function takes the pooled variance and the two sample sizes.

```
# Standard error of difference
## Note that this uses sample size, NOT degrees of freedom (df)
SE.diff <- function(var.pool, n1,n2){
sqrt(var.pool*(1/n1 + 1/n2))
}
```

```
#Apply function
## Note: uses sample size, NOT df
se.dif <- SE.diff(var.pool,
                  n1 = dfs[[1]],
                  n2 = dfs[[2]])
#Update data frame
df.rats2["SE.diff",] <- c(se.dif, NA)
```

**Table 30.2: Results of unpaired t-test.** Motulsky Chapter 30, page 221.

`pander(round(df.rats2,3))`

| | old | young |
|---|---|---|
| 1 | 20.8 | 45.5 |
| 2 | 2.8 | 55 |
| 3 | 50 | 60.7 |
| 4 | 33.3 | 61.5 |
| 5 | 29.4 | 61.1 |
| 6 | 38.9 | 65.5 |
| 7 | 29.4 | 42.9 |
| 8 | 52.6 | 37.5 |
| 9 | 14.3 | NA |
| means | 30.17 | 53.71 |
| SD | 16.09 | 10.36 |
| N | 9 | 8 |
| Diff. means | 23.55 | NA |
| var.pooled | 188.3 | NA |
| SE.diff | 6.667 | NA |
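
As a cross-check on the hand calculations, here is a minimal, self-contained sketch that recomputes the pooled variance and the SE of the difference from the raw data, forms the t-statistic, and compares it with R's built-in `t.test()` run with `var.equal = TRUE` (the equal-variance, pooled version). The object names ending in `.check`, plus `t.by.hand` and `t.builtin`, are hypothetical names used only for this illustration.

```r
# Raw data from Table 30.1 (the NA padding is dropped here)
old   <- c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3)
young <- c(45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5)

# Sample sizes and degrees of freedom (df = n - 1)
n.old    <- length(old)
n.young  <- length(young)
df.old   <- n.old - 1
df.young <- n.young - 1

# Pooled variance: df-weighted average of the two sample variances
var.pool.check <- (df.old * var(old) + df.young * var(young)) /
  (df.old + df.young)

# SE of the difference uses the sample sizes, NOT the df
se.diff.check <- sqrt(var.pool.check * (1 / n.old + 1 / n.young))

# t = difference between means / SE of the difference
t.by.hand <- (mean(young) - mean(old)) / se.diff.check

# Built-in equal-variance t-test for comparison
t.builtin <- t.test(young, old, var.equal = TRUE)

# These two values should agree
t.by.hand
unname(t.builtin$statistic)
```

If the two printed t-statistics match, that supports the use of degrees of freedom in the pooled variance and sample sizes in the SE of the difference.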