• TO DO:

Double check degrees of freedom vs. sample size in equations

Standard error for the difference between 2 means

  • The difference between 2 means is used to calculate the t-statistic
  • Another component is the standard error (SE) for the difference between 2 means
  • The difference between 2 means is the effect size for a t-test (and any comparison of two groups; technically, an unstandardized effect size)
  • The SE of the difference tells us about the uncertainty around the estimate of the effect size
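
As a rough sketch of how these two pieces fit together, the t-statistic is the difference between the means divided by the SE of that difference. The helper name t.stat is hypothetical (it is not part of the original notes), and the inputs are the rounded values from the rat bladder example worked through below.

```r
# Hypothetical helper: t-statistic = effect size / SE of the effect size
t.stat <- function(diff.means, se.diff){
  diff.means / se.diff
}

# Rounded values taken from the rat bladder tables in this section
t.rats <- t.stat(diff.means = 23.55, se.diff = 6.667)
round(t.rats, 2)
```

A larger difference or a smaller SE both push the t-statistic further from zero.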



Example data: rat bladders

  • 1st, I’ll build up the components needed to calculate the difference between 2 means using data from Motulsky’s Intuitive Biostatistics, Chapter 30.
  • 2nd, I’ll make some functions that will help me do the calculations
  • 3rd, I’ll calculate the SE of the difference

1) Make data table & calculate summary stats

The Data

Motulsky 2nd Ed, Chapter 30, page 220, Table 30.1. Maximal relaxation of muscle strips from old and young rat bladders stimulated with high concentrations of norepinephrine (Frazier et al. 2006). The response variable is %E.max.

Create table of data and summary stats

This is not a standard R format; it is used here just for illustration and for displaying the data.

# The data
df.rats <- data.frame(old = c(20.8,2.8,50.0,33.3,29.4,38.9, 29.4,52.6,14.3),
           young = c(45.5,55.0, 60.7, 61.5, 61.1, 65.5,42.9,37.5, NA))


#Means
## Calculate means
mean.old <- with(df.rats, mean(old))
mean.young <- with(df.rats, mean(young, na.rm = T))

##put means into a vector
means.rats <- c(mean.old,mean.young)

## add means to dataframe
df.rats2 <- rbind(df.rats, means.rats)

##name the row that the means are in
row.names(df.rats2)[dim(df.rats2)[1]] <- "means"

#Standard deviation
##calc sd and add it to the dataframe
sd.old <- with(df.rats, sd(old))
sd.young <- with(df.rats, sd(young, na.rm = T))
sd.rats <- c(sd.old,sd.young)
df.rats2 <- rbind(df.rats2, sd.rats)
row.names(df.rats2)[dim(df.rats2)[1]] <- "SD"

#Sample size
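## count the non-missing observations instead of hard-coding them
n.old <- sum(!is.na(df.rats$old))     # 9 observations
n.young <- sum(!is.na(df.rats$young)) # 8 observations (the NA is excluded)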
n.rats <- c(n.old,n.young)
df.rats2 <- rbind(df.rats2, n.rats)
row.names(df.rats2)[dim(df.rats2)[1]] <- "N"


#Difference between means
## Calculate difference between the means (young - old)
diff.means <- df.rats2["means",2]-
  df.rats2["means",1]

## Add difference to the dataframe
df.rats2 <- rbind(df.rats2, c(diff.means,NA))

##Label row
row.names(df.rats2)[dim(df.rats2)[1]] <- "Diff. means"

The Data table

  • Here’s the finished data table
  • Again, this is NOT a standard R format for working with data; this was just done to display the data like one might do in a spreadsheet
#pander package makes nice tables
library(pander)
pander(df.rats2)
               old   young
1             20.8    45.5
2              2.8      55
3               50    60.7
4             33.3    61.5
5             29.4    61.1
6             38.9    65.5
7             29.4    42.9
8             52.6    37.5
9             14.3      NA
means        30.17   53.71
SD           16.09   10.36
N                9       8
Diff. means  23.55      NA
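
For comparison, here is a minimal sketch of the same data in a more conventional "long" layout, with one row per measurement and a grouping column. The names rats.long, groups, and rats.summary are hypothetical and are not used elsewhere in these notes; the values are the ones from the table above.

```r
# Long format: one row per muscle strip, with a column for age group
rats.long <- data.frame(
  age   = rep(c("old", "young"), times = c(9, 8)),
  E.max = c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3,
            45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5)
)

# Split the response by group, then summarize each group
groups <- split(rats.long$E.max, rats.long$age)
rats.summary <- data.frame(
  mean = sapply(groups, mean),
  SD   = sapply(groups, sd),
  N    = sapply(groups, length)
)
rats.summary
```

In this layout there is no need for an NA to pad the shorter group, and the summary statistics live in their own table rather than being appended as extra rows.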

2) Function to calculate the SE of the difference

Steps

  • Pool the variances of each sample
  • Calc. SE of the difference using pooled variance

I’ll make some functions that do this

Pooled variance

Equation for pooled variance

(need to double check this; am not very good at code for equations yet)

In words, in terms of variance: \[var_p = {\frac{df_1 \times var_1 + df_2 \times var_2}{df_1 + df_2}}\]

In words, in terms of standard deviation (SD): \[var_p = {\frac{df_1 \times SD^2_1 + df_2 \times SD^2_2}{df_1 + df_2}}\]

In symbols, in terms of the variance (sigma^2): \[\sigma^2_p = {\frac{df_1 \times \sigma^2_1 + df_2 \times \sigma^2_2}{df_1 + df_2}}\]


An R function to calculate the pooled variance

Give this function the two sets of degrees of freedom and the two standard deviations, and it returns the pooled variance.

#Formula for POOLED variance

## Note that the formula squares each SD to get the variance
var.pooled <- function(df1,df2,SD1,SD2){
  (df1*SD1^2 + df2*SD2^2)/(df1+df2)
}



Calculate pooled variance using function

Using data from our data frame df.rats2

  • 1st, extract the sample sizes (N) for the 2 groups; the degrees of freedom, df, are N - 1
  • 2nd, extract the standard deviations (SD) for the 2 groups
# Calculate pooled variance

## extract N and SD for easy access
## Note: despite its name, dfs holds the sample sizes (N), not the degrees of freedom
dfs <- df.rats2["N", ]
SDs <- df.rats2["SD",]

dfs
##   old young
## N   9     8
SDs
##         old    young
## SD 16.09464 10.36373
## Apply function for pooled variance

### Note: df = sample size - 1
### [[ ]] pulls out plain numbers rather than one-column dataframes
var.pool <-var.pooled(df1 = dfs[[1]]-1,
                      df2 = dfs[[2]]-1,
                      SD1 = SDs[[1]],
                      SD2 = SDs[[2]])

## Add pooled variance to dataframe
df.rats2["var.pooled",] <- c(var.pool,NA)
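
As a cross-check on the df-weighted formula, the pooled variance can also be computed straight from the raw data: add up the squared deviations of each observation from its own group mean, then divide by the total degrees of freedom. This is only a sketch; the names old, young, ss.old, ss.young, and var.pool.check are hypothetical and the data are retyped from the table above.

```r
# Raw data for each group (NA padding not needed here)
old   <- c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3)
young <- c(45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5)

# Sum of squared deviations from each group's own mean
ss.old   <- sum((old - mean(old))^2)
ss.young <- sum((young - mean(young))^2)

# Total degrees of freedom = (n1 - 1) + (n2 - 1)
df.total <- (length(old) - 1) + (length(young) - 1)

# Pooled variance; should agree with var.pooled() up to rounding
var.pool.check <- (ss.old + ss.young) / df.total
var.pool.check
```

This works because df times the variance of a group is just that group's sum of squared deviations, so the numerator of the pooled variance formula is the combined sum of squares.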



Standard error of the difference

Equation for SE of difference

\[SE_d = \sqrt{{var_p}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

\[SE_d = \sqrt{{\sigma_p^2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

  • SE of the difference aka SE of the effect size
  • Based on pooled variance, var.p
  • Weighted by each sample size

Function to calculate SE of difference

This function takes the pooled variance and the two sample sizes.

# Standard error of difference
## Note that this uses sample size, NOT degrees of freedom (df)
SE.diff <- function(var.pool, n1,n2){
  sqrt(var.pool*(1/n1 + 1/n2))
}
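
To get a feel for how the sample sizes enter the formula, here is a small sketch that holds the pooled variance fixed and changes only n1 and n2. The function is repeated so the snippet runs on its own; the pooled variance of 188.3 is the rounded value from this example, and the larger sample sizes (36 and 32) are made up purely for illustration.

```r
# Same function as above, repeated so this snippet is self-contained
SE.diff <- function(var.pool, n1, n2){
  sqrt(var.pool*(1/n1 + 1/n2))
}

# Sample sizes from the rat data
se.small <- SE.diff(var.pool = 188.3, n1 = 9, n2 = 8)

# Hypothetical study with 4 times as many animals per group
se.large <- SE.diff(var.pool = 188.3, n1 = 36, n2 = 32)

# Ratio of the two standard errors
se.small / se.large
```

Quadrupling both sample sizes halves the SE, because the sample sizes sit inside the square root.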



Apply function SE.diff

#Apply function
## Note: uses sample size, NOT df
## (dfs holds the sample sizes; [[ ]] pulls out plain numbers)
se.dif <- SE.diff(var.pool,
                  n1 = dfs[[1]],
                  n2 = dfs[[2]])


#Update dataframe
df.rats2["SE.diff",] <- c(se.dif, NA)

Final dataframe

Table 30.2: Results of unpaired t-test. Motulsky Chapter 30, page 221.

pander(round(df.rats2,3))
               old   young
1             20.8    45.5
2              2.8      55
3               50    60.7
4             33.3    61.5
5             29.4    61.1
6             38.9    65.5
7             29.4    42.9
8             52.6    37.5
9             14.3      NA
means        30.17   53.71
SD           16.09   10.36
N                9       8
Diff. means  23.55      NA
var.pooled   188.3      NA
SE.diff      6.667      NA
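
One way to sanity-check the hand calculation is against R's built-in t.test() with var.equal = TRUE, which runs the pooled-variance (unpaired) t-test. The sketch below recovers the SE as the difference between means divided by the reported t-statistic; the names old, young, tt, diff.check, and se.check are hypothetical, and the data are retyped from the table above.

```r
# Raw data for each group
old   <- c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3)
young <- c(45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5)

# Unpaired t-test assuming equal variances (i.e. using the pooled variance)
## young is listed first so the sign matches diff.means (young - old)
tt <- t.test(young, old, var.equal = TRUE)

# t = difference / SE, so SE = difference / t
diff.check <- mean(young) - mean(old)
se.check   <- diff.check / unname(tt$statistic)

# Should agree with Diff. means and SE.diff in the table, up to rounding
round(c(diff = diff.check, SE = se.check), 3)

# Degrees of freedom used by the test: n1 + n2 - 2
tt$parameter
```

If var.equal is left at its default of FALSE, t.test() runs the Welch version instead, which does not pool the variances, so its SE would not be expected to match the one calculated here.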