TO DO:

Double check degrees of freedom vs. sample size in equations

Standard error for the difference between 2 means

The difference between 2 means is used to calculate the t-statistic
Another component is the standard error (SE) for the difference between 2 means
The difference between 2 means is the effect size for a t-test (and any comparison of two groups; technically, an unstandardized effect size)
The SE of the difference tells us about the uncertainty around the estimate of the effect size
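
As a rough sketch of how these two pieces fit together, the snippet below plugs in the rounded values that the final table of this document reports (difference = 23.55, SE = 6.667). The object names ending in `.ex` are made up for this illustration, and the 95% level is just an assumed example.

```r
# Values as reported (rounded) in the final table of this document
diff.means.ex <- 23.55       # difference between means (effect size)
se.diff.ex    <- 6.667       # SE of the difference
df.ex         <- 9 + 8 - 2   # total degrees of freedom: n1 + n2 - 2

# t-statistic: effect size divided by its standard error
t.ex <- diff.means.ex/se.diff.ex

# 95% confidence interval: effect size +/- critical t * SE
t.crit <- qt(0.975, df = df.ex)
ci.ex  <- diff.means.ex + c(-1, 1)*t.crit*se.diff.ex

t.ex
ci.ex
```

A larger SE gives a smaller t-statistic and a wider interval for the same difference between means.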

Example data: rat bladders

1st, I’ll build up the components needed to calculate the difference between 2 means using data from Motulsky’s Intuitive Biostats, Chapter 30.
2nd, I’ll make some functions that will help me do the calculations
3rd, I’ll calculate the SE of the difference

1) Make data table & calculate summary stats

The Data

Motulsky 2nd Ed, Chapter 30, page 220, Table 30.1. Maximal relaxation of muscle strips of old and young rat bladders stimulated w/ high concentrations of norepinephrine (Frazier et al 2006). Response variable is %E.max

Create table of data and summary stats

This is not a standard R format; it is used here just for illustration and for displaying the data

# The data
df.rats <- data.frame(old = c(20.8,2.8,50.0,33.3,29.4,38.9, 29.4,52.6,14.3),
           young = c(45.5,55.0, 60.7, 61.5, 61.1, 65.5,42.9,37.5, NA))


#Means
## Calculate means
mean.old <- with(df.rats, mean(old))
mean.young <- with(df.rats, mean(young, na.rm = TRUE))

##put means into a vector
means.rats <- c(mean.old,mean.young)

## add means to dataframe
df.rats2 <- rbind(df.rats, means.rats)

##name the row that the means are in
row.names(df.rats2)[dim(df.rats2)[1]] <- "means"

#Standard deviation
##calc sd and add it to the dataframe
sd.old <- with(df.rats, sd(old))
sd.young <- with(df.rats, sd(young, na.rm = TRUE))
sd.rats <- c(sd.old,sd.young)
df.rats2 <- rbind(df.rats2, sd.rats)
row.names(df.rats2)[dim(df.rats2)[1]] <- "SD"

#Sample size
## count the non-missing values (young has 1 NA used as padding)
n.old <- sum(!is.na(df.rats$old))
n.young <- sum(!is.na(df.rats$young))
n.rats <- c(n.old,n.young)
df.rats2 <- rbind(df.rats2, n.rats)
row.names(df.rats2)[dim(df.rats2)[1]] <- "N"


#Difference between means
## Calculate difference between the means (young - old)
diff.means <- df.rats2["means",2]-
  df.rats2["means",1]

## Add difference to dataframe
df.rats2 <- rbind(df.rats2, c(diff.means,NA))

##Label row
row.names(df.rats2)[dim(df.rats2)[1]] <- "Diff. means"

The Data table

Here’s the finished data table
Again, this is NOT a standard R format for working with data; this was just done to display the data like one might do in a spreadsheet

#pander package makes nice tables
library(pander)


pander(df.rats2)

	old	young
1	20.8	45.5
2	2.8	55
3	50	60.7
4	33.3	61.5
5	29.4	61.1
6	38.9	65.5
7	29.4	42.9
8	52.6	37.5
9	14.3	NA
means	30.17	53.71
SD	16.09	10.36
N	9	8
Diff. means	23.55	NA
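
For comparison, here is a minimal sketch of the more conventional "long" layout, with one row per rat and a column that identifies the group. The names `rats.long`, `Emax`, and `age` are made up for this illustration; they are not from Motulsky.

```r
# Same data in long format: one row per rat, one column for the group
# (rats.long, Emax, and age are illustrative names)
rats.long <- data.frame(
  Emax = c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3,
           45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5),
  age  = rep(c("old", "young"), times = c(9, 8))
)

# Summary stats by group; no NA padding is needed in this layout
means.long <- tapply(rats.long$Emax, rats.long$age, mean)
sds.long   <- tapply(rats.long$Emax, rats.long$age, sd)
ns.long    <- tapply(rats.long$Emax, rats.long$age, length)

# Difference between means (young - old), as in the table above
diff.long <- means.long[["young"]] - means.long[["old"]]
diff.long
```

In this layout the unequal group sizes are handled naturally, so there is no need for `na.rm`.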

2) Function to calculate the SE of the difference

Steps

Pool the variances of each sample
Calc. SE of the difference using pooled variance

I’ll make some functions that do this

Pooled variance

Equation for pooled variance

(need to double check this; am not very good at code for equations yet)

In words in terms of variance: \[var_p = {\frac{df_1*var_1 + df_2*var_2}{df_1 + df_2}}\]

In words in terms of standard deviation (SD) \[var_p = {\frac{df_1*SD^2_1 + df_2*SD^2_2}{df_1 + df_2}}\]

In symbols, in terms of the variance (sigma^2) \[\sigma^2_p = {\frac{df_1*\sigma^2_1 + df_2*\sigma^2_2}{df_1 + df_2}}\]

An R function to calculate the pooled variance

Give this function the two sets of degrees of freedom and the standard deviations, and it returns the pooled variance.

#Formula for POOLED variance

## Note the formula squares SD to get variance
var.pooled <- function(df1,df2,SD1,SD2){
  (df1*SD1^2 + df2*SD2^2)/(df1+df2)
}
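
One way to sanity-check a function like this is to compare it against the pooled variance computed directly from the raw data, as the within-group sums of squares divided by the total degrees of freedom. The sketch below repeats the function so it runs on its own; the names `vp.function`, `vp.direct`, `ss.old`, and `ss.young` are made up for this check.

```r
# Pooled variance from degrees of freedom and SDs (same form as above)
var.pooled <- function(df1, df2, SD1, SD2){
  (df1*SD1^2 + df2*SD2^2)/(df1 + df2)
}

old   <- c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3)
young <- c(45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5)

# Via the function: df = n - 1 for each group
vp.function <- var.pooled(df1 = length(old) - 1,
                          df2 = length(young) - 1,
                          SD1 = sd(old),
                          SD2 = sd(young))

# Directly: within-group sums of squares over the total df
ss.old    <- sum((old - mean(old))^2)
ss.young  <- sum((young - mean(young))^2)
vp.direct <- (ss.old + ss.young)/(length(old) + length(young) - 2)

# The two routes should agree
all.equal(vp.function, vp.direct)
```

The two agree because each group's variance times its df is just that group's sum of squares.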

Calculate pooled variance using function

Using data from our data frame df.rats2

1st, get the sample sizes (N) into a vector w/ 2 elements; the degrees of freedom are N - 1
2nd, get the standard deviations (SD) into a vector

# Calculate pooled variance

## extract N and SD for easy access
## (note: despite its name, dfs holds the sample sizes, N; df = N - 1)
dfs <- df.rats2["N", ]
SDs <- df.rats2["SD",]

dfs

##   old young
## N   9     8

SDs

##         old    young
## SD 16.09464 10.36373

## Apply function for pooled variance

### Note: df = sample size - 1
### [[ ]] pulls out plain numbers rather than 1-column data frames
var.pool <-var.pooled(df1 = dfs[[1]]-1,
                      df2 = dfs[[2]]-1,
                      SD1 = SDs[[1]],
                      SD2 = SDs[[2]])

## Add pooled variance to dataframe
df.rats2["var.pooled",] <- c(var.pool,NA)

Standard error of the difference

Equation for SE of difference

\[SE_d = \sqrt{{var_p}(\frac{1}{n_1} + \frac{1}{n_2})} \]

\[SE_d = \sqrt{{\sigma_p^2}(\frac{1}{n_1} + \frac{1}{n_2})} \]

SE of the difference aka SE of the effect size
Based on pooled variance, var.p
Weighted by each sample size

Function to calculate SE of difference

This function takes the pooled variance and the two sample sizes.

# Standard error of difference
## Note that this uses sample size, NOT degrees of freedom (df)
SE.diff <- function(var.pool, n1,n2){
  sqrt(var.pool*(1/n1 + 1/n2))
}
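
To see how the sample sizes enter, here is a small sketch that repeats the function so it runs on its own and feeds it the rounded pooled variance from the final table (188.3). The second call uses made-up sample sizes ten times larger, purely for illustration; `se.example` and `se.bigger` are names invented here.

```r
# SE of the difference from the pooled variance and sample sizes (same form as above)
SE.diff <- function(var.pool, n1, n2){
  sqrt(var.pool*(1/n1 + 1/n2))
}

# Rat bladder example: pooled variance ~188.3, n = 9 (old) and n = 8 (young)
se.example <- SE.diff(var.pool = 188.3, n1 = 9, n2 = 8)
se.example

# Hypothetical: same pooled variance, 10x the sample sizes
se.bigger <- SE.diff(var.pool = 188.3, n1 = 90, n2 = 80)

# The SE shrinks by a factor of sqrt(10)
se.example/se.bigger
```

With the variance held fixed, the SE falls with the square root of the sample size, so ten times the data buys roughly a threefold gain in precision.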

Apply function SE.diff

#Apply function
## Note: uses sample size, NOT df
## (dfs holds the sample sizes; [[ ]] pulls out plain numbers)
se.dif <- SE.diff(var.pool,
                  n1 = dfs[[1]],
                  n2 = dfs[[2]])
  

#Update df
df.rats2["SE.diff",] <- c(se.dif, NA)

Final dataframe

Table 30.2: Results of unpaired t-test. Motulsky Chapter 30, page 221.

pander(round(df.rats2,3))

	old	young
1	20.8	45.5
2	2.8	55
3	50	60.7
4	33.3	61.5
5	29.4	61.1
6	38.9	65.5
7	29.4	42.9
8	52.6	37.5
9	14.3	NA
means	30.17	53.71
SD	16.09	10.36
N	9	8
Diff. means	23.55	NA
var.pooled	188.3	NA
SE.diff	6.667	NA
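
As a cross-check on the hand calculation, base R's `t.test()` with `var.equal = TRUE` runs the unpaired t-test with a pooled variance. The sketch below recovers the SE by dividing the difference between means by the t-statistic; `tt`, `diff.check`, and `se.check` are names made up for this check.

```r
old   <- c(20.8, 2.8, 50.0, 33.3, 29.4, 38.9, 29.4, 52.6, 14.3)
young <- c(45.5, 55.0, 60.7, 61.5, 61.1, 65.5, 42.9, 37.5)

# Unpaired t-test assuming equal variances (i.e., using the pooled variance)
tt <- t.test(young, old, var.equal = TRUE)

# Difference between means (young - old)
diff.check <- unname(tt$estimate[1] - tt$estimate[2])

# t = difference / SE, so SE = difference / t
se.check <- diff.check/unname(tt$statistic)

# These should match "Diff. means" and "SE.diff" in the table, up to rounding
round(c(diff = diff.check, SE = se.check), 3)

# Degrees of freedom used by the test: n1 + n2 - 2
tt$parameter
```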

Standard error of the difference between means

brouwern@gmail.com

February 1, 2017