Outline

Load data
Clean data
Visualize data
Question 1: What is the average human body temp?
Question 2: Is the average human body statistically different from 98.6?
Question 3: How do Male and Female Body Temps compare?
Question 4: What is the relationship between resting heart rate and temperature
Question 5: How is a paired t-test similar to a 1-sample t-test?

Commands and arguments used

read.csv()
hist()
abline()
sd()
c()
boxplot()
plot()
col =
lwd =
lty =
xlab =

Data source:

Shoemaker, A. 1996. What’s Normal? – daterature, Gender, and Heart Rate. Journal of Statistics Education v.4, n.2 ww2.amstat.org/publications/jse/v4n2/datasets.shoemaker.html

Original inspiration

Journal of the American Medical Association entitled “A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body daterature, and Other Legacies of Carl Reinhold August Wunderlich” (Mackowiak, Wasserman, and Levine 1992).

These data are covered in example 11.3 on page 310 in Whitlock and Shulter, 2nd ed.

Load Body Temp Data

Load the data from a .csv file

dat <- read.csv("bodytemp.csv")

NOTE: if you have trouble loading the data, go to the very end of this page; there is code to replicate the data directly in R w/o loading a .csv file

Look at the Body Temp data

#size of dataframe
dim(dat)

## [1] 130   3

#first few rows
head(dat)

##   temp gender heartrate
## 1 96.3      1        70
## 2 96.7      1        71
## 3 96.9      1        74
## 4 97.0      1        80
## 5 97.1      1        73
## 6 97.1      1        75

#last few rows
tail(dat)

##      temp gender heartrate
## 125  99.2      2        66
## 126  99.3      2        68
## 127  99.4      2        77
## 128  99.9      2        79
## 129 100.0      2        78
## 130 100.8      2        77

Look at overall summary

summary(dat)

##       temp            gender      heartrate    
##  Min.   : 96.30   Min.   :1.0   Min.   :57.00  
##  1st Qu.: 97.80   1st Qu.:1.0   1st Qu.:69.00  
##  Median : 98.30   Median :1.5   Median :74.00  
##  Mean   : 98.25   Mean   :1.5   Mean   :73.76  
##  3rd Qu.: 98.70   3rd Qu.:2.0   3rd Qu.:79.00  
##  Max.   :100.80   Max.   :2.0   Max.   :89.00

Data cleaning:

Convert “1” to male & “2” to female

The original file has male = 1 and female = 2. LEt’s change these to letters using the ifelse() command. What we’ll do is have R look at the gender column in the dat dataframe and if the value equals 1, change it to “male”, else change it to “female”. So, if its a 1, chagne it to male, if it its not a 1, it must be a 2 so change it to female. We’ll put these changes into a new data object.

#New object
gender.MF <- ifelse(dat$gender == 1,"male","female")

Note that we use == to set up the logical argument.

Add new column to existing dataframe

#add new object to existing dataframe
dat <- data.frame(dat, gender.MF)

Look aat the new dataframe

summary(dat)

##       temp            gender      heartrate      gender.MF 
##  Min.   : 96.30   Min.   :1.0   Min.   :57.00   female:65  
##  1st Qu.: 97.80   1st Qu.:1.0   1st Qu.:69.00   male  :65  
##  Median : 98.30   Median :1.5   Median :74.00              
##  Mean   : 98.25   Mean   :1.5   Mean   :73.76              
##  3rd Qu.: 98.70   3rd Qu.:2.0   3rd Qu.:79.00              
##  Max.   :100.80   Max.   :2.0   Max.   :89.00

Note that R automatically makes the gender.MF column into categorical data.

Question 1: What is the average human body temp?

Conventional wisdom is that “normal” body temp is 98.6 degrees F when taken with an oral theremometer. Is this true for this sample?

summary(dat)

##       temp            gender      heartrate      gender.MF 
##  Min.   : 96.30   Min.   :1.0   Min.   :57.00   female:65  
##  1st Qu.: 97.80   1st Qu.:1.0   1st Qu.:69.00   male  :65  
##  Median : 98.30   Median :1.5   Median :74.00              
##  Mean   : 98.25   Mean   :1.5   Mean   :73.76              
##  3rd Qu.: 98.70   3rd Qu.:2.0   3rd Qu.:79.00              
##  Max.   :100.80   Max.   :2.0   Max.   :89.00

The mean and the median are both about 0.3 degrees below 98.6.

Let’s look at the distribution of the data to get a sense for what’s going one.

Graph 1: histogram

Let’s make a histogram of all of the human body temps.
We can add a line for the mean (y.hat) using the command abline().
We give abline the arugment v= to tell it to make a vertical line.
We also pass abline the arguement col=“green” to change the color to green
the name of the color must be in quotes. col = green will not work.
Finally, we pass abline the arguement “lwd=2” to increase the size of the line.
lwd means “line width”

#1st, Calculate the mean and store into an R object
overall.mean <- mean(dat$temp)

#make the histogram of the $temp column
hist(dat$temp)

#add a line for the mean
abline(v = overall.mean, col = 3, lwd = 2)

Graph 2: histogram with more details

We’ll use the following command to add further info to the histogram

mean()
median()
sd()
abline
legend()
col =
lty =
lwd =

First, caculate sum summary stats and save them into R objects.

#Calculate the mean
overall.mean <- mean(dat$temp)

#Calculate the MEDIAN
overall.median <- median(dat$temp)


#Calculate the standard deviation
overall.sd <- sd(dat$temp)

Now, plot the data and add addition information

#make the histogram of the $temp column
hist(dat$temp,
     xlab = "Oral temperature (F)",
     main = "")

#add a green line for the mean
abline(v = overall.mean, 
       col = "green", 
       lwd = 2)

#add a line for the median
#  make the color BLUE
#  make the line DASHED
abline(v = overall.median, 
       col = "blue", 
       lwd = 2, 
       lty = 2)



# Add lines for +/- 1 SD
#  make the color RED

#plus 1 SD.  Note that I add the SD to the mean
abline(v = overall.mean+overall.sd, 
       col = "red", 
       lwd = 2, lty = 2)

#minus 1 SD Note that I subtract the SD from the mean
abline(v = overall.mean-overall.sd, 
       col = 2, 
       lwd = 2, 
       lty = 2)

#Add a line for where 98.6 is
abline(v = 98.6, 
       col = "lightblue", 
       lwd = 2, 
       lty = 4)


#Add a legend
legend.text <- c("mean",
                 "median",
                 "+/- 1 SD",
                 "98.6 degrees F")

colors.used <- c("green","blue","red","lightblue")
legend("topright",
       legend = legend.text,  #what each line is
       col = colors.used,     #what each color is
       lty = c(1,2,2,4),      #the type of dashs used
       lwd = 2)               #the width of the line

Note that 98.6 is higher than the mean and the median.

Question 2:

Is the average human body statistically different from 98.6?

The mean and median of our sample are both less than the conventional value of 98.6. But could this just be due to sampling variation? We can test the hypothesis that this difference is not due to chance using a one-sample t-test.

One-sample t-test

We will test the hypothesis that the mean of this sample of data is different than the conventional wisdom that the “normal” human temperature is 98.6 degrees F.
The null hypothesis (Ho) is that the mean of this sample equals 98.6
Recall that “mu” is the population mean (aka the population parameter).

First, we make an object that contains our value for mu, 98.6. Then we carry out a one-sample t-test.

normal.temp <- 98.6
t.test(dat$temp, 
       mu = normal.temp)

## 
##  One Sample t-test
## 
## data:  dat$temp
## t = -5.4548, df = 129, p-value = 2.411e-07
## alternative hypothesis: true mean is not equal to 98.6
## 95 percent confidence interval:
##  98.12200 98.37646
## sample estimates:
## mean of x 
##  98.24923

Plot the output of the 1-sample t-test

This code is very complex; just focus on the plot that it makes.

#save t-test to R object
one.sample.t.test.output <- t.test(dat$temp, 
       mu = normal.temp)

#extract important stuff from t-test
mean.out <- one.sample.t.test.output$estimate
min.max <- one.sample.t.test.output$conf.int
ci <- one.sample.t.test.output$conf.int
min.max[1] <- min.max[1] -min.max[1]*.001
min.max[2] <- 98.6+0.01

#plot the mean value 
plot(mean.out, ylim = min.max, pch = 16, cex =2,
     xlab = "",
     ylab = "Mean body temp",
     xaxt = "n")

#add errorbars bassed on the confidence intervals
segments(x0 = 1, x1 = 1,
         y0 = mean.out, y1 = ci[2], lwd = 2)
segments(x0 = 1, x1 = 1,
         y0 = mean.out, y1 = ci[1], lwd = 2)

#show where mu is
abline(h = 98.6, col = "red", lty = 2)

#add annotation
text(x = 0.65, y = 98.55, "mu = 98.6")

text(x = 0.7095, y = 98.1, "Error bars = 95%CI")

p.out <- paste("p=", round(one.sample.t.test.output$p.value,7))
text(x = 1.35, y = 98.1, p.out)

The p value is MUCH lower than the traditional cutoff of 0.05.

This 1-sample t-test indicates that the mean temperature of this sample is less than 98.6 and the difference is very unlikely to be due to change.

The most plausible range of values for the true, biologically determine population mean (mu) is between 98.12200 and 98.37646, as defined by the 95% confidence interval. 98.6 is therefore a very inplausible value!

Note, however, than many people did have a higher temp than 98.6; there is significant variation in the body temp of healthy individual.

Question 3:

How do Male and Female Body Temps compare?

Graph 3: How do males and female compare?

boxplot(temp ~ gender.MF, data = dat)

Males appear to have a lower median temp than female. Is this biologically based or could this difference just be due to chance?

We can assess this using a 2-sample t-test.
Before, we had all the data pooled together, hence we did a “1-sample” test.
Now we are splitting the data into two groups and therefore its a two sample test
Two sample tests proably the most common type of t-test
Therefore, if someones says that they did a t-test, the probalby mean a 2-sample test

2-sample t-test

For a 2-sample test we don’t specify “mu”
We specify the test using the equation notation just like the boxplot, using the tilda symbol (~)

t.test(temp ~ gender.MF, 
       data = dat)

## 
##  Welch Two Sample t-test
## 
## data:  temp by gender.MF
## t = 2.2854, df = 127.51, p-value = 0.02394
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.03881298 0.53964856
## sample estimates:
## mean in group female   mean in group male 
##             98.39385             98.10462

What does this test indicate about male and female body temp?
Female body temp is higher - is it closer to 98.6?

Question 4:

What is the relationship between resting heart rate and temperature

Making a scatterplot using plot()

plot(temp ~ heartrate,data = dat)

There might be some relationship here. We could explore this further if we wanted with linear regression.

Do male and females have different relationships between resting hear rate and body temp?

Subset the data by males and females

males <- subset(dat,  
                gender.MF == "male")

females <- subset(dat,  
                gender.MF == "female")

Plot males and females seperately

par(mfrow = c(1,1))
#set xlims
xlims <- c(min(dat$heartrate),max(dat$heartrate))

plot(temp ~ heartrate, 
     data = males,
     xlim = xlims)
points(temp ~ heartrate, 
       data = females, 
       col = "green")

The patterns appear to be similar. We could explore this further with multiple linear regression.

Question 5:

How is a paired t-test similar to a 1-sample t-test?

A paired t-test is also a common type of t-test
A common type of pairing is before and after a treatment occurred
A paired treatment can be set up a couple different ways in R
Mathematically, a paired t-test is similar to a 1-sample t-test.
With a paired t-test we are not interested in whether the overall means of each group or time point (ie before, after) are difference
We are interested in whether the difference between each pair of measurements is consistently different from zero
A paired t-test is therefore equivalent to a 1-sample t-test where the population parameter mu=0

Simulate data

We will look at paired t-tests with some fake data.
I’ve simulated some data representing a person’s body temp before the experimental treatment occured (before)
I’ve also simualted body temps after a stress treatment occurs (stressed)
The hypothesis (Ha) is that being stressed changes your body temp.
The null hypothesis (Ho) is that there is no consistent impact of stress on body temp.

This code makes the fake data; ignore it

      lm.out <- lm(temp ~ heartrate, data = dat)
      
      temp.orig <- simulate(lm.out, seed = 100)
      temp.stress <- simulate(lm.out, seed = 101) + 0.5+ rnorm(length(dat), mean = 0, sd = sd(dat$temp)/2)
      
      new.dat <- data.frame(before = temp.orig$sim_1,stressed = temp.stress$sim_1)

Plot the fake data

#set x axis limits
xlims. <- c(min(new.dat), max(new.dat))

#set up to plot 2 histograms next to each other
par(mfrow = c(1,2))

#plot the 1st hist
hist(new.dat$before,
     main = "", 
     xlim  =xlims.)

#add line for the mean
abline(v = mean(new.dat$before), 
       col = "green",
       lwd = 2)

#plot 2nd hist
hist(new.dat$stressed, 
     main = "", 
     xlim = xlims.)

#add line
abline(v = mean(new.dat$stressed),
       col = "blue", 
       lwd = 2)

(I intentionaly made the means to be different, so the differene is not surprising!)

Calculate the difference between “before”

We can do some simple math to calculate teh difference between before and after
we use the dollar sign ($) to select the columns we want

difference1 <- new.dat$before -  new.dat$stressed

We can add this to our dataframe like this

new.dat$difference1 <- difference1

Note that we could do this all in 1 step if we wanted. Lets flip the order of the values.

new.dat$difference2<- new.dat$stressed - new.dat$before

Visualize data

Look at distribution of differences

hist(new.dat$difference1)

1-sample t-test on difference

We’ll complete the paired t-test process by conducting a 1-sample test, setting mu = 0.
We are therefore testing the hypothesis that the average difference between before and stressed is greater than zero

t.test(new.dat$difference1 , 
       mu = 0)

## 
##  One Sample t-test
## 
## data:  new.dat$difference1
## t = -9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.9823678 -0.6452696
## sample estimates:
##  mean of x 
## -0.8138187

What’s the difference between these two results?

t.test when the difference is calcualted as “new.dat$before - new.dat$stressed”"

t.test(new.dat$difference1 , 
       mu = 0)

## 
##  One Sample t-test
## 
## data:  new.dat$difference1
## t = -9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.9823678 -0.6452696
## sample estimates:
##  mean of x 
## -0.8138187

t.test when the difference is calcualted as new.dat$stressed - new.dat$before

t.test(new.dat$difference2 , 
       mu = 0)

## 
##  One Sample t-test
## 
## data:  new.dat$difference2
## t = 9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.6452696 0.9823678
## sample estimates:
## mean of x 
## 0.8138187

Can you spot the differences? There are 2

Appendix

Code to rebuild the data from scratch

This code allows you to re-build the dataframe without loading any data.

temp <- c(96.3,96.7,96.9,97,97.1,97.1,97.1,97.2,97.3,97.4,97.4,97.4,97.4,97.5,97.5,97.6,97.6,97.6,97.7,97.8,97.8,97.8,97.8,97.9,97.9,98,98,98,98,98,98,98.1,98.1,98.2,98.2,98.2,98.2,98.3,98.3,98.4,98.4,98.4,98.4,98.5,98.5,98.6,98.6,98.6,98.6,98.6,98.6,98.7,98.7,98.8,98.8,98.8,98.9,99,99,99,99.1,99.2,99.3,99.4,99.5,96.4,96.7,96.8,97.2,97.2,97.4,97.6,97.7,97.7,97.8,97.8,97.8,97.9,97.9,97.9,98,98,98,98,98,98.1,98.2,98.2,98.2,98.2,98.2,98.2,98.3,98.3,98.3,98.4,98.4,98.4,98.4,98.4,98.5,98.6,98.6,98.6,98.6,98.7,98.7,98.7,98.7,98.7,98.7,98.8,98.8,98.8,98.8,98.8,98.8,98.8,98.9,99,99,99.1,99.1,99.2,99.2,99.3,99.4,99.9,100,100.8)
gender <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
heartrate <- c(70,71,74,80,73,75,82,64,69,70,68,72,78,70,75,74,69,73,77,58,73,65,74,76,72,78,71,74,67,64,78,73,67,66,64,71,72,86,72,68,70,82,84,68,71,77,78,83,66,70,82,73,78,78,81,78,80,75,79,81,71,83,63,70,75,69,62,75,66,68,57,61,84,61,77,62,71,68,69,79,76,87,78,73,89,81,73,64,65,73,69,57,79,78,80,79,81,73,74,84,83,82,85,86,77,72,79,59,64,65,82,64,70,83,89,69,73,84,76,79,81,80,74,77,66,68,77,79,78,77)

dat <- data.frame(temp = temp,
                  gender = gender,
                  heartrate = heartrate)

t-tests

brouwern@gmail.com

October 12, 2016