Shoemaker, A. 1996. What’s Normal? – daterature, Gender, and Heart Rate. Journal of Statistics Education v.4, n.2 ww2.amstat.org/publications/jse/v4n2/datasets.shoemaker.html
Journal of the American Medical Association entitled “A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body daterature, and Other Legacies of Carl Reinhold August Wunderlich” (Mackowiak, Wasserman, and Levine 1992).
These data are covered in example 11.3 on page 310 in Whitlock and Shulter, 2nd ed.
Load the data from a .csv file
dat <- read.csv("bodytemp.csv")
NOTE: if you have trouble loading the data, go to the very end of this page; there is code to replicate the data directly in R w/o loading a .csv file
#size of dataframe
dim(dat)
## [1] 130 3
#first few rows
head(dat)
## temp gender heartrate
## 1 96.3 1 70
## 2 96.7 1 71
## 3 96.9 1 74
## 4 97.0 1 80
## 5 97.1 1 73
## 6 97.1 1 75
#last few rows
tail(dat)
## temp gender heartrate
## 125 99.2 2 66
## 126 99.3 2 68
## 127 99.4 2 77
## 128 99.9 2 79
## 129 100.0 2 78
## 130 100.8 2 77
summary(dat)
## temp gender heartrate
## Min. : 96.30 Min. :1.0 Min. :57.00
## 1st Qu.: 97.80 1st Qu.:1.0 1st Qu.:69.00
## Median : 98.30 Median :1.5 Median :74.00
## Mean : 98.25 Mean :1.5 Mean :73.76
## 3rd Qu.: 98.70 3rd Qu.:2.0 3rd Qu.:79.00
## Max. :100.80 Max. :2.0 Max. :89.00
The original file has male = 1 and female = 2. LEt’s change these to letters using the ifelse() command. What we’ll do is have R look at the gender column in the dat dataframe and if the value equals 1, change it to “male”, else change it to “female”. So, if its a 1, chagne it to male, if it its not a 1, it must be a 2 so change it to female. We’ll put these changes into a new data object.
#New object
gender.MF <- ifelse(dat$gender == 1,"male","female")
Note that we use == to set up the logical argument.
#add new object to existing dataframe
dat <- data.frame(dat, gender.MF)
Look aat the new dataframe
summary(dat)
## temp gender heartrate gender.MF
## Min. : 96.30 Min. :1.0 Min. :57.00 female:65
## 1st Qu.: 97.80 1st Qu.:1.0 1st Qu.:69.00 male :65
## Median : 98.30 Median :1.5 Median :74.00
## Mean : 98.25 Mean :1.5 Mean :73.76
## 3rd Qu.: 98.70 3rd Qu.:2.0 3rd Qu.:79.00
## Max. :100.80 Max. :2.0 Max. :89.00
Note that R automatically makes the gender.MF column into categorical data.
Conventional wisdom is that “normal” body temp is 98.6 degrees F when taken with an oral theremometer. Is this true for this sample?
summary(dat)
## temp gender heartrate gender.MF
## Min. : 96.30 Min. :1.0 Min. :57.00 female:65
## 1st Qu.: 97.80 1st Qu.:1.0 1st Qu.:69.00 male :65
## Median : 98.30 Median :1.5 Median :74.00
## Mean : 98.25 Mean :1.5 Mean :73.76
## 3rd Qu.: 98.70 3rd Qu.:2.0 3rd Qu.:79.00
## Max. :100.80 Max. :2.0 Max. :89.00
The mean and the median are both about 0.3 degrees below 98.6.
Let’s look at the distribution of the data to get a sense for what’s going one.
#1st, Calculate the mean and store into an R object
overall.mean <- mean(dat$temp)
#make the histogram of the $temp column
hist(dat$temp)
#add a line for the mean
abline(v = overall.mean, col = 3, lwd = 2)
We’ll use the following command to add further info to the histogram
First, caculate sum summary stats and save them into R objects.
#Calculate the mean
overall.mean <- mean(dat$temp)
#Calculate the MEDIAN
overall.median <- median(dat$temp)
#Calculate the standard deviation
overall.sd <- sd(dat$temp)
Now, plot the data and add addition information
#make the histogram of the $temp column
hist(dat$temp,
xlab = "Oral temperature (F)",
main = "")
#add a green line for the mean
abline(v = overall.mean,
col = "green",
lwd = 2)
#add a line for the median
# make the color BLUE
# make the line DASHED
abline(v = overall.median,
col = "blue",
lwd = 2,
lty = 2)
# Add lines for +/- 1 SD
# make the color RED
#plus 1 SD. Note that I add the SD to the mean
abline(v = overall.mean+overall.sd,
col = "red",
lwd = 2, lty = 2)
#minus 1 SD Note that I subtract the SD from the mean
abline(v = overall.mean-overall.sd,
col = 2,
lwd = 2,
lty = 2)
#Add a line for where 98.6 is
abline(v = 98.6,
col = "lightblue",
lwd = 2,
lty = 4)
#Add a legend
legend.text <- c("mean",
"median",
"+/- 1 SD",
"98.6 degrees F")
colors.used <- c("green","blue","red","lightblue")
legend("topright",
legend = legend.text, #what each line is
col = colors.used, #what each color is
lty = c(1,2,2,4), #the type of dashs used
lwd = 2) #the width of the line
Note that 98.6 is higher than the mean and the median.
The mean and median of our sample are both less than the conventional value of 98.6. But could this just be due to sampling variation? We can test the hypothesis that this difference is not due to chance using a one-sample t-test.
First, we make an object that contains our value for mu, 98.6. Then we carry out a one-sample t-test.
normal.temp <- 98.6
t.test(dat$temp,
mu = normal.temp)
##
## One Sample t-test
##
## data: dat$temp
## t = -5.4548, df = 129, p-value = 2.411e-07
## alternative hypothesis: true mean is not equal to 98.6
## 95 percent confidence interval:
## 98.12200 98.37646
## sample estimates:
## mean of x
## 98.24923
This code is very complex; just focus on the plot that it makes.
#save t-test to R object
one.sample.t.test.output <- t.test(dat$temp,
mu = normal.temp)
#extract important stuff from t-test
mean.out <- one.sample.t.test.output$estimate
min.max <- one.sample.t.test.output$conf.int
ci <- one.sample.t.test.output$conf.int
min.max[1] <- min.max[1] -min.max[1]*.001
min.max[2] <- 98.6+0.01
#plot the mean value
plot(mean.out, ylim = min.max, pch = 16, cex =2,
xlab = "",
ylab = "Mean body temp",
xaxt = "n")
#add errorbars bassed on the confidence intervals
segments(x0 = 1, x1 = 1,
y0 = mean.out, y1 = ci[2], lwd = 2)
segments(x0 = 1, x1 = 1,
y0 = mean.out, y1 = ci[1], lwd = 2)
#show where mu is
abline(h = 98.6, col = "red", lty = 2)
#add annotation
text(x = 0.65, y = 98.55, "mu = 98.6")
text(x = 0.7095, y = 98.1, "Error bars = 95%CI")
p.out <- paste("p=", round(one.sample.t.test.output$p.value,7))
text(x = 1.35, y = 98.1, p.out)
The p value is MUCH lower than the traditional cutoff of 0.05.
This 1-sample t-test indicates that the mean temperature of this sample is less than 98.6 and the difference is very unlikely to be due to change.
The most plausible range of values for the true, biologically determine population mean (mu) is between 98.12200 and 98.37646, as defined by the 95% confidence interval. 98.6 is therefore a very inplausible value!
Note, however, than many people did have a higher temp than 98.6; there is significant variation in the body temp of healthy individual.
boxplot(temp ~ gender.MF, data = dat)
Males appear to have a lower median temp than female. Is this biologically based or could this difference just be due to chance?
t.test(temp ~ gender.MF,
data = dat)
##
## Welch Two Sample t-test
##
## data: temp by gender.MF
## t = 2.2854, df = 127.51, p-value = 0.02394
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.03881298 0.53964856
## sample estimates:
## mean in group female mean in group male
## 98.39385 98.10462
plot(temp ~ heartrate,data = dat)
There might be some relationship here. We could explore this further if we wanted with linear regression.
Subset the data by males and females
males <- subset(dat,
gender.MF == "male")
females <- subset(dat,
gender.MF == "female")
Plot males and females seperately
par(mfrow = c(1,1))
#set xlims
xlims <- c(min(dat$heartrate),max(dat$heartrate))
plot(temp ~ heartrate,
data = males,
xlim = xlims)
points(temp ~ heartrate,
data = females,
col = "green")
The patterns appear to be similar. We could explore this further with multiple linear regression.
lm.out <- lm(temp ~ heartrate, data = dat)
temp.orig <- simulate(lm.out, seed = 100)
temp.stress <- simulate(lm.out, seed = 101) + 0.5+ rnorm(length(dat), mean = 0, sd = sd(dat$temp)/2)
new.dat <- data.frame(before = temp.orig$sim_1,stressed = temp.stress$sim_1)
#set x axis limits
xlims. <- c(min(new.dat), max(new.dat))
#set up to plot 2 histograms next to each other
par(mfrow = c(1,2))
#plot the 1st hist
hist(new.dat$before,
main = "",
xlim =xlims.)
#add line for the mean
abline(v = mean(new.dat$before),
col = "green",
lwd = 2)
#plot 2nd hist
hist(new.dat$stressed,
main = "",
xlim = xlims.)
#add line
abline(v = mean(new.dat$stressed),
col = "blue",
lwd = 2)
(I intentionaly made the means to be different, so the differene is not surprising!)
difference1 <- new.dat$before - new.dat$stressed
We can add this to our dataframe like this
new.dat$difference1 <- difference1
Note that we could do this all in 1 step if we wanted. Lets flip the order of the values.
new.dat$difference2<- new.dat$stressed - new.dat$before
Look at distribution of differences
hist(new.dat$difference1)
t.test(new.dat$difference1 ,
mu = 0)
##
## One Sample t-test
##
## data: new.dat$difference1
## t = -9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.9823678 -0.6452696
## sample estimates:
## mean of x
## -0.8138187
t.test when the difference is calcualted as “new.dat\(before - new.dat\)stressed”"
t.test(new.dat$difference1 ,
mu = 0)
##
## One Sample t-test
##
## data: new.dat$difference1
## t = -9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.9823678 -0.6452696
## sample estimates:
## mean of x
## -0.8138187
t.test when the difference is calcualted as new.dat\(stressed - new.dat\)before
t.test(new.dat$difference2 ,
mu = 0)
##
## One Sample t-test
##
## data: new.dat$difference2
## t = 9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.6452696 0.9823678
## sample estimates:
## mean of x
## 0.8138187
Can you spot the differences? There are 2
This code allows you to re-build the dataframe without loading any data.
temp <- c(96.3,96.7,96.9,97,97.1,97.1,97.1,97.2,97.3,97.4,97.4,97.4,97.4,97.5,97.5,97.6,97.6,97.6,97.7,97.8,97.8,97.8,97.8,97.9,97.9,98,98,98,98,98,98,98.1,98.1,98.2,98.2,98.2,98.2,98.3,98.3,98.4,98.4,98.4,98.4,98.5,98.5,98.6,98.6,98.6,98.6,98.6,98.6,98.7,98.7,98.8,98.8,98.8,98.9,99,99,99,99.1,99.2,99.3,99.4,99.5,96.4,96.7,96.8,97.2,97.2,97.4,97.6,97.7,97.7,97.8,97.8,97.8,97.9,97.9,97.9,98,98,98,98,98,98.1,98.2,98.2,98.2,98.2,98.2,98.2,98.3,98.3,98.3,98.4,98.4,98.4,98.4,98.4,98.5,98.6,98.6,98.6,98.6,98.7,98.7,98.7,98.7,98.7,98.7,98.8,98.8,98.8,98.8,98.8,98.8,98.8,98.9,99,99,99.1,99.1,99.2,99.2,99.3,99.4,99.9,100,100.8)
gender <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
heartrate <- c(70,71,74,80,73,75,82,64,69,70,68,72,78,70,75,74,69,73,77,58,73,65,74,76,72,78,71,74,67,64,78,73,67,66,64,71,72,86,72,68,70,82,84,68,71,77,78,83,66,70,82,73,78,78,81,78,80,75,79,81,71,83,63,70,75,69,62,75,66,68,57,61,84,61,77,62,71,68,69,79,76,87,78,73,89,81,73,64,65,73,69,57,79,78,80,79,81,73,74,84,83,82,85,86,77,72,79,59,64,65,82,64,70,83,89,69,73,84,76,79,81,80,74,77,66,68,77,79,78,77)
dat <- data.frame(temp = temp,
gender = gender,
heartrate = heartrate)