- Load data
- Clean data
- Visualize data
- Question 1: What is the average human body temp?
- Question 2: Is the average human body statistically different from 98.6?
- Question 3: How do Male and Female Body Temps compare?
- Question 4: What is the relationship between resting heart rate and temperature
- Question 5: How is a paired t-test similar to a 1-sample t-test?

- read.csv()
- hist()
- abline()
- sd()
- c()
- boxplot()
- plot()
- col =
- lwd =
- lty =
- xlab =

Shoemaker, A. 1996. What’s Normal? – daterature, Gender, and Heart Rate. Journal of Statistics Education v.4, n.2 ww2.amstat.org/publications/jse/v4n2/datasets.shoemaker.html

Journal of the American Medical Association entitled “A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body daterature, and Other Legacies of Carl Reinhold August Wunderlich” (Mackowiak, Wasserman, and Levine 1992).

These data are covered in example 11.3 on page 310 in Whitlock and Shulter, 2nd ed.

Load the data from a .csv file

`dat <- read.csv("bodytemp.csv")`

**NOTE:** if you have trouble loading the data, go to the very end of this page; there is code to replicate the data directly in R w/o loading a .csv file

```
#size of dataframe
dim(dat)
```

`## [1] 130 3`

```
#first few rows
head(dat)
```

```
## temp gender heartrate
## 1 96.3 1 70
## 2 96.7 1 71
## 3 96.9 1 74
## 4 97.0 1 80
## 5 97.1 1 73
## 6 97.1 1 75
```

```
#last few rows
tail(dat)
```

```
## temp gender heartrate
## 125 99.2 2 66
## 126 99.3 2 68
## 127 99.4 2 77
## 128 99.9 2 79
## 129 100.0 2 78
## 130 100.8 2 77
```

`summary(dat)`

```
## temp gender heartrate
## Min. : 96.30 Min. :1.0 Min. :57.00
## 1st Qu.: 97.80 1st Qu.:1.0 1st Qu.:69.00
## Median : 98.30 Median :1.5 Median :74.00
## Mean : 98.25 Mean :1.5 Mean :73.76
## 3rd Qu.: 98.70 3rd Qu.:2.0 3rd Qu.:79.00
## Max. :100.80 Max. :2.0 Max. :89.00
```

The original file has male = 1 and female = 2. LEt’s change these to letters using the **ifelse()** command. What we’ll do is have R look at the gender column in the dat dataframe and if the value equals 1, change it to “male”, else change it to “female”. So, if its a 1, chagne it to male, if it its not a 1, it must be a 2 so change it to female. We’ll put these changes into a new data object.

```
#New object
gender.MF <- ifelse(dat$gender == 1,"male","female")
```

Note that we use **==** to set up the logical argument.

```
#add new object to existing dataframe
dat <- data.frame(dat, gender.MF)
```

Look aat the new dataframe

`summary(dat)`

```
## temp gender heartrate gender.MF
## Min. : 96.30 Min. :1.0 Min. :57.00 female:65
## 1st Qu.: 97.80 1st Qu.:1.0 1st Qu.:69.00 male :65
## Median : 98.30 Median :1.5 Median :74.00
## Mean : 98.25 Mean :1.5 Mean :73.76
## 3rd Qu.: 98.70 3rd Qu.:2.0 3rd Qu.:79.00
## Max. :100.80 Max. :2.0 Max. :89.00
```

Note that R automatically makes the gender.MF column into categorical data.

Conventional wisdom is that “normal” body temp is **98.6** degrees F when taken with an oral theremometer. Is this true for this sample?

`summary(dat)`

```
## temp gender heartrate gender.MF
## Min. : 96.30 Min. :1.0 Min. :57.00 female:65
## 1st Qu.: 97.80 1st Qu.:1.0 1st Qu.:69.00 male :65
## Median : 98.30 Median :1.5 Median :74.00
## Mean : 98.25 Mean :1.5 Mean :73.76
## 3rd Qu.: 98.70 3rd Qu.:2.0 3rd Qu.:79.00
## Max. :100.80 Max. :2.0 Max. :89.00
```

The mean and the median are both about 0.3 degrees below **98.6.**

Let’s look at the distribution of the data to get a sense for what’s going one.

- Let’s make a histogram of all of the human body temps.

- We can add a line for the
**mean**(y.hat) using the command**abline()**.

- We give
**abline**the arugment**v=**to tell it to make a vertical line. - We also pass abline the arguement
**col=“green”**to change the color to green - the name of the color must be in quotes. col = green will not work.
- Finally, we pass abline the arguement “lwd=2” to increase the size of the line.

**lwd**means “line width”

```
#1st, Calculate the mean and store into an R object
overall.mean <- mean(dat$temp)
#make the histogram of the $temp column
hist(dat$temp)
#add a line for the mean
abline(v = overall.mean, col = 3, lwd = 2)
```

We’ll use the following command to add further info to the histogram

- mean()
- median()
- sd()
- abline
- legend()
- col =
- lty =
- lwd =

First, caculate sum summary stats and save them into R objects.

```
#Calculate the mean
overall.mean <- mean(dat$temp)
#Calculate the MEDIAN
overall.median <- median(dat$temp)
#Calculate the standard deviation
overall.sd <- sd(dat$temp)
```

Now, plot the data and add addition information

```
#make the histogram of the $temp column
hist(dat$temp,
xlab = "Oral temperature (F)",
main = "")
#add a green line for the mean
abline(v = overall.mean,
col = "green",
lwd = 2)
#add a line for the median
# make the color BLUE
# make the line DASHED
abline(v = overall.median,
col = "blue",
lwd = 2,
lty = 2)
# Add lines for +/- 1 SD
# make the color RED
#plus 1 SD. Note that I add the SD to the mean
abline(v = overall.mean+overall.sd,
col = "red",
lwd = 2, lty = 2)
#minus 1 SD Note that I subtract the SD from the mean
abline(v = overall.mean-overall.sd,
col = 2,
lwd = 2,
lty = 2)
#Add a line for where 98.6 is
abline(v = 98.6,
col = "lightblue",
lwd = 2,
lty = 4)
#Add a legend
legend.text <- c("mean",
"median",
"+/- 1 SD",
"98.6 degrees F")
colors.used <- c("green","blue","red","lightblue")
legend("topright",
legend = legend.text, #what each line is
col = colors.used, #what each color is
lty = c(1,2,2,4), #the type of dashs used
lwd = 2) #the width of the line
```

Note that 98.6 is **higher** than the mean and the median.

The mean and median of our sample are both less than the conventional value of 98.6. But could this just be due to sampling variation? We can test the hypothesis that this difference is not due to chance using a one-sample t-test.

- We will test the hypothesis that the mean of this sample of data is different than the conventional wisdom that the “normal” human temperature is
**98.6 degrees F.**

- The
**null hypothesis (Ho)**is that the mean of this sample equals 98.6 - Recall that
**“mu”**is the population mean (aka the population parameter).

First, we make an object that contains our value for mu, 98.6. Then we carry out a one-sample t-test.

```
normal.temp <- 98.6
t.test(dat$temp,
mu = normal.temp)
```

```
##
## One Sample t-test
##
## data: dat$temp
## t = -5.4548, df = 129, p-value = 2.411e-07
## alternative hypothesis: true mean is not equal to 98.6
## 95 percent confidence interval:
## 98.12200 98.37646
## sample estimates:
## mean of x
## 98.24923
```

This code is **very complex**; just focus on the plot that it makes.

```
#save t-test to R object
one.sample.t.test.output <- t.test(dat$temp,
mu = normal.temp)
#extract important stuff from t-test
mean.out <- one.sample.t.test.output$estimate
min.max <- one.sample.t.test.output$conf.int
ci <- one.sample.t.test.output$conf.int
min.max[1] <- min.max[1] -min.max[1]*.001
min.max[2] <- 98.6+0.01
#plot the mean value
plot(mean.out, ylim = min.max, pch = 16, cex =2,
xlab = "",
ylab = "Mean body temp",
xaxt = "n")
#add errorbars bassed on the confidence intervals
segments(x0 = 1, x1 = 1,
y0 = mean.out, y1 = ci[2], lwd = 2)
segments(x0 = 1, x1 = 1,
y0 = mean.out, y1 = ci[1], lwd = 2)
#show where mu is
abline(h = 98.6, col = "red", lty = 2)
#add annotation
text(x = 0.65, y = 98.55, "mu = 98.6")
text(x = 0.7095, y = 98.1, "Error bars = 95%CI")
p.out <- paste("p=", round(one.sample.t.test.output$p.value,7))
text(x = 1.35, y = 98.1, p.out)
```

The p value is MUCH lower than the traditional cutoff of **0.05**.

This 1-sample t-test indicates that the mean temperature of this sample is less than 98.6 and the difference is very unlikely to be due to change.

The most plausible range of values for the true, biologically determine population mean (mu) is between **98.12200** and **98.37646**, as defined by the 95% confidence interval. 98.6 is therefore a very inplausible value!

Note, however, than many people did have a higher temp than 98.6; there is significant variation in the body temp of healthy individual.

`boxplot(temp ~ gender.MF, data = dat)`

Males appear to have a lower median temp than female. Is this biologically based or could this difference just be due to chance?

- We can assess this using a 2-sample t-test.

- Before, we had all the data pooled together, hence we did a “1-sample” test.
- Now we are splitting the data into two groups and therefore its a
**two sample**test - Two sample tests proably the most common type of t-test
- Therefore, if someones says that they did a t-test, the probalby mean a 2-sample test

- For a 2-sample test we don’t specify “mu”
- We specify the test using the equation notation just like the boxplot, using the tilda symbol (~)

```
t.test(temp ~ gender.MF,
data = dat)
```

```
##
## Welch Two Sample t-test
##
## data: temp by gender.MF
## t = 2.2854, df = 127.51, p-value = 0.02394
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.03881298 0.53964856
## sample estimates:
## mean in group female mean in group male
## 98.39385 98.10462
```

- What does this test indicate about male and female body temp?
- Female body temp is higher - is it closer to 98.6?

`plot(temp ~ heartrate,data = dat)`

There might be some relationship here. We could explore this further if we wanted with linear regression.

Subset the data by males and females

```
males <- subset(dat,
gender.MF == "male")
females <- subset(dat,
gender.MF == "female")
```

Plot males and females seperately

```
par(mfrow = c(1,1))
#set xlims
xlims <- c(min(dat$heartrate),max(dat$heartrate))
plot(temp ~ heartrate,
data = males,
xlim = xlims)
points(temp ~ heartrate,
data = females,
col = "green")
```

The patterns appear to be similar. We could explore this further with multiple linear regression.

- A paired t-test is also a common type of t-test
- A common type of pairing is before and after a treatment occurred
- A paired treatment can be set up a couple different ways in R
- Mathematically, a paired t-test is similar to a 1-sample t-test.
- With a paired t-test we are not interested in whether the overall means of each group or time point (ie before, after) are difference
- We are interested in whether the difference between each pair of measurements is consistently different from zero
**A paired t-test is therefore equivalent to a 1-sample t-test where the population parameter mu=0**

- We will look at paired t-tests with some fake data.

- I’ve simulated some data representing a person’s body temp before the experimental treatment occured (before)
- I’ve also simualted body temps after a stress treatment occurs (stressed)
- The hypothesis (Ha) is that being stressed changes your body temp.
- The null hypothesis (Ho) is that there is no consistent impact of stress on body temp.

```
lm.out <- lm(temp ~ heartrate, data = dat)
temp.orig <- simulate(lm.out, seed = 100)
temp.stress <- simulate(lm.out, seed = 101) + 0.5+ rnorm(length(dat), mean = 0, sd = sd(dat$temp)/2)
new.dat <- data.frame(before = temp.orig$sim_1,stressed = temp.stress$sim_1)
```

```
#set x axis limits
xlims. <- c(min(new.dat), max(new.dat))
#set up to plot 2 histograms next to each other
par(mfrow = c(1,2))
#plot the 1st hist
hist(new.dat$before,
main = "",
xlim =xlims.)
#add line for the mean
abline(v = mean(new.dat$before),
col = "green",
lwd = 2)
#plot 2nd hist
hist(new.dat$stressed,
main = "",
xlim = xlims.)
#add line
abline(v = mean(new.dat$stressed),
col = "blue",
lwd = 2)
```

(I intentionaly made the means to be different, so the differene is not surprising!)

- We can do some simple math to calculate teh difference between before and after
- we use the dollar sign ($) to select the columns we want

`difference1 <- new.dat$before - new.dat$stressed`

We can add this to our dataframe like this

`new.dat$difference1 <- difference1`

Note that we could do this all in 1 step if we wanted. Lets flip the order of the values.

`new.dat$difference2<- new.dat$stressed - new.dat$before `

Look at distribution of differences

`hist(new.dat$difference1)`

- We’ll complete the paired t-test process by conducting a 1-sample test, setting mu = 0.
- We are therefore testing the hypothesis that the average
**difference**between before and stressed is greater than zero

```
t.test(new.dat$difference1 ,
mu = 0)
```

```
##
## One Sample t-test
##
## data: new.dat$difference1
## t = -9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.9823678 -0.6452696
## sample estimates:
## mean of x
## -0.8138187
```

t.test when the difference is calcualted as “new.dat\(before - new.dat\)stressed”"

```
t.test(new.dat$difference1 ,
mu = 0)
```

```
##
## One Sample t-test
##
## data: new.dat$difference1
## t = -9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.9823678 -0.6452696
## sample estimates:
## mean of x
## -0.8138187
```

t.test when the difference is calcualted as new.dat\(stressed - new.dat\)before

```
t.test(new.dat$difference2 ,
mu = 0)
```

```
##
## One Sample t-test
##
## data: new.dat$difference2
## t = 9.5531, df = 129, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.6452696 0.9823678
## sample estimates:
## mean of x
## 0.8138187
```

Can you spot the differences? There are 2

This code allows you to re-build the dataframe without loading any data.

```
temp <- c(96.3,96.7,96.9,97,97.1,97.1,97.1,97.2,97.3,97.4,97.4,97.4,97.4,97.5,97.5,97.6,97.6,97.6,97.7,97.8,97.8,97.8,97.8,97.9,97.9,98,98,98,98,98,98,98.1,98.1,98.2,98.2,98.2,98.2,98.3,98.3,98.4,98.4,98.4,98.4,98.5,98.5,98.6,98.6,98.6,98.6,98.6,98.6,98.7,98.7,98.8,98.8,98.8,98.9,99,99,99,99.1,99.2,99.3,99.4,99.5,96.4,96.7,96.8,97.2,97.2,97.4,97.6,97.7,97.7,97.8,97.8,97.8,97.9,97.9,97.9,98,98,98,98,98,98.1,98.2,98.2,98.2,98.2,98.2,98.2,98.3,98.3,98.3,98.4,98.4,98.4,98.4,98.4,98.5,98.6,98.6,98.6,98.6,98.7,98.7,98.7,98.7,98.7,98.7,98.8,98.8,98.8,98.8,98.8,98.8,98.8,98.9,99,99,99.1,99.1,99.2,99.2,99.3,99.4,99.9,100,100.8)
gender <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
heartrate <- c(70,71,74,80,73,75,82,64,69,70,68,72,78,70,75,74,69,73,77,58,73,65,74,76,72,78,71,74,67,64,78,73,67,66,64,71,72,86,72,68,70,82,84,68,71,77,78,83,66,70,82,73,78,78,81,78,80,75,79,81,71,83,63,70,75,69,62,75,66,68,57,61,84,61,77,62,71,68,69,79,76,87,78,73,89,81,73,64,65,73,69,57,79,78,80,79,81,73,74,84,83,82,85,86,77,72,79,59,64,65,82,64,70,83,89,69,73,84,76,79,81,80,74,77,66,68,77,79,78,77)
dat <- data.frame(temp = temp,
gender = gender,
heartrate = heartrate)
```