Changes to this lab will be announced via email and on the course Facebook page, CalU EcoStats.

For more labs and tutorials, see my WordPress site, lo.brow.R.

1 Outline

  • Regression model building
  • Regression model testing
  • Regression diagnostics
  • Transformation
  • Reporting results of regression
  • Multiple regression
  • Repeated measures data

These data come from: Meredith et al. 1991. Repeated measures experiments in forestry: focus on analysis of response curves. Can. J. For. Res.

2 Data formatting

  • Wide format
  • Long format

2.1 Wide format

  • The 1st column is the concentration of aluminum (AL) that sugar maple seeds were treated with.
  • Each ROW is a different tree
  • Height growth was then measured for 4 weeks
  • Each week's height was recorded in a separate column
  • This is called “wide” format because the data for each individual being studied are read left to right
  • This is a common way to collect data and present them in a table, and it is easy to read


The size of the dataframe

dat.orig <- read.csv(file = "data_orig.csv")
dim(dat.orig)
## [1] 67  6

2.1.0.1 Data in wide format

head(dat.orig)
##   X conc.AL ht.wk.1 ht.wk.2 ht.wk.3 ht.wk.4
## 1 1       0      60      62      78     104
## 2 2       0      41      50      60      60
## 3 3       0      85      97     115     120
## 4 4       0      88      87      90      80
## 5 5       0      66      65      80      95
## 6 6       0     106     100     133     172



2.2 Long format

  • This is how R needs data to be formatted for regression and ANOVA
  • t.test also uses data in this format
  • Note that there are MANY rows of data
  • Data in this format is not very easy to read by eye
  • This format matches how the math gets done by the computer
  • ALL response data (y variable, “height”) are in a SINGLE column
  • ALL predictor data (x variable, “week”) are in a SINGLE column

The size of the dataframe

data.long <- read.csv(file = "data_long.csv")
dim(data.long)
## [1] 268   4

2.2.0.1 Data in long format

head(data.long)
##   X height week conc.AL
## 1 1     60    1       0
## 2 2     41    1       0
## 3 3     85    1       0
## 4 4     88    1       0
## 5 5     66    1       0
## 6 6    106    1       0
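
The link between the two layouts can be sketched in base R. The toy data frame below copies the first three seedlings from the wide table shown earlier. The object names wide, ht.cols, and long are made up for this illustration, and this is not necessarily how data_long.csv was produced.

```r
# Toy wide-format data: first three seedlings from head(dat.orig)
wide <- data.frame(
  conc.AL = c(0, 0, 0),
  ht.wk.1 = c(60, 41, 85),
  ht.wk.2 = c(62, 50, 97),
  ht.wk.3 = c(78, 60, 115),
  ht.wk.4 = c(104, 60, 120)
)

# Names of the four weekly height columns
ht.cols <- c("ht.wk.1", "ht.wk.2", "ht.wk.3", "ht.wk.4")

# Stack the height columns into ONE column and
# repeat the week and aluminum values to match
long <- data.frame(
  height  = unlist(wide[, ht.cols], use.names = FALSE),
  week    = rep(1:4, each = nrow(wide)),
  conc.AL = rep(wide$conc.AL, times = length(ht.cols))
)

dim(long)    # 3 seedlings x 4 weeks = 12 rows, 3 columns
head(long)
```

The same arithmetic explains the real data: 67 seedlings x 4 weeks gives the 268 rows reported by dim(data.long).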



3 Plotting regression data

  • Plot regression-style data with the plot() command
plot(height ~ week, 
     data = data.long,
     main = "Seedling growth: Height ~ week"
     )

3.0.1 Extra: changing plotting symbols and other features

You can change …

  • color using col = 2 (or another number)
  • symbol with pch = …
  • x axis label with xlab =
  • y axis label with ylab =
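
As a minimal sketch of these options, the call below sets all four at once. The data frame toy is a made-up stand-in holding the heights of the first three seedlings, and the particular color, symbol, and label values are arbitrary choices for illustration.

```r
# Toy long-format data: first three seedlings, weeks 1 to 4
toy <- data.frame(
  height = c(60, 41, 85, 62, 50, 97, 78, 60, 115, 104, 60, 120),
  week   = rep(1:4, each = 3)
)

plot(height ~ week,
     data = toy,
     col  = 4,            # color number 4 (blue in the default palette)
     pch  = 16,           # symbol number 16 (filled circle)
     xlab = "Week",       # x axis label
     ylab = "Height",     # y axis label
     main = "Seedling growth: Height ~ week")
```

With the real data, replace toy with data.long.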

4 Basic regression analysis 1: height ~ week

  • We are going to do a basic regression analysis of these data.
  • A VERY important caveat here is that we are ignoring the fact that the same seedlings were measured multiple times
  • We are therefore treating each row of data as independent; that is, as if EACH ROW represented a completely different plant that had not been measured before and will not be measured again!
  • This is therefore a completely invalid analysis and is done just for illustration!
  • In reality, these data are considered “repeated measures” or “longitudinal”
  • Proper analysis requires calculating the average growth for each plant, using just the final measurement, or using special statistical techniques
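
Two of the simpler options listed above (one summary value per plant) can be sketched from wide-format data. The column names growth and final are made up for this illustration, and the toy data are again the first three seedlings. This only shows how to get one row per plant; it is not a full repeated-measures analysis.

```r
# Toy wide-format data: first three seedlings from head(dat.orig)
wide <- data.frame(
  conc.AL = c(0, 0, 0),
  ht.wk.1 = c(60, 41, 85),
  ht.wk.2 = c(62, 50, 97),
  ht.wk.3 = c(78, 60, 115),
  ht.wk.4 = c(104, 60, 120)
)

# Option 1: total growth per plant (week 4 height minus week 1 height)
wide$growth <- wide$ht.wk.4 - wide$ht.wk.1

# Option 2: keep only the final measurement for each plant
wide$final <- wide$ht.wk.4

# One row per plant, so each row really is an independent unit
wide[, c("conc.AL", "growth", "final")]
```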

4.1 Model fitting

We will fit a “null” and an “alternative” model.

4.1.1 Fit a null and alternative model

“Null” model

model.null <- lm(height ~ 1, 
     data = data.long)

“Alternative” model

model.alt <- lm(height ~ week, 
     data = data.long)
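
To see what the two formulas mean, here is a small sketch on toy data (the first three seedlings; the names toy, m.null, and m.alt are made up for this illustration). The null model height ~ 1 estimates a single number, which is just the overall mean height. The alternative model adds a slope for week.

```r
# Toy long-format data: first three seedlings, weeks 1 to 4
toy <- data.frame(
  height = c(60, 41, 85, 62, 50, 97, 78, 60, 115, 104, 60, 120),
  week   = rep(1:4, each = 3)
)

m.null <- lm(height ~ 1,    data = toy)   # intercept only
m.alt  <- lm(height ~ week, data = toy)   # intercept + slope

coef(m.null)        # one number: the intercept
mean(toy$height)    # the same value as the intercept above
coef(m.alt)         # two numbers: intercept and slope for week
```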

4.2 Test the models

anova(model.null,
      model.alt)
## Analysis of Variance Table
## 
## Model 1: height ~ 1
## Model 2: height ~ week
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    267 964926                                  
## 2    266 907135  1     57791 16.946 5.135e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F statistic and p-value are very important quantities that need to be reported in a paper.
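
The F statistic can be rebuilt by hand from the RSS and Res.Df columns of the table above. This is a sketch using the rounded values printed in the table, so the result should agree with the anova() output only up to rounding. The object names are made up for this illustration.

```r
# Values copied from the anova() table
rss.null <- 964926   # residual sum of squares, height ~ 1
df.null  <- 267      # residual df, height ~ 1
rss.alt  <- 907135   # residual sum of squares, height ~ week
df.alt   <- 266      # residual df, height ~ week

# Drop in RSS from adding week, per df used ("Sum of Sq" / "Df")
ms.diff <- (rss.null - rss.alt) / (df.null - df.alt)

# Leftover variation per residual df in the alternative model
ms.resid <- rss.alt / df.alt

F.stat  <- ms.diff / ms.resid
p.value <- pf(F.stat,
              df1 = df.null - df.alt,
              df2 = df.alt,
              lower.tail = FALSE)

F.stat    # compare to the F column
p.value   # compare to the Pr(>F) column
```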

TASK: Write the F-statistic and p-value for this test on the worksheet.

NOTE: Because we sampled the same plants multiple times, this p-value is WAY too small and our apparent sample size is WAY too big. This is a form of pseudoreplication.
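
The sample size problem can be made concrete with numbers already shown in this lab. The comparison below is only arithmetic; the "one value per plant" case is an assumed scenario for illustration, and the object names are made up.

```r
n.plants <- 67   # rows in the wide data: one per seedling
n.weeks  <- 4    # height columns per seedling

# Rows in the long data, which the regression treats as independent
n.rows <- n.plants * n.weeks
n.rows           # 268, matching dim(data.long)

# Residual df for a simple regression (intercept + slope = 2 parameters)
n.rows - 2       # 266, the Res.Df used in the test above
n.plants - 2     # 65, if each seedling contributed a single value
```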

4.3 Examine the model against the raw data

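# Set up a plotting window with 1 row and 2 columns
par(mfrow = c(1,2))

# Left panel: raw data with the null model (a flat line at the mean)
plot(height ~ week, data = data.long,
     main = "Height ~ 1"
     )
abline(model.null, col = 2, lwd = 3)

# Right panel: raw data with the alternative model (a sloped line)
plot(height ~ week, 
     data = data.long,
     main = "Height ~ week"
     )
abline(model.alt, col = 2, lwd = 3)

# Return to one plot per window
par(mfrow = c(1,1))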