Time Series: Atmospheric CO\(_2\) Concentrations

In this section, we looked at some basic time series models and plots in R. Below is a summary of what we covered, followed by the script file.

# Time series data for co2 conc.
# 2017-10-29
# Forecasted using co2 data that only goes up to 1997.
# Since the NOAA still collects data, we were able to 
# compare there forecasts to the actual data.
# It seems that the CO2 concentration is increasing 
# faster than our simple models predicted.

#install.packages("forecast") # Make this a comment after installing
# so that you don't try to install it every time
library(forecast)
## Warning: package 'forecast' was built under R version 3.3.3
data(co2)
#?co2 # comment out help files so they don't run every time

plot(co2)
class(co2)
## [1] "ts"
# I commented these out because I don't want the data to show up
# in the output
#co2
#time(co2)
#frequency(co2)

plot(co2)
abline(lm(co2 ~ time(co2)))
lines(aggregate(co2, FUN=mean), col = 'red')

aggregate(co2, FUN=mean)
## Time Series:
## Start = 1959 
## End = 1997 
## Frequency = 1 
##  [1] 315.8258 316.7475 317.4850 318.2975 318.8325 319.4625 319.8725
##  [8] 321.2100 322.0200 322.8900 324.4592 325.5175 326.1550 327.2933
## [15] 329.5117 330.0792 330.9858 331.9858 333.7300 335.3358 336.6808
## [22] 338.5150 339.7608 340.9592 342.6083 344.2467 345.7258 346.9750
## [29] 348.7508 351.3133 352.7542 354.0367 355.4783 356.2917 356.9958
## [36] 358.8800 360.9142 362.6867 363.8175
# Hey look what happens here!
monthplot(co2)

seasonplot(co2, col = heat.colors(39))

# Autocorrelation function
acf(co2)

pacf(co2)

co2.ar <- auto.arima(co2)
summary(co2.ar)
## Series: co2 
## ARIMA(1,1,1)(1,1,2)[12]                    
## 
## Coefficients:
##          ar1      ma1     sar1     sma1     sma2
##       0.2569  -0.5847  -0.5489  -0.2620  -0.5123
## s.e.  0.1406   0.1204   0.5879   0.5701   0.4819
## 
## sigma^2 estimated as 0.08576:  log likelihood=-84.39
## AIC=180.78   AICc=180.97   BIC=205.5
## 
## Training set error measures:
##                      ME     RMSE       MAE        MPE       MAPE      MASE
## Training set 0.01742092 0.287159 0.2303994 0.00507377 0.06845665 0.1819636
##                     ACF1
## Training set -0.00285809
co2.fore <- forecast(co2.ar, h = 4*12)
co2.fore
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1998       365.1519 364.7766 365.5272 364.5779 365.7259
## Feb 1998       365.9244 365.4722 366.3766 365.2329 366.6160
## Mar 1998       366.7579 366.2548 367.2611 365.9884 367.5275
## Apr 1998       368.1529 367.6066 368.6991 367.3175 368.9882
## May 1998       368.7106 368.1253 369.2960 367.8154 369.6059
## Jun 1998       367.9974 367.3756 368.6193 367.0464 368.9485
## Jul 1998       366.5046 365.8483 367.1609 365.5009 367.5083
## Aug 1998       364.4546 363.7656 365.1436 363.4009 365.5083
## Sep 1998       362.5883 361.8681 363.3085 361.4868 363.6898
## Oct 1998       362.7245 361.9744 363.4746 361.5773 363.8717
## Nov 1998       364.1808 363.4019 364.9597 362.9896 365.3720
## Dec 1998       365.6001 364.7934 366.4067 364.3664 366.8337
## Jan 1999       366.6541 365.8001 367.5082 365.3479 367.9603
## Feb 1999       367.4890 366.5970 368.3811 366.1248 368.8533
## Mar 1999       368.3600 367.4332 369.2868 366.9426 369.7774
## Apr 1999       369.6847 368.7249 370.6446 368.2167 371.1528
## May 1999       370.2598 369.2680 371.2516 368.7430 371.7766
## Jun 1999       369.5900 368.5673 370.6127 368.0260 371.1540
## Jul 1999       368.0811 367.0284 369.1337 366.4712 369.6909
## Aug 1999       366.0296 364.9479 367.1114 364.3752 367.6841
## Sep 1999       364.1807 363.0705 365.2908 362.4829 365.8785
## Oct 1999       364.2948 363.1570 365.4326 362.5547 366.0349
## Nov 1999       365.7120 364.5472 366.8769 363.9306 367.4935
## Dec 1999       367.1133 365.9221 368.3045 365.2915 368.9351
## Jan 2000       368.1622 366.9350 369.3895 366.2854 370.0391
## Feb 2000       368.9957 367.7369 370.2545 367.0706 370.9209
## Mar 2000       369.8545 368.5658 371.1433 367.8836 371.8255
## Apr 2000       371.2200 369.9022 372.5378 369.2046 373.2354
## May 2000       371.7861 370.4399 373.1323 369.7272 373.8450
## Jun 2000       371.0926 369.7186 372.4666 368.9913 373.1940
## Jul 2000       369.5926 368.1913 370.9938 367.4495 371.7356
## Aug 2000       367.5420 366.1140 368.9699 365.3581 369.7258
## Sep 2000       365.6835 364.2293 367.1377 363.4595 367.9075
## Oct 2000       365.8097 364.3298 367.2897 363.5463 368.0731
## Nov 2000       367.2484 365.7431 368.7537 364.9463 369.5505
## Dec 2000       368.6595 367.1294 370.1897 366.3193 370.9998
## Jan 2001       369.7113 368.1450 371.2776 367.3158 372.1068
## Feb 2001       370.5456 368.9479 372.1432 368.1021 372.9890
## Mar 2001       371.4110 369.7835 373.0385 368.9220 373.9000
## Apr 2001       372.7541 371.0977 374.4106 370.2208 375.2875
## May 2001       373.3252 371.6402 375.0101 370.7483 375.9021
## Jun 2001       372.6447 370.9318 374.3576 370.0250 375.2644
## Jul 2001       371.1398 369.3993 372.8802 368.4780 373.8015
## Aug 2001       369.0887 367.3212 370.8562 366.3855 371.7919
## Sep 2001       367.2354 365.4412 369.0296 364.4915 369.9794
## Oct 2001       367.3550 365.5346 369.1755 364.5709 370.1392
## Nov 2001       368.7819 366.9356 370.6283 365.9581 371.6057
## Dec 2001       370.1877 368.3157 372.0596 367.3248 373.0506
plot(co2.fore, main = "Forecasted co2",
     xlim = c(1995, 2002), ylim = c(355, 375))

# Forecasting with loess smoothers
#?stl
co2.stl <- stl(co2, s.window = "period")
plot(co2.stl)

plot(forecast(co2.stl, h = 4*12),
     xlim= c(1995, 2002), ylim = c(355, 375))
lines(co2.fore$mean, col = 'purple', lwd = 2)
legend('topleft', legend = c("ARIMA", "STL"),
       col = c("purple", "blue"), lwd=2)

# Load in the new data - contains co2 up to 2017
newco2 <- read.csv("C:/Users/d7bec/Desktop/newco2.csv")
plot(newco2$Average, type = 'l')

# Some of the average co2 conc's are below 0 - that's
# not how concentrations work.
# Missing data was recorded as -99.99.
ave <- newco2$Average
which(ave < 0)
## [1]   4   8  72  73  74 214 314
ave[which(ave < 0)] <- NA

plot(ave, type = 'l')

# Create a time series object to take advantage of 
# time series functions
newts <- ts(ave, start = c(1958, 3),
            end = c(2017, 9), frequency = 12)

plot(newts)

plot(newts, xlim = c(1995, 2017), ylim = c(355,415))
forc.arima <- forecast(co2.ar, h = 20*12)
forc.stl <- forecast(co2.stl, h = 20*12)

lines(forc.arima$mean, col = "purple")
lines(forc.stl$mean, col = "blue")

plot(forc.arima, xlim = c(1995, 2017), ylim = c(355, 415))
lines(newts)

Linear Modelling: CO\(_2\) Uptake in Chilled Environments

Model Notation
\(y = \beta_0+\beta_1x_1+\beta_2x_2\) y ~ x1 + x2
\(y = \beta_1x_1 +\beta_2x_2\) y ~ x1 + x2 - 1
\(y = \beta_0 + \beta_1x_1x_2\) y ~ x1 : x2
\(y = \beta_0+\beta_1x_1+\beta_2x_2 +\beta_3x_1x_2\) y ~ x1 + x2 + x1:x2
\(y = \beta_0+\beta_1x_1+\beta_2x_2 +\beta_3x_1x_2\) y ~ x1 * x2
\(y = \beta_0+\beta_1x_1+\beta_2x_2 +\beta_3x_1x_2\) y ~ (x1 + x2)^2
\(y = \beta_0 + \beta_1x_1^2\) y ~ I(x1^2)
\(y = \beta_0 + \beta_1x_1 + \beta_2x_1^2\) y ~ poly(x1, 2)
\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ...\) y ~ .
# Linear Modelling - CO2 data

data(CO2)
#?CO2

plot(uptake ~ conc, data = CO2)

boxplot(uptake ~ Treatment, data = CO2)

hist(CO2$uptake)

hist(CO2$conc)

# I probably shouldn't fit a linear model to this data.

# Fitting a linear model to this data ----

colm <- lm(uptake ~ conc, data = CO2)
summary(colm)
## 
## Call:
## lm(formula = uptake ~ conc, data = CO2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.831  -7.729   1.483   7.748  16.394 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 19.500290   1.853080  10.523  < 2e-16 ***
## conc         0.017731   0.003529   5.024 2.91e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.514 on 82 degrees of freedom
## Multiple R-squared:  0.2354, Adjusted R-squared:  0.2261 
## F-statistic: 25.25 on 1 and 82 DF,  p-value: 2.906e-06
plot(uptake ~ conc, data = CO2)
abline(colm)

coef(colm)
## (Intercept)        conc 
## 19.50028981  0.01773059
abline(a = 19.5, b = 0.0177, col = 'hotpink')

colm# not interesting
## 
## Call:
## lm(formula = uptake ~ conc, data = CO2)
## 
## Coefficients:
## (Intercept)         conc  
##    19.50029      0.01773
summary(colm)
## 
## Call:
## lm(formula = uptake ~ conc, data = CO2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.831  -7.729   1.483   7.748  16.394 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 19.500290   1.853080  10.523  < 2e-16 ***
## conc         0.017731   0.003529   5.024 2.91e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.514 on 82 degrees of freedom
## Multiple R-squared:  0.2354, Adjusted R-squared:  0.2261 
## F-statistic: 25.25 on 1 and 82 DF,  p-value: 2.906e-06
names(summary(colm))
##  [1] "call"          "terms"         "residuals"     "coefficients" 
##  [5] "aliased"       "sigma"         "df"            "r.squared"    
##  [9] "adj.r.squared" "fstatistic"    "cov.unscaled"
str(summary(colm))
## List of 11
##  $ call         : language lm(formula = uptake ~ conc, data = CO2)
##  $ terms        :Classes 'terms', 'formula'  language uptake ~ conc
##   .. ..- attr(*, "variables")= language list(uptake, conc)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "uptake" "conc"
##   .. .. .. ..$ : chr "conc"
##   .. ..- attr(*, "term.labels")= chr "conc"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(uptake, conc)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "uptake" "conc"
##  $ residuals    : Named num [1:84] -5.18 7.8 10.87 11.49 6.93 ...
##   ..- attr(*, "names")= chr [1:84] "1" "2" "3" "4" ...
##  $ coefficients : num [1:2, 1:4] 19.50029 0.01773 1.85308 0.00353 10.52318 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "(Intercept)" "conc"
##   .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
##  $ aliased      : Named logi [1:2] FALSE FALSE
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "conc"
##  $ sigma        : num 9.51
##  $ df           : int [1:3] 2 82 2
##  $ r.squared    : num 0.235
##  $ adj.r.squared: num 0.226
##  $ fstatistic   : Named num [1:3] 25.2 1 82
##   ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
##  $ cov.unscaled : num [1:2, 1:2] 3.79e-02 -5.98e-05 -5.98e-05 1.38e-07
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "(Intercept)" "conc"
##   .. ..$ : chr [1:2] "(Intercept)" "conc"
##  - attr(*, "class")= chr "summary.lm"
# Extract the coefficients
summary(colm)$coef
##                Estimate  Std. Error   t value     Pr(>|t|)
## (Intercept) 19.50028981 1.853080024 10.523177 6.967392e-17
## conc         0.01773059 0.003528853  5.024461 2.905607e-06
summary(colm)$coef[1,1]
## [1] 19.50029
summary(colm)$coef[1,]
##     Estimate   Std. Error      t value     Pr(>|t|) 
## 1.950029e+01 1.853080e+00 1.052318e+01 6.967392e-17
summary(colm)$coef[,4]
##  (Intercept)         conc 
## 6.967392e-17 2.905607e-06
# Multiple Linear Regression ----
colmm <- lm(uptake ~ conc + Treatment, data = CO2)
summary(colmm)
## 
## Call:
## lm(formula = uptake ~ conc + Treatment, data = CO2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.401  -7.066  -1.168   7.573  17.597 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      22.930052   1.989746  11.524  < 2e-16 ***
## conc              0.017731   0.003306   5.364 7.55e-07 ***
## Treatmentchilled -6.859524   1.944840  -3.527 0.000695 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.912 on 81 degrees of freedom
## Multiple R-squared:  0.3372, Adjusted R-squared:  0.3208 
## F-statistic:  20.6 on 2 and 81 DF,  p-value: 5.837e-08
# This only changes the intercept for Treatment = chilled

# Slope and intercept
colmm2 <- lm(uptake ~ conc * Treatment, data = CO2)
# * means conc + Treatment + conc:Treatment
summary(colmm2)
## 
## Call:
## lm(formula = uptake ~ conc * Treatment, data = CO2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.218  -7.401  -1.117   7.835  17.209 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           22.019163   2.464160   8.936 1.17e-13 ***
## conc                   0.019825   0.004693   4.225 6.29e-05 ***
## Treatmentchilled      -5.037747   3.484848  -1.446    0.152    
## conc:Treatmentchilled -0.004188   0.006636  -0.631    0.530    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.946 on 80 degrees of freedom
## Multiple R-squared:  0.3405, Adjusted R-squared:  0.3157 
## F-statistic: 13.77 on 3 and 80 DF,  p-value: 2.528e-07
# Comparing Models ----
anova(colmm, colmm2)
## Analysis of Variance Table
## 
## Model 1: uptake ~ conc + Treatment
## Model 2: uptake ~ conc * Treatment
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     81 6433.9                           
## 2     80 6402.0  1    31.871 0.3983 0.5298
# p-value is 0.53 so there is no significant difference
# between the variances in the model.
# The models have approximately the same fit.

AIC(colmm)
## [1] 610.8169
AIC(colmm2)
## [1] 612.3997
# Model without conc:Treatment is better (lower AIC).
# Use uptake ~ conc + Treatment as the model

BIC(colmm)
## [1] 620.5401
BIC(colmm2)
## [1] 624.5538
# Checking the residuals ----
plot(colmm$residuals ~ CO2$uptake)

# This looks really bad - there shouldn't be patterns!

qqnorm(colmm$residuals)
qqline(colmm$residuals)

hist(colmm$residuals)

# Conclusion: Our best model is still pretty bad.

# Challenge
data(mtcars)
plot(mpg ~ disp, data = mtcars)

plot(mpg ~ wt, data = mtcars)

boxplot(mpg ~ am, data = mtcars)

# fit a linear model and tell me about it.

More Modelling: mtcars and Logistic Regression

In this section, I include some of the example code. The first part shows the effect of interaction in a linear model; using * instead of + in your formula (see the table in the previous section).

data(mtcars)

lm1 <- lm(mpg ~ disp + am, data = mtcars)
lm2 <- lm(mpg ~ disp * am, data = mtcars)

plot(mpg ~ disp, col = am + 1, data=mtcars, pch = 16)
coef(lm1)
## (Intercept)        disp          am 
## 27.84808111 -0.03685086  1.83345825
coef(lm2)
## (Intercept)        disp          am     disp:am 
## 25.15706407 -0.02758360  7.70907298 -0.03145482
abline(a = 27.848 + 1.833, b = -0.03685, col = 1)
abline(a = 27.848, b = -0.03685, col = 2)

abline(a = 25.15706407, 
       b = -0.02758360, 
       col = 1, lty = 2)
abline(a = 25.15706407 + 7.70907298, 
       b = -0.02758360-0.03145482 , 
       col = 2, lty = 2)
legend('topright', legend=c("With *", "With +"), lty = 2:1)

# ~. performs 
mtcars.all <- lm (mpg ~ ., data = mtcars)
mtfinal <- step(mtcars.all)
## Start:  AIC=70.9
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - cyl   1    0.0799 147.57 68.915
## - vs    1    0.1601 147.66 68.932
## - carb  1    0.4067 147.90 68.986
## - gear  1    1.3531 148.85 69.190
## - drat  1    1.6270 149.12 69.249
## - disp  1    3.9167 151.41 69.736
## - hp    1    6.8399 154.33 70.348
## - qsec  1    8.8641 156.36 70.765
## <none>              147.49 70.898
## - am    1   10.5467 158.04 71.108
## - wt    1   27.0144 174.51 74.280
## 
## Step:  AIC=68.92
## mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - vs    1    0.2685 147.84 66.973
## - carb  1    0.5201 148.09 67.028
## - gear  1    1.8211 149.40 67.308
## - drat  1    1.9826 149.56 67.342
## - disp  1    3.9009 151.47 67.750
## - hp    1    7.3632 154.94 68.473
## <none>              147.57 68.915
## - qsec  1   10.0933 157.67 69.032
## - am    1   11.8359 159.41 69.384
## - wt    1   27.0280 174.60 72.297
## 
## Step:  AIC=66.97
## mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - carb  1    0.6855 148.53 65.121
## - gear  1    2.1437 149.99 65.434
## - drat  1    2.2139 150.06 65.449
## - disp  1    3.6467 151.49 65.753
## - hp    1    7.1060 154.95 66.475
## <none>              147.84 66.973
## - am    1   11.5694 159.41 67.384
## - qsec  1   15.6830 163.53 68.200
## - wt    1   27.3799 175.22 70.410
## 
## Step:  AIC=65.12
## mpg ~ disp + hp + drat + wt + qsec + am + gear
## 
##        Df Sum of Sq    RSS    AIC
## - gear  1     1.565 150.09 63.457
## - drat  1     1.932 150.46 63.535
## <none>              148.53 65.121
## - disp  1    10.110 158.64 65.229
## - am    1    12.323 160.85 65.672
## - hp    1    14.826 163.35 66.166
## - qsec  1    26.408 174.94 68.358
## - wt    1    69.127 217.66 75.350
## 
## Step:  AIC=63.46
## mpg ~ disp + hp + drat + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - drat  1     3.345 153.44 62.162
## - disp  1     8.545 158.64 63.229
## <none>              150.09 63.457
## - hp    1    13.285 163.38 64.171
## - am    1    20.036 170.13 65.466
## - qsec  1    25.574 175.67 66.491
## - wt    1    67.572 217.66 73.351
## 
## Step:  AIC=62.16
## mpg ~ disp + hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1     6.629 160.07 61.515
## <none>              153.44 62.162
## - hp    1    12.572 166.01 62.682
## - qsec  1    26.470 179.91 65.255
## - am    1    32.198 185.63 66.258
## - wt    1    69.043 222.48 72.051
## 
## Step:  AIC=61.52
## mpg ~ hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - hp    1     9.219 169.29 61.307
## <none>              160.07 61.515
## - qsec  1    20.225 180.29 63.323
## - am    1    25.993 186.06 64.331
## - wt    1    78.494 238.56 72.284
## 
## Step:  AIC=61.31
## mpg ~ wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## <none>              169.29 61.307
## - am    1    26.178 195.46 63.908
## - qsec  1   109.034 278.32 75.217
## - wt    1   183.347 352.63 82.790
plot(mtfinal$residuals ~ mtcars$mpg)

summary(lm(mpg ~ am, data = mtcars))
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
# Anova and linear models are the same!
anova(aov(mpg ~ am, data = mtcars))
## Analysis of Variance Table
## 
## Response: mpg
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## am         1 405.15  405.15   16.86 0.000285 ***
## Residuals 30 720.90   24.03                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Silly examples
# Logistic Regression
plot(am ~ mpg, data = mtcars)

ampg <- glm(am ~ mpg, data = mtcars, family = binomial())
summary(ampg)
## 
## Call:
## glm(formula = am ~ mpg, family = binomial(), data = mtcars)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5701  -0.7531  -0.4245   0.5866   2.0617  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -6.6035     2.3514  -2.808  0.00498 **
## mpg           0.3070     0.1148   2.673  0.00751 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 29.675  on 30  degrees of freedom
## AIC: 33.675
## 
## Number of Fisher Scoring iterations: 5
#?predict
newdata <- data.frame(mpg = seq(10,35,0.5))
ampg.pred <- predict(object = ampg, newdata = newdata,
                     type = 'response')

plot(am ~ mpg, data = mtcars)
lines(ampg.pred ~ newdata$mpg)

# Do not run:
# Repeated measures anova
anova(aov(response ~ predictor + Error(subject_id)))
?Error

ggplot2 Examples

ggplot2 is a relatively new plotting method that creates beautiful plots easily. I believe it is worth learning the base R functions for exploratory plotting, but look into ggplot2 for final reports. It makes a lot of plots very easy, but fine-tuning plots gets finnicky. If it’s really hard to do in ggplot2, then it’s probably not a good plot in the first place.

#install.packages("ggplot2")
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
#Base R
plot(mpg ~ disp, col = am + 1, data = mtcars)

## ggplot2
ggplot(data = mtcars, 
       mapping = aes(x = disp, 
                     y = mpg, 
                     colour = factor(am))) +
    geom_point() + geom_smooth(method = "lm") + 
    theme_bw() + xlab("Displacement (Cu.in)") +
    ylab("Miles Per Gallon") +
    ggtitle(label = "MPG against displacement",
            subtitle = "Manual cars are better")

ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) + 
    geom_density(alpha = 0.5) + theme_minimal() +
    facet_grid(~am)

RMarkdown

RMarkdown is a fantastic tool for creating reports. You can use some plain English text and then switch to LaTeX code for equations and then add some R code that will be evaluated. The R code chunks can show the code, output, neither, or both. It can be used to make webpages, blogs, and applications in html, papers, books, and reports in pdf (requires LaTeX), and reports in docx (requires Word or LibreOffice).

Your code will be run each time you compile, so make sure the code can run on its own (you’ll need to have code chunks that import data and do all of the cleaning). The great thing about this is that if your data changes, your report changes too! You don’t have to re-make your plots and tables, they automatically update.

If you click the green plus in RStudio to open a new file, select RMarkdown. The default file has a nice, short overview to get you started. Fine-tuning reports can be fickle.

Further Topics

I obviously couldn’t cover everything. Here are some things to help you keep going with R:

Thanks for attending this workshop!