For the activities in the course and the in-class demonstrations, we’ll use R Markdown Notebooks. When you execute code within the notebook, the results automagically appear beneath the code. That way the code, the ideas, and the methods are intricately entwined.

Before we can use R Markdown Notebooks properly, we need to learn a little about R and R Markdown.

Why are we using/learning R?

There are a multitude of reasons for using and learning R. Here are some of my favorite reasons.

  1. R is free and open-source. R being free is important because everyone has access to it regardless of their income. The poor, lowly graduate student can use it and ALWAYS use it. R being open-source is important because that means we can scrutinize every bit of R’s code and our algorithms are completely available for independent auditing. That means you can really trust what’s in the core of R and, if you don’t like something about it and you have the know-how, you can change it.

  2. R is the premier statistical software. If you read about a method in an article, the odds are someone has already written a package that implements that method. It’s not unusual in the statistical world for an R package to accompany a new method submitted to a statistics journal. So you won’t need to learn 5 different pieces of statistical software (e.g., SPSS, HLM, Mplus).

  3. R has amazing visualization capabilities including the base R graphics and ggplot2 (which we’ll use in this stats camp).

  4. Once you learn R, and it’ll be hard if you don’t use it regularly, and become an efficient R user and programmer, you’ll discover how much more quickly you can manipulate data, run statistical models, and create beautiful plots.

What are all the hashes and underscores in this Rmd file?

This file is an R Markdown file. I recommend you spend some time this evening downloading and reading the R Markdown cheatsheet. I would strongly encourage you to always write your R code in an R Markdown file. The reason is two-fold.

  1. Your code is intricately woven with your narrative. This will make it easier for you to understand what you’ve done and why you’ve done it. It’ll make your logic clearer when you return 3 to 6 months from now to revisit an analysis for a resubmission or to update the data.

  2. This helps to make your research more reproducible. If you give someone your data and this R Markdown file, you’re giving them a recipe to reproduce (mostly) your complete analysis.

So what are R Markdown Notebooks?

They’re essentially a fancy, and new, way to embed R code into an R Markdown file, and they allow me to create an HTML file that you can edit if you have RStudio or view even if you don’t have RStudio. So, I can, in essence, share my R session with you and you can share your findings with other students and with your advisors, who may or may not have RStudio. The difference between a notebook and the usual R Markdown file is that a notebook is interactive (you’ll see what I mean) and that I can share an HTML file with you which you can open up in RStudio and edit.

All the R code will be written inside a chunk.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter. Something should happen.

plot(cars)

So, code is evaluated and plots are created within the document if R code is written within these chunks.

So let’s start learning R.

Reading in and basic manipulation of data in R

# Comments --------------------------------------------------------------------
# Hashtags are what R uses for comments. 
# 2 + 2   # This will not be evaluated
2 + 2     # This will be evaluated
[1] 4

If you are completely new to R, the first thing you’ll note is that we can use it as a calculator.

2 ^ 3   # 2 cubed
[1] 8
log(10) # Take the log of 10
[1] 2.302585
sqrt(9) # Take the square root of 9
[1] 3
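
One thing worth knowing about the calculator: log() is the natural log unless you tell it otherwise. Here is a small sketch of the other bases and of storing a result for later. The variable name x is just for illustration.

```r
# log() is the natural logarithm (base e) unless a base is supplied
log(10)            # natural log of 10
log10(100)         # base-10 log
log(8, base = 2)   # log with an explicit base
exp(log(10))       # exp() undoes the natural log

# Results can be stored with <- and reused
x <- sqrt(9)
x ^ 2
```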

For this course, we’ll use data in four formats.

  1. Built-in R data sets
  2. Data in tabular format
  3. Data in CSV format (i.e., spreadsheets)
  4. Data from SPSS

Let’s look first at how to use a built-in data set in R. Let’s use the mtcars data set.

data(mtcars)

To see all the data sets available in R:

data()

To see all the data sets available in a specific package, for example, the lme4 package (which is used for mixed effects modeling):

data(package = "lme4")
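
If you would rather search the list than scroll through it, you can keep what data() returns. This is a minimal sketch using the datasets package that ships with R. The name ds is just for illustration.

```r
# data(package = ...) returns an object whose $results element is a
# character matrix with columns Package, LibPath, Item, and Title
ds <- data(package = "datasets")$results
head(ds[, c("Item", "Title")])

# Check whether a particular data set is listed
"mtcars" %in% ds[, "Item"]
```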

Odds are that you’ll need to read in your own data, so the built-in data sets are unlikely to be terribly helpful for your own research. If you have a data set where the values are separated by white space (among other things), you can use the read.table() function.

# Note, I am providing the optional col.names argument here because there are no column names in the data set.
holzinger <- read.table("/Users/cdesjard/Google Drive/teaching-courses/ICD_summer2016/data/holzinger.dat", col.names = c("id", "sex", "ageyr", "agemo", "school", "grade", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"))

(This data set is also available in the lavaan package.)

data(HolzingerSwineford1939, package = "lavaan")
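
If you don’t have the course files handy, you can still try read.table(). The sketch below writes a tiny whitespace-separated file to a temporary location and reads it back. The file contents and the name toy are made up for illustration.

```r
# Write a tiny whitespace-separated file with no header row
tmp <- tempfile(fileext = ".dat")
writeLines(c("1 2 13 4.5",
             "2 1 12 5.0",
             "3 2 14 3.5"), tmp)

# There is no header in the file, so supply the column names ourselves
toy <- read.table(tmp, col.names = c("id", "sex", "ageyr", "x1"))
toy
str(toy)

unlink(tmp)  # remove the temporary file
```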

If you have a data set that comes from Excel, I recommend you save it as a CSV file and then import the CSV file into R as follows:

fmm <- read.csv("/Users/cdesjard/Google Drive/teaching-courses/ICD_summer2016/data/fmm-dataset.csv")
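
To see what read.csv() expects, here is a small round trip: write a made-up data frame to a CSV file in a temporary folder and read it back. The names toy, toy2, and csv_file are just for illustration.

```r
# Build a small made-up data frame and save it as CSV
toy <- data.frame(id = 1:3, group = c("a", "b", "a"), score = c(10.5, 8, 12))
csv_file <- file.path(tempdir(), "toy.csv")
write.csv(toy, csv_file, row.names = FALSE)

# read.csv() assumes a header row and comma separators by default
toy2 <- read.csv(csv_file)
toy2
all.equal(toy, toy2)
```

Note the row.names = FALSE in write.csv(). Without it, the row names are written as an extra first column.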

Finally, SPSS files can be read in using the read.spss() function in the foreign package.

library("foreign")
wiscr <- read.spss("/Users/cdesjard/Google Drive/teaching-courses/ICD_summer2016/data/wiscsem.sav", to.data.frame = TRUE)
## Or you can call the function directly from the foreign package without loading it
wiscr <- foreign::read.spss("/Users/cdesjard/Google Drive/teaching-courses/ICD_summer2016/data/wiscsem.sav", to.data.frame = TRUE)

Now that we’ve read in the data sets, we should make sure that they were read in correctly. We can do that as follows.

head(wiscr)
  client agemate info comp arith simil vocab digit pictcomp parang block object
1      3       3    8    7    13     9    12     9        6     11    12      7
2      4       3    9    6     8     7    11    12        6      8     7     12
3      5       3   13   18    11    16    15     6       18      8    11     12
4      6       3    8   11     6    12     9     7       13      4     7     12
5      7       2   10    3     8     9    12     9        7      7    11      4
6      8       3   11    7    15    12    10    12        6     12    10      5
  coding
1      9
2     14
3      9
4     11
5     10
6     10
tail(wiscr)
    client agemate info comp arith simil vocab digit pictcomp parang block object
170    235       6   14   11    12    12    18    13       12     12    11      8
171    237       1    6   11     5    11     8     9        5      9     7      9
172    238       3   10    8    11    11     6    12       12     10    10      6
173    239       1   10   10    10    10     9     5       10     11    10     13
174    240       2   13   12    11    11    12    12       13     11    10     12
175    241       2    9    9    10     8     8     9       10      6     6      7
    coding
170      7
171     10
172     10
173      9
174     12
175      7
summary(wiscr)
     client         agemate           info             comp        arith     
 Min.   :  3.0   Min.   :0.000   Min.   : 3.000   Min.   : 0   Min.   : 4.0  
 1st Qu.: 55.5   1st Qu.:1.000   1st Qu.: 8.000   1st Qu.: 8   1st Qu.: 7.0  
 Median :118.0   Median :2.000   Median :10.000   Median :10   Median : 9.0  
 Mean   :119.5   Mean   :2.051   Mean   : 9.497   Mean   :10   Mean   : 9.0  
 3rd Qu.:183.5   3rd Qu.:3.000   3rd Qu.:11.500   3rd Qu.:12   3rd Qu.:10.5  
 Max.   :241.0   Max.   :6.000   Max.   :19.000   Max.   :18   Max.   :16.0  
     simil           vocab          digit           pictcomp         parang     
 Min.   : 2.00   Min.   : 2.0   Min.   : 0.000   Min.   : 2.00   Min.   : 2.00  
 1st Qu.: 9.00   1st Qu.: 9.0   1st Qu.: 7.000   1st Qu.: 9.00   1st Qu.: 9.00  
 Median :11.00   Median :10.0   Median : 8.000   Median :11.00   Median :10.00  
 Mean   :10.61   Mean   :10.7   Mean   : 8.731   Mean   :10.68   Mean   :10.37  
 3rd Qu.:12.00   3rd Qu.:12.0   3rd Qu.:11.000   3rd Qu.:13.00   3rd Qu.:12.00  
 Max.   :18.00   Max.   :19.0   Max.   :16.000   Max.   :19.00   Max.   :17.00  
     block           object         coding      
 Min.   : 2.00   Min.   : 3.0   Min.   : 0.000  
 1st Qu.: 9.00   1st Qu.: 9.0   1st Qu.: 6.000  
 Median :10.00   Median :11.0   Median : 9.000  
 Mean   :10.31   Mean   :10.9   Mean   : 8.549  
 3rd Qu.:12.00   3rd Qu.:13.0   3rd Qu.:11.000  
 Max.   :18.00   Max.   :19.0   Max.   :15.000  
str(wiscr)
'data.frame':   175 obs. of  13 variables:
 $ client  : num  3 4 5 6 7 8 9 10 12 13 ...
 $ agemate : num  3 3 3 3 2 3 3 2 3 3 ...
 $ info    : num  8 9 13 8 10 11 6 7 10 9 ...
 $ comp    : num  7 6 18 11 3 7 13 10 8 10 ...
 $ arith   : num  13 8 11 6 8 15 7 10 8 8 ...
 $ simil   : num  9 7 16 12 9 12 8 15 14 11 ...
 $ vocab   : num  12 11 15 9 12 10 11 10 9 9 ...
 $ digit   : num  9 12 6 7 9 12 6 7 9 11 ...
 $ pictcomp: num  6 6 18 13 7 6 14 8 10 10 ...
 $ parang  : num  11 8 8 4 7 12 9 14 11 12 ...
 $ block   : num  12 7 11 7 11 10 14 11 10 9 ...
 $ object  : num  7 12 12 12 4 5 14 10 9 13 ...
 $ coding  : num  9 14 9 11 10 10 10 12 6 13 ...
 - attr(*, "variable.labels")= Named chr  "" "" "Information" "Comprehension" ...
  ..- attr(*, "names")= chr  "client" "agemate" "info" "comp" ...
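
A few other quick checks are handy alongside head(), tail(), summary(), and str(). This sketch uses the built-in mtcars data set so that it runs without the course files.

```r
data(mtcars)
dim(mtcars)              # number of rows and columns
names(mtcars)            # column names
head(mtcars, n = 3)      # first 3 rows instead of the default 6
colSums(is.na(mtcars))   # count of missing values in each column
```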

To access a specific variable:

wiscr$info
  [1]  8  9 13  8 10 11  6  7 10  9 11 12  8  9  9 12 12  5 12 11  7 10  9  7 12  8
 [27] 13  8  8 10 10  9 13 12  8 10 11 15  9 10  5  8  5  9 13 10 14  8 11 10  9  7
 [53]  7  5 10  9  8 10 13  8 10 17  5  6 10 10 11  9  5 15 12 14 14  9  3 12  9  8
 [79] 12  5 10 12 13 10 12 13  8  6 12 11 10 13 10  5  4 10 10  5 19  9 10  7 10  5
[105] 13  8  5  8  6 11 11  7 12 11  9 10 14  7  8 11  6 11 13 15  7 10  4  9 10  8
[131]  7  9  3  8  5  4 14 12  8  8 12  5 10  9 10 10 11  9 10  9  7 12  9 13 15 10
[157]  6 13  9  7 12 11  5 11 15  4 13  7  4 14  6 10 10 13  9
# or
wiscr[,"info"]
  [1]  8  9 13  8 10 11  6  7 10  9 11 12  8  9  9 12 12  5 12 11  7 10  9  7 12  8
 [27] 13  8  8 10 10  9 13 12  8 10 11 15  9 10  5  8  5  9 13 10 14  8 11 10  9  7
 [53]  7  5 10  9  8 10 13  8 10 17  5  6 10 10 11  9  5 15 12 14 14  9  3 12  9  8
 [79] 12  5 10 12 13 10 12 13  8  6 12 11 10 13 10  5  4 10 10  5 19  9 10  7 10  5
[105] 13  8  5  8  6 11 11  7 12 11  9 10 14  7  8 11  6 11 13 15  7 10  4  9 10  8
[131]  7  9  3  8  5  4 14 12  8  8 12  5 10  9 10 10 11  9 10  9  7 12  9 13 15 10
[157]  6 13  9  7 12 11  5 11 15  4 13  7  4 14  6 10 10 13  9
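
There are a couple more ways to pull out a variable, and they all return the same vector. Here is a quick sketch on the built-in mtcars data set. The object names are just for illustration.

```r
data(mtcars)
by_dollar   <- mtcars$mpg        # by name with $
by_bracket  <- mtcars[, "mpg"]   # by name with [row, column]
by_double   <- mtcars[["mpg"]]   # by name with [[ ]]
by_position <- mtcars[, 1]       # by position (mpg is the first column)

identical(by_dollar, by_bracket)
identical(by_dollar, by_double)
identical(by_dollar, by_position)

# Selecting several columns returns a data frame rather than a vector
head(mtcars[, c("mpg", "cyl")])
```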

We should probably change client to a character variable and agemate to a factor.

wiscr$client <- as.character(wiscr$client)
wiscr$agemate <- as.factor(wiscr$agemate)
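
Why bother changing the type? Here is a small sketch with made-up values (the data frame toy is not the course data) showing how R treats the same column before and after the conversion.

```r
# Made-up data: numeric codes that are really labels
toy <- data.frame(client = c(3, 4, 5, 6), agemate = c(3, 3, 2, 1))
summary(toy$agemate)          # treated as a number: mean, quartiles, ...

toy$client  <- as.character(toy$client)
toy$agemate <- as.factor(toy$agemate)

class(toy$client)
levels(toy$agemate)
summary(toy$agemate)          # treated as categories: a count per level
```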

To create tables:

# Table of sex in the Holzinger data set
table(holzinger$sex)

  1   2 
146 155 
# Cross tab of sex and age from the Holzinger data set
xtabs(~ sex + ageyr, data = holzinger)
   ageyr
sex 11 12 13 14 15 16
  1  4 39 56 29 11  7
  2  4 62 54 26  9  0
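
Counts are often easier to read with totals or as proportions. This is a minimal sketch on the built-in mtcars data set. The name tab is just for illustration.

```r
data(mtcars)
tab <- xtabs(~ cyl + gear, data = mtcars)
tab

addmargins(tab)                 # add row and column totals
prop.table(tab, margin = 1)     # proportions within each row (each cyl)
round(prop.table(tab), 2)       # proportions of the grand total
```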

To look at a specific row, for example, row 5:

holzinger[5,]
  id sex ageyr agemo school grade       x1   x2    x3       x4 x5       x6       x7
5  5   2    12     2      0     7 4.833333 4.75 0.875 2.666667  4 2.571429 3.695652
   x8       x9
5 6.3 5.916667

Or to look at everyone with sex equal to 2 and ageyr equal to 11:

holzinger[holzinger$sex == 2 & holzinger$ageyr == 11, ]
     id sex ageyr agemo school grade       x1   x2    x3       x4   x5       x6
158 202   2    11    10      1     7 5.500000 5.50 2.125 2.666667 4.25 1.428571
164 209   2    11    11      1     7 4.666667 6.25 1.125 3.333333 4.50 1.571429
193 238   2    11    11      1     7 5.666667 7.00 1.250 3.000000 5.50 2.857143
205 250   2    11    11      1     7 5.166667 4.75 1.625 2.666667 5.50 2.428571
          x7   x8       x9
158 2.826087 4.90 5.416667
164 4.173913 4.75 4.833333
193 3.521739 4.80 5.527778
205 2.652174 3.50 3.333333

To look at everyone who has sex equal to 2 or ageyr equal to 11 (or both):

holzinger[holzinger$sex == 2 | holzinger$ageyr == 11, ]
     id sex ageyr agemo school grade        x1   x2    x3        x4   x5        x6
2     2   2    13     7      0     7 5.3333333 5.25 2.125 1.6666667 3.00 1.2857143
3     3   2    13     1      0     7 4.5000000 5.25 1.875 1.0000000 1.75 0.4285714
5     5   2    12     2      0     7 4.8333333 4.75 0.875 2.6666667 4.00 2.5714286
6     6   2    14     1      0     7 5.3333333 5.00 2.250 1.0000000 3.00 0.8571429
8     8   2    12     2      0     7 5.6666667 6.25 1.875 3.6666667 4.25 1.2857143
9     9   2    13     0      0     7 4.5000000 5.75 1.500 2.6666667 5.75 2.7142857
10   11   2    12     5      0     7 3.5000000 5.25 0.750 2.6666667 5.00 2.5714286
13   14   2    12     7      0     7 5.6666667 4.50 4.125 2.6666667 4.00 2.2857143
14   15   2    12     8      0     7 6.0000000 5.50 1.750 4.6666667 4.00 1.5714286
16   17   2    12     1      0     7 4.6666667 4.75 2.375 2.6666667 4.25 0.7142857
17   18   2    14    11      0     7 4.3333333 4.75 1.500 2.0000000 4.00 1.2857143
19   20   2    12     8      0     7 5.6666667 5.25 4.000 4.3333333 5.25 3.7142857
20   21   2    12     3      0     7 6.3333333 8.75 3.000 3.6666667 3.75 2.5714286
24   25   2    12     8      0     7 3.8333333 5.50 1.625 2.6666667 3.00 1.0000000
28   29   2    13     2      0     7 6.0000000 5.00 2.125 1.6666667 3.00 1.1428571
29   30   2    12     5      0     7 4.6666667 6.00 4.250 2.0000000 3.00 2.0000000
30   31   2    12     2      0     7 5.0000000 4.50 0.750 2.6666667 3.25 1.8571429
31   33   2    12     7      0     7 3.5000000 5.75 1.375 2.0000000 3.50 1.8571429
33   35   2    12     2      0     7 5.0000000 5.25 1.750 2.6666667 5.25 2.0000000
34   36   2    12     3      0     7 4.1666667 6.00 2.375 3.3333333 4.25 1.8571429
35   38   2    13     3      0     7 3.3333333 3.75 1.500 1.3333333 3.00 0.8571429
38   41   2    12     8      0     7 3.8333333 4.50 2.250 3.0000000 3.00 1.7142857
39   42   2    12     6      0     7 6.3333333 4.00 3.875 4.0000000 5.25 3.2857143
41   44   2    12     1      0     7 3.8333333 5.75 1.625 2.6666667 4.25 2.1428571
42   45   2    13     6      0     7 3.1666667 5.00 1.250 1.6666667 4.25 1.5714286
43   46   2    13     8      0     7 1.8333333 5.25 1.000 1.6666667 3.75 0.5714286
48   51   2    13     6      0     7 3.1666667 4.75 1.375 2.6666667 3.25 1.2857143
50   54   2    13     2      0     7 4.5000000 6.25 1.125 3.6666667 5.50 2.1428571
59   65   2    14     0      0     7 3.8333333 6.50 2.000 1.0000000 2.50 1.2857143
61   67   2    12     5      0     7 5.3333333 5.75 3.375 4.0000000 5.50 2.5714286
62   68   2    12     9      0     7 4.0000000 6.00 3.625 2.6666667 5.00 2.1428571
63   69   2    12    10      0     7 5.3333333 6.75 1.375 1.6666667 2.50 0.7142857
64   70   2    13    11      0     7 5.3333333 5.00 1.250 3.3333333 4.50 2.4285714
65   71   2    13     1      0     7 3.6666667 5.75 3.625 2.6666667 5.00 1.4285714
66   72   2    12    11      0     7 6.5000000 6.00 2.500 3.6666667 4.50 3.0000000
68   74   2    13     0      0     7 4.6666667 5.75 3.625 2.6666667 2.75 1.8571429
69   75   2    13     9      0     7 2.8333333 5.00 0.875 3.0000000 2.50 1.5714286
71   77   2    14     2      0     7 0.6666667 4.50 0.750 2.0000000 3.00 1.0000000
75   81   2    12     9      0     7 4.1666667 5.25 2.125 3.0000000 3.25 0.7142857
76   82   2    13     9      0     7 3.8333333 5.25 2.375 1.6666667 2.25 1.0000000
79   86   2    14     7      0     8 6.3333333 5.50 4.125 4.0000000 3.25 1.8571429
84   91   2    13     2      0     8 4.8333333 6.50 1.750 2.3333333 3.50 0.7142857
85   93   2    14     7      0     8 3.0000000 5.25 0.625 1.0000000 1.50 1.0000000
86   94   2    12    11      0     8 6.3333333 6.25 2.500 3.3333333 5.75 3.7142857
87   95   2    13     8      0     8 5.5000000 7.00 2.875 2.6666667 4.00 0.8571429
88   96   2    13     3      0     8 5.3333333 6.00 2.750 4.0000000 5.00 2.1428571
89   97   2    13    11      0     8 3.3333333 7.25 3.250 1.3333333 1.25 0.5714286
90   98   2    13    10      0     8 5.5000000 5.00 2.250 3.6666667 4.50 1.7142857
92  100   2    14     0      0     8 3.8333333 6.25 3.375 2.0000000 3.75 0.4285714
98  106   2    13     2      0     8 7.5000000 7.50 2.125 5.3333333 6.75 2.7142857
99  108   2    13    11      0     8 5.0000000 3.75 1.375 1.6666667 2.75 1.4285714
102 111   2    13    10      0     8 6.5000000 6.50 1.875 3.6666667 3.75 1.7142857
106 115   2    12    10      0     8 4.5000000 6.50 3.125 2.6666667 5.50 2.1428571
107 116   2    13     5      0     8 6.8333333 6.50 0.750 5.3333333 5.75 3.1428571
108 117   2    13    10      0     8 5.5000000 6.75 3.750 2.6666667 2.75 2.0000000
109 118   2    14     5      0     8 6.3333333 5.50 1.625 5.6666667 5.50 4.0000000
110 119   2    13     3      0     8 4.1666667 4.25 2.250 3.3333333 3.50 1.4285714
112 121   2    13     1      0     8 6.3333333 5.25 2.625 4.3333333 4.25 1.5714286
114 123   2    14     0      0     8 5.1666667 6.50 1.750 4.0000000 5.25 1.8571429
116 125   2    13     1      0     8 4.1666667 5.75 1.000 3.6666667 4.75 1.8571429
121 131   2    14     4      0     8 4.8333333 6.00 1.500 2.6666667 4.25 1.2857143
123 133   2    14     4      0     8 7.5000000 7.00 4.250 4.0000000 6.50 3.5714286
124 134   2    15     8      0     8 5.3333333 5.25 4.125 2.3333333 3.75 1.5714286
126 136   2    15     4      0     8 5.3333333 8.75 3.125 4.3333333 4.00 1.1428571
127 137   2    13    11      0     8 4.3333333 7.00 1.000 4.3333333 5.50 3.0000000
128 138   2    13    11      0     8 4.8333333 7.00 1.125 3.3333333 5.00 2.2857143
129 139   2    13     4      0     8 6.0000000 7.50 3.250 3.0000000 6.00 2.4285714
132 143   2    14     1      0     8 5.6666667 5.50 3.500 4.0000000 4.25 2.2857143
133 144   2    15     3      0     8 5.0000000 6.25 1.750 1.3333333 2.00 0.5714286
134 145   2    13     6      0     8 3.5000000 5.25 2.250 2.0000000 3.75 1.8571429
139 150   2    14     2      0     8 5.5000000 6.75 4.500 2.3333333 4.50 2.5714286
140 151   2    14     7      0     8 5.1666667 5.00 2.500 3.3333333 5.75 3.0000000
141 152   2    14     0      0     8 3.1666667 3.75 1.500 3.3333333 5.75 2.2857143
142 153   2    14     9      0     8 5.0000000 7.50 4.500 2.0000000 1.50 0.4285714
144 155   2    12     9      0     8 7.3333333 6.75 4.000 6.0000000 6.50 6.1428571
145 156   2    15     8      0     8 2.0000000 5.50 0.625 2.6666667 5.00 0.8571429
146 157   2    13     6      0     8 3.8333333 5.50 1.875 3.6666667 5.00 3.0000000
147 158   2    13     8      0     8 4.1666667 6.00 1.250 4.0000000 5.00 2.5714286
148 159   2    15     8      0     8 4.6666667 5.75 1.625 0.6666667 2.50 1.4285714
154 166   2    15     1      0     8 5.1666667 6.00 2.375 1.3333333 2.25 1.1428571
155 167   2    15     7      0     8 6.3333333 6.75 1.125 3.0000000 2.50 1.4285714
156 168   2    15     6      0     8 4.8333333 5.75 1.250 3.0000000 4.75 2.1428571
158 202   2    11    10      1     7 5.5000000 5.50 2.125 2.6666667 4.25 1.4285714
160 204   1    11    11      1     7 4.8333333 5.75 1.125 3.0000000 4.75 1.5714286
162 206   2    12     6      1     7 5.0000000 6.25 2.500 3.3333333 5.75 2.5714286
163 208   2    12     8      1     7 6.0000000 8.25 4.500 5.6666667 6.25 5.8571429
164 209   2    11    11      1     7 4.6666667 6.25 1.125 3.3333333 4.50 1.5714286
165 210   2    12     5      1     7 5.0000000 6.25 1.375 3.6666667 5.25 1.1428571
166 211   2    12     5      1     7 3.3333333 6.25 0.750 3.0000000 5.25 2.2857143
170 215   2    12     8      1     7 2.8333333 5.25 0.750 1.6666667 2.50 1.4285714
173 218   2    12     1      1     7 5.5000000 7.75 3.750 3.6666667 5.75 2.5714286
175 220   2    12     7      1     7 5.0000000 5.50 2.500 2.6666667 4.25 2.8571429
176 221   2    12     1      1     7 6.0000000 7.00 2.750 4.3333333 6.00 5.1428571
179 224   2    12     6      1     7 5.0000000 6.00 2.375 4.6666667 6.50 3.4285714
181 226   2    12     3      1     7 5.5000000 6.75 2.000 2.6666667 4.25 1.8571429
182 227   2    12     4      1     7 5.3333333 5.50 1.875 3.0000000 5.00 2.4285714
184 229   2    14     7      1     7 4.5000000 5.75 0.500 3.0000000 2.75 1.0000000
187 232   2    12     0      1     7 2.8333333 7.50 1.625 3.0000000 4.25 1.8571429
192 237   2    12    10      1     7 6.3333333 6.25 1.625 3.0000000 5.75 2.1428571
193 238   2    11    11      1     7 5.6666667 7.00 1.250 3.0000000 5.50 2.8571429
194 239   2    13     8      1     7 3.0000000 5.50 0.625 0.6666667 1.00 0.2857143
195 240   2    12     7      1     7 2.6666667 5.00 1.000 2.0000000 4.50 1.8571429
196 241   2    12     4      1     7 3.0000000 7.50 2.125 6.3333333 6.00 4.7142857
197 242   2    12     6      1     7 5.3333333 5.25 1.125 5.0000000 5.00 3.5714286
200 245   2    12     9      1     7 4.6666667 5.00 1.750 2.6666667 4.50 1.4285714
201 246   2    12     3      1     7 6.5000000 6.00 3.125 4.6666667 4.25 1.5714286
202 247   2    12     4      1     7 5.3333333 6.50 1.250 3.6666667 5.75 3.2857143
203 248   2    12    10      1     7 4.6666667 6.00 1.000 2.0000000 2.50 1.4285714
205 250   2    11    11      1     7 5.1666667 4.75 1.625 2.6666667 5.50 2.4285714
206 251   2    12     8      1     7 4.8333333 6.50 3.125 3.3333333 5.50 2.5714286
207 252   1    11    11      1     7 4.8333333 7.50 1.875 2.3333333 4.00 1.7142857
209 254   2    12     4      1     7 4.1666667 6.25 3.250 1.6666667 1.75 0.7142857
211 257   2    12     3      1     7 6.1666667 7.75 2.000 4.6666667 7.00 2.8571429
212 258   2    12    11      1     7 5.0000000 6.50 2.375 2.3333333 5.25 2.0000000
213 259   1    11     5      1     7 4.8333333 6.00 1.250 3.0000000 5.25 2.4285714
214 260   2    12     4      1     7 6.1666667 8.50 2.125 6.0000000 6.00 3.8571429
215 261   2    12     0      1     7 4.6666667 5.00 0.500 2.6666667 4.50 0.8571429
217 263   2    12     4      1     7 6.3333333 5.25 2.250 6.0000000 6.50 4.4285714
220 266   2    13     7      1     7 4.6666667 6.75 1.625 1.6666667 1.50 0.5714286
222 268   2    14     0      1     7 1.8333333 5.00 1.125 4.3333333 4.00 2.1428571
223 269   2    12    11      1     7 7.5000000 9.25 3.625 3.3333333 4.00 2.2857143
224 270   2    13     0      1     7 3.1666667 3.75 0.875 5.0000000 4.75 2.7142857
225 271   2    12     8      1     7 6.8333333 5.25 1.375 3.0000000 4.00 2.0000000
227 273   2    12     0      1     7 5.6666667 6.00 2.000 3.0000000 3.50 1.4285714
230 276   1    11     4      1     7 6.0000000 7.00 4.125 5.0000000 6.00 5.5714286
231 277   2    14     0      1     7 3.1666667 5.75 0.750 2.3333333 3.25 2.7142857
232 278   2    12     9      1     7 4.5000000 5.25 0.875 3.0000000 4.75 3.1428571
233 279   2    12     5      1     7 3.5000000 2.25 1.750 5.0000000 6.25 4.2857143
235 281   2    12     7      1     7 4.3333333 5.50 1.000 1.6666667 2.75 1.5714286
237 283   2    13     1      1     8 5.5000000 5.50 1.750 4.3333333 6.25 3.7142857
238 284   2    12    11      1     8 6.3333333 6.50 0.875 5.3333333 6.25 2.1428571
239 285   2    13     2      1     8 5.5000000 6.25 4.250 5.0000000 5.75 3.0000000
242 288   2    13     2      1     8 5.1666667 6.25 3.500 3.3333333 6.25 3.8571429
243 289   2    13     1      1     8 6.0000000 7.00 1.625 3.6666667 5.75 2.5714286
245 291   2    14     2      1     8 4.6666667 6.00 0.750 2.6666667 4.25 2.1428571
249 295   2    13     3      1     8 5.8333333 6.25 3.125 3.6666667 6.00 2.7142857
250 296   2    12    10      1     8 3.8333333 5.25 0.375 3.0000000 5.25 1.7142857
251 297   2    13     5      1     8 5.1666667 7.00 3.125 4.0000000 5.75 2.1428571
253 299   2    14     9      1     8 5.5000000 7.00 2.250 5.0000000 5.75 4.2857143
254 300   2    15    11      1     8 3.5000000 6.50 2.125 4.0000000 4.00 4.7142857
255 302   2    13     9      1     8 6.1666667 8.00 1.375 5.3333333 6.25 4.5714286
259 306   2    13     0      1     8 6.5000000 8.50 4.125 3.3333333 4.75 2.2857143
260 307   2    13     8      1     8 3.0000000 4.00 0.500 3.3333333 5.25 2.7142857
261 308   2    12     9      1     8 4.6666667 7.00 2.250 3.3333333 6.25 3.2857143
263 310   2    13     5      1     8 4.1666667 5.00 1.375 4.0000000 4.75 2.4285714
265 312   2    13     6      1     8 5.3333333 5.25 1.875 4.6666667 5.00 4.1428571
267 314   2    13     5      1     8 5.8333333 7.00 2.375 6.0000000 6.75 4.7142857
269 316   2    13     5      1     8 6.3333333 7.50 4.125 4.6666667 5.50 4.1428571
270 317   2    13     2      1     8 4.3333333 6.25 0.875 3.3333333 3.75 1.8571429
271 318   2    13     7      1     8 5.5000000 7.00 1.500 4.6666667 5.25 2.0000000
273 321   2    13     3      1     8 4.5000000 5.50 2.000 2.6666667 4.75 2.1428571
274 322   2    14     4      1     8 4.3333333 7.25 1.250 2.6666667 4.75 1.8571429
283 331   2    14     1      1     8 4.0000000 6.00 2.000 2.6666667 4.50 1.7142857
285 334   2    14     9      1     8 5.0000000 5.75 1.250 2.6666667 3.50 1.7142857
289 338   2    13     1      1     8 4.0000000 6.25 0.750 3.0000000 4.75 3.2857143
293 342   2    12    11      1     8 5.6666667 5.50 1.625 3.3333333 4.25 2.0000000
295 344   2    13     0      1     8 5.8333333 7.00 1.250 3.0000000 3.25 1.5714286
298 347   2    14    10      1     8 3.0000000 6.00 1.625 2.3333333 4.00 1.0000000
299 348   2    14     3      1     8 4.6666667 5.50 1.875 3.6666667 5.75 4.2857143
          x7   x8       x9
2   3.782609 6.25 7.916667
3   3.260870 3.90 4.416667
5   3.695652 6.30 5.916667
6   4.347826 6.65 7.500000
8   3.391304 5.15 3.666667
9   4.521739 4.65 7.361111
10  4.130435 4.55 4.361111
13  5.869565 5.20 5.861111
14  5.130435 4.70 4.444444
16  4.086957 3.80 5.138889
17  3.695652 6.65 5.250000
19  3.913044 4.85 5.750000
20  3.478261 5.35 4.916667
24  5.826087 5.30 6.777778
28  2.826087 5.55 4.416667
29  5.130435 5.85 8.611111
30  4.652174 4.85 5.444444
31  4.826087 6.95 5.972222
33  2.695652 4.30 4.805556
34  5.391304 4.35 5.638889
35  2.782609 5.20 4.833333
38  4.086957 3.50 5.083333
39  5.521739 5.45 5.111111
41  4.913043 5.10 4.638889
42  2.826087 5.30 4.777778
43  3.956522 4.75 2.777778
48  6.130435 8.00 5.444444
50  2.565217 4.80 5.527778
59  3.391304 4.55 4.833333
61  5.260870 6.20 6.138889
62  4.913043 5.35 4.777778
63  2.652174 3.85 5.333333
64  5.521739 5.45 5.833333
65  3.304348 4.50 5.027778
66  5.695652 5.40 6.305556
68  3.869565 6.10 4.250000
69  6.826087 5.70 3.916667
71  3.608696 5.50 4.611111
75  3.478261 5.70 4.583333
76  3.130435 5.75 4.666667
79  5.565217 7.35 5.750000
84  5.782609 5.75 4.972222
85  4.956522 4.85 4.583333
86  4.608696 6.85 5.472222
87  4.086957 5.40 5.972222
88  5.521739 4.55 5.138889
89  3.565217 5.55 4.888889
90  4.173913 5.95 6.666667
92  5.391304 7.80 6.111111
98  5.130435 5.55 7.222222
99  5.565217 6.10 4.611111
102 4.521739 4.40 6.583333
106 6.347826 6.50 6.166667
107 7.434783 5.70 5.194444
108 4.173913 6.80 7.000000
109 5.695652 6.40 7.527778
110 4.478261 4.15 3.361111
112 2.869565 6.00 5.444444
114 4.695652 4.30 6.000000
116 4.826087 5.85 5.416667
121 5.826087 6.40 6.861111
123 3.739130 7.60 6.500000
124 3.652174 5.35 4.777778
126 4.000000 6.95 5.666667
127 5.130435 5.65 4.916667
128 4.913043 5.25 4.972222
129 3.304348 5.10 5.777778
132 7.260870 6.35 7.194444
133 4.913043 6.00 4.805556
134 4.695652 5.25 3.722222
139 6.260870 6.55 6.888889
140 6.434783 8.30 7.083333
141 6.043478 5.25 6.222222
142 3.304348 5.25 5.722222
144 5.347826 5.75 6.611111
145 2.391304 5.60 5.972222
146 5.608696 4.90 3.861111
147 6.956522 6.25 6.305556
148 4.521739 5.00 4.750000
154 5.391304 6.15 5.194444
155 4.695652 5.60 4.138889
156 3.608696 6.15 4.694444
158 2.826087 4.90 5.416667
160 4.956522 5.15 4.000000
162 4.086957 5.65 5.583333
163 5.608696 6.95 9.250000
164 4.173913 4.75 4.833333
165 4.478261 5.70 5.472222
166 3.869565 5.05 4.944444
170 4.304348 4.35 4.083333
173 2.869565 5.65 5.166667
175 4.130435 5.50 4.472222
176 3.565217 5.15 5.694444
179 4.869565 5.75 5.138889
181 2.956522 5.80 6.083333
182 3.217391 5.90 5.305556
184 2.391304 5.90 4.944444
187 2.260870 4.70 3.972222
192 2.434783 6.50 5.500000
193 3.521739 4.80 5.527778
194 1.304348 3.05 3.111111
195 4.043478 5.55 5.138889
196 4.000000 5.40 4.111111
197 4.130435 4.60 4.944444
200 2.652174 5.00 5.944444
201 3.565217 5.20 6.777778
202 2.695652 4.15 3.972222
203 2.652174 3.85 4.416667
205 2.652174 3.50 3.333333
206 3.695652 6.40 5.611111
207 2.782609 3.70 4.583333
209 2.000000 3.60 3.361111
211 3.826087 5.70 6.194444
212 3.869565 6.05 6.166667
213 4.347826 6.05 5.722222
214 3.478261 5.60 6.361111
215 3.173913 4.20 4.472222
217 5.217391 5.45 5.694444
220 4.391304 5.30 4.777778
222 3.173913 3.65 3.611111
223 3.173913 5.05 6.111111
224 4.391304 5.60 3.222222
225 5.391304 5.95 6.111111
227 3.869565 5.80 6.555556
230 3.521739 5.20 5.833333
231 2.956522 4.85 4.138889
232 3.826087 5.45 5.305556
233 2.956522 4.55 5.027778
235 3.304348 5.70 4.333333
237 3.652174 6.35 6.388889
238 4.739130 5.55 5.555556
239 4.739130 7.15 6.833333
242 4.478261 4.90 4.666667
243 5.826087 5.10 6.222222
245 4.000000 4.75 5.972222
249 5.739130 5.90 5.194444
250 5.782609 5.55 6.527778
251 6.043478 5.50 5.527778
253 4.173913 6.10 5.888889
254 5.130435 5.60 3.666667
255 3.478261 4.75 4.750000
259 4.130435 5.95 6.666667
260 2.956522 4.00 3.472222
261 3.782609 4.65 4.722222
263 4.826087 6.25 5.722222
265 5.086957 5.40 6.583333
267 6.304348 5.85 6.277778
269 4.217391 6.00 7.000000
270 5.000000 6.45 5.083333
271 4.608696 5.30 5.194444
273 5.869565 7.10 6.250000
274 3.695652 5.45 6.111111
283 3.347826 4.70 3.750000
285 4.260870 6.60 6.666667
289 5.217391 6.55 5.722222
293 5.478261 6.00 4.500000
295 4.173913 4.85 5.777778
298 4.608696 6.05 6.083333
299 4.000000 6.00 7.611111
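
When a filter matches a lot of rows, you usually want a count or a peek rather than the full printout. Here is a sketch on the built-in mtcars data set. The object name and_rows is just for illustration, and subset() is shown only as one alternative way to write the same filter.

```r
data(mtcars)

# Count matching rows rather than printing them all
nrow(mtcars[mtcars$cyl == 4 & mtcars$am == 1, ])
sum(mtcars$cyl == 4 | mtcars$am == 1)

# Show only the first few matches and a few columns
head(mtcars[mtcars$cyl == 4 | mtcars$am == 1, c("mpg", "cyl", "am")])

# subset() lets you refer to the columns without the mtcars$ prefix
and_rows <- subset(mtcars, cyl == 4 & am == 1)
nrow(and_rows)
```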

Let’s say you figured out that the person with id 5 was supposed to have sex equal to 1, not 2.

holzinger[holzinger$id == 5, "sex"]
[1] 2
holzinger[holzinger$id == 5, "sex"] <- 1
holzinger[holzinger$id == 5, "sex"]
[1] 1
# Actually they were supposed to be a 2. Let's change it back
holzinger[holzinger$id == 5, "sex"] <- 2
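
Before overwriting a value, it can be worth confirming that your condition matches exactly the rows you expect. Here is a cautious sketch on a made-up data frame. The names toy, toy_fixed, and rows are just for illustration.

```r
toy <- data.frame(id = c(1, 2, 5), sex = c(1, 2, 2))

rows <- which(toy$id == 5)   # row numbers where the condition is TRUE
length(rows)                 # confirm exactly one row matches

toy_fixed <- toy             # work on a copy so the original is untouched
toy_fixed[rows, "sex"] <- 1
toy_fixed
```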

Doing statistics in R

We saw that we can get basic summary statistics from R by:

summary(holzinger)
       id             sex            ageyr        agemo            school      
 Min.   :  1.0   Min.   :1.000   Min.   :11   Min.   : 0.000   Min.   :0.0000  
 1st Qu.: 82.0   1st Qu.:1.000   1st Qu.:12   1st Qu.: 2.000   1st Qu.:0.0000  
 Median :163.0   Median :2.000   Median :13   Median : 5.000   Median :0.0000  
 Mean   :176.6   Mean   :1.515   Mean   :13   Mean   : 5.375   Mean   :0.4817  
 3rd Qu.:272.0   3rd Qu.:2.000   3rd Qu.:14   3rd Qu.: 8.000   3rd Qu.:1.0000  
 Max.   :351.0   Max.   :2.000   Max.   :16   Max.   :11.000   Max.   :1.0000  
     grade                x1               x2              x3              x4       
 Min.   :-999.000   Min.   :0.6667   Min.   :2.250   Min.   :0.250   Min.   :0.000  
 1st Qu.:   7.000   1st Qu.:4.1667   1st Qu.:5.250   1st Qu.:1.375   1st Qu.:2.333  
 Median :   7.000   Median :5.0000   Median :6.000   Median :2.125   Median :3.000  
 Mean   :   4.133   Mean   :4.9358   Mean   :6.088   Mean   :2.250   Mean   :3.061  
 3rd Qu.:   8.000   3rd Qu.:5.6667   3rd Qu.:6.750   3rd Qu.:3.125   3rd Qu.:3.667  
 Max.   :   8.000   Max.   :8.5000   Max.   :9.250   Max.   :4.500   Max.   :6.333  
       x5              x6               x7              x8               x9       
 Min.   :1.000   Min.   :0.1429   Min.   :1.304   Min.   : 3.050   Min.   :2.778  
 1st Qu.:3.500   1st Qu.:1.4286   1st Qu.:3.478   1st Qu.: 4.850   1st Qu.:4.750  
 Median :4.500   Median :2.0000   Median :4.087   Median : 5.500   Median :5.417  
 Mean   :4.341   Mean   :2.1856   Mean   :4.186   Mean   : 5.527   Mean   :5.374  
 3rd Qu.:5.250   3rd Qu.:2.7143   3rd Qu.:4.913   3rd Qu.: 6.100   3rd Qu.:6.083  
 Max.   :7.000   Max.   :6.1429   Max.   :7.435   Max.   :10.000   Max.   :9.250  

Hmm, the minimum of grade is -999. That seems strange. I bet that’s a missing-data code. Let’s treat it as such.

holzinger[holzinger$grade == -999, "grade"] <- NA
summary(holzinger)
       id             sex            ageyr        agemo            school      
 Min.   :  1.0   Min.   :1.000   Min.   :11   Min.   : 0.000   Min.   :0.0000  
 1st Qu.: 82.0   1st Qu.:1.000   1st Qu.:12   1st Qu.: 2.000   1st Qu.:0.0000  
 Median :163.0   Median :2.000   Median :13   Median : 5.000   Median :0.0000  
 Mean   :176.6   Mean   :1.515   Mean   :13   Mean   : 5.375   Mean   :0.4817  
 3rd Qu.:272.0   3rd Qu.:2.000   3rd Qu.:14   3rd Qu.: 8.000   3rd Qu.:1.0000  
 Max.   :351.0   Max.   :2.000   Max.   :16   Max.   :11.000   Max.   :1.0000  
                                                                               
     grade             x1               x2              x3              x4       
 Min.   :7.000   Min.   :0.6667   Min.   :2.250   Min.   :0.250   Min.   :0.000  
 1st Qu.:7.000   1st Qu.:4.1667   1st Qu.:5.250   1st Qu.:1.375   1st Qu.:2.333  
 Median :7.000   Median :5.0000   Median :6.000   Median :2.125   Median :3.000  
 Mean   :7.477   Mean   :4.9358   Mean   :6.088   Mean   :2.250   Mean   :3.061  
 3rd Qu.:8.000   3rd Qu.:5.6667   3rd Qu.:6.750   3rd Qu.:3.125   3rd Qu.:3.667  
 Max.   :8.000   Max.   :8.5000   Max.   :9.250   Max.   :4.500   Max.   :6.333  
 NA's   :1                                                                       
       x5              x6               x7              x8               x9       
 Min.   :1.000   Min.   :0.1429   Min.   :1.304   Min.   : 3.050   Min.   :2.778  
 1st Qu.:3.500   1st Qu.:1.4286   1st Qu.:3.478   1st Qu.: 4.850   1st Qu.:4.750  
 Median :4.500   Median :2.0000   Median :4.087   Median : 5.500   Median :5.417  
 Mean   :4.341   Mean   :2.1856   Mean   :4.186   Mean   : 5.527   Mean   :5.374  
 3rd Qu.:5.250   3rd Qu.:2.7143   3rd Qu.:4.913   3rd Qu.: 6.100   3rd Qu.:6.083  
 Max.   :7.000   Max.   :6.1429   Max.   :7.435   Max.   :10.000   Max.   :9.250  
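
If other columns use the same -999 code, the recode can be done in one pass. Here is a minimal sketch on a made-up data frame (toy and its values are stand-ins, not the holzinger data). It wraps the comparison in which() so that it still works when a column already contains NA.

```r
# Made-up data; -999 marks a missing value, and grade already has one NA.
toy <- data.frame(
  id    = 1:5,
  grade = c(7, 8, -999, 7, NA),
  x1    = c(3.5, -999, 4.0, 5.5, 6.0)
)

# One column: which() ignores NA comparisons, so this also works when
# the column already contains NA (and can be rerun safely).
toy$grade[which(toy$grade == -999)] <- NA

# Every column: replace the sentinel wherever it appears.
toy[] <- lapply(toy, function(col) replace(col, which(col == -999), NA))

colSums(is.na(toy))  # number of missing values per column
```

If the data come from a text file, the na.strings argument of read.csv() can do the same job at import time.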
                                                                                  

What if we want variance or standard deviation?

apply(holzinger, 2, var, na.rm = TRUE)
          id          sex        ageyr        agemo       school        grade 
1.122296e+04 2.506091e-01 1.103322e+00 1.191526e+01 2.504983e-01 2.502899e-01 
          x1           x2           x3           x4           x5           x6 
1.362898e+00 1.386390e+00 1.279114e+00 1.355167e+00 1.665318e+00 1.200346e+00 
          x7           x8           x9 
1.187083e+00 1.025389e+00 1.018387e+00 
apply(holzinger, 2, sd, na.rm = TRUE)
         id         sex       ageyr       agemo      school       grade          x1 
105.9384781   0.5006087   1.0503915   3.4518488   0.5004981   0.5002898   1.1674321 
         x2          x3          x4          x5          x6          x7          x8 
  1.1774506   1.1309794   1.1641163   1.2904722   1.0956031   1.0895335   1.0126151 
         x9 
  1.0091517 
# Alternatively, var() on the whole data set gives a variance-covariance matrix; its diagonal holds the variances.
diag(var(holzinger, use = "pairwise.complete.obs"))
          id          sex        ageyr        agemo       school        grade 
1.122296e+04 2.506091e-01 1.103322e+00 1.191526e+01 2.504983e-01 2.502899e-01 
          x1           x2           x3           x4           x5           x6 
1.362898e+00 1.386390e+00 1.279114e+00 1.355167e+00 1.665318e+00 1.200346e+00 
          x7           x8           x9 
1.187083e+00 1.025389e+00 1.018387e+00 
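
One caveat with apply(): it first converts the data frame to a matrix, so a single character column turns every value into text. Here is a sketch that sidesteps this by working column by column on the numeric columns only. The data frame toy is made up for illustration.

```r
# Made-up data with one missing value and one non-numeric column.
toy <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, 8, 10),
  g = c("x", "y", "x", "y", "x")
)

# Keep only the numeric columns, then summarise each one.
num <- toy[sapply(toy, is.numeric)]
col_var <- sapply(num, var, na.rm = TRUE)
col_sd  <- sapply(num, sd,  na.rm = TRUE)

col_var
col_sd

# The standard deviation is the square root of the variance.
all.equal(col_sd, sqrt(col_var))
```

Spelling out TRUE rather than T is a little safer, because T is an ordinary variable that can be overwritten.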

If you want a correlation matrix, use cor():

cor(holzinger, use = "complete.obs")
                 id         sex        ageyr        agemo       school        grade
id      1.000000000 -0.02934052  0.006890281 -0.012032364  0.899717471  0.332238889
sex    -0.029340517  1.00000000 -0.161828572  0.024381005 -0.018692005 -0.025152477
ageyr   0.006890281 -0.16182857  1.000000000 -0.244202783 -0.251028024  0.511329621
agemo  -0.012032364  0.02438100 -0.244202783  1.000000000 -0.008195569 -0.003602712
school  0.899717471 -0.01869201 -0.251028024 -0.008195569  1.000000000 -0.048625183
grade   0.332238889 -0.02515248  0.511329621 -0.003602712 -0.048625183  1.000000000
x1      0.042549630 -0.08301280 -0.059998260  0.053127255 -0.003087534  0.171948019
x2      0.122378809 -0.11863772 -0.015259252  0.045328706  0.092251245  0.139541113
x3     -0.147127654 -0.17934638  0.037230487  0.034105071 -0.221709257  0.138766019
x4      0.269331848  0.12156266 -0.198020923 -0.016864885  0.211316591  0.207891762
x5      0.302481294  0.06187084 -0.222783739 -0.036890348  0.275294444  0.173473441
x6      0.280615027  0.01403554 -0.173897973  0.007405401  0.247527378  0.155478412
x7     -0.093489514  0.11581376  0.110717769  0.057457395 -0.235047885  0.349553387
x8      0.075997773 -0.03741871  0.238878673  0.009258502 -0.042084041  0.295627995
x9      0.031768594  0.04526865  0.098931578  0.028503928 -0.044270793  0.217055327
                 x1          x2          x3          x4          x5           x6
id      0.042549630  0.12237881 -0.14712765  0.26933185  0.30248129  0.280615027
sex    -0.083012804 -0.11863772 -0.17934638  0.12156266  0.06187084  0.014035544
ageyr  -0.059998260 -0.01525925  0.03723049 -0.19802092 -0.22278374 -0.173897973
agemo   0.053127255  0.04532871  0.03410507 -0.01686489 -0.03689035  0.007405401
school -0.003087534  0.09225125 -0.22170926  0.21131659  0.27529444  0.247527378
grade   0.171948019  0.13954111  0.13876602  0.20789176  0.17347344  0.155478412
x1      1.000000000  0.29735169  0.44331478  0.37394016  0.29605145  0.358896277
x2      0.297351685  1.00000000  0.34066453  0.15313110  0.13994136  0.192998751
x3      0.443314783  0.34066453  1.00000000  0.15724038  0.07383541  0.195327656
x4      0.373940162  0.15313110  0.15724038  1.00000000  0.73306452  0.704177712
x5      0.296051447  0.13994136  0.07383541  0.73306452  1.00000000  0.719116628
x6      0.358896277  0.19299875  0.19532766  0.70417771  0.71911663  1.000000000
x7      0.066737828 -0.07569338  0.07235378  0.17406840  0.10258274  0.121523992
x8      0.227204249  0.09293888  0.18224292  0.10484699  0.13424822  0.146174610
x9      0.390186974  0.20600565  0.32990344  0.20831513  0.22869014  0.215052378
                x7           x8          x9
id     -0.09348951  0.075997773  0.03176859
sex     0.11581376 -0.037418709  0.04526865
ageyr   0.11071777  0.238878673  0.09893158
agemo   0.05745740  0.009258502  0.02850393
school -0.23504788 -0.042084041 -0.04427079
grade   0.34955339  0.295627995  0.21705533
x1      0.06673783  0.227204249  0.39018697
x2     -0.07569338  0.092938877  0.20600565
x3      0.07235378  0.182242919  0.32990344
x4      0.17406840  0.104846993  0.20831513
x5      0.10258274  0.134248219  0.22869014
x6      0.12152399  0.146174610  0.21505238
x7      1.00000000  0.488808123  0.34061205
x8      0.48880812  1.000000000  0.45150669
x9      0.34061205  0.451506687  1.00000000
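
The use argument controls how missing values are handled. With "complete.obs", any row with a missing value is dropped for every pair of variables. With "pairwise.complete.obs", each correlation uses all rows available for that pair. A small sketch on made-up data (toy is hypothetical) shows that the two can disagree:

```r
# Made-up data; w has one missing value in row 2.
toy <- data.frame(
  u = c(1, 2, 3, 4, 5, 6),
  v = c(2, 1, 4, 3, 6, 5),
  w = c(1, NA, 2, 4, 3, 6)
)

r_listwise <- cor(toy, use = "complete.obs")           # row 2 dropped everywhere
r_pairwise <- cor(toy, use = "pairwise.complete.obs")  # row 2 dropped only for w

# The u-v correlation differs because listwise deletion loses row 2,
# even though u and v are both observed there.
r_listwise["u", "v"]
r_pairwise["u", "v"]

# Rounding makes a large matrix much easier to read.
round(r_pairwise, 2)
```

With only one missing value in a data set the difference is tiny, but it is worth choosing one option and using it consistently.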

We can run a simple two-sample t-test:

t.test(x1 ~ sex, data = holzinger)

    Welch Two Sample t-test

data:  x1 by sex
t = 1.4095, df = 298.9, p-value = 0.1597
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.07489415  0.45293216
sample estimates:
mean in group 1 mean in group 2 
       5.033105        4.844086 
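
Notice that the output says "Welch": by default t.test() does not assume equal variances in the two groups. The sketch below uses the built-in mtcars data as a stand-in (it is not the holzinger data) to show the pooled-variance version and how to pull individual numbers out of the result.

```r
# Welch (default) and pooled-variance (classic Student) versions.
welch  <- t.test(mpg ~ am, data = mtcars)
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# The result is a list, so pieces can be extracted by name.
welch$statistic   # t value
welch$parameter   # degrees of freedom (fractional for Welch)
welch$p.value
welch$conf.int    # confidence interval for the difference in means
welch$estimate    # the two group means

# The pooled test uses n1 + n2 - 2 degrees of freedom.
pooled$parameter
```

Extracting values this way is handy when you want to report them inline in an R Markdown document instead of copying them by hand.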

And we can fit regressions as follows:

mod1 <- lm(x9 ~ x7 + x8, data = holzinger)
summary(mod1)

Call:
lm(formula = x9 ~ x7 + x8, data = holzinger)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.45409 -0.63860  0.01326  0.59138  3.13874 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.70953    0.29470   9.194  < 2e-16 ***
x7           0.14819    0.05421   2.734  0.00664 ** 
x8           0.36987    0.05832   6.342 8.42e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8936 on 298 degrees of freedom
Multiple R-squared:  0.2211,    Adjusted R-squared:  0.2159 
F-statistic: 42.31 on 2 and 298 DF,  p-value: < 2.2e-16
anova(mod1)
Analysis of Variance Table

Response: x9
           Df  Sum Sq Mean Sq F value    Pr(>F)    
x7          1  35.452  35.452  44.398 1.294e-10 ***
x8          1  32.112  32.112  40.216 8.416e-10 ***
Residuals 298 237.952   0.798                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
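
Beyond the printed summary, a fitted model can be queried with helper functions. Here is a sketch using the built-in mtcars data as a stand-in; the model and the new_cars values are made up for illustration. It also checks a relationship visible in the output above: for the last term in the model, the F value from anova() is the squared t value from summary() (6.342 squared is about 40.22).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)                   # point estimates
confint(fit, level = 0.95)  # confidence intervals for the coefficients
summary(fit)$r.squared      # pieces of summary() can be pulled out by name

# Predicted values (with confidence limits) for two hypothetical cars.
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
pred <- predict(fit, newdata = new_cars, interval = "confidence")
pred

# anova() uses sequential sums of squares, so the order of terms matters.
# For the last term, the F value equals the squared t value from summary().
f_last <- anova(fit)["hp", "F value"]
t_last <- summary(fit)$coefficients["hp", "t value"]
all.equal(f_last, t_last^2)
```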

And we can get some diagnostic plots:

plot(mod1, which = 1:5)
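
The plots appear one after another. To see several at once, or to get the numbers behind them, something like the following sketch can help. It uses the built-in mtcars data as a stand-in, and the model is made up for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Arrange four diagnostic plots in a 2 x 2 grid, then restore the old layout.
old_par <- par(mfrow = c(2, 2))
plot(fit, which = 1:4)
par(old_par)

# The quantities behind the plots are available directly.
res   <- residuals(fit)       # raw residuals
lev   <- hatvalues(fit)       # leverage of each observation
cooks <- cooks.distance(fit)  # influence of each observation

# The three most influential observations by Cook's distance.
head(sort(cooks, decreasing = TRUE), 3)

# Leverages always sum to the number of estimated coefficients.
sum(lev)
```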

