For the activities in the course and the in-class demonstrations, we’ll use R Markdown Notebooks. When you execute code within the notebook, the results automagically appear beneath the code. That way the code, the ideas, and the methods are intricately entwined.
Before we can use R Markdown Notebooks properly, we need to learn a little about R and R Markdown.
There are a multitude of reasons for using and learning R. Here are some of my favorite reasons.
R is free and open-source. R being free is important because everyone has access to it regardless of their income. The poor, lowly graduate student can use it and ALWAYS use it. R being open-source is important because that means we can scrutinize every bit of R’s code and our algorithms are completely available for independent auditing. That means you can really trust what’s in the core of R and, if you don’t like something about it and you have the know-how, you can change it.
R is the premier statistical software. If you read about a method in an article, the odds are someone has already written a package that implements that method. In the statistical world, it’s not unusual for an R package to accompany a new method submitted to a statistics journal. So you won’t need to learn 5 different pieces of statistical software (e.g., SPSS, HLM, Mplus, etc.).
R has amazing visualization capabilities including the base R graphics and ggplot2 (which we’ll use in this stats camp).
Once you learn R (and it’ll be hard if you don’t use it regularly) and become an efficient R user and programmer, you’ll discover how much more quickly you can manipulate data, run statistical models, and create beautiful plots.
This file is an R Markdown file. I recommend you spend some time this evening to download and read the R Markdown cheatsheet. I would strongly encourage you to always write your R code in an R Markdown file. The reason is two-fold.
Your code is intricately woven with your narrative. This will make it easier for you to understand what you’ve done and why you’ve done it. It’ll make your logic clearer when you return 3 to 6 months from now to revisit an analysis for a resubmission or to update the data.
This helps to make your research more reproducible. If you give someone your data and this R Markdown file, you’re giving them a recipe to completely (mostly) reproduce your complete analysis.
An R Markdown Notebook is essentially a fancy, and new, way to embed R code into an R Markdown file. It allows me to create an HTML file that you can edit if you have RStudio or view even if you don’t have RStudio. So I can, in essence, share my R session with you, and you can share your findings with other students and with advisors who may or may not have RStudio. The difference between a notebook and the usual R Markdown file is that a notebook is interactive (you’ll see what I mean) and that I can share an HTML file with you which you can open up in RStudio and edit.
All the R code will be written inside a chunk.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter. Something should happen.
plot(cars)
So, code is evaluated and plots are created within the document if R code is written within these chunks.
So let’s start learning R.
# Comments --------------------------------------------------------------------
# Hashtags are what R uses for comments.
# 2 + 2 # This will not be evaluated
2 + 2 # This will be evaluated
[1] 4
If you are completely new to R, the first thing you’ll note is that we can use it as a calculator.
2 ^ 3 # 2 cubed
[1] 8
log(10) # Take the log of 10
[1] 2.302585
sqrt(9) # Take the square root of 9
[1] 3
For this course, we’ll use data in four formats.
Let’s look first at how to use a data set in R. Let’s use the mtcars data set.
data(mtcars)
If you want to see all the data sets available in R:
data()
If you want to see all the data sets available in a specific package, for example, the lme4 package (which is used for mixed effects modeling):
data(package = "lme4")
Odds are that you’ll need to read in your own data, so the built-in data sets are unlikely to be terribly helpful for your own research. If you have a data set where the values are separated by white space (among other things), you can use the read.table() function.
# Note, I am providing the optional col.names argument here because there are no column names in the data set.
holzinger <- read.table("/Users/cdesjard/Google Drive/teaching-courses/ICD_summer2016/data/holzinger.dat", col.names = c("id", "sex", "ageyr", "agemo", "school", "grade", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"))
(This data set is also available in the lavaan package.)
data(HolzingerSwineford1939, package = "lavaan")
If you have a data set that comes from Excel, I recommend you save it as a CSV file and then import the CSV file into R as follows:
fmm <- read.csv("/Users/cdesjard/Google Drive/teaching-courses/ICD_summer2016/data/fmm-dataset.csv")
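If you do not have these files on hand, the sketch below runs the same two import calls on a tiny made-up data set written to temporary files. The values, the column names, and the temporary file paths are assumptions for illustration only; they are not the course data.

```r
# Made-up data for illustration; this is not the Holzinger data.
toy <- data.frame(id = 1:3, sex = c(1, 2, 2), score = c(4.5, 5.25, 3.75))

# Whitespace-separated file without a header row, so we supply col.names.
dat_file <- tempfile(fileext = ".dat")
write.table(toy, dat_file, row.names = FALSE, col.names = FALSE)
toy_dat <- read.table(dat_file, col.names = c("id", "sex", "score"))

# Comma-separated file with a header row, so read.csv() finds the names itself.
csv_file <- tempfile(fileext = ".csv")
write.csv(toy, csv_file, row.names = FALSE)
toy_csv <- read.csv(csv_file)

str(toy_dat)
str(toy_csv)
```

Both calls return a data frame, so everything that follows works the same way regardless of which file format you started from.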
Finally, SPSS files can be read in using the read.spss() function in the foreign package.
library("foreign")
wiscr <- read.spss("/Users/cdesjard/Google Drive/teaching-courses/ICD_summer2016/data/wiscsem.sav", to.data.frame = TRUE)
## Or you can call the function directly from the foreign package without attaching it
wiscr <- foreign::read.spss("/Users/cdesjard/Google Drive/teaching-courses/ICD_summer2016/data/wiscsem.sav", to.data.frame = TRUE)
Now that we’ve read in the data sets, we should make sure that they were read in correctly. We can do that as follows.
head(wiscr)
client agemate info comp arith simil vocab digit pictcomp parang block object
1 3 3 8 7 13 9 12 9 6 11 12 7
2 4 3 9 6 8 7 11 12 6 8 7 12
3 5 3 13 18 11 16 15 6 18 8 11 12
4 6 3 8 11 6 12 9 7 13 4 7 12
5 7 2 10 3 8 9 12 9 7 7 11 4
6 8 3 11 7 15 12 10 12 6 12 10 5
coding
1 9
2 14
3 9
4 11
5 10
6 10
tail(wiscr)
client agemate info comp arith simil vocab digit pictcomp parang block object
170 235 6 14 11 12 12 18 13 12 12 11 8
171 237 1 6 11 5 11 8 9 5 9 7 9
172 238 3 10 8 11 11 6 12 12 10 10 6
173 239 1 10 10 10 10 9 5 10 11 10 13
174 240 2 13 12 11 11 12 12 13 11 10 12
175 241 2 9 9 10 8 8 9 10 6 6 7
coding
170 7
171 10
172 10
173 9
174 12
175 7
summary(wiscr)
client agemate info comp arith
Min. : 3.0 Min. :0.000 Min. : 3.000 Min. : 0 Min. : 4.0
1st Qu.: 55.5 1st Qu.:1.000 1st Qu.: 8.000 1st Qu.: 8 1st Qu.: 7.0
Median :118.0 Median :2.000 Median :10.000 Median :10 Median : 9.0
Mean :119.5 Mean :2.051 Mean : 9.497 Mean :10 Mean : 9.0
3rd Qu.:183.5 3rd Qu.:3.000 3rd Qu.:11.500 3rd Qu.:12 3rd Qu.:10.5
Max. :241.0 Max. :6.000 Max. :19.000 Max. :18 Max. :16.0
simil vocab digit pictcomp parang
Min. : 2.00 Min. : 2.0 Min. : 0.000 Min. : 2.00 Min. : 2.00
1st Qu.: 9.00 1st Qu.: 9.0 1st Qu.: 7.000 1st Qu.: 9.00 1st Qu.: 9.00
Median :11.00 Median :10.0 Median : 8.000 Median :11.00 Median :10.00
Mean :10.61 Mean :10.7 Mean : 8.731 Mean :10.68 Mean :10.37
3rd Qu.:12.00 3rd Qu.:12.0 3rd Qu.:11.000 3rd Qu.:13.00 3rd Qu.:12.00
Max. :18.00 Max. :19.0 Max. :16.000 Max. :19.00 Max. :17.00
block object coding
Min. : 2.00 Min. : 3.0 Min. : 0.000
1st Qu.: 9.00 1st Qu.: 9.0 1st Qu.: 6.000
Median :10.00 Median :11.0 Median : 9.000
Mean :10.31 Mean :10.9 Mean : 8.549
3rd Qu.:12.00 3rd Qu.:13.0 3rd Qu.:11.000
Max. :18.00 Max. :19.0 Max. :15.000
str(wiscr)
'data.frame': 175 obs. of 13 variables:
$ client : num 3 4 5 6 7 8 9 10 12 13 ...
$ agemate : num 3 3 3 3 2 3 3 2 3 3 ...
$ info : num 8 9 13 8 10 11 6 7 10 9 ...
$ comp : num 7 6 18 11 3 7 13 10 8 10 ...
$ arith : num 13 8 11 6 8 15 7 10 8 8 ...
$ simil : num 9 7 16 12 9 12 8 15 14 11 ...
$ vocab : num 12 11 15 9 12 10 11 10 9 9 ...
$ digit : num 9 12 6 7 9 12 6 7 9 11 ...
$ pictcomp: num 6 6 18 13 7 6 14 8 10 10 ...
$ parang : num 11 8 8 4 7 12 9 14 11 12 ...
$ block : num 12 7 11 7 11 10 14 11 10 9 ...
$ object : num 7 12 12 12 4 5 14 10 9 13 ...
$ coding : num 9 14 9 11 10 10 10 12 6 13 ...
- attr(*, "variable.labels")= Named chr "" "" "Information" "Comprehension" ...
..- attr(*, "names")= chr "client" "agemate" "info" "comp" ...
If we want to access a specific variable:
wiscr$info
[1] 8 9 13 8 10 11 6 7 10 9 11 12 8 9 9 12 12 5 12 11 7 10 9 7 12 8
[27] 13 8 8 10 10 9 13 12 8 10 11 15 9 10 5 8 5 9 13 10 14 8 11 10 9 7
[53] 7 5 10 9 8 10 13 8 10 17 5 6 10 10 11 9 5 15 12 14 14 9 3 12 9 8
[79] 12 5 10 12 13 10 12 13 8 6 12 11 10 13 10 5 4 10 10 5 19 9 10 7 10 5
[105] 13 8 5 8 6 11 11 7 12 11 9 10 14 7 8 11 6 11 13 15 7 10 4 9 10 8
[131] 7 9 3 8 5 4 14 12 8 8 12 5 10 9 10 10 11 9 10 9 7 12 9 13 15 10
[157] 6 13 9 7 12 11 5 11 15 4 13 7 4 14 6 10 10 13 9
# or
wiscr[,"info"]
[1] 8 9 13 8 10 11 6 7 10 9 11 12 8 9 9 12 12 5 12 11 7 10 9 7 12 8
[27] 13 8 8 10 10 9 13 12 8 10 11 15 9 10 5 8 5 9 13 10 14 8 11 10 9 7
[53] 7 5 10 9 8 10 13 8 10 17 5 6 10 10 11 9 5 15 12 14 14 9 3 12 9 8
[79] 12 5 10 12 13 10 12 13 8 6 12 11 10 13 10 5 4 10 10 5 19 9 10 7 10 5
[105] 13 8 5 8 6 11 11 7 12 11 9 10 14 7 8 11 6 11 13 15 7 10 4 9 10 8
[131] 7 9 3 8 5 4 14 12 8 8 12 5 10 9 10 10 11 9 10 9 7 12 9 13 15 10
[157] 6 13 9 7 12 11 5 11 15 4 13 7 4 14 6 10 10 13 9
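The two forms above return the same vector. As a small sketch on a made-up data frame (the object `toy` and its values are assumptions for illustration), here are the common ways to pull out a column and what each one returns:

```r
# Made-up data for illustration.
toy <- data.frame(info = c(8, 9, 13), comp = c(7, 6, 18))

toy$info        # a numeric vector
toy[, "info"]   # the same vector
toy[["info"]]   # also the same vector
toy["info"]     # a one-column data frame, not a vector

identical(toy$info, toy[, "info"])
is.data.frame(toy["info"])
```

The single-bracket form without a comma is handy when a function expects a data frame rather than a vector.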
We probably should change the types of two variables: client to a character variable and agemate to a factor.
wiscr$client <- as.character(wiscr$client)
wiscr$agemate <- as.factor(wiscr$agemate)
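To see what these conversions do, here is a minimal sketch on a made-up data frame (the values are assumptions, not the WISC-R data). It also shows one thing to watch for when converting a factor back to numbers.

```r
# Made-up data for illustration.
toy <- data.frame(client = c(3, 4, 5), agemate = c(3, 3, 2))

toy$client  <- as.character(toy$client)
toy$agemate <- as.factor(toy$agemate)

str(toy)               # client is now chr, agemate is a factor
levels(toy$agemate)    # the distinct categories, stored as labels
summary(toy$agemate)   # counts per level instead of a mean

# Converting a factor straight to numeric returns the internal level codes,
# so go through as.character() first to recover the original values.
as.numeric(toy$agemate)
as.numeric(as.character(toy$agemate))
```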
If you want to create tables:
# Table of sex in the Holzinger data set
table(holzinger$sex)
1 2
146 155
# Cross tab of sex and age from the Holzinger data set
xtabs(~ sex + ageyr, data = holzinger)
ageyr
sex 11 12 13 14 15 16
1 4 39 56 29 11 7
2 4 62 54 26 9 0
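Counts are often easier to read as proportions or with totals attached. The sketch below uses a made-up data frame (an assumption for illustration, not the Holzinger data) with the base R helpers prop.table() and addmargins().

```r
# Made-up data for illustration.
toy <- data.frame(sex   = c(1, 2, 2, 1, 2),
                  ageyr = c(11, 12, 12, 12, 11))

counts <- table(toy$sex)
counts
prop.table(counts)   # the same table expressed as proportions

tab <- xtabs(~ sex + ageyr, data = toy)
addmargins(tab)      # adds row and column totals to the cross tab
```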
If you want to look at a specific row, for example, row 5:
holzinger[5,]
id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6 x7
5 5 2 12 2 0 7 4.833333 4.75 0.875 2.666667 4 2.571429 3.695652
x8 x9
5 6.3 5.916667
Or you want to look at everyone with sex equal to 2 and ageyr equal to 11:
holzinger[holzinger$sex == 2 & holzinger$ageyr == 11, ]
id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6
158 202 2 11 10 1 7 5.500000 5.50 2.125 2.666667 4.25 1.428571
164 209 2 11 11 1 7 4.666667 6.25 1.125 3.333333 4.50 1.571429
193 238 2 11 11 1 7 5.666667 7.00 1.250 3.000000 5.50 2.857143
205 250 2 11 11 1 7 5.166667 4.75 1.625 2.666667 5.50 2.428571
x7 x8 x9
158 2.826087 4.90 5.416667
164 4.173913 4.75 4.833333
193 3.521739 4.80 5.527778
205 2.652174 3.50 3.333333
If you want to look at everyone that has sex equal to 2 or ageyr equal to 11:
holzinger[holzinger$sex == 2 | holzinger$ageyr == 11, ]
id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6
2 2 2 13 7 0 7 5.3333333 5.25 2.125 1.6666667 3.00 1.2857143
3 3 2 13 1 0 7 4.5000000 5.25 1.875 1.0000000 1.75 0.4285714
5 5 2 12 2 0 7 4.8333333 4.75 0.875 2.6666667 4.00 2.5714286
6 6 2 14 1 0 7 5.3333333 5.00 2.250 1.0000000 3.00 0.8571429
8 8 2 12 2 0 7 5.6666667 6.25 1.875 3.6666667 4.25 1.2857143
9 9 2 13 0 0 7 4.5000000 5.75 1.500 2.6666667 5.75 2.7142857
10 11 2 12 5 0 7 3.5000000 5.25 0.750 2.6666667 5.00 2.5714286
13 14 2 12 7 0 7 5.6666667 4.50 4.125 2.6666667 4.00 2.2857143
14 15 2 12 8 0 7 6.0000000 5.50 1.750 4.6666667 4.00 1.5714286
16 17 2 12 1 0 7 4.6666667 4.75 2.375 2.6666667 4.25 0.7142857
17 18 2 14 11 0 7 4.3333333 4.75 1.500 2.0000000 4.00 1.2857143
19 20 2 12 8 0 7 5.6666667 5.25 4.000 4.3333333 5.25 3.7142857
20 21 2 12 3 0 7 6.3333333 8.75 3.000 3.6666667 3.75 2.5714286
24 25 2 12 8 0 7 3.8333333 5.50 1.625 2.6666667 3.00 1.0000000
28 29 2 13 2 0 7 6.0000000 5.00 2.125 1.6666667 3.00 1.1428571
29 30 2 12 5 0 7 4.6666667 6.00 4.250 2.0000000 3.00 2.0000000
30 31 2 12 2 0 7 5.0000000 4.50 0.750 2.6666667 3.25 1.8571429
31 33 2 12 7 0 7 3.5000000 5.75 1.375 2.0000000 3.50 1.8571429
33 35 2 12 2 0 7 5.0000000 5.25 1.750 2.6666667 5.25 2.0000000
34 36 2 12 3 0 7 4.1666667 6.00 2.375 3.3333333 4.25 1.8571429
35 38 2 13 3 0 7 3.3333333 3.75 1.500 1.3333333 3.00 0.8571429
38 41 2 12 8 0 7 3.8333333 4.50 2.250 3.0000000 3.00 1.7142857
39 42 2 12 6 0 7 6.3333333 4.00 3.875 4.0000000 5.25 3.2857143
41 44 2 12 1 0 7 3.8333333 5.75 1.625 2.6666667 4.25 2.1428571
42 45 2 13 6 0 7 3.1666667 5.00 1.250 1.6666667 4.25 1.5714286
43 46 2 13 8 0 7 1.8333333 5.25 1.000 1.6666667 3.75 0.5714286
48 51 2 13 6 0 7 3.1666667 4.75 1.375 2.6666667 3.25 1.2857143
50 54 2 13 2 0 7 4.5000000 6.25 1.125 3.6666667 5.50 2.1428571
59 65 2 14 0 0 7 3.8333333 6.50 2.000 1.0000000 2.50 1.2857143
61 67 2 12 5 0 7 5.3333333 5.75 3.375 4.0000000 5.50 2.5714286
62 68 2 12 9 0 7 4.0000000 6.00 3.625 2.6666667 5.00 2.1428571
63 69 2 12 10 0 7 5.3333333 6.75 1.375 1.6666667 2.50 0.7142857
64 70 2 13 11 0 7 5.3333333 5.00 1.250 3.3333333 4.50 2.4285714
65 71 2 13 1 0 7 3.6666667 5.75 3.625 2.6666667 5.00 1.4285714
66 72 2 12 11 0 7 6.5000000 6.00 2.500 3.6666667 4.50 3.0000000
68 74 2 13 0 0 7 4.6666667 5.75 3.625 2.6666667 2.75 1.8571429
69 75 2 13 9 0 7 2.8333333 5.00 0.875 3.0000000 2.50 1.5714286
71 77 2 14 2 0 7 0.6666667 4.50 0.750 2.0000000 3.00 1.0000000
75 81 2 12 9 0 7 4.1666667 5.25 2.125 3.0000000 3.25 0.7142857
76 82 2 13 9 0 7 3.8333333 5.25 2.375 1.6666667 2.25 1.0000000
79 86 2 14 7 0 8 6.3333333 5.50 4.125 4.0000000 3.25 1.8571429
84 91 2 13 2 0 8 4.8333333 6.50 1.750 2.3333333 3.50 0.7142857
85 93 2 14 7 0 8 3.0000000 5.25 0.625 1.0000000 1.50 1.0000000
86 94 2 12 11 0 8 6.3333333 6.25 2.500 3.3333333 5.75 3.7142857
87 95 2 13 8 0 8 5.5000000 7.00 2.875 2.6666667 4.00 0.8571429
88 96 2 13 3 0 8 5.3333333 6.00 2.750 4.0000000 5.00 2.1428571
89 97 2 13 11 0 8 3.3333333 7.25 3.250 1.3333333 1.25 0.5714286
90 98 2 13 10 0 8 5.5000000 5.00 2.250 3.6666667 4.50 1.7142857
92 100 2 14 0 0 8 3.8333333 6.25 3.375 2.0000000 3.75 0.4285714
98 106 2 13 2 0 8 7.5000000 7.50 2.125 5.3333333 6.75 2.7142857
99 108 2 13 11 0 8 5.0000000 3.75 1.375 1.6666667 2.75 1.4285714
102 111 2 13 10 0 8 6.5000000 6.50 1.875 3.6666667 3.75 1.7142857
106 115 2 12 10 0 8 4.5000000 6.50 3.125 2.6666667 5.50 2.1428571
107 116 2 13 5 0 8 6.8333333 6.50 0.750 5.3333333 5.75 3.1428571
108 117 2 13 10 0 8 5.5000000 6.75 3.750 2.6666667 2.75 2.0000000
109 118 2 14 5 0 8 6.3333333 5.50 1.625 5.6666667 5.50 4.0000000
110 119 2 13 3 0 8 4.1666667 4.25 2.250 3.3333333 3.50 1.4285714
112 121 2 13 1 0 8 6.3333333 5.25 2.625 4.3333333 4.25 1.5714286
114 123 2 14 0 0 8 5.1666667 6.50 1.750 4.0000000 5.25 1.8571429
116 125 2 13 1 0 8 4.1666667 5.75 1.000 3.6666667 4.75 1.8571429
121 131 2 14 4 0 8 4.8333333 6.00 1.500 2.6666667 4.25 1.2857143
123 133 2 14 4 0 8 7.5000000 7.00 4.250 4.0000000 6.50 3.5714286
124 134 2 15 8 0 8 5.3333333 5.25 4.125 2.3333333 3.75 1.5714286
126 136 2 15 4 0 8 5.3333333 8.75 3.125 4.3333333 4.00 1.1428571
127 137 2 13 11 0 8 4.3333333 7.00 1.000 4.3333333 5.50 3.0000000
128 138 2 13 11 0 8 4.8333333 7.00 1.125 3.3333333 5.00 2.2857143
129 139 2 13 4 0 8 6.0000000 7.50 3.250 3.0000000 6.00 2.4285714
132 143 2 14 1 0 8 5.6666667 5.50 3.500 4.0000000 4.25 2.2857143
133 144 2 15 3 0 8 5.0000000 6.25 1.750 1.3333333 2.00 0.5714286
134 145 2 13 6 0 8 3.5000000 5.25 2.250 2.0000000 3.75 1.8571429
139 150 2 14 2 0 8 5.5000000 6.75 4.500 2.3333333 4.50 2.5714286
140 151 2 14 7 0 8 5.1666667 5.00 2.500 3.3333333 5.75 3.0000000
141 152 2 14 0 0 8 3.1666667 3.75 1.500 3.3333333 5.75 2.2857143
142 153 2 14 9 0 8 5.0000000 7.50 4.500 2.0000000 1.50 0.4285714
144 155 2 12 9 0 8 7.3333333 6.75 4.000 6.0000000 6.50 6.1428571
145 156 2 15 8 0 8 2.0000000 5.50 0.625 2.6666667 5.00 0.8571429
146 157 2 13 6 0 8 3.8333333 5.50 1.875 3.6666667 5.00 3.0000000
147 158 2 13 8 0 8 4.1666667 6.00 1.250 4.0000000 5.00 2.5714286
148 159 2 15 8 0 8 4.6666667 5.75 1.625 0.6666667 2.50 1.4285714
154 166 2 15 1 0 8 5.1666667 6.00 2.375 1.3333333 2.25 1.1428571
155 167 2 15 7 0 8 6.3333333 6.75 1.125 3.0000000 2.50 1.4285714
156 168 2 15 6 0 8 4.8333333 5.75 1.250 3.0000000 4.75 2.1428571
158 202 2 11 10 1 7 5.5000000 5.50 2.125 2.6666667 4.25 1.4285714
160 204 1 11 11 1 7 4.8333333 5.75 1.125 3.0000000 4.75 1.5714286
162 206 2 12 6 1 7 5.0000000 6.25 2.500 3.3333333 5.75 2.5714286
163 208 2 12 8 1 7 6.0000000 8.25 4.500 5.6666667 6.25 5.8571429
164 209 2 11 11 1 7 4.6666667 6.25 1.125 3.3333333 4.50 1.5714286
165 210 2 12 5 1 7 5.0000000 6.25 1.375 3.6666667 5.25 1.1428571
166 211 2 12 5 1 7 3.3333333 6.25 0.750 3.0000000 5.25 2.2857143
170 215 2 12 8 1 7 2.8333333 5.25 0.750 1.6666667 2.50 1.4285714
173 218 2 12 1 1 7 5.5000000 7.75 3.750 3.6666667 5.75 2.5714286
175 220 2 12 7 1 7 5.0000000 5.50 2.500 2.6666667 4.25 2.8571429
176 221 2 12 1 1 7 6.0000000 7.00 2.750 4.3333333 6.00 5.1428571
179 224 2 12 6 1 7 5.0000000 6.00 2.375 4.6666667 6.50 3.4285714
181 226 2 12 3 1 7 5.5000000 6.75 2.000 2.6666667 4.25 1.8571429
182 227 2 12 4 1 7 5.3333333 5.50 1.875 3.0000000 5.00 2.4285714
184 229 2 14 7 1 7 4.5000000 5.75 0.500 3.0000000 2.75 1.0000000
187 232 2 12 0 1 7 2.8333333 7.50 1.625 3.0000000 4.25 1.8571429
192 237 2 12 10 1 7 6.3333333 6.25 1.625 3.0000000 5.75 2.1428571
193 238 2 11 11 1 7 5.6666667 7.00 1.250 3.0000000 5.50 2.8571429
194 239 2 13 8 1 7 3.0000000 5.50 0.625 0.6666667 1.00 0.2857143
195 240 2 12 7 1 7 2.6666667 5.00 1.000 2.0000000 4.50 1.8571429
196 241 2 12 4 1 7 3.0000000 7.50 2.125 6.3333333 6.00 4.7142857
197 242 2 12 6 1 7 5.3333333 5.25 1.125 5.0000000 5.00 3.5714286
200 245 2 12 9 1 7 4.6666667 5.00 1.750 2.6666667 4.50 1.4285714
201 246 2 12 3 1 7 6.5000000 6.00 3.125 4.6666667 4.25 1.5714286
202 247 2 12 4 1 7 5.3333333 6.50 1.250 3.6666667 5.75 3.2857143
203 248 2 12 10 1 7 4.6666667 6.00 1.000 2.0000000 2.50 1.4285714
205 250 2 11 11 1 7 5.1666667 4.75 1.625 2.6666667 5.50 2.4285714
206 251 2 12 8 1 7 4.8333333 6.50 3.125 3.3333333 5.50 2.5714286
207 252 1 11 11 1 7 4.8333333 7.50 1.875 2.3333333 4.00 1.7142857
209 254 2 12 4 1 7 4.1666667 6.25 3.250 1.6666667 1.75 0.7142857
211 257 2 12 3 1 7 6.1666667 7.75 2.000 4.6666667 7.00 2.8571429
212 258 2 12 11 1 7 5.0000000 6.50 2.375 2.3333333 5.25 2.0000000
213 259 1 11 5 1 7 4.8333333 6.00 1.250 3.0000000 5.25 2.4285714
214 260 2 12 4 1 7 6.1666667 8.50 2.125 6.0000000 6.00 3.8571429
215 261 2 12 0 1 7 4.6666667 5.00 0.500 2.6666667 4.50 0.8571429
217 263 2 12 4 1 7 6.3333333 5.25 2.250 6.0000000 6.50 4.4285714
220 266 2 13 7 1 7 4.6666667 6.75 1.625 1.6666667 1.50 0.5714286
222 268 2 14 0 1 7 1.8333333 5.00 1.125 4.3333333 4.00 2.1428571
223 269 2 12 11 1 7 7.5000000 9.25 3.625 3.3333333 4.00 2.2857143
224 270 2 13 0 1 7 3.1666667 3.75 0.875 5.0000000 4.75 2.7142857
225 271 2 12 8 1 7 6.8333333 5.25 1.375 3.0000000 4.00 2.0000000
227 273 2 12 0 1 7 5.6666667 6.00 2.000 3.0000000 3.50 1.4285714
230 276 1 11 4 1 7 6.0000000 7.00 4.125 5.0000000 6.00 5.5714286
231 277 2 14 0 1 7 3.1666667 5.75 0.750 2.3333333 3.25 2.7142857
232 278 2 12 9 1 7 4.5000000 5.25 0.875 3.0000000 4.75 3.1428571
233 279 2 12 5 1 7 3.5000000 2.25 1.750 5.0000000 6.25 4.2857143
235 281 2 12 7 1 7 4.3333333 5.50 1.000 1.6666667 2.75 1.5714286
237 283 2 13 1 1 8 5.5000000 5.50 1.750 4.3333333 6.25 3.7142857
238 284 2 12 11 1 8 6.3333333 6.50 0.875 5.3333333 6.25 2.1428571
239 285 2 13 2 1 8 5.5000000 6.25 4.250 5.0000000 5.75 3.0000000
242 288 2 13 2 1 8 5.1666667 6.25 3.500 3.3333333 6.25 3.8571429
243 289 2 13 1 1 8 6.0000000 7.00 1.625 3.6666667 5.75 2.5714286
245 291 2 14 2 1 8 4.6666667 6.00 0.750 2.6666667 4.25 2.1428571
249 295 2 13 3 1 8 5.8333333 6.25 3.125 3.6666667 6.00 2.7142857
250 296 2 12 10 1 8 3.8333333 5.25 0.375 3.0000000 5.25 1.7142857
251 297 2 13 5 1 8 5.1666667 7.00 3.125 4.0000000 5.75 2.1428571
253 299 2 14 9 1 8 5.5000000 7.00 2.250 5.0000000 5.75 4.2857143
254 300 2 15 11 1 8 3.5000000 6.50 2.125 4.0000000 4.00 4.7142857
255 302 2 13 9 1 8 6.1666667 8.00 1.375 5.3333333 6.25 4.5714286
259 306 2 13 0 1 8 6.5000000 8.50 4.125 3.3333333 4.75 2.2857143
260 307 2 13 8 1 8 3.0000000 4.00 0.500 3.3333333 5.25 2.7142857
261 308 2 12 9 1 8 4.6666667 7.00 2.250 3.3333333 6.25 3.2857143
263 310 2 13 5 1 8 4.1666667 5.00 1.375 4.0000000 4.75 2.4285714
265 312 2 13 6 1 8 5.3333333 5.25 1.875 4.6666667 5.00 4.1428571
267 314 2 13 5 1 8 5.8333333 7.00 2.375 6.0000000 6.75 4.7142857
269 316 2 13 5 1 8 6.3333333 7.50 4.125 4.6666667 5.50 4.1428571
270 317 2 13 2 1 8 4.3333333 6.25 0.875 3.3333333 3.75 1.8571429
271 318 2 13 7 1 8 5.5000000 7.00 1.500 4.6666667 5.25 2.0000000
273 321 2 13 3 1 8 4.5000000 5.50 2.000 2.6666667 4.75 2.1428571
274 322 2 14 4 1 8 4.3333333 7.25 1.250 2.6666667 4.75 1.8571429
283 331 2 14 1 1 8 4.0000000 6.00 2.000 2.6666667 4.50 1.7142857
285 334 2 14 9 1 8 5.0000000 5.75 1.250 2.6666667 3.50 1.7142857
289 338 2 13 1 1 8 4.0000000 6.25 0.750 3.0000000 4.75 3.2857143
293 342 2 12 11 1 8 5.6666667 5.50 1.625 3.3333333 4.25 2.0000000
295 344 2 13 0 1 8 5.8333333 7.00 1.250 3.0000000 3.25 1.5714286
298 347 2 14 10 1 8 3.0000000 6.00 1.625 2.3333333 4.00 1.0000000
299 348 2 14 3 1 8 4.6666667 5.50 1.875 3.6666667 5.75 4.2857143
x7 x8 x9
2 3.782609 6.25 7.916667
3 3.260870 3.90 4.416667
5 3.695652 6.30 5.916667
6 4.347826 6.65 7.500000
8 3.391304 5.15 3.666667
9 4.521739 4.65 7.361111
10 4.130435 4.55 4.361111
13 5.869565 5.20 5.861111
14 5.130435 4.70 4.444444
16 4.086957 3.80 5.138889
17 3.695652 6.65 5.250000
19 3.913044 4.85 5.750000
20 3.478261 5.35 4.916667
24 5.826087 5.30 6.777778
28 2.826087 5.55 4.416667
29 5.130435 5.85 8.611111
30 4.652174 4.85 5.444444
31 4.826087 6.95 5.972222
33 2.695652 4.30 4.805556
34 5.391304 4.35 5.638889
35 2.782609 5.20 4.833333
38 4.086957 3.50 5.083333
39 5.521739 5.45 5.111111
41 4.913043 5.10 4.638889
42 2.826087 5.30 4.777778
43 3.956522 4.75 2.777778
48 6.130435 8.00 5.444444
50 2.565217 4.80 5.527778
59 3.391304 4.55 4.833333
61 5.260870 6.20 6.138889
62 4.913043 5.35 4.777778
63 2.652174 3.85 5.333333
64 5.521739 5.45 5.833333
65 3.304348 4.50 5.027778
66 5.695652 5.40 6.305556
68 3.869565 6.10 4.250000
69 6.826087 5.70 3.916667
71 3.608696 5.50 4.611111
75 3.478261 5.70 4.583333
76 3.130435 5.75 4.666667
79 5.565217 7.35 5.750000
84 5.782609 5.75 4.972222
85 4.956522 4.85 4.583333
86 4.608696 6.85 5.472222
87 4.086957 5.40 5.972222
88 5.521739 4.55 5.138889
89 3.565217 5.55 4.888889
90 4.173913 5.95 6.666667
92 5.391304 7.80 6.111111
98 5.130435 5.55 7.222222
99 5.565217 6.10 4.611111
102 4.521739 4.40 6.583333
106 6.347826 6.50 6.166667
107 7.434783 5.70 5.194444
108 4.173913 6.80 7.000000
109 5.695652 6.40 7.527778
110 4.478261 4.15 3.361111
112 2.869565 6.00 5.444444
114 4.695652 4.30 6.000000
116 4.826087 5.85 5.416667
121 5.826087 6.40 6.861111
123 3.739130 7.60 6.500000
124 3.652174 5.35 4.777778
126 4.000000 6.95 5.666667
127 5.130435 5.65 4.916667
128 4.913043 5.25 4.972222
129 3.304348 5.10 5.777778
132 7.260870 6.35 7.194444
133 4.913043 6.00 4.805556
134 4.695652 5.25 3.722222
139 6.260870 6.55 6.888889
140 6.434783 8.30 7.083333
141 6.043478 5.25 6.222222
142 3.304348 5.25 5.722222
144 5.347826 5.75 6.611111
145 2.391304 5.60 5.972222
146 5.608696 4.90 3.861111
147 6.956522 6.25 6.305556
148 4.521739 5.00 4.750000
154 5.391304 6.15 5.194444
155 4.695652 5.60 4.138889
156 3.608696 6.15 4.694444
158 2.826087 4.90 5.416667
160 4.956522 5.15 4.000000
162 4.086957 5.65 5.583333
163 5.608696 6.95 9.250000
164 4.173913 4.75 4.833333
165 4.478261 5.70 5.472222
166 3.869565 5.05 4.944444
170 4.304348 4.35 4.083333
173 2.869565 5.65 5.166667
175 4.130435 5.50 4.472222
176 3.565217 5.15 5.694444
179 4.869565 5.75 5.138889
181 2.956522 5.80 6.083333
182 3.217391 5.90 5.305556
184 2.391304 5.90 4.944444
187 2.260870 4.70 3.972222
192 2.434783 6.50 5.500000
193 3.521739 4.80 5.527778
194 1.304348 3.05 3.111111
195 4.043478 5.55 5.138889
196 4.000000 5.40 4.111111
197 4.130435 4.60 4.944444
200 2.652174 5.00 5.944444
201 3.565217 5.20 6.777778
202 2.695652 4.15 3.972222
203 2.652174 3.85 4.416667
205 2.652174 3.50 3.333333
206 3.695652 6.40 5.611111
207 2.782609 3.70 4.583333
209 2.000000 3.60 3.361111
211 3.826087 5.70 6.194444
212 3.869565 6.05 6.166667
213 4.347826 6.05 5.722222
214 3.478261 5.60 6.361111
215 3.173913 4.20 4.472222
217 5.217391 5.45 5.694444
220 4.391304 5.30 4.777778
222 3.173913 3.65 3.611111
223 3.173913 5.05 6.111111
224 4.391304 5.60 3.222222
225 5.391304 5.95 6.111111
227 3.869565 5.80 6.555556
230 3.521739 5.20 5.833333
231 2.956522 4.85 4.138889
232 3.826087 5.45 5.305556
233 2.956522 4.55 5.027778
235 3.304348 5.70 4.333333
237 3.652174 6.35 6.388889
238 4.739130 5.55 5.555556
239 4.739130 7.15 6.833333
242 4.478261 4.90 4.666667
243 5.826087 5.10 6.222222
245 4.000000 4.75 5.972222
249 5.739130 5.90 5.194444
250 5.782609 5.55 6.527778
251 6.043478 5.50 5.527778
253 4.173913 6.10 5.888889
254 5.130435 5.60 3.666667
255 3.478261 4.75 4.750000
259 4.130435 5.95 6.666667
260 2.956522 4.00 3.472222
261 3.782609 4.65 4.722222
263 4.826087 6.25 5.722222
265 5.086957 5.40 6.583333
267 6.304348 5.85 6.277778
269 4.217391 6.00 7.000000
270 5.000000 6.45 5.083333
271 4.608696 5.30 5.194444
273 5.869565 7.10 6.250000
274 3.695652 5.45 6.111111
283 3.347826 4.70 3.750000
285 4.260870 6.60 6.666667
289 5.217391 6.55 5.722222
293 5.478261 6.00 4.500000
295 4.173913 4.85 5.777778
298 4.608696 6.05 6.083333
299 4.000000 6.00 7.611111
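The difference between & (and) and | (or) is easier to see on a handful of rows. This is a minimal sketch on a made-up data frame (an assumption for illustration); it also stores the result and counts the rows, which avoids printing a very long data frame.

```r
# Made-up data for illustration.
toy <- data.frame(id    = 1:5,
                  sex   = c(1, 2, 2, 1, 2),
                  ageyr = c(11, 11, 12, 12, 13))

both   <- toy[toy$sex == 2 & toy$ageyr == 11, ]   # both conditions must hold
either <- toy[toy$sex == 2 | toy$ageyr == 11, ]   # at least one must hold

nrow(both)
nrow(either)
head(either, 3)   # peek at the first few matching rows

# subset() is a base R alternative that lets you drop the "toy$" prefix.
subset(toy, sex == 2 & ageyr == 11)
```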
Let’s say you figured out that the person with id 5 was supposed to have sex equal to 1, not 2.
holzinger[holzinger$id == 5, "sex"]
[1] 2
holzinger[holzinger$id == 5, "sex"] <- 1
holzinger[holzinger$id == 5, "sex"]
[1] 1
# Actually they were supposed to be a 2. Let's change it back
holzinger[holzinger$id == 5, "sex"] <- 2
We saw that we can get basic summary statistics from R by:
summary(holzinger)
id sex ageyr agemo school
Min. : 1.0 Min. :1.000 Min. :11 Min. : 0.000 Min. :0.0000
1st Qu.: 82.0 1st Qu.:1.000 1st Qu.:12 1st Qu.: 2.000 1st Qu.:0.0000
Median :163.0 Median :2.000 Median :13 Median : 5.000 Median :0.0000
Mean :176.6 Mean :1.515 Mean :13 Mean : 5.375 Mean :0.4817
3rd Qu.:272.0 3rd Qu.:2.000 3rd Qu.:14 3rd Qu.: 8.000 3rd Qu.:1.0000
Max. :351.0 Max. :2.000 Max. :16 Max. :11.000 Max. :1.0000
grade x1 x2 x3 x4
Min. :-999.000 Min. :0.6667 Min. :2.250 Min. :0.250 Min. :0.000
1st Qu.: 7.000 1st Qu.:4.1667 1st Qu.:5.250 1st Qu.:1.375 1st Qu.:2.333
Median : 7.000 Median :5.0000 Median :6.000 Median :2.125 Median :3.000
Mean : 4.133 Mean :4.9358 Mean :6.088 Mean :2.250 Mean :3.061
3rd Qu.: 8.000 3rd Qu.:5.6667 3rd Qu.:6.750 3rd Qu.:3.125 3rd Qu.:3.667
Max. : 8.000 Max. :8.5000 Max. :9.250 Max. :4.500 Max. :6.333
x5 x6 x7 x8 x9
Min. :1.000 Min. :0.1429 Min. :1.304 Min. : 3.050 Min. :2.778
1st Qu.:3.500 1st Qu.:1.4286 1st Qu.:3.478 1st Qu.: 4.850 1st Qu.:4.750
Median :4.500 Median :2.0000 Median :4.087 Median : 5.500 Median :5.417
Mean :4.341 Mean :2.1856 Mean :4.186 Mean : 5.527 Mean :5.374
3rd Qu.:5.250 3rd Qu.:2.7143 3rd Qu.:4.913 3rd Qu.: 6.100 3rd Qu.:6.083
Max. :7.000 Max. :6.1429 Max. :7.435 Max. :10.000 Max. :9.250
Hmm, the minimum of grade is -999. That seems strange. I bet that’s a missing data value. Let’s treat it as such.
holzinger[holzinger$grade == -999, "grade"] <- NA
summary(holzinger)
id sex ageyr agemo school
Min. : 1.0 Min. :1.000 Min. :11 Min. : 0.000 Min. :0.0000
1st Qu.: 82.0 1st Qu.:1.000 1st Qu.:12 1st Qu.: 2.000 1st Qu.:0.0000
Median :163.0 Median :2.000 Median :13 Median : 5.000 Median :0.0000
Mean :176.6 Mean :1.515 Mean :13 Mean : 5.375 Mean :0.4817
3rd Qu.:272.0 3rd Qu.:2.000 3rd Qu.:14 3rd Qu.: 8.000 3rd Qu.:1.0000
Max. :351.0 Max. :2.000 Max. :16 Max. :11.000 Max. :1.0000
grade x1 x2 x3 x4
Min. :7.000 Min. :0.6667 Min. :2.250 Min. :0.250 Min. :0.000
1st Qu.:7.000 1st Qu.:4.1667 1st Qu.:5.250 1st Qu.:1.375 1st Qu.:2.333
Median :7.000 Median :5.0000 Median :6.000 Median :2.125 Median :3.000
Mean :7.477 Mean :4.9358 Mean :6.088 Mean :2.250 Mean :3.061
3rd Qu.:8.000 3rd Qu.:5.6667 3rd Qu.:6.750 3rd Qu.:3.125 3rd Qu.:3.667
Max. :8.000 Max. :8.5000 Max. :9.250 Max. :4.500 Max. :6.333
NA's :1
x5 x6 x7 x8 x9
Min. :1.000 Min. :0.1429 Min. :1.304 Min. : 3.050 Min. :2.778
1st Qu.:3.500 1st Qu.:1.4286 1st Qu.:3.478 1st Qu.: 4.850 1st Qu.:4.750
Median :4.500 Median :2.0000 Median :4.087 Median : 5.500 Median :5.417
Mean :4.341 Mean :2.1856 Mean :4.186 Mean : 5.527 Mean :5.374
3rd Qu.:5.250 3rd Qu.:2.7143 3rd Qu.:4.913 3rd Qu.: 6.100 3rd Qu.:6.083
Max. :7.000 Max. :6.1429 Max. :7.435 Max. :10.000 Max. :9.250
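The recode above relies on grade not containing any NA values yet. If a column already has some, the comparison with == returns NA for those rows, and assigning into a data frame with an NA row index can stop with an error. A hedged sketch of a recode that sidesteps this, on a made-up data frame (an assumption for illustration), uses %in%, which never returns NA:

```r
# Made-up data for illustration: one sentinel value and one existing NA.
toy <- data.frame(id = 1:4, grade = c(7, -999, NA, 8))

# %in% returns FALSE (not NA) for the missing value, so the index is clean.
toy$grade[toy$grade %in% -999] <- NA

sum(is.na(toy$grade))
summary(toy$grade)
```

If you know the missing-data code before importing, read.table() and read.csv() also accept an na.strings argument that converts it while reading.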
What if we want variance or standard deviation?
apply(holzinger, 2, var, na.rm = TRUE)
id sex ageyr agemo school grade
1.122296e+04 2.506091e-01 1.103322e+00 1.191526e+01 2.504983e-01 2.502899e-01
x1 x2 x3 x4 x5 x6
1.362898e+00 1.386390e+00 1.279114e+00 1.355167e+00 1.665318e+00 1.200346e+00
x7 x8 x9
1.187083e+00 1.025389e+00 1.018387e+00
apply(holzinger, 2, sd, na.rm = TRUE)
id sex ageyr agemo school grade x1
105.9384781 0.5006087 1.0503915 3.4518488 0.5004981 0.5002898 1.1674321
x2 x3 x4 x5 x6 x7 x8
1.1774506 1.1309794 1.1641163 1.2904722 1.0956031 1.0895335 1.0126151
x9
1.0091517
# Alternatively, if we use the var() function on the data set we create a variance-covariance matrix; diag() extracts the variances.
diag(var(holzinger, use = "pairwise.complete.obs"))
id sex ageyr agemo school grade
1.122296e+04 2.506091e-01 1.103322e+00 1.191526e+01 2.504983e-01 2.502899e-01
x1 x2 x3 x4 x5 x6
1.362898e+00 1.386390e+00 1.279114e+00 1.355167e+00 1.665318e+00 1.200346e+00
x7 x8 x9
1.187083e+00 1.025389e+00 1.018387e+00
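As a small check that the two routes agree, the sketch below computes standard deviations column by column with sapply() and compares them with the square root of the diagonal of the variance-covariance matrix. The data frame is made up (an assumption for illustration), and sapply() is swapped in here as an alternative to apply(), which first converts the data frame to a matrix.

```r
# Made-up data for illustration, with one missing value.
toy <- data.frame(x1 = c(1, 2, 3, 4),
                  x2 = c(2, 4, NA, 8))

sds  <- sapply(toy, sd, na.rm = TRUE)                  # one sd per column
vars <- diag(var(toy, use = "pairwise.complete.obs"))  # variances from the matrix

sds
sqrt(vars)
all.equal(unname(sds), unname(sqrt(vars)))
```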
If you want to get a correlation matrix:
cor(holzinger, use = "complete.obs")
id sex ageyr agemo school grade
id 1.000000000 -0.02934052 0.006890281 -0.012032364 0.899717471 0.332238889
sex -0.029340517 1.00000000 -0.161828572 0.024381005 -0.018692005 -0.025152477
ageyr 0.006890281 -0.16182857 1.000000000 -0.244202783 -0.251028024 0.511329621
agemo -0.012032364 0.02438100 -0.244202783 1.000000000 -0.008195569 -0.003602712
school 0.899717471 -0.01869201 -0.251028024 -0.008195569 1.000000000 -0.048625183
grade 0.332238889 -0.02515248 0.511329621 -0.003602712 -0.048625183 1.000000000
x1 0.042549630 -0.08301280 -0.059998260 0.053127255 -0.003087534 0.171948019
x2 0.122378809 -0.11863772 -0.015259252 0.045328706 0.092251245 0.139541113
x3 -0.147127654 -0.17934638 0.037230487 0.034105071 -0.221709257 0.138766019
x4 0.269331848 0.12156266 -0.198020923 -0.016864885 0.211316591 0.207891762
x5 0.302481294 0.06187084 -0.222783739 -0.036890348 0.275294444 0.173473441
x6 0.280615027 0.01403554 -0.173897973 0.007405401 0.247527378 0.155478412
x7 -0.093489514 0.11581376 0.110717769 0.057457395 -0.235047885 0.349553387
x8 0.075997773 -0.03741871 0.238878673 0.009258502 -0.042084041 0.295627995
x9 0.031768594 0.04526865 0.098931578 0.028503928 -0.044270793 0.217055327
x1 x2 x3 x4 x5 x6
id 0.042549630 0.12237881 -0.14712765 0.26933185 0.30248129 0.280615027
sex -0.083012804 -0.11863772 -0.17934638 0.12156266 0.06187084 0.014035544
ageyr -0.059998260 -0.01525925 0.03723049 -0.19802092 -0.22278374 -0.173897973
agemo 0.053127255 0.04532871 0.03410507 -0.01686489 -0.03689035 0.007405401
school -0.003087534 0.09225125 -0.22170926 0.21131659 0.27529444 0.247527378
grade 0.171948019 0.13954111 0.13876602 0.20789176 0.17347344 0.155478412
x1 1.000000000 0.29735169 0.44331478 0.37394016 0.29605145 0.358896277
x2 0.297351685 1.00000000 0.34066453 0.15313110 0.13994136 0.192998751
x3 0.443314783 0.34066453 1.00000000 0.15724038 0.07383541 0.195327656
x4 0.373940162 0.15313110 0.15724038 1.00000000 0.73306452 0.704177712
x5 0.296051447 0.13994136 0.07383541 0.73306452 1.00000000 0.719116628
x6 0.358896277 0.19299875 0.19532766 0.70417771 0.71911663 1.000000000
x7 0.066737828 -0.07569338 0.07235378 0.17406840 0.10258274 0.121523992
x8 0.227204249 0.09293888 0.18224292 0.10484699 0.13424822 0.146174610
x9 0.390186974 0.20600565 0.32990344 0.20831513 0.22869014 0.215052378
x7 x8 x9
id -0.09348951 0.075997773 0.03176859
sex 0.11581376 -0.037418709 0.04526865
ageyr 0.11071777 0.238878673 0.09893158
agemo 0.05745740 0.009258502 0.02850393
school -0.23504788 -0.042084041 -0.04427079
grade 0.34955339 0.295627995 0.21705533
x1 0.06673783 0.227204249 0.39018697
x2 -0.07569338 0.092938877 0.20600565
x3 0.07235378 0.182242919 0.32990344
x4 0.17406840 0.104846993 0.20831513
x5 0.10258274 0.134248219 0.22869014
x6 0.12152399 0.146174610 0.21505238
x7 1.00000000 0.488808123 0.34061205
x8 0.48880812 1.000000000 0.45150669
x9 0.34061205 0.451506687 1.00000000
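The use argument decides which rows enter each correlation. Here is a minimal sketch on a made-up data frame (an assumption for illustration) contrasting "complete.obs", which drops every row that has any missing value, with "pairwise.complete.obs", which only drops rows missing for the pair being correlated.

```r
# Made-up data for illustration; z is missing in the first row.
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(2, 1, 4, 3, 5),
                  z = c(NA, 1, 2, 2, 3))

r_complete <- cor(toy, use = "complete.obs")
r_pairwise <- cor(toy, use = "pairwise.complete.obs")

r_complete["a", "b"]   # based on rows 2 to 5 only
r_pairwise["a", "b"]   # based on all five rows

round(r_pairwise, 2)   # rounding makes a large matrix easier to scan
```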
We can do a simple t-test.
t.test(x1 ~ sex, data = holzinger)
Welch Two Sample t-test
data: x1 by sex
t = 1.4095, df = 298.9, p-value = 0.1597
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.07489415 0.45293216
sample estimates:
mean in group 1 mean in group 2
5.033105 4.844086
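The result of t.test() is an object you can store and pull pieces from. The sketch below uses the built-in mtcars data (swapped in so it runs without the course files) and also shows the var.equal argument, which requests the pooled-variance t-test in place of the default Welch test.

```r
# Built-in data used for illustration instead of the course file.
tt <- t.test(mpg ~ am, data = mtcars)   # Welch test by default

tt$estimate   # the two group means
tt$conf.int   # confidence interval for the difference in means
tt$p.value

# Pooled-variance version, which assumes equal variances in the two groups.
tt_pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
tt_pooled$parameter   # degrees of freedom
```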
And we can fit regressions as follows:
mod1 <- lm(x9 ~ x7 + x8, data = holzinger)
summary(mod1)
Call:
lm(formula = x9 ~ x7 + x8, data = holzinger)
Residuals:
Min 1Q Median 3Q Max
-2.45409 -0.63860 0.01326 0.59138 3.13874
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.70953 0.29470 9.194 < 2e-16 ***
x7 0.14819 0.05421 2.734 0.00664 **
x8 0.36987 0.05832 6.342 8.42e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8936 on 298 degrees of freedom
Multiple R-squared: 0.2211, Adjusted R-squared: 0.2159
F-statistic: 42.31 on 2 and 298 DF, p-value: < 2.2e-16
anova(mod1)
Analysis of Variance Table
Response: x9
Df Sum Sq Mean Sq F value Pr(>F)
x7 1 35.452 35.452 44.398 1.294e-10 ***
x8 1 32.112 32.112 40.216 8.416e-10 ***
Residuals 298 237.952 0.798
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
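Beyond summary() and anova(), a fitted model can be queried with a few standard extractor functions. This is a sketch on the built-in mtcars data (swapped in so it runs without the course files); the data frame `new_cars` holds hypothetical values made up for the prediction step.

```r
# Built-in data used for illustration instead of the course file.
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)                # estimated coefficients
confint(fit)             # 95% confidence intervals for the coefficients
summary(fit)$r.squared   # pull a single number out of the summary

# Hypothetical new observations, made up for illustration.
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
predict(fit, newdata = new_cars)
```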
And we can get some diagnostic plots:
plot(mod1, which = 1:5)