R for psychologists 2: data wrangling

Alejandro de la Vega
2/28/2014

Review

We covered the basics last time

  • Data types (character, numeric, logical, integer)
  • Data objects (data frames are awesome)
  • Factors (categorical variables & subject IDs)
  • Indexing (data[ROW, COLUMN] & data$name)
  • Transform (adding a column of same length)
  • Subset (Selecting part of data frame based on condition)

Some loose ends

Time for some disparate but necessary loose ends

Getting help

  • help() or ?
  • ?? for package help
?scale

Column names

colnames(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
[5] "Species"     
colnames(iris)[1] = "Length"
colnames(iris) = c("SLength", "SWidth", "PLength", "PWidth", "Species")
colnames(iris)
[1] "SLength" "SWidth"  "PLength" "PWidth"  "Species"

A note on missing values

  • Missing values stored as NA
  • If input has value as “NA” will get coded as missing
  SLength SWidth PLength PWidth Species
1     5.1    3.5     1.4     NA  setosa
2     4.9    3.0     1.4    0.2  setosa
3      NA    3.2     1.3    0.2  setosa
4     4.6    3.1     1.5    0.2  setosa
na.omit(iris) # removes rows with NAs
  SLength SWidth PLength PWidth Species
2     4.9    3.0     1.4    0.2  setosa
4     4.6    3.1     1.5    0.2  setosa
  • Note row numbers….

Dealing with NAs without removing

mean(iris$SLength)
[1] NA
  • Why?
mean(iris$SLength, na.rm=T)
[1] 4.867

Installing & loading packages

  • Automatically downloads from the internet
  • Simply use the name of the package
install.packages('package_name')
  • Or: Tools -> Install packages in RStudio
  • To load:
library(package_name)

Data manipulation

  • Now that you've got some fundumentals down…
  • Crucial to be able to manipulate your data in R
  • If you know the proper tools, no need to ever go to Excel!

Reshaping your data

  • Data is commonly stored in “wide-format”

    • Easier to enter and for humans to read
    • One row per “group”
  • R likes long format data because its a computer

    • One per observation
  sub angry neutral sad happy
1   1     2       5   3     7
2   2     1       4   3     9
3   3     3       6   3     7
  • How would you convert to long normally?

Reshape2 package

  • reshape2 package makes this very easy

  • melt() - converts wide to long

  • cast() - long back to wide

    • Can use this to do some human readible summarization as well

melt()

melt(data, id.vars=c("id"), measure.vars=c("var"),value.name = "val")
melt(face_ratings, id.vars=c("sub"), value.name="rating")
   sub variable rating
1    1    angry      2
2    2    angry      1
3    3    angry      3
4    1  neutral      5
5    2  neutral      4
6    3  neutral      6
7    1      sad      3
8    2      sad      3
9    3      sad      3
10   1    happy      7
11   2    happy      9
12   3    happy      7

melt()

melt(face_ratings, id.vars=c("sub"), measure.vars=c("sad", "happy"), value.name="rating")
  sub variable rating
1   1      sad      3
2   2      sad      3
3   3      sad      3
4   1    happy      7
5   2    happy      9
6   3    happy      7

_cast()

  • dcast & acast
dcast(data, x1 + x2 ~ y1 + y2)
  • … = all
dcast(melt_face, ...~variable)
  sub angry neutral sad happy
1   1     2       5   3     7
2   2     1       4   3     9
3   3     3       6   3     7

More fun with dcast()

  • Can be used for aggregation. Be careful
  • If more than one observation per column, will aggregate
head(trustData)
  sub condition value delay choice
1   1         N    11     4      1
2   1         N    18    90      1
3   1         N    14     4      0
4   1         N    14     7      0
5   1         N    30     7      0
6   1         N    11    90      1

Aggregation with dcast()

dcast(trustData, sub~condition, margins=T)
     sub   N   T  U (all)
1      1  37   0 11    48
2      2  38   0 10    48
3      3  43   0  5    48
4      4  39   0  9    48
5      5  44   0  4    48
6      6  44   0  4    48
7      7   0  45  4    49
8      8   0  41  8    49
9      9   0  42  7    49
10    10   0  32 17    49
11 (all) 245 160 79   484
  • I will admit I don't often use this but maybe I should

dplyr package

  • Split-Apply-Combine
  • Composed of many “verbs” or actions you can take on data frames
  • Many are incremental upgrades on base R (10-10k faster)
  • Play nice with each other + simple

Data frame wrapper

  • The following functions return a slightly altered data frame
  • Doesn't print extremely large data sets but similar
library(dplyr)
tbl_df(trustData)
Source: local data frame [484 x 5]

   sub condition value delay choice
1    1         N    11     4      1
2    1         N    18    90      1
3    1         N    14     4      0
4    1         N    14     7      0
5    1         N    30     7      0
6    1         N    11    90      1
7    1         N    18     4      1
8    1         N    26    21      1
9    1         N    26     7      0
10   1         N    18    14      1
.. ...       ...   ...   ...    ...

Data manipulation functions ("verbs")

  • Single data frame as first argument
  • Perform a single operation
  • Return altered data frame (remember must save)
new_data <- function(data, args, ...)

Filter rows with filter()

  • Very similar to subset()
  • Arguments are joined as “and”
filter(data, logical conditions, optional conditions)
  • Example data:
  sub condition value delay choice
1   1         N    11     4      1
2   1         N    18    90      1
3   1         N    14     4      0
4   1         N    14     7      0
5   1         N    30     7      0
6   1         N    11    90      1
7   1         N    18     4      1

Filter problem

  • Select rows with delay of 4 and choice of 1
head(trustData)
  sub condition value delay choice
1   1         N    11     4      1
2   1         N    18    90      1
3   1         N    14     4      0
4   1         N    14     7      0
5   1         N    30     7      0
6   1         N    11    90      1
trustData$delay == 4  & trustData$choice == 1
[1]  TRUE FALSE FALSE FALSE FALSE FALSE

Filter problem

filter(trustData, delay == 4, choice == 1)
   sub condition value delay choice
1    1         N    11     4      1
2    1         N    18     4      1
3    1         N    30     4      1
4    1         N    26     4      1
5    1         N    34     4      1
6    2         N    11     4      1
7    2         N    14     4      1
8    2         N    30     4      1
9    2         N    26     4      1
10   2         N    22     4      1

Multiple conditions on same column

  • Three conditions (N, T, U)
  • Select N & T trials
filter(trustData, condition == "N",
       condition == "T")
[1] sub       condition value     delay     choice   
<0 rows> (or 0-length row.names)

Multiple conditions on same column

  • No single trail meets both condition, must use “or”
filter(trustData, 
       condition == "N" | condition == "T")
  • Shorter and better if many “or"s:
filter(trustData, condition %in% c("N", "T"))
    sub condition value delay choice
1     1         N    11     4      1
2     1         N    18    90      1
3     1         N    14     4      0
4     1         N    14     7      0
5     1         N    30     7      0
6     1         N    11    90      1
7     1         N    18     4      1
8     1         N    26    21      1
9     1         N    26     7      0
10    1         N    18    14      1
11    1         N    30     4      1
12    1         N    26    90      1
13    1         N    30   150      0
14    1         N    22     7      1
15    1         N    14    21      1
16    1         N    30    42      1
17    1         N    22   150      1
18    1         N    22    42      1
19    1         N    14    42      1
20    1         N    18     7      0
21    1         N    34    42      1
22    1         N    11    14      1
23    1         N    22    90      0
24    1         N    14    90      1
25    1         N    14   150      1
26    1         N    34    90      1
27    1         N    26    14      1
28    1         N    11    21      1
29    1         N    18   150      1
30    1         N    14    14      1
31    1         N    22    21      1
32    1         N    26     4      1
33    1         N    30    21      1
34    1         N    18    21      1
35    1         N    26    42      1
36    1         N    34    21      1
37    1         N    34     4      1
38    2         N    11     4      1
39    2         N    18    90      1
40    2         N    22    14      1
41    2         N    14     4      1
42    2         N    11     7      1
43    2         N    30     7      1
44    2         N    34     7      1
45    2         N    26    21      1
46    2         N    26     7      1
47    2         N    18    14      1
48    2         N    30     4      1
49    2         N    30   150      1
50    2         N    34   150      1
51    2         N    22     7      1
52    2         N    30    42      1
53    2         N    22   150      0
54    2         N    30    90      1
55    2         N    22    42      1
56    2         N    14    42      0
57    2         N    18     7      1
58    2         N    34    42      1
59    2         N    11   150      0
60    2         N    26   150      0
61    2         N    22    90      1
62    2         N    30    14      1
63    2         N    14    90      0
64    2         N    34    90      1
65    2         N    26    14      1
66    2         N    18   150      0
67    2         N    14    14      1
68    2         N    22    21      1
69    2         N    26     4      1
70    2         N    30    21      1
71    2         N    18    21      1
72    2         N    26    42      1
73    2         N    34    21      1
74    2         N    22     4      1
75    2         N    34     4      1
76    3         N    11     4      1
77    3         N    18    90      0
78    3         N    22    14      1
79    3         N    14     4      1
80    3         N    14     7      0
81    3         N    11     7      1
82    3         N    30     7      1
83    3         N    34     7      1
84    3         N    11    90      1
85    3         N    18     4      0
86    3         N    26    21      1
87    3         N    26     7      1
88    3         N    18    14      1
89    3         N    11    42      1
90    3         N    30     4      1
91    3         N    26    90      1
92    3         N    30   150      1
93    3         N    34   150      1
94    3         N    22     7      1
95    3         N    14    21      0
96    3         N    30    42      1
97    3         N    22   150      1
98    3         N    18    42      1
99    3         N    30    90      1
100   3         N    22    42      0
101   3         N    14    42      1
102   3         N    18     7      1
103   3         N    34    42      0
104   3         N    11   150      0
105   3         N    11    14      0
106   3         N    26   150      1
107   3         N    30    14      1
108   3         N    14    90      1
109   3         N    14   150      1
110   3         N    34    90      0
111   3         N    26    14      1
112   3         N    11    21      0
113   3         N    14    14      0
114   3         N    22    21      1
115   3         N    26     4      1
116   3         N    30    21      0
117   3         N    18    21      1
118   3         N    26    42      1
119   4         N    18    90      0
120   4         N    22    14      1
121   4         N    14     4      1
122   4         N    14     7      1
123   4         N    11     7      1
124   4         N    30     7      1
125   4         N    34     7      1
126   4         N    11    90      0
127   4         N    18     4      1
128   4         N    26    21      1
129   4         N    26     7      1
130   4         N    18    14      1
131   4         N    34   150      0
132   4         N    22     7      1
133   4         N    14    21      1
134   4         N    30    42      1
135   4         N    22   150      1
136   4         N    18    42      0
137   4         N    30    90      1
138   4         N    22    42      1
139   4         N    14    42      1
140   4         N    18     7      1
141   4         N    34    42      1
142   4         N    11   150      1
143   4         N    11    14      1
144   4         N    26   150      1
145   4         N    22    90      1
146   4         N    30    14      0
147   4         N    14    90      1
148   4         N    14   150      1
149   4         N    34    90      1
150   4         N    26    14      1
151   4         N    11    21      0
152   4         N    18   150      1
153   4         N    14    14      1
154   4         N    22    21      1
155   4         N    26     4      1
156   4         N    30    21      1
157   4         N    18    21      1
158   5         N    11     4      1
159   5         N    18    90      1
160   5         N    22    14      0
161   5         N    14     4      0
162   5         N    14     7      0
163   5         N    11     7      0
164   5         N    30     7      0
165   5         N    34     7      1
166   5         N    11    90      1
167   5         N    18     4      0
168   5         N    26    21      1
169   5         N    26     7      1
170   5         N    18    14      1
171   5         N    11    42      0
172   5         N    30     4      0
173   5         N    26    90      0
174   5         N    30   150      1
175   5         N    34   150      1
176   5         N    22     7      0
177   5         N    14    21      0
178   5         N    30    42      0
179   5         N    22   150      0
180   5         N    18    42      1
181   5         N    30    90      1
182   5         N    22    42      1
183   5         N    14    42      1
184   5         N    18     7      1
185   5         N    34    42      1
186   5         N    11   150      0
187   5         N    11    14      1
188   5         N    26   150      0
189   5         N    22    90      1
190   5         N    30    14      1
191   5         N    14    90      1
192   5         N    14   150      1
193   5         N    34    90      0
194   5         N    26    14      1
195   5         N    11    21      0
196   5         N    18   150      1
197   5         N    14    14      1
198   5         N    22    21      0
199   5         N    26     4      0
200   5         N    22     4      1
201   5         N    34     4      0
202   6         N    11     4      0
203   6         N    18    90      1
204   6         N    22    14      0
205   6         N    14     4      1
206   6         N    14     7      0
207   6         N    11     7      0
208   6         N    30     7      0
209   6         N    34     7      1
210   6         N    11    90      0
211   6         N    18     4      0
212   6         N    26    21      1
213   6         N    26     7      1
214   6         N    18    14      1
215   6         N    11    42      0
216   6         N    30     4      1
217   6         N    26    90      1
218   6         N    30   150      1
219   6         N    34   150      1
220   6         N    22     7      0
221   6         N    14    21      1
222   6         N    30    42      1
223   6         N    22   150      1
224   6         N    18    42      1
225   6         N    30    90      1
226   6         N    22    42      1
227   6         N    14    42      1
228   6         N    18     7      1
229   6         N    34    42      1
230   6         N    11   150      0
231   6         N    11    14      1
232   6         N    14   150      1
233   6         N    34    90      0
234   6         N    26    14      0
235   6         N    11    21      0
236   6         N    18   150      0
237   6         N    14    14      1
238   6         N    22    21      0
239   6         N    26     4      0
240   6         N    30    21      0
241   6         N    18    21      0
242   6         N    26    42      1
243   6         N    34    21      0
244   6         N    22     4      0
245   6         N    34     4      0
246   7         T    18    42      0
247   7         T    18   150      0
248   7         T    30     7      1
249   7         T    26    14      1
250   7         T    14   150      0
251   7         T    14    21      0
252   7         T    26    90      1
253   7         T    18    21      0
254   7         T    26    21      1
255   7         T    30    21      1
256   7         T    26    42      1
257   7         T    18     7      0
258   7         T    11    21      0
259   7         T    34     7      1
260   7         T    34    21      1
261   7         T    30    42      1
262   7         T    14     7      0
263   7         T    34    42      1
264   7         T    18    90      1
265   7         T    30    14      1
266   7         T    18    14      1
267   7         T    34     4      1
268   7         T    14     4      1
269   7         T    30     4      1
270   7         T    26     7      1
271   7         T    22    21      1
272   7         T    18     4      1
273   7         T    11    90      0
274   7         T    22   150      1
275   7         T    11     7      1
276   7         T    34    90      1
277   7         T    14    14      1
278   7         T    22     7      0
279   7         T    22     4      1
280   7         T    11    42      0
281   7         T    14    42      0
282   7         T    11    14      0
283   7         T    30    90      1
284   7         T    26   150      1
285   7         T    26     4      1
286   7         T    11     4      0
287   7         T    30   150      1
288   7         T    22    14      1
289   7         T    34   150      1
290   7         T    22    42      1
291   8         T    14   150      0
292   8         T    14    21      1
293   8         T    26    90      0
294   8         T    18    21      1
295   8         T    26    21      1
296   8         T    30    21      1
297   8         T    26    42      1
298   8         T    18     7      1
299   8         T    11    21      1
300   8         T    34     7      1
301   8         T    34    21      1
302   8         T    30    42      1
303   8         T    14     7      1
304   8         T    34    42      1
305   8         T    18    90      0
306   8         T    34    14      1
307   8         T    11   150      0
308   8         T    22    90      0
309   8         T    14    90      0
310   8         T    30    14      1
311   8         T    18    14      1
312   8         T    34     4      1
313   8         T    14     4      1
314   8         T    30     4      1
315   8         T    26     7      1
316   8         T    22    21      1
317   8         T    18     4      1
318   8         T    11    90      0
319   8         T    22   150      0
320   8         T    11     7      1
321   8         T    34    90      0
322   8         T    14    14      1
323   8         T    22     7      1
324   8         T    22     4      1
325   8         T    26   150      0
326   8         T    26     4      1
327   8         T    11     4      1
328   8         T    30   150      0
329   8         T    22    14      1
330   8         T    34   150      0
331   8         T    22    42      1
332   9         T    18    42      0
333   9         T    18   150      0
334   9         T    30     7      1
335   9         T    26    14      1
336   9         T    14   150      0
337   9         T    14    21      1
338   9         T    26    90      0
339   9         T    18    21      0
340   9         T    26    21      1
341   9         T    30    21      0
342   9         T    26    42      0
343   9         T    18     7      1
344   9         T    11    21      1
345   9         T    34     7      1
346   9         T    34    21      0
347   9         T    30    42      0
348   9         T    14     7      0
349   9         T    34    42      1
350   9         T    18    90      1
351   9         T    30    14      1
352   9         T    18    14      0
353   9         T    34     4      1
354   9         T    14     4      1
355   9         T    30     4      1
356   9         T    26     7      1
357   9         T    22    21      1
358   9         T    18     4      1
359   9         T    11    90      1
360   9         T    22   150      1
361   9         T    11     7      1
362   9         T    34    90      0
363   9         T    14    14      1
364   9         T    22     7      1
365   9         T    22     4      1
366   9         T    11    42      1
367   9         T    14    42      0
368   9         T    11    14      1
369   9         T    30    90      0
370   9         T    26   150      0
371   9         T    26     4      1
372   9         T    11     4      1
373   9         T    30   150      0
374  10         T    18   150      0
375  10         T    30     7      1
376  10         T    26    14      1
377  10         T    14   150      0
378  10         T    14    21      0
379  10         T    26    90      1
380  10         T    18    21      0
381  10         T    26    21      1
382  10         T    30    21      1
383  10         T    26    42      1
384  10         T    18     7      1
385  10         T    14     7      1
386  10         T    34    42      1
387  10         T    18    90      1
388  10         T    34    14      1
389  10         T    11   150      0
390  10         T    22    90      1
391  10         T    14     4      1
392  10         T    30     4      1
393  10         T    26     7      1
394  10         T    22    21      1
395  10         T    18     4      1
396  10         T    11    90      0
397  10         T    22   150      0
398  10         T    11     7      1
399  10         T    34    90      1
400  10         T    14    14      0
401  10         T    11    14      1
402  10         T    30    90      1
403  10         T    22    14      1
404  10         T    34   150      0
405  10         T    22    42      1

Does not equal (!=)

filter(trustData, condition != "N", delay == 4)
   sub condition value delay choice
1    7         T    34     4      1
2    7         T    14     4      1
3    7         T    30     4      1
4    7         T    18     4      1
5    7         T    22     4      1
6    7         T    26     4      1
7    7         T    11     4      0
8    8         T    34     4      1
9    8         T    14     4      1
10   8         T    30     4      1
11   8         T    18     4      1
12   8         T    22     4      1
13   8         T    26     4      1
14   8         T    11     4      1
15   9         T    34     4      1
16   9         T    14     4      1
17   9         T    30     4      1
18   9         T    18     4      1
19   9         T    22     4      1
20   9         T    26     4      1
21   9         T    11     4      1
22  10         T    14     4      1
23  10         T    30     4      1
24  10         T    18     4      1
25   1         U    22     4      1
26   2         U    18     4      1
27   3         U    22     4      0
28   3         U    34     4      0
29   4         U    11     4      0
30   4         U    30     4      0
31   4         U    22     4      1
32   4         U    34     4      1
33  10         U    34     4      0
34  10         U    22     4      1
35  10         U    26     4      0
36  10         U    11     4      1

Arrange rows with arrange()

  • Note similar syntax
  • Sorts data frame
arrange(trustData, delay)
    sub condition value delay choice
1     1         N    11     4      1
2     1         N    14     4      0
3     1         N    18     4      1
4     1         N    30     4      1
5     1         N    26     4      1
6     1         N    34     4      1
7     2         N    11     4      1
8     2         N    14     4      1
9     2         N    30     4      1
10    2         N    26     4      1
11    2         N    22     4      1
12    2         N    34     4      1
13    3         N    11     4      1
14    3         N    14     4      1
15    3         N    18     4      0
16    3         N    30     4      1
17    3         N    26     4      1
18    4         N    14     4      1
19    4         N    18     4      1
20    4         N    26     4      1
21    5         N    11     4      1
22    5         N    14     4      0
23    5         N    18     4      0
24    5         N    30     4      0
25    5         N    26     4      0
26    5         N    22     4      1
27    5         N    34     4      0
28    6         N    11     4      0
29    6         N    14     4      1
30    6         N    18     4      0
31    6         N    30     4      1
32    6         N    26     4      0
33    6         N    22     4      0
34    6         N    34     4      0
35    7         T    34     4      1
36    7         T    14     4      1
37    7         T    30     4      1
38    7         T    18     4      1
39    7         T    22     4      1
40    7         T    26     4      1
41    7         T    11     4      0
42    8         T    34     4      1
43    8         T    14     4      1
44    8         T    30     4      1
45    8         T    18     4      1
46    8         T    22     4      1
47    8         T    26     4      1
48    8         T    11     4      1
49    9         T    34     4      1
50    9         T    14     4      1
51    9         T    30     4      1
52    9         T    18     4      1
53    9         T    22     4      1
54    9         T    26     4      1
55    9         T    11     4      1
56   10         T    14     4      1
57   10         T    30     4      1
58   10         T    18     4      1
59    1         U    22     4      1
60    2         U    18     4      1
61    3         U    22     4      0
62    3         U    34     4      0
63    4         U    11     4      0
64    4         U    30     4      0
65    4         U    22     4      1
66    4         U    34     4      1
67   10         U    34     4      0
68   10         U    22     4      1
69   10         U    26     4      0
70   10         U    11     4      1
71    1         N    14     7      0
72    1         N    30     7      0
73    1         N    26     7      0
74    1         N    22     7      1
75    1         N    18     7      0
76    2         N    11     7      1
77    2         N    30     7      1
78    2         N    34     7      1
79    2         N    26     7      1
80    2         N    22     7      1
81    2         N    18     7      1
82    3         N    14     7      0
83    3         N    11     7      1
84    3         N    30     7      1
85    3         N    34     7      1
86    3         N    26     7      1
87    3         N    22     7      1
88    3         N    18     7      1
89    4         N    14     7      1
90    4         N    11     7      1
91    4         N    30     7      1
92    4         N    34     7      1
93    4         N    26     7      1
94    4         N    22     7      1
95    4         N    18     7      1
96    5         N    14     7      0
97    5         N    11     7      0
98    5         N    30     7      0
99    5         N    34     7      1
100   5         N    26     7      1
101   5         N    22     7      0
102   5         N    18     7      1
103   6         N    14     7      0
104   6         N    11     7      0
105   6         N    30     7      0
106   6         N    34     7      1
107   6         N    26     7      1
108   6         N    22     7      0
109   6         N    18     7      1
110   7         T    30     7      1
111   7         T    18     7      0
112   7         T    34     7      1
113   7         T    14     7      0
114   7         T    26     7      1
115   7         T    11     7      1
116   7         T    22     7      0
117   8         T    18     7      1
118   8         T    34     7      1
119   8         T    14     7      1
120   8         T    26     7      1
121   8         T    11     7      1
122   8         T    22     7      1
123   9         T    30     7      1
124   9         T    18     7      1
125   9         T    34     7      1
126   9         T    14     7      0
127   9         T    26     7      1
128   9         T    11     7      1
129   9         T    22     7      1
130  10         T    30     7      1
131  10         T    18     7      1
132  10         T    14     7      1
133  10         T    26     7      1
134  10         T    11     7      1
135   1         U    11     7      0
136   1         U    34     7      1
137   2         U    14     7      1
138   8         U    30     7      1
139  10         U    34     7      0
140  10         U    22     7      1
141   1         N    18    14      1
142   1         N    11    14      1
143   1         N    26    14      1
144   1         N    14    14      1
145   2         N    22    14      1
146   2         N    18    14      1
147   2         N    30    14      1
148   2         N    26    14      1
149   2         N    14    14      1
150   3         N    22    14      1
151   3         N    18    14      1
152   3         N    11    14      0
153   3         N    30    14      1
154   3         N    26    14      1
155   3         N    14    14      0
156   4         N    22    14      1
157   4         N    18    14      1
158   4         N    11    14      1
159   4         N    30    14      0
160   4         N    26    14      1
161   4         N    14    14      1
162   5         N    22    14      0
163   5         N    18    14      1
164   5         N    11    14      1
165   5         N    30    14      1
166   5         N    26    14      1
167   5         N    14    14      1
168   6         N    22    14      0
169   6         N    18    14      1
170   6         N    11    14      1
171   6         N    26    14      0
172   6         N    14    14      1
173   7         T    26    14      1
174   7         T    30    14      1
175   7         T    18    14      1
176   7         T    14    14      1
177   7         T    11    14      0
178   7         T    22    14      1
179   8         T    34    14      1
180   8         T    30    14      1
181   8         T    18    14      1
182   8         T    14    14      1
183   8         T    22    14      1
184   9         T    26    14      1
185   9         T    30    14      1
186   9         T    18    14      0
187   9         T    14    14      1
188   9         T    11    14      1
189  10         T    26    14      1
190  10         T    34    14      1
191  10         T    14    14      0
192  10         T    11    14      1
193  10         T    22    14      1
194   1         U    22    14      1
195   1         U    30    14      1
196   2         U    11    14      0
197   6         U    30    14      1
198   7         U    34    14      1
199   8         U    26    14      1
200   8         U    11    14      1
201   9         U    34    14      0
202   9         U    22    14      0
203  10         U    30    14      0
204  10         U    18    14      0
205   1         N    26    21      1
206   1         N    14    21      1
207   1         N    11    21      1
208   1         N    22    21      1
209   1         N    30    21      1
210   1         N    18    21      1
211   1         N    34    21      1
212   2         N    26    21      1
213   2         N    22    21      1
214   2         N    30    21      1
215   2         N    18    21      1
216   2         N    34    21      1
217   3         N    26    21      1
218   3         N    14    21      0
219   3         N    11    21      0
220   3         N    22    21      1
221   3         N    30    21      0
222   3         N    18    21      1
223   4         N    26    21      1
224   4         N    14    21      1
225   4         N    11    21      0
226   4         N    22    21      1
227   4         N    30    21      1
228   4         N    18    21      1
229   5         N    26    21      1
230   5         N    14    21      0
231   5         N    11    21      0
232   5         N    22    21      0
233   6         N    26    21      1
234   6         N    14    21      1
235   6         N    11    21      0
236   6         N    22    21      0
237   6         N    30    21      0
238   6         N    18    21      0
239   6         N    34    21      0
240   7         T    14    21      0
241   7         T    18    21      0
242   7         T    26    21      1
243   7         T    30    21      1
244   7         T    11    21      0
245   7         T    34    21      1
246   7         T    22    21      1
247   8         T    14    21      1
248   8         T    18    21      1
249   8         T    26    21      1
250   8         T    30    21      1
251   8         T    11    21      1
252   8         T    34    21      1
253   8         T    22    21      1
254   9         T    14    21      1
255   9         T    18    21      0
256   9         T    26    21      1
257   9         T    30    21      0
258   9         T    11    21      1
259   9         T    34    21      0
260   9         T    22    21      1
261  10         T    14    21      0
262  10         T    18    21      0
263  10         T    26    21      1
264  10         T    30    21      1
265  10         T    22    21      1
266   2         U    14    21      0
267   2         U    11    21      1
268   3         U    34    21      0
269   4         U    34    21      0
270   5         U    30    21      0
271   5         U    18    21      0
272   5         U    34    21      0
273  10         U    11    21      0
274  10         U    34    21      0
275   1         N    30    42      1
276   1         N    22    42      1
277   1         N    14    42      1
278   1         N    34    42      1
279   1         N    26    42      1
280   2         N    30    42      1
281   2         N    22    42      1
282   2         N    14    42      0
283   2         N    34    42      1
284   2         N    26    42      1
285   3         N    11    42      1
286   3         N    30    42      1
287   3         N    18    42      1
288   3         N    22    42      0
289   3         N    14    42      1
290   3         N    34    42      0
291   3         N    26    42      1
292   4         N    30    42      1
293   4         N    18    42      0
294   4         N    22    42      1
295   4         N    14    42      1
296   4         N    34    42      1
297   5         N    11    42      0
298   5         N    30    42      0
299   5         N    18    42      1
300   5         N    22    42      1
301   5         N    14    42      1
302   5         N    34    42      1
303   6         N    11    42      0
304   6         N    30    42      1
305   6         N    18    42      1
306   6         N    22    42      1
307   6         N    14    42      1
308   6         N    34    42      1
309   6         N    26    42      1
310   7         T    18    42      0
311   7         T    26    42      1
312   7         T    30    42      1
313   7         T    34    42      1
314   7         T    11    42      0
315   7         T    14    42      0
316   7         T    22    42      1
317   8         T    26    42      1
318   8         T    30    42      1
319   8         T    34    42      1
320   8         T    22    42      1
321   9         T    18    42      0
322   9         T    26    42      0
323   9         T    30    42      0
324   9         T    34    42      1
325   9         T    11    42      1
326   9         T    14    42      0
327  10         T    26    42      1
328  10         T    34    42      1
329  10         T    22    42      1
330   1         U    11    42      1
331   1         U    18    42      1
332   2         U    11    42      0
333   2         U    18    42      0
334   4         U    11    42      0
335   4         U    26    42      0
336   5         U    26    42      0
337   8         U    18    42      0
338   8         U    11    42      0
339   8         U    14    42      0
340   9         U    22    42      0
341  10         U    18    42      0
342  10         U    30    42      1
343  10         U    11    42      0
344  10         U    14    42      0
345   1         N    18    90      1
346   1         N    11    90      1
347   1         N    26    90      1
348   1         N    22    90      0
349   1         N    14    90      1
350   1         N    34    90      1
351   2         N    18    90      1
352   2         N    30    90      1
353   2         N    22    90      1
354   2         N    14    90      0
355   2         N    34    90      1
356   3         N    18    90      0
357   3         N    11    90      1
358   3         N    26    90      1
359   3         N    30    90      1
360   3         N    14    90      1
361   3         N    34    90      0
362   4         N    18    90      0
363   4         N    11    90      0
364   4         N    30    90      1
365   4         N    22    90      1
366   4         N    14    90      1
367   4         N    34    90      1
368   5         N    18    90      1
369   5         N    11    90      1
370   5         N    26    90      0
371   5         N    30    90      1
372   5         N    22    90      1
373   5         N    14    90      1
374   5         N    34    90      0
375   6         N    18    90      1
376   6         N    11    90      0
377   6         N    26    90      1
378   6         N    30    90      1
379   6         N    34    90      0
380   7         T    26    90      1
381   7         T    18    90      1
382   7         T    11    90      0
383   7         T    34    90      1
384   7         T    30    90      1
385   8         T    26    90      0
386   8         T    18    90      0
387   8         T    22    90      0
388   8         T    14    90      0
389   8         T    11    90      0
390   8         T    34    90      0
391   9         T    26    90      0
392   9         T    18    90      1
393   9         T    11    90      1
394   9         T    34    90      0
395   9         T    30    90      0
396  10         T    26    90      1
397  10         T    18    90      1
398  10         T    22    90      1
399  10         T    11    90      0
400  10         T    34    90      1
401  10         T    30    90      1
402   1         U    30    90      1
403   2         U    11    90      0
404   2         U    26    90      1
405   3         U    22    90      1
406   4         U    26    90      0
407   6         U    22    90      0
408   6         U    14    90      0
409   7         U    22    90      0
410   7         U    14    90      1
411   8         U    30    90      0
412   9         U    22    90      0
413   9         U    14    90      1
414  10         U    14    90      0
415   1         N    30   150      0
416   1         N    22   150      1
417   1         N    14   150      1
418   1         N    18   150      1
419   2         N    30   150      1
420   2         N    34   150      1
421   2         N    22   150      0
422   2         N    11   150      0
423   2         N    26   150      0
424   2         N    18   150      0
425   3         N    30   150      1
426   3         N    34   150      1
427   3         N    22   150      1
428   3         N    11   150      0
429   3         N    26   150      1
430   3         N    14   150      1
431   4         N    34   150      0
432   4         N    22   150      1
433   4         N    11   150      1
434   4         N    26   150      1
435   4         N    14   150      1
436   4         N    18   150      1
437   5         N    30   150      1
438   5         N    34   150      1
439   5         N    22   150      0
440   5         N    11   150      0
441   5         N    26   150      0
442   5         N    14   150      1
443   5         N    18   150      1
444   6         N    30   150      1
445   6         N    34   150      1
446   6         N    22   150      1
447   6         N    11   150      0
448   6         N    14   150      1
449   6         N    18   150      0
450   7         T    18   150      0
451   7         T    14   150      0
452   7         T    22   150      1
453   7         T    26   150      1
454   7         T    30   150      1
455   7         T    34   150      1
456   8         T    14   150      0
457   8         T    11   150      0
458   8         T    22   150      0
459   8         T    26   150      0
460   8         T    30   150      0
461   8         T    34   150      0
462   9         T    18   150      0
463   9         T    14   150      0
464   9         T    22   150      1
465   9         T    26   150      0
466   9         T    30   150      0
467  10         T    18   150      0
468  10         T    14   150      0
469  10         T    11   150      0
470  10         T    22   150      0
471  10         T    34   150      0
472   1         U    34   150      1
473   1         U    11   150      1
474   1         U    26   150      1
475   2         U    14   150      0
476   3         U    18   150      0
477   4         U    30   150      0
478   6         U    26   150      1
479   7         U    11   150      1
480   8         U    18   150      0
481   9         U    11   150      1
482   9         U    34   150      0
483  10         U    26   150      0
484  10         U    30   150      0

Arrange by multiple rows

  • desc() - descending
arrange(trustData, desc(delay), choice)
  sub condition value delay choice
1   1         N    30   150      0
2   2         N    22   150      0
3   2         N    11   150      0
4   2         N    26   150      0
5   2         N    18   150      0
6   3         N    11   150      0
  • Base R:
trustData[order(desc(trustData$delay),
                trustData$choice), ]

Select columns with select()

  • Can be done using subset()
  • Select only “delay” and “choice” columns
select(trustData, delay, choice)
  delay choice
1     4      1
2    90      1
3     4      0
  • Select columns between “sub” and “value”
select(trustData, sub:value)
  sub condition value
1   1         N    11
2   1         N    18
3   1         N    14

Excluding columns

select(trustData, -sub, -choice)
  condition value delay
1         N    11     4
2         N    18    90
3         N    14     4
select(trustData, -(value:delay))
  sub condition choice
1   1         N      1
2   1         N      1
3   1         N      0

Add new columns with mutate()

  • Just like transform(), but you can reference newly generated columns:
mutate(trustData, VBD = value*delay, VBD2 = VBD * 2)
  sub condition value delay choice  VBD VBD2
1   1         N    11     4      1   44   88
2   1         N    18    90      1 1620 3240
3   1         N    14     4      0   56  112
4   1         N    14     7      0   98  196
5   1         N    30     7      0  210  420
6   1         N    11    90      1  990 1980
7   1         N    18     4      1   72  144
8   1         N    26    21      1  546 1092

Summarise value with summarise()

  • Collapses data frame into single value
  • Not very useful yet:
summarise(trustData, mean_choice = mean(choice))
  mean_choice
1      0.6467

Split-apply-combine problem

  • Every subject tasted 4 types of potato fries
  • 9 time points, 3 treatments, 2 repetitions
  • First, you want some descriptive stats for the various levels above
  • Second, you want to see if there's an effect of treatment, then separately for each fry type
[1] "time"      "treatment" "subject"   "rep"       "potato"    "buttery"  
[7] "grassy"    "rancid"    "painty"   
   subject potato buttery grassy rancid painty
61       3    2.9     0.0    0.0    0.0    5.5
25       3   14.0     0.0    0.0    1.1    0.0
62      10   11.0     6.4    0.0    0.0    0.0
26      10    9.9     5.9    2.9    2.2    0.0
63      15    1.2     0.1    0.0    1.1    5.1
27      15    8.8     3.0    3.6    1.5    2.3

First make into long format

mfries <- melt(french_fries, id.vars=c("time", "treatment", "subject", "rep"), value.name="rating", variable.name="type")
head(mfries)
  time treatment subject rep   type rating
1    1         1       3   1 potato    2.9
2    1         1       3   2 potato   14.0
3    1         1      10   1 potato   11.0
4    1         1      10   2 potato    9.9
5    1         1      15   1 potato    1.2
6    1         1      15   2 potato    8.8
  • How would you go about getting mean and SD for each fry type?

For loops? Apply?

  • Like any fully-fledged programming language, can use for and if loops
  • However, they are not reccomended because
    • Slow
    • Not explicit
    • Better way to do things!
  • Apply?
    • Yes but often confusing

Grouped operations

  • Can apply the various verbs above “by group” on a data frame
  • group_by() breaks down data into groups of rows
group_by(data, var1, var2, etc)
group_by(mfries, type, treatment)
Source: local data frame [5 x 6]
Groups: type, treatment

  time treatment subject rep   type rating
1    1         1       3   1 potato    2.9
2    1         1       3   2 potato   14.0
3    1         1      10   1 potato   11.0
4    1         1      10   2 potato    9.9
5    1         1      15   1 potato    1.2

group_by() and fries problems

  • What is the mean rating for each type of fry?
by_type  <- group_by(mfries, type)
summarise(by_type, rating = mean(rating), 
          sd = sd(rating))
Source: local data frame [5 x 3]

     type rating        sd
1  painty 2.5218       NaN
2  rancid 3.8522       NaN
3  grassy 0.6642 8.784e+75
4 buttery 1.8237       NaN
5  potato 6.9525       NaN

Complex grouping

  • What is the mean rating for each fry under each treatment?
type_treat <- group_by(mfries, type, treatment)
summarise(type_treat, rating = mean(rating))
Source: local data frame [15 x 3]
Groups: type

      type treatment rating
1   painty         3 2.5255
2   painty         2 2.4558
3   painty         1 2.5836
4   rancid         3 3.8667
5   rancid         2 3.6246
6   rancid         1 4.0655
7   grassy         3 0.6805
8   grassy         2 0.6629
9   grassy         1 0.6491
10 buttery         3 1.7177
11 buttery         2 1.9739
12 buttery         1 1.7801
13  potato         3 6.9680
14  potato         2 7.0017
15  potato         1 6.8879
  • Break up into two slides - showing results of complex grouping

n() & n_distinct()

  • Special function to count observations:
summarise(type_treat, n = n(), 
          nd = n_distinct(rating))
Source: local data frame [15 x 4]
Groups: type

      type treatment   n  nd
1   painty         3 231  76
2   painty         2 231  75
3   painty         1 232  74
4   rancid         3 231  91
5   rancid         2 232  95
6   rancid         1 232  87
7   grassy         3 231  39
8   grassy         2 232  40
9   grassy         1 232  32
10 buttery         3 231  59
11 buttery         2 230  65
12 buttery         1 231  62
13  potato         3 231 107
14  potato         2 232 117
15  potato         1 232 113

How about mutate?

  • Remember, mutate returns df of same length
  • Example: you want to standardize ratings WITHIN treatment
by_treat <- group_by(mfries2,treatment)
mutate(by_treat, scale_rating=scale(rating))
Source: local data frame [3,471 x 5]
Groups: treatment

   treatment subject   type rating scale_rating
1          1       3 potato    2.9     -0.07806
2          1       3 potato   14.0      2.86450
3          1      10 potato   11.0      2.06921
4          1      10 potato    9.9      1.77761
5          1      15 potato    1.2     -0.52873
6          1      15 potato    8.8      1.48600
7          1      16 potato    9.0      1.53902
8          1      16 potato    8.2      1.32694
9          1      19 potato    7.0      1.00883
10         1      19 potato   13.0      2.59941
..       ...     ...    ...    ...          ...

Same applies to filter()

  • Can filter rows by group specific criterion
  • Only include rows with above rating average for group
filter(by_treat, rating > mean(rating))
Source: local data frame [1,307 x 4]
Groups: treatment

   treatment subject   type rating
1          1       3 potato   14.0
2          1      10 potato   11.0
3          1      10 potato    9.9
4          1      15 potato    8.8
5          1      16 potato    9.0
6          1      16 potato    8.2
7          1      19 potato    7.0
8          1      19 potato   13.0
9          1      31 potato   12.2
10         1      31 potato   12.8
..       ...     ...    ...    ...

Chaining!

  • It's not ideal to have “intermediary” data frames
  • dplyr functions have no “side effects”
  • Chaining makes it easy to put together operations
  • %.% syntax

%.% syntax

  • f(x, y) = x %.% f(y)
mfries %.%
  group_by(type) %.%
  summarise(mean_rating = mean(rating))
Source: local data frame [5 x 2]

     type mean_rating
1  painty      2.5218
2  rancid      3.8522
3  grassy      0.6642
4 buttery      1.8237
5  potato      6.9525

Makes complex operations easy!

output <- mfries %.%
  filter(treatment != 1) %.%
  group_by(type) %.%
  filter(rating > mean(rating)) %.%
  summarise(mean_rating = mean(rating))

output
Source: local data frame [5 x 2]

     type mean_rating
1  painty       6.517
2  rancid       7.555
3  grassy       2.209
4 buttery       4.592
5  potato       9.754

Join problems

  • Merge data frames using common identifiers
  • Example: join based on subject ID
head(trustData, n = 3)
  sub condition value delay choice
1   1         N    11     4      1
2   1         N    18    90      1
3   1         N    14     4      0
head(qc, n=3)
  sub age gender
1   1  27   male
2   2  31   <NA>
3   3  30   male
  • How would you do it?

Types of joins (in dplyr)

  • inner_join(x, y): matching x + y
  • left_join(x, y): all x + matching y
  • semi_join(x, y): all x with match in y
  • anti_join(x, y): all x without match in y

Using join()

inner_join(x, y, by = c("id1", "id2"))
inner_join(trust_data, qc1, by  = c("sub"))
  sub condition value delay choice age gender
1   1         N    11     4      1  27   male
2   1         N    18    90      1  27   male
3   1         N    14     4      0  27   male
4   1         N    14     7      0  27   male
  • Will automatically detect IDs as well!
  • Careful with order!

Excluding based on other data frame

  • Only want to include male subject
  • Don't want to add column
males <- filter(qc, gender == "male")
anti_join(trustData, males)
  sub age gender
1   1  27   male
2   2  31   <NA>
3   3  30   male
  sub condition value delay choice
1  10         T    18   150      0
2  10         T    30     7      1
3  10         T    26    14      1