Key "verbs"
• select: return a subset of the columns of a data frame, using a flexible notation
• filter: extract a subset of rows from a data frame based on logical conditions
• arrange: reorder rows of a data frame
• rename: rename variables in a data frame
• mutate: add new variables/columns or transform existing variables
• summarise / summarize: generate summary statistics of different variables in the data frame, possibly within strata
• %>%: the “pipe” operator is used to connect multiple verb actions together into a pipeline
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
chicago <- readRDS("chicago.rds")
dim(chicago)
[1] 6940 8
str(chicago)
'data.frame': 6940 obs. of 8 variables:
$ city : chr "chic" "chic" "chic" "chic" ...
$ tmpd : num 31.5 33 33 29 32 40 34.5 29 26.5 32.5 ...
$ dptp : num 31.5 29.9 27.4 28.6 28.9 ...
$ date : Date, format: ...
$ pm25tmean2: num NA NA NA NA NA NA NA NA NA NA ...
$ pm10tmean2: num 34 NA 34.2 47 NA ...
$ o3tmean2 : num 4.25 3.3 3.33 4.38 4.75 ...
$ no2tmean2 : num 20 23.2 23.8 30.4 30.3 ...
select
The select() function can be used to select columns of a data frame that you want to focus on.
Select specific columns [specified directly]
names(chicago)[1:3]
[1] "city" "tmpd" "dptp"
subset <- select(chicago, city:dptp)
head(subset)
Select specific columns [specified indirectly, by excluding some columns]
subset2 <- select(chicago, -(city:dptp))
head(subset2)
Equivalently, the same result can be obtained in base R:
i <- match("city", names(chicago)) # 1
j <- match("dptp", names(chicago)) # 3
head(chicago[, -(i:j)])
Select specific columns by matching characters in their names
subset <- select(chicago, ends_with("2"))
str(subset)
'data.frame': 6940 obs. of 4 variables:
$ pm25tmean2: num NA NA NA NA NA NA NA NA NA NA ...
$ pm10tmean2: num 34 NA 34.2 47 NA ...
$ o3tmean2 : num 4.25 3.3 3.33 4.38 4.75 ...
$ no2tmean2 : num 20 23.2 23.8 30.4 30.3 ...
subset <- select(chicago, starts_with("d"))
str(subset)
'data.frame': 6940 obs. of 2 variables:
$ dptp: num 31.5 29.9 27.4 28.6 28.9 ...
$ date: Date, format: ...
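The selection styles above can be tried on a small data frame without loading chicago.rds. This is a minimal sketch: `toy` and its values are made up for illustration and only borrow a few column names from the chicago data.

```r
library(dplyr)

# Hypothetical toy data frame (not the chicago data set)
toy <- data.frame(
  city       = c("chic", "chic", "chic"),
  tmpd       = c(31.5, 33.0, 29.0),
  dptp       = c(31.5, 29.9, 28.6),
  pm10tmean2 = c(34.0, NA, 47.0),
  o3tmean2   = c(4.25, 3.30, 4.38)
)

by_range  <- select(toy, city:dptp)         # contiguous range of columns
by_drop   <- select(toy, -(city:dptp))      # everything except that range
by_suffix <- select(toy, ends_with("2"))    # names ending in "2"
by_prefix <- select(toy, starts_with("d"))  # names starting with "d"

names(by_range)
names(by_drop)
names(by_suffix)
names(by_prefix)
```

Each call returns a new data frame; `toy` itself is left unchanged.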
filter
The filter() function is used to extract subsets of rows from a data frame.
chic.f <- filter(chicago, pm25tmean2 > 30 & tmpd > 80)
select(chic.f, date, tmpd, pm25tmean2)
summary(chic.f$pm25tmean2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
31.20 32.70 33.90 36.68 38.85 51.54
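The summary above contains no NA values even though pm25tmean2 has many missing entries, because filter() keeps only rows where the condition is TRUE and drops rows where it evaluates to NA. A minimal sketch on made-up data (`toy` and its values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data; the second row has a missing pm25 value
toy <- data.frame(
  tmpd = c(85, 90, 70, 82),
  pm25 = c(35, NA, 40, 20)
)

# Both conditions must hold; rows where the condition is NA are dropped
hot_and_polluted <- filter(toy, pm25 > 30 & tmpd > 80)

# Comma-separated conditions are combined with "and", selecting the same rows
same_rows <- filter(toy, pm25 > 30, tmpd > 80)

hot_and_polluted
nrow(same_rows)
```

Only the first row survives: the second is dropped for its NA, the third fails the temperature condition, and the fourth fails the pm25 condition.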
arrange
The arrange() function is used to reorder rows of a data frame according to one or more of its variables/columns, in ascending order by default.
chicago <- arrange(chicago, date)
head(select(chicago, date, pm25tmean2), 3)
tail(select(chicago, date, pm25tmean2), 3)
Sort in descending order
chicago <- arrange(chicago, desc(date))
head(select(chicago, date, pm25tmean2), 3)
tail(select(chicago, date, pm25tmean2), 3)
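arrange() also accepts several sort keys, with later keys breaking ties in earlier ones. The sketch below uses a made-up data frame (`toy` is hypothetical) with a duplicated date to show this.

```r
library(dplyr)

# Hypothetical toy data with a repeated date
toy <- data.frame(
  date = as.Date(c("2005-01-03", "2005-01-01", "2005-01-02", "2005-01-01")),
  pm25 = c(12, 20, NA, 8)
)

ascending  <- arrange(toy, date)              # earliest date first
descending <- arrange(toy, desc(date))        # latest date first
tie_broken <- arrange(toy, date, desc(pm25))  # within a date, highest pm25 first

tie_broken
```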
rename
head(chicago[, 1:5], 3)
chicago <- rename(chicago, dewpoint=dptp, pm25=pm25tmean2)
head(chicago[, 1:5], 3)
Note that the new name goes on the left of the = sign and the old name on the right.
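A small sketch of the argument order in rename(), using a made-up two-column data frame (`toy` is hypothetical):

```r
library(dplyr)

# Hypothetical toy data with the original column names
toy <- data.frame(dptp = c(31.5, 29.9), pm25tmean2 = c(NA, 12.3))

# Syntax is new_name = old_name; values and column order are unchanged
renamed <- rename(toy, dewpoint = dptp, pm25 = pm25tmean2)

names(toy)
names(renamed)
```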
mutate
The mutate() function exists to compute transformations of variables in a data frame.
chicago <- mutate(chicago, pm25detrend = pm25 - mean(pm25, na.rm = TRUE))
head(chicago)
There is also the related transmute() function, which does the same thing as mutate() but then drops all non-transformed variables.
head(transmute(chicago,
pm10detrend = pm10tmean2 - mean(pm10tmean2, na.rm = TRUE),
o3detrend = o3tmean2 - mean(o3tmean2, na.rm = TRUE)))
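To make the difference between the two functions concrete, the sketch below applies the same transformation with each one to a made-up data frame (`toy` is hypothetical) and compares the resulting columns.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(city = c("a", "b", "c"), pm25 = c(10, 20, NA))

# mutate() keeps every existing column and appends the new one
with_detrend <- mutate(toy, pm25detrend = pm25 - mean(pm25, na.rm = TRUE))

# transmute() returns only the columns created in the call
only_detrend <- transmute(toy, pm25detrend = pm25 - mean(pm25, na.rm = TRUE))

names(with_detrend)
names(only_detrend)
```

The mean of the non-missing values is 15, so the new column holds -5, 5 and NA in both results.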
group_by
The group_by() function is used to generate summary statistics from the data frame within strata defined by a variable.
chicago <- mutate(chicago, year=as.POSIXlt(date)$year + 1900)
years <- group_by(chicago, year)
summarise(years, pm25 = mean(pm25, na.rm = TRUE),
o3 = max(o3tmean2, na.rm = TRUE),
no2 = median(no2tmean2, na.rm = TRUE))
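The same year-by-year pattern on a made-up four-row data frame, small enough to check by hand. `toy` and its values are hypothetical; n() is dplyr's helper for counting the rows in each group.

```r
library(dplyr)

# Hypothetical toy data spanning two years
toy <- data.frame(
  date = as.Date(c("2004-06-01", "2004-12-31", "2005-01-01", "2005-07-15")),
  pm25 = c(10, 20, NA, 30),
  o3   = c(5, 9, 7, 3)
)

toy     <- mutate(toy, year = as.POSIXlt(date)$year + 1900)
by_year <- group_by(toy, year)

# One output row per year: mean pm25 (ignoring NA), maximum o3, and row count
yearly <- summarise(by_year,
                    pm25 = mean(pm25, na.rm = TRUE),
                    o3   = max(o3),
                    n    = n())
yearly
```

group_by() on its own does not change the data; it only records the grouping so that summarise() computes each statistic once per group.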
A more complex example: what are the average levels of ozone (o3) and nitrogen dioxide (no2) within quintiles of pm25?
qq <- quantile(chicago$pm25, seq(0, 1, 0.2), na.rm = TRUE)
chicago <- mutate(chicago, pm25.quint = cut(pm25, qq))
quint <- group_by(chicago, pm25.quint)
summarize(quint, o3=mean(o3tmean2, na.rm = TRUE),
no2 = mean(no2tmean2, na.rm = TRUE))
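One detail worth checking when binning with cut(pm25, qq): by default cut() builds intervals that are open on the left, so the minimum value falls outside the first bin and becomes NA. The base R sketch below shows this on a made-up vector (`x` is hypothetical) and how `include.lowest = TRUE` changes it. Whether that matters for the quintile summary above depends on the analysis.

```r
# Hypothetical values standing in for a pollutant series
x  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
qq <- quantile(x, seq(0, 1, 0.2))

default_bins   <- cut(x, qq)                         # intervals are (low, high]
inclusive_bins <- cut(x, qq, include.lowest = TRUE)  # first interval is [low, high]

sum(is.na(default_bins))    # the minimum value is not assigned to any bin
sum(is.na(inclusive_bins))  # every value is assigned to a bin
```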
%>%
The pipeline operator %>% is very handy for stringing together multiple dplyr functions in a sequence of operations.
Using the pipe operator lets us run a series of commands in one go, without nesting calls or creating throwaway intermediate variables.
mutate(chicago, pm25.quint=cut(pm25, qq)) %>%
group_by(pm25.quint) %>%
summarise(o3=mean(o3tmean2, na.rm = TRUE),
no2 = mean(no2tmean2, na.rm=TRUE))
Notice in the above code that I pass the chicago data frame to the first call to mutate(), but then afterwards I do not have to pass the first argument to group_by() or summarize(). Once you travel down the pipeline with %>%, the first argument is taken to be the output of the previous element in the pipeline.
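Put differently, `x %>% f(y)` is evaluated as `f(x, y)`. The sketch below writes the same computation as nested calls and as a pipeline on a made-up data frame (`toy` is hypothetical) and checks that the results agree.

```r
library(dplyr)

# Hypothetical toy data with two groups
toy <- data.frame(
  grp = c("a", "a", "b", "b"),
  o3  = c(1, 3, 5, 7)
)

# Nested calls read inside-out
nested <- summarise(group_by(toy, grp), o3 = mean(o3))

# Piped calls read top to bottom, each step receiving the previous result
piped <- toy %>%
  group_by(grp) %>%
  summarise(o3 = mean(o3))

all.equal(nested, piped)
```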
Another example might be computing summary pollutant levels by month (mean pm25, maximum o3, and median no2 in the code below). This could be useful to see if there are any seasonal trends in the data.
mutate(chicago, month = as.POSIXlt(date)$mon + 1) %>%
group_by(month) %>%
summarise(pm25 = mean(pm25, na.rm = TRUE),
o3 = max(o3tmean2, na.rm = TRUE),
no2 = median(no2tmean2, na.rm = TRUE))
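The `+ 1` in the month calculation (and the `+ 1900` used earlier for the year) is needed because the POSIXlt representation stores months as 0 to 11 and years as years since 1900. A quick base R check on two made-up dates (`d` is hypothetical):

```r
# Hypothetical dates at the two ends of a year
d  <- as.Date(c("2005-01-15", "2005-12-31"))
lt <- as.POSIXlt(d)

lt$mon           # zero-based month: January is 0
lt$mon + 1       # calendar month number
lt$year + 1900   # calendar year
```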