This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Piping in R was popularized by the magrittr package. Its purpose is to make code read the way text is written: (mostly) from left to right, rather than from the inside out as nested function calls do.
The %>% operator takes the value on its left and passes it as the first argument of the function call on its right, so x %>% f(y) is equivalent to f(x, y).
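Here is a minimal sketch of that equivalence. It uses a small vector whose values are illustrative only and do not come from the babynames data. The nested call and the piped chain compute the same result, but the pipe lists the steps in the order they run.

```r
library(magrittr)

x <- c(1, 4, 9, 16)  # illustrative values

# Nested calls read from the inside out: sqrt() first, then mean(), then round()
nested <- round(mean(sqrt(x)), 1)

# The pipe reads left to right; each result becomes the first argument of the next call
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # TRUE
piped                     # 2.5
```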
First, four packages need to be loaded.
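library(babynames) # data package: US baby names from 1880 onward
library(dplyr) # data manipulation functions such as filter(), group_by(), summarize()
library(magrittr) # the %>% pipe and aliases such as equals(); "Ceci n'est pas un pipe" ("This is not a pipe")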
library(ggplot2) # for graphics
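The babynames data frame ships with the babynames package. Each row gives a year, a sex, a name, the number of babies n given that name, and prop, the proportion of that year's babies of that sex who received the name. Here we just look at the first six rows with head().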
head(babynames)
## # A tibble: 6 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1880 F Mary 7065 0.07238359
## 2 1880 F Anna 2604 0.02667896
## 3 1880 F Emma 2003 0.02052149
## 4 1880 F Elizabeth 1939 0.01986579
## 5 1880 F Minnie 1746 0.01788843
## 6 1880 F Margaret 1578 0.01616720
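Next, the babynames data is piped into filter(). Inside filter(), a second pipe takes the name column, extracts the first three characters of each name with substr(1, 3), and compares them with “Ste” using equals(), magrittr's alias for ==. Only rows where the comparison is TRUE, that is, names beginning with “Ste”, are kept.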
babynames %>%
  filter(name %>% substr(1, 3) %>% equals("Ste"))
## # A tibble: 5,663 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1880 F Stella 414 0.0042415860
## 2 1880 M Stephen 176 0.0014864865
## 3 1880 M Steve 52 0.0004391892
## 4 1880 M Stewart 19 0.0001604730
## 5 1880 M Sterling 17 0.0001435811
## 6 1880 M Steven 17 0.0001435811
## 7 1881 F Stella 416 0.0042081411
## 8 1881 M Stephen 147 0.0013575413
## 9 1881 M Steve 44 0.0004063389
## 10 1881 M Stewart 27 0.0002493443
## # ... with 5,653 more rows
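To see what the nested pipe inside filter() does, it helps to trace it on a single name and then compare it with a plain function call. The comparison swaps in base R's startsWith() (available since R 3.3.0), which the original does not use. This is a sketch, assuming both versions should keep exactly the same rows.

```r
library(babynames)
library(dplyr)
library(magrittr)

# Trace the inner pipe on one value: substr() keeps characters 1 to 3,
# then equals() (magrittr's alias for ==) compares them with "Ste"
"Stephen" %>% substr(1, 3) %>% equals("Ste")  # TRUE
"Esther"  %>% substr(1, 3) %>% equals("Ste")  # FALSE

# The piped filter from the text
ste_pipe <- babynames %>%
  filter(name %>% substr(1, 3) %>% equals("Ste"))

# The same condition written as a plain function call
ste_base <- babynames %>%
  filter(startsWith(name, "Ste"))

identical(ste_pipe, ste_base)  # TRUE: both keep the same rows
```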
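Next, the rows whose names begin with “Ste” are grouped by year and, within each year, by sex. group_by() does not change the rows. It only records the grouping, reported in the Groups: line of the output (270 year-sex groups), which later verbs such as summarize() use.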
babynames %>%
  filter(name %>% substr(1, 3) %>% equals("Ste")) %>%
  group_by(year, sex)
## Source: local data frame [5,663 x 5]
## Groups: year, sex [270]
##
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1880 F Stella 414 0.0042415860
## 2 1880 M Stephen 176 0.0014864865
## 3 1880 M Steve 52 0.0004391892
## 4 1880 M Stewart 19 0.0001604730
## 5 1880 M Sterling 17 0.0001435811
## 6 1880 M Steven 17 0.0001435811
## 7 1881 F Stella 416 0.0042081411
## 8 1881 M Stephen 147 0.0013575413
## 9 1881 M Steve 44 0.0004063389
## 10 1881 M Stewart 27 0.0002493443
## # ... with 5,653 more rows
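A short sketch confirms that grouping adds metadata without touching the rows. It uses dplyr's group_vars() and n_groups() helpers and assumes a reasonably recent version of dplyr. The variable names ste and ste_grouped are illustrative.

```r
library(babynames)
library(dplyr)
library(magrittr)

ste <- babynames %>%
  filter(name %>% substr(1, 3) %>% equals("Ste"))

ste_grouped <- ste %>%
  group_by(year, sex)

nrow(ste_grouped) == nrow(ste)  # TRUE: grouping adds no rows and removes none
group_vars(ste_grouped)         # "year" "sex"

# One group per year-sex combination that actually occurs in the data
n_groups(ste_grouped) == nrow(distinct(ste, year, sex))  # TRUE
```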
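The last stage collapses each year-sex group into a single row. summarize(total = sum(n)) adds up n within each group. As a result, total is the number of babies of that sex born that year whose names begin with “Ste”, not the number of different Ste names. The result is still grouped by year because summarize() drops only the last grouping variable (sex).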
babynames %>%
  filter(name %>% substr(1, 3) %>% equals("Ste")) %>%
  group_by(year, sex) %>%
  summarize(total = sum(n))
## Source: local data frame [270 x 3]
## Groups: year [?]
##
## year sex total
## <dbl> <chr> <int>
## 1 1880 F 414
## 2 1880 M 281
## 3 1881 F 416
## 4 1881 M 241
## 5 1882 F 506
## 6 1882 M 327
## 7 1883 F 529
## 8 1883 M 253
## 9 1884 F 584
## 10 1884 M 292
## # ... with 260 more rows
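Because summarize() adds up n, total counts babies rather than names. The sketch below computes both side by side and assumes dplyr 1.0.0 or later for the .groups argument. n_distinct() counts the different Ste names in each group, and .groups = "drop" returns an ungrouped result instead of one still grouped by year. The column name distinct_names is illustrative.

```r
library(babynames)
library(dplyr)
library(magrittr)

ste_summary <- babynames %>%
  filter(name %>% substr(1, 3) %>% equals("Ste")) %>%
  group_by(year, sex) %>%
  summarize(
    total = sum(n),                    # babies given any Ste name
    distinct_names = n_distinct(name), # how many different Ste names (illustrative column)
    .groups = "drop"                   # do not keep the leftover grouping by year
  )

head(ste_summary)
```

As a check against the output above, the 1880 total of 281 for boys is the sum of the five 1880 rows shown earlier: Stephen (176), Steve (52), Stewart (19), Sterling (17) and Steven (17). With that data, distinct_names would be 5 for that group.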