Functions and Functionals (R Ladies - Twin Cities)
Functions and Functionals (R Ladies - Twin Cities)
1 Takeaways from today
How to use and write custom functions
What is a functional?
Using
purrrmap family functions
Libraries for today:
#install.packages('fivethirtyeight','dplyr','broom','ggplot2','purrr', dependencies = T)
library(fivethirtyeight)
library(dplyr)
library(broom)
library(ggplot2)
library(purrr)2 Beginning at the end: my use case
data(bechdel)This data set contains the raw data behind the 538 story “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”. The Bechdel test provides a pass/fail outcome for movies based on three criteria:
The movie has to have at least two women in it,
who talk to each other,
about something besides a man
My question: Are budget and gross related to the probability of failing the Bechdel test? Does time play a role?
The variables I’ll use are the following:
year- year of movie releasebinary- Bechdel test PASS vs FAIL binarybudget_2013- Budget in 2013 inflation adjusted dollarsdomgross_2013- Domestic gross (US) in 2013 inflation adjusted dollars
I’m going to fit a separate logistic regression to each decade. (This may not be a good model or even good statistics, but is a good way to show the concepts I’m after). First, I create a time factor variable and a Bechdel test variable.
bechdel_mod <- bechdel %>%
dplyr::mutate(
#Using cases since there are several decades
time_factor = dplyr::case_when(year %in% 1970:1979 ~ '1970s',
year %in% 1980:1989 ~ '1980s',
year %in% 1990:1999 ~ '1990s',
year %in% 2000:2009 ~ '2000s',
year %in% 2010:2019 ~ '2010s'),
#Using if_then since I only have 2 outcomes
bechdel_test = dplyr::if_else(binary == 'PASS', 0, 1))by_time <- split(bechdel_mod, bechdel_mod$time_factor)
by_time## $`1970s`
## # A tibble: 54 x 17
## year imdb title test clean_test binary budget domgross intgross code
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr>
## 1 1979 tt00… Alien ok ok PASS 9.00e6 8.09e7 2.04e8 1979…
## 2 1979 tt00… Apoc… nowo… nowomen FAIL 3.15e7 7.88e7 7.88e7 1979…
## 3 1979 tt00… Moon… nota… notalk FAIL 3.10e7 7.03e7 2.10e8 1979…
## 4 1979 tt00… Star… nota… notalk FAIL 3.50e7 8.23e7 1.39e8 1979…
## 5 1979 tt00… The … ok ok PASS 1.85e7 6.52e7 1.09e8 1979…
## 6 1978 tt00… Anim… men men FAIL 3.00e6 1.42e8 1.42e8 1978…
## 7 1978 tt00… Dawn… nowo… nowomen FAIL 2.80e7 5.90e7 1.03e8 1978…
## 8 1978 tt00… Days… ok ok PASS 3.00e6 3.45e6 3.66e6 1978…
## 9 1978 tt00… Grea… ok ok PASS 6.00e6 1.82e8 3.88e8 1978…
## 10 1978 tt00… Hall… dubi… dubious FAIL 1.50e7 5.83e7 7.75e7 1978…
## # … with 44 more rows, and 7 more variables: budget_2013 <int>,
## # domgross_2013 <dbl>, intgross_2013 <dbl>, period_code <int>,
## # decade_code <int>, time_factor <chr>, bechdel_test <dbl>
##
## $`1980s`
## # A tibble: 125 x 17
## year imdb title test clean_test binary budget domgross intgross code
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr>
## 1 1989 tt00… Back… ok ok PASS 4.00e7 1.18e8 3.32e8 1989…
## 2 1989 tt00… Batm… nota… notalk FAIL 3.50e7 2.51e8 4.11e8 1989…
## 3 1989 tt00… Bill… men men FAIL 1.00e7 4.05e7 4.05e7 1989…
## 4 1989 tt00… Dead… nota… notalk FAIL 1.64e7 9.59e7 2.40e8 1989…
## 5 1989 tt00… Henr… ok ok PASS 9.00e6 1.02e7 1.02e7 1989…
## 6 1989 tt00… Indi… nowo… nowomen FAIL 4.80e7 1.97e8 4.74e8 1989…
## 7 1989 tt00… Lice… men men FAIL 4.20e7 3.47e7 1.56e8 1989…
## 8 1989 tt00… Majo… nota… notalk FAIL 1.10e7 4.98e7 4.98e7 1989…
## 9 1989 tt00… Road… nota… notalk FAIL 1.00e7 3.01e7 3.01e7 1989…
## 10 1989 tt00… Sex,… men-… men FAIL 1.20e6 2.47e7 3.67e7 1989…
## # … with 115 more rows, and 7 more variables: budget_2013 <int>,
## # domgross_2013 <dbl>, intgross_2013 <dbl>, period_code <int>,
## # decade_code <int>, time_factor <chr>, bechdel_test <dbl>
##
## $`1990s`
## # A tibble: 337 x 17
## year imdb title test clean_test binary budget domgross intgross code
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr>
## 1 1999 tt01… 10 T… ok ok PASS 1.30e7 3.82e7 6.04e7 1999…
## 2 1999 tt01… 8MM nota… notalk FAIL 4.00e7 3.64e7 9.64e7 1999…
## 3 1999 tt01… Amer… ok ok PASS 1.50e7 1.30e8 3.56e8 1999…
## 4 1999 tt01… Amer… men men FAIL 1.20e7 1.02e8 2.35e8 1999…
## 5 1999 tt01… Anal… nota… notalk FAIL 3.00e7 1.07e8 1.77e8 1999…
## 6 1999 tt01… Anna… men men FAIL 7.50e7 3.93e7 3.93e7 1999…
## 7 1999 tt01… Anyw… dubi… dubious FAIL 2.30e7 1.87e7 1.87e7 1999…
## 8 1999 tt01… Aust… nota… notalk FAIL 3.50e7 2.06e8 3.10e8 1999…
## 9 1999 tt01… Bein… ok ok PASS 1.30e7 2.29e7 3.24e7 1999…
## 10 1999 tt01… Blac… ok ok PASS 1.00e7 5.24e6 5.24e6 1999…
## # … with 327 more rows, and 7 more variables: budget_2013 <int>,
## # domgross_2013 <dbl>, intgross_2013 <dbl>, period_code <int>,
## # decade_code <int>, time_factor <chr>, bechdel_test <dbl>
##
## $`2000s`
## # A tibble: 840 x 17
## year imdb title test clean_test binary budget domgross intgross code
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr>
## 1 2009 tt10… (500… nota… notalk FAIL 7.50e6 3.24e7 6.08e7 2009…
## 2 2009 tt09… 17 A… ok-d… ok PASS 4.00e7 6.42e7 1.39e8 2009…
## 3 2009 tt11… 2012 nota… notalk FAIL 2.00e8 1.66e8 7.88e8 2009…
## 4 2009 tt04… 9 nowo… nowomen FAIL 3.00e7 3.17e7 4.86e7 2009…
## 5 2009 tt09… A Pe… dubi… dubious FAIL 1.40e7 1.55e7 2.28e7 2009…
## 6 2009 tt10… A Se… men men FAIL 7.00e6 9.23e6 3.04e7 2009…
## 7 2009 tt13… A Si… nota… notalk FAIL 7.00e6 9.18e6 2.81e7 2009…
## 8 2009 tt11… Adam men men FAIL 3.20e6 2.28e6 2.83e6 2009…
## 9 2009 tt10… Adve… men-… men FAIL 9.80e6 1.60e7 1.71e7 2009…
## 10 2009 tt11… Agora nota… notalk FAIL 7.00e7 6.19e5 3.90e7 2009…
## # … with 830 more rows, and 7 more variables: budget_2013 <int>,
## # domgross_2013 <dbl>, intgross_2013 <dbl>, period_code <int>,
## # decade_code <int>, time_factor <chr>, bechdel_test <dbl>
##
## $`2010s`
## # A tibble: 438 x 17
## year imdb title test clean_test binary budget domgross intgross code
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr>
## 1 2013 tt17… 21 &… nota… notalk FAIL 1.30e7 25682380 4.22e7 2013…
## 2 2012 tt13… Dred… ok-d… ok PASS 4.50e7 13414714 4.09e7 2012…
## 3 2013 tt20… 12 Y… nota… notalk FAIL 2.00e7 53107035 1.59e8 2013…
## 4 2013 tt12… 2 Gu… nota… notalk FAIL 6.10e7 75612460 1.32e8 2013…
## 5 2013 tt04… 42 men men FAIL 4.00e7 95020213 9.50e7 2013…
## 6 2013 tt13… 47 R… men men FAIL 2.25e8 38362475 1.46e8 2013…
## 7 2013 tt16… A Go… nota… notalk FAIL 9.20e7 67349198 3.04e8 2013…
## 8 2013 tt21… Abou… ok-d… ok PASS 1.20e7 15323921 8.73e7 2013…
## 9 2013 tt18… Admi… ok ok PASS 1.30e7 18007317 1.80e7 2013…
## 10 2013 tt18… Afte… nota… notalk FAIL 1.30e8 60522097 2.44e8 2013…
## # … with 428 more rows, and 7 more variables: budget_2013 <int>,
## # domgross_2013 <dbl>, intgross_2013 <dbl>, period_code <int>,
## # decade_code <int>, time_factor <chr>, bechdel_test <dbl>
modeled_bech <- by_time %>%
#Fit logistic regression
purrr::map(~ glm(bechdel_test ~ log(budget_2013) + log(domgross_2013),
family = 'binomial', data = .x)) %>%
#Using broom::tidy to make the output nicer for processsing
purrr::map(~ broom::tidy(.x)) %>%
#Grabbing variable names, point estimates, and standard errors
purrr::map(~ dplyr::select(.x, term, estimate, std.error)) %>%
#Getting the list indices into the data set
purrr::imap(~ dplyr::mutate(.x, decade = .y)) %>%
#Recombining everything into a single data.frame
dplyr::bind_rows()
modeled_bech## # A tibble: 15 x 4
## term estimate std.error decade
## <chr> <dbl> <dbl> <chr>
## 1 (Intercept) -3.86 3.84 1970s
## 2 log(budget_2013) -0.0423 0.207 1970s
## 3 log(domgross_2013) 0.302 0.195 1970s
## 4 (Intercept) 0.471 4.15 1980s
## 5 log(budget_2013) -0.299 0.238 1980s
## 6 log(domgross_2013) 0.307 0.161 1980s
## 7 (Intercept) -1.85 1.44 1990s
## 8 log(budget_2013) 0.230 0.102 1990s
## 9 log(domgross_2013) -0.105 0.0920 1990s
## 10 (Intercept) -3.25 0.947 2000s
## 11 log(budget_2013) 0.185 0.0729 2000s
## 12 log(domgross_2013) 0.00687 0.0528 2000s
## 13 (Intercept) -3.97 1.26 2010s
## 14 log(budget_2013) 0.441 0.104 2010s
## 15 log(domgross_2013) -0.198 0.0810 2010s
ggplot2 package.
modeled_bech %>%
ggplot(aes(x = decade, y = estimate, group = term)) +
#adding error bars for 95% conf. intervals
geom_errorbar(aes(x = decade,
ymin = estimate - 2 * std.error,
ymax = estimate + 2 * std.error),
color = 'grey50', width = 0.2) +
geom_point(color = 'blue', size = 2) +
#If you want lines, you can uncomment the following line
#geom_line(lwd = 1.2) +
#highlighting the x-axis
geom_hline(yintercept = 0, alpha = 0.2) +
#allowing different scales on each coefficient
facet_wrap(~term, scales = 'free_y') +
#not a fan of the default plotting
theme_bw() +
#removing grid, changes facet title colors
theme(panel.grid = element_blank(),
strip.background = element_rect(fill = 'black'),
strip.text = element_text(color = 'white')) +
xlab('Decade') + ylab('Estimated Coefficient')Conclusions:
Higher probability of failing Bechdel test in high budget movies in recent decades (action movies have higher budget?)
Lower probability of failing Bechdel test in high gross movies in 2010s (is the audience paying attention to the presence of women?)
By the end of tonight, I hope that you will be able to do a variant of this analysis. Don’t worry about all of the tidyverse elements if they are new to you.
3 Function Basics
If you use R, you are probably already using many built-in R functions. A function is a block of organized, reusable code that is used to perform a single, related action.
3.1 Why write your own functions?
DRY Principle: don’t repeat yourself
For example, what is the median budget in each decade split by whether the movie passed the Bechdel test.
bech_1970 <- bechdel_mod %>%
filter(time_factor == '1970s') %>%
group_by(binary) %>%
summarise(med_budget = median(budget_2013))
bech_1980 <- bechdel_mod %>%
filter(time_factor == '1980s') %>%
group_by(binary) %>%
summarise(med_budget = median(budget_2013))
bech_1990 <- bechdel_mod %>%
filter(time_factor == '1990s') %>%
group_by(binary) %>%
summarise(med_budget = median(budget_2013))
bech_1970## # A tibble: 2 x 2
## binary med_budget
## <chr> <dbl>
## 1 FAIL 20666099
## 2 PASS 22241557
bech_1980## # A tibble: 2 x 2
## binary med_budget
## <chr> <dbl>
## 1 FAIL 30755961
## 2 PASS 33906418.
bech_1990## # A tibble: 2 x 2
## binary med_budget
## <chr> <dbl>
## 1 FAIL 51268214.
## 2 PASS 37157440
bechdel_mod %>%
group_by(time_factor, binary) %>%
summarise(med_budget = median(budget_2013))## # A tibble: 10 x 3
## # Groups: time_factor [5]
## time_factor binary med_budget
## <chr> <chr> <dbl>
## 1 1970s FAIL 20666099
## 2 1970s PASS 22241557
## 3 1980s FAIL 30755961
## 4 1980s PASS 33906418.
## 5 1990s FAIL 51268214.
## 6 1990s PASS 37157440
## 7 2000s FAIL 46231854
## 8 2000s PASS 31018861
## 9 2010s FAIL 46606085
## 10 2010s PASS 26927960
Back to the example at hand. A loop could do the trick, but is not as customizable for the user as we’d like. Let’s write a function. We can see it will give us identical results as before.
bech_yr <- function(decade){
modified_data <- bechdel_mod %>%
filter(time_factor == decade) %>%
group_by(binary) %>%
summarise(med_budget = median(budget_2013))
return(modified_data)
}
#Check that this yields identical results
all.equal(bech_1990,
bech_yr('1990s'))## [1] TRUE
3.2 Scoping
How did my previous function find the data? I didn’t tell it where to look. Also, I used a bunch of other dplyr functions (verbs) in my custom functions.
R uses lexical scoping. This means that names defined inside a function mask names defined outside a function. If a name isn’t defined inside a function, R looks one level up. The same rules apply if a function is defined inside another function.
First, R looks inside the current function.
Then, it looks where that function was defined (and so on, all the way up to the global environment).
Finally, it looks in other loaded packages.
Sometimes, it can be smart to add explicit arguments to your function rather than relying on users to have everything loaded correctly around it. It can also be helpful to add default argument values to help your users.
In this case I defaulted bech_data to have the data set I anticipate in the environment.
bech_yr2 <- function(decade, bech_data = bechdel_mod){
return(bech_data %>%
filter(time_factor == decade) %>%
group_by(binary) %>%
summarise(med_budget = median(budget_2013)))
}3.3 Syntactical flexibility
There are several ways to define and use the same function; be consistent to enhance readability!
#This is super dangerous... no {}
bech_yr3 <- function(decade, bech_data = bechdel_mod)
bech_data %>%
filter(time_factor == decade) %>%
group_by(binary) %>%
summarise(med_budget = median(budget_2013))
#Using argument names
bech_yr3(decade = '1990s', bech_data = bechdel_mod)## # A tibble: 2 x 2
## binary med_budget
## <chr> <dbl>
## 1 FAIL 51268214.
## 2 PASS 37157440
#No argument names
bech_yr3('1990s', bechdel_mod)## # A tibble: 2 x 2
## binary med_budget
## <chr> <dbl>
## 1 FAIL 51268214.
## 2 PASS 37157440
#Using the default - can get dicey
bech_yr3('1990s')## # A tibble: 2 x 2
## binary med_budget
## <chr> <dbl>
## 1 FAIL 51268214.
## 2 PASS 37157440
Some other general readability advice from my experience:
Use {} - you will see people skipping this all over the internet! This makes your short functions consistent with your longer functions where you need braces.
Use return - by default, the item in last line of a function is returned. However, it is helpful make explicit what is returned and mirrors other programming languages. You can only return one thing. Use a list() object if you need to return multiple items in one.
3.4 Function components
Functions can be broken down into three components: arguments, body, and environment.my_first_function <- function(x, y) {
return(x + y)
}
formals(my_first_function) #a.k.a arguments## $x
##
##
## $y
body(my_first_function)## {
## return(x + y)
## }
environment(my_first_function)## <environment: R_GlobalEnv>
environment(mutate)## <environment: namespace:dplyr>
You can also view a function by typing the name after it has been defined.
my_first_function## function(x, y) {
## return(x + y)
## }
Functions can be flexible in ways you might expect (or not expect). This function can take a scalar or vector of numeric inputs and apply them in (hopefully) intuitive ways. It is important to check a range of input types to understand behavior.
my_first_function(1, 2)## [1] 3
my_first_function(c(1,1), 2)## [1] 3 3
my_first_function(c(1,1), c(2,3))## [1] 3 4
4 Functionals
The functions I’ve covered so far are part of the upper left corner of this 2x2 table. However, things can get a lot more exotic. For today, I’m going to cover functionals. As seen in the table, a functional is a function that takes a function as an input and returns a vector as output.
set.seed(1000) #for reproducibility
#fn is a function, not a vector!
random_fn <- function(fn, ...) {
#Taking a function fn of 1000 random uniform(0,1) values
return(fn(runif(1e3), ...))
}
random_fn(mean)## [1] 0.4998541
random_fn(median)## [1] 0.4909904
#Example including extra argument
random_fn(quantile, probs = c(0.25, 0.5, 0.75))## 25% 50% 75%
## 0.2523493 0.4903264 0.7404705
4.1 purrr
purrr is part of R’s functional programming toolkit. It can help reduce the use of for loops and enhance code transparency. There is not always a functional to replace a for loop! Don’t go crazy trying to find one if it doesn’t make sense :) Additionally, the purrr functions are specialised to work with one-dimensional vectors. base::apply() is specialised to work with two-dimensional and higher vectors, i.e. matrices and arrays.
4.2 purrr map family functions
The map functions transform their input by applying a function to each element of a vector or list and returning a list or vector the same length as the input.
map(),map_if(), andmap_at()always return a list. The _if and _at variants take a predicate function .p that determines which elements of .x are transformed with .f.map_lgl()- logical,map_int()- integer,map_dbl()- double, andmap_chr()- character return vectors of the corresponding type.map_dfr()andmap_dfc()return data frames created by row-binding and column-binding respectively. They requiredplyr.
4.2.1 Translating a for loop to a map
#Creating a named vector to store my calculations
number_of_columns <- dim(bechdel)[2]
len <- vector(length = number_of_columns)
names(len) <- names(bechdel)
#Calculating the number of unique values in each column of the data.frame
for(col in 1:number_of_columns){
len[col] <- length(unique((bechdel[[col]])))
}
len## year imdb title test clean_test
## 44 1794 1768 10 5
## binary budget domgross intgross code
## 2 272 1751 1757 85
## budget_2013 domgross_2013 intgross_2013 period_code decade_code
## 1188 1776 1783 6 4
However, we can use purrr::map() for a neater solution.
f <- function(x){
return(length(unique(x)))
}
#I want output as a numeric vector
map_dbl(bechdel, f)## year imdb title test clean_test
## 44 1794 1768 10 5
## binary budget domgross intgross code
## 2 272 1751 1757 85
## budget_2013 domgross_2013 intgross_2013 period_code decade_code
## 1188 1776 1783 6 4
Note that the function maps column by column just like the for loop. Columns in data.frames look like rows in this graphic.
4.2.2 Anonymous functions
Instead of using map() with an existing function (we created a function f above), you can create an inline anonymous (not saved elsewhere) functions.
purrr supports a special shortcut for this kind of function:
map_dbl(bechdel, ~ length(unique(.x)))## year imdb title test clean_test
## 44 1794 1768 10 5
## binary budget domgross intgross code
## 2 272 1751 1757 85
## budget_2013 domgross_2013 intgross_2013 period_code decade_code
## 1188 1776 1783 6 4
You can use the pipe (%>%) in dplyr style syntax.
bechdel %>%
map_dbl( ~ length(unique(.x))) ## year imdb title test clean_test
## 44 1794 1768 10 5
## binary budget domgross intgross code
## 2 272 1751 1757 85
## budget_2013 domgross_2013 intgross_2013 period_code decade_code
## 1188 1776 1783 6 4
Let’s use some of the other map functions to see how the output format changes.
bechdel %>%
map_chr( ~ length(unique(.x)))## year imdb title test clean_test
## "44" "1794" "1768" "10" "5"
## binary budget domgross intgross code
## "2" "272" "1751" "1757" "85"
## budget_2013 domgross_2013 intgross_2013 period_code decade_code
## "1188" "1776" "1783" "6" "4"
We could also bind the results into a data.frame by rows:
bechdel %>%
map_dfr( ~ length(unique(.x)))## # A tibble: 1 x 15
## year imdb title test clean_test binary budget domgross intgross code
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 44 1794 1768 10 5 2 272 1751 1757 85
## # … with 5 more variables: budget_2013 <int>, domgross_2013 <int>,
## # intgross_2013 <int>, period_code <int>, decade_code <int>
4.2.3 Other map functions
- Output same type as input with
modify(). - Iterate over two inputs with
map2(). - Iterate with an index using
imap(). - Return nothing with
walk(). - Iterate over any number of inputs with
pmap().
I’ll cover map2() and imap() briefly.
Regular map can only apply a function to one input. Other inputs are applied to all iterations.
map2() let’s your function have two inputs - x and y
#List of 8 vectors filled with 10 random uniform(0,1) values
#Cool trick for generating random vectors!!!
xs <- map(1:8, ~runif(10))
#Giving weights to the values in xs based on their relative size
ws <- map(xs, ~ .x/sum(.x))
#Calculating a weighted mean
map2_dbl(xs, ws, weighted.mean)## [1] 0.3568452 0.6299610 0.4993194 0.6774614 0.5278370 0.6718677 0.7677067
## [8] 0.5714697
imap() If y is an index or names of x, you can use imap instead of map2. imap(x, f) = map2(x, y = names(x), f)
Note you still get all of the output variants (dbl, chr, etc.)
names(xs) <- paste0('rladies', 1:8)
#Printing out the first value and labeling the index
imap_chr(xs, ~ paste0("The first value of ", .y, " is ", .x[[1]]))## rladies1
## "The first value of rladies1 is 0.556265470804647"
## rladies2
## "The first value of rladies2 is 0.477754818974063"
## rladies3
## "The first value of rladies3 is 0.124986263224855"
## rladies4
## "The first value of rladies4 is 0.926147045800462"
## rladies5
## "The first value of rladies5 is 0.00617365562357008"
## rladies6
## "The first value of rladies6 is 0.677013292210177"
## rladies7
## "The first value of rladies7 is 0.0583309563808143"
## rladies8
## "The first value of rladies8 is 0.494969137944281"
4.3 Challenge problem
Going back to the very first example, say I want to change around my logic. My question is now how the pass/fail of the Bechdel test and the budget of the movie will affect the domestic gross.
Hints: uselm instead of glm; use log(domgross_2013) as dependent variable
modeled_bech_lm <- by_time %>%
#Fit logistic regression
purrr::map(~ lm(log(domgross_2013) ~ binary + log(budget_2013), data = .x)) %>%
#Using broom::tidy to make the output nicer for processsing
purrr::map(~ broom::tidy(.x)) %>%
#Grabbing variable names and point estimates
purrr::map(~ dplyr::select(.x, term, estimate, std.error)) %>%
#Getting the list indices into the data set
purrr::imap(~ dplyr::mutate(.x, decade = .y)) %>%
#Recombining everything into a data.frame
dplyr::bind_rows()
modeled_bech_lm## # A tibble: 15 x 4
## term estimate std.error decade
## <chr> <dbl> <dbl> <chr>
## 1 (Intercept) 12.3 2.32 1970s
## 2 binaryPASS -0.848 0.505 1970s
## 3 log(budget_2013) 0.393 0.137 1970s
## 4 (Intercept) 9.64 2.14 1980s
## 5 binaryPASS -0.513 0.253 1980s
## 6 log(budget_2013) 0.507 0.125 1980s
## 7 (Intercept) 5.67 0.802 1990s
## 8 binaryPASS 0.155 0.136 1990s
## 9 log(budget_2013) 0.693 0.0457 1990s
## 10 (Intercept) 1.73 0.616 2000s
## 11 binaryPASS -0.0138 0.0936 2000s
## 12 log(budget_2013) 0.904 0.0352 2000s
## 13 (Intercept) 1.60 0.754 2010s
## 14 binaryPASS 0.310 0.124 2010s
## 15 log(budget_2013) 0.902 0.0429 2010s
ggplot package.
modeled_bech_lm %>%
ggplot(aes(x = decade, y = estimate, group = term)) +
#adding error bars for 95% conf. intervals
geom_errorbar(aes(x = decade,
ymin = estimate - 2 * std.error,
ymax = estimate + 2 * std.error),
color = 'grey50', width = 0.2) +
geom_point(color = 'blue', size = 2) +
#Uncomment the following line if you want a connecting line
#geom_line(lwd = 1.2) +
#highlighting the x-axis
geom_hline(yintercept = 0, alpha = 0.2) +
#allowing different scales on each coefficient
facet_wrap(~term, scales = 'free_y') +
#not a fan of the default plotting
theme_bw() +
#removing grid, changes facet title colors
theme(panel.grid = element_blank(),
strip.background = element_rect(fill = 'black'),
strip.text = element_text(color = 'white')) +
xlab('Decade') + ylab('Estimated Coefficient')4.4 More stuff in purrr
- Reduce family functions
reduce()takes a vector of length n and produces a vector of length 1 by calling a function with a pair of values at a time: reduce(1:4, f) is equivalent to f(f(f(1, 2), 3), 4).accumulate()is useful for understanding how reduce works, because instead of returning just the final result, it returns all the intermediate results as well.
- Predicate functions
- Returns a single TRUE or FALSE.
some(.x, .p)returns TRUE if any element matches;every(.x, .p)returns TRUE if all elements match.detect(.x, .p)returns the value of the first match;detect_index(.x, .p)returns the location of the first match.keep(.x, .p)keeps all matching elements;discard(.x, .p)drops all matching elements.