This chapter introduced interactions, which allow for the association between a predictor and an outcome to depend upon the value of another predictor. While you can’t see them in a DAG, interactions can be important for making accurate inferences. Interactions can be difficult to interpret, and so the chapter also introduced triptych plots that help in visualizing the effect of an interaction. No new coding skills were introduced, but the statistical models considered were among the most complicated so far in the book.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard(H).
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
8E1. For each of the causal relationships below, name a hypothetical third variable that would lead to an interaction effect:
# 1. Temperature or sugar
# 2. Gender or race or major
# 3. Brand of the car which means different quality of engine or something.
8E2. Which of the following explanations invokes an interaction?
# 1
8E3. For each of the explanations in 8E2, write a linear model that expresses the stated relationship.
# 1. Caramelization = a + b*heat + c*dryness + d*heat*dryness
# 2. Speed = a + b*cylinders + c*fuel_injector
# 3. political_beliefs = a + b*from_parent + c*from_friends
# 4. Intelligence = a + b*sociality + c*manipulative_appendages
8M1. Recall the tulips example from the chapter. Suppose another set of treatments adjusted the temperature in the greenhouse over two levels: cold and hot. The data in the chapter were collected at the cold temperature. You find none of the plants grown under the hot temperature developed any blooms at all, regardless of the water and shade levels. Can you explain this result in terms of interactions between water, shade, and temperature?
# yes. Since the interaction on water and shade depends on the level of temperature, this is the a statement for the three-way interaction.
8M2. Can you invent a regression equation that would make the bloom size zero, whenever the temperature is hot?
# I = 1 when temperature is hot and 0 when temperature is cold
# bloom = (original model)(1 - I)
8M4. Repeat the tulips analysis, but this time use priors that constrain the effect of water to be positive and the effect of shade to be negative. Use prior predictive simulation. What do these prior assumptions mean for the interaction prior, if anything? Visualize the prior simulation.
library(dplyr)
library(rethinking)
data(tulips)
df = tulips
df = df %>%
mutate(water_c = water - mean(water),
shade_c = shade - mean(shade),
blooms_std = blooms/max(blooms)
)
m1 <- quap(
alist(
blooms_std ~ dnorm( mu , sigma ) ,
mu <- a + bw*water_c + bs*shade_c + bws*water_c*shade_c,
a ~ dnorm( 0.5, 0.25) ,
bw ~ dnorm( 1 , 0.25 ) ,
bs ~ dnorm( -1 , 0.25 ) ,
bws ~ dnorm( 0 , 0.25 ) ,
sigma ~ dexp( 1 )
),
data = df)
precis(m1)
## mean sd 5.5% 94.5%
## a 0.3579972 0.02404867 0.31956275 0.39643160
## bw 0.2205119 0.02953045 0.17331658 0.26770729
## bs -0.1272575 0.02956790 -0.17451270 -0.08000226
## bws -0.1431292 0.03587130 -0.20045844 -0.08579991
## sigma 0.1255285 0.01721840 0.09801021 0.15304686
8H1. Return to the data(tulips) example in the chapter. Now include the bed variable as a predictor in the interaction model. Don’t interact bed with the other predictors; just include it as a main effect. Note that bed is categorical. So to use it properly, you will need to either construct dummy variables or rather an index variable, as explained in Chapter 5.
library(dplyr)
library(rethinking)
data(tulips)
df = tulips
df = df %>%
mutate(water_c = water - mean(water),
shade_c = shade - mean(shade),
bed_b = ifelse(bed == "b", 1, 0),
bed_c = ifelse(bed == "c", 1, 0))
m1 <- map(
alist(
blooms ~ dnorm(mu,sigma),
mu <- a + bW*water_c + bS*shade_c + bWS*water_c*shade_c + bBB*bed_b + bBC*bed_c ,
a ~ dnorm(0,100),
c(bW,bS,bWS,bBB,bBC) ~ dnorm(0,100),
sigma ~ dunif(0,100)
),
start=list(a=130,bW=0,bS=0,bWS=0,bBB=0,bBC=0,sigma=90),
data=df)
precis(m1)
## mean sd 5.5% 94.5%
## a 97.33917 12.738825 76.98007 117.69827
## bW 75.16581 9.195133 60.47021 89.86141
## bS -41.25380 9.194014 -55.94761 -26.55999
## bWS -52.22757 11.237265 -70.18689 -34.26825
## bBB 44.42496 18.015989 15.63193 73.21799
## bBC 49.03426 18.016352 20.24065 77.82787
## sigma 39.17074 5.330386 30.65175 47.68972
8H5. Consider the data(Wines2012) data table. These data are expert ratings of 20 different French and American wines by 9 different French and American judges. Your goal is to model score, the subjective rating assigned by each judge to each wine. I recommend standardizing it. In this problem, consider only variation among judges and wines. Construct index variables of judge and wine and then use these index variables to construct a linear regression model. Justify your priors. You should end up with 9 judge parameters and 20 wine parameters. Plot the parameter estimates. How do you interpret the variation among individual judges and individual wines? Do you notice any patterns, just by plotting the differences? Which judges gave the highest/lowest ratings? Which wines were rated worst/best on average?
library(dplyr)
library(rethinking)
data(Wines2012)
df = Wines2012
df2 = df %>%
mutate(s = standardize(score),
judge = as.integer(judge),
wine = as.integer(wine)) %>%
select(s, judge, wine)
m2 <- ulam(
alist(
s ~ dnorm(mu, sigma),
mu <- a[judge] + b[wine],
a[judge] ~ dnorm(0, 0.5),
b[wine] ~ dnorm(0, 0.5),
sigma ~ dexp(1)
),
data = df2,
chains = 4 ,
cores = 4
)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.0/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DBOOST_NO_AUTO_PTR -include '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/usr/local/include -fPIC -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:613:16: error: expected ';' after top level declarator
## namespace Eigen {
## ^
## ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
## ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
precis(m2, 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] -0.280600714 0.19050389 -0.58365797 0.01437641 2190.695 0.9981236
## a[2] 0.208540438 0.19689691 -0.10330061 0.51208528 2327.454 0.9997113
## a[3] 0.200911365 0.19697702 -0.10844624 0.50956785 2431.067 0.9983620
## a[4] -0.545389840 0.20309564 -0.86572747 -0.22711477 2384.073 0.9984522
## a[5] 0.791030001 0.19678133 0.46589186 1.11292314 2570.728 0.9985539
## a[6] 0.469484279 0.20014324 0.13491483 0.77773327 2490.065 0.9990251
## a[7] 0.129271907 0.19581095 -0.17882993 0.43985072 1991.046 1.0001667
## a[8] -0.659839261 0.19396898 -0.96306987 -0.34558684 1935.295 0.9988078
## a[9] -0.343878335 0.20002000 -0.66583855 -0.02642579 2090.180 0.9988553
## b[1] 0.117215690 0.25591504 -0.28952398 0.52357563 3036.023 0.9987001
## b[2] 0.087212154 0.25761460 -0.33593030 0.49437528 3187.735 0.9980640
## b[3] 0.230274258 0.26603216 -0.20828863 0.64371621 2728.102 0.9989098
## b[4] 0.461681793 0.25550438 0.05153983 0.86384518 2518.212 0.9990931
## b[5] -0.109983727 0.25889057 -0.54180620 0.31100129 2716.183 0.9987349
## b[6] -0.309980213 0.26369897 -0.73030114 0.11664519 2391.580 1.0004569
## b[7] 0.252041584 0.25747705 -0.15275898 0.66591833 2645.964 0.9987999
## b[8] 0.230265779 0.25093112 -0.16251925 0.63508341 2215.460 0.9993079
## b[9] 0.069223689 0.25149291 -0.33223548 0.47424692 2605.972 0.9998092
## b[10] 0.108235338 0.25956634 -0.30005543 0.52807133 3047.277 0.9988528
## b[11] -0.003802214 0.25904608 -0.43091926 0.39634458 2589.138 0.9984171
## b[12] -0.025183534 0.26651038 -0.45468444 0.40102842 2913.684 0.9999867
## b[13] -0.088262138 0.27025338 -0.53047788 0.34649205 3262.710 0.9986698
## b[14] 0.011612358 0.26633690 -0.41329444 0.44495306 3387.121 0.9987698
## b[15] -0.179735075 0.24688948 -0.58131438 0.22230428 3175.406 0.9985195
## b[16] -0.165456034 0.25505328 -0.58178629 0.22836526 3089.810 0.9987438
## b[17] -0.114545615 0.24882635 -0.50945145 0.28530275 2906.113 0.9989782
## b[18] -0.711675066 0.25715266 -1.12193425 -0.29045388 2651.458 0.9996908
## b[19] -0.131881679 0.25895529 -0.55341809 0.27506882 3064.918 0.9995525
## b[20] 0.325907242 0.27307327 -0.11599489 0.75818074 2880.483 0.9990134
## sigma 0.846955058 0.04683172 0.77579793 0.92392809 3046.120 0.9991995
traceplot(m2)
## [1] 1000
## [1] 1
## [1] 1000
# From this plot, we know that judge 5 generally gave highest rating in average and judge 8 gave lowest rating in average. And, wine 4 got highest rating and 18 got lowest rating in anverage.