This chapter introduced interactions, which allow for the association between a predictor and an outcome to depend upon the value of another predictor. While you can’t see them in a DAG, interactions can be important for making accurate inferences. Interactions can be difficult to interpret, and so the chapter also introduced triptych plots that help in visualizing the effect of an interaction. No new coding skills were introduced, but the statistical models considered were among the most complicated so far in the book.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes or answers the question or activity requested. Problems are labeled Easy (E), Medium (M), and Hard (H).
Finally, upon completion, name your final output .html file as YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.
8E1. For each of the causal relationships below, name a hypothetical third variable that would lead to an interaction effect:
#1. The amount of rising time: the same dough can rise to very different levels depending on how long it is left, so the effect of yeast on rising depends on time.
#2. The area of study: the effect of education on income usually differs across areas of study.
#3. The type of engine: whether the engine runs on gasoline, diesel, or newer power sources such as electricity or hydrogen fuel cells determines whether gasoline will make the car go at all.
8E2. Which of the following explanations invokes an interaction?
#1. This statement invokes an interaction between dryness and heat in caramelizing onions.
#2. No interaction is invoked.
#3. No interaction is invoked.
#4. This statement invokes an interaction between being social and having manipulative appendages in determining an animal's intelligence.
8E3. For each of the explanations in 8E2, write a linear model that expresses the stated relationship.
#1. M = aX + bY + cXY, where X = heat and Y = dryness; cXY is the interaction term.
#2. M = aX + bY, where X = number of cylinders and Y = fuel injector quality; no interaction term.
#3. M = aX + bY, where X = parents' beliefs and Y = friends' beliefs; no interaction term, consistent with the answer to 8E2.
#4. M = aX + bY + cXY, where X = sociality and Y = manipulative appendages; cXY is the interaction term.
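#For illustration, here is model #1 translated into the rethinking package's quap syntax (the package is loaded in 8M4 below). This is a hypothetical sketch: the data frame dat and the variables caramel (outcome), H (heat), and D (dryness) are placeholders of mine, not from the chapter.
m_onions <- quap(
  alist(
    caramel ~ dnorm(mu, sigma),
    mu <- a + bH*H + bD*D + bHD*H*D,  # bHD*H*D is the cXY interaction term
    a ~ dnorm(0, 1),
    bH ~ dnorm(0, 0.5),
    bD ~ dnorm(0, 0.5),
    bHD ~ dnorm(0, 0.5),
    sigma ~ dexp(1)
  ),
  data = dat
)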
8M1. Recall the tulips example from the chapter. Suppose another set of treatments adjusted the temperature in the greenhouse over two levels: cold and hot. The data in the chapter were collected at the cold temperature. You find none of the plants grown under the hot temperature developed any blooms at all, regardless of the water and shade levels. Can you explain this result in terms of interactions between water, shade, and temperature?
#The result implies that the relationships between blooms and water and between blooms and shade both depend on temperature: under hot conditions there are no blooms at any water or shade level. With the three predictor variables water (W), shade (S), and temperature (T), this calls for interaction effects: the three two-way interactions (WS, WT, and ST) plus a single three-way interaction (WST).
8M2. Can you invent a regression equation that would make the bloom size zero, whenever the temperature is hot?
#Use the full model: M = a + bW + cS + dT + eWS + fWT + gST + hWST, with T = 1 for hot and T = 0 for cold.
#Setting T = 1 collapses this to M = (a+d) + (b+f)W + (c+g)S + (e+h)WS, so whenever a = -d, b = -f, c = -g, and e = -h, the predicted bloom size is zero at every water and shade level. Equivalently, M = (1-T)(a + bW + cS + eWS).
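#A quick numeric check (an illustrative sketch; the coefficient values below are arbitrary choices of mine that satisfy the constraints a = -d, b = -f, c = -g, e = -h):
a0 <- 0.3; bW <- 0.2; cS <- -0.1; eWS <- -0.15
dT <- -a0; fWT <- -bW; gST <- -cS; hWST <- -eWS
bloom <- function(W, S, T)
  a0 + bW*W + cS*S + dT*T + eWS*W*S + fWT*W*T + gST*S*T + hWST*W*S*T
bloom(W = 2, S = 1, T = 1)  # hot (T = 1): returns 0 for any W and S
bloom(W = 2, S = 1, T = 0)  # cold (T = 0): the ordinary interaction model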
8M4. Repeat the tulips analysis, but this time use priors that constrain the effect of water to be positive and the effect of shade to be negative. Use prior predictive simulation. What do these prior assumptions mean for the interaction prior, if anything?
library(rethinking)
data(tulips)
d <- tulips
str(d)
## 'data.frame': 27 obs. of 4 variables:
## $ bed : Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 1 1 1 2 ...
## $ water : int 1 1 1 2 2 2 3 3 3 1 ...
## $ shade : int 1 2 3 1 2 3 1 2 3 1 ...
## $ blooms: num 0 0 111 183.5 59.2 ...
d$blooms_std <- d$blooms/max(d$blooms)
d$water_cent <- d$water-mean(d$water)
d$shade_cent <- d$shade-mean(d$shade)
# Prior predictive check for the intercept: with a ~ dnorm(0.5, 1), how much prior mass falls outside [0, 1], the plausible range of blooms_std?
m <- rnorm(10000, 0.5, 1); sum(m < 0 | m > 1)/length(m)
## [1] 0.6086
# Tightening the prior to a ~ dnorm(0.5, 0.25) leaves far less mass outside [0, 1]
m <- rnorm(10000, 0.5, 0.25); sum(m < 0 | m > 1)/length(m)
## [1] 0.044
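#The chunks above only check the intercept prior. One way to actually constrain the slopes (a sketch of my own parameterization, not the only option): Log-Normal priors are strictly positive, so giving bw a Log-Normal prior and subtracting a Log-Normal bs forces the water effect positive and the shade effect negative. The prior scales here are my choices.
m8m4 <- quap(
  alist(
    blooms_std ~ dnorm(mu, sigma),
    # subtracting bs (itself positive) forces the shade effect to be negative
    mu <- a + bw*water_cent - bs*shade_cent + bws*water_cent*shade_cent,
    a ~ dnorm(0.5, 0.25),
    bw ~ dlnorm(-1, 0.25),  # Log-Normal, so bw > 0: water can only help
    bs ~ dlnorm(-1, 0.25),  # Log-Normal, so the shade effect -bs < 0
    bws ~ dnorm(0, 0.25),
    sigma ~ dunif(0, 100)
  ),
  data = d
)
# Prior predictive simulation: sample from the priors and push the samples through the linear model across the water/shade grid
prior <- extract.prior(m8m4)
grid <- expand.grid(water_cent = -1:1, shade_cent = -1:1)
mu_prior <- link(m8m4, post = prior, data = grid)
#For the interaction prior: if water can only help and shade can only hurt, then more shade should plausibly weaken the benefit of water, so these assumptions suggest the interaction prior should lean negative, or at least be skeptical of positive values.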
8H1. Return to the data(tulips) example in the chapter. Now include the bed variable as a predictor in the interaction model. Don’t interact bed with the other predictors; just include it as a main effect. Note that bed is categorical. So to use it properly, you will need to either construct dummy variables or rather an index variable, as explained in Chapter 5.
data(tulips)
d <- tulips
str(d)
## 'data.frame': 27 obs. of 4 variables:
## $ bed : Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 1 1 1 2 ...
## $ water : int 1 1 1 2 2 2 3 3 3 1 ...
## $ shade : int 1 2 3 1 2 3 1 2 3 1 ...
## $ blooms: num 0 0 111 183.5 59.2 ...
d$bed_id <- coerce_index(d$bed)
d$blooms_std <- d$blooms / max(d$blooms)
d$water_cent <- d$water - mean(d$water)
d$shade_cent <- d$shade - mean(d$shade)
m <- quap(
  alist(
    blooms_std ~ dnorm(mu, sigma),
    mu <- a + bb*bed_id + bw*water_cent + bs*shade_cent + bws*water_cent*shade_cent,
    a ~ dnorm(0.5, 0.25),
    bw ~ dnorm(0, 0.25),
    bs ~ dnorm(0, 0.25),
    bws ~ dnorm(0, 0.25),
    bb ~ dnorm(0, 0.25),
    sigma ~ dunif(0, 100)
  ),
  data = d
)
precis(m,depth = 2)
## mean sd 5.5% 94.5%
## a 0.2332042 0.05535627 0.14473422 0.32167424
## bw 0.2072940 0.02619484 0.16542960 0.24915842
## bs -0.1137708 0.02618970 -0.15562700 -0.07191461
## bws -0.1437442 0.03199181 -0.19487332 -0.09261514
## bb 0.0627202 0.02569133 0.02166049 0.10377991
## sigma 0.1117187 0.01524575 0.08735302 0.13608433
#From the results, bws has a negative posterior mean, consistent with shade dampening the positive effect of water. The bed coefficient bb is positive with an 89% interval above zero, so bed differences contribute to blooms as well.
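#Note that bb*bed_id treats the bed index as if it were a numeric slope. Below is a sketch of the index-variable approach the chapter recommends, with one intercept per bed (the model name m_bed is mine):
m_bed <- quap(
  alist(
    blooms_std ~ dnorm(mu, sigma),
    mu <- a[bed_id] + bw*water_cent + bs*shade_cent + bws*water_cent*shade_cent,
    a[bed_id] ~ dnorm(0.5, 0.25),  # a separate intercept for each bed
    bw ~ dnorm(0, 0.25),
    bs ~ dnorm(0, 0.25),
    bws ~ dnorm(0, 0.25),
    sigma ~ dunif(0, 100)
  ),
  data = d
)
precis(m_bed, depth = 2)  # depth = 2 displays the vector of bed intercepts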
8H5. Consider the data(Wines2012) data table. These data are expert ratings of 20 different French and American wines by 9 different French and American judges. Your goal is to model score, the subjective rating assigned by each judge to each wine. I recommend standardizing it. In this problem, consider only variation among judges and wines. Construct index variables of judge and wine and then use these index variables to construct a linear regression model. Justify your priors. You should end up with 9 judge parameters and 20 wine parameters. How do you interpret the variation among individual judges and individual wines? Do you notice any patterns, just by plotting the differences? Which judges gave the highest/lowest ratings? Which wines were rated worst/best on average?
data("wines2012")
d <-wines2012
d_list = list(s = standardize(d$score),
wine = as.integer(d$wine),
judge = as.integer(d$judge))
str(d_list)
## List of 3
## $ s : num [1:180] -1.5766 -0.4505 -0.0751 0.3003 -2.3274 ...
## ..- attr(*, "scaled:center")= num 14.2
## ..- attr(*, "scaled:scale")= num 2.66
## $ wine : int [1:180] 1 3 5 7 9 11 13 15 17 19 ...
## $ judge: int [1:180] 4 4 4 4 4 4 4 4 4 4 ...
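#To justify the priors below: since s is standardized, Normal(0, 0.5) priors on the judge and wine effects keep nearly all prior mass within about one standard deviation of the average score. A quick check (this chunk is my addition):
p <- rnorm(1e4, 0, 0.5)
mean(abs(p) < 1)  # roughly 0.95 of the prior mass lies within +/- 1 SD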
M <- ulam(
  alist(
    s ~ dnorm(mu, sigma),
    mu <- j[judge] + w[wine],
    w[wine] ~ dnorm(0, 0.5),
    j[judge] ~ dnorm(0, 0.5),
    sigma ~ dexp(1)
  ),
  data = d_list, chains = 4, cores = 4
)
precis(M, 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## w[1] 0.116266471 0.25921793 -0.29631064 0.52798012 2758.707 0.9994292
## w[2] 0.090263892 0.24937125 -0.31907736 0.48928918 2777.306 0.9984478
## w[3] 0.231210720 0.25887381 -0.16870937 0.64796013 2760.933 0.9989677
## w[4] 0.466367196 0.25801820 0.06250571 0.87679858 2886.948 0.9986463
## w[5] -0.095416658 0.25991056 -0.51936796 0.32524965 2942.226 0.9981814
## w[6] -0.307740102 0.25525620 -0.71539754 0.10184734 2713.669 1.0000089
## w[7] 0.245789089 0.25688836 -0.16061271 0.66207807 2650.036 0.9982529
## w[8] 0.226228231 0.26900818 -0.20855551 0.64748342 2855.893 0.9988179
## w[9] 0.073548684 0.25955806 -0.33327054 0.49949935 3176.975 0.9990584
## w[10] 0.105860444 0.25723431 -0.31110497 0.50256846 3338.808 0.9991290
## w[11] -0.005887209 0.25721454 -0.41563724 0.39050646 2467.431 1.0004121
## w[12] -0.026268416 0.26259111 -0.44245296 0.39795301 2507.631 0.9991729
## w[13] -0.092629102 0.25709891 -0.50499008 0.32185024 2642.336 1.0001546
## w[14] 0.012426106 0.24905507 -0.38436340 0.40803538 2685.550 0.9988763
## w[15] -0.181728497 0.26310092 -0.60869824 0.22387195 3188.943 0.9990476
## w[16] -0.165873207 0.25279349 -0.57255214 0.25936030 2758.445 1.0010367
## w[17] -0.121002720 0.26383497 -0.56314924 0.30829800 2416.859 0.9991789
## w[18] -0.711691756 0.25456401 -1.11018288 -0.29953762 2674.450 0.9985718
## w[19] -0.132956587 0.25612394 -0.55228285 0.27852182 3024.802 1.0006658
## w[20] 0.326431398 0.25796519 -0.07710453 0.74172353 2826.409 0.9987609
## j[1] -0.284375156 0.19189810 -0.59090054 0.02550844 2137.565 1.0006132
## j[2] 0.218172354 0.19092640 -0.08458625 0.52040526 2310.817 0.9996142
## j[3] 0.202597509 0.19477612 -0.11691384 0.50423164 2231.393 1.0004486
## j[4] -0.545477147 0.19212286 -0.85819369 -0.25479469 2166.379 1.0001482
## j[5] 0.790743354 0.19772317 0.47470469 1.10240782 2374.973 0.9989444
## j[6] 0.468222544 0.19616023 0.15977148 0.78204563 2116.277 0.9997751
## j[7] 0.131293692 0.19064415 -0.18356069 0.42799377 2022.634 0.9998785
## j[8] -0.662943611 0.19747789 -0.97699879 -0.34464695 2183.044 1.0006947
## j[9] -0.343326393 0.19422643 -0.65084048 -0.02489394 2235.349 0.9995605
## sigma 0.846666918 0.04766987 0.77309344 0.92213546 2451.933 0.9987225
traceplot(M)
#From the estimates above, judge 5 gave the highest ratings on average and judge 8 the lowest, while wine 4 was rated best on average and wine 18 clearly worst. The judge parameters spread at least as widely as the wine parameters, so who is judging matters about as much as which wine is being judged, and sigma near 0.85 indicates substantial score variation left unexplained.
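#The differences are easier to scan by plotting the precis estimates directly (an optional extra chunk):
plot(precis(M, depth = 2))  # point estimates and 89% intervals for all judge and wine parameters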