library(knitr)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

# bird-species proportions for the three islands in question 7-3
# (the first row holds the column labels used by the commented-out kable below)
w <- c('Species A', 'Species B', 'Species C', 'Species D', 'Species E')
x <- c(0.2, 0.2, 0.2, 0.2, 0.2)       # Island 1
y <- c(0.8, 0.1, 0.05, 0.025, 0.025)  # Island 2
z <- c(0.05, 0.15, 0.7, 0.05, 0.05)   # Island 3
d <- rbind(w, x, y, z)
rownames(d) <- c('Island', '1', '2', '3')

Chapter 7 - Ulysses’ Compass

The chapter began with the problem of overfitting, a universal phenomenon by which models with more parameters fit a sample better, even when the additional parameters are meaningless. Two common tools were introduced to address overfitting: regularizing priors and estimates of out-of-sample accuracy (WAIC and PSIS). Regularizing priors reduce overfitting during estimation, while WAIC and PSIS help estimate the degree of overfitting. The practical `compare` function in the rethinking package was introduced to help analyze collections of models fit to the same data. If you are after causal estimates, these tools will mislead you, so models must be designed through some other method, not selected on the basis of out-of-sample predictive accuracy. But any causal estimate will still overfit the sample, so you always have to worry about overfitting, measuring it with WAIC/PSIS and reducing it with regularization.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

7-1. When comparing models with an information criterion, why must all models be fit to exactly the same observations? What would happen to the information criterion values, if the models were fit to different numbers of observations? Perform some experiments.

Information criteria are based on deviance, which is a measure of model performance that accrues over observations: every additional observation adds to the total. If the models are fit to different numbers of observations, the model fit to more observations will almost automatically have a larger deviance and therefore a larger information criterion, making it look less accurate. That difference reflects the amount of data, not the model itself, so comparing models fit to different numbers of observations is an unfair and meaningless comparison.
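As a quick experiment (a sketch using brms on simulated data; the data set sim_n, the helper fit_waic_n, and the sample sizes are illustrative choices, not part of the assignment), fitting the identical Gaussian regression to the first 100, 500, and 1000 rows of one simulated data set should show the WAIC total growing roughly with the number of observations, even though the model never changes.

# sketch: same model, different numbers of observations
library(brms)

set.seed(505)
sim_n <- data.frame(x = rnorm(1000))
sim_n$y <- rnorm(1000, 0.5 * sim_n$x, 1)

# fit the identical model to the first n rows and return its WAIC estimate
fit_waic_n <- function(n) {
  m <- brm(y ~ 1 + x, data = sim_n[1:n, ], family = gaussian,
           prior = c(prior(normal(0, 1), class = Intercept),
                     prior(normal(0, 1), class = b),
                     prior(exponential(1), class = sigma)),
           iter = 2000, warmup = 1000, chains = 2, cores = 2,
           seed = 1234, refresh = 0)
  waic(m)$estimates["waic", "Estimate"]
}

# WAIC accrues over observations, so it grows with n
sapply(c(100, 500, 1000), fit_waic_n)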

7-2. What happens to the effective number of parameters, as measured by PSIS or WAIC, as a prior becomes more concentrated? Why? Perform some experiments.

The effective number of parameters decreases as the prior becomes more concentrated. A more concentrated prior makes the model less flexible: the parameters cannot move as far in response to the sample, so the posterior is less sensitive to the particular observations. In the case of PSIS, this reduced flexibility means individual observations have less leverage. In the case of WAIC, the penalty term is the sum of the posterior variances of the log-likelihood for each observation; a tighter prior shrinks those variances, so the penalty (the effective number of parameters) shrinks as well.
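As a sketch of such an experiment (again with brms on simulated data; d_sim, fit_pwaic, and the particular prior widths are illustrative assumptions), fitting the same regression while narrowing the prior on the slope from normal(0, 10) to normal(0, 0.1) should show the WAIC penalty term p_waic, the effective number of parameters, shrinking.

# sketch: same data and model, increasingly concentrated prior on the slope
library(brms)

set.seed(505)
d_sim <- data.frame(x = rnorm(100))
d_sim$y <- rnorm(100, 0.5 * d_sim$x, 1)

# fit with a normal(0, prior_sd) prior on the slope and return p_waic
fit_pwaic <- function(prior_sd) {
  m <- brm(y ~ 1 + x, data = d_sim, family = gaussian,
           prior = c(set_prior(paste0("normal(0, ", prior_sd, ")"), class = "b"),
                     prior(exponential(1), class = sigma)),
           iter = 2000, warmup = 1000, chains = 2, cores = 2,
           seed = 1234, refresh = 0)
  waic(m)$estimates["p_waic", "Estimate"]
}

# the penalty term should fall as the prior narrows from sd = 10 to sd = 0.1
sapply(c(10, 1, 0.1), fit_pwaic)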

7-3. Consider three fictional Polynesian islands. On each there is a Royal Ornithologist charged by the king with surveying the bird population. They have each found the following proportions of 5 important bird species:

# kable(d, align = "cccccc")
# three islands
library(tidyverse)
dat_island <- tibble("Island" = c("Island 1", "Island 2", "Island 3"),
       "Species A" = c(0.2, 0.8, 0.05), 
       "Species B" = c(0.2, 0.1, 0.15), 
       "Species C" = c(0.2, 0.05, 0.7), 
       "Species D" = c(0.2, 0.025, 0.05), 
       "Species E" = c(0.2, 0.025, 0.05))

dat_island %>% knitr::kable()
| Island   | Species A | Species B | Species C | Species D | Species E |
|----------|-----------|-----------|-----------|-----------|-----------|
| Island 1 |      0.20 |      0.20 |      0.20 |     0.200 |     0.200 |
| Island 2 |      0.80 |      0.10 |      0.05 |     0.025 |     0.025 |
| Island 3 |      0.05 |      0.15 |      0.70 |     0.050 |     0.050 |

Notice that each row sums to 1, all the birds. This problem has two parts. It is not computationally complicated. But it is conceptually tricky. First, compute the entropy of each island’s bird distribution. Interpret these entropy values. Second, use each island’s bird distribution to predict the other two. This means to compute the KL divergence of each island from the others, treating each island as if it were a statistical model of the other islands. You should end up with 6 different KL divergence values. Which island predicts the others best? Why?

# calculate entropy of each island
dat_island_long <- dat_island %>% 
  janitor::clean_names() %>% 
  pivot_longer(cols = -island, names_to = "species", values_to = "proportion") 

# information entropy: H(p) = -sum(p * log(p))
inf_entropy <- function(p) {-sum(p * log(p))}

dat_island_long %>% 
  group_by(island) %>% 
  summarise(entropy = inf_entropy(proportion)) %>% 
  mutate(across(where(is.numeric), ~ round(.x, 2))) %>% 
  knitr::kable()
| island   | entropy |
|----------|---------|
| Island 1 |    1.61 |
| Island 2 |    0.74 |
| Island 3 |    0.98 |
The first island has the largest entropy, followed by the third, and then the second in last place. Island 1's distribution is the most even (every species at 0.2), so it carries the most uncertainty; Island 2 concentrates 80% of its birds in a single species and therefore has the least.
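For the second part, here is a sketch of the KL divergence computation (the helper kl_div and the pairing code are my own; they reuse dat_island from above). Each island is treated in turn as the model q for each other island's true distribution p, with D_KL(p, q) = sum(p * (log(p) - log(q))), giving six values. Because Island 1's distribution is the flattest, it should be the least surprised by the other islands, produce the smallest divergences, and therefore predict the others best.

# KL divergence when distribution q is used as a model for the true distribution p
kl_div <- function(p, q) {sum(p * (log(p) - log(q)))}

# matrix of island distributions: rows = islands, columns = species
p_mat <- as.matrix(dat_island[, -1])
rownames(p_mat) <- dat_island$Island

# all 6 ordered pairs of distinct islands: `model` is used to predict `target`
crossing(model = rownames(p_mat), target = rownames(p_mat)) %>%
  filter(model != target) %>%
  mutate(kl_divergence = map2_dbl(target, model,
                                  ~ kl_div(p_mat[.x, ], p_mat[.y, ]))) %>%
  arrange(model) %>%
  knitr::kable(digits = 2)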

7-4. Recall the marriage, age, and happiness collider bias example from Chapter 6. Run models m6.9 and m6.10 again (page 178). Compare these two models using WAIC (or PSIS, they will produce identical results). Which model is expected to make better predictions? Which model provides the correct causal inference about the influence of age on happiness? Can you explain why the answers to these two questions disagree?

library(dagitty)
library(ggdag)

# collider DAG from chapter 6: happiness (H) and age (A) both cause marriage (M)
hma_dag <- dagitty("dag{H -> M <- A}")
coordinates(hma_dag) <- list(x = c(H = 1, M = 2, A = 3),
                             y = c(H = 1, M = 1, A = 1))

ggplot(hma_dag, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_text(color = "black", size = 10) +
  geom_dag_edges(edge_color = "black", edge_width = 2,
                 arrow_directed = grid::arrow(length = grid::unit(15, "pt"),
                                              type = "closed")) +
  theme_void()

# generate data and estimate model
library(rethinking)
library(brms)
d <- sim_happiness(seed = 1977, N_years = 1000)
dat <- d %>%
  filter(age > 17) %>%
  mutate(a = (age - 18) / (65 - 18),
         mid = factor(married + 1, labels = c("single", "married")))

b6.9 <- brm(happiness ~ 0 + mid + a, data = dat, family = gaussian,
            prior = c(prior(normal(0, 1), class = b, coef = midmarried),
                      prior(normal(0, 1), class = b, coef = midsingle),
                      prior(normal(0, 2), class = b, coef = a),
                      prior(exponential(1), class = sigma)),
            iter = 4000, warmup = 2000, chains = 4, cores = 4, seed = 1234)

b6.10 <- brm(happiness ~ 1 + a, data = dat, family = gaussian,
             prior = c(prior(normal(0, 1), class = Intercept),
                       prior(normal(0, 2), class = b, coef = a),
                       prior(exponential(1), class = sigma)),
             iter = 4000, warmup = 2000, chains = 4, cores = 4, seed = 1234)
# compare model
b6.9 <- add_criterion(b6.9, criterion = "loo")
b6.10 <- add_criterion(b6.10, criterion = "loo")

loo_compare(b6.9, b6.10)
##       elpd_diff se_diff
## b6.9     0.0       0.0 
## b6.10 -194.1      17.6
PSIS shows a strong preference for b6.9, the model that includes both age and marriage status, so b6.9 is expected to make better predictions. However, b6.10 provides the correct causal inference about the influence of age on happiness: marriage status is a collider, so no additional conditioning is needed (the adjustment set below is empty). The two answers disagree because conditioning on the collider opens a non-causal path between age and happiness, and the resulting spurious association still carries real predictive information about the sample. WAIC and PSIS only score predictive accuracy; they know nothing about the causal structure, so they reward b6.9 even though its coefficient for age is confounded.
adjustmentSets(hma_dag, exposure = "A", outcome = "H")
##  {}

7-5. Revisit the urban fox data, data(foxes), from the previous chapter’s practice problems. Use WAIC or PSIS based model comparison on five different models, each using weight as the outcome, and containing these sets of predictor variables:

  • avgfood + groupsize + area
  • avgfood + groupsize
  • groupsize + area
  • avgfood
  • area

Can you explain the relative differences in WAIC scores, using the fox DAG from the previous chapter? Be sure to pay attention to the standard error of the score differences (dSE).

library(rethinking)
library(brms)
data(foxes)

# standardize all variables, as in the previous chapter's practice problems
fox_dat <- foxes %>%
  as_tibble() %>%
  select(area, avgfood, weight, groupsize) %>%
  mutate(across(everything(), standardize))

b7h5_1 <- brm(weight ~ 1 + avgfood + groupsize + area, data = fox_dat,
              family = gaussian,
              prior = c(prior(normal(0, 0.2), class = Intercept),
                        prior(normal(0, 0.5), class = b),
                        prior(exponential(1), class = sigma)),
              iter = 4000, warmup = 2000, chains = 4, cores = 4, seed = 1234)

b7h5_2 <- brm(weight ~ 1 + avgfood + groupsize, data = fox_dat,
              family = gaussian,
              prior = c(prior(normal(0, 0.2), class = Intercept),
                        prior(normal(0, 0.5), class = b),
                        prior(exponential(1), class = sigma)),
              iter = 4000, warmup = 2000, chains = 4, cores = 4, seed = 1234)

b7h5_3 <- brm(weight ~ 1 + groupsize + area, data = fox_dat,
              family = gaussian,
              prior = c(prior(normal(0, 0.2), class = Intercept),
                        prior(normal(0, 0.5), class = b),
                        prior(exponential(1), class = sigma)),
              iter = 4000, warmup = 2000, chains = 4, cores = 4, seed = 1234)

b7h5_4 <- brm(weight ~ 1 + avgfood, data = fox_dat,
              family = gaussian,
              prior = c(prior(normal(0, 0.2), class = Intercept),
                        prior(normal(0, 0.5), class = b),
                        prior(exponential(1), class = sigma)),
              iter = 4000, warmup = 2000, chains = 4, cores = 4, seed = 1234)

b7h5_5 <- brm(weight ~ 1 + area, data = fox_dat,
              family = gaussian,
              prior = c(prior(normal(0, 0.2), class = Intercept),
                        prior(normal(0, 0.5), class = b),
                        prior(exponential(1), class = sigma)),
              iter = 4000, warmup = 2000, chains = 4, cores = 4, seed = 1234)
# calculate WAIC for each model
b7h5_1 <- add_criterion(b7h5_1, criterion = "waic")
b7h5_2 <- add_criterion(b7h5_2, criterion = "waic")
b7h5_3 <- add_criterion(b7h5_3, criterion = "waic")
b7h5_4 <- add_criterion(b7h5_4, criterion = "waic")
b7h5_5 <- add_criterion(b7h5_5, criterion = "waic")

comp <- loo_compare(b7h5_1, b7h5_2, b7h5_3, b7h5_4, b7h5_5, criterion = "waic")
comp
##        elpd_diff se_diff
## b7h5_1  0.0       0.0   
## b7h5_3 -0.4       1.4   
## b7h5_2 -0.5       1.7   
## b7h5_4 -5.3       3.4   
## b7h5_5 -5.4       3.4
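To judge these scores it helps to set each elpd difference against its standard error (a sketch reusing the comp object above; the 2.6 multiplier for a rough 99% interval is my own choice, not part of the assignment).

# compare each elpd difference to its standard error
comp_df <- as.data.frame(comp)[, c("elpd_diff", "se_diff")]
comp_df$lower <- comp_df$elpd_diff - 2.6 * comp_df$se_diff  # rough 99% bound
comp_df$upper <- comp_df$elpd_diff + 2.6 * comp_df$se_diff
round(comp_df, 1)

The three models that contain groupsize (b7h5_1, b7h5_2, b7h5_3) differ by well under one standard error and are effectively tied, while the single-predictor models b7h5_4 (avgfood) and b7h5_5 (area) sit about five points lower, and even that gap is only around 1.5 standard errors. This pattern is consistent with the fox DAG from the previous chapter, in which area influences weight only through avgfood and avgfood acts largely through groupsize, so models that carry the same effective information make nearly identical predictions.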