The chapter began with the problem of overfitting, a universal phenomenon by which models with more parameters fit a sample better, even when the additional parameters are meaningless. Two common tools were introduced to address overfitting: regularizing priors and estimates of out-of-sample accuracy (WAIC and PSIS). Regularizing priors reduce overfitting during estimation, and WAIC and PSIS help estimate the degree of overfitting. The practical compare() function in the rethinking package was introduced to help analyze collections of models fit to the same data. If you are after causal estimates, then these tools will mislead you. So models must be designed through some other method, not selected on the basis of out-of-sample predictive accuracy. But any causal estimate will still overfit the sample. So you always have to worry about overfitting, measuring it with WAIC/PSIS and reducing it with regularization.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
7-1. When comparing models with an information criterion, why must all models be fit to exactly the same observations? What would happen to the information criterion values, if the models were fit to different numbers of observations? Perform some experiments.
# The log-pointwise-predictive density (lppd) averages each observation's
# likelihood over the posterior samples, takes the log, and then sums across
# all observations. Because it is a sum over observations, its magnitude (and
# therefore WAIC or PSIS) grows with the number of observations: more data
# points mean a smaller lppd and a larger WAIC, regardless of model quality.
# So criterion values are only comparable when every model is fit to exactly
# the same observations.
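# A quick experiment (a minimal illustration; the simulated data and the
# intercept-only model below are just for demonstration, not part of the
# original answer): fit the same model to samples of different sizes and note
# how WAIC scales with the number of observations.
library(rethinking)
set.seed(7)
y_small <- rnorm(100, mean = 1, sd = 1)
y_large <- rnorm(1000, mean = 1, sd = 1)
fit_gaussian <- function(y) {
  quap(
    alist(
      y ~ dnorm(mu, sigma),
      mu ~ dnorm(0, 1),
      sigma ~ dexp(1)
    ), data = list(y = y)
  )
}
WAIC(fit_gaussian(y_small))
WAIC(fit_gaussian(y_large))  # should be roughly 10x larger for 10x the observations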
7-2. What happens to the effective number of parameters, as measured by PSIS or WAIC, as a prior becomes more concentrated? Why? Perform some experiments.
# The WAIC penalty term (pWAIC) estimates the effective number of parameters
# as the sum of the posterior variances of the log-likelihood of each
# observation. As a prior becomes more concentrated, the posterior is allowed
# to move less in response to the sample, those variances shrink, and the
# effective number of parameters gets smaller.
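# A small experiment (simulated data for illustration only): fit the same
# regression under diffuse and concentrated priors on the coefficients and
# compare the penalty term (pWAIC).
library(rethinking)
set.seed(7)
x <- rnorm(100)
y <- rnorm(100, 0.5 * x, 1)
d <- list(x = x, y = y)
m_wide <- quap(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a + b * x,
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 10),   # diffuse priors
    sigma ~ dexp(1)
  ), data = d)
m_tight <- quap(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a + b * x,
    a ~ dnorm(0, 0.1),
    b ~ dnorm(0, 0.1),  # concentrated priors
    sigma ~ dexp(1)
  ), data = d)
WAIC(m_wide)
WAIC(m_tight)  # the penalty column should be smaller under the tighter priors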
7-3. Consider three fictional Polynesian islands. On each there is a Royal Ornithologist charged by the king with surveying the bird population. They have each found the following proportions of 5 important bird species:
| Island | Species A | Species B | Species C | Species D | Species E |
|--------|-----------|-----------|-----------|-----------|-----------|
| 1      | 0.2       | 0.2       | 0.2       | 0.2       | 0.2       |
| 2      | 0.8       | 0.1       | 0.05      | 0.025     | 0.025     |
| 3      | 0.05      | 0.15      | 0.7       | 0.05      | 0.05      |
Notice that each row sums to 1, all the birds. This problem has two parts. It is not computationally complicated. But it is conceptually tricky. First, compute the entropy of each island’s bird distribution. Interpret these entropy values. Second, use each island’s bird distribution to predict the other two. This means to compute the KL divergence of each island from the others, treating each island as if it were a statistical model of the other islands. You should end up with 6 different KL divergence values. Which island predicts the others best? Why?
library(magrittr)
library(dplyr)
# H(p) = -sum(p_i * log(p_i))
# Compute the entropy of each island's bird distribution
H <- function(p) -sum(p * log(p))
Island_1 <- c( 0.2 , 0.2 , 0.2 , 0.2 , 0.2)
Island_2 <- c( 0.8 , 0.1 , 0.05 , 0.025 , 0.025)
Island_3 <- c( 0.05 , 0.15 , 0.7 , 0.05 , 0.05)
H(Island_1)
## [1] 1.609438
H(Island_2)
## [1] 0.7430039
H(Island_3)
## [1] 0.9836003
# The entropy H measures the uncertainty contained in a probability distribution.
# Island 1 has the five bird species evenly distributed, giving it the highest
# entropy: its bird observations are the hardest to predict.
# Island 2 has the lowest entropy because the vast majority of its birds are
# species A, so its observations are the easiest to predict.
# D_KL(p, q) = sum(p * (log(p) - log(q))): the extra surprise incurred when
# distribution q is used as a model to predict the true distribution p
D_kl <- function(p, q) sum(p * (log(p) - log(q)))
D_kl(Island_1,Island_2)
## [1] 0.9704061
D_kl(Island_1,Island_3)
## [1] 0.6387604
D_kl(Island_2,Island_1)
## [1] 0.866434
D_kl(Island_2,Island_3)
## [1] 2.010914
D_kl(Island_3,Island_1)
## [1] 0.6258376
D_kl(Island_3,Island_2)
## [1] 1.838845
# Island 1 predicts the others best: for each target island, using Island 1 as
# the model q gives a smaller divergence than using the remaining island
# (0.87 vs 2.01 when predicting Island 2, and 0.63 vs 1.84 when predicting
# Island 3). Its uniform, high-entropy distribution assigns substantial
# probability to every species, so it is the least surprised by either of the
# other islands.
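# To make that explicit (a short sketch reusing the vectors defined above),
# group the divergences by which island serves as the model q: the average
# divergence is smallest when Island 1 is the predicting distribution.
mean(c(D_kl(Island_2, Island_1), D_kl(Island_3, Island_1)))  # Island 1 as model
mean(c(D_kl(Island_1, Island_2), D_kl(Island_3, Island_2)))  # Island 2 as model
mean(c(D_kl(Island_1, Island_3), D_kl(Island_2, Island_3)))  # Island 3 as model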
7-4. Recall the marriage, age, and happiness collider bias example from Chapter 6. Run models m6.9 and m6.10 again (page 178). Compare these two models using WAIC (or PSIS, they will produce identical results). Which model is expected to make better predictions? Which model provides the correct causal inference about the influence of age on happiness? Can you explain why the answers to these two questions disagree?
library(rethinking)
data <- sim_happiness(seed = 1977, N_years = 1000)
df <- data[data$age>17,] # only adults
df$A <- (df$age - 18) / (65 - 18)  # rescale adult ages to the interval [0, 1]
df$mid <- df$married + 1           # marriage status index: 1 = single, 2 = married
# m6.9: happiness as a function of age, conditioning on marriage status (the collider)
model1 <- quap(
alist(
happiness ~ dnorm(mu, sigma),
mu <- a[mid] + bA*A,
a[mid] ~ dnorm(0, 1),
bA ~ dnorm(0, 2),
sigma ~ dexp(1)
) , data=df)
precis(model1, depth = 2)
## mean sd 5.5% 94.5%
## a[1] -0.2350877 0.06348986 -0.3365568 -0.1336186
## a[2] 1.2585517 0.08495989 1.1227694 1.3943340
## bA -0.7490274 0.11320112 -0.9299447 -0.5681102
## sigma 0.9897080 0.02255800 0.9536559 1.0257600
# m6.10: happiness as a function of age only, ignoring marriage status
model2 <- quap(
alist(
happiness ~ dnorm(mu, sigma),
mu <- a + bA*A,
a ~ dnorm(0, 1),
bA ~ dnorm(0, 2),
sigma ~ dexp(1)
) , data=df )
precis(model2)
## mean sd 5.5% 94.5%
## a 1.649248e-07 0.07675015 -0.1226614 0.1226617
## bA -2.728620e-07 0.13225976 -0.2113769 0.2113764
## sigma 1.213188e+00 0.02766080 1.1689803 1.2573949
compare(model1, model2)
## WAIC SE dWAIC dSE pWAIC weight
## model1 2713.971 37.54465 0.0000 NA 3.738532 1.000000e+00
## model2 3101.906 27.74379 387.9347 35.40032 2.340445 5.768312e-85
# By WAIC, model1 (m6.9) is expected to make better out-of-sample predictions
# than model2 (m6.10). But model2 provides the correct causal inference: in the
# simulation, age has no influence on happiness. The answers disagree because
# model1 conditions on marriage status, which is a collider of age and
# happiness. Conditioning on the collider opens a non-causal path and creates a
# spurious association between age and happiness. That association is real in
# the data and helps prediction, but it is not causal, so the better-predicting
# model is the misleading one for causal inference.
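# To check the claim that PSIS gives essentially the same answer, compare() can
# be given an alternative criterion function (a sketch; recent versions of
# rethinking accept the func argument).
compare(model1, model2, func = PSIS)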
7-5. Revisit the urban fox data, data(foxes), from the previous chapter’s practice problems. Use WAIC or PSIS based model comparison on five different models, each using weight as the outcome, and containing these sets of predictor variables:

1. avgfood + groupsize + area
2. avgfood + groupsize
3. groupsize + area
4. avgfood
5. area
Can you explain the relative differences in WAIC scores, using the fox DAG from the previous chapter? Be sure to pay attention to the standard error of the score differences (dSE).
data(foxes)
fox_dat <- foxes[,-1] %>%   # drop the group column
  as_tibble() %>%
  mutate(across(everything(), standardize))
# Prior predictive simulation: plot the regression lines of (standardized)
# weight on area implied by 100 draws from the priors
set.seed(12)
N <- 100
a <- rnorm(N, 0, 0.3)
b <- rnorm(N, 0, 0.5)
plot(NULL, xlim = range(fox_dat$area), ylim = c(-4, 4),
     xlab = "area (std)", ylab = "weight (std)")
xbar<-mean(fox_dat$area)
for(i in 1:N) curve(a[i] + b[i]*(x- xbar),
from=min(fox_dat$area), to = max(fox_dat$area), add = TRUE,
col = col.alpha("black",0.2)
)
# 1 avgFood + Groupsize + area
model_1 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + b_avgfood*avgfood + b_groupsize*groupsize + b_area*area,
a ~ dnorm(0,3),
b_avgfood ~ dnorm(0,5),
b_groupsize ~ dnorm(0,5),
b_area ~ dnorm(0,5),
sigma ~ dexp(1)
), data = fox_dat
)
# 2 avgFood + Groupsize
model_2 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + b_avgfood*avgfood + b_groupsize*groupsize,
a ~ dnorm(0,3),
b_avgfood ~ dnorm(0,5),
b_groupsize ~ dnorm(0,5),
sigma ~ dexp(1)
), data = fox_dat
)
# 3 Groupsize and area
model_3 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + b_area*area + b_groupsize*groupsize,
a ~ dnorm(0,3),
b_area ~ dnorm(0,5),
b_groupsize ~ dnorm(0,5),
sigma ~ dexp(1)
), data = fox_dat
)
#4 avgFood
model_4 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + b_avgfood*avgfood,
a ~ dnorm(0,3),
b_avgfood ~ dnorm(0,5),
sigma ~ dexp(1)
), data = fox_dat
)
#5 Area
model_5 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + b_area*area,
a ~ dnorm(0,3),
b_area ~ dnorm(0,5),
sigma ~ dexp(1)
), data = fox_dat
)
compare(model_1, model_2, model_3, model_4, model_5)
## WAIC SE dWAIC dSE pWAIC weight
## model_2 324.1728 16.78047 0.00000000 NA 4.212216 0.342314447
## model_3 324.1861 16.10454 0.01327581 7.253697 4.004111 0.340049721
## model_1 324.3567 16.98640 0.18388907 3.850939 5.679543 0.312244088
## model_4 333.6923 13.76667 9.51950837 8.425137 2.546818 0.002932853
## model_5 334.0448 13.67321 9.87203871 8.645791 2.792357 0.002458891
plot(compare(model_1, model_2, model_3, model_4, model_5))
# In the fox DAG (area -> avgfood -> groupsize -> weight, plus avgfood -> weight),
# area influences weight only through avgfood. model_1, model_2, and model_3 all
# contain groupsize together with avgfood and/or area, so they carry essentially
# the same information; their WAIC scores are nearly identical and the dWAIC
# values are far smaller than the corresponding dSE.
# model_4 and model_5 contain only avgfood or only area. Because area acts only
# through avgfood, these two also carry roughly the same information; and
# without groupsize in the model, the positive path through food and the
# negative path through group size largely cancel, so both predict worse. Even
# so, their dWAIC of about 9.5-9.9 is only slightly larger than the dSE of
# about 8.4-8.6, so the advantage of the first three models is modest.
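# For reference, a sketch of the fox DAG assumed above, drawn with dagitty (the
# edge list follows the DAG used in the previous chapter's problems).
library(dagitty)
fox_dag <- dagitty("dag{
  area -> avgfood
  avgfood -> groupsize
  avgfood -> weight
  groupsize -> weight
}")
drawdag(fox_dag)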