The chapter began with the problem of overfitting, a universal phenomenon by which models with more parameters fit a sample better, even when the additional parameters are meaningless. Two common tools were introduced to address overfitting: regularizing priors and estimates of out-of-sample accuracy (WAIC and PSIS). Regularizing priors reduce overfitting during estimation, while WAIC and PSIS help estimate the degree of overfitting. The practical `compare` function in the rethinking package was introduced to help analyze collections of models fit to the same data. If you are after causal estimates, these tools can mislead you, so models must be designed with the causal question in mind rather than selected on the basis of out-of-sample predictive accuracy. But any causal estimate will still overfit the sample, so you always have to worry about overfitting, measuring it with WAIC/PSIS and reducing it with regularization.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
7-1. When comparing models with an information criterion, why must all models be fit to exactly the same observations? What would happen to the information criterion values, if the models were fit to different numbers of observations? Perform some experiments.
#Information criteria are based on deviance, which is accrued over observations: it is a sum, not a mean.
#A model fit to more observations therefore tends to have a larger deviance and a larger WAIC simply because more terms are summed, not because it predicts each observation worse. This makes the criterion meaningless for contrasting models fit to different numbers of observations, so all models being compared must be fit to exactly the same observations.
library(rethinking)
data(Howell1)
str(Howell1)
## 'data.frame': 544 obs. of 4 variables:
## $ height: num 152 140 137 157 145 ...
## $ weight: num 47.8 36.5 31.9 53 41.3 ...
## $ age : num 63 63 65 41 51 35 32 27 19 54 ...
## $ male : int 1 0 0 1 0 1 0 1 0 1 ...
data = Howell1[complete.cases(Howell1), ]
# Draw random subsamples of 400, 300, and 200 observations (no seed is set, so exact draws vary between runs)
data1 = data[sample(1:nrow(data), size = 400, replace = FALSE), ]
data2 = data[sample(1:nrow(data), size = 300, replace = FALSE), ]
data3 = data[sample(1:nrow(data), size = 200, replace = FALSE), ]
# Fit the same model (height regressed on log weight) to each subsample
m1 = map(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b * log(weight)
),
data = data1,
start = list(a = mean(data1$height), b = 0, sigma = sd(data1$height))
)
m2 = map(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b * log(weight)
),
data = data2,
start = list(a = mean(data2$height), b = 0, sigma = sd(data2$height))
)
m3 = map(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b * log(weight)
),
data = data3,
start = list(a = mean(data3$height), b = 0, sigma = sd(data3$height))
)
#Compare the models
(model.compare = compare(m1, m2, m3))
## WAIC SE dWAIC dSE pWAIC weight
## m3 1221.562 20.14594 0.0000 NA 2.972003 1.000000e+00
## m2 1848.658 28.67660 627.0955 38.12319 3.343490 6.728735e-137
## m1 2419.567 33.22059 1198.0051 43.00450 3.342340 7.186257e-261
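#A quick check (a rough sketch, assuming m1-m3 and data1-data3 from above are still in
#memory): dividing each WAIC by its sample size gives roughly the same per-observation
#value, confirming that the large differences above come from the number of observations
#being summed over, not from differences in fit quality.
WAIC(m1)$WAIC / nrow(data1)
WAIC(m2)$WAIC / nrow(data2)
WAIC(m3)$WAIC / nrow(data3)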
7-2. What happens to the effective number of parameters, as measured by PSIS or WAIC, as a prior becomes more concentrated? Why? Perform some experiments.
# As the prior becomes more concentrated, the effective number of parameters (pWAIC) decreases. pWAIC is the sum, over observations, of the variance of the per-observation log-likelihood across posterior samples. A more concentrated prior constrains the posterior, so the log-likelihood of each observation varies less from sample to sample and the penalty shrinks.
#Experiment
data$height.log = log(data$height)
data$height.log.z = (data$height.log - mean(data$height.log)) / sd(data$height.log)
data$weight.log = log(data$weight)
data$weight.log.z = (data$weight.log - mean(data$weight.log)) / sd(data$weight.log)
# m1: broad priors on a and b
m1 = map(
alist(
height.log.z ~ dnorm(mu, sigma),
mu <- a + b * weight.log.z,
a ~ dnorm(0, 10),
b ~ dnorm(1, 10),
sigma ~ dunif(0, 10)
),
data = data
)
# m2: more concentrated priors on a and b, and a tighter uniform prior on sigma
m2 = map(
alist(
height.log.z ~ dnorm(mu, sigma),
mu <- a + b * weight.log.z,
a ~ dnorm(0, 0.10),
b ~ dnorm(1, 0.10),
sigma ~ dunif(0, 1)
),
data = data
)
WAIC(m1, refresh = 0)
## WAIC lppd penalty std_err
## 1 -102.8644 55.60839 4.176195 36.58277
WAIC(m2, refresh = 0)
## WAIC lppd penalty std_err
## 1 -102.8186 55.60479 4.195488 36.6638
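#To see what the penalty measures, here is a minimal by-hand sketch (assuming m2 and data
#from above are still in memory): for each observation, compute the variance across
#posterior samples of its log-likelihood; pWAIC is the sum of those variances.
post = extract.samples(m2, n = 1e3)
ll = sapply(1:nrow(data), function(i)
  dnorm(data$height.log.z[i],
        mean = post$a + post$b * data$weight.log.z[i],
        sd = post$sigma, log = TRUE))
sum(apply(ll, 2, var))  # should be close to the penalty reported by WAIC(m2)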
#In this particular run the penalty is essentially unchanged (about 4.2 in both models): with 544 observations, priors of width 10 and width 0.1 are both overwhelmed by the data. With far more concentrated priors (or far fewer observations), the pWAIC would visibly decrease, for the reason given above.
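#A further check (a sketch with a hypothetical model name, m3_tight, not fit above): with
#far more concentrated priors pinning a near 0 and b near 1, we would expect the shrinkage
#to show up clearly in the penalty.
m3_tight = map(
  alist(
    height.log.z ~ dnorm(mu, sigma),
    mu <- a + b * weight.log.z,
    a ~ dnorm(0, 0.01),
    b ~ dnorm(1, 0.01),
    sigma ~ dunif(0, 1)
  ),
  data = data
)
WAIC(m3_tight, refresh = 0)  # expect a smaller penalty (pWAIC) than in m1 and m2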
7-3. Consider three fictional Polynesian islands. On each there is a Royal Ornithologist charged by the king with surveying the bird population. They have each found the following proportions of 5 important bird species:
| Island | Species A | Species B | Species C | Species D | Species E |
|--------|-----------|-----------|-----------|-----------|-----------|
| 1      | 0.2       | 0.2       | 0.2       | 0.2       | 0.2       |
| 2      | 0.8       | 0.1       | 0.05      | 0.025     | 0.025     |
| 3      | 0.05      | 0.15      | 0.7       | 0.05      | 0.05      |
Notice that each row sums to 1, all the birds. This problem has two parts. It is not computationally complicated. But it is conceptually tricky. First, compute the entropy of each island’s bird distribution. Interpret these entropy values. Second, use each island’s bird distribution to predict the other two. This means to compute the KL divergence of each island from the others, treating each island as if it were a statistical model of the other islands. You should end up with 6 different KL divergence values. Which island predicts the others best? Why?
#Entropies: H(p) = -sum( p * log(p) )
#Bird-species proportions for each island
x = c(0.2, 0.2, 0.2, 0.2, 0.2)       # Island 1
y = c(0.8, 0.1, 0.05, 0.025, 0.025)  # Island 2
z = c(0.05, 0.15, 0.7, 0.05, 0.05)   # Island 3
x_entropy = -sum(x * log(x))
x_entropy
## [1] 1.609438
y_entropy=-sum(y * log(y))
y_entropy
## [1] 0.7430039
z_entropy=-sum(z * log(z))
z_entropy
## [1] 0.9836003
#The higher the entropy, the more evenly the probability is spread across species and the more uncertain we are about which species a randomly chosen bird belongs to. Island 1 (x) has the highest entropy because its distribution is uniform; Island 2 (y) has the lowest because most of its birds belong to a single species, so there is the least uncertainty about its bird population.
#KL divergences: D(p || q) = sum( p * log(p / q) ), the divergence when q is used as the
#model to predict the true distribution p.
kl = function(p, q) sum(p * log(p / q))
islands = list(island1 = x, island2 = y, island3 = z)
#divergence[i, j] = divergence when island j is used to predict island i
divergence = sapply(islands, function(q) sapply(islands, function(p) kl(p, q)))
round(divergence, 3)
##         island1 island2 island3
## island1   0.000   0.970   0.639
## island2   0.866   0.000   2.011
## island3   0.626   1.839   0.000
#Reading down each column (the island used as the model), island 1 gives the smallest
#divergences when predicting the other two (about 0.87 and 0.63 nats), so it predicts the
#others best. Because its distribution is uniform (highest entropy), it is the least
#surprised by, and loses the least information about, either of the other distributions.
7-4. Recall the marriage, age, and happiness collider bias example from Chapter 6. Run models m6.9 and m6.10 again (page 178). Compare these two models using WAIC (or PSIS, they will produce identical results). Which model is expected to make better predictions? Which model provides the correct causal inference about the influence of age on happiness? Can you explain why the answers to these two questions disagree?
## R code 6.21
d <- sim_happiness(seed = 1977, N_years = 1000)
## R code 6.22
d2 <- d[d$age > 17, ] # only adults
d2$A <- (d2$age - 18) / (65 - 18)
## R code 6.23
d2$mid <- d2$married + 1
m6.9 <- quap(
alist(
happiness ~ dnorm(mu, sigma),
mu <- a[mid] + bA * A,
a[mid] ~ dnorm(0, 1),
bA ~ dnorm(0, 2),
sigma ~ dexp(1)
),
data = d2
)
## R code 6.24
m6.10 <- quap(
alist(
happiness ~ dnorm(mu, sigma),
mu <- a + bA * A,
a ~ dnorm(0, 1),
bA ~ dnorm(0, 2),
sigma ~ dexp(1)
),
data = d2
)
## Comparison
compare(m6.9, m6.10)
## WAIC SE dWAIC dSE pWAIC weight
## m6.9 2713.971 37.54465 0.0000 NA 3.738532 1.000000e+00
## m6.10 3101.906 27.74379 387.9347 35.40032 2.340445 5.768312e-85
#Based on the comparison, m6.9 is expected to make better out-of-sample predictions: it has a much lower WAIC and receives essentially all of the model weight, because marriage status is strongly associated with both age and happiness and therefore genuinely helps prediction.
#However, m6.10 provides the correct causal inference about the influence of age on happiness. Marriage status is a collider of age and happiness, so conditioning on it in m6.9 opens a non-causal path and creates a spurious association between age and happiness. The answers disagree because predictive accuracy and causal validity are different goals: WAIC rewards any association that improves prediction, whether or not it is causal.
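#A quick way to see the collider bias directly (assuming m6.9 and m6.10 are still in
#memory) is to compare the posterior for bA in the two models: conditioning on the
#collider mid in m6.9 yields a clearly negative bA, while in m6.10 bA is centered near
#zero, matching the simulation in which age has no causal effect on happiness.
precis(m6.9, depth = 2)
precis(m6.10)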
7-5. Revisit the urban fox data, data(foxes), from the previous chapter’s practice problems. Use WAIC or PSIS based model comparison on five different models, each using weight as the outcome, and containing these sets of predictor variables: (1) avgfood + groupsize + area; (2) avgfood + groupsize; (3) groupsize + area; (4) avgfood; (5) area.
Can you explain the relative differences in WAIC scores, using the fox DAG from the previous chapter? Be sure to pay attention to the standard error of the score differences (dSE).
# data loading and prepping
data(foxes)
data = foxes
# Standardize all variables used in the models
data$area = scale( data$area )
data$avgfood = scale( data$avgfood )
data$groupsize = scale( data$groupsize )
data$weight = scale( data$weight )
# Model for `avgfood + groupsize + area`
m1 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + bFood * avgfood + bGroup * groupsize + bArea * area,
a ~ dnorm(0, .2),
c(bFood, bGroup, bArea) ~ dnorm(0, 5),
sigma ~ dexp(1)
),
data = data
)
# Model for `avgfood + groupsize`
m2 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + bFood * avgfood + bGroup * groupsize,
a ~ dnorm(0, .2),
c(bFood, bGroup) ~ dnorm(0, 5),
sigma ~ dexp(1)
),
data = data
)
# Model for `groupsize + area`
m3 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + bGroup * groupsize + bArea * area,
a ~ dnorm(0, .2),
c(bGroup, bArea) ~ dnorm(0, 5),
sigma ~ dexp(1)
),
data = data
)
# Model for `avgfood`
m4 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + bFood * avgfood,
a ~ dnorm(0, .2),
bFood ~ dnorm(0, 5),
sigma ~ dexp(1)
),
data = data
)
# Model for `area`
m5 = quap(
alist(
weight ~ dnorm(mu, sigma),
mu <- a + bArea * area,
a ~ dnorm(0, .2),
bArea ~ dnorm(0, 5),
sigma ~ dexp(1)
),
data = data
)
## Comparison
(model.compare=compare(m1, m2, m3, m4, m5))
## WAIC SE dWAIC dSE pWAIC weight
## m1 323.7983 16.03770 0.0000000 NA 3.961414 0.518084659
## m3 324.0666 16.18771 0.2682736 0.4015293 3.945774 0.453049710
## m2 330.4839 14.82790 6.6855673 6.5270897 3.105662 0.018308309
## m4 332.3905 13.78571 8.5922089 7.4951036 1.946376 0.007057100
## m5 333.7929 13.79707 9.9946241 7.4644674 2.684646 0.003500223
#DAG
library(dagitty)
library(tidygraph)
quest <- dagitty( "dag {area -> avgfood
avgfood -> groupsize
avgfood -> weight
groupsize -> weight
}")
coordinates (quest) <- list(x=c(area=1,avgfood=0,groupsize=2, weight=1),
y=c(area=0, avgfood=1, groupsize=1, weight=2))
drawdag(quest, lwd = 2)
#The WAIC differences (dWAIC) among the five models are small relative to their standard errors (dSE): even the largest difference, about 10, is well inside a 99% interval of zero (roughly 2.6 x dSE, about 19), so no model is decisively better out of sample.
#Models m1, m2, and m3 have essentially the same expected out-of-sample deviance. All three contain groupsize together with area, avgfood, or both, and according to the DAG the influence of area on weight is routed entirely through avgfood, so once groupsize is included, conditioning on area carries the same information as conditioning on avgfood.
#Models m4 and m5 are likewise nearly identical, because each contains only avgfood or area in isolation and all of area's influence on weight must pass through avgfood. They do slightly worse than the first three because, without groupsize in the model, the direct effect of food on weight and the opposing indirect effect through group size largely cancel, leaving little predictive signal.
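#As a sanity check on this reading of the DAG (a sketch using the dagitty package loaded
#above), we can ask for the testable implications and for the adjustment set needed to
#estimate the total effect of area on weight. It should report, among other things, that
#area is independent of weight given avgfood, and that no adjustment is needed for the
#total effect of area.
impliedConditionalIndependencies(quest)
adjustmentSets(quest, exposure = "area", outcome = "weight")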