Assignment #6

Questions

7E1. State the three motivating criteria that define information entropy. Try to express each in your own words.

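#1) The measure of uncertainty should be continuous, so that a small change in any probability produces only a small change in uncertainty.
#2) The measure should increase with the number of possible outcomes, so that it captures the size of the possibility space.
#3) The measure should be additive: the uncertainty of a combination of independent events equals the sum of their separate uncertainties, no matter how the events are divided.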
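
# The additivity criterion can be checked numerically. The following is a minimal sketch with two made-up independent events (the probabilities 0.7/0.3 and 0.4/0.6 are illustrative assumptions, not part of the exercise): the entropy of their joint distribution should equal the sum of the two separate entropies.

```r
# Entropy in nats of a probability vector (assumes every p > 0)
entropy <- function(p) -sum(p * log(p))

p_a <- c(0.7, 0.3)  # hypothetical event A
p_b <- c(0.4, 0.6)  # hypothetical event B

# Joint distribution of two independent events: outer product of the marginals
p_joint <- as.vector(outer(p_a, p_b))

H_sum   <- entropy(p_a) + entropy(p_b)
H_joint <- entropy(p_joint)

# TRUE when the joint entropy equals the sum of the separate entropies
all.equal(H_joint, H_sum)
```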

7E2. Suppose a coin is weighted such that, when it is tossed and lands on a table, it comes up heads 70% of the time. What is the entropy of this coin?

library(rethinking)
data(Howell1)
p <- c(0.7, 1 - 0.7)
(H <- -sum(p * log(p)))

## [1] 0.6108643

# Entropy: 0.6108643

7E3. Suppose a four-sided die is loaded such that, when tossed onto a table, it shows “1” 20%, “2” 25%, “3” 25%, and “4” 30% of the time. What is the entropy of this die?

p <- c(0.20, 0.25, 0.25, 0.30)
(H <- -sum(p * log(p)))

## [1] 1.376227

# Entropy: 1.376227

7E4. Suppose another four-sided die is loaded such that it never shows “4”. The other three sides show equally often. What is the entropy of this die?

p <- c(1/3, 1/3, 1/3)
(H <- -sum(p * log(p)))

## [1] 1.098612

# Entropy: 1.098612
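
# Side note on 7E4: if the impossible side "4" is kept in the probability vector, p * log(p) evaluates to 0 * -Inf, which is NaN in R. The sketch below uses a small helper (the name entropy_nats is a hypothetical choice) that drops zero-probability outcomes, following the convention that 0 * log(0) = 0.

```r
# Entropy in nats, ignoring outcomes with zero probability
entropy_nats <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

p_full <- c(1/3, 1/3, 1/3, 0)   # all four sides, with "4" impossible

-sum(p_full * log(p_full))      # NaN, because 0 * log(0) is NaN in R
entropy_nats(p_full)            # equals log(3)
```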

7M1. Write down and compare the definitions of AIC and WAIC. Which of these criteria is most general? Which assumptions are required to transform the more general criterion into a less general one?

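#AIC = D_train + 2p, where
#D_train = in-sample training deviance and
#p = number of free parameters estimated in the model.

#WAIC is defined as −2(lppd − pWAIC), where
#lppd = sum over observations i of log Pr(yi), with Pr(yi) = average likelihood of observation i in the training sample (averaged over the posterior), and
#pWAIC = sum over observations i of V(yi), with V(yi) = variance in log-likelihood for observation i in the training sample.

# WAIC is therefore the more general criterion. It reduces to AIC when the priors are flat (or overwhelmed by the likelihood), the posterior distribution is approximately multivariate Gaussian, and the sample size is much greater than the number of parameters.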
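
# To make the two definitions concrete, here is a minimal base-R sketch that computes both criteria by hand. Everything in it is an assumption made for illustration: the data are simulated from Normal(0, 1), sigma is treated as known, and the prior on mu is flat, so the posterior of mu is Normal(mean(y), 1/sqrt(n)). Under these conditions (flat prior, Gaussian posterior, n much larger than p) the two values should come out close to each other.

```r
set.seed(7)
n <- 200
y <- rnorm(n, mean = 0, sd = 1)   # simulated data; sigma = 1 is treated as known

# AIC: training deviance at the maximum-likelihood estimate plus 2 * p, with p = 1 (mu)
D_train <- -2 * sum(dnorm(y, mean = mean(y), sd = 1, log = TRUE))
aic <- D_train + 2 * 1

# Flat prior on mu with known sigma: posterior of mu is Normal(mean(y), 1 / sqrt(n))
S <- 4000
mu_s <- rnorm(S, mean = mean(y), sd = 1 / sqrt(n))

# Log-likelihood matrix: rows = posterior samples, columns = observations
ll <- sapply(y, function(yi) dnorm(yi, mean = mu_s, sd = 1, log = TRUE))

lppd  <- sum(log(colMeans(exp(ll))))   # log of the average likelihood, summed over i
pwaic <- sum(apply(ll, 2, var))        # variance in log-likelihood, summed over i
waic  <- -2 * (lppd - pwaic)

c(AIC = aic, WAIC = waic, pWAIC = pwaic)
```

# With one free parameter, pWAIC should land near 1 here, which is the role that p plays in AIC.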

7M2. Explain the difference between model selection and model comparison. What information is lost under model selection?

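#In model selection, we pick the model with the best (lowest) information criterion value and discard the others.
#In model comparison, information criterion values such as DIC or WAIC are examined across all models, together with the structure of each model, to understand how different variables influence predictions. This helps us understand causal relationships and identify confounds in the different models.
#Model selection loses the information about relative model accuracy contained in the differences among the criterion values (and their standard errors), which tells us how confident we can be in any one model.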
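
# A small sketch of what selection throws away. The three WAIC values below are hypothetical (made up for illustration, not results from this assignment). Selection keeps only the name of the top model, while comparison keeps the differences and the relative weights, computed here as exp(-dWAIC / 2) rescaled to sum to one.

```r
waic <- c(m1 = 100.0, m2 = 101.2, m3 = 115.0)  # hypothetical values

# Model selection: only the winner survives
best <- names(which.min(waic))

# Model comparison: differences from the best model and relative weights
dwaic  <- waic - min(waic)
weight <- exp(-0.5 * dwaic) / sum(exp(-0.5 * dwaic))

best
round(cbind(WAIC = waic, dWAIC = dwaic, weight = weight), 3)
```

# In this made-up case m1 and m2 are nearly indistinguishable while m3 is clearly worse. Reporting only "m1" hides both facts.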

7M3. When comparing models with an information criterion, why must all models be fit to exactly the same observations? What would happen to the information criterion values, if the models were fit to different numbers of observations? Perform some experiments, if you are not sure.

str(Howell1)

## 'data.frame':    544 obs. of  4 variables:
##  $ height: num  152 140 137 157 145 ...
##  $ weight: num  47.8 36.5 31.9 53 41.3 ...
##  $ age   : num  63 63 65 41 51 35 32 27 19 54 ...
##  $ male  : int  1 0 0 1 0 1 0 1 0 1 ...

d <- Howell1[complete.cases(Howell1), ]
d_500 <- d[sample(1:nrow(d), size = 500, replace = FALSE), ]
d_400 <- d[sample(1:nrow(d), size = 400, replace = FALSE), ]
d_300 <- d[sample(1:nrow(d), size = 300, replace = FALSE), ]
m_500 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * log(weight)
  ),
  data = d_500,
  start = list(a = mean(d_500$height), b = 0, sigma = sd(d_500$height))
)
m_400 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * log(weight)
  ),
  data = d_400,
  start = list(a = mean(d_400$height), b = 0, sigma = sd(d_400$height))
)
m_300 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * log(weight)
  ),
  data = d_300,
  start = list(a = mean(d_300$height), b = 0, sigma = sd(d_300$height))
)
(model.compare <- compare(m_500, m_400, m_300))

## Warning in compare(m_500, m_400, m_300): Different numbers of observations found for at least two models.
## Model comparison is valid only for models fit to exactly the same observations.
## Number of observations for each model:
## m_500 500 
## m_400 400 
## m_300 300

## Warning in ic_ptw1 - ic_ptw2: longer object length is not a multiple of shorter
## object length

## Warning in ic_ptw1 - ic_ptw2: longer object length is not a multiple of shorter
## object length

## Warning in ic_ptw1 - ic_ptw2: longer object length is not a multiple of shorter
## object length

##           WAIC       SE     dWAIC      dSE    pWAIC        weight
## m_300 1845.814 29.21955    0.0000       NA 3.551442  1.000000e+00
## m_400 2431.522 27.47776  585.7087 42.92657 2.981206 6.530843e-128
## m_500 3062.800 35.60809 1216.9864 54.34028 3.306391 5.429541e-265

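# WAIC increases with the number of observations, because deviance is a sum over observations. Models fit to different numbers of observations therefore cannot be compared: the model fit to the fewest observations (m_300) looks best only because it has fewer observations to predict.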
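
# One way to see this in the output above is to divide each WAIC by its sample size. The sketch below copies the WAIC values and observation counts from the compare() table. All three fits use the same model formula, so the per-observation values should be nearly identical, and the large dWAIC values reflect sample size rather than model quality.

```r
# WAIC values and sample sizes copied from the compare() output above
waic <- c(m_300 = 1845.814, m_400 = 2431.522, m_500 = 3062.800)
n    <- c(m_300 = 300,      m_400 = 400,      m_500 = 500)

# WAIC per observation: roughly the same for all three fits
round(waic / n, 3)
```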

7M4. What happens to the effective number of parameters, as measured by PSIS or WAIC, as a prior becomes more concentrated? Why? Perform some experiments, if you are not sure.

d <- Howell1[complete.cases(Howell1), ]
d$height.log <- log(d$height)
d$height.log.z <- (d$height.log - mean(d$height.log)) / sd(d$height.log)
d$weight.log <- log(d$weight)
d$weight.log.z <- (d$weight.log - mean(d$weight.log)) / sd(d$weight.log)
m_wide <- map(
  alist(
    height.log.z ~ dnorm(mu, sigma),
    mu <- a + b * weight.log.z,
    a ~ dnorm(0, 10),
    b ~ dnorm(1, 10),
    sigma ~ dunif(0, 10)
  ),
  data = d
)
m_narrow <- map(
  alist(
    height.log.z ~ dnorm(mu, sigma),
    mu <- a + b * weight.log.z,
    a ~ dnorm(0, 0.10),
    b ~ dnorm(1, 0.10),
    sigma ~ dunif(0, 1)
  ),
  data = d
)
WAIC(m_wide, refresh = 0)

##        WAIC     lppd  penalty  std_err
## 1 -102.5687 55.66558 4.381244 36.36683

WAIC(m_narrow, refresh = 0)

##        WAIC    lppd  penalty  std_err
## 1 -103.0307 55.5743 4.058923 36.30458

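# As priors become more concentrated, pWAIC (the "penalty" column) decreases, here from 4.38 to 4.06. A more concentrated prior constrains the posterior, so the log-likelihood of each observation varies less across the posterior, and pWAIC is the sum of those variances.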
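
# The same effect can be reproduced without rethinking. This is a minimal base-R sketch under assumptions chosen for illustration: simulated data, sigma treated as known, and a Normal(0, prior_sd) prior on mu so that the posterior is available in closed form. The helper name pwaic_for_prior is hypothetical. A narrower prior shrinks the posterior variance, so the log-likelihoods vary less and pWAIC drops.

```r
set.seed(11)
n <- 20
y <- rnorm(n, mean = 0, sd = 1)   # simulated data; sigma = 1 is treated as known

# pWAIC for the model y ~ Normal(mu, 1) with prior mu ~ Normal(0, prior_sd),
# sampling mu from its conjugate (exact) posterior
pwaic_for_prior <- function(prior_sd, S = 4000) {
  post_prec <- n + 1 / prior_sd^2          # posterior precision
  post_mean <- sum(y) / post_prec          # prior mean is 0
  mu_s <- rnorm(S, mean = post_mean, sd = sqrt(1 / post_prec))
  # Rows = posterior samples, columns = observations
  ll <- sapply(y, function(yi) dnorm(yi, mean = mu_s, sd = 1, log = TRUE))
  sum(apply(ll, 2, var))
}

sapply(c(wide = 10, narrow = 0.1), pwaic_for_prior)
```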

7M5. Provide an informal explanation of why informative priors reduce overfitting.

# Informative priors constrain the flexibility of the model: they make it less likely for extreme parameter values to be assigned high posterior probability. They reduce overfitting by forcing the model to learn less from the sample data.

7M6. Provide an informal explanation of why overly informative priors result in underfitting.

# Overly informative priors constrain the flexibility of the model too much. They make it less likely for "correct" parameter values to be assigned high posterior probability, so the model learns too little from the sample data.
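
# Both answers can be illustrated with one small sketch. It assumes simulated data with true mean 2, sigma treated as known, and priors on mu centered at 0 (all illustrative choices, as is the helper name post_mean). The posterior mean is the conjugate-normal result: the tighter the prior, the more the estimate is pulled from the sample mean toward the prior mean.

```r
set.seed(3)
n <- 10
y <- rnorm(n, mean = 2, sd = 1)   # simulated data; true mean 2, sigma = 1 treated as known

# Posterior mean of mu under the prior mu ~ Normal(0, prior_sd) (conjugate result)
post_mean <- function(prior_sd) sum(y) / (n + 1 / prior_sd^2)

prior_sd <- c(flat_ish = 100, informative = 0.5, overly_informative = 0.05)
round(c(sample_mean = mean(y), sapply(prior_sd, post_mean)), 3)
```

# The nearly flat prior reproduces the sample mean, noise included. The moderately informative prior shrinks the estimate somewhat, which is the mechanism that guards against overfitting. The overly informative prior leaves the estimate close to 0, almost ignoring the data, which is underfitting.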


Si Min Conny Chan

2021-07-12

Chapter 7 - Ulysses’ Compass
