ST406 - Bayesian LRM

P.I.N.Kehelbedda

2021-01-14

1 Baye’s Theorem

\[\overbrace{p(\theta/D)}^{Posterior}=\frac{\overbrace{p(D/\theta)}^{Likelihood}.\overbrace{p(\theta)}^{Prior}}{\underbrace{p(D)}_{Evidence}}\]

2 Data

SolarRadPrediction <- read_csv("Data/SolarRadPrediction.csv", 
    col_types = cols(Data = col_datetime(format = "%m/%d/%Y %H:%M:%S %p"), 
        Time = col_time(format = "%H:%M:%S"), 
        TimeSunRise = col_time(format = "%H:%M:%S"), 
        TimeSunSet = col_time(format = "%H:%M:%S")))

df <- SolarRadPrediction[, -1]
colnames(df)[1] <- "Date"
colnames(df)[7] <- "WindDirection"
colnames(df)[8] <- "WindSpeed"

## Select numerical data for only model fitting part
df1 <- df %>%
  select(Radiation, Temperature, Pressure, Humidity, WindDirection, WindSpeed)
summary(df1)
##    Radiation        Temperature      Pressure        Humidity     
##  Min.   :   1.11   Min.   :34.0   Min.   :30.19   Min.   :  8.00  
##  1st Qu.:   1.23   1st Qu.:46.0   1st Qu.:30.40   1st Qu.: 56.00  
##  Median :   2.66   Median :50.0   Median :30.43   Median : 85.00  
##  Mean   : 207.12   Mean   :51.1   Mean   :30.42   Mean   : 75.02  
##  3rd Qu.: 354.24   3rd Qu.:55.0   3rd Qu.:30.46   3rd Qu.: 97.00  
##  Max.   :1601.26   Max.   :71.0   Max.   :30.56   Max.   :103.00  
##  WindDirection      WindSpeed     
##  Min.   :  0.09   Min.   : 0.000  
##  1st Qu.: 82.23   1st Qu.: 3.370  
##  Median :147.70   Median : 5.620  
##  Mean   :143.49   Mean   : 6.244  
##  3rd Qu.:179.31   3rd Qu.: 7.870  
##  Max.   :359.95   Max.   :40.500
str(df1)
## tibble [32,686 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Radiation    : num [1:32686] 1.21 1.21 1.23 1.21 1.17 1.21 1.2 1.24 1.23 1.21 ...
##  $ Temperature  : num [1:32686] 48 48 48 48 48 48 49 49 49 49 ...
##  $ Pressure     : num [1:32686] 30.5 30.5 30.5 30.5 30.5 ...
##  $ Humidity     : num [1:32686] 59 58 57 60 62 64 72 71 80 85 ...
##  $ WindDirection: num [1:32686] 177 177 159 138 105 ...
##  $ WindSpeed    : num [1:32686] 5.62 3.37 3.37 3.37 5.62 5.62 6.75 5.62 4.5 4.5 ...
  • Looking at from the summary data we can see that there is no missing value problem for the data set.

3 Model Fitting

  • From our data we are going to fit a model to predict Radiation. So we take Radiation as out dependent variable and others as independent variables.

3.1 Linear Regression Model

#pairs(df1)
model_freq <- lm(Radiation ~ ., data = df1)
summary(model_freq)
## 
## Call:
## lm(formula = Radiation ~ ., data = df1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -752.37 -132.65  -20.19  102.98 1121.15 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.098e+04  6.959e+02  30.153  < 2e-16 ***
## Temperature    3.837e+01  2.064e-01 185.930  < 2e-16 ***
## Pressure      -7.471e+02  2.290e+01 -32.622  < 2e-16 ***
## Humidity      -2.691e-01  4.844e-02  -5.555 2.79e-08 ***
## WindDirection -2.694e-01  1.462e-02 -18.432  < 2e-16 ***
## WindSpeed      7.875e+00  3.416e-01  23.053  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 208.2 on 32680 degrees of freedom
## Multiple R-squared:  0.5659, Adjusted R-squared:  0.5659 
## F-statistic:  8522 on 5 and 32680 DF,  p-value: < 2.2e-16
tidy(model_freq)
term estimate std.error statistic p.value
(Intercept) 20983.8434973 695.9213762 30.152607 0
Temperature 38.3721364 0.2063796 185.929918 0
Pressure -747.0686858 22.9010571 -32.621581 0
Humidity -0.2691005 0.0484402 -5.555311 0
WindDirection -0.2694445 0.0146181 -18.432218 0
WindSpeed 7.8748844 0.3416022 23.052793 0
  • From the model summary we can see all the regressors are significant (by p-value).

3.1.1 Model Selection

step(lm(Radiation ~ 1, data = df1), 
     direction = "forward", 
     scope = ~ Temperature+Pressure+Humidity+WindDirection+WindSpeed)
## Start:  AIC=376248.1
## Radiation ~ 1
## 
##                 Df  Sum of Sq        RSS    AIC
## + Temperature    1 1762032872 1500033533 350857
## + WindDirection  1  173049157 3089017249 374468
## + Humidity       1  166865010 3095201395 374534
## + Pressure       1   46206279 3215860126 375784
## + WindSpeed      1   17683387 3244383019 376072
## <none>                        3262066405 376248
## 
## Step:  AIC=350857.2
## Radiation ~ Temperature
## 
##                 Df Sum of Sq        RSS    AIC
## + Pressure       1  43450981 1456582552 349898
## + WindSpeed      1  30563243 1469470290 350186
## + WindDirection  1   5501379 1494532154 350739
## + Humidity       1    986446 1499047087 350838
## <none>                       1500033533 350857
## 
## Step:  AIC=349898.4
## Radiation ~ Temperature + Pressure
## 
##                 Df Sum of Sq        RSS    AIC
## + WindSpeed      1  25312425 1431270128 349327
## + WindDirection  1  11943121 1444639431 349631
## + Humidity       1   3976752 1452605800 349811
## <none>                       1456582552 349898
## 
## Step:  AIC=349327.4
## Radiation ~ Temperature + Pressure + WindSpeed
## 
##                 Df Sum of Sq        RSS    AIC
## + WindDirection  1  13998169 1417271959 349008
## + Humidity       1    615021 1430655107 349315
## <none>                       1431270128 349327
## 
## Step:  AIC=349008.2
## Radiation ~ Temperature + Pressure + WindSpeed + WindDirection
## 
##            Df Sum of Sq        RSS    AIC
## + Humidity  1   1337143 1415934816 348979
## <none>                  1417271959 349008
## 
## Step:  AIC=348979.3
## Radiation ~ Temperature + Pressure + WindSpeed + WindDirection + 
##     Humidity
## 
## Call:
## lm(formula = Radiation ~ Temperature + Pressure + WindSpeed + 
##     WindDirection + Humidity, data = df1)
## 
## Coefficients:
##   (Intercept)    Temperature       Pressure      WindSpeed  WindDirection  
##    20983.8435        38.3721      -747.0687         7.8749        -0.2694  
##      Humidity  
##       -0.2691
  • From the model selection choose the lowest AIC value model. So the full model is the best model.

3.1.2 Linear Regression Assumption check ?

par(mfrow = c(2, 2))
plot(model_freq)

  • From above fitted plots we can convence that the full model has satisfied the assumptions.

3.2 Bayesian Regression

  • Here we use the function stan_glm from the rstanarm package.
  • In this function,
    • family: default use gaussian distribution.
    • prior: default normal prior is used. If we need a flat uniform prior we put it to be NULL.
    • prior_intercept: can be normal, student_t, or cauchy. Here also if we need a flat uniform prior we put it to be NULL.
# Fitting bayesian model & remember 1234
bayesian_model <- stan_glm(Radiation ~ ., data = df1, seed = 1234) 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.121 seconds (Warm-up)
## Chain 1:                3.007 seconds (Sampling)
## Chain 1:                3.128 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.168 seconds (Warm-up)
## Chain 2:                3 seconds (Sampling)
## Chain 2:                3.168 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.142 seconds (Warm-up)
## Chain 3:                2.998 seconds (Sampling)
## Chain 3:                3.14 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.135 seconds (Warm-up)
## Chain 4:                2.985 seconds (Sampling)
## Chain 4:                3.12 seconds (Total)
## Chain 4:
#summary(bayesian_model)
print(bayesian_model, digits = 3)
## stan_glm
##  family:       gaussian [identity]
##  formula:      Radiation ~ .
##  observations: 32686
##  predictors:   6
## ------
##               Median    MAD_SD   
## (Intercept)   20998.757   704.571
## Temperature      38.373     0.211
## Pressure       -747.457    23.291
## Humidity         -0.269     0.050
## WindDirection    -0.270     0.014
## WindSpeed         7.863     0.338
## 
## Auxiliary parameter(s):
##       Median  MAD_SD 
## sigma 208.152   0.788
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
  • From this output, Median estimator is the median computed from the MCMC simulation.
  • Also MAD_SD is median absolute deviation computed from the same simulation.
  • To know more we plot bayesian plots.
# Temperature
p1 <- mcmc_dens(x = bayesian_model, pars = c("Temperature")) +
  vline_at(v = 38.373, col = "red")
# Pressure
p2 <- mcmc_dens(x = bayesian_model, pars = c("Pressure")) +
  vline_at(v = -747.457, col = "red")
# Humidity
p3 <- mcmc_dens(x = bayesian_model, pars = c("Humidity")) +
  vline_at(v = -0.269, col = "red")
# WindDirection
p4 <- mcmc_dens(x = bayesian_model, pars = c("WindDirection")) +
  vline_at(v = -0.270, col = "red")
# WindSpeed
p5 <- mcmc_dens(x = bayesian_model, pars = c("WindSpeed")) +
  vline_at(v = 7.863, col = "red")

ggarrange(p1,p2,p3,p4,p5)

  • Point estimates of variables falls on the median of this distribution.

3.2.1 Evaluate Model Parameter

describe_posterior(bayesian_model)
Parameter Median CI CI_low CI_high pd ROPE_CI ROPE_low ROPE_high ROPE_Percentage Rhat ESS
1 (Intercept) 20998.7566265 89 19939.2044911 22192.5496212 1 89 -31.59164 31.59164 0 0.9995457 4587.185
4 Temperature 38.3730415 89 38.0401060 38.7029959 1 89 -31.59164 31.59164 0 0.9992385 4383.438
3 Pressure -747.4572219 89 -786.9067070 -712.7380389 1 89 -31.59164 31.59164 0 0.9995353 4597.143
2 Humidity -0.2692544 89 -0.3420957 -0.1841775 1 89 -31.59164 31.59164 1 0.9992849 4391.914
5 WindDirection -0.2698797 89 -0.2924052 -0.2459000 1 89 -31.59164 31.59164 1 1.0001522 4880.826
6 WindSpeed 7.8633510 89 7.3266028 8.4053666 1 89 -31.59164 31.59164 1 0.9995283 5318.336
  • In above description,
    • Median: Median estimator is the median computed from the MCMC simulation.
    • 89% CI: Creadible Interval, used to quantify the uncertainty about the regression coefficients. with 89% probability (given the data) that a coefficient lies above the CI_low value and under CI_high value.
    • pd: Probability of Direction, which is the probability that the effect goes to the positive or to the negative direction, and it is considered as the best equivalent for the p-value.
    • 89% ROPE: Region of Practice Equavilance.
    • Rhat: scale reduction factor \(\hat R\).
    • ESS: effective sample size.
poster <- get_parameters(bayesian_model)
print(purrr::map_dbl(poster,median),digits = 3)
##   (Intercept)   Temperature      Pressure      Humidity WindDirection 
##     20998.757        38.373      -747.457        -0.269        -0.270 
##     WindSpeed 
##         7.863
print(purrr::map_dbl(poster, map_estimate),digits = 3)
##   (Intercept)   Temperature      Pressure      Humidity WindDirection 
##     20973.002        38.374      -746.499        -0.272        -0.271 
##     WindSpeed 
##         7.841
print(purrr::map_dbl(poster, mean),digits = 3)
##   (Intercept)   Temperature      Pressure      Humidity WindDirection 
##     20996.666        38.373      -747.488        -0.269        -0.270 
##     WindSpeed 
##         7.869
  • As we see the values are closer to each other due to the like normality of the distribution of the posteriors where all the central statistics (mean, median, mode) are closer to each other.
p1 <- mcmc_dens(bayesian_model, pars=c("Temperature"))+
  vline_at(median(poster$Temperature), col="red")+
  vline_at(mean(poster$Temperature), col="yellow")+
  vline_at(map_estimate(poster$Temperature), col="green")

p2 <- mcmc_dens(bayesian_model, pars=c("Pressure"))+
  vline_at(median(poster$Pressure), col="red")+
  vline_at(mean(poster$Pressure), col="yellow")+
  vline_at(map_estimate(poster$Pressure), col="green")

p3 <- mcmc_dens(bayesian_model, pars=c("Humidity"))+
  vline_at(median(poster$Humidity), col="red")+
  vline_at(mean(poster$Humidity), col="yellow")+
  vline_at(map_estimate(poster$Humidity), col="green")

p4 <- mcmc_dens(bayesian_model, pars=c("WindDirection"))+
  vline_at(median(poster$WindDirection), col="red")+
  vline_at(mean(poster$WindDirection), col="yellow")+
  vline_at(map_estimate(poster$WindDirection), col="green")

p5 <- mcmc_dens(bayesian_model, pars=c("WindSpeed"))+
  vline_at(median(poster$WindSpeed), col="red")+
  vline_at(mean(poster$WindSpeed), col="yellow")+
  vline_at(map_estimate(poster$WindSpeed), col="green")

ggarrange(p1,p2,p3,p4,p5)

  • As expected they are approximately on top of each other.

3.3 Bayesian Inferences

  • We need to check the significance of Bayesian Regression coefficients.
  • That is done by checking whether the corresponding credible interval contains zero or not, if no then this coefficient is significant.
  • Now let’s see significance of out model coefficients.
hdi(bayesian_model)
Parameter CI CI_low CI_high Effects Component
1 (Intercept) 89 19939.2044911 22192.5496212 fixed conditional
4 Temperature 89 38.0401060 38.7029959 fixed conditional
3 Pressure 89 -786.9067070 -712.7380389 fixed conditional
2 Humidity 89 -0.3420957 -0.1841775 fixed conditional
5 WindDirection 89 -0.2924052 -0.2459000 fixed conditional
6 WindSpeed 89 7.3266028 8.4053666 fixed conditional
eti(bayesian_model)
Parameter CI CI_low CI_high Effects Component
1 (Intercept) 89 19863.9719033 22132.6123101 fixed conditional
4 Temperature 89 38.0404871 38.7037444 fixed conditional
3 Pressure 89 -784.6436591 -710.2065951 fixed conditional
2 Humidity 89 -0.3492317 -0.1898372 fixed conditional
5 WindDirection 89 -0.2929182 -0.2463227 fixed conditional
6 WindSpeed 89 7.3341335 8.4143058 fixed conditional
  • From the both result we can see all the coefficients are significant.

  • NOTE: We got the satisfied results duo to the normal prior assumption. But in real world it is less often to be sure about the normality assumption.

  • Another way of testing significance is it is less often to be sure about the normality assumption.

rope(poster$Temperature)
CI ROPE_low ROPE_high ROPE_Percentage
89 -0.1 0.1 0
rope(poster$Pressure)
CI ROPE_low ROPE_high ROPE_Percentage
89 -0.1 0.1 0
rope(poster$Humidity)
CI ROPE_low ROPE_high ROPE_Percentage
89 -0.1 0.1 0
rope(poster$WindDirection)
CI ROPE_low ROPE_high ROPE_Percentage
89 -0.1 0.1 0
rope(poster$WindSpeed)
CI ROPE_low ROPE_high ROPE_Percentage
89 -0.1 0.1 0
rope(poster$`(Intercept)`)
CI ROPE_low ROPE_high ROPE_Percentage
89 -0.1 0.1 0
rope_range(bayesian_model)
## [1] -31.59164  31.59164
  • For all the variables including intercept, almost all the credible interval (HDI) is outside the ROPE range, which means that coefficient is highly significant.

3.4 PD and P-value

  • Interest in checking the direction of the coefficient.
  • This done using pd statistic.
  • If the pd is high value means that the associated effect is concentrated on the same side as the median.
  • In our model almost all the posteriors are on same side.
df2 <- select(tidy(model_freq), c(term,p.value))
df2$p.value <- round(df2$p.value, digits = 3)
df3 <- 1 - purrr::map_dbl(poster, p_direction)
df4 <- cbind(df2,df3)
df4
term p.value df3
(Intercept) (Intercept) 0 0
Temperature Temperature 0 0
Pressure Pressure 0 0
Humidity Humidity 0 0
WindDirection WindDirection 0 0
WindSpeed WindSpeed 0 0

4 Questions I got ???

  • When we fitting the bayesian model, how we put prior, family detail.
    • i.e. How we know about the family, prior details.