Introduction to Project and Dataset


Project Overview

This project uses data collected in the Lake Tahoe Basin during the Jeffrey pine beetle outbreak. From 1991 to 1996, Jeffrey pine beetles (JPB) caused tree mortality throughout the Lake Tahoe Basin during a severe drought. The dataset follows 10,722 trees annually within a 60-acre study area in the basin and records patterns of JPB-caused mortality. This project uses the 'pine beetle' dataset to analyze and predict the minimum linear distance to the nearest brood tree (DeadDist).

Jeffrey Pine Beetle

Model Evaluation

The pine beetle dataset was used to predict the minimum distance to the nearest brood tree (DeadDist). The predictors used in these models were tree diameter (TreeDiam), infestation severity nearest to the response tree (Infest_Serv1), stand density index in the 1/20th-acre neighborhood surrounding the response tree (SDI_20th), and total basal area summed for all trees within the 1/2-acre neighborhood of the response tree (Neigh_1/2th).


Data Variable Explanation

| Variables | Description |
|---|---|
| TreeDiam | Tree diameter/size |
| Infest_sever1 | Infestation severity nearest to response tree |
| Infest_sever2 | Infestation severity nearest to response tree |
| Ind_DeadDist | Indicator of whether the nearest brood tree is found within the 50 m effective distance |
| DeadDist | Minimum linear distance to nearest brood tree |
| SDI_20th | Stand Density Index @ 1/20th-acre neighborhood surrounding response tree |
| Neigh_SDI_1/4th | Stand Density Index @ 1/4th-acre neighborhood surrounding response tree |
| BA_20th | Basal Area @ 1/20th-acre neighborhood surrounding response tree |
| Neigh_1/4th | Basal area total summed for all trees within 1/4th-acre neighborhood of response tree |
| Neigh_1/2th | Basal area total summed for all trees within 1/2-acre neighborhood of response tree |
| Neigh_1 | Basal area total summed for all trees within 1-acre neighborhood of response tree |
| Neigh_1.5 | Basal area total summed for all trees within 1.5-acre neighborhood of response tree |
| BA_Inf_20th | Basal area total for all infested trees within 1/20th-acre neighborhood |
| BA_infest_1/4 | Basal area total for all infested trees within 1/4th-acre neighborhood |
| BA_infest_1/2 | Basal area total for all infested trees within 1/2-acre neighborhood |
| BA_infest_1 | Basal area total for all infested trees within 1-acre neighborhood |
| BA_infest_1.5 | Basal area total for all infested trees within 1.5-acre neighborhood |
| IND_BA_Infest_20th | Binary indicator of any infested trees within 1/20th-acre neighborhood of response tree |
| IND_BA_infest_1/4th | Indicator of any infested trees within 1/4th-acre neighborhood of response tree |
| IND_BA_infest_1/2th | Indicator of any infested trees within 1/2-acre neighborhood of response tree |
| IND_BA_infest_1 | Indicator of any infested trees within 1-acre neighborhood of response tree |
| IND_BA_infest_1.5 | Indicator of any infested trees within 1.5-acre neighborhood of response tree |

Predictor Overview

Linear Regression Model


Tidy Model Fit Table

# A tibble: 5 × 5
  term          estimate std.error statistic  p.value
  <chr>            <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    7.97     0.0641      124.   0       
2 TreeDiam      -0.00549  0.00269      -2.04 4.14e- 2
3 Infest_Serv1  -0.0157   0.000901    -17.4  9.05e-67
4 SDI_20th      -0.0132   0.00164      -8.05 9.12e-16
5 `Neigh_1/2th` -0.0232   0.000513    -45.3  0       

VIP Plot


Ridge Regression Model


Tidy Model Fit Ridge Table

# A tibble: 5 × 3
  term         estimate penalty
  <chr>           <dbl>   <dbl>
1 (Intercept)    4.50     0.163
2 TreeDiam      -0.0488   0.163
3 Infest_Serv1  -0.256    0.163
4 SDI_20th      -0.231    0.163
5 Neigh_1/2th   -0.817    0.163

VIP Ridge Plot


---
title: "HGEN 612 Project 2"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    theme: journal
    source_code: embed
---

```{r setup, include=FALSE}
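# Dashboard layout, tables, and plotting
library(flexdashboard)
library(DT)
library(ggplot2)
library(plotly)
library(corrr)
# Data import and wrangling
library(tidyverse)
library(readxl)
# Modelling, diagnostics, and variable importance
# (emo removed: it is GitHub-only and never used in this dashboard)
library(broom)
library(car)
library(ggfortify)
library(tidymodels)
library(vip)
library(performance)
library(GGally)

# Read the first sheet of the pine beetle workbook
pine_tbl <- read_excel("Data/Data_1993.xlsx", sheet = 1)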

```

Introduction to Project and Dataset 
========================================

Column {data-width=400} 
-----------------------------------------------------------------------

### **Project Overview** 

This project uses data collected in the Lake Tahoe Basin during the Jeffrey pine 
beetle outbreak. From 1991 to 1996, Jeffrey pine beetles (JPB) caused tree mortality 
throughout the Lake Tahoe Basin during a severe drought. The dataset follows 10,722 
trees annually within a 60-acre study area in the basin and records patterns of 
JPB-caused mortality. This project uses the 'pine beetle' dataset to analyze and 
predict the minimum linear distance to the nearest brood tree (DeadDist).

### **Jeffrey Pine Beetle**

```{r photo input, out.width='100%'}
knitr::include_graphics("~/Desktop/HGEN_612_Project2/Dendroctonus_ponderosae.jpg")

```

### **Model Evaluation**
The pine beetle dataset was used to predict the minimum distance to the nearest 
brood tree (DeadDist). The predictors used in these models were tree diameter 
(TreeDiam), infestation severity nearest to the response tree (Infest_Serv1), stand 
density index in the 1/20th-acre neighborhood surrounding the response tree 
(SDI_20th), and total basal area summed for all trees within the 1/2-acre 
neighborhood of the response tree (Neigh_1/2th).
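
Written out, the linear model fitted on the Linear Regression Model page regresses the square root of DeadDist (the recipe applies `step_sqrt()` to the outcome) on the four predictors selected in the code, with $\varepsilon_i$ the error term for tree $i$:

$$
\sqrt{\mathrm{DeadDist}_i} = \beta_0 + \beta_1\,\mathrm{TreeDiam}_i + \beta_2\,\mathrm{Infest\_Serv1}_i + \beta_3\,\mathrm{SDI\_20th}_i + \beta_4\,\mathrm{Neigh\_1/2th}_i + \varepsilon_i
$$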

Column {data-width=600} 
-----------------------------------------------------------------------

### **Data Variable Explanation**

```{r variable input, out.width='100%'}

data.frame(Variables = c("TreeDiam", "Infest_sever1", "Infest_sever2", "Ind_DeadDist", 
                         "DeadDist", "SDI_20th", "Neigh_SDI_1/4th", "BA_20th", 
                         "Neigh_1/4th", "Neigh_1/2th", "Neigh_1", "Neigh_1.5", 
                         "BA_Inf_20th", "BA_infest_1/4", "BA_infest_1/2", "BA_infest_1", 
                         "BA_infest_1.5", "IND_BA_Infest_20th", "IND_BA_infest_1/4th", 
                         "IND_BA_infest_1/2th", "IND_BA_infest_1", "IND_BA_infest_1.5"),
           
           Description = c("Tree diameter/size", 
                           "Infestation severity nearest to response tree", 
                           "Infestation severity nearest to response tree", 
                           "Indicator of whether the nearest brood tree is found within the 50 m effective distance", 
                           "Minimum linear distance to nearest brood tree",
                           "Stand Density Index @ 1/20th-acre neighborhood surrounding response tree", 
                           "Stand Density Index @ 1/4th-acre neighborhood surrounding response tree",
                           "Basal Area @ 1/20th-acre neighborhood surrounding response tree", 
                           "Basal area total summed for all trees within 1/4th-acre neighborhood of response tree",
                           "Basal area total summed for all trees within 1/2-acre neighborhood of response tree", 
                           "Basal area total summed for all trees within 1-acre neighborhood of response tree", 
                           "Basal area total summed for all trees within 1.5-acre neighborhood of response tree", 
                           "Basal area total for all infested trees within 1/20th-acre neighborhood", 
                           "Basal area total for all infested trees within 1/4th-acre neighborhood", 
                           "Basal area total for all infested trees within 1/2-acre neighborhood", 
                           "Basal area total for all infested trees within 1-acre neighborhood", 
                           "Basal area total for all infested trees within 1.5-acre neighborhood", 
                           "Binary indicator of any infested trees within 1/20th-acre neighborhood of response tree", 
                           "Indicator of any infested trees within 1/4th-acre neighborhood of response tree", 
                           "Indicator of any infested trees within 1/2-acre neighborhood of response tree", 
                           "Indicator of any infested trees within 1-acre neighborhood of response tree", 
                           "Indicator of any infested trees within 1.5-acre neighborhood of response tree")) %>% 
  knitr::kable()


```

Predictor Overview
========================================
``` {r predictor overview}
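# Keep the outcome and the four predictors used in both models
pine_tbl_select <- pine_tbl %>% 
  select("DeadDist", "TreeDiam", "Infest_Serv1", "SDI_20th", "Neigh_1/2th")
# Pairwise scatterplots, densities, and correlations
ggpairs(pine_tbl_select)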
```
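
Both recipes below pass the predictors through `step_corr()`, which removes variables with large absolute pairwise correlations (default threshold 0.9). The correlations that `ggpairs()` prints in its upper panels can also be screened numerically. The following is a minimal base-R sketch on hypothetical synthetic data; `demo`, `x1`, `x2`, and `x3` are illustrative names, not columns of the pine beetle data:

```r
# Hypothetical synthetic data (NOT the pine beetle data)
set.seed(42)
n <- 500
x1 <- rnorm(n)
demo <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),  # near-copy of x1, so |r| should exceed 0.9
  x3 = rnorm(n)                  # independent of x1 and x2
)

# Pairwise Pearson correlations, the statistic ggpairs() shows as "Corr:"
r <- cor(demo, method = "pearson")

# Visit each pair once (upper triangle) and flag |r| above 0.9,
# the default threshold of recipes::step_corr()
idx <- which(upper.tri(r) & abs(r) > 0.9, arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r = r[idx]
)
flagged
```

`step_corr()` itself goes one step further and drops variables until no remaining pair exceeds the threshold; this sketch only lists the offending pairs.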


Linear Regression Model
========================================

Column {data-width=400} 
-----------------------------------------------------------------------

### **Tidy Model Fit Table**
``` {r Linear Regression model build out}

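# Preprocessing: square-root transform the outcome, then remove predictors as
# needed so no pair exceeds step_corr()'s default absolute correlation of 0.9
pine_tbl_recipe <- 
  recipe(DeadDist ~ ., data = pine_tbl_select) %>% 
  step_sqrt(all_outcomes()) %>% 
  step_corr(all_predictors()) 

# Ordinary least squares, fitted with stats::lm()
lm_model <- 
  linear_reg() %>% 
  set_engine("lm")

# Bundle recipe and model so preprocessing is applied at fit time
pine_wflow <- 
  workflow() %>% 
  add_model(lm_model) %>% 
  add_recipe(pine_tbl_recipe)

# Fit on all selected rows (no train/test split for this model)
pine_fit <- 
  pine_wflow %>% 
  fit(data = pine_tbl_select)

# Coefficient table; estimates are on the sqrt(DeadDist) scale
pine_fit %>% 
  extract_fit_parsnip() %>% 
  tidy()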


```
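
Because the outcome was square-root transformed, the coefficients in this table describe changes in sqrt(DeadDist), not in DeadDist itself. A minimal sketch of turning them into a distance for a single tree, using the estimates from the tidy linear-model table and hypothetical predictor values (`new_tree` and its values are illustrative only):

```r
# Estimates copied from the tidy linear-model table (outcome = sqrt(DeadDist))
coefs <- c(
  `(Intercept)` = 7.97,
  TreeDiam      = -0.00549,
  Infest_Serv1  = -0.0157,
  SDI_20th      = -0.0132,
  `Neigh_1/2th` = -0.0232
)

# Hypothetical predictor values for one response tree (illustration only)
new_tree <- c(TreeDiam = 20, Infest_Serv1 = 50, SDI_20th = 100, `Neigh_1/2th` = 100)

# Linear predictor on the square-root scale
sqrt_pred <- coefs[["(Intercept)"]] + sum(coefs[names(new_tree)] * new_tree)

# Naive back-transform to the DeadDist scale
dist_pred <- sqrt_pred^2
c(sqrt_scale = sqrt_pred, distance = dist_pred)
```

Squaring is the simplest back-transform, but it targets the square of the conditional mean of sqrt(DeadDist); the conditional mean of DeadDist is larger by the residual variance on the square-root scale, so this value understates the expected distance.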

### **VIP Plot**
``` {r VIP Plot }

pine_fit %>% 
  extract_fit_parsnip() %>% 
  vip::vip()

```
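
To our understanding, vip's model-based importance for `lm` fits is the absolute value of each coefficient's t-statistic, so the bar order can be reproduced from the `statistic` column of the tidy linear-model table. A minimal check using those (rounded) values:

```r
# t-statistics copied from the tidy linear-model table (intercept excluded)
t_stats <- c(TreeDiam = -2.04, Infest_Serv1 = -17.4,
             SDI_20th = -8.05, `Neigh_1/2th` = -45.3)

# Importance = |t|, sorted from most to least important
importance <- sort(abs(t_stats), decreasing = TRUE)
names(importance)
```

All four estimates are negative, so the plot ranks magnitude only; the sign (each predictor is associated with a shorter distance to the nearest brood tree) has to be read from the table.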

Column {data-width=600} 
-----------------------------------------------------------------------

``` {r check model, out.width='100%'}

pine_fit %>% 
  extract_fit_parsnip() %>% 
  check_model()

```

Ridge Regression Model
========================================

Column {data-width=400} 
-----------------------------------------------------------------------
``` {r ridge set up, include=FALSE}

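# Reproducible train/test split; the seed value is arbitrary
set.seed(612)
pine_split <- initial_split(pine_tbl_select)   # default prop = 3/4 for training
pine_train <- training(pine_split)
pine_test <- testing(pine_split)

# Ridge regression: mixture = 0 is a pure L2 penalty in glmnet;
# the penalty (glmnet's lambda) is fixed at the value used in this analysis
ridge_mod <-
  linear_reg(mixture = 0, penalty = 0.1629751) %>%  
  set_engine("glmnet")

ridge_mod %>% 
  translate()

# Same outcome transform and correlation filter as the linear model; remove
# zero-variance predictors before centering and scaling the rest
pine_rec <- 
  recipe(DeadDist ~ ., data = pine_train) %>% 
  step_sqrt(all_outcomes()) %>% 
  step_corr(all_predictors()) %>% 
  step_zv(all_numeric_predictors()) %>% 
  step_normalize(all_numeric_predictors())

pine_ridge_wflow <- 
  workflow() %>% 
  add_model(ridge_mod) %>% 
  add_recipe(pine_rec)

pine_ridge_wflow

# Fit on the training set only
pine_ridge_fit <- 
  pine_ridge_wflow %>% 
  fit(data = pine_train)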



```

### **Tidy Model Fit Ridge Table**

``` {r tidy table ridge}

pine_ridge_fit %>% 
  extract_fit_parsnip() %>% 
  tidy()

```
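
With `mixture = 0`, the glmnet engine fits ridge regression, and the `penalty` column in this table is glmnet's $\lambda$. For the Gaussian family, glmnet's elastic-net penalty $\lambda\left[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\right]$ reduces at $\alpha = 0$ to the objective below, where $y_i$ is the square-root-transformed DeadDist, $x_i$ holds the normalized predictors for tree $i$, and the intercept $\beta_0$ is not penalized:

$$
\min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{p}\beta_j^{2}, \qquad \lambda = 0.1629751
$$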

### **VIP Ridge Plot**
``` {r VIP Plot ridge}

pine_ridge_fit %>% 
  extract_fit_parsnip() %>% 
  vip::vip()

```
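
Because `step_normalize()` centers and scales each predictor on the training set, the ridge estimates are effects per one standard deviation of each predictor, and the intercept equals the training-set mean of sqrt(DeadDist) (with centered predictors, the unpenalized intercept $\bar{y} - \bar{x}^{\top}\beta$ reduces to $\bar{y}$). This is why neither the ridge intercept nor its slopes line up with the raw-unit linear-model table. A minimal sketch of both facts, using plain `lm()` on hypothetical synthetic data so the identities hold exactly:

```r
# Hypothetical synthetic data (NOT the pine beetle data)
set.seed(1)
x <- runif(200, min = 0, max = 100)
y <- 8 - 0.02 * x + rnorm(200, sd = 0.5)

# Center and scale x to mean 0, sd 1, as recipes::step_normalize() does
z <- (x - mean(x)) / sd(x)

fit_raw <- lm(y ~ x)
fit_std <- lm(y ~ z)

slope_raw <- unname(coef(fit_raw)["x"])
slope_std <- unname(coef(fit_std)["z"])

# Standardized slope = raw slope * sd(x): an effect per standard deviation
all.equal(slope_std, slope_raw * sd(x))

# With a centered predictor, the intercept is the mean outcome
all.equal(unname(coef(fit_std)["(Intercept)"]), mean(y))
```

For the ridge fit, a standardized coefficient can likewise be expressed per raw unit by dividing it by that predictor's training-set standard deviation; the penalty shrinks the coefficients but does not change this conversion.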

Column {data-width=600} 
-----------------------------------------------------------------------

``` {r check model ridge, out.width='100%'}


```