---
title: "Project 2"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    theme: flatly
    logo: 
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(readxl)
library(broom)
library(car)
library(ggfortify)
library(tidymodels)
library(vip)
library(performance)
library(plotly)
library(GGally)
library(corrr)
library(DT)

pine_trees <- read_xlsx("hgen-612/data/Data_1993.xlsx")
```

Overview
=================

Row {data-width=500}
-----------------------------------------------------------------------

### Overview
The objective of this analysis was to find the best model for predicting the
minimum linear distance to the nearest brood tree (DeadDist) from the predictors
described below.

```{r}

options(knitr.kable.NA = '')
data.frame(Predictors = c("TreeDiam", "Infest_Serv1", "SDI_20th", "BA_20th" ),
           Description = c("The diameter/size of the tree.", 
           "Infestation severity nearest to response tree.", 
           "Stand Density Index 1/20th-acre neighborhood surrounding response tree.", 
           "Basal Area 1/20th-acre neighborhood surrounding response tree.")) |>
  knitr::kable()

#I made this table with the help of Stack Overflow https://stackoverflow.com/questions/76996026/manual-table-in-r-markdown

```
Both the Ridge and Lasso Regressions identified TreeDiam, Infest_Serv1, and
BA_20th as the most important predictors of DeadDist.

Both models also estimated similar associations between these predictors
and DeadDist.

Row {data-width=350}
-----------------------------------------------------------------------

### Distance to the Nearest Brood Tree

```{r}
ggplot(pine_trees, aes(DeadDist)) +
  geom_histogram(bins = 30, color = "black", fill = "#5f71cd") +
  theme_bw() +
  xlab("Distance to Nearest Brood Tree") +
  ylab("Number of Trees") +
  ggtitle("Distribution of the Distance to the Nearest Brood Tree")
```

```{r model codes, include=FALSE}

#Ridge
set.seed(1234) # arbitrary seed so the random train/test split is reproducible
pine_tree_split <- initial_split(pine_trees)
pine_tree_train <- training(pine_tree_split)
pine_tree_test <- testing(pine_tree_split)

ridge_model <-
  linear_reg(mixture = 0, penalty = 0.1629751) %>% 
  set_engine("glmnet")

ridge_model %>% 
  translate()

pine_tree_recipe <- pine_tree_train %>% 
  recipe(DeadDist ~ TreeDiam + Infest_Serv1 + SDI_20th + BA_20th) %>% 
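  step_sqrt(all_outcomes()) %>%                     # model sqrt(DeadDist)
  step_zv(all_numeric(), -all_outcomes()) %>%       # drop constant columns before they are scaled
  step_corr(all_predictors()) %>%                   # drop one of each highly correlated pair
  step_normalize(all_numeric(), -all_outcomes())    # center and scale so the penalty treats predictors equally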
  
ridge_workflow <-
  workflow() %>% 
  add_model(ridge_model) %>% 
  add_recipe(pine_tree_recipe)

ridge_fit <-
  ridge_workflow %>% 
  fit(data = pine_tree_train)

ridge_fit %>% 
  extract_fit_parsnip() %>% 
  tidy()

ridge_fit %>% 
  extract_preprocessor()

ridge_fit %>% 
  extract_spec_parsnip()

last_fit(ridge_workflow, pine_tree_split) %>% 
  collect_metrics()

#Lasso
set.seed(1234)
pine_tree_boot <- bootstraps(pine_tree_train)

# 50 candidate penalty values, evenly spaced on the log10 scale
lambda_grid <- grid_regular(penalty(), levels = 50)

lasso_model <- 
  linear_reg(mixture = 1, penalty = tune()) %>% 
  set_engine("glmnet")

lasso_model %>% 
  translate()

lasso_workflow <-
  workflow() %>% 
  add_model(lasso_model) %>% 
  add_recipe(pine_tree_recipe)

set.seed(2026)
lasso_grid <- tune_grid(lasso_workflow, 
                        resamples = pine_tree_boot, 
                        grid = lambda_grid)

lasso_grid %>% 
  collect_metrics()

lowest_rmse <- lasso_grid %>% 
  select_best(metric = "rmse")

final_model <- finalize_workflow(lasso_workflow,
                                 lowest_rmse)

final_model %>% 
  fit(pine_tree_train) %>% 
  extract_fit_parsnip() %>% 
  tidy()

last_fit(final_model,
         pine_tree_split) %>%
  collect_metrics()

```

### Model Fits

```{r}

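# Test-set metrics for the ridge workflow
temp <- last_fit(ridge_workflow, pine_tree_split) %>% 
  collect_metrics()

# collect_metrics() returns one row per metric, so keep only the RMSE row
rmse_ridge <- temp %>% 
  filter(.metric == "rmse") %>% 
  pull(.estimate)

gauge(round(rmse_ridge, 2), min = 0, max = 1.8)

# Test-set metrics for the finalized lasso workflow
temp_lasso <- last_fit(final_model,
                       pine_tree_split) %>%
  collect_metrics()

rmse_lasso <- temp_lasso %>% 
  filter(.metric == "rmse") %>% 
  pull(.estimate)

gauge(round(rmse_lasso, 2), min = 0, max = 1.8)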

```
Both models had very similar RMSE scores, so the two are about equally useful
for predicting the distance to the nearest brood tree.

Ridge
=================

Row {data-width=650}
-----------------------------------------------------------------------

### Model Description

Ridge Regression is a type of multiple linear regression that penalizes the
predictors' coefficients according to their squared size (an L2 penalty).
The penalty shrinks all of the coefficients towards zero and towards each other,
which stabilizes coefficients that are poorly determined, for example when
predictors are highly correlated.
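
The shrinkage described above can be illustrated with a minimal base R sketch. It uses simulated data rather than the pine tree data, and the closed-form ridge solution with an arbitrary penalty of 10; the names `ridge_coef`, `x1`, and `x2` are hypothetical, and the penalty is not on the same scale as the one glmnet uses.

```r
# Simulated data (an assumption for illustration): two strongly correlated predictors
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1
y  <- 1 + 2 * x1 + rnorm(n)

# Standardize the predictors and center the response, so no intercept is needed
X  <- scale(cbind(x1, x2))
yc <- y - mean(y)

# Closed-form ridge solution: (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

ols   <- ridge_coef(X, yc, lambda = 0)    # ordinary least squares
ridge <- ridge_coef(X, yc, lambda = 10)   # penalized fit

# Compare the two sets of coefficients side by side
rbind(ols = ols, ridge = ridge)
```

With predictors this correlated, the least-squares coefficients are unstable, while the ridge coefficients are smaller overall and closer to each other.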

```{r, include=FALSE}

###Definitions come from Dr. Smirnova's lectures on Multiple regression
```

### Important Predictors
```{r}

# ridge_fit is already a fitted workflow, so it does not need to be refit
ridge_fit %>% 
   extract_fit_parsnip() %>% 
   vip::vip()

```


Row {data-width=350}
----------------------------------------------------------------------

### Model Evaluation

```{r}
options(knitr.kable.NA = '')
data.frame(Metric = c("RMSE", "RSQ"),
           Estimate = c("1.74", 
           "0.144")) |>
  knitr::kable()
```
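This model was evaluated with two metrics: root mean squared error (RMSE) and
R-squared (RSQ). Goodness of fit is judged by RMSE. Because the recipe
square-root transforms DeadDist, the RMSE is on the square-root scale.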
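
As a rough illustration of what the two metrics measure, the sketch below computes them by hand in base R. The `truth` and `estimate` vectors are hypothetical values, not output from the models in this dashboard, and RSQ is taken to be the squared correlation between observed and predicted values, which is the definition yardstick's `rsq()` uses.

```r
# Hypothetical observed and predicted values (not from the pine tree models)
truth    <- c(1.2, 2.5, 0.8, 3.1, 1.9)
estimate <- c(1.5, 2.1, 1.0, 2.6, 2.2)

# RMSE: square root of the mean squared prediction error,
# expressed in the same units as the (transformed) outcome
rmse <- sqrt(mean((truth - estimate)^2))

# RSQ: squared correlation between observed and predicted values
rsq <- cor(truth, estimate)^2

c(rmse = rmse, rsq = rsq)
```

A lower RMSE means predictions sit closer to the observed values, while RSQ describes how much of the variation in the outcome the predictions track.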

### Model Results

```{r}

options(knitr.kable.NA = '')
data.frame(Predictor = c("TreeDiam", "Infest_Serv1", "BA_20th" ),
           Estimate = c("0.0139", 
           "-0.147", 
           "-0.712")) |>
  knitr::kable()


```
The Ridge Regression estimated negative associations between DeadDist and both
BA_20th and Infest_Serv1, and a positive association with TreeDiam. BA_20th had
the largest coefficient in magnitude.

Lasso
=================

Row {data-width=650}
-----------------------------------------------------------------------

### Model Description
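Like Ridge Regression, Lasso Regression penalizes the predictors' coefficients.
However, the Lasso penalty is based on the absolute size of the coefficients
(an L1 penalty), which can shrink some of them exactly to zero, so Lasso
Regression tends to lead to a more streamlined model with fewer predictors.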
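
Why Lasso can remove predictors entirely is easiest to see in a simplified setting. The sketch below assumes standardized, uncorrelated predictors, where the Lasso estimate is the least-squares coefficient pulled towards zero by a fixed amount (soft-thresholding) and Ridge rescales every coefficient by the same factor. The coefficients and the penalty of 0.1 are hypothetical, and `soft_threshold` is an illustrative helper, not a glmnet function.

```r
# Soft-thresholding: move each coefficient towards zero by lambda,
# and set it to exactly zero if it would cross zero
soft_threshold <- function(beta, lambda) {
  sign(beta) * pmax(abs(beta) - lambda, 0)
}

# Hypothetical least-squares coefficients (not estimates from the pine tree data)
beta_ols <- c(a = 0.80, b = -0.30, c = 0.05)
lambda   <- 0.1

lasso_like <- soft_threshold(beta_ols, lambda)   # the small effect becomes exactly 0
ridge_like <- beta_ols / (1 + lambda)            # proportional shrinkage; exact factor depends on penalty scaling

rbind(ols = beta_ols, lasso = lasso_like, ridge = ridge_like)
```

In this sketch the smallest coefficient is dropped by the Lasso-style update but only reduced by the Ridge-style one, which is the sense in which Lasso yields a more streamlined model.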

This model was evaluated with two metrics: RMSE and RSQ. Goodness of fit is
judged by RMSE.
The figure under Model Evaluation shows how the size of the penalty enforced by
Lasso Regression affects both the RMSE and the RSQ.


```{r, include=FALSE}

###Definitions come from Dr. Smirnova's lectures on Multiple regression
```

### Important Predictors
```{r}
final_model %>% 
  fit(pine_tree_train) %>% 
  extract_fit_parsnip() %>% 
  vip::vip()
```

Row {data-width=350}
-----------------------------------------------------------------------

### Model Evaluation
```{r}
lasso_grid %>%
  collect_metrics() %>%
  ggplot(aes(penalty, mean, color = .metric)) +
  geom_errorbar(aes(
    ymin = mean - std_err,
    ymax = mean + std_err
  ),
  alpha = 0.5
  ) +
  geom_line(linewidth = 1.5) +
  facet_wrap(~.metric, scales = "free", nrow = 2) +
  scale_x_log10() +
  theme(legend.position = "none")
```

### Model Results

```{r}

options(knitr.kable.NA = '')
data.frame(Predictor = c("TreeDiam", "Infest_Serv1", "BA_20th" ),
           Estimate = c("0.0167", 
           "-0.161", 
           "-0.771")) |>
  knitr::kable()

```
The Lasso Regression estimated negative associations between DeadDist and both
BA_20th and Infest_Serv1, and a positive association with TreeDiam. BA_20th had
the largest coefficient in magnitude.