About the Variable

A deciduous forest is a forest made up mostly of trees that lose their leaves seasonally. Deciduous forests are essential to human well-being because they help clean the air and reduce carbon dioxide. They may also help prevent respiratory conditions such as asthma. One tree in a deciduous forest can absorb 10 pounds of air pollutants a year and produce nearly 260 pounds of fresh oxygen. Much of the timber from deciduous forests can also be used to build homes. Lastly, as noted on the marietta.edu website, deciduous forests can act as an indicator of the health of the globe. In this project, the area of deciduous forest is the dependent variable, and the counties of the United States serve as the independent variable.

About the Dataset

To complete this study, we use the 2011 Environmental Summaries dataset from the Social Explorer website. Our variable of interest in this dataset is Area Of Land Cover Class 41, Deciduous Forest. The dataset is available at https://www.socialexplorer.com/tables/NHS2011 .

Importing the Map for Display

library(sf)
map <- st_read("Desktop/tl_2016_us_county/tl_2016_us_county.shp",stringsAsFactors = FALSE)
## Reading layer `tl_2016_us_county' from data source `/Users/ariel_rosario_jr./Desktop/tl_2016_us_county/tl_2016_us_county.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3233 features and 17 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.44106
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs

Preparing Data for Analysis

library(readr)
library(tidyverse)
library(sf)
library(tmap)
library(tigris)
library(spdep)

options(tigris_use_cache = TRUE)
options(tigris_progress_bar = FALSE)
options(tidycensus_progress_bar = FALSE)

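# Read the county-level environmental summaries
Enviormental_Data <- read_csv("Desktop/County_Data.csv")
# Rename the FIPS column so it matches the shapefile's GEOID key
Enviormental_Data <- Enviormental_Data %>%
  rename("GEOID" = "FIPS")
# Store both keys as integers so the join compares like with like
Enviormental_Data$GEOID <- as.integer(Enviormental_Data$GEOID)
map$GEOID <- as.integer(map$GEOID)
# Keep every county polygon and attach environmental data where available
mergedData <- left_join(map, Enviormental_Data, by = "GEOID")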

# Drop Alaska (02), Hawaii (15), and the territories (60, 66, 69, 72, 78)
UnitedStates_map19 <- mergedData %>%
  filter(!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78")) %>%
  mutate(trees = as.integer(`Area Of Land Cover Class 41, Deciduous Forest`))
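
Two checks may be worth running around a merge like this one. The sketch below uses small made-up tables (`toy_map`, `toy_env`, and the `area` column are hypothetical stand-ins, not the real data) to show one way of matching FIPS codes as zero-padded strings, listing counties that found no match, and spotting values too large for `as.integer()`. Whether any county in the real dataset exceeds the integer limit has not been checked here.

```r
library(dplyr)

# Hypothetical toy tables standing in for `map` and `Enviormental_Data`
toy_map <- tibble(GEOID = c("01001", "06037", "36061"))
toy_env <- tibble(FIPS = c(1001, 6037),
                  area = c(188355120, 3e9))

# Restore leading zeros so both keys are 5-character strings
toy_env <- toy_env %>%
  mutate(GEOID = sprintf("%05d", as.integer(FIPS)))

# Counties in the map that have no environmental record
unmatched <- anti_join(toy_map, toy_env, by = "GEOID")

# as.integer() returns NA (with a warning) above .Machine$integer.max
area_int <- suppressWarnings(as.integer(toy_env$area))
# as.numeric() keeps large values intact
area_num <- as.numeric(toy_env$area)
```

If `unmatched` is not empty, or `area_int` contains `NA` where `area_num` does not, those rows would be silently dropped by a later `!is.na()` filter.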

Mapping Deciduous Forest across USA Counties

tm_shape(UnitedStates_map19, projection = 2163) + 
  tm_fill("trees",palette="BuGn",midpoint=10,border.col = "grey", border.alpha = .3,title='Deciduous Forest')+
  tm_borders(lwd = .28, col = "black", alpha = 1)+
  tm_layout(panel.labels=("Amount of Deciduous Forest in the United States"),legend.position = c("left","bottom"))

The map above shows the amount of deciduous forest, in miles, in each county of the United States. As seen in the map, most of the country's deciduous forest is located on the east coast, and the farther west one travels, the less deciduous forest one is likely to see. However, this map may be difficult to read because many of the colors are shades from the same palette. With this in mind, we will change the palette so that it is easier to distinguish which counties have the most and the least deciduous forest.

Remapping Deciduous Forest across USA Counties

tm_shape(UnitedStates_map19, projection = 2163) + 
  tm_fill("trees",palette="RdBu",midpoint=10,border.col = "grey", border.alpha = .3,title='Deciduous Forest')+
  tm_borders(lwd = .28, col = "black", alpha = 1)+
  tm_layout(panel.labels=("Amount of Deciduous Forest in the United States"),legend.position = c("left","bottom"))

This map shows the same information as the map above but uses a diverging red-blue palette instead of the green one. The light and dark blue shades are located mostly on the east coast. With the new palette, it also becomes clear that most of the United States does not have deciduous forest. In addition, many states with grey counties contain large mountain ranges or deserts. To present the data in a non-spatial way, this study next uses a no-pooling regression model to describe the amount of deciduous forest in the United States.

No Pooling Regression to understand Deciduous Forest

library(readr)
library(dplyr)
library(ggplot2)
library(nlme)
library(lme4)

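# Keep only the columns needed for modeling and drop incomplete rows
Data_Prep <- UnitedStates_map19 %>%
  select(GEOID,
         trees,
         `Name of Area`) %>%
  filter(!is.na(GEOID),
         !is.na(trees),
         !is.na(`Name of Area`)) %>%
  # Group label used by the models below
  rename(County = `Name of Area`)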

Intercept and Interpretation

dcoef  <- Data_Prep %>% 
    group_by(County) %>% 
    do(mod = lm(trees ~ GEOID, data = .))
coef <- dcoef %>% do(data.frame(Number_Deciduous_Forest= coef(.$mod)[1]))
ggplot(coef, aes(x = Number_Deciduous_Forest)) + geom_histogram()

This histogram shows the intercepts of the regressions fit separately for each county group in the United States. As you can see, most counties in the USA do not have any deciduous forest. In addition, the graph shows that fewer than 500 counties account for all of the deciduous forest. This chart supports the maps above, since it also suggests that most of the United States does not have deciduous forest.

Slope and Interpretation

dcoef  <- Data_Prep %>% 
    group_by(County) %>% 
    do(mod = lm(trees ~ GEOID, data = .))
coef <- dcoef %>% do(data.frame(slope = coef(.$mod)[2]))
ggplot(coef, aes(x = slope)) + geom_histogram()

The graph above displays the slopes of all the models we fit when investigating the amount of deciduous forest across counties in the USA. As you can see, the slopes show little variation from one county group to the next. Like the intercept chart and the maps above, this graph suggests that many counties in the United States do not have deciduous forest.
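
One detail may affect how the slope histogram should be read. The model summaries later in this report list 3065 observations in 1813 groups, so at least 561 groups must contain a single row. A regression fit to one row cannot estimate a slope. The sketch below uses a made-up one-row data frame (`one_row` is hypothetical) to show what `lm()` returns in that case.

```r
# Hypothetical one-row group, as happens when a county name occurs only once
one_row <- data.frame(trees = 500, GEOID = 1001)

# With a single observation, lm() can estimate the intercept only
fit <- lm(trees ~ GEOID, data = one_row)

# The GEOID coefficient is reported as NA because it cannot be estimated
coef(fit)
```

Groups like this contribute an intercept but no slope, and `geom_histogram()` drops the missing slopes with a warning.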

Partial Pooling Regression to understand Deciduous Forest

ML1 <- lme(trees ~ GEOID, data = Data_Prep, random = ~ 1|County, method = "ML")
summary(ML1)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Data_Prep 
##        AIC      BIC    logLik
##   129209.6 129233.8 -64600.82
## 
## Random effects:
##  Formula: ~1 | County
##         (Intercept)  Residual
## StdDev:    51381081 340920236
## 
## Fixed effects: trees ~ GEOID 
##                 Value Std.Error   DF   t-value p-value
## (Intercept) 188355120  14330822 1812 13.143357       0
## GEOID            2876       415 1251  6.925615       0
##  Correlation: 
##       (Intr)
## GEOID -0.893
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.1084568 -0.6916302 -0.3499509  0.4037234  5.2461734 
## 
## Number of Observations: 3065
## Number of Groups: 1813

The model above is a partial pooling model. It combines the strengths of no-pooling and complete-pooling models: it captures real differences between groups while remaining somewhat conservative in its estimates. Note that the model uses a random intercept, which allows the baseline to vary by group. As the summary shows, the standard deviation of the random intercept is 51381081 units, and the residual standard deviation is 340920236 units.
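
To make the idea of partial pooling concrete, here is a minimal sketch on simulated data (the data frame `sim` and all of its values are invented for illustration and are unrelated to the forest dataset). It compares each group's own mean, which is the no-pooling estimate, with the group intercept from a random-intercept `lme()` model.

```r
library(nlme)

set.seed(1)
# Simulated data (hypothetical): 30 groups with 2 to 6 observations each
n_per_group <- sample(2:6, 30, replace = TRUE)
group <- factor(rep(seq_len(30), times = n_per_group))
true_mean <- rnorm(30, mean = 100, sd = 30)
y <- true_mean[as.integer(group)] + rnorm(length(group), sd = 20)
sim <- data.frame(y = y, group = group)

# No pooling: each group's own mean
no_pool <- tapply(sim$y, sim$group, mean)

# Partial pooling: random-intercept model fit by maximum likelihood
fit <- lme(y ~ 1, data = sim, random = ~ 1 | group, method = "ML")
partial_pool <- coef(fit)[names(no_pool), "(Intercept)"]

# Overall intercept that the group estimates are pulled toward
overall <- fixef(fit)[["(Intercept)"]]

# Distance from the overall intercept under each approach
shrinkage <- data.frame(no_pool = abs(no_pool - overall),
                        partial_pool = abs(partial_pool - overall))
head(shrinkage)
```

In this setup each partially pooled estimate sits no farther from the overall intercept than the group's own mean, and small groups are pulled in the most.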

ML2 <- lme(trees ~ GEOID, data = Data_Prep, random = ~ GEOID|County, method = "ML")
summary(ML2)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Data_Prep 
##        AIC      BIC   logLik
##   129177.2 129213.4 -64582.6
## 
## Random effects:
##  Formula: ~GEOID | County
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev       Corr  
## (Intercept) 3.869853e+07 (Intr)
## GEOID       3.838191e+03 -0.571
## Residual    3.250061e+08       
## 
## Fixed effects: trees ~ GEOID 
##                 Value Std.Error   DF   t-value p-value
## (Intercept) 191914669  13765367 1812 13.941849       0
## GEOID            2595       424 1251  6.124999       0
##  Correlation: 
##       (Intr)
## GEOID -0.875
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.7025346 -0.6684978 -0.3575153  0.3791874  5.4072237 
## 
## Number of Observations: 3065
## Number of Groups: 1813

The fixed-effect intercept of this model is 191914669, and the standard deviation of the random intercept across county groups is 3.869853e+07. The GEOID slope is also allowed to vary by county group, with a standard deviation of 3.838191e+03 units. The model was fit by maximum likelihood.
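
Because both models were fit by maximum likelihood to the same data, they can be compared with a likelihood ratio test. The sketch below reproduces that arithmetic from the log-likelihood and AIC values printed in the two summaries. It assumes ML2 adds two parameters to ML1 (the slope variance and the intercept-slope correlation). The chi-square reference is only approximate here, since a variance is being tested at the edge of its allowed range.

```r
# Log-likelihoods copied from the summaries of ML1 and ML2
loglik_ml1 <- -64600.82
loglik_ml2 <- -64582.6

# Assumption: ML2 adds the slope variance and the intercept-slope correlation
extra_params <- 2

# Likelihood ratio statistic and its approximate chi-square p-value
lr_stat <- 2 * (loglik_ml2 - loglik_ml1)
p_value <- pchisq(lr_stat, df = extra_params, lower.tail = FALSE)

# AIC difference from the same summaries (positive favors ML2)
aic_diff <- 129209.6 - 129177.2
```

With the fitted objects in memory, `anova(ML1, ML2)` from nlme reports the same comparison directly.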

Conclusion

Based on the results of this study, we can conclude that most of the United States is not covered in deciduous forest. The vast majority of deciduous forest is found toward the east coast, and most of it lies in around 300 counties. Deciduous forests offer many health benefits to humans and to the globe as a whole. To provide a better world for everyone on earth, more work is needed to understand the role deciduous forests play in the environment. Policies should also be put in place to expand and preserve the deciduous forest we have in the USA.