Project 1: Greater Sage-Grouse Biologically Significant Units (BSUs)

Author

Naomi Surendorj

This dataset is from the U.S. Bureau of Land Management (BLM) and describes Greater Sage-Grouse Biologically Significant Units (BSUs) in the western United States. Each row represents one BSU, an area of land where sage-grouse live. The variables include political_state, eis_name, shape__area, shape__length, and acres. These describe where the land is, how large it is, and which plan or EIS region it belongs to. For this project, I wanted to see how the size of each BSU (in acres) relates to its mapped shape area and shape length. I also wanted to compare which states or EIS regions have the largest total sage-grouse areas. The dataset has both categorical variables (such as political_state and eis_name) and quantitative variables (such as acres and shape__area), so it supports both grouped summaries and a regression.

setwd("~/Desktop/DATA 110 ") 

library(tidyverse)
library(RColorBrewer)

bsu <- read_csv("Project1.csv")

names(bsu) <- tolower(names(bsu))

bsu <- bsu |> mutate(political_state = as.factor(political_state),
                     eis_name = as.factor(eis_name),
                     unique_id = as.factor(unique_id))

bsu <- bsu |> mutate(acres = as.numeric(acres),
                     sq_miles = acres / 640)

head(bsu)

# A tibble: 6 × 8
  objectid unique_id   political_state eis_name  acres shape__area shape__length
     <dbl> <fct>       <fct>           <fct>     <dbl>       <dbl>         <dbl>
1        1 Bald Hills… Utah            Utah     3.27e5     1.32e 9       268408.
2        2 Black Rock… Nevada          NVCA     4.55e5     1.84e 9       365641.
3        3 Box Elder … Utah            Utah     1.14e6     4.59e 9       714319.
4        4 Butte/Buck… Nevada          NVCA     2.82e6     1.14e10       713223.
5        5 Carbon - U… Utah            Utah     3.75e5     1.52e 9      1150680.
6        6 Central El… Nevada          NVCA     3.56e6     1.44e10       668517.
# ℹ 1 more variable: sq_miles <dbl>
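
The sq_miles column relies on the fact that one square mile is 640 acres. As a quick sanity check, the conversion can be sketched in base R on a few values. The helper name acres_to_sq_miles and the toy numbers are illustrative assumptions, not part of the BLM data.

```r
# Hypothetical helper: convert acres to square miles (1 sq mi = 640 acres).
acres_to_sq_miles <- function(acres) {
  acres / 640
}

# Toy values for illustration only (not taken from Project1.csv).
toy_acres <- c(640, 327000, 1280000)
toy_sq_miles <- acres_to_sq_miles(toy_acres)

print(toy_sq_miles)
```

Here 640 acres map to exactly 1 square mile and 1,280,000 acres to 2,000 square miles, which is the same arithmetic the mutate() call applies to every row.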

Summarize BSU area by state

state_summary <- bsu |> group_by(political_state) |>
  summarise(n_units = n(),
            total_acres = sum(acres, na.rm = TRUE),
            avg_acres = mean(acres, na.rm = TRUE)) |>
  arrange(desc(total_acres))

head(state_summary)

# A tibble: 6 × 4
  political_state n_units total_acres avg_acres
  <fct>             <int>       <dbl>     <dbl>
1 Nevada               16   34601008.  2162563.
2 Wyoming              88   15799734.   179542.
3 Montana               5    9341188.  1868238.
4 Idaho                 8    8504746.  1063093.
5 Oregon               20    6556062.   327803.
6 Utah                 13    5699903.   438454.

Visualization 1: Top 10 states by total BSU acres

top10 <- state_summary |> slice_max(total_acres, n = 10) |>
  mutate(political_state = fct_reorder(political_state, total_acres))

ggplot(top10, aes(x = political_state, y = total_acres, fill = political_state)) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Paired") +
  labs(title = "Top 10 States by Total Greater Sage-Grouse BSU Acres",
       x = "State",
       y = "Total Acres",
       fill = "State",
       caption = "Source: Bureau of Land Management (2017)") +
  theme_bw(base_size = 12)

Visualization 2: Acres vs. Shape__Area

top_regions <- bsu |>
  count(eis_name, sort = TRUE) |>
  head(n = 10) |>
  pull(eis_name)

bsu_small <- bsu |> filter(eis_name %in% top_regions)

ggplot(bsu_small, aes(x = shape__area, y = acres, color = eis_name)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_brewer(palette = "Paired") +
  labs(title = "BSU Acres vs. Shape__Area (Top 10 EIS Regions)",
       x = "Shape__Area (map units)",
       y = "Acres",
       color = "EIS Region",
       caption = "Source: Bureau of Land Management (2017)") +
  theme_minimal(base_size = 12)

`geom_smooth()` using formula = 'y ~ x'

fit <- lm(acres ~ shape__area + shape__length + political_state, data = bsu)
summary(fit)


Call:
lm(formula = acres ~ shape__area + shape__length + political_state, 
    data = bsu)

Residuals:
       Min         1Q     Median         3Q        Max 
-1.898e-04 -9.790e-06 -9.230e-06  8.200e-07  6.822e-04 

Coefficients:
                                           Estimate Std. Error   t value
(Intercept)                              -3.462e-06  6.313e-05 -5.50e-02
shape__area                               2.471e-04  2.646e-15  9.34e+10
shape__length                             7.980e-12  2.227e-11  3.58e-01
political_stateColorado                  -1.559e-06  6.865e-05 -2.30e-02
political_stateIdaho                     -3.362e-06  6.951e-05 -4.80e-02
political_stateMontana                   -4.273e-06  7.034e-05 -6.10e-02
political_stateNevada                     7.685e-07  6.645e-05  1.20e-02
political_stateNevada/California         -6.935e-07  7.820e-05 -9.00e-03
political_stateNorth Dakota/South Dakota -1.906e-06  8.941e-05 -2.10e-02
political_stateOregon                     1.034e-06  6.456e-05  1.60e-02
political_stateUtah                      -6.625e-06  6.555e-05 -1.01e-01
political_stateWyoming                    1.272e-05  6.340e-05  2.01e-01
                                         Pr(>|t|)    
(Intercept)                                 0.956    
shape__area                                <2e-16 ***
shape__length                               0.721    
political_stateColorado                     0.982    
political_stateIdaho                        0.961    
political_stateMontana                      0.952    
political_stateNevada                       0.991    
political_stateNevada/California            0.993    
political_stateNorth Dakota/South Dakota    0.983    
political_stateOregon                       0.987    
political_stateUtah                         0.920    
political_stateWyoming                      0.841    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.3e-05 on 148 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 3.519e+21 on 11 and 148 DF,  p-value: < 2.2e-16
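
The perfect fit (R-squared of 1) and the shape__area coefficient of 2.471e-04 are what one would expect if acres is simply shape__area converted from square meters, since one international acre is 4046.8564224 square meters. The sketch below checks that arithmetic on rounded values copied from the head(bsu) output. Treating shape__area as square meters is an assumption inferred from the coefficient, not something stated in the data.

```r
# Assumption: shape__area is in square meters (inferred, not documented here).
sq_m_per_acre <- 4046.8564224   # international acre, in square meters
acres_per_sq_m <- 1 / sq_m_per_acre

# Rounded values from the first three rows of head(bsu).
shape_area <- c(1.32e9, 1.84e9, 4.59e9)
acres_reported <- c(3.27e5, 4.55e5, 1.14e6)

# Acres implied by a pure unit conversion of shape__area.
acres_implied <- shape_area * acres_per_sq_m

# Relative difference between the implied and reported (rounded) acres.
rel_diff <- abs(acres_implied - acres_reported) / acres_reported

print(signif(acres_per_sq_m, 4))
print(round(rel_diff, 3))
```

The conversion factor rounds to 2.471e-04, matching the fitted slope, and the implied acres agree with the reported acres to within 1 percent, which is consistent with the rounding in the printed tibble. If that reading is right, the model is recovering a unit conversion rather than a relationship between two independent measurements.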

plot(fit, which = 1)

plot(fit, which = 2)

Warning: not plotting observations with leverage one:
  8, 22

plot(fit, which = 3)

Warning: not plotting observations with leverage one:
  8, 22

plot(fit, which = 5)

Warning: not plotting observations with leverage one:
  8, 22
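
The "leverage one" warnings typically appear when an observation is fitted exactly by its own parameter, for example when a factor level such as a state contains only one row. Whether that is what happens for observations 8 and 22 here is an assumption; counting rows per political_state would confirm it. A minimal sketch with made-up data shows the effect:

```r
# Toy data (made up): group "C" has a single observation.
set.seed(1)
toy <- data.frame(
  x = 1:7,
  g = factor(c("A", "A", "A", "B", "B", "B", "C"))
)
toy$y <- 2 * toy$x + rnorm(7)

toy_fit <- lm(y ~ x + g, data = toy)

# Leverage (hat value) of each observation; row 7 is the lone "C" row.
lev <- unname(hatvalues(toy_fit))
print(round(lev, 3))

# Rows whose leverage is numerically equal to one.
leverage_one_rows <- which(abs(lev - 1) < 1e-8)
print(leverage_one_rows)
```

The dummy variable for group "C" fits row 7 exactly, so its residual is zero and its leverage is one, and plot() has to skip it in the standardized-residual panels. In the real data, count(bsu, political_state) would list any states with a single BSU.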

For this project, I started by cleaning the dataset from the U.S. Bureau of Land Management, which focuses on sage-grouse habitat areas. I first changed all the column names to lowercase to make them easier to read and use. Then I turned the variables political_state, eis_name, and unique_id into factors because they represent categories instead of numbers. I also made sure that acres was stored as a number so that calculations would work correctly; shape__area and shape__length had already been read in as numbers. After that, I created a new variable called sq_miles by dividing acres by 640, since one square mile is 640 acres. This expresses land area in square miles, which makes the sizes easier to understand and compare.

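The first graph I made was a bar chart showing the top ten states by total sage-grouse BSU area. It showed that Nevada had by far the largest total (about 34.6 million acres), followed by Wyoming (about 15.8 million acres), so a large share of the mapped sage-grouse land is concentrated in those two states. Wyoming reaches its total through many small units (88 BSUs averaging about 180,000 acres), while Nevada has only 16 units that average more than 2 million acres each. The second graph was a scatterplot that compared acres and shape area across the top ten EIS regions. The scatterplot showed that as the shape area increased, the number of acres also increased. The regression output explains why the relationship is so tight: the R-squared is 1 and the slope on shape__area is about 0.000247, which is the number of acres in one square meter, so acres and shape area appear to be the same measurement in different units rather than two independent variables. Once shape area is in the model, shape length and state add nothing (all of their p-values are above 0.7). Each color represented a different EIS region, which made it easier to compare regions visually.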

There are a few things I would have added with more time. I wanted to create a map showing where the EIS regions are located because it would make the data easier to visualize. If I had started earlier, I probably could have finished it. Even though I couldn’t include that part, I think the two graphs still turned out well. They show how sage-grouse habitats are spread across different states and how acres and mapped shape area are connected.