Name: ____________________

Time: 1 hour 25 minutes | Total: 100 points

Rules:


Quick Reference: Data Files

fia_eastern31_recent.csv — Tree Data (each row = one tree)

Variable       Type       Description
TREE_CN        numeric    Unique tree identifier
PLT_CN         numeric    Unique plot identifier
STATECD        integer    State FIPS code (e.g., 47 = Tennessee)
STATE_ABBR     character  Two-letter state abbreviation (e.g., “TN”, “FL”)
SPCD           integer    Species code (key to join with REF_SPECIES)
DIA            numeric    Diameter at breast height (inches)
HT             numeric    Total tree height (feet)
STATUSCD       integer    1 = live tree, 2 = standing dead
LAT            numeric    Plot latitude (decimal degrees)
LON            numeric    Plot longitude (decimal degrees, negative)
BALIVE         numeric    Basal area of live trees on the plot (sq ft/acre)
MAJOR_SPGRPCD  integer    1=Pine, 2=Other softwood, 3=Soft hardwood, 4=Hard hardwood

REF_SPECIES_trimmed.csv — Species Reference (each row = one species)

Variable        Type       Description
SPCD            integer    Species code (key to join with tree data)
COMMON_NAME     character  Common name (e.g., “red maple”, “loblolly pine”)
GENUS           character  Genus name (e.g., “Acer”, “Pinus”)
SPECIES         character  Species epithet (e.g., “rubrum”, “taeda”)
SPECIES_SYMBOL  character  USDA PLANTS symbol (e.g., “ACRU”)
E_SPGRPCD       integer    Eastern species group code
MAJOR_SPGRPCD   integer    1=Pine, 2=Other softwood, 3=Soft hardwood, 4=Hard hardwood
SFTWD_HRDWD     character  “S” = softwood, “H” = hardwood
WOODLAND        character  “Y” if woodland species

Getting started: Load your packages and read the data first.

library(readr)   # provides read_csv(); with data.table, use fread() instead
library(dplyr)

tree <- read_csv("fia_eastern31_recent.csv", show_col_types = FALSE)
ref  <- read_csv("REF_SPECIES_trimmed.csv", show_col_types = FALSE)

Question 1: Vectors and Basic R (10 points)

(a) (3 pts) Create a numeric vector called diameters containing these 8 values:

5.2, 12.0, 8.7, 3.1, 15.4, 9.9, 22.6, 7.3

Calculate the mean, maximum, and length of this vector.

(b) (3 pts) Using logical subsetting on diameters, extract only values greater than 10. How many values meet this condition?
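
As a reminder of how logical subsetting works, here is a minimal sketch on a made-up vector. The name heights and its values are hypothetical and are not part of the exam data. A comparison returns one TRUE/FALSE per element; that logical vector can be used inside [ ] to keep matching elements, or passed to sum() to count them.

```r
# Hypothetical example vector (not the exam's `diameters`)
heights <- c(42, 65, 18, 71, 55)

# A comparison yields one TRUE/FALSE per element
is_tall <- heights > 50

# Keep only the elements where the condition is TRUE
tall_heights <- heights[is_tall]

# TRUE counts as 1, so sum() gives the number of matches
n_tall <- sum(is_tall)

print(tall_heights)
print(n_tall)
```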

(c) (4 pts) Create a character vector called states containing: "TN", "NC", "FL", "GA", "VA".


Question 2: Data Frame Inspection (15 points)

(a) (5 pts) After reading fia_eastern31_recent.csv into an object called tree, answer the following by writing R code:

(b) (5 pts) Use summary() on the DIA column:

(c) (5 pts) Display the first 10 rows of just these four columns: STATE_ABBR, SPCD, DIA, HT.


Question 3: Modifying Data with dplyr (20 points)

(a) (5 pts) First, join the species reference table to the tree data using SPCD as the key:

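# MAJOR_SPGRPCD is in both tables; drop it from ref to avoid .x/.y duplicates
tree <- tree %>% left_join(select(ref, -MAJOR_SPGRPCD), by = "SPCD")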

Now create a data frame called tn_live containing only live trees (STATUSCD == 1) in Tennessee (STATE_ABBR == "TN"). How many rows does tn_live have?

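(b) (5 pts) Using mutate(), add two new columns to tn_live:

    • DIA_cm: diameter in centimeters (DIA × 2.54)
    • BA_sqft: tree basal area in square feet (0.005454 × DIA²)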

Show the first 6 rows of COMMON_NAME, DIA, DIA_cm, and BA_sqft.

(c) (5 pts) Write a single chained command using the pipe operator (%>%) that:

  1. Starts with tn_live
  2. Filters to trees with DIA > 20 inches
  3. Selects only COMMON_NAME, DIA, and HT
  4. Arranges by DIA in descending order
  5. Shows the top 10 rows
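
The general shape of such a chained command can be sketched on a small made-up data frame. Everything here (plots, species, dia, ht, and the cutoffs) is hypothetical and chosen only to show how each verb passes its result to the next; it is not the exam data.

```r
library(dplyr)

# Hypothetical toy data (not the exam data)
plots <- data.frame(
  species = c("oak", "pine", "maple", "oak", "pine", "birch"),
  dia     = c(24.1, 8.3, 30.5, 12.0, 21.7, 26.2),
  ht      = c(80, 45, 95, 60, 70, 85)
)

big_trees <- plots %>%
  filter(dia > 20) %>%        # keep rows meeting a condition
  select(species, dia) %>%    # keep only some columns
  arrange(desc(dia)) %>%      # sort, largest first
  head(3)                     # keep the top rows

print(big_trees)
```

Each verb takes a data frame and returns a data frame, which is why the steps can be chained in reading order.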

(d) (5 pts) How many unique species (unique COMMON_NAME values) are among live trees in Tennessee?


Question 4: Collapsing Data — Group Summaries (20 points)

(a) (7 pts) Using the full tree data (all 31 states, live trees only: STATUSCD == 1), calculate the following for each state (STATE_ABBR):

Sort the result by n_species in descending order. Which state has the most species?

(b) (7 pts) For live trees in Tennessee only, find the top 5 most common species by tree count. Your output should show:

Sort by count (largest first) and display only the top 5.

(c) (6 pts) Using live trees from all 31 states, calculate the number of live trees and the mean diameter for each MAJOR_SPGRPCD group.

Code  Group
1     Pine
2     Other softwood
3     Soft hardwood
4     Hard hardwood
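
A grouped summary with readable labels attached can be sketched as follows. The data frame toy and its columns grp and dia are hypothetical, and the named lookup vector is just one possible way to turn numeric codes into labels; treat it as an illustration rather than the required approach.

```r
library(dplyr)

# Hypothetical toy data: a numeric group code and a measurement
toy <- data.frame(
  grp = c(1, 1, 3, 4, 4, 4),
  dia = c(10, 14, 8, 20, 22, 18)
)

# Named lookup vector: names are the codes, values are the labels
grp_labels <- c("1" = "Pine", "2" = "Other softwood",
                "3" = "Soft hardwood", "4" = "Hard hardwood")

grp_summary <- toy %>%
  group_by(grp) %>%
  summarise(
    n_trees  = n(),         # number of rows in each group
    mean_dia = mean(dia),   # average within each group
    .groups  = "drop"
  ) %>%
  # Index the lookup by the code (as text); unknown codes would give NA
  mutate(label = unname(grp_labels[as.character(grp)]))

print(grp_summary)
```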

Question 5: Merging and Appending (20 points)

(a) (5 pts) Concept question. In your own words (as code comments), explain the difference between left_join() and inner_join().

Suppose the tree data has 100 rows and 3 of those rows have an SPCD value that does not exist in ref:

(b) (5 pts) Use anti_join() to find species codes (SPCD) in the reference table (ref) that are not present in the tree data (tree). How many? Why might this be? (Answer “why” as a comment.)
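
The behavior of the join verbs in parts (a) and (b) can be explored on toy tables before running them on the real data. The tables obs and lookup below are hypothetical; one code in obs (99) has no match in lookup, and one code in lookup (40) never appears in obs.

```r
library(dplyr)

# Hypothetical toy tables (not the exam data)
obs    <- data.frame(id = 1:5, code = c(10, 20, 20, 30, 99))
lookup <- data.frame(code = c(10, 20, 30, 40),
                     name = c("a", "b", "c", "d"))

# left_join keeps every row of obs; unmatched rows get NA in `name`
left <- left_join(obs, lookup, by = "code")

# inner_join keeps only rows of obs that have a match in lookup
inner <- inner_join(obs, lookup, by = "code")

# anti_join keeps rows of the first table with no match in the second
unmatched <- anti_join(lookup, obs, by = "code")

print(nrow(left))
print(sum(is.na(left$name)))
print(nrow(inner))
print(unmatched)
```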

(c) (5 pts) Create two separate data frames:

Use bind_rows() to combine them into fl_ga. Verify that the row count matches.

(d) (5 pts) You have the following small data frame:

state_info <- data.frame(
  STATE_ABBR = c("TN", "NC", "FL", "GA", "VA"),
  region = c("Mid-South", "Southeast", "Southeast", "Southeast", "Mid-Atlantic")
)

Write code to:

  1. Create a summary with one row per state showing the total number of live trees per state.
  2. Join state_info onto that summary using STATE_ABBR as the key.

Answer in a comment: Which type of join keeps all 31 states in the result, even those not in state_info? What will the region column show for states like “ME” or “WI”?


Question 6: Putting It All Together (15 points)

A colleague asks: “Among the 10 most common tree species in the eastern US, which species tends to grow at the highest latitudes and which at the lowest?”

Write a complete pipeline that:

  1. Starts with the full tree data (already joined with ref).
  2. Filters to live trees only.
  3. Identifies the 10 most common species by total tree count (use COMMON_NAME).
  4. Filters the data to include only trees belonging to those top-10 species.
  5. For each of those 10 species, calculates:
    • n_trees: total number of trees
    • mean_lat: mean latitude (LAT)
    • mean_DIA: mean diameter
    • wood_type: softwood/hardwood classification (SFTWD_HRDWD)
  6. Arranges by mean_lat in descending order (highest latitude first).
  7. Displays the final table.

Then answer in a comment: Which species has the highest mean latitude? Which has the lowest? Is there a pattern between softwood/hardwood and latitude?
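
The "find the top N, then filter to them, then summarise" pattern can be sketched in two steps on made-up data. The data frame toy, its columns, and the choice of N = 2 are hypothetical; this is one possible structure, not the only valid one.

```r
library(dplyr)

# Hypothetical toy data (not the exam data)
toy <- data.frame(
  species = c("oak", "oak", "oak", "pine", "pine", "fir", "fir", "fir", "elm"),
  lat     = c(35, 36, 37, 31, 33, 45, 46, 47, 40),
  type    = c("H", "H", "H", "S", "S", "S", "S", "S", "H")
)

# Step 1: names of the 2 most frequent species
top_species <- toy %>%
  count(species, sort = TRUE) %>%
  slice_head(n = 2) %>%
  pull(species)

# Step 2: keep only those species, then summarise per species
result <- toy %>%
  filter(species %in% top_species) %>%
  group_by(species) %>%
  summarise(
    n_trees  = n(),
    mean_lat = mean(lat),
    type     = first(type),   # constant within a species, so take the first
    .groups  = "drop"
  ) %>%
  arrange(desc(mean_lat))

print(result)
```

Storing the top names in a separate vector keeps the second pipeline readable; a semi_join() against the counted table would be an alternative.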


— END OF EXAM —

Save your file and make sure your name is at the top. Good luck!