Name: ____________________
Time: 1 hour 25 minutes | Total: 100 points
Rules:
- Use help() or ? in the console to check syntax.

Data files:
- fia_eastern31_recent.csv: tree-level FIA data (~2.87 million rows, 40 columns)
- REF_SPECIES_trimmed.csv: species reference table (2,677 rows, 9 columns)

fia_eastern31_recent.csv: Tree Data (each row = one tree)

| Variable | Type | Description |
|---|---|---|
| TREE_CN | numeric | Unique tree identifier |
| PLT_CN | numeric | Unique plot identifier |
| STATECD | integer | State FIPS code (e.g., 47 = Tennessee) |
| STATE_ABBR | character | Two-letter state abbreviation (e.g., “TN”, “FL”) |
| SPCD | integer | Species code (key for joining with REF_SPECIES) |
| DIA | numeric | Diameter at breast height (inches) |
| HT | numeric | Total tree height (feet) |
| STATUSCD | integer | 1 = live tree, 2 = standing dead |
| LAT | numeric | Plot latitude (decimal degrees) |
| LON | numeric | Plot longitude (decimal degrees; negative = west) |
| BALIVE | numeric | Basal area of live trees on the plot (sq ft/acre) |
| MAJOR_SPGRPCD | integer | 1 = Pine, 2 = Other softwood, 3 = Soft hardwood, 4 = Hard hardwood |
REF_SPECIES_trimmed.csv: Species Reference (each row = one species)

| Variable | Type | Description |
|---|---|---|
| SPCD | integer | Species code (key for joining with the tree data) |
| COMMON_NAME | character | Common name (e.g., “red maple”, “loblolly pine”) |
| GENUS | character | Genus name (e.g., “Acer”, “Pinus”) |
| SPECIES | character | Species epithet (e.g., “rubrum”, “taeda”) |
| SPECIES_SYMBOL | character | USDA PLANTS symbol (e.g., “ACRU”) |
| E_SPGRPCD | integer | Eastern species group code |
| MAJOR_SPGRPCD | integer | 1 = Pine, 2 = Other softwood, 3 = Soft hardwood, 4 = Hard hardwood |
| SFTWD_HRDWD | character | “S” = softwood, “H” = hardwood |
| WOODLAND | character | “Y” if woodland species |
Getting started: Load your packages and read the data first.

```r
library(readr)   # or library(data.table), and use fread() instead of read_csv()
library(dplyr)

tree <- read_csv("fia_eastern31_recent.csv", show_col_types = FALSE)
ref  <- read_csv("REF_SPECIES_trimmed.csv", show_col_types = FALSE)
```
(a) (3 pts) Create a numeric vector called
diameters containing these 8 values:
5.2, 12.0, 8.7, 3.1, 15.4, 9.9, 22.6, 7.3
Calculate the mean, maximum, and length of this vector.
(b) (3 pts) Using logical subsetting on
diameters, extract only values greater than
10. How many values meet this condition?
(c) (4 pts) Create a character vector called
states containing: "TN", "NC",
"FL", "GA", "VA".
"AL".

(a) (5 pts) After reading fia_eastern31_recent.csv into an object called tree, answer the following by writing R code:
- STATE_ABBR column?
- What data type is DIA?

(b) (5 pts) Use summary() on the DIA column:
- NA values? How can you tell from the summary output?

(c) (5 pts) Display the first 10 rows of just these four columns: STATE_ABBR, SPCD, DIA, HT.
(a) (5 pts) First, join the species reference table to the tree data using SPCD as the key:

```r
# ref's copy of MAJOR_SPGRPCD is dropped so the tree column is not renamed to .x/.y
tree <- tree %>% left_join(ref %>% select(-MAJOR_SPGRPCD), by = "SPCD")
```

Now create a data frame called tn_live containing only live trees (STATUSCD == 1) in Tennessee (STATE_ABBR == "TN"). How many rows does tn_live have?
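Both data dictionaries list MAJOR_SPGRPCD, so joining on SPCD alone makes dplyr keep both copies of that column under suffixed names, and later code that refers to MAJOR_SPGRPCD no longer finds it. The sketch below reproduces this on two hypothetical toy tables (tree_toy, ref_toy, and the values in them are made up for illustration; 316 and 131 are the FIA codes for red maple and loblolly pine).

```r
library(dplyr)

# Hypothetical toy tables; both carry MAJOR_SPGRPCD, as the data dictionaries list
tree_toy <- data.frame(SPCD = c(316, 131), DIA = c(8.2, 14.5),
                       MAJOR_SPGRPCD = c(3, 1))
ref_toy  <- data.frame(SPCD = c(316, 131),
                       COMMON_NAME = c("red maple", "loblolly pine"),
                       MAJOR_SPGRPCD = c(3, 1))

# Joining on SPCD alone: the shared non-key column gets .x / .y suffixes
joined <- left_join(tree_toy, ref_toy, by = "SPCD")
names(joined)
# "SPCD" "DIA" "MAJOR_SPGRPCD.x" "COMMON_NAME" "MAJOR_SPGRPCD.y"

# Dropping the reference copy before joining keeps a single MAJOR_SPGRPCD column
joined_clean <- left_join(tree_toy, select(ref_toy, -MAJOR_SPGRPCD), by = "SPCD")
names(joined_clean)
# "SPCD" "DIA" "MAJOR_SPGRPCD" "COMMON_NAME"
```

Joining with `by = c("SPCD", "MAJOR_SPGRPCD")` would also avoid the suffixes, but only if the two copies always agree; dropping one copy is the simpler assumption here.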
(b) (5 pts) Using mutate(), add two new columns to tn_live:
- DIA_cm: diameter converted from inches to centimeters (multiply DIA by 2.54)
- BA_sqft: individual tree basal area in square feet, calculated as

\[BA = \frac{\pi}{4} \times \left(\frac{DIA}{12}\right)^2\]

(DIA is in inches; dividing by 12 converts it to feet.)

Show the first 6 rows of COMMON_NAME, DIA, DIA_cm, and BA_sqft.
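A quick way to sanity-check the basal-area formula is that a tree 12 inches (1 foot) in diameter has a cross-section of exactly π/4 ≈ 0.785 sq ft. The sketch below evaluates the formula on a few hypothetical diameters; `ba_sqft` and `dia_example` are illustrative names, not part of the exam data. It also shows that the formula equals the familiar forestry constant π/576 ≈ 0.005454 times DIA².

```r
# Hypothetical helper: basal area (sq ft) from diameter at breast height (inches)
# BA = (pi / 4) * (DIA / 12)^2, the area of a circle with diameter DIA / 12 feet
ba_sqft <- function(dia_in) {
  (pi / 4) * (dia_in / 12)^2
}

dia_example <- c(6, 12, 24)   # hypothetical diameters in inches
ba_sqft(dia_example)          # about 0.196, 0.785, 3.142 sq ft
dia_example * 2.54            # 15.24, 30.48, 60.96 cm

# Equivalent form: pi / (4 * 144) = pi / 576, roughly 0.005454
all.equal(ba_sqft(dia_example), (pi / 576) * dia_example^2)   # TRUE
```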
(c) (5 pts) Write a single chained command using the pipe operator (%>%) that:
- starts from tn_live
- keeps only trees with DIA > 20 inches
- selects COMMON_NAME, DIA, and HT
- sorts by DIA in descending order

(d) (5 pts) How many unique species (unique COMMON_NAME values) are among live trees in Tennessee?
(a) (7 pts) Using the full tree data (all 31 states, live trees only: STATUSCD == 1), calculate the following for each state (STATE_ABBR):
- n_trees: total number of live trees
- n_species: number of unique species (use SPCD)
- mean_DIA: mean diameter (watch out for NA values)

Sort the result by n_species in descending order. Which state has the most species?
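The "watch out for NA values" warning matters because a single missing value makes mean() return NA. The sketch below uses a hypothetical five-row data frame (`toy`, with made-up values) to show the default behavior, the `na.rm = TRUE` fix, and the general group_by() / summarise() / arrange() pattern with n() and n_distinct().

```r
library(dplyr)

# Hypothetical toy data; one DIA value is missing
toy <- data.frame(
  STATE_ABBR = c("TN", "TN", "TN", "FL", "FL"),
  SPCD       = c(316, 611, 131, 131, 131),
  DIA        = c(10, NA, 14, 8, 12)
)

mean(toy$DIA)                 # NA: one missing value makes the whole mean NA
mean(toy$DIA, na.rm = TRUE)   # 11: NAs are dropped before averaging

by_state <- toy %>%
  group_by(STATE_ABBR) %>%
  summarise(
    n_trees   = n(),                      # rows per state (rows with NA DIA still count)
    n_species = n_distinct(SPCD),         # number of unique species codes
    mean_DIA  = mean(DIA, na.rm = TRUE)   # mean of the non-missing diameters
  ) %>%
  arrange(desc(n_species))

by_state   # TN first (3 species, mean_DIA 12), then FL (1 species, mean_DIA 10)
```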
(b) (7 pts) For live trees in Tennessee only, find the top 5 most common species by tree count. Your output should show:
- COMMON_NAME
- n: number of trees
- mean_DIA: mean diameter
- mean_HT: mean height

Sort by count (largest first) and display only the top 5.
(c) (6 pts) Using all 31-state live tree data,
calculate the number of live trees and mean
diameter for each MAJOR_SPGRPCD group.
| Code | Group |
|---|---|
| 1 | Pine |
| 2 | Other softwood |
| 3 | Soft hardwood |
| 4 | Hard hardwood |
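Because MAJOR_SPGRPCD is stored as a number, a summary table is easier to read with the group names attached. One option, sketched here as an assumption rather than a required approach, is a named lookup vector built from the key above; `spgrp_labels` and `codes` are hypothetical names.

```r
# Hypothetical named lookup vector mirroring the code key above
spgrp_labels <- c("1" = "Pine", "2" = "Other softwood",
                  "3" = "Soft hardwood", "4" = "Hard hardwood")

codes <- c(4, 1, 3, 4)   # hypothetical MAJOR_SPGRPCD values
group_names <- unname(spgrp_labels[as.character(codes)])
group_names
# "Hard hardwood" "Pine" "Soft hardwood" "Hard hardwood"
```

Inside a pipeline the same lookup could be applied with mutate(group = unname(spgrp_labels[as.character(MAJOR_SPGRPCD)])).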
(a) (5 pts) Concept question. In your own words (as code comments), explain the difference between left_join() and inner_join().

Suppose the tree data has 100 rows and 3 of those rows have an SPCD value that does not exist in ref:
- How many rows will left_join(tree, ref, by = "SPCD") return?
- How many rows will inner_join(tree, ref, by = "SPCD") return?
- What will appear in the COMMON_NAME column for those 3 unmatched rows in a left_join?

(b) (5 pts) Use anti_join() to find species codes (SPCD) in the reference table (ref) that are not present in the tree data (tree). How many are there? Why might this be? (Answer “why” as a comment.)
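The three join types differ only in which rows they keep. The sketch below runs them on hypothetical toy tables (`trees`, `ref_toy`; code 999 is a made-up unmatched code, while 316, 131, and 611 are the FIA codes for red maple, loblolly pine, and sweetgum): left_join() keeps every row of the left table and fills unmatched columns with NA, inner_join() keeps only matched rows, and anti_join() returns rows of the first table that have no match in the second.

```r
library(dplyr)

# Hypothetical toy tables: tree code 999 has no match in ref_toy,
# and ref_toy code 611 never appears among the trees
trees   <- data.frame(TREE_CN = 1:4, SPCD = c(316, 131, 316, 999))
ref_toy <- data.frame(SPCD = c(316, 131, 611),
                      COMMON_NAME = c("red maple", "loblolly pine", "sweetgum"))

lj <- left_join(trees, ref_toy, by = "SPCD")   # all 4 trees; code 999 gets NA
ij <- inner_join(trees, ref_toy, by = "SPCD")  # only the 3 matched trees
aj <- anti_join(ref_toy, trees, by = "SPCD")   # reference rows with no tree

nrow(lj)                          # 4
nrow(ij)                          # 3
lj$COMMON_NAME[lj$SPCD == 999]    # NA
aj$SPCD                           # 611
```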
(c) (5 pts) Create two separate data frames:
- fl_trees: live trees in Florida
- ga_trees: live trees in Georgia

Use bind_rows() to combine them into fl_ga. Verify that the row count of fl_ga equals the row count of fl_trees plus the row count of ga_trees.
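bind_rows() stacks data frames vertically and matches columns by name, so the combined table should have exactly as many rows as its parts together. The sketch below checks this with stopifnot() on two hypothetical stand-in tables (`fl_toy`, `ga_toy`, with made-up diameters).

```r
library(dplyr)

# Hypothetical stand-ins for two state subsets with the same columns
fl_toy <- data.frame(STATE_ABBR = "FL", DIA = c(9.1, 11.4, 6.0))
ga_toy <- data.frame(STATE_ABBR = "GA", DIA = c(7.5, 13.2))

combined <- bind_rows(fl_toy, ga_toy)   # stacks rows; columns matched by name

# The stacked table should have as many rows as both parts combined
stopifnot(nrow(combined) == nrow(fl_toy) + nrow(ga_toy))
nrow(combined)   # 5
```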
(d) (5 pts) You have the following small data frame:

```r
state_info <- data.frame(
  STATE_ABBR = c("TN", "NC", "FL", "GA", "VA"),
  region     = c("Mid-South", "Southeast", "Southeast", "Southeast", "Mid-Atlantic")
)
```

Write code to:
- state_info onto that summary using STATE_ABBR as the key.

Answer in a comment: Which type of join keeps all 31 states in the result, even those not in state_info? What will the region column show for states like “ME” or “WI”?
A colleague asks: “Among the 10 most common tree species in the eastern US, which species tends to grow at the highest latitudes and which at the lowest?”

Write a complete pipeline that:
- tree data (already joined with ref).
- COMMON_NAME).
- n_trees: total number of trees
- mean_lat: mean latitude (LAT)
- mean_DIA: mean diameter
- wood_type: softwood/hardwood classification (SFTWD_HRDWD)
- mean_lat in descending order (highest latitude first).

Then answer in a comment: Which species has the highest mean latitude? Which has the lowest? Is there a pattern between softwood/hardwood and latitude?
— END OF EXAM —
Save your file and make sure your name is at the top. Good luck!