FACTS Common Attributes and FACTS Hazardous Fuels are both datasets that contain records of forest management activities. Both are maintained by the USFS, and are distributed as ESRI geodatabases (among other formats).
But what are the key differences between these databases? FACTS Hazardous Fuels records are ostensibly a subset of FACTS Common Attributes records, but the reality isn’t so simple. Here, we will explore the key differences between these databases to understand how they relate to one another.
The USFS offers the following definition of a fuels treatment:
Vegetative manipulation designed to create and maintain resilient and sustainable landscapes, including burning, mechanical treatments, and/or other methods that reduce the quantity or change the arrangement of living or dead fuel so that the intensity, severity, or effects of wildland fire are reduced within acceptable ecological parameters and consistent with land management plan objectives, or activities that maintain desired fuel conditions.
However, this definition isn’t very useful for subsetting the tabular records of FACTS Common Attributes. To define the subset of records in FACTS Hazardous Fuels, we need a combination of attribute values that are unique to fuels treatments. Given how complex the USFS’s activity records can be, this is no easy task.
Matt Tansey, in his document “Hazardous Fuels Reduction Treatments: Tracking and Accomplishment Reporting Requirements” (2024), provides a list of valid activity codes and fund codes that are ostensibly used to define FACTS Hazardous Fuels records.
Let’s examine those attribute fields here:
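# packages used throughout (assumed to be loaded in a setup chunk not shown here)
library(data.table) # fread
library(dplyr) # %>%, mutate, filter, anti_join
library(stringr) # str_split, str_detect, str_c
va <- fread("valid_activities.csv")
vf <- fread("valid_funds.csv")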
print("Valid activity codes:")
## [1] "Valid activity codes:"
va$ACTIVITY_CODE %>% sort
## [1] 1102 1111 1112 1113 1117 1119 1120 1130 1136 1139 1150 1152 1153 1154 1160
## [16] 1180 2000 2341 2360 2370 2510 2530 2540 2560 3132 3340 3370 3380 4101 4102
## [31] 4110 4111 4112 4113 4114 4115 4116 4117 4120 4121 4122 4123 4131 4132 4133
## [46] 4134 4140 4141 4142 4143 4144 4145 4146 4147 4148 4150 4151 4152 4154 4160
## [61] 4162 4175 4176 4177 4183 4186 4192 4193 4194 4195 4196 4200 4210 4211 4220
## [76] 4230 4231 4232 4233 4240 4241 4242 4270 4455 4470 4471 4472 4473 4474 4475
## [91] 4480 4481 4482 4483 4484 4485 4490 4491 4492 4493 4494 4495 4511 4512 4521
## [106] 4522 4530 4540 4541 4542 4580 6101 6103 6104 6105 6106 6107 6133 6584 6684
## [121] 7015 7050 7065 7067 9008 9400
print("Valid fund codes:")
## [1] "Valid fund codes:"
vf$FUND_CODE %>% sort
## [1] "BDBD" "CFIX" "CFLN" "CMLG"
## [5] "CMRD" "CONT" "CWFS" "CWK2"
## [9] "CWKV" "CWOC" "DS01" "DS02"
## [13] "DS03" "DS05" "DS06" "DSSE"
## [17] "GBGB" "GNPI" "GSRV" "HTBW"
## [21] "HX01" "HX04" "HX05" "HX08"
## [25] "HX10" "HX13" "HX25" "HX28"
## [29] "HX29" "HX36" "HX51/NIXX" "HX52/NIJX"
## [33] "HX59/NIAX" "HX61/NIFX" "HX63/NICX" "IRHF"
## [37] "IRVM" "NFHF" "NFN3" "NFRG"
## [41] "NFRW" "NFSE" "NFSO" "NFTM"
## [45] "NFVW" "NFWF" "NFXF" "NFXN"
## [49] "OFED" "PPPP" "RBRB" "RIRI"
## [53] "RTRT" "SNPL" "SPCH" "SPFH"
## [57] "SPID" "SPXF" "SPXN" "SRS2"
## [61] "SRSA" "SSCC" "SSSS" "TPPS"
## [65] "TX70" "TX71" "TX72" "TX73"
## [69] "VX70" "VX72" "VX74" "VX75"
## [73] "VX76" "WFHF/NFHF****" "WFPR" "WFSE"
## [77] "WFSU" "WFXF" "WFXN" "WISX"
## [81] "XXXX"
This seems sensible, but it can’t be the whole story: there are many records in FACTS Hazardous Fuels that don’t match these codes. Before we examine them, we’ll need to tidy up the tables from Tansey’s document to make them usable in R.
# clean up list of valid fund codes
good_FC <- vf$FUND_CODE %>%
unique %>% # remove duplicates
str_split("/") %>% # split by slashes (some cells contained multiple codes)
unlist %>% # unlist to give each code its own element
str_remove_all("\\*") # strip every footnote asterisk (e.g. "NFHF****") so none reach the regex
head(good_FC)
## [1] "BDBD" "CFLN" "CFIX" "CMLG" "CMRD" "CONT"
# define a concatenated string that can be used for regex matching
good_FC_str <- good_FC %>% str_c(collapse = "|")
# print the first 50 characters
substr(good_FC_str, 1, 50)
## [1] "BDBD|CFLN|CFIX|CMLG|CMRD|CONT|CWFS|CWK2|CWKV|CWOC|"
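One thing to keep in mind with this pattern: `str_detect()` matches substrings, so an alternation of codes is satisfied by any string that merely contains one of them. With four-character codes separated by commas this may never cause a false match, but a stricter check is easy to write. The following is a minimal sketch in base R, assuming that multiple fund codes are stored as a comma-separated string; `has_valid_code()` and the toy vectors are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: TRUE when at least one comma-separated code in `x`
# is an exact member of `valid`; NA input stays NA.
has_valid_code <- function(x, valid) {
  vapply(
    strsplit(x, ",", fixed = TRUE),
    function(codes) {
      if (length(codes) == 1 && is.na(codes)) return(NA)
      any(trimws(codes) %in% valid)
    },
    logical(1),
    USE.NAMES = FALSE
  )
}

# Toy data, not taken from FACTS
valid_toy <- c("NFHF", "WFHF")
fund_toy <- c("NFHF", "CFLN,WFHF", "XNFHFX", NA)

exact_match <- has_valid_code(fund_toy, valid_toy)
regex_match <- grepl("NFHF|WFHF", fund_toy)

exact_match # the made-up "XNFHFX" is rejected
regex_match # the substring match accepts it
```

Whether the stricter rule changes any counts on the real data is untested here; it is only meant to show the difference between the two approaches.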
Now, we can check FACTS Hazardous Fuels for any records that don’t match these codes.
hf_nomatch <- hf %>%
# create columns to note whether activity and fund codes are valid
mutate(
"activity_match" = ACTIVITY_CODE %in% va$ACTIVITY_CODE,
"fund_match" = str_detect(FUND_CODE, good_FC_str),
"all_match" = activity_match & fund_match
)
# tabulate the results
hf_nomatch %>%
select(activity_match, fund_match) %>%
table
## fund_match
## activity_match FALSE TRUE
## FALSE 120 1282
## TRUE 25249 560043
# calculate the percentage of records that "don't belong"
nrow(hf_nomatch[!hf_nomatch$all_match, ]) / nrow(hf_nomatch) * 100
## [1] 4.233893
From this, we can see that 4.23% of records in FACTS Hazardous Fuels don’t match the activity and fund codes provided by Tansey. This is a significant portion of the data, and suggests that the codes provided by Tansey are not exhaustive.
Most (25249) non-matching records are due to non-matched fund codes. A smaller number of records (1282) have non-matched activity codes. There are even a few (120) that have non-matching fund codes AND activity codes.
So, what are these “extra” codes? Let’s examine them, starting with the activity codes.
# extract the non-matching activity codes
hf_nomatch_activity <- hf_nomatch[!hf_nomatch$activity_match, ] %>%
pull(ACTIVITY_CODE) %>%
unique %>%
sort
hf_nomatch_activity
## [1] 1115 1116 1118
The non-matching activity codes are:
- 1115 : Wildfire - Fuels Benefit
- 1116 : Wildland Fire Use
- 1118 : Wildfire - Human Ignition
Again, these codes are not included in Tansey’s list of valid activity codes, but they are clearly in the Hazardous Fuels geodatabase.
Next, let’s examine the non-matching fund codes.
# extract the non-matching fund codes
hf_nomatch_fund <- hf_nomatch[!hf_nomatch$fund_match, ] %>%
pull(FUND_CODE) %>%
unique %>%
str_split(",") %>%
unlist %>%
unique %>%
sort
hf_nomatch_fund
## [1] "ACAC" "CFLR" "CMFD" "CMIX" "CMRO" "CMTR" "CRRD" "CRWE" "CWCD" "ER21"
## [11] "ETCC" "FRF2" "HFDS" "LEXC" "MULT" "NFCC" "NFDD" "NFDH" "NFFP" "NFIX"
## [21] "NFND" "NFPN" "NFRC" "NFRR" "NFSF" "NFSH" "NIKX" "NIPX" "NISX" "NXFX"
## [31] "PEP2" "PIPI" "PRPP" "PSCP" "PSRS" "RMET" "SIFI" "SPCF" "SPOT" "SPS4"
## [41] "SPS5" "SPS6" "SPSE" "SPST" "WFCF" "WFEX" "WFIX" "WFW3" "WRFH" "WRHN"
## [51] "WRHR" "WRWB"
There are lots of codes here. I actually have no idea what these codes mean, but I know they are not in Tansey’s list.
So, if we are to recreate the Hazardous Fuels data by subsetting the Common Attributes data, we’re not going to get there by simply following the rules described by Tansey. We’ll have to try to reverse-engineer them.
To start, we’ll need to add these “extra” activity and fund codes to our list of valid codes.
# add the extra activity codes to the list of valid activity codes
good_AC_full <- c(va$ACTIVITY_CODE, hf_nomatch_activity)
# add the extra fund codes to the list of valid fund codes
good_FC_full <- c(good_FC, hf_nomatch_fund)
# create a new concatenated string for regex matching
good_FC_full_str <- good_FC_full %>% str_c(collapse = "|")
Now that we have complete lists of “valid” activity and fund codes, we can try to recreate FACTS Hazardous Fuels by subsetting FACTS Common Attributes.
# filter "all" to only include records that match the valid activity and fund
# codes
valid_all <- all %>%
filter(
ACTIVITY_CODE.y %in% good_AC_full, # valid activity
str_detect(FUND_CODES, good_FC_full_str) # at least one valid fund code
)
In theory, this should give us a dataset that is identical to FACTS Hazardous Fuels, since it uses the same set of activity and fund codes. Let’s check whether this is true by simply counting the number of records in both datasets.
# number of records in FACTS Hazardous Fuels
nrow(hf)
## [1] 629964
# number of records in "valid_all" - the filtered FACTS Common Attributes data
nrow(valid_all)
## [1] 586686
# how do the sizes compare?
nrow(hf) / nrow(valid_all)
## [1] 1.073767
Okay, so they’re pretty close, but not quite the same. The Hazardous Fuels dataset is 7% larger than our filtered Common Attributes dataset.
Where could these extra records be coming from? Here’s a clue: let’s check the PROC_REGION_CODE field in the Hazardous Fuels dataset.
hf$PROC_REGION_CODE %>% table(useNA = "always")
## .
## 1 2 3 4 5 6 8 9 10 15 <NA>
## 85251 65057 14899 41877 113697 144297 87812 76585 481 8 0
So, there are 8 records in “Region 15”. There is no Common Attributes geodatabase for “Region 15,” so these records are partly responsible for the discrepancy. However, these are just 8 of 629964 records, or about 0.001%. That’s not enough to explain a 7% difference.
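As a quick check on the arithmetic, the sizes involved can be recomputed from the counts printed above. This is only a sketch: the numbers are copied by hand from the earlier outputs, and the variable names are made up for illustration.

```r
# Counts copied from the outputs above
n_hf <- 629964 # nrow(hf)
n_valid <- 586686 # nrow(valid_all)
n_region15 <- 8 # records with PROC_REGION_CODE 15

# Records in Hazardous Fuels that the filtered data does not account for
gap <- n_hf - n_valid

# Share of Hazardous Fuels records that sit in "Region 15", in percent
share_region15 <- n_region15 / n_hf * 100

# Size of the gap relative to the filtered data, in percent
gap_pct <- gap / n_valid * 100

gap
round(share_region15, 4)
round(gap_pct, 2)
```

The gap is 43278 records, so the 8 “Region 15” records are a very small part of it.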
We need to find the other records that are missing from our filtered Common Attributes dataset.
# anti-join the filtered Common Attributes data with the Hazardous Fuels data to
# identify the missing records
missing <- hf %>%
anti_join(
valid_all,
by = c("ACTIVITY_CN" = "EVENT_CN") # match on the record identifier, not the activity code
)
# check that the number of records matches the expected 7%
nrow(missing) / nrow(valid_all) * 100 # perfect
## [1] 7.376689
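For readers less familiar with `anti_join()`, here is a small sketch of what the step above does, using toy tables in place of `hf` and `valid_all`. The values are made up; only the key column names are borrowed from the real data.

```r
library(dplyr)

# Toy stand-ins for hf and valid_all (values are made up)
hf_toy <- tibble(
  ACTIVITY_CN = c("A1", "A2", "A3"),
  FUND_CODE = c("NFHF", NA, NA)
)
ca_toy <- tibble(EVENT_CN = c("A1", "B9"))

# Rows of hf_toy with no matching EVENT_CN in ca_toy
missing_toy <- anti_join(hf_toy, ca_toy, by = c("ACTIVITY_CN" = "EVENT_CN"))
missing_toy

# Every hf_toy row is either matched or missing, never both
matched_toy <- semi_join(hf_toy, ca_toy, by = c("ACTIVITY_CN" = "EVENT_CN"))
nrow(matched_toy) + nrow(missing_toy) == nrow(hf_toy)
```

On the real data, the fact that `nrow(missing)` accounts for the whole size gap suggests, assuming the identifiers are unique, that every record in `valid_all` also appears in `hf`. That has not been checked separately here.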
Now, let’s explore this 7%. Let’s summarize some of the key attributes to see if we can identify any patterns.
# Region code
missing %>%
pull(PROC_REGION_CODE) %>%
table(useNA = "always")
## .
## 1 2 3 4 5 6 8 9 10 15 <NA>
## 3377 4681 1007 5329 8455 13451 3189 3675 106 8 0
# Activity code
missing %>%
pull(ACTIVITY_CODE) %>%
table(useNA = "always")
## .
## 1102 1111 1112 1113 1117 1118 1119 1120 1130 1136 1139 1150 1152 1153 1154 1160
## 242 2415 518 3174 247 21 8 1250 9402 258 230 2076 524 5164 355 1791
## 1180 2000 2341 2360 2370 2510 2530 3132 3370 3380 4102 4111 4113 4115 4117 4121
## 422 2 30 17 2 15 4 54 1 3 176 363 212 44 498 27
## 4122 4131 4132 4141 4143 4145 4146 4148 4151 4152 4162 4177 4183 4192 4193 4194
## 2 138 67 51 68 39 1 1 84 143 7 8 48 2 28 21
## 4196 4210 4211 4220 4231 4232 4241 4242 4270 4455 4471 4472 4473 4474 4475 4481
## 5 164 16 2451 406 57 28 12 17 18 90 75 12 438 49 4
## 4491 4492 4493 4494 4495 4511 4521 4530 4540 4541 6101 6103 6104 6105 6106 6107
## 164 68 31 100 85 1842 5809 255 182 39 367 24 2 63 16 84
## 6133 7050 7065 7067 9008 9400 <NA>
## 1 2 2 16 29 2 0
# Fund code
missing %>%
pull(FUND_CODE) %>%
table(useNA = "always")
## .
## CFLN CFLN,OFED WFHF <NA>
## 6 1 1 43270
Okay, look at that last table: the fund codes. The 8 records from “Region 15” have valid fund codes. All of the other records have NA fund codes. This is a problem, because fund codes are one of the criteria we use to determine whether a Common Attributes record is valid. Since our filter drops every record with an NA fund code, we’re missing 7% of the Hazardous Fuels data.
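These NA fund codes also seem to account for a detail in the earlier cross-tabulation of `activity_match` and `fund_match`: its four cells sum to 586694, which is 43270 fewer than `nrow(hf)`, the same as the NA count in the table above. By default, `table()` leaves out any row that has an NA in one of the tabulated vectors. A minimal sketch with toy vectors (not FACTS data):

```r
# Toy logical vectors standing in for activity_match and fund_match
activity_match_toy <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
fund_match_toy <- c(TRUE, FALSE, TRUE, NA, NA)

# Default behaviour: rows with an NA in either vector are left out
tab_default <- table(activity_match_toy, fund_match_toy)
sum(tab_default)

# useNA = "ifany" adds an NA column so every row is counted
tab_with_na <- table(activity_match_toy, fund_match_toy, useNA = "ifany")
sum(tab_with_na)
```

Passing `useNA = "ifany"` (or `"always"`, as in the tables above) makes the hidden rows visible.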
Well, it seems to me that the missing fund codes in Hazardous Fuels are an issue that needs to be addressed by the USFS. But, if we’re going to recreate Hazardous Fuels from Common Attributes, we need to include these records.
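To illustrate why records without a fund code fall out of a filter like the one used for `valid_all`, here is a minimal sketch on toy data. `str_detect()` returns NA for NA input, and `filter()` drops rows where the condition is NA. The `lenient` rule below is an assumption for illustration only, not a rule documented by the USFS.

```r
library(dplyr)
library(stringr)

# Toy stand-in for Common Attributes (values are made up)
ca_toy <- tibble(
  EVENT_CN = c("A1", "A2", "A3"),
  FUND_CODES = c("NFHF", "ZZZZ", NA)
)
pattern_toy <- "NFHF|WFHF"

# str_detect() gives NA for the NA fund code, so filter() drops that row
strict <- ca_toy %>%
  filter(str_detect(FUND_CODES, pattern_toy))

# Assumption: records with no fund code are kept for later review
lenient <- ca_toy %>%
  filter(is.na(FUND_CODES) | str_detect(FUND_CODES, pattern_toy))

strict$EVENT_CN
lenient$EVENT_CN
```

Keeping NA rows this way would also admit any Common Attributes records with no fund code that are not in Hazardous Fuels, so on its own it would not reproduce the dataset exactly.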
Although we were able to “reverse-engineer” the correct list of activity codes and fund codes from the Hazardous Fuels data, there is no way for us to circumvent the issue of incorrect or incomplete data. I can’t understand how the Forest Service pulls together their Hazardous Fuels data with those missing fund codes.
It seems that the way forward is to continue downloading a copy of the Hazardous Fuels data and manually checking it against the Common Attributes data to ensure that we don’t miss any records.
In the data paper, we can explain to the reader that the Hazardous Fuels data is ostensibly a subset of FACTS, but the rules for subsetting aren’t perfectly followed, and it’s unclear from all the documentation how the USFS actually does it. Then, we can just be sure to explain our rules more clearly.