In my meeting with Jim on 08/16/21, the following figure of the balance of effort across GEAR_CODE x POOL_REACH x SAMPLE_PERIOD combinations prompted Jim to notice that effort seemed low or missing for many combinations in the 2019 site-level data.
Jim specifically mentioned Mini-Fyke and Hoop Net gear types as low or missing. In trying to track down the missing Mini-Fykes and Hoop Nets, I realized we had two problems.
1) Based on conversations with Brandon, I decided to filter out all observations with Site Type values not included in the LTRMP metadata. Specifically for 2019, I ended up filtering out a few hundred observations with Site Type values of 5 and 7. Maybe these held the missing data?
2) I made a mistake in labeling the Effort Balance figure that prompted Jim’s thoughts on the missing data. The figure’s x-axis was mislabeled. Specifically, Mini Fyke and Fyke labels were swapped. This is an embarrassing coding mistake on my end that resulted from trying to reorder the factors by total effort to be more visually appealing, but not reordering the labels.
So we’ve got two problems to fix. The second is an easy one. I’ve corrected the labeling of the Effort Balance figure’s x-axis. See that figure below.
Now we’ve filled in a lot of the Mini-Fyke data that were “missing” before. Maybe that labeling fix alone solves Jim’s qualms about the figure and therefore the underlying data? My guess is no. We’re still missing quite a few Mini Fykes from Period 1. Jim also mentioned missing Hoop Net samples, which the labeling fix doesn’t address. Plus we know we are still filtering out a lot of data by omitting the observations with Site Types of 5 and 7. So let’s try to address that filtering issue, too.
First let’s look at some tables that help break down the observations that were filtered out for having “weird” SITE_TYPE values.
| Site Type | Count |
|---|---|
| 5 | 31 |
| 7 | 391 |

| Gear Code | Count |
|---|---|
| D | 115 |
| F | 8 |
| HL | 72 |
| HS | 61 |
| M | 166 |

| Agency | Count |
|---|---|
| IDNR | 90 |
| USFWS | 331 |

| Pool/Reach | Count |
|---|---|
| LP | 8 |
| BN | 7 |
| DR | 47 |
| MA | 156 |
| ST | 128 |
| PE | 17 |
| LG | 0 |
| AL | 59 |
So we can see that the vast majority of the weird Site Types have a value of 7, and that all of them were entered by IDNR or USFWS samplers. These observations are spread over many pools (every one except LaGrange) and likely contain many of our “missing” Hoop Net samples. They also contain a lot of Mini Fykes, a large number of Day Electrofishing runs, and a handful of Fyke nets.
NOTE: They all contain the same (correct) project code, which is something we should keep in mind when we look at the 2020 data.
| Project Code | Count |
|---|---|
| R-99 | 422 |
So let’s take another look at that Effort Balance figure, but this time with those 5s and 7s added back in. Note: Not ALL of them will come back, because any that were unsampled (i.e. SUMMARY_CODE %in% c(1,2)) are still excluded.
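For anyone who wants to see the mechanics of that filter, here is a minimal sketch in base R. The data frame and all of its values are invented for illustration, and treating every SUMMARY_CODE other than 1 or 2 (including NA) as "sampled" is an assumption, so this is not necessarily how the actual cleaning script is written.

```r
# Toy observations; every value here is made up for illustration.
obs <- data.frame(
  SITE_TYPE    = c(0, 1, 2, 5, 7, 7, 7),
  SUMMARY_CODE = c(NA, NA, 1, NA, NA, 2, NA),
  stringsAsFactors = FALSE
)

# Site types listed in the metadata, plus the two being added back in.
metadata_site_types <- c(0, 1, 2)
extra_site_types    <- c(5, 7)

# Keep rows whose site type is in either set.
keep_site <- obs$SITE_TYPE %in% c(metadata_site_types, extra_site_types)

# Unsampled rows are still dropped. %in% returns FALSE for NA,
# so rows with a missing SUMMARY_CODE are treated as sampled (assumption).
unsampled <- obs$SUMMARY_CODE %in% c(1, 2)

kept <- obs[keep_site & !unsampled, ]

# Count what survives, by site type.
print(table(kept$SITE_TYPE))
```

In this toy example one of the three SITE_TYPE 7 rows is dropped because it is flagged as unsampled, which is why the counts in the figure can come out lower than the counts in the tables.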
OK, so we’ve added a lot more effort here, and visually it seems to match up with the data tables above. The sample sizes of each combination (in white) can be difficult to read, but it’s clear that we have at least some effort in each POOL_REACH x SAMPLE_PERIOD combination for Electrofishing, Mini Fykes, and Large Hoop Nets. For Small Hoop Nets, the only gap is Sample Period 2 in Dresden. For Fykes, we have no effort in LP, BN, or ST, relatively balanced effort in DR, MA, PE, and LG, and only Period 1 effort in AL.
Perhaps Jim or Brandon can weigh in on whether this is what we expect for Fyke, and on why Period 2 is missing for Small Hoop Nets in Dresden.
What we learned from the 2019 data:
1) Just because it has an abnormal SITE_TYPE value doesn’t mean it should be automatically filtered out
2) I need to be more careful applying manual labels to figures
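On that second point, here is a minimal sketch in base R of how this kind of label swap can happen and one way to guard against it. The effort totals are invented, and the mapping from gear codes to display names is an assumption based on the gear names used in this report; it is not the actual plotting code.

```r
# Hypothetical effort totals per gear code (numbers are made up).
effort <- data.frame(
  gear_code = c("D", "F", "HL", "HS", "M"),
  total     = c(120, 15, 40, 35, 90),
  stringsAsFactors = FALSE
)

# Assumed lookup from gear code to display label.
gear_labels <- c(
  D  = "Day Electrofishing",
  F  = "Fyke",
  HL = "Large Hoop Net",
  HS = "Small Hoop Net",
  M  = "Mini Fyke"
)

# Reorder the factor levels by total effort, largest first.
ordered_codes <- effort$gear_code[order(-effort$total)]
effort$gear_code <- factor(effort$gear_code, levels = ordered_codes)

# Risky: labels typed by hand in the ORIGINAL (alphabetical) order
# no longer line up with the reordered levels.
manual_labels <- c("Day Electrofishing", "Fyke", "Large Hoop Net",
                   "Small Hoop Net", "Mini Fyke")

# Safer: look each label up by level name so it follows any reordering.
safe_labels <- unname(gear_labels[levels(effort$gear_code)])

# Side-by-side comparison of the two label vectors.
print(data.frame(level  = levels(effort$gear_code),
                 manual = manual_labels,
                 safe   = safe_labels))
```

With these toy numbers, the hand-typed vector swaps exactly the Fyke and Mini Fyke labels, while the lookup keeps every label attached to its own code. The same named vector could then be passed as the axis labels in whatever plotting call is used.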
So what about the 2020 data? Well, the Effort Balance figure for 2020 looks a lot like the 2019 figure after I made the two fixes to the 2019 data. So hopefully it’s in a good place. Let’s look.
This 2020 figure has effort values similar to the fixed 2019 figure, with one exception: there is no Fyke effort in the Alton pool in 2020, whereas there was in 2019.
However, if we look at the SITE_TYPEs for the 2020 data, we also filtered out a lot of observations for having SITE_TYPE values not included in the metadata, just like we did for 2019. Could these “weird” SITE_TYPE observations also hold valuable data?
Let’s look at how these filtered-out observations break down in tables, like we did for 2019.
| Site Type | Count |
|---|---|
| 3 | 806 |
| 4 | 955 |
| 6 | 20 |

| Gear Code | Count |
|---|---|
| CG | 1719 |
| CS | 1 |
| CT | 30 |
| D | 18 |
| GT | 13 |

| Agency | Count |
|---|---|
| IDNR | 956 |
| INHS | 800 |
| USACE | 6 |

| Pool/Reach | Count |
|---|---|
| LP | 212 |
| BN | 226 |
| DR | 430 |
| MA | 245 |
| ST | 378 |
| PE | 0 |
| LG | 0 |
| AL | 0 |
Here we can see there are a lot of observations filtered out for having SITE_TYPE values other than the ones listed in the metadata (i.e. other than 0, 1, or 2). These observations come from several sampling agencies, including INHS, and mostly span the upper river. However, notice that most of the GEAR_CODEs do not match the typical GEAR_CODEs of this project. Are they from a different project?
| Project Code | Count |
|---|---|
| E-001 | 826 |
| E-002 | 919 |
| E-004 | 22 |
| E-006 | 14 |
Sure enough, they all have project codes other than the ones recognized for the Lock and Dam project (M- or R-99). If I understand correctly from the metadata, any observation with a project code starting with an “E” is “ad hoc exploratory sampling.” So, again if I understand correctly, these would have been, and should have been, filtered out for having the wrong project code, regardless of their site type.
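A quick way to double-check that overlap is to cross-tabulate the two flags. The sketch below uses base R and a handful of invented rows; the "starts with E" rule is my reading of the metadata as described above, and the data frame name is hypothetical.

```r
# Toy 2020 observations; rows are invented for illustration.
obs2020 <- data.frame(
  PROJECT_CODE = c("E-001", "E-002", "R-99", "E-004", "R-99", "E-006"),
  SITE_TYPE    = c(3, 4, 0, 6, 1, 3),
  stringsAsFactors = FALSE
)

# Flag exploratory sampling: project codes beginning with "E" (assumed rule).
exploratory <- grepl("^E", obs2020$PROJECT_CODE)

# Flag site types that are not listed in the metadata (0, 1, or 2).
odd_site <- !(obs2020$SITE_TYPE %in% c(0, 1, 2))

# Cross-tabulate the two flags to see whether they pick out the same rows.
overlap <- table(exploratory = exploratory, odd_site = odd_site)
print(overlap)
```

If every odd-site-type row also falls in the exploratory row of the table, as it does in this toy example, then the project-code filter alone would already have removed those observations.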
It would seem the 2020 data are fine as they stand with respect to the observations we’re keeping versus omitting based on SITE_TYPE values.
To wrap up, it would be great to get confirmation from Jim or Brandon on the state of these two figures (2020 and 2019) now that I’ve worked through the SITE_TYPE and labeling issues.
Thanks for continuing to bear with me as I work through cleaning/organizing the data before we jump into analyses!
Mike