PADEP 2022 Bio Meeting - RBP Training Results

Physical Habitat Assessment Results

README

PADEP conducted a physical habitat assessment training/calibration during the 2022 Biologists Meeting in Williamsport on October 20, 2022. The results were compiled and examined to identify parameters with weak precision that may deserve attention during future training/calibration events. Another objective was to begin investigating method precision estimates.

Site results are organized in tabbed format. Click tabs across the top to examine results from each site - and an overall summary.

If you would like to recreate this analysis, or complete additional analysis, the dataset and R code are located on the last tab

Bottle Run

Bottle Run - Low Gradient (Glide/Pool) Site

Before interpreting results, keep in mind that Bottle Run is impaired for:
1) FLOW REGIME MODIFICATION – RURAL (RESIDENTIAL AREAS),
2) ORGANIC ENRICHMENT – RURAL (RESIDENTIAL AREAS), and
3) PATHOGENS – SOURCE UNKNOWN.

Figure 1. Bottle Run raw parameters, ordered from smallest to largest standard deviation.

Figure 2. Bottle Run couplet and total scores, ordered from smallest to largest standard deviation. Dashed horizontal lines indicate impairment thresholds for couplets (20) and total score (105).

Table 1. Summary Statistics of Bottle Run raw parameter, couplet, and total scores. Rank column sorted by ascending coefficient of variation (cv) of parameter.

SEC_STATION_ID	parameter	mean	sd	cv	n	min	max	rank
Bottle	total_score	88.2	9.6	26.4	24	69	106	12
Bottle	instr_coup_GP	17.1	4.1	9.6	24	8	26	11
Bottle	SED_DEP	6.0	2.7	9.0	24	2	15	10
Bottle	bank_coup_GP	26.4	4.4	6.8	24	17	33	9
Bottle	RIP_VEG_ZONE	7.5	2.5	6.6	24	4	16	8
Bottle	POOL_VAR	6.0	1.8	6.0	24	3	10	7
Bottle	CHAN_ALT	8.8	2.3	5.2	24	5	14	6
Bottle	SUB_AVAIL_COVER	8.0	1.9	4.8	24	5	11	5
Bottle	POOL_SUB_CHAR	11.1	2.5	4.6	24	6	16	4
Bottle	CHAN_FLOW	14.5	3.0	4.2	24	5	19	3
Bottle	BANK_VEG	13.2	2.6	4.0	24	9	17	2
Bottle	COND_BANKS	13.2	2.5	3.8	24	8	17	1

Hagermans Upstream

Hagermans Run Upstream - Riffle/Run Site

Before interpreting results, keep in mind that Hagermans Run Upstream is attaining Aquatic Life, Recreational, and Potable Water Supply uses.

Figure 1. Hagermans Upstream raw parameters, ordered from smallest to largest standard deviation.

Figure 2. Hagermans Upstream couplet and total scores, ordered from smallest to largest standard deviation. Dashed horizontal lines indicate impairment thresholds for couplets (24) and total score (140).

Table 1. Summary Statistics of Hagermans Upstream raw parameter, couplet, and total scores. Rank column sorted by ascending coefficient of variation (cv) of parameter.

SEC_STATION_ID	parameter	mean	sd	cv	n	min	max	rank
Hagermans UP	total_score	182.9	10.1	14.4	25	165	204	15
Hagermans UP	bank_coup_RR	26.6	4.1	6.0	25	19	34	14
Hagermans UP	COND_BANKS	12.2	2.5	4.0	25	7	16	13
Hagermans UP	instr_coup_RR	26.6	2.6	4.0	25	19	31	12
Hagermans UP	BANK_VEG	14.4	2.2	3.0	25	11	19	11
Hagermans UP	SED_DEP	12.7	1.9	3.0	25	7	15	10
Hagermans UP	VELO_DEPTH	14.3	2.2	3.0	25	10	19	9
Hagermans UP	CHAN_FLOW	15.9	2.0	2.6	25	13	19	8
Hagermans UP	CHAN_ALT	14.8	1.8	2.4	25	11	18	7
Hagermans UP	EMBEDDED	14.0	1.4	2.0	25	11	16	6
Hagermans UP	EPI_SUB	17.1	1.3	1.6	25	15	20	5
Hagermans UP	GRAZING	17.2	1.4	1.6	25	15	20	4
Hagermans UP	RIP_VEG_ZONE	15.7	1.2	1.6	25	13	18	3
Hagermans UP	INST_COVER	16.6	1.1	1.4	25	15	19	2
Hagermans UP	FREQ_RIFF	18.2	1.1	1.2	25	16	20	1

Hagermans Downstream

Before interpreting results, keep in mind that Hagermans Run Downstream is impaired for:
1) HABITAT ALTERATIONS – CHANNELIZATION,
2) CAUSE UNKNOWN – URBAN RUNOFF/STORM SEWERS, and
3) PATHOGENS – SOURCE UNKNOWN.

Hagermans Run Downstream - Riffle/Run Site

Figure 1. Hagermans Downstream raw parameters, ordered from smallest to largest standard deviation.

Figure 2. Hagermans Downstream couplet and total scores, ordered from smallest to largest standard deviation. Dashed horizontal lines indicate impairment thresholds for couplets (24) and total score (140).

Table 1. Summary Statistics of Hagermans Downstream raw parameter, couplet, and total scores. Rank column sorted by ascending coefficient of variation (cv) of parameter.

SEC_STATION_ID	parameter	mean	sd	cv	n	min	max	rank
Hagermans DOWN	total_score	124.1	11.8	24.0	27	94	146	15
Hagermans DOWN	bank_coup_RR	14.2	4.4	12.4	27	9	22	14
Hagermans DOWN	COND_BANKS	6.3	2.5	8.0	27	2	10	13
Hagermans DOWN	GRAZING	6.4	2.2	6.8	27	3	12	12
Hagermans DOWN	RIP_VEG_ZONE	4.0	1.2	6.0	27	2	7	11
Hagermans DOWN	CHAN_ALT	8.7	2.4	5.6	27	4	12	10
Hagermans DOWN	instr_coup_RR	20.6	2.8	5.6	27	15	26	9
Hagermans DOWN	BANK_VEG	7.9	2.1	5.4	27	5	12	8
Hagermans DOWN	SED_DEP	8.7	2.2	5.0	27	5	12	7
Hagermans DOWN	FREQ_RIFF	15.7	3.0	3.8	27	5	18	6
Hagermans DOWN	CHAN_FLOW	13.1	2.1	3.2	27	9	18	5
Hagermans DOWN	EMBEDDED	11.9	1.5	2.6	27	10	15	4
Hagermans DOWN	VELO_DEPTH	12.4	1.6	2.6	27	10	16	3
Hagermans DOWN	INST_COVER	13.8	1.6	2.4	27	11	17	2
Hagermans DOWN	EPI_SUB	15.0	1.3	1.8	27	12	17	1

Method Precision

Unlike other indices, the physical habitat assessment does not have an associated precision estimate that would allow for spatial or temporal comparisons of to determine if differences in scores are significant. This training represents the beginnings of a dataset to examine intrasite precision (\(PE_I\)). \(PE_I\) is defined as multiple samples taken at the same site at the same time.

Most staff used the low gradient form to assess Bottle Run. There are very few staff that have current audits for the low gradient protocol, so this site was removed from PE analysis. The riffle/run sites (Hagermans DOWN and UP) were included. CVs were calculated for each parameter at each site. CVs were calculated for all staff, staff that have current habitat assessment method audits, and staff that have not been audited.

A number of meaningful observations were made using this dataset:

Variability (CVs) in scores at sites was dependent on site condition. The Hagermans UP site with more intact habitat had lower variability for all parameters, couplets, and total score compared to Hagermans DOWN, which had more impacted habitat.
For most parameters, the CV for audited staff (n = 11) was less than the CV for staff that have not been audited (n = 17) and all staff (n = 28).

Figure 1. Coefficient of variation (CV) for individual parameters, couplets, and total score at Hagermans UP and DOWN sites. Parameters are in descending order of CV and displayed depending on whether the staff were audited, not audited, and the aggregate.

Figure 2. Coefficient of variation (CV) for surrounding the mean score (black triangle) for individual parameters, couplets, and total score at Hagermans UP and DOWN sites. Parameters are in descending order of CV and displayed depending on whether the staff were audited , not audited, and the aggregate. Individual staff scores are overplotted and symbolized according to DEP Region.

Recommendations

The format of the workshop allowed interesting observations of variability in physical habitat assessment scores to be understood. Additionally, this format allowed for the beginning of a dataset that can be built upon to generate PEs for the physical habitat assessment method. Some recommendations for continuing to improve the statistical validity of the physical habitat assessment method include:

Recommendation 1 Increase the number of staff that are trained and audited in the physical habitat assessment method.

Recommendation 2 The size of this dataset needs to be increased in order to better gage variability and quantify PEs. Increased frequency of physical habitat data collections at the same site, same time are needed generate additional data to calculate intrasite precision (\(PE_I\)). When audited staff are in the field with other staff (audited or non-audited), independent habitat assessments should be conducted and data provided to increase n.

Recommendation 3 Explore temporal precision (\(PE_T\)), defined as precision at the same site, through time. A number of confounding factors such as seasonality (e.g. vegetation cover, flow regime) and antecedent scouring flows can drastically affect habitat scores at the same site. Other indices have index periods when data can be collected and considered comparable; the physical habitat assessment has no such index period. We need to understand this variability and its contribution to physical habitat scores.

Recommendation 4 Investigate differential variability between least-disturbed and impacted sites. The data from Hagermans UP and DOWN showed clearly that variability in scoring was highest at the impacted (DOWN) site. Do we need to bin sites based on their condition prior to examining CVs?

Data and R Code

Click here to download a spreadsheet of the data included herein

Below is the code used to load necessary packages, import and clean data, create graphics, and generate statistics.

getwd()


options(scipen = 999)

# rm(list = ls())
# load packages -----------------------------------------------------------

library(readxl)
library(RColorBrewer)
library(plotly)
library(tidyverse)
library(tidylog)



# import data -------------------------------------------------------------

# vector of staff that have been audited for riff/run
audit_vec <-
  read_xlsx('20221020_RBP_Workshop_Data.xlsx', sheet = 'audits') %>% 
  drop_na(username, FreestoneHab) %>% 
  pull(username)
audit_vec


# RBP data from 2022 Bio Meeting
hab_df_wide <- 
  read_xlsx('20221020_RBP_Workshop_Data.xlsx', sheet = 'Sheet1') %>% 
  mutate(total_score = rowSums(across(SUB_AVAIL_COVER:RIP_VEG_ZONE), na.rm = T),
         instr_coup_RR = ifelse(BLN_GLIDE == 'N', EMBEDDED + SED_DEP, NA),
         bank_coup_RR = ifelse(BLN_GLIDE == 'N', COND_BANKS + BANK_VEG, NA),
         instr_coup_GP = ifelse(BLN_GLIDE == 'Y', POOL_SUB_CHAR + SED_DEP, NA),
         bank_coup_GP = ifelse(BLN_GLIDE == 'Y', COND_BANKS + BANK_VEG, NA),
         username = sub("^([^-]+-){2}", "", STR_STATION_ID), # remove username from STR_STATION_ID - text after second '-'
         riff_run_audit = case_when(
           username %in% audit_vec & BLN_GLIDE == 'N' ~ 'Y',
           BLN_GLIDE == 'N' ~ 'N',
           BLN_GLIDE == 'Y' ~ NA_character_,
           TRUE ~ 'panic')) %>% 
  relocate(c(username, riff_run_audit), .after = STR_STATION_ID)
hab_df_wide
sort(unique(hab_df_wide$SEC_STATION_ID))
sort(unique(hab_df_wide$REGION))


hab_df_wide %>% 
  filter(SEC_STATION_ID != 'Bottle') %>% 
  distinct(username, riff_run_audit) %>% 
  group_by(riff_run_audit) %>% 
  summarize(n = n())

# change to long format for plotting
hab_df_long <- 
  hab_df_wide %>% 
  pivot_longer(cols = 8:27, names_to = 'parameter', values_to = 'score', values_drop_na = T)
hab_df_long
sort(unique(hab_df_long$parameter))




# visualize ---------------------------------------------------------------

# Bottle - raw parameters
bottle_raw_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Bottle') %>% #distinct(parameter)
  filter(!(parameter %in% c('total_score', 'instr_coup_GP', 'bank_coup_GP'))) %>% 

  # ggplot(aes(x = parameter, y = score)) +
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  # geom_hline(yintercept = c(1, 5, 10, 15, 20), lty = 'dashed') +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 1, ymax = 5.5, fill = '#969696') + 
  annotate('text', x =0.5, y = 3.5, label = 'Poor', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 5.5, ymax = 10.5, fill = '#bdbdbd') + 
  annotate('text', x =0.5, y = 8, label = 'Marginal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 10.5, ymax = 15.5, fill = '#d9d9d9') + 
  annotate('text', x =0.5, y = 13, label = 'Suboptimal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 15.5, ymax = 20, fill = '#f0f0f0') + 
  annotate('text', x =0.5, y = 18, label = 'Optimal', angle = 90) +
  
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  scale_y_continuous(limits = c(1,20)) +
  scale_fill_brewer(palette = 'Set1') +
  
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal() 

bottle_raw_plot
  

# Bottle - couplets and total score
bottle_tot_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Bottle') %>% #distinct(parameter)
  filter(parameter %in% c('total_score', 'instr_coup_GP', 'bank_coup_GP')) %>% 
  
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  geom_hline(yintercept = c(20, 105), lty = 'dashed') +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

bottle_tot_plot

# stats

bottle_summ <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Bottle') %>% 
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = round(mean(score, na.rm = T), 1), 
            sd = round(sd(score, na.rm = T), 1),
            n = n(),
            min = min(score, na.rm = T),
            max = max(score, na.rm = T)) %>% 
  
  mutate(cv = round(sd/mean, 2),
         total = case_when(
    str_detect(parameter, 'coup') ~ 40,
    str_detect(parameter, 'total') ~ 240,
    TRUE ~ 20),
    cv_pts = cv * total) %>% 
    
  select(-c(total, cv)) %>% 
  rename(cv = cv_pts) %>% 
  arrange(-cv) %>% 
  relocate(cv, .after = 'sd') %>% 
  mutate(rank = nrow(.):1) 

bottle_summ



# Hagermans UP - raw parameters
hag_up_raw_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans UP') %>% #distinct(parameter)
  filter(!(parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR'))) %>% 

  # ggplot(aes(x = parameter, y = score)) +
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 1, ymax = 5.5, fill = '#969696') + 
  annotate('text', x =0.5, y = 3.5, label = 'Poor', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 5.5, ymax = 10.5, fill = '#bdbdbd') + 
  annotate('text', x =0.5, y = 8, label = 'Marginal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 10.5, ymax = 15.5, fill = '#d9d9d9') + 
  annotate('text', x =0.5, y = 13, label = 'Suboptimal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 15.5, ymax = 20, fill = '#f0f0f0') + 
  annotate('text', x =0.5, y = 18, label = 'Optimal', angle = 90) +
  
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  scale_y_continuous(limits = c(1,20)) +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

hag_up_raw_plot

# Hagermans UP - couplets and total score
hag_up_tot_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans UP') %>% #distinct(parameter)
  filter(parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')) %>% 
  
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  geom_hline(yintercept = c(24, 140), lty = 'dashed') +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

hag_up_tot_plot

# stats

hag_up_summ <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans UP') %>% 
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = round(mean(score, na.rm = T), 1), 
            sd = round(sd(score, na.rm = T), 1),
            n = n(), 
            min = min(score, na.rm = T),
            max = max(score, na.rm = T)) %>% 
  
  mutate(cv = round(sd/mean, 2),
         total = case_when(
           str_detect(parameter, 'coup') ~ 40,
           str_detect(parameter, 'total') ~ 240,
           TRUE ~ 20),
         cv_pts = cv * total) %>% 
  
  select(-c(total, cv)) %>% 
  rename(cv = cv_pts) %>% 
  arrange(-cv) %>% 
  relocate(cv, .after = 'sd') %>% 
  mutate(rank = nrow(.):1) 


hag_up_summ



# Hagermans DOWN - raw parameters
hag_dn_raw_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans DOWN') %>% #distinct(parameter)
  filter(!(parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR'))) %>% 
  
  # ggplot(aes(x = parameter, y = score)) +
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 1, ymax = 5.5, fill = '#969696') + 
  annotate('text', x =0.5, y = 3.5, label = 'Poor', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 5.5, ymax = 10.5, fill = '#bdbdbd') + 
  annotate('text', x =0.5, y = 8, label = 'Marginal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 10.5, ymax = 15.5, fill = '#d9d9d9') + 
  annotate('text', x =0.5, y = 13, label = 'Suboptimal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 15.5, ymax = 20, fill = '#f0f0f0') + 
  annotate('text', x =0.5, y = 18, label = 'Optimal', angle = 90) +
  
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  scale_y_continuous(limits = c(1,20)) +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

hag_dn_raw_plot


# Hagermans DOWN - couplets and total score
hag_dn_tot_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans DOWN') %>% #distinct(parameter)
  filter(parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')) %>% 
  
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  geom_hline(yintercept = c(24, 140), lty = 'dashed') +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

hag_dn_tot_plot


# stats
hag_dn_summ <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans DOWN') %>% 
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = round(mean(score, na.rm = T), 1), 
            sd = round(sd(score, na.rm = T), 1),
            n = n(),
            min = min(score, na.rm = T),
            max = max(score, na.rm = T)) %>% 
  
  mutate(cv = round(sd/mean, 2),
         total = case_when(
           str_detect(parameter, 'coup') ~ 40,
           str_detect(parameter, 'total') ~ 240,
           TRUE ~ 20),
         cv_pts = cv * total) %>% 
  
  select(-c(total, cv)) %>% 
  rename(cv = cv_pts) %>% 
  arrange(-cv) %>% 
  relocate(cv, .after = 'sd') %>% 
  mutate(rank = nrow(.):1) 

hag_dn_summ


# overall summary

all_summ <- 
  
  bind_rows(bottle_summ, hag_up_summ, hag_dn_summ) %>% 
  group_by(parameter) %>% 
  summarize(mean_sd = round(mean(sd, na.rm = T), 1),
            n = n()) %>% 
  arrange(-mean_sd) %>% 
  mutate(rank = nrow(.):1) 

all_summ



# PEs ---------------------------------------------------------------------



PE_riff_run_all <- 
  hab_df_long %>% 
  filter(BLN_GLIDE == 'N') %>% 
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = mean(score),
            sd = sd(score),
            cv = sd / mean) %>% 
  mutate(total = case_when(
    str_detect(parameter, 'coup') ~ 40,
    str_detect(parameter, 'total') ~ 240,
    TRUE ~ 20),
    cv_pts = cv * total) %>% 
  arrange(SEC_STATION_ID, -cv_pts) %>% 
  mutate(category = 'all')
PE_riff_run_all


PE_riff_run_audit <- 
  hab_df_long %>% 
  filter(BLN_GLIDE == 'N') %>% 
  filter(riff_run_audit == 'Y') %>%
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = mean(score),
            sd = sd(score),
            cv = sd / mean) %>% 
  mutate(total = case_when(
    str_detect(parameter, 'coup') ~ 40,
    str_detect(parameter, 'total') ~ 240,
    TRUE ~ 20),
    cv_pts = cv * total) %>% 
  arrange(SEC_STATION_ID, -cv_pts) %>% 
  mutate(category = 'audited staff')
PE_riff_run_audit


PE_riff_run_non_audit <- 
  hab_df_long %>% 
  filter(BLN_GLIDE == 'N') %>% 
  filter(riff_run_audit == 'N' | is.na(riff_run_audit)) %>%
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = mean(score),
            sd = sd(score),
            cv = sd / mean) %>% 
  mutate(total = case_when(
    str_detect(parameter, 'coup') ~ 40,
    str_detect(parameter, 'total') ~ 240,
    TRUE ~ 20),
    cv_pts = cv * total) %>% 
  arrange(SEC_STATION_ID, -cv_pts) %>% 
  mutate(category = 'non-audited staff')
PE_riff_run_non_audit


PE_df <- 
  bind_rows(PE_riff_run_all, PE_riff_run_audit, PE_riff_run_non_audit) %>% 
  mutate(fcategory = factor(category, levels = c('all', 'non-audited staff', 'audited staff')))
PE_df
levels(PE_df$fcategory)

rm(PE_riff_run_all, PE_riff_run_audit, PE_riff_run_non_audit)



cv_Hagermans_plot <- 
  ggplot(PE_df, aes(x = cv_pts, y = reorder(parameter, cv_pts, FUN = 'mean'), 
                    fill = fcategory)) +
  
  facet_wrap(~SEC_STATION_ID) +
  
  geom_bar(stat = 'identity', position = 'dodge') +
  
  scale_fill_manual(values = c('#1b9e77', '#d95f02', '#7570b3'),
                    breaks=c('audited staff', 'non-audited staff', 'all'),
                    name = '') +
  
  labs(x = 'CV (points)', y = 'Parameter - Ordered by Descending CV') + 
  theme_minimal() +
  theme(legend.position = 'bottom')

cv_Hagermans_plot



# show plot with actual scores overtop CVs

cv_Hagermans_points_plot <- 
  ggplot() +
  facet_wrap(~SEC_STATION_ID) +
  
  geom_linerange(data = filter(PE_df, !parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')),
                 aes(xmin = mean - (0.5*cv_pts), xmax = mean + (0.5*cv_pts), 
                     y = reorder(parameter, cv_pts, FUN = 'mean'),
                     color = category, size = category)) +
  
  # add black triangle for mean
  stat_summary(data = filter(PE_df, !parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')),
               aes(x= mean, y = reorder(parameter, cv_pts, FUN = 'mean')), 
               fun = mean, geom = "point", pch = 17, size = 6, color = "black") +
  
  
  scale_color_manual(values = c('#1b9e77', '#d95f02', '#7570b3'),
                     # breaks=c('audited staff', 'non-audited staff', 'all'),
                     name = '') +
  
  scale_size_manual(values = c(12,8,4),
                    # breaks=c('audited staff', 'non-audited staff', 'all'),
                    name = '') +
  
  
  geom_point(data = filter(hab_df_long, SEC_STATION_ID != 'Bottle' &
                             !parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')), 
             aes(x = score, y = parameter, fill = REGION), 
             pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.4, dodge.width = .2),
             inherit.aes = F) +
  
  # scale_x_continuous(limits = c(1,20)) +
  scale_fill_brewer(palette = 'Set1') +
  labs(y = 'Parameter - Ordered by Descending CV') + 
  theme_bw()
cv_Hagermans_points_plot