Physical Habitat Assessment Results

README


PADEP conducted a physical habitat assessment training/calibration during the 2022 Biologists Meeting in Williamsport on October 20, 2022. The results were compiled and examined to identify parameters with weak precision that may deserve attention during future training/calibration events. Another objective was to begin investigating method precision estimates.


Site results are organized in tabbed format. Click tabs across the top to examine results from each site - and an overall summary.


If you would like to recreate this analysis, or complete additional analysis, the dataset and R code are located on the last tab





Bottle Run


Bottle Run - Low Gradient (Glide/Pool) Site


Before interpreting results, keep in mind that Bottle Run is impaired for:
1) FLOW REGIME MODIFICATION – RURAL (RESIDENTIAL AREAS),
2) ORGANIC ENRICHMENT – RURAL (RESIDENTIAL AREAS), and
3) PATHOGENS – SOURCE UNKNOWN.


Figure 1. Bottle Run raw parameters, ordered from smallest to largest standard deviation.



Figure 2. Bottle Run couplet and total scores, ordered from smallest to largest standard deviation. Dashed horizontal lines indicate impairment thresholds for couplets (20) and total score (105).



Table 1. Summary Statistics of Bottle Run raw parameter, couplet, and total scores. Rank column sorted by ascending coefficient of variation (cv) of parameter.
SEC_STATION_ID parameter mean sd cv n min max rank
Bottle total_score 88.2 9.6 26.4 24 69 106 12
Bottle instr_coup_GP 17.1 4.1 9.6 24 8 26 11
Bottle SED_DEP 6.0 2.7 9.0 24 2 15 10
Bottle bank_coup_GP 26.4 4.4 6.8 24 17 33 9
Bottle RIP_VEG_ZONE 7.5 2.5 6.6 24 4 16 8
Bottle POOL_VAR 6.0 1.8 6.0 24 3 10 7
Bottle CHAN_ALT 8.8 2.3 5.2 24 5 14 6
Bottle SUB_AVAIL_COVER 8.0 1.9 4.8 24 5 11 5
Bottle POOL_SUB_CHAR 11.1 2.5 4.6 24 6 16 4
Bottle CHAN_FLOW 14.5 3.0 4.2 24 5 19 3
Bottle BANK_VEG 13.2 2.6 4.0 24 9 17 2
Bottle COND_BANKS 13.2 2.5 3.8 24 8 17 1



Hagermans Upstream


Hagermans Run Upstream - Riffle/Run Site


Before interpreting results, keep in mind that Hagermans Run Upstream is attaining Aquatic Life, Recreational, and Potable Water Supply uses.


Figure 1. Hagermans Upstream raw parameters, ordered from smallest to largest standard deviation.



Figure 2. Hagermans Upstream couplet and total scores, ordered from smallest to largest standard deviation. Dashed horizontal lines indicate impairment thresholds for couplets (24) and total score (140).



Table 1. Summary Statistics of Hagermans Upstream raw parameter, couplet, and total scores. Rank column sorted by ascending coefficient of variation (cv) of parameter.
SEC_STATION_ID parameter mean sd cv n min max rank
Hagermans UP total_score 182.9 10.1 14.4 25 165 204 15
Hagermans UP bank_coup_RR 26.6 4.1 6.0 25 19 34 14
Hagermans UP COND_BANKS 12.2 2.5 4.0 25 7 16 13
Hagermans UP instr_coup_RR 26.6 2.6 4.0 25 19 31 12
Hagermans UP BANK_VEG 14.4 2.2 3.0 25 11 19 11
Hagermans UP SED_DEP 12.7 1.9 3.0 25 7 15 10
Hagermans UP VELO_DEPTH 14.3 2.2 3.0 25 10 19 9
Hagermans UP CHAN_FLOW 15.9 2.0 2.6 25 13 19 8
Hagermans UP CHAN_ALT 14.8 1.8 2.4 25 11 18 7
Hagermans UP EMBEDDED 14.0 1.4 2.0 25 11 16 6
Hagermans UP EPI_SUB 17.1 1.3 1.6 25 15 20 5
Hagermans UP GRAZING 17.2 1.4 1.6 25 15 20 4
Hagermans UP RIP_VEG_ZONE 15.7 1.2 1.6 25 13 18 3
Hagermans UP INST_COVER 16.6 1.1 1.4 25 15 19 2
Hagermans UP FREQ_RIFF 18.2 1.1 1.2 25 16 20 1



Hagermans Downstream


Before interpreting results, keep in mind that Hagermans Run Downstream is impaired for:
1) HABITAT ALTERATIONS – CHANNELIZATION,
2) CAUSE UNKNOWN – URBAN RUNOFF/STORM SEWERS, and
3) PATHOGENS – SOURCE UNKNOWN.


Hagermans Run Downstream - Riffle/Run Site


Figure 1. Hagermans Downstream raw parameters, ordered from smallest to largest standard deviation.



Figure 2. Hagermans Downstream couplet and total scores, ordered from smallest to largest standard deviation. Dashed horizontal lines indicate impairment thresholds for couplets (24) and total score (140).



Table 1. Summary Statistics of Hagermans Downstream raw parameter, couplet, and total scores. Rank column sorted by ascending coefficient of variation (cv) of parameter.
SEC_STATION_ID parameter mean sd cv n min max rank
Hagermans DOWN total_score 124.1 11.8 24.0 27 94 146 15
Hagermans DOWN bank_coup_RR 14.2 4.4 12.4 27 9 22 14
Hagermans DOWN COND_BANKS 6.3 2.5 8.0 27 2 10 13
Hagermans DOWN GRAZING 6.4 2.2 6.8 27 3 12 12
Hagermans DOWN RIP_VEG_ZONE 4.0 1.2 6.0 27 2 7 11
Hagermans DOWN CHAN_ALT 8.7 2.4 5.6 27 4 12 10
Hagermans DOWN instr_coup_RR 20.6 2.8 5.6 27 15 26 9
Hagermans DOWN BANK_VEG 7.9 2.1 5.4 27 5 12 8
Hagermans DOWN SED_DEP 8.7 2.2 5.0 27 5 12 7
Hagermans DOWN FREQ_RIFF 15.7 3.0 3.8 27 5 18 6
Hagermans DOWN CHAN_FLOW 13.1 2.1 3.2 27 9 18 5
Hagermans DOWN EMBEDDED 11.9 1.5 2.6 27 10 15 4
Hagermans DOWN VELO_DEPTH 12.4 1.6 2.6 27 10 16 3
Hagermans DOWN INST_COVER 13.8 1.6 2.4 27 11 17 2
Hagermans DOWN EPI_SUB 15.0 1.3 1.8 27 12 17 1



Method Precision


Unlike other indices, the physical habitat assessment does not have an associated precision estimate that would allow for spatial or temporal comparisons of to determine if differences in scores are significant. This training represents the beginnings of a dataset to examine intrasite precision (\(PE_I\)). \(PE_I\) is defined as multiple samples taken at the same site at the same time.


Most staff used the low gradient form to assess Bottle Run. There are very few staff that have current audits for the low gradient protocol, so this site was removed from PE analysis. The riffle/run sites (Hagermans DOWN and UP) were included. CVs were calculated for each parameter at each site. CVs were calculated for all staff, staff that have current habitat assessment method audits, and staff that have not been audited.


A number of meaningful observations were made using this dataset:

  1. Variability (CVs) in scores at sites was dependent on site condition. The Hagermans UP site with more intact habitat had lower variability for all parameters, couplets, and total score compared to Hagermans DOWN, which had more impacted habitat.

  2. For most parameters, the CV for audited staff (n = 11) was less than the CV for staff that have not been audited (n = 17) and all staff (n = 28).


Figure 1. Coefficient of variation (CV) for individual parameters, couplets, and total score at Hagermans UP and DOWN sites. Parameters are in descending order of CV and displayed depending on whether the staff were audited, not audited, and the aggregate.


Figure 2. Coefficient of variation (CV) for surrounding the mean score (black triangle) for individual parameters, couplets, and total score at Hagermans UP and DOWN sites. Parameters are in descending order of CV and displayed depending on whether the staff were audited , not audited, and the aggregate. Individual staff scores are overplotted and symbolized according to DEP Region.

Recommendations


The format of the workshop allowed interesting observations of variability in physical habitat assessment scores to be understood. Additionally, this format allowed for the beginning of a dataset that can be built upon to generate PEs for the physical habitat assessment method. Some recommendations for continuing to improve the statistical validity of the physical habitat assessment method include:


Recommendation 1 Increase the number of staff that are trained and audited in the physical habitat assessment method.


Recommendation 2 The size of this dataset needs to be increased in order to better gage variability and quantify PEs. Increased frequency of physical habitat data collections at the same site, same time are needed generate additional data to calculate intrasite precision (\(PE_I\)). When audited staff are in the field with other staff (audited or non-audited), independent habitat assessments should be conducted and data provided to increase n.


Recommendation 3 Explore temporal precision (\(PE_T\)), defined as precision at the same site, through time. A number of confounding factors such as seasonality (e.g. vegetation cover, flow regime) and antecedent scouring flows can drastically affect habitat scores at the same site. Other indices have index periods when data can be collected and considered comparable; the physical habitat assessment has no such index period. We need to understand this variability and its contribution to physical habitat scores.


Recommendation 4 Investigate differential variability between least-disturbed and impacted sites. The data from Hagermans UP and DOWN showed clearly that variability in scoring was highest at the impacted (DOWN) site. Do we need to bin sites based on their condition prior to examining CVs?



Data and R Code


Click here to download a spreadsheet of the data included herein



Below is the code used to load necessary packages, import and clean data, create graphics, and generate statistics.

getwd()


options(scipen = 999)

# rm(list = ls())
# load packages -----------------------------------------------------------

library(readxl)
library(RColorBrewer)
library(plotly)
library(tidyverse)
library(tidylog)



# import data -------------------------------------------------------------

# vector of staff that have been audited for riff/run
audit_vec <-
  read_xlsx('20221020_RBP_Workshop_Data.xlsx', sheet = 'audits') %>% 
  drop_na(username, FreestoneHab) %>% 
  pull(username)
audit_vec


# RBP data from 2022 Bio Meeting
hab_df_wide <- 
  read_xlsx('20221020_RBP_Workshop_Data.xlsx', sheet = 'Sheet1') %>% 
  mutate(total_score = rowSums(across(SUB_AVAIL_COVER:RIP_VEG_ZONE), na.rm = T),
         instr_coup_RR = ifelse(BLN_GLIDE == 'N', EMBEDDED + SED_DEP, NA),
         bank_coup_RR = ifelse(BLN_GLIDE == 'N', COND_BANKS + BANK_VEG, NA),
         instr_coup_GP = ifelse(BLN_GLIDE == 'Y', POOL_SUB_CHAR + SED_DEP, NA),
         bank_coup_GP = ifelse(BLN_GLIDE == 'Y', COND_BANKS + BANK_VEG, NA),
         username = sub("^([^-]+-){2}", "", STR_STATION_ID), # remove username from STR_STATION_ID - text after second '-'
         riff_run_audit = case_when(
           username %in% audit_vec & BLN_GLIDE == 'N' ~ 'Y',
           BLN_GLIDE == 'N' ~ 'N',
           BLN_GLIDE == 'Y' ~ NA_character_,
           TRUE ~ 'panic')) %>% 
  relocate(c(username, riff_run_audit), .after = STR_STATION_ID)
hab_df_wide
sort(unique(hab_df_wide$SEC_STATION_ID))
sort(unique(hab_df_wide$REGION))


hab_df_wide %>% 
  filter(SEC_STATION_ID != 'Bottle') %>% 
  distinct(username, riff_run_audit) %>% 
  group_by(riff_run_audit) %>% 
  summarize(n = n())

# change to long format for plotting
hab_df_long <- 
  hab_df_wide %>% 
  pivot_longer(cols = 8:27, names_to = 'parameter', values_to = 'score', values_drop_na = T)
hab_df_long
sort(unique(hab_df_long$parameter))




# visualize ---------------------------------------------------------------

# Bottle - raw parameters
bottle_raw_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Bottle') %>% #distinct(parameter)
  filter(!(parameter %in% c('total_score', 'instr_coup_GP', 'bank_coup_GP'))) %>% 

  # ggplot(aes(x = parameter, y = score)) +
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  # geom_hline(yintercept = c(1, 5, 10, 15, 20), lty = 'dashed') +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 1, ymax = 5.5, fill = '#969696') + 
  annotate('text', x =0.5, y = 3.5, label = 'Poor', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 5.5, ymax = 10.5, fill = '#bdbdbd') + 
  annotate('text', x =0.5, y = 8, label = 'Marginal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 10.5, ymax = 15.5, fill = '#d9d9d9') + 
  annotate('text', x =0.5, y = 13, label = 'Suboptimal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 15.5, ymax = 20, fill = '#f0f0f0') + 
  annotate('text', x =0.5, y = 18, label = 'Optimal', angle = 90) +
  
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  scale_y_continuous(limits = c(1,20)) +
  scale_fill_brewer(palette = 'Set1') +
  
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal() 

bottle_raw_plot
  

# Bottle - couplets and total score
bottle_tot_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Bottle') %>% #distinct(parameter)
  filter(parameter %in% c('total_score', 'instr_coup_GP', 'bank_coup_GP')) %>% 
  
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  geom_hline(yintercept = c(20, 105), lty = 'dashed') +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

bottle_tot_plot

# stats

bottle_summ <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Bottle') %>% 
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = round(mean(score, na.rm = T), 1), 
            sd = round(sd(score, na.rm = T), 1),
            n = n(),
            min = min(score, na.rm = T),
            max = max(score, na.rm = T)) %>% 
  
  mutate(cv = round(sd/mean, 2),
         total = case_when(
    str_detect(parameter, 'coup') ~ 40,
    str_detect(parameter, 'total') ~ 240,
    TRUE ~ 20),
    cv_pts = cv * total) %>% 
    
  select(-c(total, cv)) %>% 
  rename(cv = cv_pts) %>% 
  arrange(-cv) %>% 
  relocate(cv, .after = 'sd') %>% 
  mutate(rank = nrow(.):1) 

bottle_summ



# Hagermans UP - raw parameters
hag_up_raw_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans UP') %>% #distinct(parameter)
  filter(!(parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR'))) %>% 

  # ggplot(aes(x = parameter, y = score)) +
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 1, ymax = 5.5, fill = '#969696') + 
  annotate('text', x =0.5, y = 3.5, label = 'Poor', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 5.5, ymax = 10.5, fill = '#bdbdbd') + 
  annotate('text', x =0.5, y = 8, label = 'Marginal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 10.5, ymax = 15.5, fill = '#d9d9d9') + 
  annotate('text', x =0.5, y = 13, label = 'Suboptimal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 15.5, ymax = 20, fill = '#f0f0f0') + 
  annotate('text', x =0.5, y = 18, label = 'Optimal', angle = 90) +
  
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  scale_y_continuous(limits = c(1,20)) +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

hag_up_raw_plot

# Hagermans UP - couplets and total score
hag_up_tot_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans UP') %>% #distinct(parameter)
  filter(parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')) %>% 
  
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  geom_hline(yintercept = c(24, 140), lty = 'dashed') +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

hag_up_tot_plot

# stats

hag_up_summ <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans UP') %>% 
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = round(mean(score, na.rm = T), 1), 
            sd = round(sd(score, na.rm = T), 1),
            n = n(), 
            min = min(score, na.rm = T),
            max = max(score, na.rm = T)) %>% 
  
  mutate(cv = round(sd/mean, 2),
         total = case_when(
           str_detect(parameter, 'coup') ~ 40,
           str_detect(parameter, 'total') ~ 240,
           TRUE ~ 20),
         cv_pts = cv * total) %>% 
  
  select(-c(total, cv)) %>% 
  rename(cv = cv_pts) %>% 
  arrange(-cv) %>% 
  relocate(cv, .after = 'sd') %>% 
  mutate(rank = nrow(.):1) 


hag_up_summ



# Hagermans DOWN - raw parameters
hag_dn_raw_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans DOWN') %>% #distinct(parameter)
  filter(!(parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR'))) %>% 
  
  # ggplot(aes(x = parameter, y = score)) +
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 1, ymax = 5.5, fill = '#969696') + 
  annotate('text', x =0.5, y = 3.5, label = 'Poor', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 5.5, ymax = 10.5, fill = '#bdbdbd') + 
  annotate('text', x =0.5, y = 8, label = 'Marginal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 10.5, ymax = 15.5, fill = '#d9d9d9') + 
  annotate('text', x =0.5, y = 13, label = 'Suboptimal', angle = 90) +
  
  geom_rect(xmin = 0, xmax = Inf, ymin = 15.5, ymax = 20, fill = '#f0f0f0') + 
  annotate('text', x =0.5, y = 18, label = 'Optimal', angle = 90) +
  
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  scale_y_continuous(limits = c(1,20)) +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

hag_dn_raw_plot


# Hagermans DOWN - couplets and total score
hag_dn_tot_plot <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans DOWN') %>% #distinct(parameter)
  filter(parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')) %>% 
  
  ggplot(aes(x = reorder(parameter, score, FUN = sd), y = score)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(aes(fill = REGION, label = username), pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.6, dodge.width = .2)) +
  geom_hline(yintercept = c(24, 140), lty = 'dashed') +
  scale_fill_brewer(palette = 'Set1') +
  labs(x = 'Parameter - Ordered by Ascending Standard Deviation') + 
  theme_minimal()

hag_dn_tot_plot


# stats
hag_dn_summ <- 
  
  hab_df_long %>% 
  filter(SEC_STATION_ID == 'Hagermans DOWN') %>% 
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = round(mean(score, na.rm = T), 1), 
            sd = round(sd(score, na.rm = T), 1),
            n = n(),
            min = min(score, na.rm = T),
            max = max(score, na.rm = T)) %>% 
  
  mutate(cv = round(sd/mean, 2),
         total = case_when(
           str_detect(parameter, 'coup') ~ 40,
           str_detect(parameter, 'total') ~ 240,
           TRUE ~ 20),
         cv_pts = cv * total) %>% 
  
  select(-c(total, cv)) %>% 
  rename(cv = cv_pts) %>% 
  arrange(-cv) %>% 
  relocate(cv, .after = 'sd') %>% 
  mutate(rank = nrow(.):1) 

hag_dn_summ


# overall summary

all_summ <- 
  
  bind_rows(bottle_summ, hag_up_summ, hag_dn_summ) %>% 
  group_by(parameter) %>% 
  summarize(mean_sd = round(mean(sd, na.rm = T), 1),
            n = n()) %>% 
  arrange(-mean_sd) %>% 
  mutate(rank = nrow(.):1) 

all_summ



# PEs ---------------------------------------------------------------------



PE_riff_run_all <- 
  hab_df_long %>% 
  filter(BLN_GLIDE == 'N') %>% 
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = mean(score),
            sd = sd(score),
            cv = sd / mean) %>% 
  mutate(total = case_when(
    str_detect(parameter, 'coup') ~ 40,
    str_detect(parameter, 'total') ~ 240,
    TRUE ~ 20),
    cv_pts = cv * total) %>% 
  arrange(SEC_STATION_ID, -cv_pts) %>% 
  mutate(category = 'all')
PE_riff_run_all


PE_riff_run_audit <- 
  hab_df_long %>% 
  filter(BLN_GLIDE == 'N') %>% 
  filter(riff_run_audit == 'Y') %>%
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = mean(score),
            sd = sd(score),
            cv = sd / mean) %>% 
  mutate(total = case_when(
    str_detect(parameter, 'coup') ~ 40,
    str_detect(parameter, 'total') ~ 240,
    TRUE ~ 20),
    cv_pts = cv * total) %>% 
  arrange(SEC_STATION_ID, -cv_pts) %>% 
  mutate(category = 'audited staff')
PE_riff_run_audit


PE_riff_run_non_audit <- 
  hab_df_long %>% 
  filter(BLN_GLIDE == 'N') %>% 
  filter(riff_run_audit == 'N' | is.na(riff_run_audit)) %>%
  group_by(SEC_STATION_ID, parameter) %>% 
  summarize(mean = mean(score),
            sd = sd(score),
            cv = sd / mean) %>% 
  mutate(total = case_when(
    str_detect(parameter, 'coup') ~ 40,
    str_detect(parameter, 'total') ~ 240,
    TRUE ~ 20),
    cv_pts = cv * total) %>% 
  arrange(SEC_STATION_ID, -cv_pts) %>% 
  mutate(category = 'non-audited staff')
PE_riff_run_non_audit


PE_df <- 
  bind_rows(PE_riff_run_all, PE_riff_run_audit, PE_riff_run_non_audit) %>% 
  mutate(fcategory = factor(category, levels = c('all', 'non-audited staff', 'audited staff')))
PE_df
levels(PE_df$fcategory)

rm(PE_riff_run_all, PE_riff_run_audit, PE_riff_run_non_audit)



cv_Hagermans_plot <- 
  ggplot(PE_df, aes(x = cv_pts, y = reorder(parameter, cv_pts, FUN = 'mean'), 
                    fill = fcategory)) +
  
  facet_wrap(~SEC_STATION_ID) +
  
  geom_bar(stat = 'identity', position = 'dodge') +
  
  scale_fill_manual(values = c('#1b9e77', '#d95f02', '#7570b3'),
                    breaks=c('audited staff', 'non-audited staff', 'all'),
                    name = '') +
  
  labs(x = 'CV (points)', y = 'Parameter - Ordered by Descending CV') + 
  theme_minimal() +
  theme(legend.position = 'bottom')

cv_Hagermans_plot



# show plot with actual scores overtop CVs

cv_Hagermans_points_plot <- 
  ggplot() +
  facet_wrap(~SEC_STATION_ID) +
  
  geom_linerange(data = filter(PE_df, !parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')),
                 aes(xmin = mean - (0.5*cv_pts), xmax = mean + (0.5*cv_pts), 
                     y = reorder(parameter, cv_pts, FUN = 'mean'),
                     color = category, size = category)) +
  
  # add black triangle for mean
  stat_summary(data = filter(PE_df, !parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')),
               aes(x= mean, y = reorder(parameter, cv_pts, FUN = 'mean')), 
               fun = mean, geom = "point", pch = 17, size = 6, color = "black") +
  
  
  scale_color_manual(values = c('#1b9e77', '#d95f02', '#7570b3'),
                     # breaks=c('audited staff', 'non-audited staff', 'all'),
                     name = '') +
  
  scale_size_manual(values = c(12,8,4),
                    # breaks=c('audited staff', 'non-audited staff', 'all'),
                    name = '') +
  
  
  geom_point(data = filter(hab_df_long, SEC_STATION_ID != 'Bottle' &
                             !parameter %in% c('total_score', 'instr_coup_RR', 'bank_coup_RR')), 
             aes(x = score, y = parameter, fill = REGION), 
             pch = 21, alpha = 0.8, size = 3,
             position = position_jitterdodge(jitter.width =.4, dodge.width = .2),
             inherit.aes = F) +
  
  # scale_x_continuous(limits = c(1,20)) +
  scale_fill_brewer(palette = 'Set1') +
  labs(y = 'Parameter - Ordered by Descending CV') + 
  theme_bw()
cv_Hagermans_points_plot