Introduction

In this exploratory data analysis (EDA), we examine the data collected by Carleton University BIOL 5512 students in the Fall of 2022 for their project on the Department of Fisheries and Oceans standards regarding the impact of human development activities on fish habitats. These data are currently unpublished; however, the graduate students involved in the project (including myself) are working towards completing the paper for publication.

In 2019, the Fisheries Act, which is administered by the Department of Fisheries and Oceans (DFO), was updated to ensure that development activity that may cause the harmful alteration, disruption, or destruction (HADD) of fish habitats is met with offsetting measures that achieve no net loss (NNL). To assess compliance with the updated Fisheries Act, we acquired 109 authorizations issued by the DFO in 2020 and examined them along with any supplementary documentation. Offsetting is a widespread global standard when assessing and authorizing development projects, yet Canada appears to have recognized its significance only in recent years. Despite this recognition, to our knowledge no research has been conducted to evaluate the DFO’s ability to uphold this new standard.

HADD authorizations and their associated offsetting plans are intended to be accessible. However, of the 109 authorizations, we received offsetting plans for only nine, and a further ten listed the plan as “to be developed”. In addition, seven authorizations stated that no offsetting plan existed and two did not mention an offsetting plan. Remarkably, this leaves 81 offsetting plans that DFO omitted from its response to the initial Access to Information and Privacy (ATIP) request. Several attempts since the Fall of 2022 to determine the cause of this discrepancy have been unsuccessful, which points to a lack of transparency from DFO when it comes to HADD offsetting.

For the purpose of this EDA, we will proceed with the current data. Despite a lack of specific offsetting details, many authorizations provided details on the requirements for the offsetting plans. Therefore, we will proceed under the assumption that offsetting plans submitted by developers meet the “expectations” set by DFO, in the hope of determining whether DFO is requiring developers to achieve NNL when causing harmful alteration, disruption, or destruction of fish habitats, as per the 2019 Fisheries Act.

Data

The data used in this EDA were compiled by graduate students at Carleton University for a project in their BIOL 5512 class. The full dataset includes 20 variables extracted from 109 HADD authorizations issued by the DFO in 2020.

The 20 variables included in the dataset are described below.

Variable descriptions:

  • ID: the ID number assigned to the authorization for data extraction
  • PROVINCE: the province in which the authorization was issued
  • AUTHORIZATION_TYPE: listed authorization type according to Schedule 1, Section 16 of the Fisheries Act
  • DATE_ISSUED: the date the authorization was issued in mm/dd/yyyy format
  • END_DATE: the end date of the authorization in mm/dd/yyyy format
  • HADD_DESCRIPTION: the form(s) of HADD from destruction (irreversible damage), disruption (temporary change in habitat/waterflow), or alteration (permanent change of habitat/waterflow)
  • DEVELOPMENT_ACTIVITY: the development activity covered by the authorization from on-water development, mining, roads and highways, urban development, industrial, habitat restoration, railways, oil and gas, agriculture, forestry, rural development, and other
  • CONSTRUCTION: the construction action(s) used for the development type
  • HABITAT_VALUE: the habitat value from marginal (low productivity for fish), critical (feeding, growth and migration), or important (part of recreational or commercial fisheries)
  • HABITAT_TYPE: the habitat type(s) from in-channel, off-channel, lacustrine, estuarine, marine, and riparian
  • HABITAT_LOSS: habitat loss area in meters-squared (all other units are NA)
  • OFFSET_AREA: offsetting area in meters-squared (all other units are NA)
  • RATIO: single ratio reported for lost habitat
  • TECHNIQUE: compensation technique from (1) create in-kind habitat, (2) increase in-kind (habitat) productivity, (3) create out-of-kind habitat, (4) increase out-of-kind habitat productivity, (5) create or increase habitat in a different ecological unit (same species), (6) create or increase habitat in a different ecological unit (different species), (7) artificial propagation, (8) other, (9) none
  • PRE_CLASS: classification of pre-impact monitoring technique from basic, type 1, type 2, or research
  • POST_CLASS: classification of post-construction monitoring technique from: basic, type 1, type 2, or research
  • DURATION: duration of post-construction monitoring in years
  • NNL: achievement of no net loss from yes (Y), no (N), unknown (U), and NA
  • logHABITAT_LOSS: logarithmic (base 10) of habitat loss area in meters-squared
  • logOFFSET_AREA: logarithmic (base 10) of offset area in meters-squared

Table 1: Example output, first 4 lines, for the dataframe to clarify the structure of the data.
ID | PROVINCE | AUTHORIZATION_TYPE | DATE_ISSUED | END_DATE | HADD_DESCRIPTION | DEVELOPMENT_ACTIVITY | CONSTRUCTION | HABITAT_VALUE | HABITAT_TYPE | HABITAT_LOSS | OFFSET_AREA | RATIO | TECHNIQUE | PRE_CLASS | POST_CLASS | DURATION | NNL | logHABITAT_LOSS | logOFFSET_AREA
70 | NB | 1 | 08-10-2020 | 11-15-0202 | Destruction, disruption | Roads and highways | Roadbuilding, culvert | NA | Off-channel | 2293 | 2293 | 1.00 | 3 | NA | Basic | NA | Y | 3.360404 | 3.360404
20 | QU | 1 | 07-10-2020 | 09-15-0202 | Destruction, disruption | Habitat restoration | Bank stabalization | Critical | Riparian | 4251 | 4251 | 1.00 | 1 | Basic | Basic | 5 | Y | 3.628491 | 3.628491
21 | QU | 1 | 07-02-2020 | 12-31-2020 | Destruction, disruption | Habitat restoration | Bank stabalization | Critical | Riparian | 2250 | 200 | 0.09 | 6 | Basic | Basic | 5 | N | 3.352183 | 2.301030
22 | QU | 1 | 01-17-2020 | 08-15-2020 | NA | On-water development | Dredging | Important | Marine | 4990 | NA | NA | NA | Basic | Type 1 | 5 | U | 3.698100 | NA

Data hygiene

Before proceeding with the analysis, we performed several variable checks to clean up the data. ID, DATE_ISSUED, and END_DATE are not relevant to this EDA, and therefore no data hygiene was performed on them. In addition, CONSTRUCTION will not be examined, as it is only used to determine DEVELOPMENT_ACTIVITY.

PROVINCE

unique(dat$PROVINCE) # 14 unique codes, with Newfoundland and Labrador listed as both NL and NFLD
##  [1] "NB"   "QU"   "NS"   "ON"   "BC"   "NT"   "NL"   "NV"   "AB"   "NFLD"
## [11] "PEI"  "MB"   "YT"   "SK"

By examining the unique terms, it is clear that Newfoundland and Labrador is listed twice, as both NL and NFLD. The correct abbreviation is NL, and therefore NFLD will be replaced with NL. Note that QU, NV, and PEI differ from the standard postal abbreviations (QC, NU, and PE), but they are used consistently and are left unchanged.

dat$PROVINCE[dat$PROVINCE == "NFLD"] <- "NL" # replace NFLD with NL
unique(dat$PROVINCE) # 13 unique codes remain (10 provinces and 3 territories), with no duplicates
##  [1] "NB"  "QU"  "NS"  "ON"  "BC"  "NT"  "NL"  "NV"  "AB"  "PEI" "MB"  "YT" 
## [13] "SK"
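
When more than one code needs replacing, a small lookup table can keep the recoding in one place. The following is a minimal base R sketch on a toy vector; `province` and `recode_map` are hypothetical names used for illustration and are not part of the original analysis.

```r
# Toy vector standing in for dat$PROVINCE; values are taken from the output above.
province <- c("NB", "QU", "NFLD", "NL", "PEI", NA)

# Hypothetical lookup table: names are the codes to replace, values are the codes to keep.
recode_map <- c(NFLD = "NL")

# Recode only the entries found in the lookup; all other values (including NA) are untouched.
needs_fix <- province %in% names(recode_map)
province[needs_fix] <- unname(recode_map[province[needs_fix]])

print(unique(province))
```

Adding further pairs to `recode_map` would extend the same step to other codes without additional assignment lines.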

HADD_DESCRIPTION

unique(dat$HADD_DESCRIPTION)  # 10 options with different combinations of 3 different terms
##  [1] "Destruction, disruption"             NA                                   
##  [3] "Destruction, alteration"             "Destruction"                        
##  [5] "Destruction, disruption, alteration" "Disruption, alteration"             
##  [7] "Alteration, destruction"             "Alteration"                         
##  [9] "Disruption"                          "Alteratiion, destruction"

By examining the unique terms, it is clear that there are several combinations of HADDs separated with commas, with the first term of each combination capitalized. In addition, alteration is spelled two different ways. To fix this issue, HADD_DESCRIPTION will be separated into three columns corresponding to a first, second, and third type of HADD. We will then transform the data into long format and examine the unique terms once again.

# requires tidyr (separate, pivot_longer) and dplyr (select, %>%)
dat <- separate(dat, col = HADD_DESCRIPTION, into = c('HADD_description1', 'HADD_description2', 'HADD_description3'), sep = ", ") # separate into 3 columns for 3 potential HADDs and add to dataframe

# only the first term needs recoding; the second and third terms are already lowercase in the raw data
dat$HADD_description1 <- ifelse(dat$HADD_description1 == 'Destruction', 'destruction',
                         ifelse(dat$HADD_description1 == 'Disruption', 'disruption',
                         ifelse(dat$HADD_description1 == 'Alteration' | dat$HADD_description1 == 'Alteratiion', 'alteration', NA))) # rename in all lowercase and with alteration corrected

HADD_DESCRIPTION_DATA <- select(dat, c(HADD_description1, HADD_description2, HADD_description3)) # new data frame with only the HADD description data

HADD_DESCRIPTION_long <- HADD_DESCRIPTION_DATA %>%
  pivot_longer(cols = everything(), names_to = "HADD_description_number", values_to = "description") # pivot data longer

unique(HADD_DESCRIPTION_long$description) # confirm no errors or repeats in unique terms
## [1] "destruction" "disruption"  NA            "alteration"
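
The recoding above only touches the first column because the later terms happen to be clean. As a hedged alternative, the sketch below normalizes every term regardless of its position using base R only; the vector `hadd` is a toy stand-in with values copied from the `unique()` output, and the object names are hypothetical.

```r
# Toy values copied from the unique() output above.
hadd <- c("Destruction, disruption", NA, "Alteratiion, destruction", "Alteration")

# Split on commas, then trim whitespace and lowercase every term.
terms <- strsplit(hadd, ",", fixed = TRUE)
terms <- lapply(terms, function(x) tolower(trimws(x)))

# Correct the one known misspelling wherever it occurs.
terms <- lapply(terms, function(x) sub("^alteratiion$", "alteration", x))

# Distinct terms across all authorizations, with missing values dropped.
all_terms <- unlist(terms)
hadd_terms <- sort(unique(all_terms[!is.na(all_terms)]))
print(hadd_terms)
```

Because the cleaning is applied to every term, a capitalized or misspelled second or third term would be handled in the same pass.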

DEVELOPMENT_ACTIVITY

 unique(dat$DEVELOPMENT_ACTIVITY) # confirm no errors or repeats in unique terms
##  [1] "Roads and highways"    "Habitat restoration"   "On-water development" 
##  [4] NA                      "Other"                 "Rural development"    
##  [7] "Urban development"     "Industrial"            "Mining"               
## [10] "Shoreline developmemt" "Railways"              "Agriculture"          
## [13] "Oil and gas"           "Forestry"

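By examining the unique terms, we can confirm there are no repeats. However, “Shoreline developmemt” is misspelled (it should read “Shoreline development”). Because the category appears under only one spelling, the typo does not affect the counts, and the label appears as recorded in the outputs below. Thirteen development activities and NA appear in the data.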
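
Spelling variants such as “Shoreline developmemt” in the output above are easy to miss by eye. One way to catch them is to compare the observed terms against a list of expected terms. The sketch below is illustrative only: `allowed` is an assumed vocabulary built from the variable descriptions plus “Shoreline development”, and `observed` is a toy subset of the `unique()` output.

```r
# Assumed vocabulary: the activities from the variable descriptions, plus
# "Shoreline development" (an assumption, since it appears only in the data).
allowed <- c("On-water development", "Mining", "Roads and highways",
             "Urban development", "Industrial", "Habitat restoration",
             "Railways", "Oil and gas", "Agriculture", "Forestry",
             "Rural development", "Other", "Shoreline development")

# Toy subset of the observed terms, copied from the unique() output above.
observed <- c("Roads and highways", "Habitat restoration", NA,
              "Shoreline developmemt", "Forestry")

# Any non-missing term that is not in the vocabulary is flagged for review.
unexpected <- setdiff(observed[!is.na(observed)], allowed)
print(unexpected)

# The flagged term can then be corrected in place.
observed[which(observed == "Shoreline developmemt")] <- "Shoreline development"
```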

AUTHORIZATION_TYPE

unique(dat$AUTHORIZATION_TYPE) # confirm no errors or repeats in unique terms
## [1] 1

By examining the unique terms, we can confirm there are no errors or repeats and therefore no hygiene is required for AUTHORIZATION_TYPE. From this it appears that all authorizations were classified as type 1.

HABITAT_VALUE

unique(dat$HABITAT_VALUE) # confirm no errors or repeats in unique terms
## [1] NA          "Critical"  "Important" "Marginal"

By examining the unique terms, we can confirm there are no errors or repeats and therefore no hygiene is required for HABITAT_VALUE as all are classified as either marginal, important, critical, or NA.

HABITAT_TYPE

unique(dat$HABITAT_TYPE) # confirm no errors or repeats in unique terms
##  [1] "Off-channel"             "Riparian"               
##  [3] "Marine"                  "In-channel"             
##  [5] "Lacustrine"              "Estuarine"              
##  [7] "In-channel \nLacustrine" "In-channel \nRiparian"  
##  [9] "In-channel\nRiparian"    "Lacustrine, off-channel"
## [11] "In-channel, riparian"    "Off-channel, riparian"  
## [13] "In-channel, off-channel" "Esturarine"             
## [15] "Estuarine, marine"       "In-channel, ripirian"

By examining the unique terms, it is clear that there are several combinations of terms separated by three different separators (" \n", "\n", and ", "). There is also a mix of capitalized and lowercase terms, and both estuarine and riparian are misspelled in places. Therefore, we must first replace all separators with a comma so we can split the terms into two columns for the authorizations that list two habitat types. Then, we can remove capitalization and fix the spelling of the terms in both columns before transforming the data into long format to view the unique terms once again.

# requires stringr (str_replace_all), dplyr (mutate, select, %>%), and tidyr (separate, pivot_longer)
dat <- dat %>%
  mutate(HABITAT_TYPE = str_replace_all(HABITAT_TYPE, " \n", ", ")) %>%
  mutate(HABITAT_TYPE = str_replace_all(HABITAT_TYPE, "\n", ", ")) # replace " \n" and "\n" with ", " so we can separate
dat <- separate(dat, col = HABITAT_TYPE, into = c('habitat1', 'habitat2'), sep = ", ") # separate into 2 columns for 2 potential habitat types and add to dataframe
dat$habitat1 <- ifelse(dat$habitat1 == 'Estuarine' | dat$habitat1 == 'Esturarine', 'estuarine',
                ifelse(dat$habitat1 == 'Off-channel', 'off-channel',
                ifelse(dat$habitat1 == 'Riparian', 'riparian',
                ifelse(dat$habitat1 == 'Marine', 'marine',
                ifelse(dat$habitat1 == 'In-channel', 'in-channel',
                ifelse(dat$habitat1 == 'Lacustrine', 'lacustrine', NA)))))) # change all to lowercase and fix estuarine
dat$habitat2 <- ifelse(dat$habitat2 == 'Lacustrine', 'lacustrine',
                ifelse(dat$habitat2 == 'riparian' | dat$habitat2 == 'ripirian' | dat$habitat2 == 'Riparian', 'riparian',
                ifelse(dat$habitat2 == 'off-channel', 'off-channel',
                ifelse(dat$habitat2 == 'marine', 'marine', NA)))) # change all to lowercase and fix riparian
HABITAT_DATA <- select(dat, c(habitat1, habitat2)) # new data frame with only the habitat type data
HABITAT_DATA_LONG <- HABITAT_DATA %>%
  pivot_longer(cols = everything(), names_to = "Habitat_number", values_to = "Habitat_type") # pivot data longer
unique(HABITAT_DATA_LONG$Habitat_type) # confirm no errors or repeats in unique terms
## [1] "off-channel" NA            "riparian"    "marine"      "in-channel" 
## [6] "lacustrine"  "estuarine"
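
The nested `ifelse()` calls above can become hard to maintain as the number of variants grows. As a hedged alternative, the same normalization can be sketched with a single separator pattern and two spelling fixes in base R. The vector `habitat` is a toy sample copied from the `unique()` output, and all object names here are hypothetical.

```r
# Toy sample of raw values, copied from the unique() output above.
habitat <- c("In-channel \nLacustrine", "In-channel\nRiparian",
             "Lacustrine, off-channel", "Esturarine",
             "In-channel, ripirian", NA)

# Collapse every separator variant (newline or comma, with optional
# surrounding spaces) into a single comma.
clean <- gsub("[ ]*[\n,][ ]*", ",", habitat)

# Lowercase everything and fix the two known misspellings.
clean <- tolower(clean)
clean <- gsub("esturarine", "estuarine", clean, fixed = TRUE)
clean <- gsub("ripirian", "riparian", clean, fixed = TRUE)

# Split into first and second habitat type; a missing second type becomes NA.
parts <- strsplit(clean, ",", fixed = TRUE)
habitat1 <- vapply(parts, function(x) x[1], character(1))
habitat2 <- vapply(parts, function(x) x[2], character(1))

print(data.frame(habitat1, habitat2))
```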

HABITAT_LOSS

range(dat$HABITAT_LOSS, na.rm = T) # no negative numbers and therefore no obvious errors
## [1]      65 2299895

By viewing the range of habitat loss areas, there do not appear to be any obvious errors (e.g. negative numbers). Therefore, no hygiene is required for HABITAT_LOSS.

OFFSET_AREA

range(dat$OFFSET_AREA, na.rm = T) # no negative numbers and therefore no obvious errors
## [1]    100 531522

By viewing the range of offset areas, there do not appear to be any obvious errors (e.g. negative numbers). Therefore, no hygiene is required for OFFSET_AREA.

RATIO

range(dat$RATIO, na.rm = T) # no negative numbers and therefore no obvious errors
## [1]  0.02 28.56

By viewing the range of ratios, there do not appear to be any obvious errors (e.g. negative numbers). Therefore, no hygiene is required for RATIO.
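
A range check cannot show whether RATIO agrees with the two area columns. Assuming RATIO is the offset area divided by the habitat loss area (an assumption; the variable description only says it is the ratio reported for lost habitat), the first three rows of Table 1 can be checked as a worked example:

```r
# Values copied from the first three rows of Table 1.
habitat_loss   <- c(2293, 4251, 2250)
offset_area    <- c(2293, 4251, 200)
ratio_reported <- c(1.00, 1.00, 0.09)

# Assumed definition: ratio = offset area / habitat loss area, rounded to 2 decimals.
ratio_computed <- round(offset_area / habitat_loss, 2)

# TRUE when every computed ratio matches the reported one.
ratios_agree <- isTRUE(all.equal(ratio_computed, ratio_reported))
print(ratios_agree)
```

Applied to the full dataset, rows where the two values disagree would be candidates for a closer look at the source authorization.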

TECHNIQUE

unique(dat$TECHNIQUE) # multiples separated with "," with a max of 2 potential types
##  [1] "3"    "1"    "6"    NA     "2"    "5"    "4"    "1, 3" "8"    "1,4" 
## [11] "1,2"  "2,3"

By examining the unique terms, it is clear that there are several combinations of offsetting techniques separated with commas (with and without a trailing space), with a maximum of two techniques. To fix this issue, TECHNIQUE will be separated into two columns for the authorizations with multiple techniques and the terms will be confirmed. We will then transform the data into long format and examine the unique terms once again.

dat <- separate(dat, col = TECHNIQUE, into = c('technique1', 'technique2'), sep = ",") # separate into 2 columns for 2 potential technique types
dat$technique2[dat$technique2 == " 3"] <- "3" # replace " 3" with "3"
TECHNIQUE_DATA <- select(dat, c(technique1, technique2)) # new data frame with only the technique data
TECHNIQUE_DATA_LONG <- TECHNIQUE_DATA %>%
  pivot_longer(cols = everything(), names_to = "technique_number", values_to = "technique_type") # pivot data longer
unique(TECHNIQUE_DATA_LONG$technique_type) # confirm no errors or repeats in unique terms
## [1] "3" NA  "1" "6" "2" "5" "4" "8"
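
The manual replacement of " 3" works because only one value carries a leading space. A separator pattern that absorbs optional spaces avoids the extra step. The sketch below shows the idea in base R on a toy vector copied from the `unique()` output; the object names are hypothetical.

```r
# Toy sample of raw values, copied from the unique() output above.
technique <- c("3", "1, 3", "1,4", NA)

# Split on a comma followed by any number of spaces.
parts <- strsplit(technique, ", *")

# First and second technique; a missing second technique becomes NA.
technique1 <- vapply(parts, `[`, character(1), 1)
technique2 <- vapply(parts, `[`, character(1), 2)

print(data.frame(technique1, technique2))
```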

PRE_CLASS

unique(dat$PRE_CLASS) # confirm no errors or repeats in unique terms
## [1] NA       "Basic"  "Type 2" "Type 1"

By examining the unique terms, we can confirm there are no errors or repeats and therefore no hygiene is required for PRE_CLASS. From this it appears that all authorizations had basic, type 1, type 2, or no pre-impact monitoring.

POST_CLASS

unique(dat$POST_CLASS) # confirm no errors or repeats in unique terms
## [1] "Basic"    "Type 1"   "Type 2"   NA         "Research"

By examining the unique terms, we can confirm there are no errors or repeats and therefore no hygiene is required for POST_CLASS. From this it appears that all authorizations had basic, type 1, type 2, research, or no post-construction monitoring.

DURATION

range(dat$DURATION, na.rm = T) # no negative numbers and therefore no obvious errors
## [1]  1 22

By viewing the range of post-construction monitoring durations, there do not appear to be any obvious errors (e.g. negative numbers). Therefore, no hygiene is required for DURATION.

NNL

unique(dat$NNL) # Y listed as "y" and "Y"
## [1] "Y" "N" "U" NA  "y"

By examining the unique terms, it is clear that achievement of NNL is listed as both “Y” and “y”. Therefore, to correct this, “y” must be replaced by “Y”.

dat$NNL[dat$NNL == "y"] <- "Y" # replace "y" with "Y"
unique(dat$NNL) # confirm no errors or repeats in unique terms
## [1] "Y" "N" "U" NA

logHABITAT_LOSS

range(dat$logHABITAT_LOSS, na.rm = T) # should match log10 of the HABITAT_LOSS range
## [1] 1.812913 6.361708

The range of the log (base 10) habitat loss areas matches the log of the raw HABITAT_LOSS range (log10(65) ≈ 1.813 and log10(2,299,895) ≈ 6.362), so there do not appear to be any obvious errors. Therefore, no hygiene is required for logHABITAT_LOSS.

logOFFSET_AREA

range(dat$logOFFSET_AREA, na.rm = T) # should match log10 of the OFFSET_AREA range
## [1] 2.000000 5.725521

The range of the log (base 10) offset areas matches the log of the raw OFFSET_AREA range (log10(100) = 2 and log10(531,522) ≈ 5.726), so there do not appear to be any obvious errors. Therefore, no hygiene is required for logOFFSET_AREA.
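
For the log-transformed columns, a more informative check than looking for negative numbers is to confirm that the reported ranges equal the base 10 logarithm of the raw ranges. A minimal sketch using only the ranges printed above (the object names are hypothetical):

```r
# Raw and log ranges, copied from the range() outputs above.
loss_range       <- c(65, 2299895)
log_loss_range   <- c(1.812913, 6.361708)
offset_range     <- c(100, 531522)
log_offset_range <- c(2.000000, 5.725521)

# The log columns are consistent if they match log10 of the raw ranges
# to within the printed precision.
loss_ok   <- all(abs(log10(loss_range) - log_loss_range) < 1e-5)
offset_ok <- all(abs(log10(offset_range) - log_offset_range) < 1e-5)

print(c(loss_ok = loss_ok, offset_ok = offset_ok))
```

On the full dataset, the same comparison could be made row by row rather than on the ranges alone.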

Clean data

Table 2: Example output, first 4 lines, for the dataframe to clarify the structure of the modified (clean) data.
ID | PROVINCE | AUTHORIZATION_TYPE | DATE_ISSUED | END_DATE | HADD_description1 | HADD_description2 | HADD_description3 | DEVELOPMENT_ACTIVITY | CONSTRUCTION | HABITAT_VALUE | habitat1 | habitat2 | HABITAT_LOSS | OFFSET_AREA | RATIO | technique1 | technique2 | PRE_CLASS | POST_CLASS | DURATION | NNL | logHABITAT_LOSS | logOFFSET_AREA
70 | NB | 1 | 08-10-2020 | 11-15-0202 | destruction | disruption | NA | Roads and highways | Roadbuilding, culvert | NA | off-channel | NA | 2293 | 2293 | 1.00 | 3 | NA | NA | Basic | NA | Y | 3.360404 | 3.360404
20 | QU | 1 | 07-10-2020 | 09-15-0202 | destruction | disruption | NA | Habitat restoration | Bank stabalization | Critical | riparian | NA | 4251 | 4251 | 1.00 | 1 | NA | Basic | Basic | 5 | Y | 3.628491 | 3.628491
21 | QU | 1 | 07-02-2020 | 12-31-2020 | destruction | disruption | NA | Habitat restoration | Bank stabalization | Critical | riparian | NA | 2250 | 200 | 0.09 | 6 | NA | Basic | Basic | 5 | N | 3.352183 | 2.301030
22 | QU | 1 | 01-17-2020 | 08-15-2020 | NA | NA | NA | On-water development | Dredging | Important | marine | NA | 4990 | NA | NA | NA | NA | Basic | Type 1 | 5 | U | 3.698100 | NA

Exploratory data analysis

Province

A total of 109 authorizations were issued across 10 provinces and 3 territories. The most authorizations were issued in Quebec, accounting for 24.77% of all authorizations, followed by Ontario and British Columbia, accounting for 17.43% and 16.51% of all authorizations, respectively.

 p1 <- ggplot(dat, aes(x = fct_infreq(PROVINCE))) +
         geom_bar(fill = 'lightgrey', color = 'black') +
         xlab('Province or territory') +
         ylab('Number of authorizations') +
         theme_classic() # provincial authorization frequency plot
p1 # view p1

Figure 1: Number of authorizations issued by province or territory

Table 3: Number of authorizations issued by province or territory
PROVINCE freq
QU 27
ON 19
BC 18
NB 16
NS 9
AB 5
PEI 5
YT 3
MB 2
NL 2
NT 1
NV 1
SK 1
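
The percentages quoted in the text follow directly from the counts in Table 3. As a worked example, the sketch below recomputes them from a named vector of the tabulated counts (`freq` is a hypothetical name; in the analysis itself, something like `prop.table(table(dat$PROVINCE))` would give the same shares).

```r
# Counts copied from Table 3.
freq <- c(QU = 27, ON = 19, BC = 18, NB = 16, NS = 9, AB = 5, PEI = 5,
          YT = 3, MB = 2, NL = 2, NT = 1, NV = 1, SK = 1)

# Total number of authorizations and each jurisdiction's share in percent.
total <- sum(freq)
pct <- round(100 * freq / total, 2)

print(total)
print(pct[c("QU", "ON", "BC")])
```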

HADD description

HADD description is classified as habitat destruction, disruption, and/or alteration. Of the 109 authorizations, two did not include HADD descriptions, 46.79% contained two forms of HADD, 33.95% contained one form, and 15.60% contained all three forms. Destruction of fish habitat was the most common form, occurring in 82.57% of authorizations, followed by alteration and disruption in 61.47% and 33.94% of authorizations, respectively. From this it appears that the more severe impacts on fish habitat are more common, which emphasizes the importance of achieving no net loss to ensure net destruction does not occur.

p2 <- ggplot(p2_no_na, aes(x = description)) +
  geom_bar(fill = 'lightgrey', color = 'black') +
  xlab('HADD Description') +
  ylab('Number of authorizations') +
  theme_classic() # HADD description authorization frequency plot

p2 # view p2. The most common form of HADD was destruction, followed by alteration and then disruption, so the more severe impacts on aquatic environments appear to be more common

Figure 2: Number of authorizations issued by HADD description

Table 4: Number of authorizations issued by HADD description
description freq
destruction 90
alteration 67
disruption 37

Development activity

The 109 authorizations encompassed 13 development activities, with only one authorization missing development activity information. The most common development activity was on-water development, accounting for 30.28% of authorizations, followed by roads and highways and urban development, accounting for 25.69% and 12.84% of authorizations, respectively.

p3 <- ggplot(dat, aes(x = fct_infreq(DEVELOPMENT_ACTIVITY))) +
       geom_bar(fill = 'lightgrey', color = 'black') +
       xlab('Development activity') +
       ylab('Number of authorizations') +
       theme_classic()+
       theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Development activity authorization frequency plot
     
p3 # view p3.

Figure 3: Number of authorizations issued by development activity

Table 5: Number of authorizations issued by development activity
DEVELOPMENT_ACTIVITY freq
On-water development 33
Roads and highways 28
Urban development 14
Habitat restoration 9
Industrial 8
Oil and gas 5
Mining 4
Railways 2
Agriculture 1
Forestry 1
Other 1
Rural development 1
Shoreline developmemt 1
NA 1

Authorization type

All 109 authorizations are listed as being of authorization type 1.

Table 6: Number of authorizations issued by authorization type
AUTHORIZATION_TYPE freq
1 109

Habitat value

Habitat value is to be classified as critical, important, or marginal. Alarmingly, 55.05% of the 109 authorizations did not include enough information to assess the habitat value. Of those that could be assessed, critical habitats were impacted by far the most, accounting for 36.70% of all authorizations.

p4 <- ggplot(dat, aes(x = fct_infreq(HABITAT_VALUE))) +
       geom_bar(fill = 'lightgrey', color = 'black') +
       xlab('Habitat value') +
       ylab('Number of authorizations') +
       theme_classic() # Habitat value authorization frequency plot
    
p4 # view p4

Figure 4: Number of authorizations issued by habitat value

Table 7: Number of authorizations issued by habitat value
HABITAT_VALUE freq
NA 60
Critical 40
Important 6
Marginal 3

Habitat type

The authorizations included development activities in six different habitat types: in-channel, marine, riparian, estuarine, off-channel, and lacustrine. Of the 109 authorizations, 14.68% included two different habitat types. The most common habitat affected was in-channel, in 47.71% of authorizations, followed by marine and riparian in 20.18% and 14.68% of authorizations, respectively.

p5 <- ggplot(p5_no_na, aes(x= fct_infreq(habitat_type))) +
        geom_bar(fill = 'lightgrey', color = 'black') +
        xlab('Habitat type') +
        ylab('Number of authorizations') +
        theme_classic() # Habitat type authorization frequency plot
      
p5 # view p5

Figure 5: Number of authorizations issued by habitat type

Table 8: Number of authorizations issued by habitat type
habitat_type freq
in-channel 52
marine 22
riparian 16
estuarine 13
off-channel 13
lacustrine 9

The 14.68% of authorizations that include two habitat types rarely specify the habitat loss area and offset area for each habitat type. Therefore, we will list these as “multiple” and repeat the last analysis. With this modification, the most common habitat affected was in-channel, with 36.70% of authorizations, followed by marine and multiple, accounting for 19.27% and 14.68% of authorizations, respectively.

p6 <- ggplot(dat5, aes(x= fct_infreq(HABITAT_TYPE))) +
      geom_bar(fill = 'lightgrey', color = 'black') +
      xlab('Habitat type') +
      ylab('Number of authorizations') +
      theme_classic() # Habitat type authorization frequency plot with multiples listed 
    p6 # view modified p6

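Figure 6: Number of authorizations issued by habitat type, with authorizations listing two habitat types grouped as “multiple”

Table 9: Number of authorizations issued by habitat type, with authorizations listing two habitat types grouped as “multiple”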
HABITAT_TYPE freq
in-channel 40
marine 21
multiple 16
estuarine 12
off-channel 9
lacustrine 7
riparian 4
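
The code that builds the “multiple” category is not shown in this report. A minimal sketch of one way to derive it from the two cleaned habitat columns is given below; the vectors are toy data and the logic is an assumption rather than the code used to produce Table 9.

```r
# Toy stand-ins for the cleaned habitat1 and habitat2 columns.
habitat1 <- c("in-channel", "in-channel", "marine", NA)
habitat2 <- c(NA, "riparian", NA, NA)

# Any authorization with a second habitat type is labelled "multiple";
# otherwise the first (and only) habitat type is kept.
habitat_type <- ifelse(!is.na(habitat2), "multiple", habitat1)

print(table(habitat_type, useNA = "ifany"))
```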

Habitat loss area and offset area

For the purpose of this analysis, any authorization listing multiple habitat types was listed as “multiple”, and therefore it is unsurprising that this category accounted for the highest amount of habitat loss at 2,408,649 meters-squared. The authorizations within this category also encompassed the largest net deficit of area, at 2,114,925 meters-squared. Out of all 7 habitat type categories, in-channel was the only one in which the required offsetting exceeded the habitat loss (by 28,063 meters-squared), thus achieving no net loss. In all other categories, more habitat was lost than the offsetting was to compensate for. Alarmingly, across the 109 authorizations there was a net 2,544,844 meters-squared of habitat loss that was not to be compensated for, despite offsetting standards of no net loss.

p7 <- ggplot(data = area_summary_log_w, aes(x = HABITAT_TYPE, y = area_, fill = loss_or_offset)) +
  geom_bar(stat = "identity", position = 'stack', colour = 'black') +
  xlab('Habitat type') +
  ylab(expression('Log'[10]*'(Area (m)'^'2'*')')) +
  guides(fill=guide_legend(title="Area type"))+
  theme_classic() +
  scale_fill_grey(start = 0.6, end = 0.8, labels = c('Habitat loss area', 'Offset area')) +
  scale_x_discrete(limits = c('in-channel', 'multiple', 'marine', 'estuarine', 'off-channel', 'lacustrine', 'riparian')) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # new plot with the loss/offset areas for each habitat type indicated as a single sum value of all authorizations that apply
p7 # view p7

Figure 7: Habitat loss and offsetting areas in meters-squared for each habitat type

Table 10: Habitat loss and offsetting areas in meters-squared for each habitat type
HABITAT_TYPE HABITAT_LOSS OFFSET_AREA NET_AREA
in-channel 885027 913090 28063
multiple 2408649 293724 -2114925
marine 556006 456497 -99509
estuarine 358424 150809 -207615
off-channel 97284 7383 -89901
lacustrine 61731 4590 -57141
riparian 8267 4451 -3816
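
The NET_AREA column and the overall deficit can be reproduced from the loss and offset columns of Table 10. The sketch below rebuilds the table as a data frame (`area` is a hypothetical name) and recomputes both.

```r
# Loss and offset areas (meters-squared) copied from Table 10.
area <- data.frame(
  HABITAT_TYPE = c("in-channel", "multiple", "marine", "estuarine",
                   "off-channel", "lacustrine", "riparian"),
  HABITAT_LOSS = c(885027, 2408649, 556006, 358424, 97284, 61731, 8267),
  OFFSET_AREA  = c(913090, 293724, 456497, 150809, 7383, 4590, 4451)
)

# Net area is offset minus loss; negative values mean a net loss of habitat.
area$NET_AREA <- area$OFFSET_AREA - area$HABITAT_LOSS

# Overall balance across all habitat type categories.
net_total <- sum(area$NET_AREA)

print(area)
print(net_total)
```

Only in-channel has a positive net area, and the total comes to a net loss of 2,544,844 meters-squared, which matches the figure quoted in the text.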

There is a clear, positive linear relationship between the log-transformed habitat loss area and offsetting area. The linear model of this relationship has a p-value less than 0.05, suggesting statistical significance. However, a multiple R-squared value of 0.5351 indicates that the model explains only about 54% of the variation in offset area, leaving substantial unexplained variation. This variation is unsurprising considering the findings thus far in this EDA suggest major inconsistencies in offsetting requirements.

p12 <- ggplot(dat, aes(x = logHABITAT_LOSS, y = logOFFSET_AREA)) +
       geom_point() +
       xlab(expression('Log'[10]*'(Habitat loss area (m)'^2*')')) +
       ylab(expression('Log'[10]*'(Offset area (m)'^2*')')) +
       geom_abline(aes(intercept = 0, slope = 1), colour = "black", linetype = "dashed") +
       geom_smooth(method = 'lm', colour = 'black')+
       theme_classic() # scatterplot of habitat loss area vs offset area (log transformed) with linear model and confidence intervals present
p12 # view p12

Figure 8: The base 10 logarithmic relationship between habitat loss and offsetting areas (meters-squared). The dashed line represents NNL achievement, with values on or to the left of (above) the line indicating sufficient offsetting. The solid line is the linear model for the relationship, with the shaded area representing the confidence interval. The regression statistics for the linear model follow.

lm_p12 = lm(formula = logOFFSET_AREA ~ logHABITAT_LOSS, data = dat)
summary(lm_p12) # p-value <0.05 suggesting the LM on p12 is statistically significant and there is correlation between the habitat loss area and offset area 
## 
## Call:
## lm(formula = logOFFSET_AREA ~ logHABITAT_LOSS, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.69665 -0.21686  0.04896  0.30874  1.43681 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.77239    0.33202   2.326    0.023 *  
## logHABITAT_LOSS  0.79495    0.08985   8.848 6.36e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5525 on 68 degrees of freedom
##   (39 observations deleted due to missingness)
## Multiple R-squared:  0.5351, Adjusted R-squared:  0.5283 
## F-statistic: 78.28 on 1 and 68 DF,  p-value: 6.364e-13
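
Because both variables are log-transformed, the fitted coefficients are easier to interpret after back-transforming to meters-squared. The sketch below uses the two estimates from the summary above; the helper `predict_offset` and the break-even calculation are illustrative additions, and the back-transformed values describe the fitted line only, ignoring any retransformation bias.

```r
# Coefficient estimates copied from the regression summary above.
b0 <- 0.77239  # intercept
b1 <- 0.79495  # slope on log10(habitat loss)

# Hypothetical helper: fitted offset area (m^2) for a given habitat loss (m^2).
predict_offset <- function(loss_m2) 10^(b0 + b1 * log10(loss_m2))

# Habitat loss at which the fitted line crosses the 1:1 (NNL) line:
# solve b0 + b1 * x = x for x = log10(loss).
break_even <- 10^(b0 / (1 - b1))

print(predict_offset(1000))
print(break_even)
```

Since the slope is below 1, the fitted offset area grows more slowly than the habitat loss area. Under this model, the fitted line drops below the NNL line once habitat loss exceeds roughly 5,800 meters-squared.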

Ratio

Of the 109 authorizations, only 70 include both habitat loss and offset areas; therefore, only 64.22% of authorizations contain the information needed to calculate a ratio. The mean ratio is 2.40, suggesting that, on average, authorizations that provide both areas require more habitat to be offset than is lost, which is consistent with NNL being achieved. The mean is, however, sensitive to large values such as the maximum ratio of 28.56. Of the remaining authorizations, 33 do not include offsetting requirements, accounting for 2,722,419 meters-squared of lost habitat. This likely explains the discrepancy between the habitat loss and offset areas seen in Table 10, but it also suggests that DFO standards regarding authorization completeness are often not being met.

Compensation technique

The authorizations included seven compensation techniques: (1) create in-kind habitat, (2) increase in-kind (habitat) productivity, (3) create out-of-kind habitat, (4) increase out-of-kind habitat productivity, (5) create or increase habitat in a different ecological unit (same species), (6) create or increase habitat in a different ecological unit (different species), and (8) other. Of the 109 authorizations, only four contained multiple compensation techniques. The most common compensation technique was the creation of in-kind habitat, in 31.19% of authorizations, followed by the creation of out-of-kind habitat and the increase of in-kind productivity, in 18.35% and 11.01% of authorizations, respectively. Since the techniques are ranked in order of preference, it is a good sign that offsetting measures tend to rely on the most preferred techniques.

p8 <- ggplot(tech_no_na, aes(x= fct_infreq(technique_type))) +
        geom_bar(fill = 'lightgrey', color = 'black') +
        xlab('Technique') +
        ylab('Number of authorizations') +
        theme_classic() # Offsetting technique authorization frequency plot with multiples listed 
     p8 # view p8

Figure 9: Number of authorizations by compensation technique from: (1) create in-kind habitat, (2) increase in-kind (habitat) productivity, (3) create out-of-kind habitat, (4) increase out-of-kind habitat productivity, (5) create or increase habitat in a different ecological unit (same species), (6) create or increase habitat in a different ecological unit (different species), (7) artificial propagation, (8) other, (9) none

Table 11: Number of authorizations by compensation technique from: (1) create in-kind habitat, (2) increase in-kind (habitat) productivity, (3) create out-of-kind habitat, (4) increase out-of-kind habitat productivity, (5) create or increase habitat in a different ecological unit (same species), (6) create or increase habitat in a different ecological unit (different species), (7) artificial propagation, (8) other, (9) none
TECHNIQUE freq
1 34
3 20
2 12
4 11
8 7
5 3
6 2
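
The percentages quoted above can be checked directly against Table 11. The following is a minimal sketch in base R, assuming the counts are re-entered by hand from the table; the names `technique_counts` and `n_authorizations` are illustrative and do not come from the original analysis script.

```r
# Counts copied by hand from Table 11 (names are the technique codes).
# `technique_counts` and `n_authorizations` are hypothetical names.
technique_counts <- c("1" = 34, "3" = 20, "2" = 12, "4" = 11,
                      "8" = 7, "5" = 3, "6" = 2)
n_authorizations <- 109 # total number of authorizations reviewed

# Share of the 109 authorizations listing each technique, in percent.
technique_share <- round(100 * technique_counts / n_authorizations, 2)
print(technique_share)

# Total number of technique entries recorded in Table 11.
total_entries <- sum(technique_counts)
print(total_entries)
```

The first three values (31.19, 18.35, and 11.01) correspond to techniques 1, 3, and 2, matching the figures reported in the text.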

Pre-impact and post-construction monitoring

Pre-impact and post-construction monitoring techniques can be classified as basic, type 1, type 2, or research. Only 44.95% of the 109 authorizations (49) list both a pre-impact and a post-construction monitoring class; the remaining 55.05% are missing at least one, once again suggesting that the authorizations are incomplete (Table 12). Of the authorizations, 51.38% do not list pre-impact assessments, whereas 10.09% do not list post-construction monitoring requirements. This may suggest a degree of leniency from the DFO when requiring pre-impact assessments. From Figure 10 it is clear that almost all pre-impact assessments are non-existent or basic, with very few being type 1 or type 2. Conversely, post-construction monitoring classifications appear to be authorization-specific, with no clear relationship to the pre-impact assessment classification.

p9 <- ggplot(class, aes(x = PRE_CLASS, y = POST_CLASS)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1), colour = 'black', size = 3, shape = 1) +
  xlab('Pre-Impact assessment class') +
  ylab('Post-Construction monitoring class') +
  theme_classic() # plot showing the frequency of each combination of pre and post assessment classes

# Pre are mainly basic or NA. The post assessments are much more varied, with most being basic, type 2, or type 1.
# From this plot, there does not appear to be a significant relationship between the type of pre and post assessment classes.
p9 # view p9

Figure 10: Scatter-plot showing the relationship between pre-impact and post-construction assessment class

Table 12: Number of authorizations pre-impact and post-construction monitoring classification combinations
PRE_CLASS   POST_CLASS   freq
Basic       Basic        20
NA          Type 2       19
NA          Basic        15
NA          Type 1       14
Basic       Type 2       14
Basic       Type 1       8
NA          NA           7
Type 2      Type 2       5
Basic       NA           4
NA          Research     1
Type 1      Type 1       1
Type 2      Type 1       1
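
The missing-data shares for monitoring can be derived from the combination counts in Table 12. The sketch below re-enters the table by hand in base R; the data frame `monitoring` and the helper `pct()` are hypothetical names used only for illustration, not objects from the original analysis.

```r
# Table 12 re-entered by hand; `monitoring` is a hypothetical name.
monitoring <- data.frame(
  PRE_CLASS  = c("Basic", NA, NA, NA, "Basic", "Basic",
                 NA, "Type 2", "Basic", NA, "Type 1", "Type 2"),
  POST_CLASS = c("Basic", "Type 2", "Basic", "Type 1", "Type 2", "Type 1",
                 NA, "Type 2", NA, "Research", "Type 1", "Type 1"),
  freq       = c(20, 19, 15, 14, 14, 8, 7, 5, 4, 1, 1, 1)
)

total <- sum(monitoring$freq) # number of authorizations covered by the table

# Helper (hypothetical): percentage of all authorizations, two decimals.
pct <- function(n) round(100 * n / total, 2)

pre_na  <- is.na(monitoring$PRE_CLASS)  # no pre-impact class listed
post_na <- is.na(monitoring$POST_CLASS) # no post-construction class listed

pre_missing  <- sum(monitoring$freq[pre_na])
post_missing <- sum(monitoring$freq[post_na])
both_listed  <- sum(monitoring$freq[!pre_na & !post_na])

print(c(pre_missing  = pct(pre_missing),
        post_missing = pct(post_missing),
        both_listed  = pct(both_listed),
        any_missing  = pct(total - both_listed)))
```

With the counts in Table 12, this gives 51.38% with no pre-impact class, 10.09% with no post-construction class, 44.95% with both classes listed, and 55.05% missing at least one.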

Post-construction monitoring duration

Post-construction monitoring duration varies widely among the 109 authorizations, ranging from one to 22 years. Of the authorizations, 28 (25.69%) do not include a post-construction monitoring duration. Only eight of these also lack a post-construction monitoring classification, indicating that 20 authorizations require post-construction monitoring yet do not list its duration. This is alarming, as it would once again indicate that DFO has provided incomplete authorizations. The most common post-construction monitoring duration is five years, which is required for 43.12% of authorizations.

p10 <- ggplot(dat, aes(x = DURATION)) +
  geom_bar(fill = 'lightgrey', color = 'black') +
  xlab('Post-monitoring duration') +
  ylab('Number of authorizations') +
  theme_classic() # frequency of post monitoring durations

p10 # view p10

Figure 11: Number of authorizations by post-construction monitoring duration (years)

Table 13: Number of authorizations by post-construction monitoring duration (years)
DURATION freq
5 47
NA 28
3 18
1 4
4 4
10 3
2 2
6 2
22 1
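
As a quick check on the duration figures, the counts in Table 13 can be expanded back into one value per authorization and summarized. This is a sketch in base R under the assumption that the table is re-entered by hand; `duration_counts` and the derived names are hypothetical.

```r
# Table 13 re-entered by hand; `duration_counts` is a hypothetical name.
duration_counts <- data.frame(
  DURATION = c(5, NA, 3, 1, 4, 10, 2, 6, 22), # years; NA = not listed
  freq     = c(47, 28, 18, 4, 4, 3, 2, 2, 1)
)

total <- sum(duration_counts$freq) # all authorizations

# Share with no duration listed, and share requiring five years.
no_duration <- duration_counts$freq[is.na(duration_counts$DURATION)]
five_years  <- duration_counts$freq[which(duration_counts$DURATION == 5)]
share_missing <- round(100 * no_duration / total, 2)
share_five    <- round(100 * five_years / total, 2)

# Expand the listed durations to one value per authorization.
listed <- duration_counts[!is.na(duration_counts$DURATION), ]
years  <- rep(listed$DURATION, times = listed$freq)

print(c(share_missing = share_missing, share_five = share_five))
print(range(years))  # shortest and longest listed duration
print(median(years)) # typical listed duration
```

Among the 81 authorizations that list a duration, the median is five years, consistent with five years being the most common requirement.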

No net loss requirement status

NNL requirement status could be determined for only 81.65% of authorizations (89 of 109). Of these, only 50.56% (45) are required by DFO to achieve NNL, while 49.44% (44) are not. Considering that no net loss is the ultimate goal, this is extremely concerning.

p11 <- ggplot(dat, aes(x = fct_infreq(NNL))) +
  geom_bar(fill = 'lightgrey', color = 'black') +
  xlab('NNL Status') +
  ylab('Number of authorizations') +
  scale_x_discrete(labels = c('Achieved', 'Not achieved', 'Undetermined', "NA")) +
  theme_classic() # frequency of NNL status

p11 # view p11

Figure 12: Number of authorizations by no net loss requirement status

Table 14: Number of authorizations by no net loss requirement status
NNL freq
Y 45
N 44
U 15
NA 5
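
The NNL shares can be recomputed from Table 14. The sketch below is written in base R and assumes that `Y` marks authorizations required to achieve NNL, `N` those not required to, and `U` those whose status is undetermined; that coding is our reading of the table rather than something stated in it, and the object names are hypothetical.

```r
# Table 14 re-entered by hand. ASSUMPTION: Y = NNL required,
# N = NNL not required, U = undetermined; NA rows are counted separately.
nnl_counts <- c(Y = 45, N = 44, U = 15)
n_na <- 5

total      <- sum(nnl_counts) + n_na                   # all authorizations
determined <- nnl_counts[["Y"]] + nnl_counts[["N"]]    # status known

share_determined <- round(100 * determined / total, 2)
share_required   <- round(100 * nnl_counts[["Y"]] / determined, 2)
share_not_req    <- round(100 * nnl_counts[["N"]] / determined, 2)

print(c(determined   = share_determined,
        required     = share_required,
        not_required = share_not_req))
```

Under that coding, status is determined for 81.65% of authorizations; of those, 50.56% are required to achieve NNL and 49.44% are not.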

Conclusion

This EDA explored data extracted from 109 HADD authorizations issued by the DFO in 2020. By doing so, we hoped to analyze DFO compliance with the 2019 updated Fisheries Act standard of ensuring development activity that may cause the harmful alteration, disruption, or destruction (HADD) of fish habitats is met with offsetting measures that achieve no net loss (NNL).

Unfortunately, the lack of DFO cooperation and transparency makes this task difficult. The DFO has been contacted on several occasions with requests for additional documentation and was informed that conclusions would be drawn regardless of its cooperation. To date, DFO has not provided the additional offsetting plans, so our conclusions are based on the information provided.

Repeatedly within this EDA, it was apparent that DFO standards for the consistency and completeness of authorizations are severely lacking, leading one to question not only the accuracy of the information required but also the true reason behind DFO’s lack of cooperation with this study. Regarding our objective, this EDA determined that DFO requires developers to achieve NNL when causing harmful alteration, disruption, or destruction of fish habitats in only approximately 50% of authorizations (not including instances in which this could not be assessed). Unfortunately, the lack of DFO cooperation makes it impossible to determine the reasoning behind this finding. Possibilities include a lack of resources during the COVID-19 pandemic, inadequate training upon the introduction of the amended Fisheries Act, and disregard for the importance of fish habitat preservation. For the future of our fisheries, this is alarming, and steps must be taken to ensure such environmental injustices do not continue.