Impacts of event on economic and health

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site: - Storm Data There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined. - Storm Data Documentation National Weather Service - National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

The GITHUB is also available on: link

Load the Data

library(tidyverse) # to load required package

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Storm <- read_csv("Data_Science_Project/Storm.csv") # to load correct dataset

## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(Storm)
names(Storm)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Data Processing

Selected Variables: 1. BGN_DATE : The beginning date 2. COUNTY : Number of the County where the event occurred 3. COUNTYNAME : Name of the county where the event occurred 4. STATE : State where the event occurred 5. EVTYPE : Type of events 6. FATALITIES : Numbers of corpses during the events 7. INJURIES : Physical Damages during the events 8. PROPDMG : Value of property damage in USD 9. PROPDMGEXP : Unit multiplier for property damage 10. CROPDMG : Value of crop damage in USD 11. CROPDMGEXP : Unit multiplier for crop damage

data <- Storm %>%
    select(BGN_DATE, COUNTY, COUNTYNAME, STATE, EVTYPE,
           FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
rm(Storm)
str(data)

## tibble [902,297 × 11] (S3: tbl_df/tbl/data.frame)
##  $ BGN_DATE  : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ COUNTY    : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr [1:902297] "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
##  $ CROPDMG   : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr [1:902297] NA NA NA NA ...

summary(data)

##    BGN_DATE             COUNTY       COUNTYNAME           STATE          
##  Length:902297      Min.   :  0.0   Length:902297      Length:902297     
##  Class :character   1st Qu.: 31.0   Class :character   Class :character  
##  Mode  :character   Median : 75.0   Mode  :character   Mode  :character  
##                     Mean   :100.6                                        
##                     3rd Qu.:131.0                                        
##                     Max.   :873.0                                        
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000

data$DATE <- mdy_hms(data$BGN_DATE) # to create new column date
data$MONTH <- month(data$DATE, label = T, abbr = F) # to create new column with the months
data$YEAR <- year(data$DATE) # to extract the Year in the date

table(data$PROPDMGEXP) # to see the letters used to characterise the amount of dollar

## 
##      -      ?      +      0      1      2      3      4      5      6      7 
##      1      8      5    216     25     13      4      4     28      4      5 
##      8      B      h      H      K      m      M 
##      1     40      1      6 424665      7  11330

table(data$CROPDMGEXP) # to see the letters used to characterise the amount of dollar

## 
##      ?      0      2      B      k      K      m      M 
##      7     19      1      9     21 281832      1   1994

# To change the letters into the corresponding number 
data$CROPDMGEXP[!grepl("K|M|B", data$CROPDMGEXP, ignore.case = TRUE)] <- 0
data$PROPDMGEXP[!grepl("K|M|B", data$PROPDMGEXP, ignore.case = TRUE)] <- 0

data$PROPDMGEXP[grep("K", data$PROPDMGEXP, ignore.case = TRUE)] <- "3"
data$PROPDMGEXP[grep("M", data$PROPDMGEXP, ignore.case = TRUE)] <- "6"
data$PROPDMGEXP[grep("B", data$PROPDMGEXP, ignore.case = TRUE)] <- "9"
data$PROPDMGEXP <- as.numeric(as.character(data$PROPDMGEXP))

data$CROPDMGEXP[grep("K", data$CROPDMGEXP, ignore.case = TRUE)] <- "3"
data$CROPDMGEXP[grep("M", data$CROPDMGEXP, ignore.case = TRUE)] <- "6"
data$CROPDMGEXP[grep("B", data$CROPDMGEXP, ignore.case = TRUE)] <- "9"
data$CROPDMGEXP <- as.numeric(as.character(data$CROPDMGEXP))

# To assign the letter into the corresponding amount in dollar
data$PROPCOST <- data$PROPDMG * 10^data$PROPDMGEXP # 
data$CROPCOST <- data$CROPDMG * 10^data$CROPDMGEXP

# To sort the cost of Property damage in a decreasing order
sort(table(data$PROPCOST), decreasing = T)[1:20]

## 
##       0    5000   10000    1000    2000   25000   50000    3000   20000   15000 
##  663123   31731   21787   17544   17186   17104   13596   10364    9179    8617 
##  250000     500   1e+05    2500   30000   5e+05    4000    8000   75000 2500000 
##    8439    6707    6302    5807    4391    4000    3202    2877    2419    2411

# To sort the cost of Crop damage in a decreasing order
sort(table(data$CROPCOST), decreasing = T)[1:20]

## 
##      0   5000  10000  50000  1e+05   1000   2000  25000  20000  5e+05  15000 
## 880198   4097   2349   1984   1233    956    951    830    758    721    598 
##    500   3000 250000  2e+05  1e+06  30000  75000 150000  3e+05 
##    568    554    513    479    447    317    290    268    250

data$EVTYPE <- tolower(data$EVTYPE) # to change EVTYPE into lower case

# To substitute the string in order to have less categories 
data$events <- data$EVTYPE
data$events <- gsub('.avalan.', 'avalanche', data$events)
data$events <- gsub('.storm.', 'storm', data$events)
data$events <- gsub('.*storm.*', 'storm', data$events)
data$events <- gsub('.*avalan.*', 'avalanche', data$events)
data$events <- gsub('.*tornado.*', 'tornado', data$events)
data$events <- gsub('.*tstm.*', 'storm', data$events)
data$events <- gsub('.*flood.*', 'flood', data$events)
data$events <- gsub('.*fire.*', 'fire', data$events)
data$events <- gsub('.*cold.*', 'cold', data$events)
data$events <- gsub('.*freez.*', 'cold', data$events)
data$events <- gsub('.*frost.*', 'cold', data$events)
data$events <- gsub('.*ice.*', 'cold', data$events)
data$events <- gsub('.*low.*', 'cold', data$events)
data$events <- gsub('.*hail.*', 'hail', data$events)
data$events <- gsub('.*dry.*', 'heat', data$events)
data$events <- gsub('.*icy.*', 'cold', data$events)
data$events <- gsub('.*summary.*', 'summary', data$events)
data$events <- gsub('.*warmth.*', 'heat', data$events)
data$events <- gsub('.*warm.*', 'heat', data$events)
data$events <- gsub('.*hurricane.*', 'hurricane', data$events)
data$events <- gsub('.*drought.*', 'heat', data$events)
data$events <- gsub('.*hot.*', 'heat', data$events)
data$events <- gsub('.*dri.*', 'heat', data$events)
data$events <- gsub('.*snow.*', 'snow', data$events)
data$events <- gsub('.*torn.*', 'tornado', data$events)
data$events <- gsub('.*volca.*', 'volcano', data$events)
data$events <- gsub('.*rain.*', 'rain', data$events)
data$events <- gsub('.*heat.*', 'heat', data$events)
data$events <- gsub('.*lightning.*', 'lightning', data$events)
data$events <- gsub('.*wint.*', 'cold', data$events)
data$events <- gsub('.*wind.*', 'wind', data$events)
data$events <- gsub('.*precipitation.*', 'rain', data$events)
data$events <- gsub('.*floo.*', 'flood', data$events)
data$events <- gsub('.*whir.*', 'wind', data$events)
data$events <- gsub('.*precip.*', 'rain', data$events)
data$events <- gsub('.*blizzard.*', 'rain', data$events)
data$events <- gsub('.*wet.*', 'wet', data$events)
data$events <- gsub('.*wat.*', 'wet', data$events)
data$events <- gsub('.*wayt.*', 'wet', data$events)
data$events <- gsub('.*urban.*', 'flood', data$events)
data$events <- gsub('.*cloud.*', 'cloud', data$events)
data$events <- gsub('.*surf.*', 'surf', data$events)
data$events <- gsub('.*fog.*', 'fog', data$events)

head(sort(table(data$events), decreasing = T), n = 15) # to see only the 15 first greatest events

## 
##     storm      hail     flood   tornado      wind      snow lightning      rain 
##    351859    289277     86135     60701     26553     17535     15760     14674 
##      cold     cloud      heat      fire       wet       fog      surf 
##     13045      6944      5779      4240      3897      1834      1063

data$All_Events <- 'OTHER' # to create a column names All_Events

# to set to other all the features less than 1000
data$All_Events[grep("storm", data$events, ignore.case = TRUE)] <- "STORM"
data$All_Events[grep("hail", data$events, ignore.case = TRUE)] <- "HAIL"
data$All_Events[grep("flood", data$events, ignore.case = TRUE)] <- "FLOOD"
data$All_Events[grep("tornado", data$events, ignore.case = TRUE)] <- "TORNADO"
data$All_Events[grep("wind", data$events, ignore.case = TRUE)] <- "WIND"
data$All_Events[grep("snow", data$events, ignore.case = TRUE)] <- "SNOW"
data$All_Events[grep("lightning", data$events, ignore.case = TRUE)] <- "LIGHTNING"
data$All_Events[grep("rain", data$events, ignore.case = TRUE)] <- "RAIN"
data$All_Events[grep("cold", data$events, ignore.case = TRUE)] <- "COLD"
data$All_Events[grep("cloud", data$events, ignore.case = TRUE)] <- "CLOUD"
data$All_Events[grep("heat", data$events, ignore.case = TRUE)] <- "HEAT"
data$All_Events[grep("fire", data$events, ignore.case = TRUE)] <- "FIRE"
data$All_Events[grep("wet", data$events, ignore.case = TRUE)] <- "WET"
data$All_Events[grep("fog", data$events, ignore.case = TRUE)] <- "FOG"
data$All_Events[grep("surf", data$events, ignore.case = TRUE)] <- "SURF"

data$events <- data$All_Events # To assign the value of All_Events to events
data$All_Events <- NULL # To remove the column All_Events

Results

Impact on Health

This table shows the event that causes the most health impact

# To create the table of the of the elements that affect the the Health by percentage
data %>% group_by(events) %>%
    summarise(Total = sum(FATALITIES + INJURIES, na.rm = T)) %>%
    mutate(Percent = round(Total / sum(Total) * 100, 3),
           Type = rep('Fatalities and Injuries')) %>%
    arrange(desc(Total))

## # A tibble: 16 × 4
##    events    Total Percent Type                   
##    <chr>     <dbl>   <dbl> <chr>                  
##  1 TORNADO   97068  62.4   Fatalities and Injuries
##  2 STORM     14909   9.58  Fatalities and Injuries
##  3 HEAT      12457   8.00  Fatalities and Injuries
##  4 FLOOD     10234   6.57  Fatalities and Injuries
##  5 LIGHTNING  6048   3.88  Fatalities and Injuries
##  6 OTHER      3559   2.29  Fatalities and Injuries
##  7 WIND       2362   1.52  Fatalities and Injuries
##  8 COLD       1748   1.12  Fatalities and Injuries
##  9 FIRE       1698   1.09  Fatalities and Injuries
## 10 HAIL       1386   0.89  Fatalities and Injuries
## 11 RAIN       1315   0.845 Fatalities and Injuries
## 12 SNOW       1285   0.825 Fatalities and Injuries
## 13 FOG        1156   0.743 Fatalities and Injuries
## 14 SURF        409   0.263 Fatalities and Injuries
## 15 WET          36   0.023 Fatalities and Injuries
## 16 CLOUD         3   0.002 Fatalities and Injuries

# To create a new variable to plot
ToPlotHealth <- data %>% group_by(events) %>%
    summarise(Total = sum(FATALITIES + INJURIES, na.rm = T)) %>%
    mutate(Percent = round(Total / sum(Total) * 100, 3),
           Type = rep('Fatalities and Injuries')) %>%
    arrange(desc(Total))

# To arrange the percentage in decreasing order to plot
ToPlotHealth$events <- factor(ToPlotHealth$events,
                              levels = ToPlotHealth$events[order(ToPlotHealth$Percent,
                                                                 decreasing = T)])

# The Plot of the Impact on Health per percentage
ggplot(ToPlotHealth, aes(x = events, y = Percent, fill = events)) +
    geom_bar(stat = 'identity') +
    geom_text(aes(label = paste0(round(Percent, 3), '%')), vjust = -.5) +
    labs(x = 'Health Damage', y = 'Percentage (%)') + 
    scale_y_continuous(labels = scales::percent_format(scale = 1)) +
    theme_minimal() + theme(legend.position = 'none') +
    ggtitle('Events that cause the Health damage in percentage in the US \nFrom 1950 - 2011')

Economic impact

This table shows the event that causes the most economic impact.

# To see the economic impacts per type of events
data %>% group_by(events) %>%
    summarise(Total = sum(PROPCOST + CROPCOST, na.rm = T)) %>%
    mutate(Percent = round(Total / sum(Total) * 100, 3),
           Type = rep('Economic Damage')) %>%
    arrange(desc(Total))

## # A tibble: 16 × 4
##    events            Total Percent Type           
##    <chr>             <dbl>   <dbl> <chr>          
##  1 FLOOD     179976102832.  37.8   Economic Damage
##  2 OTHER      91392479990   19.2   Economic Damage
##  3 STORM      84624213627.  17.8   Economic Damage
##  4 TORNADO    59010560049.  12.4   Economic Damage
##  5 HAIL       18780451671.   3.94  Economic Damage
##  6 HEAT       15950224630    3.35  Economic Damage
##  7 FIRE        8904910130    1.87  Economic Damage
##  8 WIND        6840844523    1.44  Economic Damage
##  9 RAIN        4816707440    1.01  Economic Damage
## 10 COLD        3763584450    0.79  Economic Damage
## 11 SNOW        1140377802.   0.239 Economic Damage
## 12 LIGHTNING    940781537.   0.197 Economic Damage
## 13 WET          157104700    0.033 Economic Damage
## 14 SURF         101475000    0.021 Economic Damage
## 15 FOG           22829500    0.005 Economic Damage
## 16 CLOUD           194600    0     Economic Damage

# To create a new variable to plot
ToPlotEcono <- data %>% group_by(events) %>%
    summarise(Total = sum(PROPCOST + CROPCOST, na.rm = T)) %>%
    mutate(Percent = round(Total / sum(Total) * 100, 3), Type = rep('Economic Damage')) %>%
    arrange(desc(Total))


ToPlotEcono$events <- factor(ToPlotEcono$events,
                             levels = ToPlotEcono$events[order(ToPlotEcono$Percent,
                                                               decreasing = T)])
# The plot of the Economic Impact per event 
ggplot(ToPlotEcono, aes(x = events, y = Percent, fill = events)) +
    geom_bar(stat = 'identity') +
    geom_text(aes(label = paste0(round(Percent, 3), '%')), vjust = -.5) +
    labs(x = 'Economic Damage', y = 'Percentage (%)') + 
    scale_y_continuous(labels = scales::percent_format(scale = 1)) +
    theme_minimal() + theme(legend.position = 'none') +
    ggtitle('Events that cause the economic damage in percentage in the US \nFrom 1950 - 2011')

Conclusion

Based on the results of the Analysis, we can conclude that: 1. The most Health damages are caused by Tornado, 62% 2. The most Economic damages are cause by Flood, 38%

JHU Data Science - Reproducible Research - Project 2

Guynemer Cétoute

2025-02-25