Synopsis

In this document, we analyse the storm events recorded in the NOAA Storm Database from April 1950 to November 2011.

This gives us an idea of the impact of these events on the human population and the economy. From the data, we found that, of all event types, tornados caused the most fatalities and injuries. Floods had the greatest economic impact, causing the largest combined property and crop damage. Advance warning of floods and tornados would therefore help limit future losses.

Reading, processing and transforming the raw data

The first step in our analysis is to download the dataset from Storm Data. The R code below reads in the data. Documentation of the database is available Here, along with the National Climatic Data Center Storm Events FAQ.

The data do not give property and crop damage directly as dollar values, which we need in order to determine the economic impact. Each damage figure is stored as a number (PROPDMG, CROPDMG) together with an exponent code (PROPDMGEXP, CROPDMGEXP), so the two must be combined into a single value. For this transformation I have used this Link as the source of truth; it also explains the methodology used in the data transformation.
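As a small illustration of the idea, the exponent codes can be thought of as a lookup table of multipliers. The sketch below uses made-up records and base R only; the names `multipliers` and `exp_to_multiplier` are hypothetical and do not appear in the analysis code, and the fallback of 10 for unlisted codes is an assumption chosen to mirror the default used for property damage in this report.

```r
# Toy records standing in for PROPDMG / PROPDMGEXP (values are made up).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "M", "B", "h", "5"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: exponent code -> multiplier.
multipliers <- c(
  "+" = 1, "-" = 0, "?" = 0,
  "H" = 1e2, "h" = 1e2,
  "K" = 1e3, "k" = 1e3,
  "M" = 1e6, "m" = 1e6,
  "B" = 1e9, "b" = 1e9
)

# Hypothetical helper: codes not in the table (e.g. digits) fall back to `default`.
exp_to_multiplier <- function(code, default = 10) {
  m <- unname(multipliers[as.character(code)])
  m[is.na(m)] <- default
  m
}

# Damage in dollars = reported number * multiplier for its exponent code.
toy$propdmgvalue <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

For example, a record with PROPDMG 25 and code "K" becomes 25 * 1,000 = 25,000 dollars.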

# Load the packages used below and read the storm data into a data frame
library(dplyr)
library(ggplot2)

storm_d <- read.csv('repdata_data_StormData.csv.bz2')
#class(storm_d)
#str(storm_d)
#902297 rows with 37 columns
#head(storm_d)

# Compute property damage in dollars from the PROPDMG and PROPDMGEXP columns.
# Start with a multiplier of 10 for every row; this is the value used for the
# digit codes 0 to 8, and it also stays in place for blank codes.
storm_d$gpropexp <- "10"
#head(storm_d)

# "+" means a multiplier of 1
storm_d$gpropexp[storm_d$PROPDMGEXP == "+"] <- 1

# "-" and "?" are treated as a multiplier of 0
storm_d$gpropexp[storm_d$PROPDMGEXP %in% c("-", "?")] <- 0

# "B" or "b" = billions = 1,000,000,000
storm_d$gpropexp[storm_d$PROPDMGEXP %in% c("B", "b")] <- 1000000000

# "M" or "m" = millions = 1,000,000
storm_d$gpropexp[storm_d$PROPDMGEXP %in% c("M", "m")] <- 1000000

# "K" or "k" = thousands = 1,000
storm_d$gpropexp[storm_d$PROPDMGEXP %in% c("K", "k")] <- 1000

# "H" or "h" = hundreds = 100
storm_d$gpropexp[storm_d$PROPDMGEXP %in% c("H", "h")] <- 100

table(storm_d$gpropexp)
## 
##      0      1     10    100   1000  1e+06  1e+09 
##      9      5 466234      7 424665  11337     40
storm_d$propdmgvalue <- as.numeric(storm_d$PROPDMG) * as.numeric(storm_d$gpropexp)


#################
# Apply the same logic to CROPDMG and CROPDMGEXP to get the crop damage per record
#################
# Start with a multiplier of 10 for every row (the value used for the digit codes 0 and 2)
storm_d$gcropexp <- "10"
table(storm_d$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
# A blank code means a multiplier of 0
storm_d$gcropexp[storm_d$CROPDMGEXP == ""] <- 0

# "?" is treated as a multiplier of 0
storm_d$gcropexp[storm_d$CROPDMGEXP == "?"] <- 0

# "B" = billions = 1,000,000,000
storm_d$gcropexp[storm_d$CROPDMGEXP == "B"] <- 1000000000

# "M" or "m" = millions = 1,000,000
storm_d$gcropexp[storm_d$CROPDMGEXP %in% c("M", "m")] <- 1000000

# "K" or "k" = thousands = 1,000
storm_d$gcropexp[storm_d$CROPDMGEXP %in% c("K", "k")] <- 1000

table(storm_d$gcropexp)
## 
##      0     10   1000  1e+06  1e+09 
## 618420     20 281853   1995      9
storm_d$cropdmgvalue <- as.numeric(storm_d$CROPDMG) * as.numeric(storm_d$gcropexp)

# Total fatalities and injuries per event type
human_impact <- storm_d %>%
    group_by(EVTYPE) %>%
    summarise(tot_fatalities=sum(FATALITIES),tot_injuries=sum(INJURIES))
#view(human_impact)

human_impact <-  arrange(human_impact,desc(tot_fatalities), desc(tot_injuries))
# view(human_impact) 

Results

Using the transformed data, let us plot the results to see which events caused the most fatalities and injuries, and which had the greatest economic impact.

# Pick the top 10 events by fatalities
human_impact_10_f <- top_n(human_impact, 10, tot_fatalities)
human_impact_10_f$tot_injuries <- NULL
#view(human_impact_10_f)

# Pick the top 10 events by injuries
human_impact <- arrange(human_impact, desc(tot_injuries))
#view(human_impact)
human_impact_10_i <- top_n(human_impact, 10, tot_injuries)
human_impact_10_i$tot_fatalities <- NULL
#view(human_impact_10_i)


# Plot total fatalities for the top 10 events
human_impact_10_f %>%
    arrange(tot_fatalities) %>%
    mutate(EVENT=factor(EVTYPE, EVTYPE)) %>%
    ggplot( aes(x=EVENT, y=tot_fatalities) ) +
    ggtitle("Top 10 events that caused the most fatalities") +
    geom_bar(stat="identity", fill="#69b3a2") +
    coord_flip() +
    theme(
        panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_blank(),
        legend.position="none"
    ) +
    ylab("Total number of fatalities") +
    xlab("Events")

# Plot total injuries for the top 10 events
human_impact_10_i %>%
    arrange(tot_injuries) %>%
    mutate(EVENT=factor(EVTYPE, EVTYPE)) %>%
    ggplot( aes(x=EVENT, y=tot_injuries) ) +
    ggtitle("Top 10 events that caused the most injuries") +
    geom_bar(stat="identity", fill="blue") +
    coord_flip() +
    theme(
        panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_blank(),
        legend.position="none"
    ) +
    ylab("Total number of Injuries") +
    xlab("Events")

#Common events causing major fatalities and Injuries
common <- intersect(human_impact_10_f$EVTYPE,human_impact_10_i$EVTYPE)
print(common)
## [1] "TORNADO"        "EXCESSIVE HEAT" "FLASH FLOOD"    "HEAT"          
## [5] "LIGHTNING"      "TSTM WIND"      "FLOOD"

The plots above show the top 10 events by fatalities and by injuries. Seven events appear in both lists. Of these, TORNADO caused the greatest loss of human life.

Let us now look at how these events affected the economy. They caused substantial property and crop damage, as the storm data set shows. Let us plot the top 10 events by combined property and crop damage.
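The selection of the two top lists and their overlap can be illustrated on a tiny example. The sketch below uses made-up totals and base R only; the helper `top_events` is hypothetical and is not part of the analysis code.

```r
# Made-up per-event totals (not taken from the storm data).
toy <- data.frame(
  EVTYPE     = c("A", "B", "C", "D", "E"),
  fatalities = c(50, 40, 5, 30, 1),
  injuries   = c(10, 500, 300, 200, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the n events with the largest value of `column`.
top_events <- function(df, column, n) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked$EVTYPE, n)
}

top_f <- top_events(toy, "fatalities", 3)  # top 3 by fatalities
top_i <- top_events(toy, "injuries", 3)    # top 3 by injuries

# Events present in both lists, in the order of the first list.
common <- intersect(top_f, top_i)
print(common)
```

One difference worth keeping in mind: `head()` always returns exactly n names, whereas dplyr's `top_n()` keeps tied rows and can therefore return more than n.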

#Plots on Property and Crop damage
#head(storm_d)

economic_impact <- storm_d %>%
    group_by(EVTYPE) %>%
    summarise(tot_crpdmg=sum(cropdmgvalue),tot_prpdmg=sum(propdmgvalue))
#view(economic_impact)

economic_impact$tot_dmg <- economic_impact$tot_crpdmg + economic_impact$tot_prpdmg

#view(economic_impact)
economic_impact <-  arrange(economic_impact,desc(tot_dmg))


#Pick top 10

economic_impact_10 <- top_n(economic_impact, 10,tot_dmg)
#view(economic_impact_10)


# Plot total damage for the top 10 events
economic_impact_10 %>%
    arrange(tot_dmg) %>%
    mutate(EVENT=factor(EVTYPE, EVTYPE)) %>%
    ggplot( aes(x=EVENT, y=tot_dmg) ) +
    ggtitle("Top 10 events that caused the most damage") +
    geom_bar(stat="identity", fill="green") +
    coord_flip() +
    theme(
        panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_blank(),
        legend.position="none"
    ) +
    ylab("Damages in $") +
    xlab("Events")
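The plot above shows combined damage only. To see how much of each total comes from crops rather than property, the two components can be compared per event. The sketch below uses made-up numbers and base R; the column `crop_share` is a hypothetical addition and is not computed in the analysis code.

```r
# Made-up per-event damage totals in dollars (not taken from the storm data).
toy <- data.frame(
  EVTYPE     = c("X", "Y", "Z"),
  tot_prpdmg = c(900, 100, 500),
  tot_crpdmg = c(100, 300, 0),
  stringsAsFactors = FALSE
)

# Combined damage, as in the analysis above.
toy$tot_dmg <- toy$tot_prpdmg + toy$tot_crpdmg

# Fraction of each event's total damage that is crop damage.
toy$crop_share <- toy$tot_crpdmg / toy$tot_dmg

# Rank by combined damage, largest first.
toy <- toy[order(toy$tot_dmg, decreasing = TRUE), ]
print(toy)
```

An event can rank high on combined damage while having almost no crop damage, so a breakdown like this helps when interpreting the ranking.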

Conclusion

The plot above shows the loss in dollars due to these storm events. FLOOD is the event type that has caused the most combined damage to property and crops, while tornados were clearly first in loss of human life. These plots can help with planning: paying special attention to major events such as floods and tornados allows precautionary measures to be taken to minimise fatalities, injuries and damage. Tornados and floods both appear among the top 10 events for harm to human life and to the economy.

