Synopsis

This is an analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database that aims to identify 1) the events most harmful to population health and 2) the events with the greatest economic consequences. The analysis uses the NOAA storm data recorded across the United States between 1950 and 2011. It is written in R, a language for data analysis and statistical programming. The results and R code are presented as an R Markdown file, knitted with knitr and published online as HTML on RPubs.com; RStudio was used throughout. This analysis is part of the Reproducible Research course in the Data Science Specialization offered on Coursera.org.

Data Processing

1- Reading NOAA database

library(R.utils)  # provides bunzip2()

bz_file  <- "repdata-data-StormData.csv.bz2"
csv_file <- "repdata-data-StormData.csv"

# Download the compressed data only if it is not already present
if (!file.exists(bz_file)) {
  furl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(furl, destfile = bz_file, method = "curl")
}
# Decompress once, keeping the original archive
if (!file.exists(csv_file)) {
  bunzip2(bz_file, destname = csv_file, overwrite = FALSE, remove = FALSE)
}
noaa_data <- read.csv(csv_file, stringsAsFactors = FALSE)
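As an aside, the separate bunzip2() step may be optional: read.csv() opens its file through R's file() connection, which detects bzip2 compression when reading. A minimal sketch on a tiny made-up CSV (not NOAA data) written to a temporary file:

```r
# Write a tiny made-up CSV through a bzip2-compressing connection
tmp_bz2 <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp_bz2, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HAIL,0"), con)
close(con)

# read.csv() reads the compressed file directly, with no bunzip2() step
toy_csv <- read.csv(tmp_bz2, stringsAsFactors = FALSE)
print(toy_csv)
```

Keeping the explicit decompression, as above, is still reasonable when the uncompressed CSV is reused across sessions.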

2- Getting all available events in the database.

events <- unique(noaa_data$EVTYPE)
print(paste('NOAA database contains', length(events), 'event types.'))
## [1] "NOAA database contains 985 event types."
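The count of 985 may overstate the number of genuinely distinct event types if some labels differ only in letter case or surrounding spaces. A sketch, on a hypothetical vector of labels (not taken from the NOAA file), of how normalizing with toupper() and trimws() would change such a count:

```r
# Hypothetical event labels, for illustration only
labels <- c("TSTM WIND", "tstm wind", " TSTM WIND", "HAIL", "Hail ")
length(unique(labels))                 # distinct raw strings

# Normalize case and strip leading/trailing whitespace before counting
normalized <- toupper(trimws(labels))
length(unique(normalized))             # distinct labels after normalizing
```

Whether such normalization is appropriate for the real EVTYPE values is an assumption to check against the data, not something the analysis above does.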

3- Given the purpose of the study, we keep only the variables that measure damage to population health and to the economy (according to the documentation provided at this link and this link):

To save memory, the original data frame is then removed.

keep_data <- c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')
clean_data <- noaa_data[keep_data]
rm(noaa_data)

4- The new data frame contains the event type (EVTYPE), the human toll (FATALITIES and INJURIES), the property (PROPDMG) and crop (CROPDMG) damage, and the exponent codes for those damages (PROPDMGEXP and CROPDMGEXP). We explore the values of each field.

unique(clean_data$INJURIES)
##   [1]   15    0    2    6    1   14    3   26   12   50    8  195    4   20
##  [15]    5  200   90   35    7   10   17   18   22   11   25   27   24   88
##  [29]    9   19   72   44   47   63   42   60   56   41   29  110  102   80
##  [43]   36  250   49  130   30   13   62   23   51   37   16  463   28   40
##  [57]  325  180   39   57  270  350  257  112   64   76  100   52   45  500
##  [71]   77  450   21   53   94   33   75  300   65  152   34   55   31  115
##  [85]  410   97   69   32  181   58  252  560  275   38  175   73  138  172
##  [99]  156   43   91  177   59  150   70  225   85  165  266 1228   68  785
## [113]  116  224  142   79   87  108  504  140   78  123  192  342  411  154
## [127]   98  176  170  216  101  118  280   74  153  105  103  207  240 1150
## [141]  190   46   81  135   95  120   82  166  137  159  597  111   67 1700
## [155]  121   54   71   89  385  230 1568  185  129   48  109  246  258  145
## [169]   96  122   83  143   61  600  800  550  125  750  119  241  397  160
## [183]  293  234  106  144  316   93   66  780   92  104  215  437  306  519
## [197]  136  700  223  210
unique(clean_data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(clean_data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
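unique() shows which codes occur but not how often. A small sketch with table() on made-up codes shows the frequency of each code, which helps judge how much the rare or malformed codes matter:

```r
# Made-up exponent codes, for illustration only
codes <- c("K", "K", "M", "", "K", "?", "B", "M")

# Count how many rows use each code (the blank code "" is counted too)
code_counts <- table(codes)
code_counts
code_counts[["K"]]
```

On the real data the same call would be table(clean_data$PROPDMGEXP).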

5- The damage exponents are written either as digits or as the letters K, M or B, meaning thousands, millions and billions, respectively. Furthermore, some fields contain symbols such as ? or are blank. To calculate the damage amounts, these codes are converted into numeric multipliers as follows (K, M and B are matched in either case):

  1. Replace K with 1000.
  2. Replace M with 1000000.
  3. Replace B with 1000000000.
  4. Replace -, +, ?, h, H, 0 and blank with 0; the remaining digit codes are kept as plain numbers.
clean_data$PROPDMGEXP <- as.character(clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP = gsub("\\-|\\+|\\?|h|H|0","0",clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP = gsub("k|K", "1000", clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP = gsub("m|M", "1000000", clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP = gsub("b|B", "1000000000", clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP <- as.numeric(clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP[is.na(clean_data$PROPDMGEXP)] = 0

clean_data$CROPDMGEXP <- as.character(clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP = gsub("\\-|\\+|\\?|h|H|0","0",clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP = gsub("k|K", "1000", clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP = gsub("m|M", "1000000", clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP = gsub("b|B", "1000000000", clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP <- as.numeric(clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP[is.na(clean_data$CROPDMGEXP)] = 0
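The same rules can also be written once as a named lookup vector instead of a chain of gsub() calls. This is a sketch of an alternative, not the code used above; exp_multiplier() is a hypothetical helper that reproduces the rules of step 5, including keeping other digit codes as plain numbers:

```r
# Hypothetical helper: convert exponent codes to numeric multipliers
# following the rules of step 5 (not the code used in this analysis)
exp_multiplier <- function(code) {
  code <- as.character(code)
  lookup <- c(k = 1e3, K = 1e3, m = 1e6, M = 1e6, b = 1e9, B = 1e9,
              "-" = 0, "+" = 0, "?" = 0, h = 0, H = 0, "0" = 0)
  # Digit codes such as "5" become 5; letters and symbols become NA here
  out <- suppressWarnings(as.numeric(code))
  # Overwrite the codes that appear in the lookup table
  hit <- code %in% names(lookup)
  out[hit] <- lookup[code[hit]]
  # Blanks (and anything else left unmatched) become 0
  out[is.na(out)] <- 0
  out
}

codes <- c("K", "M", "", "B", "m", "+", "5", "?", "h")
exp_multiplier(codes)
```

With this helper, each column would be converted with a single call, e.g. exp_multiplier(clean_data$PROPDMGEXP).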

6- Adding the total damages (PROPDMG * PROPDMGEXP and CROPDMG * CROPDMGEXP), now that the exponent columns hold numeric multipliers.

clean_data$PROPDMGTOTAL <- as.numeric(clean_data$PROPDMG * clean_data$PROPDMGEXP)
clean_data$CROPDMGTOTAL <- as.numeric(clean_data$CROPDMG * clean_data$CROPDMGEXP)
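A worked example of this step on a few made-up rows whose exponent codes have already been converted to multipliers; for instance, 25 with code K becomes 25 * 1000 = 25,000:

```r
# Made-up rows; the EXP columns already hold numeric multipliers
toy_dmg <- data.frame(PROPDMG    = c(25, 2.5, 10),
                      PROPDMGEXP = c(1e3, 1e6, 0),
                      CROPDMG    = c(0, 1.5, 3),
                      CROPDMGEXP = c(0, 1e9, 1e3))

# Total damage = reported value times its multiplier
toy_dmg$PROPDMGTOTAL <- toy_dmg$PROPDMG * toy_dmg$PROPDMGEXP
toy_dmg$CROPDMGTOTAL <- toy_dmg$CROPDMG * toy_dmg$CROPDMGEXP
toy_dmg
```

Note that a multiplier of 0 (from a blank or malformed code) silently zeroes the reported value, as in the third row.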

Results

1- Calculating the total damages caused by each event.

totals <- aggregate(clean_data[,c(2,3,8,9)], by=list(clean_data$EVTYPE), "sum")
names(totals) <- c('Events','Total_Facility','Total_Injery','Total_Prop','Total_Crop')
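The aggregate() call above selects columns by position (2, 3, 8, 9), which breaks silently if the column order changes. A sketch of the formula interface of aggregate(), which selects the same kind of columns by name, on made-up rows:

```r
# Made-up rows, for illustration only
toy_events <- data.frame(EVTYPE = c("TORNADO", "HAIL", "TORNADO"),
                         FATALITIES = c(2, 0, 1),
                         INJURIES = c(10, 1, 4),
                         PROPDMGTOTAL = c(5e5, 2e4, 1e6),
                         CROPDMGTOTAL = c(0, 3e3, 0),
                         stringsAsFactors = FALSE)

# Sum each named column within each event type
totals_toy <- aggregate(cbind(FATALITIES, INJURIES, PROPDMGTOTAL, CROPDMGTOTAL) ~ EVTYPE,
                        data = toy_events, FUN = sum)
totals_toy
```

The groups come back sorted alphabetically (HAIL before TORNADO). The formula interface drops rows containing NA by default, whereas the positional call would propagate NA into the sums.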

2- The events most harmful to population health. We obtain the 10 most harmful events by sorting the totals by fatalities (Total_Facility) and then by injuries (Total_Injery).

# Getting data: sort by fatalities, then injuries, largest first
top_harmful <- totals[order(totals$Total_Facility, totals$Total_Injery, decreasing = TRUE), ]
top_harmful <- head(top_harmful, 10)[1:3]
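With two keys, order() ranks by the first key and uses the second only to break ties; decreasing = TRUE applies to both. A sketch on three made-up events:

```r
# Made-up totals, for illustration only
toy_rank <- data.frame(Events = c("A", "B", "C"),
                       Fat = c(5, 9, 5),
                       Inj = c(1, 0, 7),
                       stringsAsFactors = FALSE)

# B has the most fatalities; A and C tie on fatalities, so injuries decide
sorted <- toy_rank[order(toy_rank$Fat, toy_rank$Inj, decreasing = TRUE), ]
sorted$Events
```

This means an event with many injuries but few fatalities ranks low, which is a consequence of the sort order chosen here.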

Table 1. Events most harmful to population health, sorted by total fatalities and then injuries (first six rows shown).

knitr::kable(head(top_harmful), format = "markdown")
|    |Events         | Total_Facility| Total_Injery|
|:---|:--------------|--------------:|------------:|
|834 |TORNADO        |           5633|        91346|
|130 |EXCESSIVE HEAT |           1903|         6525|
|153 |FLASH FLOOD    |            978|         1777|
|275 |HEAT           |            937|         2100|
|464 |LIGHTNING      |            816|         5230|
|856 |TSTM WIND      |            504|         6957|

Figure 1. Injuries and fatalities for the 10 events most harmful to population health (stacked bars).

# Plotting
par (mar=c(10,5,3,3))
cols <- colours()[c(10, 15)]
harms <- t(as.matrix(top_harmful[3:2]))
barplot(harms, main="Most harmful events with respect to population health",names.arg = top_harmful$Events,las=2, col=cols, legend = names(top_harmful)[c(3,2)])
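barplot() draws one bar per matrix column and stacks the rows within each bar, which is why the data frame is transposed with t() first. A sketch on made-up totals, drawn to an off-screen pdf(NULL) device so it runs without a display; it uses the full argument name legend.text, which the legend = above reaches through partial matching:

```r
# Made-up totals, using the column names of this analysis
toy_harm <- data.frame(Events = c("TORNADO", "HEAT", "LIGHTNING"),
                       Total_Facility = c(50, 20, 8),
                       Total_Injery = c(900, 60, 50),
                       stringsAsFactors = FALSE)

# Rows become stack segments (injuries, fatalities); columns become bars
harms_toy <- t(as.matrix(toy_harm[3:2]))
dim(harms_toy)

pdf(NULL)  # off-screen device: nothing is written to disk
mids <- barplot(harms_toy, names.arg = toy_harm$Events, las = 2,
                legend.text = rownames(harms_toy))
invisible(dev.off())
```

barplot() returns the x positions of the bar midpoints, one per event.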

According to this analysis, tornadoes are the events most harmful to population health, with 5633 fatalities and 91346 injuries.

3- The events with the greatest economic consequences.

We obtain the 10 events with the greatest economic impact by sorting the totals by property damage (Total_Prop) and then by crop damage (Total_Crop).

# Getting data: sort by property damage, then crop damage, largest first
top_economic <- totals[order(totals$Total_Prop, totals$Total_Crop, decreasing = TRUE), ]
top_economic <- head(top_economic, 10)[c(1, 4, 5)]

Table 2. Events with the greatest economic consequences, sorted by total property damage and then crop damage (first six rows shown).

knitr::kable(head(top_economic), format = "markdown")
|    |Events            | Total_Facility| Total_Injery|
|:---|:-----------------|--------------:|------------:|
|170 |FLOOD             |            470|         6789|
|411 |HURRICANE/TYPHOON |             64|         1275|
|834 |TORNADO           |           5633|        91346|
|670 |STORM SURGE       |             13|           38|
|153 |FLASH FLOOD       |            978|         1777|
|244 |HAIL              |             15|         1361|

Figure 2. Property and crop damage for the 10 events with the greatest economic consequences (stacked bars).

# Plotting
par (mar=c(12,5,3,3))
cols <- colours()[c(10, 15)]
economy <- t(as.matrix(top_economic[2:3]))
barplot(economy, main="Events with the greatest economic consequences",names.arg = top_economic$Events,las=2, col=cols, legend = names(top_economic)[c(2,3)])

According to this analysis, floods are the events with the greatest economic consequences (the largest total property damage).

Conclusion

According to our analysis, tornadoes cause the most harm to population health, while floods cause the greatest economic damage to property and crops.