Synopsis
This analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database aims to identify 1) the events most harmful to population health and 2) the events with the greatest economic consequences. It uses the data recorded in the NOAA storm database between 1950 and 2011 and covers events across the whole United States during that period. The analysis was carried out in the R language using RStudio. The results and R code are presented as an R Markdown file, rendered with knitr and published online as an HTML file through RPubs.com. This analysis is part of the Reproducible Research course in the Data Science Specialization provided by Coursera.org.
Data Processing
1- Reading NOAA database
library(R.utils)  # provides bunzip2()
if(!file.exists("repdata-data-StormData.csv.bz2"))
{
furl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Save under the same file name that the existence check above looks for
download.file(furl, destfile="repdata-data-StormData.csv.bz2", method="curl")
}
if(!file.exists("repdata-data-StormData.csv"))
{
bunzip2("repdata-data-StormData.csv.bz2", overwrite=FALSE, remove=FALSE)
}
noaa_data <- read.csv("repdata-data-StormData.csv", stringsAsFactors=FALSE)
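As a side note, base R can usually read a bzip2-compressed CSV directly, because `read.csv` opens the path through `file()`, which detects the compression. The sketch below shows this on a small temporary file rather than the real storm data; the toy rows and variable names are made up for illustration.

```r
# Write a tiny bzip2-compressed CSV to a temporary location (toy data, not NOAA's).
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1),
                  stringsAsFactors = FALSE)
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file without a separate bunzip2 step.
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

If this holds for the storm file as well, the explicit decompression step becomes optional, at the cost of decompressing on every read.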
2- Getting all available events in the database.
events <- unique (noaa_data$EVTYPE)
print(paste('NOAA database contains', length(events), 'event types.'))
## [1] "NOAA database contains 985 event types."
3- Given the purpose of the study, we keep only the fields that describe harm to population health and damage to the economy (according to the provided documentation at this link and this link). For memory optimization, the original data is removed afterwards.
keep_data <- c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')
clean_data <- noaa_data[keep_data]
rm(noaa_data)
4- The new data frame contains the event types (EVTYPE), the harm to people (FATALITIES, INJURIES), and the damage to property (PROPDMG) and crops (CROPDMG), along with the damage exponent fields (PROPDMGEXP, CROPDMGEXP). We will explore the values of some of these fields.
unique(clean_data$INJURIES)
## [1] 15 0 2 6 1 14 3 26 12 50 8 195 4 20
## [15] 5 200 90 35 7 10 17 18 22 11 25 27 24 88
## [29] 9 19 72 44 47 63 42 60 56 41 29 110 102 80
## [43] 36 250 49 130 30 13 62 23 51 37 16 463 28 40
## [57] 325 180 39 57 270 350 257 112 64 76 100 52 45 500
## [71] 77 450 21 53 94 33 75 300 65 152 34 55 31 115
## [85] 410 97 69 32 181 58 252 560 275 38 175 73 138 172
## [99] 156 43 91 177 59 150 70 225 85 165 266 1228 68 785
## [113] 116 224 142 79 87 108 504 140 78 123 192 342 411 154
## [127] 98 176 170 216 101 118 280 74 153 105 103 207 240 1150
## [141] 190 46 81 135 95 120 82 166 137 159 597 111 67 1700
## [155] 121 54 71 89 385 230 1568 185 129 48 109 246 258 145
## [169] 96 122 83 143 61 600 800 550 125 750 119 241 397 160
## [183] 293 234 106 144 316 93 66 780 92 104 215 437 306 519
## [197] 136 700 223 210
unique(clean_data$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(clean_data$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
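5- The damage exponents are expected to be written either as digits or as K, M, or B, meaning thousands, millions, and billions, respectively. However, some fields contain other symbols such as ?, + or -, or are blank. To calculate the damage amounts, these fields need to be cleaned as follows: K, M, and B (in either case) become numeric multipliers, digits are kept as they are, and the symbols, h/H, and blanks are set to 0.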
clean_data$PROPDMGEXP <- as.character(clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP = gsub("\\-|\\+|\\?|h|H|0","0",clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP = gsub("k|K", "1000", clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP = gsub("m|M", "1000000", clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP = gsub("b|B", "1000000000", clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP <- as.numeric(clean_data$PROPDMGEXP)
clean_data$PROPDMGEXP[is.na(clean_data$PROPDMGEXP)] = 0
clean_data$CROPDMGEXP <- as.character(clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP = gsub("\\-|\\+|\\?|h|H|0","0",clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP = gsub("k|K", "1000", clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP = gsub("m|M", "1000000", clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP = gsub("b|B", "1000000000", clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP <- as.numeric(clean_data$CROPDMGEXP)
clean_data$CROPDMGEXP[is.na(clean_data$CROPDMGEXP)] = 0
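The same substitution can also be expressed as a lookup table, which keeps the mapping in one place and avoids repeating the block for each field. This is a minimal sketch of an alternative, not the code used above: the helper name `exp_to_multiplier` is hypothetical, and it mirrors the choices made above (symbols, `h`/`H`, and blanks map to 0; single digits are kept as they are).

```r
# Hypothetical helper: convert damage exponent codes to numeric multipliers.
exp_to_multiplier <- function(x) {
  x <- toupper(as.character(x))
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(lookup[x])                  # NA for anything that is not K, M or B
  is_digit <- grepl("^[0-9]$", x)
  out[is_digit] <- as.numeric(x[is_digit])  # single digits are kept as they are
  out[is.na(out)] <- 0                      # symbols, h/H and blanks become 0
  out
}

multipliers <- exp_to_multiplier(c("K", "m", "B", "5", "?", "", "h"))
print(multipliers)
```

With such a helper, each field would need only one call, for example `clean_data$PROPDMGEXP <- exp_to_multiplier(clean_data$PROPDMGEXP)`.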
6- Adding total damages (PROPDMG * PROPDMGEXP and CROPDMG * CROPDMGEXP)
clean_data$PROPDMGTOTAL <- as.numeric(clean_data$PROPDMG * clean_data$PROPDMGEXP)
clean_data$CROPDMGTOTAL <- as.numeric(clean_data$CROPDMG * clean_data$CROPDMGEXP)
Results
1- Calculating the total damages caused by each event.
totals <- aggregate(clean_data[,c(2,3,8,9)], by=list(clean_data$EVTYPE), "sum")
names(totals) <- c('Events','Total_Facility','Total_Injery','Total_Prop','Total_Crop')
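To make the grouping step concrete, here is a small sketch of `aggregate()` on made-up data: rows that share an event type are collapsed into one row holding the column sums. The toy values are illustrative only and are not taken from the storm database.

```r
# Toy data: two TORNADO rows, one FLOOD row, one HAIL row.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 4, 20, 1),
  stringsAsFactors = FALSE
)

# Sum the numeric columns within each event type.
toy_totals <- aggregate(toy[, c("FATALITIES", "INJURIES")],
                        by = list(Events = toy$EVTYPE), FUN = sum)
print(toy_totals)
```

Naming the grouping element in `by` (here `Events`) sets that column's name directly, so only the summed columns would still need renaming.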
2- The events most harmful with respect to population health. We take the top 10 by sorting the totals in descending order of fatalities, then injuries.
# Getting data
top_harmful <- totals[order(totals$Total_Facility, totals$Total_Injery, decreasing = T),]
top_harmful <- head(top_harmful, 10)[1:3]
Table 1. The most harmful events, sorted by total fatalities and then total injuries (first six of the top 10).
knitr::kable(head(top_harmful), format = "markdown")
| | Events | Total_Facility | Total_Injery |
|---|---|---|---|
| 834 | TORNADO | 5633 | 91346 |
| 130 | EXCESSIVE HEAT | 1903 | 6525 |
| 153 | FLASH FLOOD | 978 | 1777 |
| 275 | HEAT | 937 | 2100 |
| 464 | LIGHTNING | 816 | 5230 |
| 856 | TSTM WIND | 504 | 6957 |
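The ranking behind Table 1 relies on `order()` with two keys: rows are sorted by the first column, and the second column only breaks ties. The sketch below illustrates this on made-up totals; the event labels A to D and their values are placeholders.

```r
# Toy totals: events A and C tie on the first key.
toy_totals <- data.frame(
  Events         = c("A", "B", "C", "D"),
  Total_Facility = c(5, 9, 5, 1),
  Total_Injery   = c(30, 2, 40, 7),
  stringsAsFactors = FALSE
)

# Sort by Total_Facility, breaking ties by Total_Injery, both descending.
ranked <- toy_totals[order(toy_totals$Total_Facility, toy_totals$Total_Injery,
                           decreasing = TRUE), ]
top2 <- head(ranked, 2)
print(ranked)
```

Event B comes first despite having the fewest injuries, which shows that this ordering ranks by fatalities and uses injuries only as a tie-breaker.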
Figure 1. Top 10 harmful events stacked.
# Plotting
par (mar=c(10,5,3,3))
cols <- colours()[c(10, 15)]
harms <- t(as.matrix(top_harmful[3:2]))
barplot(harms, main="Most harmful events with respect to population health",names.arg = top_harmful$Events,las=2, col=cols, legend = names(top_harmful)[c(3,2)])
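`barplot()` draws one stacked bar per column of a matrix, which is why the totals are converted to a matrix and transposed before plotting. The sketch below shows the resulting layout on made-up numbers; the event labels A and B are placeholders.

```r
# Toy totals for two events.
toy <- data.frame(
  Events         = c("A", "B"),
  Total_Facility = c(5, 9),
  Total_Injery   = c(30, 2),
  stringsAsFactors = FALSE
)

# After transposing, each row is a measure and each column is an event.
m <- t(as.matrix(toy[3:2]))
colnames(m) <- toy$Events
print(m)

# One stacked bar per column; the return value holds the bar midpoints.
mids <- barplot(m, legend.text = rownames(m), las = 2)
```

Selecting the columns as `3:2` places injuries in the first row, so they form the bottom segment of each bar.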
According to this analysis, the event type most harmful to population health is the tornado.
3- The events with the greatest economic consequences.
We take the top 10 events by sorting the totals in descending order of property damage, then crop damage.
# Getting data
top_economic <- totals[order(totals$Total_Prop, totals$Total_Crop, decreasing = T),]
top_economic <- head(top_economic, 10)[c(1,4,5)]
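Table 2. Events with the greatest economic impact, sorted by total property damage and then total crop damage (first six of the top 10). The columns shown are the fatality and injury totals for these events.
knitr::kable(head(top_economic), format = "markdown")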
| | Events | Total_Facility | Total_Injery |
|---|---|---|---|
| 170 | FLOOD | 470 | 6789 |
| 411 | HURRICANE/TYPHOON | 64 | 1275 |
| 834 | TORNADO | 5633 | 91346 |
| 670 | STORM SURGE | 13 | 38 |
| 153 | FLASH FLOOD | 978 | 1777 |
| 244 | HAIL | 15 | 1361 |
Figure 2. Top 10 events with the greatest economic consequences, stacked.
# Plotting
par (mar=c(12,5,3,3))
cols <- colours()[c(10, 15)]
economy <- t(as.matrix(top_economic[2:3]))
barplot(economy, main="Events with the greatest economic consequences",names.arg = top_economic$Events,las=2, col=cols, legend = names(top_economic)[c(2,3)])
According to this analysis, the event type with the greatest economic consequences is the flood.
Conclusion
According to our analysis, tornadoes cause the most harm to population health, and floods cause the greatest economic damage.