Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA). The goal of this project is to explore the NOAA Storm Database to find the types of events that are most harmful with respect to population health and the types of events that have the greatest economic consequences.
The Storm Data file is downloaded from the course website and read directly into R without unzipping it first. After the data are read in, the summary() and head() functions are used to check them. Two analyses follow, as shown in the code below: the first focuses on population health and the second on economic consequences.
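The step above relies on read.csv() reading the .bz2 archive without a separate unzip step. The self-contained sketch below shows that behaviour on a small made-up file written to a temporary location; the toy rows and the names `toy`, `path`, `keep`, `classes` and `slim` are illustrative assumptions, not part of the original analysis. It also shows an optional way to read only the columns the two analyses need, which should reduce memory use on the full 902,297-row file.

```r
# Made-up rows standing in for Storm Data (illustrative only, not real records)
toy <- data.frame(
  STATE      = c("AL", "TX"),
  EVTYPE     = c("TORNADO", "HAIL"),
  FATALITIES = c(5, 0),
  INJURIES   = c(20, 1),
  REMARKS    = c("Trees down.", "")
)
path <- tempfile(fileext = ".csv.bz2")

# write.csv() opens the bzfile() connection, writes bzip2-compressed CSV, closes it
write.csv(toy, bzfile(path), row.names = FALSE)

# read.csv() opens the path through file(), which detects bzip2 when reading
full <- read.csv(path)

# Optional: read only the needed columns. NA lets R guess a column's type,
# while "NULL" tells read.csv() to skip that column entirely.
header  <- names(read.csv(path, nrows = 1))
keep    <- c("EVTYPE", "FATALITIES", "INJURIES")
classes <- ifelse(header %in% keep, NA, "NULL")
slim    <- read.csv(path, colClasses = classes)
str(slim)
```

On the real file the same `colClasses` idea would apply to "StormData.csv.bz2", with `keep` extended to PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP for the economic analysis.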
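## set the working directory (adjust this path to your own machine)
setwd("~/Desktop/Coursera/Reproducible Research")
## download the data file into the working directory, unless it is already there
if (!file.exists("StormData.csv.bz2")) {
  download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData.csv.bz2", mode = "wb")
}
## read in the data; read.csv() decompresses the .bz2 file on the fly
## (on R >= 4.0.0, stringsAsFactors = TRUE is needed to reproduce the factor summaries below)
data <- read.csv("StormData.csv.bz2", stringsAsFactors = TRUE)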
summary(data)
## STATE__ BGN_DATE BGN_TIME
## Min. : 1.0 5/25/2011 0:00:00: 1202 12:00:00 AM: 10163
## 1st Qu.:19.0 4/27/2011 0:00:00: 1193 06:00:00 PM: 7350
## Median :30.0 6/9/2011 0:00:00 : 1030 04:00:00 PM: 7261
## Mean :31.2 5/30/2004 0:00:00: 1016 05:00:00 PM: 6891
## 3rd Qu.:45.0 4/4/2011 0:00:00 : 1009 12:00:00 PM: 6703
## Max. :95.0 4/2/2006 0:00:00 : 981 03:00:00 PM: 6700
## (Other) :895866 (Other) :857229
## TIME_ZONE COUNTY COUNTYNAME STATE
## CST :547493 Min. : 0 JEFFERSON : 7840 TX : 83728
## EST :245558 1st Qu.: 31 WASHINGTON: 7603 KS : 53440
## MST : 68390 Median : 75 JACKSON : 6660 OK : 46802
## PST : 28302 Mean :101 FRANKLIN : 6256 MO : 35648
## AST : 6360 3rd Qu.:131 LINCOLN : 5937 IA : 31069
## HST : 2563 Max. :873 MADISON : 5632 NE : 30271
## (Other): 3631 (Other) :862369 (Other):621339
## EVTYPE BGN_RANGE BGN_AZI
## HAIL :288661 Min. : 0 :547332
## TSTM WIND :219940 1st Qu.: 0 N : 86752
## THUNDERSTORM WIND: 82563 Median : 0 W : 38446
## TORNADO : 60652 Mean : 1 S : 37558
## FLASH FLOOD : 54277 3rd Qu.: 1 E : 33178
## FLOOD : 25326 Max. :3749 NW : 24041
## (Other) :170878 (Other):134990
## BGN_LOCATI END_DATE END_TIME
## :287743 :243411 :238978
## COUNTYWIDE : 19680 4/27/2011 0:00:00: 1214 06:00:00 PM: 9802
## Countywide : 993 5/25/2011 0:00:00: 1196 05:00:00 PM: 8314
## SPRINGFIELD : 843 6/9/2011 0:00:00 : 1021 04:00:00 PM: 8104
## SOUTH PORTION: 810 4/4/2011 0:00:00 : 1007 12:00:00 PM: 7483
## NORTH PORTION: 784 5/30/2004 0:00:00: 998 11:59:00 PM: 7184
## (Other) :591444 (Other) :653450 (Other) :622432
## COUNTY_END COUNTYENDN END_RANGE END_AZI
## Min. :0 Mode:logical Min. : 0 :724837
## 1st Qu.:0 NA's:902297 1st Qu.: 0 N : 28082
## Median :0 Median : 0 S : 22510
## Mean :0 Mean : 1 W : 20119
## 3rd Qu.:0 3rd Qu.: 0 E : 20047
## Max. :0 Max. :925 NE : 14606
## (Other): 72096
## END_LOCATI LENGTH WIDTH F
## :499225 Min. : 0.0 Min. : 0 Min. :0
## COUNTYWIDE : 19731 1st Qu.: 0.0 1st Qu.: 0 1st Qu.:0
## SOUTH PORTION : 833 Median : 0.0 Median : 0 Median :1
## NORTH PORTION : 780 Mean : 0.2 Mean : 8 Mean :1
## CENTRAL PORTION: 617 3rd Qu.: 0.0 3rd Qu.: 0 3rd Qu.:1
## SPRINGFIELD : 575 Max. :2315.0 Max. :4400 Max. :5
## (Other) :380536 NA's :843563
## MAG FATALITIES INJURIES PROPDMG
## Min. : 0 Min. : 0 Min. : 0.0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0
## Median : 50 Median : 0 Median : 0.0 Median : 0
## Mean : 47 Mean : 0 Mean : 0.2 Mean : 12
## 3rd Qu.: 75 3rd Qu.: 0 3rd Qu.: 0.0 3rd Qu.: 0
## Max. :22000 Max. :583 Max. :1700.0 Max. :5000
##
## PROPDMGEXP CROPDMG CROPDMGEXP WFO
## :465934 Min. : 0.0 :618413 :142069
## K :424665 1st Qu.: 0.0 K :281832 OUN : 17393
## M : 11330 Median : 0.0 M : 1994 JAN : 13889
## 0 : 216 Mean : 1.5 k : 21 LWX : 13174
## B : 40 3rd Qu.: 0.0 0 : 19 PHI : 12551
## 5 : 28 Max. :990.0 B : 9 TSA : 12483
## (Other): 84 (Other): 9 (Other):690738
## STATEOFFIC
## :248769
## TEXAS, North : 12193
## ARKANSAS, Central and North Central: 11738
## IOWA, Central : 11345
## KANSAS, Southwest : 11212
## GEORGIA, North and Central : 11120
## (Other) :595920
## ZONENAMES
## :594029
## :205988
## GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M : 639
## GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA : 592
## JEFFERSON - JEFFERSON : 303
## MADISON - MADISON : 302
## (Other) :100444
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## Min. : 0 Min. :-14451 Min. : 0 Min. :-14455
## 1st Qu.:2802 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0
## Median :3540 Median : 8707 Median : 0 Median : 0
## Mean :2875 Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.:4019 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. :9706 Max. : 17124 Max. :9706 Max. :106220
## NA's :47 NA's :40
## REMARKS REFNUM
## :287433 Min. : 1
## : 24013 1st Qu.:225575
## Trees down.\n : 1110 Median :451149
## Several trees were blown down.\n : 568 Mean :451149
## Trees were downed.\n : 446 3rd Qu.:676723
## Large trees and power lines were blown down.\n: 432 Max. :902297
## (Other) :588295
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
## Storm Data records storms and other significant weather phenomena of sufficient intensity to cause loss of life and injuries, so this analysis focuses on fatalities and injuries.
## Sum the fatalities and injuries for each event type.
library(plyr)
data1 <- ddply(data, .(EVTYPE), summarize, FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))
## drop event types with neither fatalities nor injuries
data2 <- data1[data1$FATALITIES != 0 | data1$INJURIES != 0, ]
## add fatalities and injuries to create a new variable, fandI
data2$fandI <- data2$FATALITIES + data2$INJURIES
## sort the event types by fandI in decreasing order
data3 <- data2[order(-data2$fandI), ]
## keep the event types with at least 1000 fatalities and injuries combined
data4 <- data3[data3$fandI >= 1000, ]
library(ggplot2)
## reorder() sorts the bars by fandI instead of alphabetically
ggplot(data = data4, aes(x = reorder(EVTYPE, -fandI), y = fandI, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10)) +
  xlab("Event Type") + ylab("Fatalities and Injuries")
cat("the most harmful events with respect to the population health are:", as.character(data4$EVTYPE), sep = " ")
## the most harmful events with respect to the population health are: TORNADO EXCESSIVE HEAT TSTM WIND FLOOD LIGHTNING HEAT FLASH FLOOD ICE STORM THUNDERSTORM WIND WINTER STORM HIGH WIND HAIL HURRICANE/TYPHOON HEAVY SNOW
cat("among these events, the most harmful event type is", as.character(data4$EVTYPE[1]))
## among these events, the most harmful event type is TORNADO
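The plyr steps above (per-type sums, dropping zero rows, sorting, and a cutoff) can also be sketched in base R with aggregate(), which may be convenient since plyr is no longer actively developed. The sketch below runs on a few made-up rows; the names `toy`, `health` and `top` and the cutoff of 5 are illustrative assumptions, not part of the original analysis.

```r
# Made-up rows standing in for Storm Data (illustrative only, not real counts)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FOG", "HEAT"),
  FATALITIES = c(3, 0, 1, 0, 2),
  INJURIES   = c(40, 2, 10, 0, 5)
)

# One row per event type with summed FATALITIES and INJURIES (like ddply)
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
health$fandI <- health$FATALITIES + health$INJURIES

# Counts are non-negative, so fandI > 0 matches "fatalities or injuries != 0"
health <- health[health$fandI > 0, ]
health <- health[order(-health$fandI), ]

# A cutoff of 5 suits the toy data; the analysis above uses 1000
top <- health[health$fandI >= 5, ]
print(top)
```

With these toy rows, FOG is dropped for having no casualties, and TORNADO (54) and HEAT (7) pass the cutoff while HAIL (2) does not.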
## Storm Data also records events of sufficient intensity to cause significant property damage and/or disruption to commerce, so this analysis focuses on economic consequences: property damage and crop damage.
## Note: PROPDMG and CROPDMG are summed as recorded. Their magnitudes are given by PROPDMGEXP and CROPDMGEXP (K = thousands, M = millions, B = billions), and those multipliers are not applied here, so the totals below mix thousands, millions and billions of dollars.
library(plyr)
data1 <- ddply(data, .(EVTYPE), summarize, PROPDMG = sum(PROPDMG), CROPDMG = sum(CROPDMG))
## drop event types with neither property damage nor crop damage
data2 <- data1[data1$PROPDMG != 0 | data1$CROPDMG != 0, ]
## add property damage and crop damage to create a new variable, pcDMG
data2$pcDMG <- data2$PROPDMG + data2$CROPDMG
## sort the event types by pcDMG in decreasing order
data3 <- data2[order(-data2$pcDMG), ]
## keep the event types whose combined damage value is at least 100000
data4 <- data3[data3$pcDMG >= 100000, ]
## reorder() sorts the bars by pcDMG instead of alphabetically
ggplot(data = data4, aes(x = reorder(EVTYPE, -pcDMG), y = pcDMG, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10)) +
  xlab("Event Type") + ylab("Property and Crop Damage")
cat("the types of events that have the greatest economic consequences are:", as.character(data4$EVTYPE), sep = " ")
## the types of events that have the greatest economic consequences are: TORNADO FLASH FLOOD TSTM WIND HAIL FLOOD THUNDERSTORM WIND LIGHTNING THUNDERSTORM WINDS HIGH WIND WINTER STORM HEAVY SNOW
cat("among these events, the type with the greatest economic consequences is", as.character(data4$EVTYPE[1]))
## among these events, the type with the greatest economic consequences is TORNADO
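The damage totals above add PROPDMG and CROPDMG as recorded, but the summary output shows that PROPDMGEXP and CROPDMGEXP carry magnitude codes (K, M, B, lower-case variants, digits and blanks). The sketch below shows one way to convert each value to dollars before aggregating. The helper `exp_multiplier()` and the toy rows are hypothetical; treating K, M and B as thousands, millions and billions follows the NWS Storm Data instructions, while mapping H to hundreds and every other code to 1 is an assumption.

```r
# Multipliers for the damage magnitude codes. K, M and B (thousands, millions,
# billions) follow the NWS Storm Data instructions; mapping H to hundreds and
# every other code (blank, digits, "+", "-", "?") to 1 is an assumption.
exp_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows standing in for Storm Data (illustrative only, not real damage)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HAIL", "TORNADO"),
  PROPDMG    = c(1.5, 250, 10, 5),
  PROPDMGEXP = c("B", "K", "k", "M"),
  CROPDMG    = c(20, 0, 3, 0),
  CROPDMGEXP = c("M", "", "K", "")
)

# Damage in dollars: each value times the multiplier of its own magnitude code
toy$dollars <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

totals <- aggregate(dollars ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$dollars), ]
print(totals)
```

With these toy rows, the raw PROPDMG + CROPDMG sums would rank TORNADO (255) above FLOOD (21.5), while the converted totals put FLOOD (1.52 billion dollars) first. On the real data, the same columns would be converted before the ddply() step.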
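According to the analysis and as shown in the figures above, the events most harmful to population health are tornado, excessive heat, TSTM wind, flood and lightning, and the events with the greatest economic consequences are tornado, flash flood, TSTM wind, hail, flood, thunderstorm wind, lightning, thunderstorm winds and high wind. Two caveats apply: the damage ranking uses the raw PROPDMG and CROPDMG values without their PROPDMGEXP and CROPDMGEXP multipliers, and TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS appear to be spellings of the same event type that are counted separately.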
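Because TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS are counted as separate event types, their totals are split across several bars. A minimal sketch of merging such labels before aggregation is shown below; the function `normalize_evtype()` and its rules are hypothetical and cover only these spellings, not an official NOAA mapping of event types.

```r
# Hypothetical clean-up rules, not an official NOAA mapping: upper-case and
# trim the label, expand a leading "TSTM " and singularise a trailing "WINDS"
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM ", "THUNDERSTORM ", x)
  sub("WINDS$", "WIND", x)
}

labels <- c("TSTM WIND", "THUNDERSTORM WINDS", "Thunderstorm Wind ", "HAIL")
normalize_evtype(labels)
```

The call returns three copies of "THUNDERSTORM WIND" followed by "HAIL". Applied to the real data (for example `data$EVTYPE <- normalize_evtype(data$EVTYPE)` before the ddply() steps), it would combine these spellings into one total; other variants in the 985 event types would need further rules.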