=======================================================================
This report presents an analysis about the impact of weather events on the population health and economy of US for the time period 1950 to 2011. We will address the damage on economy by reporting the economic damage calculated as property damage and crop damage measured in US dollars, and address the damage on public health by sum the number of injures and fatalites in the events. In the end, we illustrate the impact of different types of events on these two aspects with two barchart containing two measurements for each aspect. We found that the tornado is most harmful for population health, and the flash flood resulting in greatest loss on economic value.
The data comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.
Storm Data [47Mb]
#download packages if needed
list.of.packages <- c("plyr","grid","ggplot2","utils","R.utils","data.table","reshape2")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
#load packages and functions
library("plyr")
library("grid")
library("ggplot2")
library("utils")
library("R.utils")
library("data.table")
library("reshape2")
# -- set a seed
set.seed(11)
# Download and unzip input file if it does not exist in current directory
fl <- "repdata-data-StormData.csv"
fl.zip <- paste0(fl, ".bz2")
url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (file.exists(fl) == FALSE) {
if (file.exists(fl.zip) == FALSE) {
download.file(url, destfile = fl.zip, method = "curl",
quiet = TRUE)
}
bunzip2(fl.zip)
}
StormData <- read.csv("repdata-data-StormData.csv",
stringsAsFactors = FALSE)
Initial exploration shows that this is a fairly large dataset, which includes some attributes that are not useful to our specific analysis.
dim(StormData)
## [1] 902297 37
names(StormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
| Variabl Name | Description |
|---|---|
| state | Abbervation for states in US |
| evtype | Event type |
| fatalites | The number of people died |
| injures | The number of people injuured |
| propdmg | The amount of property damge (measured in money) |
| propdmgexp | The unit of damge (B,M,K,H) |
| cropdmg | The amount of corp damge (measured in money) |
| cropdmgexp | The unit of damge (B,M,K,H) |
# attributes
attributes <- c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP","CROPDMG", "CROPDMGEXP")
pData <- StormData[, attributes]
colnames(pData) <- c("state", "evtype", "fatalites", "injures", "propdmg", "propdmgexp","cropdmg", "cropdmgexp")
names(pData)
## [1] "state" "evtype" "fatalites" "injures" "propdmg"
## [6] "propdmgexp" "cropdmg" "cropdmgexp"
- Create a column as the unified name of that eventype
- The hierachy looks like as follow:
- Convection
- lightning
- tornado
- thunderstorm Wind
- hail
- ExtremeTemp
- cold
- heat
- Marine
- costal Storm
- sunami
- runami
- rip Current
- Flood
- flash Flood
- river Flood
- Tropical Cyclones
- tropical Storm
- hurricane
- Other
- drought
- dust Storm
- dust Devil
- rain
| Event | Filter expression |
|---|---|
| Lightning | LIGHTNING |
| Tornado | NADO,FUNNEL,WATERSPOUT“| Thunderstorm wind | THUNDER,STORM,WIND | Hail | HAIL | Cold | BLIZZARD|WINTER|COLD|LOW TEMP|RECORD LOW|SNOW|ICE | Heat | HEAT|WARM|RECORD HIGH | Costal Storm | COSTAL STORM | Sunami | SUNAMI | Rip current |RIP CURRENT Flash flood | FLASH FLOOD,FLD,FLOOD | River flood | RIVER FLOOD , URBAN FLOOD | Tropical Storm |TROPICAL STORM,TROPICAL| Hurricane | HURRICANE | Drought | DROUGHT | Dust storm | DUST STORM | Dust devil | DUST DEVIL | Rain | RAIN | |
# Function : Give each eventype an unified event description
deliverEvtUnifiedName <- function(dd) {
evtMatcher <- data.frame(reg = c("NADO|FUNNEL|WATERSPOUT", "THUNDER|STORM|WIND",
"HAIL", "FROST|FREEZ|BLIZZARD|WINTER|COLD|LOW TEMP|RECORD LOW|SNOW|ICE",
"HEAT|WARM|RECORD HIGH", "COSTAL STORM", "SUNAMI", "RIP CURRENT", "FLASH FLOOD|FLD|FLOOD",
"RIVER FLOOD|URBAN FLOOD", "TROPICAL STORM|TROPICAL", "HURRICANE", "DROUGHT",
"DUST STORM", "DUST DEVIL", "RAIN", "LIGHTNING"))
factorId <- c("Tornado", "Thunderstorm wind", "Hail", "Cold", "Heat", "Costal Storm",
"Sunami", "Rip current", "Flash flood", "River flood", "Tropical Storm",
"Hurricane", "Drought", "Dust storm", "Dust devil", "Rain", "Ligntning")
for (i in 1:nrow(evtMatcher)) {
indexFit <- grep(evtMatcher[i, "reg"], toupper(dd[, "evtype"]))
if (length(indexFit) > 0) {
dd[indexFit, "event"] <- factorId[i]
}
}
return(dd)
}
pData$event <- ("-")
# Give each event an unified id factor (17+1 factors)
pData <- deliverEvtUnifiedName(pData)
otherIndex <- grep("-", pData[, "event"])
pData[otherIndex, "event"] <- "Other"
pData$event <- as.factor(pData$event)
# See the event and number of that
table(pData$event)
##
## Cold Drought Dust devil Dust storm
## 46131 2512 150 429
## Flash flood Hail Heat Hurricane
## 85530 290400 2969 288
## Ligntning Other Rain Rip current
## 15776 9558 12238 777
## River flood Sunami Thunderstorm wind Tornado
## 569 20 362662 71531
## Tropical Storm
## 757
Fomula II : cropdmgValue = cropdmg * cropdmgexp
propdmgexp | unit
----------------| ----------------
B | 1,000,000,000
M | 1,000,000
K | 1,000
H | 100
NA or BLANK | 1# Function : Caculate each entry the property and crop damge value by 'dmg' and 'exp' variables
deliverActualDmgValue <- function(dd) {
unit <- data.frame(cha = c("B", "M", "K", "H"), val = c(1e+09, 1e+06, 1000,
100))
multi <- c(1e+09, 1e+06, 1000, 100)
for (i in 1:nrow(unit)) {
# index that match the unit
indexCrodFit <- grep(unit[i, "cha"], toupper(dd[, "cropdmgexp"]))
if (length(indexCrodFit) > 0) {
# Caculate the actual value
dd[indexCrodFit, "cropdmgvalue"] <- multi[i] * dd[indexCrodFit,
"cropdmg"]
}
# Same procudure for property damage
indexProdFit <- grep(unit[i, "cha"], toupper(dd[, "propdmgexp"]))
if (length(indexProdFit) > 0) {
dd[indexProdFit, "propdmgvalue"] <- multi[i] * dd[indexProdFit,
"propdmg"]
}
}
return(dd)
}
# Default value of the damage equals to variable 'dmg'
pData$cropdmgvalue <- pData$cropdmg
pData$propdmgvalue <- pData$propdmg
pData <- deliverActualDmgValue(pData)
summary(pData$cropdmgvalue)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
summary(pData$propdmgvalue)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 4.736e+05 5.000e+02 1.150e+11
ecodmgvalue = propdmagevalue + corpdmgvaluepephealthdmg = propdmagevalue + corpdmgvaluepData$ecodmgvalue <- pData$cropdmgvalue + pData$propdmgvalue
pData$pephealthdmg <- pData$injures + pData$fatalites
summary(pData$ecodmgvalue)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 5.28e+05 1.00e+03 1.15e+11
Caculate the Sum of propdmgvalue, cropdmgvalue,injures,fatalites for each event
library(reshape2)
t <- aggregate(cbind(propdmgvalue, cropdmgvalue, injures, fatalites) ~ event,pData, FUN = sum)
tidy <- melt(t, id.var = "event", variable.name = "variable")
colnames(tidy) <- c("event", "damagetype", "value")
print(tidy)
## event damagetype value
## 1 Cold propdmgvalue 12683339763
## 2 Drought propdmgvalue 1046306000
## 3 Dust devil propdmgvalue 719130
## 4 Dust storm propdmgvalue 5599000
## 5 Flash flood propdmgvalue 162312938980
## 6 Hail propdmgvalue 17619991567
## 7 Heat propdmgvalue 20125750
## 8 Hurricane propdmgvalue 84756180010
## 9 Ligntning propdmgvalue 939007947
## 10 Other propdmgvalue 9577576400
## 11 Rain propdmgvalue 3265197192
## 12 Rip current propdmgvalue 163000
## 13 River flood propdmgvalue 5259639600
## 14 Sunami propdmgvalue 144062000
## 15 Thunderstorm wind propdmgvalue 64968720255
## 16 Tornado propdmgvalue 57002958829
## 17 Tropical Storm propdmgvalue 7716127550
## 18 Cold cropdmgvalue 8730107950
## 19 Drought cropdmgvalue 13972621780
## 20 Dust devil cropdmgvalue 0
## 21 Dust storm cropdmgvalue 3600000
## 22 Flash flood cropdmgvalue 7216670200
## 23 Hail cropdmgvalue 3114212873
## 24 Heat cropdmgvalue 904423500
## 25 Hurricane cropdmgvalue 5515292800
## 26 Ligntning cropdmgvalue 12097090
## 27 Other cropdmgvalue 572173030
## 28 Rain cropdmgvalue 919315800
## 29 Rip current cropdmgvalue 0
## 30 River flood cropdmgvalue 5058774000
## 31 Sunami cropdmgvalue 20000
## 32 Thunderstorm wind cropdmgvalue 1975024138
## 33 Tornado cropdmgvalue 414963020
## 34 Tropical Storm cropdmgvalue 694896000
## 35 Cold injures 6350
## 36 Drought injures 19
## 37 Dust devil injures 43
## 38 Dust storm injures 440
## 39 Flash flood injures 8680
## 40 Hail injures 1467
## 41 Heat injures 9228
## 42 Hurricane injures 1328
## 43 Ligntning injures 5232
## 44 Other injures 3565
## 45 Rain injures 305
## 46 Rip current injures 529
## 47 River flood injures 3
## 48 Sunami injures 129
## 49 Thunderstorm wind injures 11388
## 50 Tornado injures 91439
## 51 Tropical Storm injures 383
## 52 Cold fatalites 1088
## 53 Drought fatalites 6
## 54 Dust devil fatalites 2
## 55 Dust storm fatalites 22
## 56 Flash flood fatalites 1547
## 57 Hail fatalites 45
## 58 Heat fatalites 3172
## 59 Hurricane fatalites 135
## 60 Ligntning fatalites 817
## 61 Other fatalites 657
## 62 Rain fatalites 114
## 63 Rip current fatalites 577
## 64 River flood fatalites 5
## 65 Sunami fatalites 33
## 66 Thunderstorm wind fatalites 1220
## 67 Tornado fatalites 5639
## 68 Tropical Storm fatalites 66
library("ggplot2")
damgeOnHealth <- tidy[grep("fatalites|injures", tidy[, "damagetype"]), ]
ggplot(damgeOnHealth, aes(x = reorder(event, value), y = value, fill = factor(damagetype,
labels = c("Injures", "Fatalites")))) +
geom_bar(stat = "identity") +
labs(title = "Top harmful weather event for populaiton health",
x = "Event", y = "Damage on populaiton health (number of people)") + scale_fill_manual(values = c("#F7B388",
"#C70000")) +
guides(fill = guide_legend(title = "Type of damage")) +
xlab("Weather Event") +
theme(axis.text = element_text(size = 12, colour = "#1F4178"), axis.title = element_text(size = 14,
colour = "#3A3E42", face = "bold"),
title = element_text(size = 16,
colour = "#282B2E", face = "bold")) + coord_flip()
damgeOnEconomic <- tidy[grep("cropdmgvalue|propdmgvalue", tidy[, "damagetype"]),]
ggplot(damgeOnEconomic, aes(x = reorder(event, value),
y = value, fill = factor(damagetype,
labels = c("Property", "Crop")))) + geom_bar(stat = "identity") + labs(title = "Top harmful weather event for economy ",
x = "Event", y = "Damage on economic value (US dollors)") + scale_fill_manual(values = c("#7D5E35","#407D47")) +
guides(fill = guide_legend(title = "Type of damage")) +
xlab("Weather Event") + theme(axis.text = element_text(size = 12, colour = "#1F4178"), axis.title = element_text(size = 14,
colour = "#3A3E42", face = "bold"),
title = element_text(size = 16,
colour = "#282B2E", face = "bold")) +
coord_flip()
From the results, we can see : - 1. The most harmful weather event for population health is tornado
This analysis was completed as an assignment in Reproducible Research,part of the Data Science Specialization offered by Johns Hopkins University via the Coursera platform.
The NCDC’s Storm Center FAQ and the National Weather Service’s documentation explain the definitions of each variable and how the data was compiled.
Additional data can be obtained from the Storm Events database.
---------------------------------