Storm Data documents the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce.
From the human health point of view, tornade was the most harmful phenomenon. It occupied 37% of fatality and 65% of injury.
As to economic losses, property damage was caused by flood. It took 34% of all cause of property damage and 32% of all economic loss.
On the contrary, the crop damage was mainly (28%) caused by a drought. But this was only 3% of economic loss caused by a natural disaster.
We should take a heavy warning to Wind and Water!
The data is downloded from the course web site: “https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2” and decompressed as
file<-"repdata-data-StormData.csv"
and readed as a data.frame named sd.
library(data.table)
sd<-fread(file,verbose=TRUE)
## Input contains no \n. Taking this to be a filename to open
## File opened, filesize is 0.523066 GB.
## Memory mapping ... ok
## Detected eol as \r\n (CRLF) in that order, the Windows standard.
## Positioned on line 1 after skip or autostart
## This line is the autostart and not blank so searching up for the last non-blank ... line 1
## Detecting sep ... ','
## Detected 37 columns. Longest stretch was from line 1 to line 30
## Starting data input on line 1 (either column names or first row of data). First 10 characters: "STATE__",
## All the fields on line 1 are character fields. Treating as the column names.
## Count of eol: 1307675 (including 1 at the end)
## Count of sep: 34819802
## nrow = MIN( nsep [34819802] / ncol [37] -1, neol [1307675] - nblank [1] ) = 967216
## Type codes ( first 5 rows): 3444344430000303003343333430000333303
## Type codes (+ middle 5 rows): 3444344434444303443343333434440333343
## Type codes (+ last 5 rows): 3444344434444303443343333434444333343
## Type codes: 3444344434444303443343333434444333343 (after applying colClasses and integer64)
## Type codes: 3444344434444303443343333434444333343 (after applying drop or select (if supplied)
## Allocating 37 column slots (37 - 0 dropped)
##
Read 30.0% of 967216 rows
Read 51.7% of 967216 rows
Read 69.3% of 967216 rows
Read 81.7% of 967216 rows
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:06
## Warning in fread(file, verbose = TRUE): Read less rows (902297) than were
## allocated (967216). Run again with verbose=TRUE and please report.
## 0.000s ( 0%) Memory map (rerun may be quicker)
## 0.000s ( 0%) sep and header detection
## 0.544s ( 9%) Count rows (wc -l)
## 0.001s ( 0%) Column type detection (first, middle and last 5 rows)
## 0.487s ( 8%) Allocation of 902297x37 result (xMB) in RAM
## 4.914s ( 82%) Reading data
## 0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
## 0.000s ( 0%) Coercing data already read in type bumps (if any)
## 0.016s ( 0%) Changing na.strings to NA
## 5.962s Total
From the calculation as
## Count of eol: 1307675 (including 1 at the end)
## Count of sep: 34819802
## nrow = MIN( nsep [34819802] / ncol [37] -1, neol [1307675] - nblank [1] ) = 967216
fread insists some troubles with number of rows.
However, if you see the contents of REMARKS, you see many sep’s and eol’s. So you can use fread as fast and safe function to read StormData.
str(sd)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : chr "3" "2" "2" "2" ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
Subset should include
EVTYPE; event type
FATALITIES; number of people died
INJURIES; number of people injuured
PROPDMG; amount of property damage
PROPDMGEXP; unit of damage (B,M,K,H,…)
CROPDMG; amount of corp damage
CROPDMGEXP - unit of damage (B,M,K,…)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## d;%d8c.c*cc8c'c/cc/ 'package:data.table' cc cc9c/ccc&cc>c:
##
## between, last
##
## d;%d8c.c*cc8c'c/cc/ 'package:stats' cc cc9c/ccc&cc>c:
##
## filter, lag
##
## d;%d8c.c*cc8c'c/cc/ 'package:base' cc cc9c/ccc&cc>c:
##
## intersect, setdiff, setequal, union
library(plyr)
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
##
## d;%d8c.c*cc8c'c/cc/ 'package:dplyr' cc cc9c/ccc&cc>c:
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
#####
#Select columns with ...EXP columns
sd.sub<-select(sd,c(EVTYPE,FATALITIES,INJURIES,PROPDMG:CROPDMGEXP))
#PROPDMEXP contains unit of PROP damage
unitpc<-names(table(sd.sub$PROPDMGEXP))
#"" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K" "m" "M"
unitpn<-c(0,0,0,0,0,1,1,1,1,1,1,1,1,1e9,1e2,1e2,1e3,1e6,1e6)
#
tmp<-as.numeric(mapvalues(sd.sub$PROPDMGEXP,unitpc,unitpn))
sd.sub$PROPDMG<-sd.sub$PROPDMG*tmp
#####
#In the case of CROPDMG
unitcc<-names(table(sd.sub$CROPDMGEXP))
#"" "?" "0" "2" "B" "k" "K" "m" "M"
unitcn<-c(0,0,0,1,1e9,1e3,1e3,1e6,1e6)
tmp<-as.numeric(mapvalues(sd.sub$CROPDMGEXP,unitcc,unitcn))
sd.sub$CROPDMG<-sd.sub$CROPDMG*tmp
## There are no use of ...EXP culomn.
sd.sub<-select(sd.sub,EVTYPE,FATALITIES,INJURIES,PROPDMG,CROPDMG)
by.event<-sd.sub %>% group_by(EVTYPE)
#Summarize fatal and injured number by events, Harm is a sum of the both.
health.sum<-by.event %>% summarise_each(funs(sum),FATALITIES,INJURIES) %>% mutate(Harm=FATALITIES+INJURIES)
#Each total number of fatal, injury and harm.
health.total<-as.data.frame(summarise_each(health.sum,funs(sum),FATALITIES,INJURIES,Harm))
#Ordered by number of the top 5 fatal events
fatal.ord<-as.data.frame(arrange(health.sum,desc(FATALITIES)))[1:5,]
fatalsubsum<-colSums(fatal.ord[,2:4])
fatal.ord5<-rbind(fatal.ord,cbind(EVTYPE="Others",(health.total-fatalsubsum)))
#Ordered by number of the top 5 injury events
injur.ord<-as.data.frame(arrange(health.sum,desc(INJURIES)))[1:5,]
injursubsum<-colSums(injur.ord[,2:4])
injur.ord5<-rbind(injur.ord,cbind(EVTYPE="Others",(health.total-injursubsum)))
#Ordered by number of the top 5 harm events
harm.ord<-as.data.frame(arrange(health.sum,desc(Harm)))[1:5,]
harmsubsum<-colSums(harm.ord[,2:4])
harm.ord5<-rbind(harm.ord,cbind(EVTYPE="Others",(health.total-harmsubsum)))
Top five events ordered by fatal, injured or harm category.
fatal.ord5
## EVTYPE FATALITIES INJURIES Harm
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 FLASH FLOOD 978 1777 2755
## 4 HEAT 937 2100 3037
## 5 LIGHTNING 816 5230 6046
## 6 Others 4878 33550 38428
#per cent
p<-mutate(fatal.ord5,fatalpercent=round(fatal.ord5[,2]/as.numeric(health.total[1])*100,2))
p<-mutate(p,injurpercent=round(fatal.ord5[,3]/as.numeric(health.total[2])*100,2))
p<-mutate(p,harmpercent=round(fatal.ord5[,4]/as.numeric(health.total[3])*100,2))
p[,c(1,5:7)]
## EVTYPE fatalpercent injurpercent harmpercent
## 1 TORNADO 37.19 65.00 62.30
## 2 EXCESSIVE HEAT 12.57 4.64 5.41
## 3 FLASH FLOOD 6.46 1.26 1.77
## 4 HEAT 6.19 1.49 1.95
## 5 LIGHTNING 5.39 3.72 3.88
## 6 Others 32.21 23.87 24.69
injur.ord5
## EVTYPE FATALITIES INJURIES Harm
## 1 TORNADO 5633 91346 96979
## 2 TSTM WIND 504 6957 7461
## 3 FLOOD 470 6789 7259
## 4 EXCESSIVE HEAT 1903 6525 8428
## 5 LIGHTNING 816 5230 6046
## 6 Others 5819 23681 29500
p<-mutate(injur.ord5,fatalpercent=round(injur.ord5[,2]/as.numeric(health.total[1])*100,2))
p<-mutate(p,injurpercent=round(injur.ord5[,3]/as.numeric(health.total[2])*100,2))
p<-mutate(p,harmpercent=round(injur.ord5[,4]/as.numeric(health.total[3])*100,2))
p[,c(1,5:7)]
## EVTYPE fatalpercent injurpercent harmpercent
## 1 TORNADO 37.19 65.00 62.30
## 2 TSTM WIND 3.33 4.95 4.79
## 3 FLOOD 3.10 4.83 4.66
## 4 EXCESSIVE HEAT 12.57 4.64 5.41
## 5 LIGHTNING 5.39 3.72 3.88
## 6 Others 38.42 16.85 18.95
harm.ord5
## EVTYPE FATALITIES INJURIES Harm
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 Others 5819 23681 29500
p<-mutate(harm.ord5,fatalpercent=round(harm.ord5[,2]/as.numeric(health.total[1])*100,2))
p<-mutate(p,injurpercent=round(harm.ord5[,3]/as.numeric(health.total[2])*100,2))
p<-mutate(p,harmpercent=round(harm.ord5[,4]/as.numeric(health.total[3])*100,2))
p[,c(1,5:7)]
## EVTYPE fatalpercent injurpercent harmpercent
## 1 TORNADO 37.19 65.00 62.30
## 2 EXCESSIVE HEAT 12.57 4.64 5.41
## 3 TSTM WIND 3.33 4.95 4.79
## 4 FLOOD 3.10 4.83 4.66
## 5 LIGHTNING 5.39 3.72 3.88
## 6 Others 38.42 16.85 18.95
#In the case of property, crop and economic
economic.sum<-by.event %>% summarise_each(funs(sum),PROPDMG,CROPDMG) %>% mutate(Economic=PROPDMG+CROPDMG)
#
economic.total<-as.data.frame(summarise_each(economic.sum,funs(sum),PROPDMG,CROPDMG,Economic))
##
prop.ord<-as.data.frame(arrange(economic.sum,desc(PROPDMG)))[1:5,]
propsubsum<-colSums(prop.ord[,2:4])
prop.ord5<-rbind(prop.ord,cbind(EVTYPE="Others",(economic.total-propsubsum)))
#
crop.ord<-as.data.frame(arrange(economic.sum,desc(CROPDMG)))[1:5,]
cropsubsum<-colSums(crop.ord[,2:4])
crop.ord5<-rbind(crop.ord,cbind(EVTYPE="Others",(economic.total-cropsubsum)))
#
economic.ord<-as.data.frame(arrange(economic.sum,desc(Economic)))[1:5,]
economicsubsum<-colSums(economic.ord[,2:4])
economic.ord5<-rbind(economic.ord,cbind(EVTYPE="Others",(economic.total-economicsubsum)))
Top five events ordered by propaty, crop or economic category.
prop.ord5
## EVTYPE PROPDMG CROPDMG Economic
## 1 FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937160582 414953110 57352113692
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 FLASH FLOOD 16140811599 1421317100 17562128699
## 6 Others 96953587223 38998075450 135951662673
#per cent
p<-mutate(prop.ord5,proppercent=round(prop.ord5[,2]/as.numeric(economic.total[1])*100,2))
p<-mutate(p,croppercent=round(prop.ord5[,3]/as.numeric(economic.total[2])*100,2))
p<-mutate(p,econopercent=round(prop.ord5[,4]/as.numeric(economic.total[3])*100,2))
p[,c(1,5:7)]
## EVTYPE proppercent croppercent econopercent
## 1 FLOOD 33.85 11.53 31.55
## 2 HURRICANE/TYPHOON 16.22 5.31 15.09
## 3 TORNADO 13.32 0.85 12.04
## 4 STORM SURGE 10.14 0.00 9.09
## 5 FLASH FLOOD 3.78 2.89 3.69
## 6 Others 22.69 79.42 28.54
crop.ord5
## EVTYPE PROPDMG CROPDMG Economic
## 1 DROUGHT 1046106000 13972566000 15018672000
## 2 FLOOD 144657709800 5661968450 150319678250
## 3 RIVER FLOOD 5118945500 5029459000 10148404500
## 4 ICE STORM 3944927810 5022113500 8967041310
## 5 HAIL 15732267250 3025954450 18758221700
## 6 Others 256818688844 16392130510 273210819354
p<-mutate(crop.ord5,proppercent=round(crop.ord5[,2]/as.numeric(economic.total[1])*100,2))
p<-mutate(p,croppercent=round(crop.ord5[,3]/as.numeric(economic.total[2])*100,2))
p<-mutate(p,econopercent=round(crop.ord5[,4]/as.numeric(economic.total[3])*100,2))
p[,c(1,5:7)]
## EVTYPE proppercent croppercent econopercent
## 1 DROUGHT 0.24 28.45 3.15
## 2 FLOOD 33.85 11.53 31.55
## 3 RIVER FLOOD 1.20 10.24 2.13
## 4 ICE STORM 0.92 10.23 1.88
## 5 HAIL 3.68 6.16 3.94
## 6 Others 60.10 33.38 57.35
economic.ord5
## EVTYPE PROPDMG CROPDMG Economic
## 1 FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937160582 414953110 57352113692
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15732267250 3025954450 18758221700
## 6 Others 97362131572 37393438100 134755569672
p<-mutate(economic.ord5,proppercent=round(economic.ord5[,2]/as.numeric(economic.total[1])*100,2))
p<-mutate(p,croppercent=round(economic.ord5[,3]/as.numeric(economic.total[2])*100,2))
p<-mutate(p,econopercent=round(economic.ord5[,4]/as.numeric(economic.total[3])*100,2))
p[,c(1,5:7)]
## EVTYPE proppercent croppercent econopercent
## 1 FLOOD 33.85 11.53 31.55
## 2 HURRICANE/TYPHOON 16.22 5.31 15.09
## 3 TORNADO 13.32 0.85 12.04
## 4 STORM SURGE 10.14 0.00 9.09
## 5 HAIL 3.68 6.16 3.94
## 6 Others 22.78 76.15 28.28
Total number of fatal, injury and harm as the summation of the both.
health.total
## FATALITIES INJURIES Harm
## 1 15145 140528 155673
round(health.total/health.total$Harm*100,2)
## FATALITIES INJURIES Harm
## 1 9.73 90.27 100
These were distributed as followed.
#####
library(ggplot2)
library(gridExtra)
library(grid)
#####
graph.data<-data.frame(fatal.ord5$EVTYPE,fatal.ord5$FATALITIES,injur.ord5$EVTYPE,injur.ord5$INJURIES,harm.ord5$EVTYPE,harm.ord5$Harm)
name.data<-c("Fatal","Ingury","Harm")
graph.list<-lapply(1:3,function(i){
d <- data.frame(event=graph.data[,i*2-1],count=graph.data[,i*2])
g<- ggplot(d,aes(x=6:1,y=count,fill=event))+
geom_bar(stat="identity")+
coord_flip()+
theme(legend.position="right", legend.text = element_text(size=8))+
theme(legend.key.height=unit(0.7,"line"))+
theme(axis.text.y = element_blank())+
scale_fill_discrete(breaks=d$event)+
xlab("Event") +
ggtitle(name.data[i])})
args<-c(graph.list,list(ncol=1,nrow=3))
do.call(grid.arrange,args)
Total number of property, crop and economic as the summation of the both.
economic.total
## PROPDMG CROPDMG Economic
## 1 427318645204 49104191910 476422837114
round(economic.total/economic.total$Economic*100,2)
## PROPDMG CROPDMG Economic
## 1 89.69 10.31 100
These were distributed as followed.
graph.data<-data.frame(prop.ord5$EVTYPE,prop.ord5$PROPDMG,crop.ord5$EVTYPE,crop.ord5$CROPDMG,economic.ord5$EVTYPE,economic.ord5$Economic)
name.data<-c("Property","Crop","Economic")
graph.list<-lapply(1:3,function(i){
d <- data.frame(event=graph.data[,i*2-1],count=graph.data[,i*2])
g<- ggplot(d,aes(x=6:1,y=count,fill=event))+
geom_bar(stat="identity")+
coord_flip()+
theme(legend.position="right", legend.text = element_text(size=8))+
theme(legend.key.height=unit(0.7,"line"))+
theme(axis.text.y = element_blank())+
scale_fill_discrete(breaks=d$event)+
xlab("Event") +
ggtitle(name.data[i])})
args<-c(graph.list,list(ncol=1,nrow=3))
do.call(grid.arrange,args)