Many natural disasters occurred in the USA between 1950 and 2011, the period covered by the storm data used here. They damaged both public health and the economy.
This report asks two questions:
(1) Which type of natural disaster is most harmful to public health in America?
(2) Which type of natural disaster caused the largest economic loss (crop and property damage)?
knitr::opts_chunk$set(cache = TRUE,echo=T)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
My RStudio runs with the system locale set to Mandarin Chinese. Unless the locale is switched to English first,
loading the data into RStudio can fail (for example with `EOF within quoted string`).
Sys.setlocale("LC_ALL", "English") ## important: avoids `EOF within quoted string` when reading the file
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
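If changing the locale for the whole session is not wanted, the switch can be limited to the read itself. The following is only a sketch: `read_with_locale` is a hypothetical helper, not part of the original analysis, and whether `"C"` or an English locale is the right target depends on the platform and on the file.

```r
# Hypothetical helper: change LC_CTYPE only while reading, then restore it.
read_with_locale <- function(path, locale = "C") {
  old <- Sys.getlocale("LC_CTYPE")
  # on.exit() restores the previous locale even if read.csv() fails
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", locale)
  read.csv(path)
}

# Toy file standing in for the storm data (assumption, for illustration only)
toy_csv <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,REMARKS", "TORNADO,\"a quoted, remark\""), toy_csv)

before <- Sys.getlocale("LC_CTYPE")
toy <- read_with_locale(toy_csv)
after <- Sys.getlocale("LC_CTYPE")
```

After the call, `after` equals `before`, so the rest of the session keeps its original locale.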
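## download the compressed file only if it is not already present
if(!file.exists("repdata_data_StormData.csv.bz2")){
  url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url,destfile ="repdata_data_StormData.csv.bz2")
}
## stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0), as the levels() calls below expect
storm<-read.csv(bzfile("repdata_data_StormData.csv.bz2"),stringsAsFactors = TRUE)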
dim(storm)
## [1] 902297 37
head(storm)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
str(storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
names(storm)<-tolower(names(storm))
names(storm)
## [1] "state__" "bgn_date" "bgn_time" "time_zone" "county"
## [6] "countyname" "state" "evtype" "bgn_range" "bgn_azi"
## [11] "bgn_locati" "end_date" "end_time" "county_end" "countyendn"
## [16] "end_range" "end_azi" "end_locati" "length" "width"
## [21] "f" "mag" "fatalities" "injuries" "propdmg"
## [26] "propdmgexp" "cropdmg" "cropdmgexp" "wfo" "stateoffic"
## [31] "zonenames" "latitude" "longitude" "latitude_e" "longitude_"
## [36] "remarks" "refnum"
length(levels(storm$evtype))
## [1] 985
The event type column has 985 distinct levels.
But what do those levels actually look like?
levels(storm$evtype)[760:770]
## [1] "THUNDERSTORM WIND" "THUNDERSTORM WIND (G40)"
## [3] "THUNDERSTORM WIND 50" "THUNDERSTORM WIND 52"
## [5] "THUNDERSTORM WIND 56" "THUNDERSTORM WIND 59"
## [7] "THUNDERSTORM WIND 59 MPH" "THUNDERSTORM WIND 59 MPH."
## [9] "THUNDERSTORM WIND 60 MPH" "THUNDERSTORM WIND 65 MPH"
## [11] "THUNDERSTORM WIND 65MPH"
All of the levels above describe the same kind of event (thunderstorm wind), so the event types contain many duplicate levels that differ only in spelling or added detail.
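To see why these variants can be collapsed, here is a small sketch on a few of the labels printed above. The cleaning steps (dropping parenthesised notes, wind speeds and extra spaces) are my own illustration and are not the rules this report applies to the full data.

```r
# A few raw labels copied from the output above
raw <- c("THUNDERSTORM WIND", "THUNDERSTORM WIND (G40)",
         "THUNDERSTORM WIND 59 MPH.", "THUNDERSTORM WIND 65MPH",
         " HIGH SURF ADVISORY")

clean <- tolower(trimws(raw))                     # lower case, no outer spaces
clean <- gsub("\\(.*\\)", "", clean)              # drop notes such as "(g40)"
clean <- gsub("[0-9]+ *(mph)?\\.?", "", clean)    # drop speeds such as "59 mph."
clean <- trimws(gsub(" +", " ", clean))           # squeeze repeated spaces

unique(clean)
# [1] "thunderstorm wind"  "high surf advisory"
```

Five raw labels shrink to two, which is the effect the pattern replacements in the next step aim for on the whole column.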
levels(storm$evtype)<-tolower(levels(storm$evtype)) #change upper case to lower case
##convert beach erosion (also catches misspellings that start with "beach e")
storm$evtype<-gsub(".*beach e([a-z]){1,6}.*","beach erosion",storm$evtype)
##convert blizzard
storm$evtype<-gsub("^blizzard.*","blizzard",storm$evtype)
##convert blowing snow
storm$evtype<-gsub("^blow.*","blowing snow",storm$evtype)
##convert coastal flooding
storm$evtype<-gsub("^coastal.*|cstl flooding/erosion","coastal flooding",storm$evtype)
##convert cold weather
storm$evtype<-gsub(".*cold.*","cold weather",storm$evtype)
##convert downburst
storm$evtype<-gsub("^downburst.*","downburst",storm$evtype)
##convert drought
storm$evtype<-gsub("^drought.*","drought",storm$evtype)
##convert dry microburst
storm$evtype<-gsub("^dry microburst.*","dry microburst",storm$evtype)
##convert dust storm
storm$evtype<-gsub("^dust.*","dust storm",storm$evtype)
##convert heat
storm$evtype<-gsub(".*heat.*","heat",storm$evtype)
##convert rain
storm$evtype<-gsub(".*rain.*","rain",storm$evtype)
##convert cold
storm$evtype<-gsub(".*cold.*","cold",storm$evtype)
##convert flood
storm$evtype<-gsub(".*flood.*","flood",storm$evtype)
##convert freezing drizzle
storm$evtype<-gsub("^freezing drizzle.*","freezing drizzle",storm$evtype)
##convert frost
storm$evtype<-gsub("^frost.*","frost",storm$evtype)
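##convert funnel cloud (note: the replacement label here is "frost", so funnel clouds are counted under frost in the results below; "funnel cloud" looks like the intended label)
storm$evtype<-gsub("^funnel.*","frost",storm$evtype)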
##convert glaze ice
storm$evtype<-gsub("^glaze.*","glaze ice",storm$evtype)
##convert wind
storm$evtype<-gsub(".*wind.*","wind",storm$evtype)
##convert gustnado
storm$evtype<-gsub("^gustnado.*","gustnado",storm$evtype)
##convert hail
storm$evtype<-gsub("^hail.*","hail",storm$evtype)
##convert mud
storm$evtype<-gsub(".*mud.*","mud",storm$evtype)
##convert surf
storm$evtype<-gsub(".*surf.*","surf",storm$evtype)
##convert swell
storm$evtype<-gsub(".*swell.*","swells",storm$evtype)
##convert high wind (has no effect at this point: every label containing "wind" was already collapsed to "wind" above)
storm$evtype<-gsub("^high( )? wind.*","high wind",storm$evtype)
##convert hurricane
storm$evtype<-gsub("^hurricane.*","hurricane",storm$evtype)
##convert ice
storm$evtype<-gsub("^ice.*|^icy.*","ice",storm$evtype)
##convert lightning
storm$evtype<-gsub(".*ligh.*","lightning",storm$evtype)
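##convert thundestorm (the pattern matches only this literal spelling; labels spelled "thunderstorm" are left as they are)
storm$evtype<-gsub(".*thundestorm.*","thundestorm",storm$evtype)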
##convert tornado
storm$evtype<-gsub(".*torn([a-z]){1,4} .*","tornado",storm$evtype)
##convert tropical storm
storm$evtype<-gsub(".*tropical .*","tropical storm",storm$evtype)
length(levels(as.factor(storm$evtype)))
## [1] 381
The number of event type levels drops from 985 to 381.
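The same kind of clean-up can be written as an ordered table of pattern and label pairs applied in a loop. This is a minimal sketch on toy labels, assuming only four rules; `rules` and `normalise_evtype` are hypothetical names, not objects from this report.

```r
# Ordered rules: names are regular expressions, values are the new labels.
# Order matters, because a later rule only sees what earlier rules left behind.
rules <- c("^blizzard.*" = "blizzard",
           ".*heat.*"    = "heat",
           ".*flood.*"   = "flood",
           ".*wind.*"    = "wind")

normalise_evtype <- function(x, rules) {
  x <- tolower(x)
  for (i in seq_along(rules)) {
    x <- gsub(names(rules)[i], rules[[i]], x)
  }
  x
}

toy <- c("BLIZZARD/HIGH WIND", "Excessive Heat", "FLASH FLOOD", "TSTM WIND", "HAIL")
normalise_evtype(toy, rules)
# [1] "blizzard" "heat"     "flood"    "wind"     "hail"
```

"BLIZZARD/HIGH WIND" ends up as "blizzard" only because the blizzard rule runs before the wind rule. With the order reversed it would become "wind", which is why the order of the replacements changes the final counts.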
## total casualties (fatalities + injuries) per event type, largest first
health<-storm%>%
  mutate(casualty=fatalities+injuries)%>%
  group_by(evtype)%>%
  summarize(total=sum(casualty))%>%
  arrange(desc(total))
The result of the pipeline is a tibble (tbl_df) rather than a plain data.frame, so I convert it to a data.frame and keep the top 20 rows.
class(health)
## [1] "tbl_df" "tbl" "data.frame"
health<-as.data.frame(health)
health<-health[1:20,]
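The aggregation can be checked by hand on a few rows. The sketch below uses base R `aggregate()` instead of dplyr so it runs without extra packages. The toy rows are made up for illustration and are not taken from the storm data.

```r
# Toy rows (invented values)
toy <- data.frame(
  evtype     = c("tornado", "tornado", "heat", "flood"),
  fatalities = c(1, 0, 2, 0),
  injuries   = c(14, 2, 0, 3)
)

# casualty = fatalities + injuries, then summed per event type
toy$casualty <- toy$fatalities + toy$injuries
agg <- aggregate(casualty ~ evtype, data = toy, FUN = sum)

# largest total first, as arrange(desc(total)) does
agg <- agg[order(-agg$casualty), ]
agg
```

Here tornado totals 17 (15 + 2), flood 3 and heat 2, so tornado is ranked first.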
levels(storm$cropdmgexp)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
levels(storm$propdmgexp)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
The levels of cropdmgexp and propdmgexp are a mix of symbols.
B = billion, M = million, K = thousand, H = hundred (in upper or lower case); the other symbols are treated as unknown or not available.
I convert each symbol to a numeric multiplier so that the damage values are easier to work with.
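## multipliers are assigned by position, in the order of the levels printed above; every code other than B, M, K, H becomes 0
levels(storm$cropdmgexp)<-c(0,0,0,0,1000000000,1000,1000,1000000,1000000)
levels(storm$propdmgexp)<-c(0,0,0,0,0,0,0,0,0,0,0,0,0,1000000000,100,100,1000,1000000,1000000)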
levels(storm$cropdmgexp)
## [1] "0" "1e+09" "1000" "1e+06"
levels(storm$propdmgexp)
## [1] "0" "1e+09" "100" "1000" "1e+06"
storm$cropdmgexp<-as.numeric(as.character(storm$cropdmgexp))
storm$propdmgexp<-as.numeric(as.character(storm$propdmgexp))
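The conversion above depends on the order of the factor levels. An alternative that does not depend on that order is a named lookup vector. This is a sketch only: `exp_to_multiplier` is a hypothetical helper, and it keeps the same assumption as the report that every code other than H, K, M, B counts as 0.

```r
# Hypothetical helper: map an exponent code to its numeric multiplier.
exp_to_multiplier <- function(code) {
  lookup <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
  # character indexing returns NA for codes that are not in the lookup
  mult <- lookup[tolower(as.character(code))]
  mult[is.na(mult)] <- 0   # unknown codes ("", "?", digits, ...) become 0
  unname(mult)
}

exp_to_multiplier(c("K", "m", "B", "", "?", "2", "H"))
# [1] 1e+03 1e+06 1e+09 0e+00 0e+00 0e+00 1e+02
```

Because the lookup is by name, it works the same way on a factor or a character column and for both cropdmgexp and propdmgexp.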
## total damage in dollars (crop + property) per event type, largest first
damage<-storm%>%
  group_by(evtype)%>%
  summarize(total=sum((cropdmg*cropdmgexp)+(propdmg*propdmgexp)))%>%
  arrange(desc(total))
damage<-as.data.frame(damage)
damage<-damage[1:20,]
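As a worked example of the damage formula: a row with propdmg = 25 and a "K" exponent (multiplier 1000) stands for 25 × 1000 = 25,000 dollars. The sketch below applies the same arithmetic to a few invented rows, with base R `tapply()` in place of the dplyr pipeline.

```r
# Toy rows (invented values); the *exp columns already hold numeric multipliers
toy <- data.frame(
  evtype     = c("flood", "flood", "hail"),
  propdmg    = c(25, 2.5, 1),
  propdmgexp = c(1e3, 1e6, 1e3),
  cropdmg    = c(0, 1, 5),
  cropdmgexp = c(0, 1e3, 1e3)
)

# dollars per row = crop damage + property damage
toy$dollars <- toy$cropdmg * toy$cropdmgexp + toy$propdmg * toy$propdmgexp

# total per event type, largest first
totals <- sort(tapply(toy$dollars, toy$evtype, sum), decreasing = TRUE)
totals
```

The three rows are worth 25,000, 2,501,000 and 6,000 dollars, so flood totals 2,526,000 and hail 6,000.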
ggplot(health,aes(reorder(evtype,total),total,fill=evtype))+
geom_bar(stat="identity")+
theme(legend.position = "none")+
labs(x="event type",y="total fatalities/injuries",title="The total number of fatalities and injuries")+
coord_flip()
health
## evtype total
## 1 tornado 96997
## 2 wind 12687
## 3 heat 12360
## 4 flood 10134
## 5 lightning 6052
## 6 ice 2245
## 7 winter storm 1527
## 8 hurricane 1459
## 9 hail 1376
## 10 heavy snow 1148
## 11 wildfire 986
## 12 blizzard 906
## 13 fog 796
## 14 cold 771
## 15 rip current 600
## 16 wild/forest fire 557
## 17 dust storm 507
## 18 rip currents 501
## 19 tropical storm 449
## 20 winter weather 431
The event type most harmful to population health is the tornado,
which caused 96,997 fatalities and injuries in total.
Wind is second, with 12,687 fatalities and injuries.
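The table above still lists a few near-duplicate labels separately, for example "rip current" and "rip currents", and "wildfire" and "wild/forest fire". The sketch below merges just those four rows, using the totals printed above, to show how much the ranking could shift. The two patterns are my own illustration and were not applied in this report.

```r
# Four rows copied from the table above
top <- data.frame(
  evtype = c("wildfire", "rip current", "wild/forest fire", "rip currents"),
  total  = c(986, 600, 557, 501)
)

# merge the spelling variants (illustrative patterns)
top$evtype <- gsub("^rip currents?$", "rip current", top$evtype)
top$evtype <- gsub("^wild.*fire$", "wildfire", top$evtype)

merged <- aggregate(total ~ evtype, data = top, FUN = sum)
merged
```

Merged, wildfire would total 1,543 and rip current 1,101. Tornado stays far ahead either way, so the answer to the first question does not change.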
ggplot(damage,aes(reorder(evtype,total),total,fill=evtype))+
geom_bar(stat="identity")+
theme(legend.position = "none")+
labs(x="event type",y="total damage (USD)",title="Total crop and property damage")+
coord_flip()
damage
## evtype total
## 1 flood 179782540420
## 2 hurricane 90161397810
## 3 tornado 57356891990
## 4 storm surge 43323541000
## 5 wind 19645649120
## 6 hail 19000013670
## 7 drought 15018677780
## 8 ice 8985154510
## 9 tropical storm 8411023550
## 10 winter storm 6715441250
## 11 wildfire 5060586800
## 12 storm surge/tide 4642038000
## 13 rain 4189545990
## 14 wild/forest fire 3108626330
## 15 cold 1663345000
## 16 severe thunderstorm 1205560000
## 17 frost 1171375600
## 18 heavy snow 1067242240
## 19 lightning 948384370
## 20 heat 924789250
The event type with the greatest economic consequences is the flood, which caused 179,782,540,420 dollars of crop and property damage.
Hurricanes are second, with 90,161,397,810 dollars of damage.
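Totals of this size are easier to read in billions. A short sketch using the two largest values from the table above (the label format is just one possible choice):

```r
# Top two totals copied from the damage table
totals <- c(flood = 179782540420, hurricane = 90161397810)

# express each total in billions of dollars with one decimal place
labels <- sprintf("%s: %.1f billion USD", names(totals), totals / 1e9)
labels
# [1] "flood: 179.8 billion USD"    "hurricane: 90.2 billion USD"
```

Flood damage is roughly twice the hurricane total.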