This project analyzes storm data in order give insight into the most costly and dangerous storms in the US.
I first load the data into R that contains economic and health affects from certain types of storms.
df<-read.table(file="repdata%2Fdata%2FStormData.csv.bz2", header=TRUE, sep=",")
summary(df)
## STATE__ BGN_DATE BGN_TIME
## Min. : 1.0 5/25/2011 0:00:00: 1202 12:00:00 AM: 10163
## 1st Qu.:19.0 4/27/2011 0:00:00: 1193 06:00:00 PM: 7350
## Median :30.0 6/9/2011 0:00:00 : 1030 04:00:00 PM: 7261
## Mean :31.2 5/30/2004 0:00:00: 1016 05:00:00 PM: 6891
## 3rd Qu.:45.0 4/4/2011 0:00:00 : 1009 12:00:00 PM: 6703
## Max. :95.0 4/2/2006 0:00:00 : 981 03:00:00 PM: 6700
## (Other) :895866 (Other) :857229
## TIME_ZONE COUNTY COUNTYNAME STATE
## CST :547493 Min. : 0.0 JEFFERSON : 7840 TX : 83728
## EST :245558 1st Qu.: 31.0 WASHINGTON: 7603 KS : 53440
## MST : 68390 Median : 75.0 JACKSON : 6660 OK : 46802
## PST : 28302 Mean :100.6 FRANKLIN : 6256 MO : 35648
## AST : 6360 3rd Qu.:131.0 LINCOLN : 5937 IA : 31069
## HST : 2563 Max. :873.0 MADISON : 5632 NE : 30271
## (Other): 3631 (Other) :862369 (Other):621339
## EVTYPE BGN_RANGE BGN_AZI
## HAIL :288661 Min. : 0.000 :547332
## TSTM WIND :219940 1st Qu.: 0.000 N : 86752
## THUNDERSTORM WIND: 82563 Median : 0.000 W : 38446
## TORNADO : 60652 Mean : 1.484 S : 37558
## FLASH FLOOD : 54277 3rd Qu.: 1.000 E : 33178
## FLOOD : 25326 Max. :3749.000 NW : 24041
## (Other) :170878 (Other):134990
## BGN_LOCATI END_DATE END_TIME
## :287743 :243411 :238978
## COUNTYWIDE : 19680 4/27/2011 0:00:00: 1214 06:00:00 PM: 9802
## Countywide : 993 5/25/2011 0:00:00: 1196 05:00:00 PM: 8314
## SPRINGFIELD : 843 6/9/2011 0:00:00 : 1021 04:00:00 PM: 8104
## SOUTH PORTION: 810 4/4/2011 0:00:00 : 1007 12:00:00 PM: 7483
## NORTH PORTION: 784 5/30/2004 0:00:00: 998 11:59:00 PM: 7184
## (Other) :591444 (Other) :653450 (Other) :622432
## COUNTY_END COUNTYENDN END_RANGE END_AZI
## Min. :0 Mode:logical Min. : 0.0000 :724837
## 1st Qu.:0 NA's:902297 1st Qu.: 0.0000 N : 28082
## Median :0 Median : 0.0000 S : 22510
## Mean :0 Mean : 0.9862 W : 20119
## 3rd Qu.:0 3rd Qu.: 0.0000 E : 20047
## Max. :0 Max. :925.0000 NE : 14606
## (Other): 72096
## END_LOCATI LENGTH WIDTH
## :499225 Min. : 0.0000 Min. : 0.000
## COUNTYWIDE : 19731 1st Qu.: 0.0000 1st Qu.: 0.000
## SOUTH PORTION : 833 Median : 0.0000 Median : 0.000
## NORTH PORTION : 780 Mean : 0.2301 Mean : 7.503
## CENTRAL PORTION: 617 3rd Qu.: 0.0000 3rd Qu.: 0.000
## SPRINGFIELD : 575 Max. :2315.0000 Max. :4400.000
## (Other) :380536
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 :465934 Min. : 0.000 :618413
## 1st Qu.: 0.00 K :424665 1st Qu.: 0.000 K :281832
## Median : 0.00 M : 11330 Median : 0.000 M : 1994
## Mean : 12.06 0 : 216 Mean : 1.527 k : 21
## 3rd Qu.: 0.50 B : 40 3rd Qu.: 0.000 0 : 19
## Max. :5000.00 5 : 28 Max. :990.000 B : 9
## (Other): 84 (Other): 9
## WFO STATEOFFIC
## :142069 :248769
## OUN : 17393 TEXAS, North : 12193
## JAN : 13889 ARKANSAS, Central and North Central: 11738
## LWX : 13174 IOWA, Central : 11345
## PHI : 12551 KANSAS, Southwest : 11212
## TSA : 12483 GEORGIA, North and Central : 11120
## (Other):690738 (Other) :595920
## ZONENAMES
## :594029
## :205988
## GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M : 639
## GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA : 592
## JEFFERSON - JEFFERSON : 303
## MADISON - MADISON : 302
## (Other) :100444
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## Min. : 0 Min. :-14451 Min. : 0 Min. :-14455
## 1st Qu.:2802 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0
## Median :3540 Median : 8707 Median : 0 Median : 0
## Mean :2875 Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.:4019 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. :9706 Max. : 17124 Max. :9706 Max. :106220
## NA's :47 NA's :40
## REMARKS REFNUM
## :287433 Min. : 1
## : 24013 1st Qu.:225575
## Trees down.\n : 1110 Median :451149
## Several trees were blown down.\n : 568 Mean :451149
## Trees were downed.\n : 446 3rd Qu.:676723
## Large trees and power lines were blown down.\n: 432 Max. :902297
## (Other) :588295
For four components of the project I found the most costly and dangerous storms in the US by adding the total number of deaths in one analysis, injuries in another analysis and cost in dollars for crop damage and personal property. I followed the same scheme for all analysis and reported the data visually in bar charts.
I will first analyze which types of storms caused the most injuries. I summed the number of injuries for each type of storm. Then I sorted this data and found the top six most dangerous storms by injuries.
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.4.4
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
df1<-aggregate(INJURIES~EVTYPE,df, sum)
df1 <- df1[order(-df1$INJURIES),]
df1<-df1 %>%
arrange(desc(INJURIES)) %>%
group_by(EVTYPE)
d<-head(df1)
d
## # A tibble: 6 x 2
## # Groups: EVTYPE [6]
## EVTYPE INJURIES
## <fct> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
I then subsetted the dataframe to include only data from the most injurous storms. To do this I subsetted the original dataframe but only included the top six injurous storms.
d2<-d$EVTYPE
df2<-subset(df, EVTYPE %in% d2)
df2<-df2[, c("EVTYPE", "INJURIES")]
From the transformed dataframe, I created a barplot of the most injurous storms, with tornados having the highest frequency of injuries. I completed the report below to demonstrate the analysis. This report will be displayed with the results.
p1<-ggplot(data=df2, aes(x=EVTYPE, y=INJURIES, fill=EVTYPE)) + geom_bar(stat="identity")+
labs(title="Injuries from Severe Weather", x="", y="Total Number of Injuries per Storm Type")+
scale_fill_discrete(name="Storm Types")+
theme(axis.text.x =element_blank(),
axis.ticks.x=element_blank())
I will now determine which types of storms caused the most deaths. I summed the number of fatalities for each type of storm. Then I sorted this data and found the top six most dangerous storms by fatalities much like I did for injuries per storm type.
df1<-df[,c("FATALITIES", "EVTYPE")]
df1<-aggregate(FATALITIES~EVTYPE,df, sum)
df1 <- df1[order(-df1$FATALITIES),]
df1<-df1 %>%
arrange(desc(FATALITIES)) %>%
group_by(EVTYPE)
d1<-head(df1)
d1
## # A tibble: 6 x 2
## # Groups: EVTYPE [6]
## EVTYPE FATALITIES
## <fct> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
I then subsetted the original dataset to include only data from the most lethal storms.
d2<-d1$EVTYPE
d3<-subset(df, EVTYPE %in% d2)
df3<-d3[, c("EVTYPE", "FATALITIES")]
From the transformed dataframe, I created a barplot of the most injurous storms, with tornados having the highest frequency of deaths. I completed the report below to demonstrate the analysis. This report will be displayed with the results.
p2<-ggplot(data=df3, aes(x=EVTYPE, y=FATALITIES, fill=EVTYPE)) + geom_bar(stat="identity")+
labs(title="Deaths by Extreme Weather", x="", y="Number of Fatalities per Storm Type")+
scale_fill_discrete(name="Storm Types")+
theme(axis.text.x =element_blank(),
axis.ticks.x=element_blank())
I will now determine which types of storms caused the most property damage. I modified the property damage to account for thousands of millions of dollars. If the amount was in thousands, I multiplied the cost by 1000. If the amount was in the millions, I mutiplied the cost by 1000000.
df3<-df[, c("EVTYPE", "PROPDMG", "PROPDMGEXP")]
df4<-df3 %>% mutate(PROPDMG = ifelse(PROPDMGEXP=="K", PROPDMG*1000, PROPDMG))
df5<-df4 %>% mutate(PROPDMG = ifelse(PROPDMGEXP=="M", PROPDMG*1000000, PROPDMG))
I summed the cost of property damage for each type of storm. Then I sorted this data and found the top six most expensive storms in dollars.
df5<-aggregate(PROPDMG~EVTYPE,df5, sum)
df6<-df5 %>%
arrange(desc(PROPDMG)) %>%
group_by(EVTYPE)
d1<-head(df6)
d1
## # A tibble: 6 x 2
## # Groups: EVTYPE [6]
## EVTYPE PROPDMG
## <fct> <dbl>
## 1 TORNADO 51625660796
## 2 FLOOD 22157709930
## 3 FLASH FLOOD 15140812068
## 4 HAIL 13927367054
## 5 HURRICANE 6168319016
## 6 TSTM WIND 4484928495
I then subsetted the dataframe to include only data from the most costly types of storms.
d2<-d1$EVTYPE
d3<-subset(df, EVTYPE %in% d2)
df3<-d3[, c("EVTYPE", "PROPDMG")]
I had to perform scaling again on the cost in thousands or million of dollars, because this transformation was not done on the original dataset. Lastly, I selected only the storm type and property damage varialbe in the dataset.
df4<-d3 %>% mutate(PROPDMG = ifelse(PROPDMGEXP=="K", PROPDMG*1000, PROPDMG))
df5<-df4 %>% mutate(PROPDMG = ifelse(PROPDMGEXP=="M", PROPDMG*1000000, PROPDMG))
df6<-df5[, c("EVTYPE", "PROPDMG")]
From the transformed dataframe, I created a barplot of the most costly storms, with tornados having the highest total cost. I completed the report below to demonstrate the analysis. This report will be displayed with the results.
p3<-ggplot(data=df6, aes(x=EVTYPE, y=PROPDMG, fill=EVTYPE)) + geom_bar(stat="identity")+
labs(title="Most Costly Types of Storms", x="", y="Total Damage Costs")+
scale_fill_discrete(name="Storm Types")+
theme(axis.text.x =element_blank(),
axis.ticks.x=element_blank())
I will now determine which types of storms caused the most crop damage. I modified the crop damage to account for thousands and millions of dollars. If the amount was in thousands, I multiplied the cost by 1000. If the amount was in the millions, I mutiplied the cost by 1000000.
df3<-df[, c("EVTYPE", "CROPDMG", "CROPDMGEXP")]
df4<-df3 %>% mutate(CROPDMG = ifelse(CROPDMGEXP=="K", CROPDMG*1000, CROPDMG))
df5<-df4 %>% mutate(CROPDMG = ifelse(CROPDMGEXP=="M", CROPDMG*1000000, CROPDMG))
I then subsetted the dataframe to include only data from the most costly types of storms.
df5<-aggregate(CROPDMG~EVTYPE,df5, sum)
df6<-df5 %>%
arrange(desc(CROPDMG)) %>%
group_by(EVTYPE)
d1<-head(df6)
d1
## # A tibble: 6 x 2
## # Groups: EVTYPE [6]
## EVTYPE CROPDMG
## <fct> <dbl>
## 1 DROUGHT 12472566002
## 2 FLOOD 5661968450
## 3 HAIL 3025537890
## 4 HURRICANE 2741910000
## 5 FLASH FLOOD 1421317100
## 6 EXTREME COLD 1292973000
I had to perform scaling again on the cost in thousands or million of dollars, because this transformation was not done on the original dataset. Lastly, I selected only the storm type and property damage varialbe in the dataset.
d2<-d1$EVTYPE
d3<-subset(df, EVTYPE %in% d2)
df4<-d3 %>% mutate(CROPDMG = ifelse(CROPDMGEXP=="K", CROPDMG*1000, CROPDMG))
df5<-df4 %>% mutate(CROPDMG = ifelse(CROPDMGEXP=="M", CROPDMG*1000000, CROPDMG))
df6<-df5[, c("EVTYPE", "CROPDMG")]
From the transformed dataframe, I created a barplot of the most costly storms, with tornados having the highest total cost. I completed the report below to demonstrate the analysis. This report will be displayed with the results.
p4<-ggplot(data=df6, aes(x=EVTYPE, y=CROPDMG, fill=EVTYPE)) + geom_bar(stat="identity")+
labs(title="Most Costly Types of Storms by Crop Damage", x="", y="Total Damage Costs")+
scale_fill_discrete(name="Storm Types")+
theme(axis.text.x =element_blank(),
axis.ticks.x=element_blank(),
legend.text=element_text(size=7))
Below are the barplots from the analysis above.
grid.arrange(p1, p2, nrow=2)
grid.arrange(p3, p4, nrow=2)
The biggest cost to property damage, deaths and personal injury come from tornados, but the most costly storm in terms of crop damage is drought.