Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
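# Packages used by the code in this report
library(dplyr); library(lubridate); library(ggplot2); library(treemap)
storm <- read.csv("project/FStormData.csv")
str(storm)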
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
There are 902,297 observations and 37 variables.
Since I’ll use the year of each event to explore and aggregate the data, I’ll convert the variable “BGN_DATE” to a date-time format and then extract the year.
storm$BGN_DATE<- as.POSIXct(storm$BGN_DATE,format="%m/%d/%Y %H:%M:%S",tz=Sys.timezone())
storm$BGN_Year<-year(storm$BGN_DATE)
storm %>%
group_by(STATE) %>%
summarise(Number_of_Events=n()) %>%
arrange(desc(Number_of_Events)) %>%
head(10)
## # A tibble: 10 × 2
## STATE Number_of_Events
## <fctr> <int>
## 1 TX 83728
## 2 KS 53440
## 3 OK 46802
## 4 MO 35648
## 5 IA 31069
## 6 NE 30271
## 7 IL 28488
## 8 AR 27102
## 9 NC 25351
## 10 GA 25259
storm %>%
group_by(STATE) %>%
summarise(Number_of_Injuries=sum(INJURIES)) %>%
arrange(desc(Number_of_Injuries)) %>%
head(10)
## # A tibble: 10 × 2
## STATE Number_of_Injuries
## <fctr> <dbl>
## 1 TX 17667
## 2 MO 8998
## 3 AL 8742
## 4 OH 7112
## 5 MS 6675
## 6 FL 5918
## 7 OK 5710
## 8 IL 5563
## 9 AR 5550
## 10 TN 5202
storm %>%
group_by(STATE) %>%
summarise(Number_of_Fatalities=sum(FATALITIES)) %>%
arrange(desc(Number_of_Fatalities)) %>%
head(10)
## # A tibble: 10 × 2
## STATE Number_of_Fatalities
## <fctr> <dbl>
## 1 IL 1421
## 2 TX 1366
## 3 PA 846
## 4 AL 784
## 5 MO 754
## 6 FL 746
## 7 MS 555
## 8 CA 550
## 9 AR 530
## 10 TN 521
Texas has the highest number of events, but Illinois has the highest number of fatalities. Around three times more events are recorded in Texas than in Illinois (83,728 versus 28,488), yet Illinois still records more fatalities (1,421 versus 1,366). These figures would be interesting to explore further: which kinds of events occur more often in Texas than in Illinois, what the population of each state is, whether the affected areas are rural or urban, and so on.
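As a starting point for that follow-up, the sketch below breaks events and fatalities down by event type for a chosen pair of states. The helper name `compare_states` and the toy data frame are assumptions made for illustration (they are not part of the original analysis), and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy stand-in for the `storm` data frame; the values are made up.
toy <- data.frame(
  STATE      = c("TX", "TX", "TX", "IL", "IL", "IL", "KS"),
  EVTYPE     = c("HAIL", "HAIL", "TORNADO", "HEAT", "TORNADO", "HEAT", "HAIL"),
  FATALITIES = c(0, 0, 1, 5, 2, 3, 0)
)

# Hypothetical helper: events and fatalities per event type
# for the selected states, deadliest event types first within each state.
compare_states <- function(df, states) {
  df %>%
    filter(STATE %in% states) %>%
    group_by(STATE, EVTYPE) %>%
    summarise(Number_of_Events     = n(),
              Number_of_Fatalities = sum(FATALITIES),
              .groups = "drop") %>%
    arrange(STATE, desc(Number_of_Fatalities))
}

res <- compare_states(toy, c("TX", "IL"))
print(res)
```

On the real data the call would be `compare_states(storm, c("TX", "IL"))`.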
storm %>%
group_by(EVTYPE) %>%
summarise(Number_of_Events=n()) %>%
arrange(desc(Number_of_Events)) %>%
head(10)
## # A tibble: 10 × 2
## EVTYPE Number_of_Events
## <fctr> <int>
## 1 HAIL 288661
## 2 TSTM WIND 219940
## 3 THUNDERSTORM WIND 82563
## 4 TORNADO 60652
## 5 FLASH FLOOD 54277
## 6 FLOOD 25326
## 7 THUNDERSTORM WINDS 20843
## 8 HIGH WIND 20212
## 9 LIGHTNING 15754
## 10 HEAVY SNOW 15708
storm %>%
group_by(EVTYPE) %>%
summarise(Number_of_Injuries=sum(INJURIES)) %>%
arrange(desc(Number_of_Injuries)) %>%
head(10)
## # A tibble: 10 × 2
## EVTYPE Number_of_Injuries
## <fctr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
storm %>%
group_by(EVTYPE) %>%
summarise(Number_of_Fatalities=sum(FATALITIES)) %>%
arrange(desc(Number_of_Fatalities)) %>%
head(10)
## # A tibble: 10 × 2
## EVTYPE Number_of_Fatalities
## <fctr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
After a brief exploration we can see that tornadoes and excessive heat are the deadliest events, while hail and thunderstorm wind are the most common events.
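The tables above list "TSTM WIND", "THUNDERSTORM WIND" and "THUNDERSTORM WINDS" as separate event types. Added together (219,940 + 82,563 + 20,843 = 323,346), these three labels alone exceed the 288,661 hail events, so the ranking depends on how labels are grouped. The sketch below shows one possible way to merge such variants before aggregating; the helper name `normalise_evtype` and the single rule it applies are assumptions, and the real data contains many more variants than this handles.

```r
# Hypothetical helper: trims and upper-cases the labels, collapses
# repeated whitespace, and merges the thunderstorm wind variants.
normalise_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("\\s+", " ", x)
  sub("^(TSTM|THUNDERSTORM) WINDS?$", "THUNDERSTORM WIND", x)
}

# Toy labels standing in for storm$EVTYPE.
labels  <- c("TSTM WIND", " thunderstorm winds", "THUNDERSTORM WIND", "HAIL")
cleaned <- normalise_evtype(labels)
print(table(cleaned))
```

On the real data, a cleaned column could be created with `storm$EVTYPE_clean <- normalise_evtype(storm$EVTYPE)` and then used in `group_by()` in place of `EVTYPE`.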
storm %>%
group_by(EVTYPE) %>%
summarise(Tot_Property_Damage=sum(PROPDMG)) %>%
arrange(desc(Tot_Property_Damage)) %>%
head(10)
## # A tibble: 10 × 2
## EVTYPE Tot_Property_Damage
## <fctr> <dbl>
## 1 TORNADO 3212258.2
## 2 FLASH FLOOD 1420124.6
## 3 TSTM WIND 1335965.6
## 4 FLOOD 899938.5
## 5 THUNDERSTORM WIND 876844.2
## 6 HAIL 688693.4
## 7 LIGHTNING 603351.8
## 8 THUNDERSTORM WINDS 446293.2
## 9 HIGH WIND 324731.6
## 10 WINTER STORM 132720.6
storm %>%
group_by(EVTYPE) %>%
summarise(Tot_Crop_Damage=sum(CROPDMG)) %>%
arrange(desc(Tot_Crop_Damage)) %>%
head(10)
## # A tibble: 10 × 2
## EVTYPE Tot_Crop_Damage
## <fctr> <dbl>
## 1 HAIL 579596.28
## 2 FLASH FLOOD 179200.46
## 3 FLOOD 168037.88
## 4 TSTM WIND 109202.60
## 5 TORNADO 100018.52
## 6 THUNDERSTORM WIND 66791.45
## 7 DROUGHT 33898.62
## 8 THUNDERSTORM WINDS 18684.93
## 9 HIGH WIND 17283.21
## 10 HEAVY RAIN 11122.80
Hail is the most devastating weather event in terms of crop damage, whereas tornadoes cause the most property damage.
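Note that the totals above sum PROPDMG and CROPDMG as raw numbers, while the dataset also carries the PROPDMGEXP and CROPDMGEXP columns. The sketch below assumes these columns hold a magnitude code, with H, K, M and B (in either case) meaning hundred, thousand, million and billion. Treating every other code (empty, "?", "+", "-", digits) as a multiplier of 1 is a simplifying assumption of this sketch, not something the dataset documents here. The helper name `exp_to_multiplier` is hypothetical.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# ASSUMPTION: H/K/M/B = 1e2/1e3/1e6/1e9; any other code falls back to 1.
exp_to_multiplier <- function(exp) {
  key    <- toupper(as.character(exp))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult   <- unname(lookup[key])   # codes not in the lookup give NA
  mult[is.na(mult)] <- 1
  mult
}

# Toy rows standing in for the `storm` data frame; the values are made up.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", "")
)
toy$PropDmgUSD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

With a column like this, `sum(PropDmgUSD)` per event type would be expressed in dollars, and the ranking may differ from the one obtained from the raw PROPDMG values.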
While exploring this dataset I formed an assumption that I’d like to verify: that the ratios fatalities/number of events and injuries/number of events have decreased over the years, which would mean that we are now better prepared to anticipate and handle weather events, thereby reducing civilian casualties.
aggrVictims<-storm %>% group_by(BGN_Year) %>%
summarise(NbEvents=n(),totFatalities=sum(FATALITIES),
ratioEventFat = sum(FATALITIES)/n(),totInjuries=sum(INJURIES),ratioEventInj=sum(INJURIES)/n())
# Data before 1960 are excluded to avoid outliers in the ratios: I suspect that in the early years only events causing fatalities or injuries were recorded
aggrVictims %>%
filter(BGN_Year>=1960) %>%
ggplot( aes(x=BGN_Year,y=NbEvents,fill=NbEvents)) +
geom_bar(stat="identity") +
labs(x="Year", y="Number of events",title= "Number of events from 1960 to 2011") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
The number of recorded events has increased significantly since 1960; many events in the past probably went unrecorded.
aggrVictims %>%
filter(BGN_Year>=1960) %>%
ggplot( aes(x=BGN_Year,y=ratioEventInj,fill=ratioEventInj)) +
geom_bar(stat="identity") +
labs(x="Year", y="Ratio Injuries",title= "Ratio Injuries per Event from 1960 to 2011") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
aggrVictims %>%
filter(BGN_Year>=1960) %>%
ggplot( aes(x=BGN_Year,y=ratioEventFat,fill=ratioEventFat)) +
geom_bar(stat="identity") +
labs(x="Year", y="Ratio Fatalities",title= "Ratio Fatalities per Event from 1960 to 2011") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
Apart from a couple of peaks, the ratio of injuries per event tends to decrease over time; for the fatalities ratio the trend is less obvious, but a decrease is still visible. However, I can’t draw a firm conclusion from this result: in my opinion only the major events were recorded in the past, whereas all events are now recorded, which by itself lowers the ratio fatalities/number of events.
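One way to reduce this recording bias is to hold the event type fixed, so that the yearly ratios are not diluted by event types that only started to be recorded later. The sketch below computes the same ratios for a single event type. The helper name `ratio_by_year` and the toy data are assumptions, and it presumes that the chosen event type is present across the whole period of interest.

```r
library(dplyr)

# Hypothetical helper: fatalities and injuries per event, by year,
# restricted to a single event type.
ratio_by_year <- function(df, event_type) {
  df %>%
    filter(EVTYPE == event_type) %>%
    group_by(BGN_Year) %>%
    summarise(NbEvents      = n(),
              ratioEventFat = sum(FATALITIES) / n(),
              ratioEventInj = sum(INJURIES) / n())
}

# Toy rows standing in for the `storm` data frame; the values are made up.
toy <- data.frame(
  BGN_Year   = c(1960, 1960, 2011, 2011, 2011),
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "HAIL", "TORNADO"),
  FATALITIES = c(2, 0, 1, 0, 0),
  INJURIES   = c(10, 0, 3, 1, 0)
)

res <- ratio_by_year(toy, "TORNADO")
print(res)
```

On the real data, `ratio_by_year(storm, "TORNADO")` could be piped into the same `ggplot()` calls used above.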
aggrVictimsEvtType<-storm %>% group_by(EVTYPE) %>%
summarise(totFatalities=sum(FATALITIES))
aggrVictimsEvtType$label<-paste(aggrVictimsEvtType$EVTYPE, aggrVictimsEvtType$totFatalities, sep = "\n ")
treemap(aggrVictimsEvtType,
index="label",
vSize="totFatalities",
vColor="totFatalities",
type="value",
title="Total fatalities caused by weather events in the U.S. since 1950")
#### Economic Impact
aggrPrpDmgcEvtType<-storm %>% group_by(EVTYPE) %>%
summarise(totPropDmg=round(sum(PROPDMG)/1000000,2))
aggrPrpDmgcEvtType$label<-paste(aggrPrpDmgcEvtType$EVTYPE, paste(aggrPrpDmgcEvtType$totPropDmg,"M$"), sep = "\n ")
treemap(aggrPrpDmgcEvtType,
index="label",
vSize="totPropDmg",
vColor="totPropDmg",
type="value",
title="Total property damage caused by weather events in the U.S. since 1950")
aggrCropDmgcEvtType<-storm %>% group_by(EVTYPE) %>%
summarise(totCropDmg=round(sum(CROPDMG)/1000,2))
aggrCropDmgcEvtType$label<-paste(aggrCropDmgcEvtType$EVTYPE, paste(aggrCropDmgcEvtType$totCropDmg,"K$"), sep = "\n ")
treemap(aggrCropDmgcEvtType,
index="label",
vSize="totCropDmg",
vColor="totCropDmg",
type="value",
title="Total crop damage caused by weather events in the U.S. since 1950")
The top 3 deadliest event types are tornado, excessive heat and flash flood. The top 3 events causing the most property damage also include tornado and flash flood, the third being thunderstorm wind (TSTM WIND). Finally, the top 3 events causing the most crop damage are hail, flash flood and flood.