Aurélien Roussille
Using data from the U.S. National Oceanic and Atmospheric Administration (NOAA), we explore the characteristics of major storms and weather events in the United States between 1950 and 2011. We focus on two questions. First, across the U.S., which types of events are most harmful to population health? Second, again across the U.S., which types of events have the greatest economic consequences? To answer these questions, we use three variables in the data: “PROPDMG”, “FATALITIES” and “INJURIES”. We sum each variable by event type to get the total impact over this period of 61 years. We also compute the mean of each variable by event type, to see which types are the most harmful and the most expensive per occurrence. Of the 985 event types, only about 10% have non-zero records for both fatalities and injuries.
Over this period, tornadoes caused the most harm to human health. After tornadoes, the same broad groups of events lead both the fatality and the injury totals (heat, floods and lightning). The ranking changes when we take the mean per occurrence: “TROPICAL STORM GORDON” has the highest mean number of deaths and injuries, and weather extremes such as heat and winter storms are the main causes, except for wild fires, which cause many injuries. In economic terms, tornadoes were the most expensive over the period, followed by flash floods and thunderstorm (TSTM) winds. Per occurrence, however, coastal erosion and “heavy rain and flood” events were the most expensive.
We start by downloading the data from the website and storing it in the variable “data”. Then we convert BGN_DATE and END_DATE to the Date class to find out when the data were recorded (the conversion of BGN_DATE is shown here).
url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url,destfile="meteo.csv.bz2")
data<-read.table(file="meteo.csv.bz2",header=TRUE,sep=",")
data$BGN_DATE<-as.Date(as.character(data$BGN_DATE),"%m/%d/%Y")
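The date conversion can be checked on a single value. This is a minimal sketch, assuming the raw BGN_DATE strings look like "4/18/1950 0:00:00" (an assumed sample value, not one printed in this report). as.Date() reads only as much of the string as the format needs and ignores any trailing characters.

```r
# Hypothetical sample value in the month/day/year layout assumed for BGN_DATE
raw <- "4/18/1950 0:00:00"

# The format describes only the date part; the trailing time is ignored
parsed <- as.Date(raw, format = "%m/%d/%Y")

class(parsed)               # "Date"
format(parsed, "%Y-%m-%d")  # "1950-04-18"
```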
After obtaining the dates, we load the package “dplyr” and subset the initial data to keep only the FATALITIES, INJURIES and EVTYPE variables, which we aggregate by event type by summing them.
max(data$BGN_DATE)
## [1] "2011-11-30"
min(data$BGN_DATE)
## [1] "1950-01-03"
library("dplyr")
harmful<-select(data,EVTYPE,FATALITIES:INJURIES)
sum_harmful<-aggregate(.~EVTYPE,data=harmful,FUN=sum)
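To make the aggregation step concrete, here is a small sketch on made-up data (the data frame `toy` and its values are purely illustrative). The formula `. ~ EVTYPE` means "summarise every other column within each event type".

```r
# Toy data standing in for the storm records (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 3, 0, 4),
  INJURIES   = c(10, 0, 5, 0, 2),
  stringsAsFactors = FALSE
)

# One row per event type, each numeric column summed within the type
toy_sum <- aggregate(. ~ EVTYPE, data = toy, FUN = sum)
toy_sum
```

With these values, TORNADO ends up with 5 fatalities and 15 injuries, HEAT with 5 and 2, and FLOOD with 0 and 0.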
Then we keep only the event types that have non-zero fatalities AND non-zero injuries. We pick five arbitrary colors and sort the subset by decreasing number of fatalities. We store the names of the top five event types in a variable.
sum_harmful<-filter(sum_harmful,FATALITIES>0 & INJURIES >0)
color=c("blue","white","red","black","green")
fat<-arrange(sum_harmful,desc(FATALITIES))
name_fat<-head(fat,5)$EVTYPE
We do the same for injuries.
inj<-arrange(sum_harmful,desc(INJURIES))
name_inj<-head(inj,5)$EVTYPE
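The filter-then-rank steps can be sketched on made-up totals. The example below uses base R subsetting and order() as a stand-in for the dplyr calls filter() and arrange(); the data frame `totals` and its values are hypothetical.

```r
# Made-up totals per event type
totals <- data.frame(
  EVTYPE     = c("A", "B", "C", "D"),
  FATALITIES = c(5, 0, 12, 3),
  INJURIES   = c(7, 4, 0, 9),
  stringsAsFactors = FALSE
)

# Keep types with at least one fatality AND at least one injury
kept <- totals[totals$FATALITIES > 0 & totals$INJURIES > 0, ]

# Rank the remaining types by decreasing fatalities and by decreasing injuries
by_fat <- kept[order(-kept$FATALITIES), ]
by_inj <- kept[order(-kept$INJURIES), ]

head(by_fat$EVTYPE, 2)  # "A" "D"
head(by_inj$EVTYPE, 2)  # "D" "A"
```

Note that type "C" is dropped by the AND condition even though it has the most fatalities, because it has no injuries. The same can happen with any event type that has deaths but no recorded injuries.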
We compute the means in the same way.
# Mean per occurrence for each event type (values are not rounded)
mean_harmful<-aggregate(.~EVTYPE,data=harmful,FUN=mean,na.rm=TRUE)
mean_harmful<-filter(mean_harmful,FATALITIES>0 & INJURIES >0)
fatm<-arrange(mean_harmful,desc(FATALITIES))
name_fatm<-head(fatm,5)$EVTYPE
injm<-arrange(mean_harmful,desc(INJURIES))
name_injm<-head(injm,5)$EVTYPE
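Ranking by sum and ranking by mean can give very different leaders, because an event type recorded only once or twice can have a high mean. A minimal sketch with made-up data (the labels "COMMON" and "RARE" are hypothetical):

```r
# Four records of a frequent type and one record of a rare type (made up)
toy <- data.frame(
  EVTYPE     = c(rep("COMMON", 4), "RARE"),
  FATALITIES = c(1, 0, 2, 1, 3),
  stringsAsFactors = FALSE
)

toy_sum  <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
toy_mean <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = mean)
toy_n    <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = length)

# Leader by total and leader by mean
top_sum  <- toy_sum$EVTYPE[which.max(toy_sum$FATALITIES)]    # "COMMON"
top_mean <- toy_mean$EVTYPE[which.max(toy_mean$FATALITIES)]  # "RARE"
```

Reporting the number of records per type (here `toy_n`) next to the mean helps to judge how much weight a high mean deserves. This is likely what happens with single-storm labels in the real data, although the record counts are not shown in this report.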
eco<-select(data,EVTYPE,PROPDMG)
sum_eco<-aggregate(.~EVTYPE,data=eco,FUN=sum)
sum_eco<-filter(sum_eco,PROPDMG >0)
eco_fat<-arrange(sum_eco,desc(PROPDMG ))
name_eco_sum<-head(eco_fat,5)$EVTYPE
# Mean property damage per occurrence for each event type (values are not rounded)
mean_eco<-aggregate(.~EVTYPE,data=eco,FUN=mean,na.rm=TRUE)
mean_eco<-filter(mean_eco,PROPDMG >0)
eco_fatm<-arrange(mean_eco,desc(PROPDMG ))
name_eco_mean<-head(eco_fatm,5)$EVTYPE
The observations therefore span about 61 years, from January 1950 to November 2011.
max(data$BGN_DATE)
## [1] "2011-11-30"
min(data$BGN_DATE)
## [1] "1950-01-03"
harmful<-select(data,EVTYPE,FATALITIES:INJURIES)
sum_harmful<-aggregate(.~EVTYPE,data=harmful,FUN=sum)
y<-dim(sum_harmful)
sum_harmful<-filter(sum_harmful,FATALITIES>0 & INJURIES >0)
x<-dim(sum_harmful)
Only a proportion of 0.1076142 of the event types (about 10.8%) has both non-zero fatalities and non-zero injuries; the rest is dropped.
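Dividing two dim() results gives one ratio for the rows and one for the columns, whereas nrow() gives a single proportion of event types kept. A minimal sketch on made-up data (the data frames `before` and `after` are hypothetical):

```r
# Made-up totals per event type
before <- data.frame(
  EVTYPE     = c("A", "B", "C", "D"),
  FATALITIES = c(5, 0, 12, 3),
  INJURIES   = c(7, 4, 0, 9),
  stringsAsFactors = FALSE
)
after <- before[before$FATALITIES > 0 & before$INJURIES > 0, ]

# Element-wise ratio of (rows, columns): two numbers
ratio_dim <- dim(after) / dim(before)    # 0.5 1.0

# Ratio of row counts only: the proportion of event types kept
prop_kept <- nrow(after) / nrow(before)  # 0.5
```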
For fatalities and injuries, we obtain the five deadliest and the five most injurious event types over the period, which we compare on a graph.
head(fat,5)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
head(inj,5)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 TSTM WIND 504 6957
## 3 FLOOD 470 6789
## 4 EXCESSIVE HEAT 1903 6525
## 5 LIGHTNING 816 5230
par(mfrow=c(1,2))
barplot(head(fat$FATALITIES,5),col=color)
legend("topright",legend=name_fat,fill=color)
barplot(head(inj$INJURIES,5),col=color)
legend("topright",legend=name_inj,fill=color)
title(main="Number of deaths and injuries for the most harmful events, between 1950 and 2011 across the US",outer=TRUE,line=-2)
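As an alternative to a legend, the event names can be printed under the bars. A minimal sketch using the five fatality totals from the table above and the names.arg argument of barplot(); the label size (cex.names), the label rotation (las), the bar color and the title are illustrative choices, not the settings used in this report.

```r
# Top five fatality totals, as listed in the table above
fat_top <- c("TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
             "HEAT" = 937, "LIGHTNING" = 816)

# barplot() draws the bars and returns the x positions of their midpoints
mids <- barplot(fat_top, names.arg = names(fat_top), las = 2, cex.names = 0.7,
                col = "grey", main = "Fatalities by event type")
```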
We now look at the mean number of fatalities and injuries per occurrence of each event type, which we also compare on a graph.
head(fatm,5)
## EVTYPE FATALITIES INJURIES
## 1 TROPICAL STORM GORDON 8.000000 43.000000
## 2 EXTREME HEAT 4.363636 7.045455
## 3 HEAT WAVE DROUGHT 4.000000 15.000000
## 4 MARINE MISHAP 3.500000 2.500000
## 5 WINTER STORMS 3.333333 5.666667
head(injm,5)
## EVTYPE FATALITIES INJURIES
## 1 TROPICAL STORM GORDON 8.00 43.0
## 2 WILD FIRES 0.75 37.5
## 3 HIGH WIND AND SEAS 3.00 20.0
## 4 HEAT WAVE DROUGHT 4.00 15.0
## 5 WINTER STORM HIGH WINDS 1.00 15.0
par(mfrow=c(1,2))
barplot(head(fatm$FATALITIES,5),col=color)
legend("topright",legend=name_fatm,fill=color)
barplot(head(injm$INJURIES,5),col=color)
legend("topright",legend=name_injm,fill=color)
title(main="Mean number of deaths and injuries per event, between 1950 and 2011 across the US",outer=TRUE,line=-2)
Finally, we look at the economic consequences of the events, proceeding in the same way.
sum_eco<-aggregate(.~EVTYPE,data=eco,FUN=sum)
x<-dim(sum_eco)
sum_eco<-filter(sum_eco,PROPDMG >0)
y<-dim(sum_eco)
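We keep a proportion of 0.4121827 of the event types (about 41.2%), those with non-zero property damage. We get the five most expensive event types over the period and the five most expensive per occurrence, which we compare on a graph.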
head(eco_fat,5)
## EVTYPE PROPDMG
## 1 TORNADO 3212258.2
## 2 FLASH FLOOD 1420124.6
## 3 TSTM WIND 1335965.6
## 4 FLOOD 899938.5
## 5 THUNDERSTORM WIND 876844.2
head(eco_fatm,5)
## EVTYPE PROPDMG
## 1 COASTAL EROSION 766
## 2 HEAVY RAIN AND FLOOD 600
## 3 RIVER AND STREAM FLOOD 600
## 4 Landslump 570
## 5 BLIZZARD/WINTER STORM 500
par(mfrow=c(2,1))
barplot(head(eco_fat$PROPDMG,5),main="Total property damage by event type, between 1950 and 2011 across the US",col=color)
legend("topright",legend=name_eco_sum,fill=color)
barplot(head(eco_fatm$PROPDMG,5),ylim=c(0,4000),main="Mean property damage per event, between 1950 and 2011 across the US",col=color)
legend("topright",legend=name_eco_mean,fill=color)
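The economic figures in this report use PROPDMG as recorded. In the NOAA storm data, a companion column, PROPDMGEXP, carries a magnitude code for PROPDMG (for example K for thousands, M for millions, B for billions), so a scaled ranking could differ from the one shown here. The following is only a sketch on made-up data: it assumes that the codes K, M and B are the only ones that matter and treats every other code as a multiplier of 1. The column name `PROPDMG_SCALED` is hypothetical.

```r
# Made-up records with a damage value and its magnitude code
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping from code to multiplier
mult <- c(K = 1e3, M = 1e6, B = 1e9)
scale_by <- unname(mult[toupper(toy$PROPDMGEXP)])
scale_by[is.na(scale_by)] <- 1  # assumption: unknown or empty codes mean "as is"

# Scaled damage, then total per event type
toy$PROPDMG_SCALED <- toy$PROPDMG * scale_by
scaled_sum <- aggregate(PROPDMG_SCALED ~ EVTYPE, data = toy, FUN = sum)
scaled_sum
```

In this toy example FLOOD totals 2,500,000, HAIL 10, and TORNADO 1,000,025,000, whereas the unscaled PROPDMG sums would be 2.5, 10 and 26.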