THE CONSEQUENCES OF CLIMATE IN THE UNITED STATES

Aurélien Roussille

Introduction

Using the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), we explore the characteristics of major storms and weather events in the United States between 1950 and 2011. We address two questions. First, across the U.S., which types of events are most harmful to population health? Second, across the U.S., which types of events have the greatest economic consequences? To answer them, we focus on three variables in the data: FATALITIES, INJURIES and PROPDMG (property damage). For each event type we sum each variable to obtain its total impact over the roughly 62-year period, and we also compute its mean to see which types are the most harmful and the most costly per occurrence. Of the 985 event types, only about 10% have non-zero totals for both fatalities and injuries.

Over the period, tornadoes caused by far the most harm to human health. The same main event types (heat, floods and lightning, besides tornadoes) rank among the leading causes of both deaths and injuries. The ranking changes when we take the mean number of deaths and injuries per event: Tropical Storm Gordon has the highest mean for both, and heat- and storm-related types dominate the lists, while wildfires stand out for the number of injuries they cause. Economically, tornadoes caused the largest recorded property damage over the period, followed by flash floods and thunderstorm (TSTM) winds. Per occurrence, however, coastal erosion and heavy rain and flood were the most costly.

Data Processing

We start by downloading the data from the URL below and storing it in the variable “data”. Then we convert BGN_DATE to the Date class so that we can determine the period covered by the records.

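url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the compressed file once; binary mode avoids corruption on Windows
if(!file.exists("meteo.csv.bz2")) download.file(url,destfile="meteo.csv.bz2",mode="wb")
# read.csv() reads the bzip2-compressed file directly
data<-read.csv("meteo.csv.bz2")
# Keep only the date part of BGN_DATE; the trailing time is ignored
data$BGN_DATE<-as.Date(as.character(data$BGN_DATE),"%m/%d/%Y")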
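The raw BGN_DATE values carry a time component after the date. The sketch below assumes values in the form "1/3/1950 0:00:00" (an assumption about the raw file, which this report does not print); it shows that as.Date() with the format "%m/%d/%Y" keeps only the date, and how the span between the first and last dates can be expressed in years.

```r
# Hypothetical raw values, assumed to follow the "month/day/year time" pattern
raw <- c("1/3/1950 0:00:00", "11/30/2011 0:00:00")

# as.Date() parses the leading date and ignores the trailing time
dates <- as.Date(raw, "%m/%d/%Y")
print(dates)

# Length of the observation period in days and, approximately, in years
span_days <- as.numeric(max(dates) - min(dates))
span_years <- span_days / 365.25
print(round(span_years, 1))
```

With the first and last dates found in the data, the period covers 22611 days, a little under 62 years.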

After converting the dates, we check the first and last dates of the records, load the “dplyr” package and keep only the EVTYPE, FATALITIES and INJURIES variables, which we then sum by event type.

max(data$BGN_DATE)
## [1] "2011-11-30"
min(data$BGN_DATE)
## [1] "1950-01-03"
library("dplyr")
harmful<-select(data,EVTYPE,FATALITIES:INJURIES)
sum_harmful<-aggregate(.~EVTYPE,data=harmful,FUN=sum)
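To make the aggregation step concrete, here is a toy example on a small data frame whose rows are invented for illustration. aggregate() with the formula . ~ EVTYPE sums every remaining column within each event type, which is what the call above does on the full storm data.

```r
# Small, invented data frame with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0),
  INJURIES   = c(10, 0, 5, 3),
  stringsAsFactors = FALSE
)

# Sum FATALITIES and INJURIES within each event type
toy_sum <- aggregate(. ~ EVTYPE, data = toy, FUN = sum)
print(toy_sum)
```

Because the formula interface of aggregate() uses na.action = na.omit by default, a record with a missing value in any of the summed columns is dropped before summing.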

Next, we keep only the event types whose totals are non-zero for both fatalities and injuries. We pick five colors arbitrarily for the plots, sort the subset by total fatalities in decreasing order and store the names of the five deadliest event types.

sum_harmful<-filter(sum_harmful,FATALITIES>0 & INJURIES >0)
color<-c("blue","white","red","black","green")
fat<-arrange(sum_harmful,desc(FATALITIES))
name_fat<-head(fat,5)$EVTYPE
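The filter keeps an event type only if both totals are positive, so a type that caused deaths but no recorded injuries is dropped from both rankings. The sketch below is written with base R subset() and order() so that it runs without dplyr, and its totals are invented; it contrasts requiring both totals with requiring either one, then sorts and keeps the top names as the code above does.

```r
# Invented per-type totals, in the shape produced by the aggregation step
totals <- data.frame(
  EVTYPE     = c("TORNADO", "RIP CURRENT", "HAIL", "FLOOD"),
  FATALITIES = c(30, 12, 0, 4),
  INJURIES   = c(200, 0, 15, 40),
  stringsAsFactors = FALSE
)

# Same condition as the report: both totals must be positive
both <- subset(totals, FATALITIES > 0 & INJURIES > 0)
# Looser alternative: at least one of the totals is positive
either <- subset(totals, FATALITIES > 0 | INJURIES > 0)

# Sort by fatalities in decreasing order and keep the names of the top types
top_names <- head(both[order(both$FATALITIES, decreasing = TRUE), ], 5)$EVTYPE
print(top_names)

# Event types lost because of the stricter condition
dropped <- setdiff(either$EVTYPE, both$EVTYPE)
print(dropped)
```

Here RIP CURRENT disappears from the fatality ranking even though it has deaths, simply because it has no injuries.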

We do the same for injuries.

inj<-arrange(sum_harmful,desc(INJURIES))
name_inj<-head(inj,5)$EVTYPE

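In the same way, we compute the mean number of fatalities and injuries per recorded event for each event type, and rank the event types by these means.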

mean_harmful<-aggregate(.~EVTYPE,data=harmful,FUN=mean,na.rm=TRUE)
mean_harmful<-filter(mean_harmful,FATALITIES>0 & INJURIES >0)

fatm<-arrange(mean_harmful,desc(FATALITIES))
name_fatm<-head(fatm,5)$EVTYPE
injm<-arrange(mean_harmful,desc(INJURIES))
name_injm<-head(injm,5)$EVTYPE
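A mean computed over very few records can dominate a ranking: a type recorded once with a severe outcome outranks a type recorded many times with milder ones, and the mean alone does not show how many records it rests on. The sketch below, with invented records, counts the records behind each mean and then applies an illustrative minimum number of records; the threshold min_n is an arbitrary, hypothetical value, not one used in this report.

```r
# Invented records: one rare, severe event type and one frequent, milder one
rec <- data.frame(
  EVTYPE     = c("RARE STORM", rep("COMMON WIND", 4)),
  FATALITIES = c(8, 1, 0, 2, 1),
  stringsAsFactors = FALSE
)

# Mean fatalities per record, as in the report
m <- aggregate(FATALITIES ~ EVTYPE, data = rec, FUN = mean)
# Number of records behind each mean
n <- aggregate(FATALITIES ~ EVTYPE, data = rec, FUN = length)
names(n)[2] <- "N_RECORDS"
stats <- merge(m, n, by = "EVTYPE")
print(stats[order(stats$FATALITIES, decreasing = TRUE), ])

# Hypothetical rule: only rank types with at least min_n records
min_n <- 3
ranked <- subset(stats, N_RECORDS >= min_n)
print(ranked$EVTYPE)
```

RARE STORM tops the mean ranking on the strength of a single record; showing N_RECORDS next to the mean makes such cases visible.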

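We then apply the same procedure to property damage (PROPDMG): we sum it by event type, keep the types with a non-zero total and store the names of the five largest, then repeat the steps with the mean.

eco<-select(data,EVTYPE,PROPDMG)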
sum_eco<-aggregate(.~EVTYPE,data=eco,FUN=sum)
sum_eco<-filter(sum_eco,PROPDMG >0)

eco_fat<-arrange(sum_eco,desc(PROPDMG ))
name_eco_sum<-head(eco_fat,5)$EVTYPE
mean_eco<-aggregate(.~EVTYPE,data=eco,FUN=mean,na.rm=TRUE)
mean_eco<-filter(mean_eco,PROPDMG >0)

eco_fatm<-arrange(mean_eco,desc(PROPDMG ))
name_eco_mean<-head(eco_fatm,5)$EVTYPE
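The economic ranking uses PROPDMG as stored. In the NOAA storm file, PROPDMG is accompanied by a magnitude column, PROPDMGEXP, in which letters such as K, M and B stand for thousands, millions and billions of dollars, so summing PROPDMG alone mixes units. The sketch below shows one way to rescale the values before aggregating. It maps only K, M and B and treats every other code as a multiplier of 1, which is a simplifying assumption, and its rows are invented.

```r
# Invented rows using the storm-data column names PROPDMG and PROPDMGEXP
eco_raw <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.2, 50),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping from magnitude codes to multipliers; other codes count as 1
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
code <- toupper(eco_raw$PROPDMGEXP)
mult <- ifelse(code %in% names(multiplier), multiplier[code], 1)

# Property damage in dollars, then summed by event type
eco_raw$PROPDMG_USD <- eco_raw$PROPDMG * mult
usd_by_type <- aggregate(PROPDMG_USD ~ EVTYPE, data = eco_raw, FUN = sum)
print(usd_by_type[order(usd_by_type$PROPDMG_USD, decreasing = TRUE), ])
```

In this toy example the single FLOOD record outranks the two TORNADO records once the magnitudes are applied, which is the kind of reordering that unscaled sums can hide.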

Results

The observations run from 3 January 1950 to 30 November 2011, a span of almost 62 years.

max(data$BGN_DATE)
## [1] "2011-11-30"
min(data$BGN_DATE)
## [1] "1950-01-03"
harmful<-select(data,EVTYPE,FATALITIES:INJURIES)
sum_harmful<-aggregate(.~EVTYPE,data=harmful,FUN=sum)
y<-dim(sum_harmful)
sum_harmful<-filter(sum_harmful,FATALITIES>0 & INJURIES >0)
x<-dim(sum_harmful)

About 10.8% of the event types (106 of 985) have non-zero totals for both fatalities and injuries.
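In the code above, x and y come from dim(), which returns both the number of rows and the number of columns, so x/y has two elements: the share of event types kept, and a ratio of 1 for the unchanged column count. The sketch below builds placeholder tables with 985 and 106 rows (the counts implied by the share of about 0.1076 and the 985 event types) and shows that nrow() yields a single share.

```r
# Placeholder tables with 985 and 106 rows and three columns each
all_types  <- data.frame(EVTYPE = seq_len(985), FATALITIES = 0, INJURIES = 0)
kept_types <- all_types[seq_len(106), ]

# dim() returns c(rows, columns), so the ratio has two elements
ratio <- dim(kept_types) / dim(all_types)
print(ratio)

# nrow() returns only the row count, giving a single share
share <- nrow(kept_types) / nrow(all_types)
print(sprintf("%.1f%% of event types kept", 100 * share))
```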

For fatalities and injuries, the five event types with the highest totals over the period are listed below and compared in a bar plot.

head(fat,5)
##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3    FLASH FLOOD        978     1777
## 4           HEAT        937     2100
## 5      LIGHTNING        816     5230
head(inj,5)
##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5633    91346
## 2      TSTM WIND        504     6957
## 3          FLOOD        470     6789
## 4 EXCESSIVE HEAT       1903     6525
## 5      LIGHTNING        816     5230
par(mfrow=c(1,2))
barplot(head(fat$FATALITIES,5),col=color)
legend("topright",legend=name_fat,fill=color)
barplot(head(inj$INJURIES,5),col=color)
legend("topright",legend=name_inj,fill=color)
title(main="Total fatalities and injuries for the most harmful event types, 1950-2011, across the US",outer=TRUE,line=-2)
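Placing the event names under the bars with names.arg is an alternative to a color legend, which can cover the tallest bar when it sits in the top-right corner. The sketch below draws the five fatality totals listed above this way; the values are copied from the head(fat,5) output, and the plot is written to a temporary PDF file so that the snippet also runs without a screen.

```r
# Top five fatality totals, copied from the head(fat, 5) output above
fat_totals <- c("TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
                "HEAT" = 937, "LIGHTNING" = 816)

# Draw into a temporary PDF file instead of the screen device
pdf(file.path(tempdir(), "fatalities.pdf"))
par(mar = c(9, 4, 3, 1))  # enlarge the bottom margin for the rotated labels
mids <- barplot(fat_totals, names.arg = names(fat_totals), las = 2,
                ylab = "Total fatalities, 1950-2011",
                main = "Event types with the most fatalities")
invisible(dev.off())
```

barplot() returns the bar midpoints, which can be reused to place annotations such as the totals above each bar.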

We now look at the mean number of fatalities and injuries per event for each event type, and again compare the top five in a bar plot.

head(fatm,5)
##                  EVTYPE FATALITIES  INJURIES
## 1 TROPICAL STORM GORDON   8.000000 43.000000
## 2          EXTREME HEAT   4.363636  7.045455
## 3     HEAT WAVE DROUGHT   4.000000 15.000000
## 4         MARINE MISHAP   3.500000  2.500000
## 5         WINTER STORMS   3.333333  5.666667
head(injm,5)
##                    EVTYPE FATALITIES INJURIES
## 1   TROPICAL STORM GORDON       8.00     43.0
## 2              WILD FIRES       0.75     37.5
## 3      HIGH WIND AND SEAS       3.00     20.0
## 4       HEAT WAVE DROUGHT       4.00     15.0
## 5 WINTER STORM HIGH WINDS       1.00     15.0
par(mfrow=c(1,2))
barplot(head(fatm$FATALITIES,5),col=color)
legend("topright",legend=name_fatm,fill=color)
barplot(head(injm$INJURIES,5),col=color)
legend("topright",legend=name_injm,fill=color)
title(main="Mean fatalities and injuries per event for the top event types, 1950-2011, across the US",outer=TRUE,line=-2)

Finally, we look at the economic consequences of the events, proceeding in the same way with the property damage variable PROPDMG.

sum_eco<-aggregate(.~EVTYPE,data=eco,FUN=sum)
x<-dim(sum_eco)
sum_eco<-filter(sum_eco,PROPDMG >0)
y<-dim(sum_eco)

About 41.2% of the event types (406 of 985) have a non-zero property damage total. We then list the five most costly event types, first by total over the period and then by mean per occurrence, and compare them in bar plots.

head(eco_fat,5)
##              EVTYPE   PROPDMG
## 1           TORNADO 3212258.2
## 2       FLASH FLOOD 1420124.6
## 3         TSTM WIND 1335965.6
## 4             FLOOD  899938.5
## 5 THUNDERSTORM WIND  876844.2
head(eco_fatm,5)
##                   EVTYPE PROPDMG
## 1        COASTAL EROSION     766
## 2   HEAVY RAIN AND FLOOD     600
## 3 RIVER AND STREAM FLOOD     600
## 4              Landslump     570
## 5  BLIZZARD/WINTER STORM     500
par(mfrow=c(2,1))
barplot(head(eco_fat$PROPDMG,5),main="Total property damage by event type, 1950-2011, across the US",col=color)
legend("topright",legend=name_eco_sum,fill=color)
barplot(head(eco_fatm$PROPDMG,5),ylim=c(0,4000),main="Mean property damage per event, 1950-2011, across the US",col=color)
legend("topright",legend=name_eco_mean,fill=color)