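The aim of this research work is to analyse the impact of storms and other severe weather events on population health and on the economy in the US. The data are obtained from the U.S. National Oceanic and Atmospheric Administration's (NOAA) Storm Database. This work addresses two specific research questions: across the US, (1) which types of events are most harmful to population health, and (2) which types of events have the greatest economic consequences?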
Loading required libraries
library(data.table)
library(ggplot2)
library(R.utils)
Downloading the dataset and loading it into the stormData variable. The compressed CSV file is downloaded from the URL given in the download.file() call.
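# Download the compressed CSV; mode = "wb" keeps the binary file intact on Windows
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              "repdata-data-StormData.csv.bz2", mode = "wb")
# Decompress (bunzip2() comes from R.utils); overwrite = TRUE lets the chunk be re-run
bunzip2("repdata-data-StormData.csv.bz2", overwrite = TRUE)
stormData <- read.csv("repdata-data-StormData.csv")
stormData <- data.table(stormData)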
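R's read.csv() can also read a bzip2-compressed file directly, because the file() connection it opens detects the compression; that would make the separate bunzip2() step optional. A minimal sketch, using a small hypothetical file written to a temporary location rather than the real download:

```r
# Write a tiny hypothetical bzip2-compressed CSV as a stand-in for
# repdata-data-StormData.csv.bz2
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# No bunzip2() step: read.csv() decompresses on the fly
sample.data <- read.csv(tmp)
print(sample.data)
```

On the full storm file this still decompresses the data in memory, so the only saving is the extra uncompressed copy on disk.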
This section reports two analyses.
The first analysis lists and plots the five event types that are most harmful to population health in the US.
Data preparation: Extracting the appropriate columns.
The original dataset is reduced to the relevant columns only: “EVTYPE”, “FATALITIES” and “INJURIES”. A new table named “dt.event.health” is then created, holding the total fatalities, the total injuries and their sum for each event type.
stormData.event.health <- stormData[, list(EVTYPE, FATALITIES, INJURIES)]
# Sum per event type; the names in list() become the column names
dt.event.health <- stormData.event.health[, list(Total.Fatalities = sum(FATALITIES),
                                                 Total.Injuries   = sum(INJURIES),
                                                 Total            = sum(FATALITIES) + sum(INJURIES)),
                                          by = EVTYPE]
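To make the grouped aggregation concrete, here is a minimal sketch on a small hypothetical table with the same three columns. The `by` argument splits the rows by event type, and the `j` expression is evaluated once per group:

```r
library(data.table)

# Small hypothetical table with the three columns kept above
toy <- data.table(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0, 1),
  INJURIES   = c(10, 1, 4, 0, 2)
)

# One output row per EVTYPE, in order of first appearance
toy.health <- toy[, list(Total.Fatalities = sum(FATALITIES),
                         Total.Injuries   = sum(INJURIES),
                         Total            = sum(FATALITIES) + sum(INJURIES)),
                  by = EVTYPE]
print(toy.health)
```

Here TORNADO collapses to 3 fatalities and 14 injuries (Total 17), while the HAIL and FLOOD rows sum their own two and one rows respectively.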
Order the table by “Total” in decreasing order
dt.event.health <- dt.event.health[order(-Total)]
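As an alternative for the ordering step, data.table's setorder() sorts a table in place, without making a copy, which can matter for large tables. A minimal sketch on hypothetical totals:

```r
library(data.table)

# Hypothetical per-event totals, in no particular order
totals <- data.table(EVTYPE = c("HAIL", "TORNADO", "FLOOD"),
                     Total  = c(1, 17, 3))

# Sort by reference; the leading "-" requests decreasing order
setorder(totals, -Total)
print(totals)

# The top N rows can then be taken as usual
head(totals, 2)
```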
Displaying the top 5 events
head(dt.event.health,5)
## EVTYPE Total.Fatalities Total.Injuries Total
## 1: TORNADO 5633 91346 96979
## 2: EXCESSIVE HEAT 1903 6525 8428
## 3: TSTM WIND 504 6957 7461
## 4: FLOOD 470 6789 7259
## 5: LIGHTNING 816 5230 6046
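The gap between the first two rows is large. A short calculation on the totals printed above makes it explicit, and also gives each event's share of the combined top-5 total:

```r
# Combined fatalities + injuries, copied from the top-5 table above
totals <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
            "FLOOD" = 7259, "LIGHTNING" = 6046)

# How many times larger the tornado total is than the runner-up
ratio <- totals[["TORNADO"]] / totals[["EXCESSIVE HEAT"]]
round(ratio, 1)    # 11.5

# Each event's percentage share of the top-5 combined total
round(100 * totals / sum(totals), 1)
```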
Plotting the top 5 events using ggplot
# Bars are ordered from the highest to the lowest Total
ggplot(data = dt.event.health[1:5], aes(x = reorder(EVTYPE, -Total), y = Total, fill = EVTYPE)) +
  geom_col() +
  labs(x = "Event Type", y = "Total fatalities and injuries", fill = "Event Type",
       title = "Top 5 Events with Highest Impact on Population Health in US") +
  theme_bw()
The graph above shows that tornadoes are by far the event type most harmful to the health of the US population: 96,979 fatalities and injuries combined, more than eleven times the 8,428 recorded for excessive heat, which ranks second. Lightning ranks fifth.
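Because “Total” adds fatalities and injuries together, a single bar hides how the two compare. One way to show the split is a stacked bar chart. The sketch below is an illustrative alternative, not part of the original analysis: it takes the top-5 values printed earlier and reshapes them to long format with data.table's melt().

```r
library(data.table)
library(ggplot2)

# Top-5 totals copied from the table printed above
top5 <- data.table(
  EVTYPE           = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  Total.Fatalities = c(5633, 1903, 504, 470, 816),
  Total.Injuries   = c(91346, 6525, 6957, 6789, 5230)
)

# Long format: one row per (event type, outcome) pair
top5.long <- melt(top5, id.vars = "EVTYPE",
                  variable.name = "Outcome", value.name = "Count")

# Stacked bars, ordered by the combined count per event type
p <- ggplot(top5.long, aes(x = reorder(EVTYPE, -Count, FUN = sum), y = Count, fill = Outcome)) +
  geom_col() +
  labs(x = "Event Type", y = "Number of people", fill = "Outcome") +
  theme_bw()
print(p)
```

With tornadoes so far ahead, a log scale or a faceted chart may make the smaller bars easier to read.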
The second analysis lists and plots the five event types with the greatest economic consequences in the US.
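Data preparation: extracting the relevant columns, “EVTYPE”, “PROPDMG” and “CROPDMG”, and totalling property and crop damage by event type. Note that PROPDMG and CROPDMG store the damage amounts without their magnitude, which the dataset records separately in “PROPDMGEXP” and “CROPDMGEXP” (for example “K” for thousands, “M” for millions, “B” for billions). Those multipliers are not applied here, so the totals below are sums of the raw values, not dollar amounts.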
stormData.event.eco <- stormData[, list(EVTYPE, PROPDMG, CROPDMG)]
# Sum per event type; the names in list() become the column names
dt.event.eco <- stormData.event.eco[, list(Total.Property = sum(PROPDMG),
                                           Total.Crop     = sum(CROPDMG),
                                           Total          = sum(PROPDMG) + sum(CROPDMG)),
                                    by = EVTYPE]
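The totals above add up the raw PROPDMG and CROPDMG values. In the NOAA data these columns are paired with PROPDMGEXP and CROPDMGEXP, which hold a magnitude code such as “K”, “M” or “B”. Below is a minimal sketch of converting to dollars before aggregating, on a small hypothetical table. The helper exp.to.multiplier() is hypothetical. Treating “H” as hundreds, ignoring case, and giving every other code (blank, digits, “+”, “-”, “?”) a multiplier of 1 are simplifying assumptions, not the author's method.

```r
library(data.table)

# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: H/K/M/B (any case) = 1e2/1e3/1e6/1e9; anything else = 1.
exp.to.multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])
  mult[is.na(mult)] <- 1    # unknown or blank codes
  mult
}

# Small hypothetical sample with the damage and magnitude columns
storm.sample <- data.table(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  PROPDMG    = c(2.5, 10, 1.2),
  PROPDMGEXP = c("M", "K", "b"),
  CROPDMG    = c(0, 5, 3),
  CROPDMGEXP = c("", "K", "M")
)

# Scale to dollars first, then aggregate and order as in the analysis
storm.sample[, Prop.Dollars := PROPDMG * exp.to.multiplier(PROPDMGEXP)]
storm.sample[, Crop.Dollars := CROPDMG * exp.to.multiplier(CROPDMGEXP)]
eco.sample <- storm.sample[, list(Total.Property = sum(Prop.Dollars),
                                  Total.Crop     = sum(Crop.Dollars),
                                  Total          = sum(Prop.Dollars) + sum(Crop.Dollars)),
                           by = EVTYPE][order(-Total)]
print(eco.sample)
```

In this sample, FLOOD outranks TORNADO once the “b” (billions) code is applied, even though its raw PROPDMG value is the smallest. Scaling can therefore change the ranking.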
Order the table by “Total” in decreasing order
dt.event.eco <- dt.event.eco[order(-Total)]
Displaying the top 5 events
head(dt.event.eco,5)
## EVTYPE Total.Property Total.Crop Total
## 1: TORNADO 3212258.2 100018.5 3312277
## 2: FLASH FLOOD 1420124.6 179200.5 1599325
## 3: TSTM WIND 1335965.6 109202.6 1445168
## 4: HAIL 688693.4 579596.3 1268290
## 5: FLOOD 899938.5 168037.9 1067976
Plotting the top 5 events using ggplot
# Bars are ordered from the highest to the lowest Total
ggplot(data = dt.event.eco[1:5], aes(x = reorder(EVTYPE, -Total), y = Total, fill = EVTYPE)) +
  geom_col() +
  labs(x = "Event Type", y = "Total damage (PROPDMG + CROPDMG)", fill = "Event Type",
       title = "Top 5 Events with Highest Economic Consequences in US") +
  theme_bw()
The graph above shows the five event types with the highest economic consequences in the US. Tornadoes again rank first, with a combined total roughly twice that of flash floods, which rank second. Hail stands out for crop damage: its crop total is the largest among the five.
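The balance between property and crop damage differs considerably by event type. A short calculation on the top-5 table printed above, using the same raw PROPDMG/CROPDMG sums as the analysis, shows what fraction of each event's total comes from crops:

```r
# Property and crop totals copied from the top-5 economic table above
eco5 <- data.frame(
  EVTYPE         = c("TORNADO", "FLASH FLOOD", "TSTM WIND", "HAIL", "FLOOD"),
  Total.Property = c(3212258.2, 1420124.6, 1335965.6, 688693.4, 899938.5),
  Total.Crop     = c(100018.5, 179200.5, 109202.6, 579596.3, 168037.9)
)

# Fraction of each event's total that comes from crop damage
eco5$Crop.Share <- eco5$Total.Crop / (eco5$Total.Property + eco5$Total.Crop)
eco5[order(-eco5$Crop.Share), c("EVTYPE", "Crop.Share")]
```

Hail comes out on top, with close to half of its total from crop damage, whereas crop damage is only a few percent of the tornado total.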