Storms and other severe weather events can cause both public health and economic problems.
Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
By exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and analyzing the population health and economic consequences of severe weather, we found that tornadoes cause the most fatalities and injuries, while floods cause the greatest combined property and crop damage.
1. Download the file and put it in the data folder
if(!file.exists("./data")){dir.create("./data")}
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="./data/Dataset.csv.bz2",method="curl")
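The download step above can be made safe to re-run. The following is a minimal sketch, assuming a hypothetical helper named `fetch_once` (not part of the original analysis) that skips the download when the archive is already on disk. It passes `mode = "wb"` so the compressed file is written as binary on every platform.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_once <- function(url, dest) {
  # Create the target folder if needed; no warning when it already exists.
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# fetch_once(fileUrl, "./data/Dataset.csv.bz2")  # uncomment to run the download
```

Calling the helper a second time with the same destination returns immediately without touching the network.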
2. Load the data
library(data.table)
Data <- read.csv(bzfile("./data/Dataset.csv.bz2"), stringsAsFactors=FALSE)
Data<- data.table(Data)
Rename the variables to lowercase for ease of coding.
oldnames<-names(Data)
newnames<-tolower(names(Data))
setnames(Data,oldnames,newnames)
str(Data)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ state__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ bgn_date : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ bgn_time : chr "0130" "0145" "1600" "0900" ...
## $ time_zone : chr "CST" "CST" "CST" "CST" ...
## $ county : num 97 3 57 89 43 77 9 123 125 57 ...
## $ countyname: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ state : chr "AL" "AL" "AL" "AL" ...
## $ evtype : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ bgn_range : num 0 0 0 0 0 0 0 0 0 0 ...
## $ bgn_azi : chr "" "" "" "" ...
## $ bgn_locati: chr "" "" "" "" ...
## $ end_date : chr "" "" "" "" ...
## $ end_time : chr "" "" "" "" ...
## $ county_end: num 0 0 0 0 0 0 0 0 0 0 ...
## $ countyendn: logi NA NA NA NA NA NA ...
## $ end_range : num 0 0 0 0 0 0 0 0 0 0 ...
## $ end_azi : chr "" "" "" "" ...
## $ end_locati: chr "" "" "" "" ...
## $ length : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ width : num 100 150 123 100 150 177 33 33 100 100 ...
## $ f : int 3 2 2 2 2 2 2 1 3 3 ...
## $ mag : num 0 0 0 0 0 0 0 0 0 0 ...
## $ fatalities: num 0 0 0 0 0 0 0 0 1 0 ...
## $ injuries : num 15 0 2 2 2 6 1 0 14 0 ...
## $ propdmg : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ propdmgexp: chr "K" "K" "K" "K" ...
## $ cropdmg : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cropdmgexp: chr "" "" "" "" ...
## $ wfo : chr "" "" "" "" ...
## $ stateoffic: chr "" "" "" "" ...
## $ zonenames : chr "" "" "" "" ...
## $ latitude : num 3040 3042 3340 3458 3412 ...
## $ longitude : num 8812 8755 8742 8626 8642 ...
## $ latitude_e: num 3051 0 0 0 0 ...
## $ longitude_: num 8806 0 0 0 0 ...
## $ remarks : chr "" "" "" "" ...
## $ refnum : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
Find related variables
According to the National Weather Service Storm Data Documentation, the variables related to population health and economic consequences are as follows:
| Variable name | Description |
|---|---|
| evtype | Event Type |
| fatalities | Number of occurrences of death: related to Population Health |
| injuries | Number of occurrences of injuries: related to Population Health |
| propdmg | Property damage amount, to be scaled by propdmgexp: related to Economic Consequences |
| propdmgexp | Magnitude code for property damage (“B”, “M”, “K”, “H”): related to Economic Consequences |
| cropdmg | Crop damage amount, to be scaled by cropdmgexp: related to Economic Consequences |
| cropdmgexp | Magnitude code for crop damage (“B”, “M”, “K”): related to Economic Consequences |
Note: “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions.
Scale the property damage variable propdmg
- If propdmgexp = B, multiply propdmg by 1,000,000,000
- If propdmgexp = M, multiply propdmg by 1,000,000
- If propdmgexp = K, multiply propdmg by 1,000
- If propdmgexp = H, multiply propdmg by 100
- Otherwise, set propdmg = NA
Data <- Data[, propdmgexp := toupper(propdmgexp)]
Data[, .N, propdmgexp]
## propdmgexp N
## 1: K 424665
## 2: M 11337
## 3: 465934
## 4: B 40
## 5: + 5
## 6: 0 216
## 7: 5 28
## 8: 6 4
## 9: ? 8
## 10: 4 4
## 11: 2 13
## 12: 3 4
## 13: H 7
## 14: 7 5
## 15: - 1
## 16: 1 25
## 17: 8 1
Data <- Data[, propdmg := ifelse(propdmgexp == "B", propdmg * 1E9,
                          ifelse(propdmgexp == "M", propdmg * 1E6,
                          ifelse(propdmgexp == "K", propdmg * 1E3,
                          ifelse(propdmgexp == "H", propdmg * 1E2, NA))))]
summary(Data$propdmg)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00e+00 0.00e+00 1.00e+03 9.80e+05 1.00e+04 1.15e+11 466248
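The nested `ifelse` calls above can also be written as a lookup in a named vector, which keeps the code-to-multiplier mapping in one place. This is a minimal sketch in base R, not the code used in this report; `multipliers` and `scale_damage` are hypothetical names introduced for illustration. Codes that are not in the table (blank, "?", digits, and so on) yield NA, which matches the rule described above.

```r
# Multipliers for the documented exponent codes; all other codes map to NA.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale `value` by the multiplier named in `exp_code`.
scale_damage <- function(value, exp_code) {
  # Indexing by name returns NA for names that are absent from the vector.
  value * unname(multipliers[toupper(exp_code)])
}

# Yields 25000, 2500000, 1e9, NA, NA
scale_damage(c(25, 2.5, 1, 3, 7), c("K", "m", "B", "", "?"))
```

The same helper would apply to the crop damage columns, with the caveat that "H" is then accepted as well.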
Scale the crop damage variable cropdmg
- If cropdmgexp = B, multiply cropdmg by 1,000,000,000
- If cropdmgexp = M, multiply cropdmg by 1,000,000
- If cropdmgexp = K, multiply cropdmg by 1,000
- Otherwise, set cropdmg = NA
Data <- Data[, cropdmgexp := toupper(cropdmgexp)]
Data[, .N, cropdmgexp]
## cropdmgexp N
## 1: 618413
## 2: M 1995
## 3: K 281853
## 4: B 9
## 5: ? 7
## 6: 0 19
## 7: 2 1
Data <- Data[, cropdmg := ifelse(cropdmgexp == "B", cropdmg * 1E9,
                          ifelse(cropdmgexp == "M", cropdmg * 1E6,
                          ifelse(cropdmgexp == "K", cropdmg * 1E3, NA)))]
summary(Data$cropdmg)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00e+00 0.00e+00 0.00e+00 1.73e+05 0.00e+00 5.00e+09 618440
Subset storm dataset
tidyData<-subset(Data,select = c('evtype','fatalities','injuries', 'propdmg',
'propdmgexp', 'cropdmg', 'cropdmgexp'))
Rename the variables with descriptive names
oldnames<-names(tidyData)
newnames<-c('eventType','fatalities','injuries', 'propertyDamage', 'propertyDamageLevel', 'cropDamage', 'cropDamageLevel')
setnames(tidyData,oldnames,newnames)
str(tidyData)
## Classes 'data.table' and 'data.frame': 902297 obs. of 7 variables:
## $ eventType : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ fatalities : num 0 0 0 0 0 0 0 0 1 0 ...
## $ injuries : num 15 0 2 2 2 6 1 0 14 0 ...
## $ propertyDamage : num 25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
## $ propertyDamageLevel: chr "K" "K" "K" "K" ...
## $ cropDamage : num NA NA NA NA NA NA NA NA NA NA ...
## $ cropDamageLevel : chr "" "" "" "" ...
## - attr(*, ".internal.selfref")=<externalptr>
fatalitiesData <- aggregate(fatalities ~ eventType, data=tidyData, sum)
fatalitiesData<-fatalitiesData[order(-fatalitiesData$fatalities), ][1:10, ]
fatalitiesData$eventType <- factor(fatalitiesData$eventType, levels = fatalitiesData$eventType)
str(fatalitiesData)
## 'data.frame': 10 obs. of 2 variables:
## $ eventType : Factor w/ 10 levels "TORNADO","EXCESSIVE HEAT",..: 1 2 3 4 5 6 7 8 9 10
## $ fatalities: num 5633 1903 978 937 816 ...
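The aggregate, sort, and re-level pattern used here recurs for injuries and for total damages. It can be captured in one function. The following is a sketch under the assumption that the input has an `eventType` column; `top_events` is a hypothetical helper and the toy data is made up for illustration.

```r
# Hypothetical helper: total `measure` per event type and keep the `n` largest.
top_events <- function(data, measure, n = 10) {
  totals <- aggregate(data[[measure]], by = list(eventType = data$eventType), FUN = sum)
  names(totals)[2] <- measure
  totals <- totals[order(-totals[[measure]]), ]
  totals <- head(totals, n)
  # Fix the factor levels in ranked order so plots keep the ranking.
  totals$eventType <- factor(totals$eventType, levels = totals$eventType)
  totals
}

# Made-up toy data, not taken from the storm database.
toy <- data.frame(
  eventType  = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  fatalities = c(3, 1, 2, 4, 0)
)
top_events(toy, "fatalities", n = 2)  # TORNADO (5), then HEAT (4)
```

Setting the factor levels explicitly matters because ggplot2 otherwise orders a character x-axis alphabetically rather than by rank.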
library(ggplot2)
# Note: `las` is a base-graphics parameter that ggplot2 ignores, so it is omitted here.
ggplot(fatalitiesData, aes(x = eventType, y = fatalities)) +
  geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Fatalities") +
  ggtitle("Top 10 Harmful Weather Events Evaluated by Fatalities")
Analysis:
From the above plot, Tornado is the most harmful event type as measured by number of fatalities.
The top 3 harmful weather events by number of fatalities are Tornado, Excessive Heat, and Flash Flood.
Tornado causes far more fatalities than any other event type in the top 10.
injuriesData <- aggregate(injuries ~ eventType, data=tidyData, sum)
injuriesData<-injuriesData[order(-injuriesData$injuries), ][1:10, ]
injuriesData$eventType <- factor(injuriesData$eventType, levels = injuriesData$eventType)
str(injuriesData)
## 'data.frame': 10 obs. of 2 variables:
## $ eventType: Factor w/ 10 levels "TORNADO","TSTM WIND",..: 1 2 3 4 5 6 7 8 9 10
## $ injuries : num 91346 6957 6789 6525 5230 ...
# Note: `las` is a base-graphics parameter that ggplot2 ignores, so it is omitted here.
ggplot(injuriesData, aes(x = eventType, y = injuries)) +
  geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Injuries") +
  ggtitle("Top 10 Harmful Weather Events Evaluated by Injuries")
Analysis:
From the above plot, Tornado is the most harmful event type as measured by number of injuries.
The top 3 harmful weather events by number of injuries are Tornado, TSTM Wind, and Flood.
Tornado causes far more injuries than any other event type in the top 10.
Economic consequences include Crop Damages and Property Damages.
Crop Damages and Property Damages often occur at the same time.
Therefore, the sum of Crop Damages and Property Damages will be used to evaluate the harmful weather events.
1. Subset the dataset to get the top 10 harmful event types by the sum of crop and property damages
damagesData <- aggregate(propertyDamage + cropDamage ~ eventType, data=tidyData, sum)
names(damagesData)<- c('eventType', 'totalDamages' )
damagesData<-damagesData[order(-damagesData$totalDamages), ][1:10, ]
damagesData$eventType <- factor(damagesData$eventType, levels = damagesData$eventType)
str(damagesData)
## 'data.frame': 10 obs. of 2 variables:
## $ eventType : Factor w/ 10 levels "FLOOD","HURRICANE/TYPHOON",..: 1 2 3 4 5 6 7 8 9 10
## $ totalDamages: num 1.38e+11 2.93e+10 1.65e+10 1.24e+10 1.01e+10 ...
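One point worth checking about this step: the formula interface of `aggregate` uses `na.action = na.omit` by default, so a row contributes to the total only when both propertyDamage and cropDamage are non-missing. Because unrecognized and blank exponent codes were mapped to NA earlier, this may exclude many events. The sketch below uses made-up toy data to contrast the default with an alternative that counts a missing damage value as zero. Treating NA as zero is an assumption for illustration, not the method used in this report.

```r
# Made-up toy data, not taken from the storm database.
toy <- data.frame(
  eventType      = c("FLOOD", "FLOOD", "TORNADO", "TORNADO"),
  propertyDamage = c(100, 50, 200, NA),
  cropDamage     = c(10, NA, NA, 5)
)

# Default formula behaviour: rows with any NA are dropped before summing.
strict <- aggregate(propertyDamage + cropDamage ~ eventType, data = toy, FUN = sum)

# Alternative (assumption): count a missing damage value as zero.
toy$totalDamages <- ifelse(is.na(toy$propertyDamage), 0, toy$propertyDamage) +
  ifelse(is.na(toy$cropDamage), 0, toy$cropDamage)
lenient <- aggregate(totalDamages ~ eventType, data = toy, FUN = sum)

strict   # FLOOD only, total 110
lenient  # FLOOD 160, TORNADO 205
```

Which treatment is appropriate depends on whether a blank exponent means "no damage recorded" or "damage unknown"; the ranking of event types can differ between the two.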
# Note: `las` is a base-graphics parameter that ggplot2 ignores, so it is omitted here.
ggplot(damagesData, aes(x = eventType, y = totalDamages)) +
  geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Property & Crop Damages") +
  ggtitle("Top 10 Harmful Weather Events Evaluated \n by Property & Crop Damages")
Analysis:
From the above plot, Flood is the most harmful event type as measured by combined crop and property damages.
The top 3 harmful weather events by combined crop and property damages are Flood, Hurricane/Typhoon, and Tornado.
Flood causes far more crop and property damage than any other event type in the top 10.