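Analysis of severe weather phenomena in NOAA storm data ranging from 1950 to November 2011, restricted here to events from January 2000 onwards. The results show that tornadoes caused the most fatalities and injuries, while floods caused the largest economic losses.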
library(data.table)
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2','NOAA.csv.bz2', method='curl')
data <- as.data.table(read.csv("NOAA.csv.bz2"))
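As written, the download is repeated on every run of the report. A minimal sketch of a guard is shown here; `fetch_once` is a hypothetical helper name that does not appear in the original analysis, and the `fetch` argument exists only so the download function can be swapped out.

```r
# Hypothetical helper (not part of the original analysis): download a file
# only when it is not already present on disk.
fetch_once <- function(url, dest, fetch = download.file) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed archive byte-for-byte intact
    fetch(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (commented out because it needs network access):
# fetch_once("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#            "NOAA.csv.bz2")
```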
Checking the characteristics of data variables in the data table
str(data)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "000","0000","0001",..: 152 167 2645 1563 2524 3126 122 1563 3126 3126 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 826 826 826 826 826 826 826 826 826 826 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","+","-","0",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","0","2","?",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
Considering data from January 2000 onwards. Changing the variable BGN_DATE (class: factor) to date format.
data$BGN_DATE <- as.POSIXct(strptime(as.character(data$BGN_DATE), "%m/%d/%Y %H:%M:%S"))
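To illustrate the conversion on its own, the sketch below parses a few made-up strings in the same month/day/year layout as BGN_DATE. Setting `tz = "UTC"` is an assumption added for this example so the result does not depend on the local time zone; the original code relies on the local zone.

```r
# Made-up sample strings in the same layout as BGN_DATE.
raw <- c("4/18/1950 0:00:00", "1/1/2000 0:00:00", "11/28/2011 0:00:00")

# tz = "UTC" is an assumption for this sketch, not part of the original code.
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Show the parsed values as ISO dates.
format(parsed, "%Y-%m-%d")

# A string that does not match the format becomes NA instead of raising an error.
bad <- as.POSIXct("not a date", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
is.na(bad)
```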
Subsetting the data based on time period (January 2000 to November 2011)
required.data <- subset(data, data$BGN_DATE > as.POSIXct("1999-12-31"))
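The strict comparison against "1999-12-31" keeps events from 1 January 2000 onwards only if every BGN_DATE carries a midnight time, as the factor level shown in the str() output suggests. Under that assumption, an inclusive cutoff states the intent more directly. The sketch uses a small made-up data frame, and `cutoff` and `recent` are illustrative names.

```r
# Made-up events around the cutoff; not rows from the NOAA file.
events <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
  BGN_DATE = as.POSIXct(c("1999-12-31", "2000-01-01", "2011-11-28"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Inclusive lower bound: keep everything that starts on or after 1 January 2000.
cutoff <- as.POSIXct("2000-01-01", tz = "UTC")
recent <- events[events$BGN_DATE >= cutoff, ]

recent$EVTYPE
```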
casualties <- required.data[, lapply(.SD, sum), by = EVTYPE, .SDcols = c("FATALITIES",
"INJURIES")]
summary(casualties)
##                 EVTYPE      FATALITIES          INJURIES
##  HIGH SURF ADVISORY:  1   Min.   :   0.00   Min.   :    0.0
##  FLASH FLOOD       :  1   1st Qu.:   0.00   1st Qu.:    0.0
##  TSTM WIND         :  1   Median :   0.00   Median :    0.0
##  WATERSPOUT        :  1   Mean   :  30.58   Mean   :  179.2
##  ABNORMALLY DRY    :  1   3rd Qu.:   2.25   3rd Qu.:    5.0
##  ABNORMALLY WET    :  1   Max.   :1193.00   Max.   :15213.0
##  (Other)           :190
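The grouped sum used for `casualties` can be easier to follow on a tiny table. The rows below are made up for illustration, and the sketch assumes the data.table package is installed, as it is for the rest of this report.

```r
library(data.table)

# Made-up rows, not taken from the NOAA file.
toy <- data.table(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 0, 3, 0),
  INJURIES = c(10, 5, 1, 0, 0)
)

# .SD holds the columns named in .SDcols for each EVTYPE group, so
# lapply(.SD, sum) returns one total per column and group.
toy_totals <- toy[, lapply(.SD, sum), by = EVTYPE,
                  .SDcols = c("FATALITIES", "INJURIES")]
toy_totals
```

Groups appear in order of first occurrence, which is why the full result is not sorted by severity.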
Plotting fatalities against injuries on a scatter plot places the most harmful events toward the upper right of the plot.
par(mfrow = c(1, 1))
plot(casualties$FATALITIES, casualties$INJURIES, ylim = c(0, 16000), xlim = c(0,
1300), pch = 19, col = rgb(0, 1, 0, 0.5), cex = 1.5, xlab = "number of fatalities",
ylab = "number of injuries", main = "Fatalities and Injuries due to severe weather events")
casualties.sub <- casualties[which(casualties$FATALITIES > 300)]
text(casualties.sub$FATALITIES + 50, casualties.sub$INJURIES + 500, labels = casualties.sub$EVTYPE,
cex = 0.75)
Figure 1: Scatter plot with the number of injuries and fatalities due to severe weather events between 2000 and 2011.
casualties.sub
##            EVTYPE FATALITIES INJURIES
## 1:    FLASH FLOOD        600      812
## 2:        TORNADO       1193    15213
## 3: EXCESSIVE HEAT       1013     3708
## 4:      LIGHTNING        466     2993
## 5:    RIP CURRENT        340      208
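The threshold filter returns the events in their original order rather than by severity. As a rough sketch, the five totals from the table can be re-entered by hand and ranked; `top` and `ranked` are illustrative names, and base R is used so the example stands alone.

```r
# Totals copied by hand from the casualties.sub table in this report.
top <- data.frame(
  EVTYPE = c("FLASH FLOOD", "TORNADO", "EXCESSIVE HEAT", "LIGHTNING", "RIP CURRENT"),
  FATALITIES = c(600, 1193, 1013, 466, 340),
  INJURIES = c(812, 15213, 3708, 2993, 208),
  stringsAsFactors = FALSE
)

# Sort by fatalities, largest first.
ranked <- top[order(-top$FATALITIES), ]
ranked$EVTYPE

# Combined fatalities across the five events.
sum(top$FATALITIES)
```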
Conclusion: The plot shows five events that are more harmful than the others, each causing more than 300 fatalities (about 3,600 combined). These are tornadoes, excessive heat, flash floods, lightning, and rip currents.
Among these, tornadoes are the most devastating and harmful weather event, having caused around 1200 fatalities and over 15000 injuries during this period.
For this question, we have selected the following four variables: PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
Now we generate numeric values for these damages (property and crop), expressed in billions of USD.
library(car)
required.data$PROPDMG.B <- recode(as.character(required.data$PROPDMGEXP), "'K'=1e-6; 'M'=1e-3;'B'=1;''=0")
required.data$PROPDMG.B <- required.data$PROPDMG.B * required.data$PROPDMG
required.data$CROPDMG.B <- recode(as.character(required.data$CROPDMGEXP), "'K'=1e-6; 'M'=1e-3;'B'=1;''=0")
required.data$CROPDMG.B <- required.data$CROPDMG.B * required.data$CROPDMG
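The same exponent-to-multiplier mapping can be sketched in base R with a named lookup vector. This is an alternative to the `recode` call, not the method used in this report. It also differs in one respect: any code outside K, M, and B is treated as 0 here, which is an assumption of the sketch, whereas `recode` leaves unlisted codes unchanged. The names `to_billions` and `exponent_multiplier` are hypothetical.

```r
# Multipliers that express a damage figure in billions of USD,
# following the recode rules in this report for K, M and B.
to_billions <- c(K = 1e-6, M = 1e-3, B = 1)

# Hypothetical helper: map exponent codes to numeric multipliers.
exponent_multiplier <- function(exp_code) {
  mult <- unname(to_billions[as.character(exp_code)])
  # Assumption: any code outside K/M/B (including the empty string) counts as 0.
  mult[is.na(mult)] <- 0
  mult
}

# Made-up damage figures and exponent codes.
dmg <- c(25, 2.5, 1.2, 10, 7)
exp_code <- c("K", "M", "B", "", "?")

dmg_billions <- dmg * exponent_multiplier(exp_code)
dmg_billions
```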
Now we aggregate the data by event type to visualize the events that caused major economic losses during this period.
event.cost <- required.data[, lapply(.SD, sum), by = EVTYPE, .SDcols = c("PROPDMG.B",
"CROPDMG.B")]
summary(event.cost)
##                 EVTYPE      PROPDMG.B            CROPDMG.B
##  HIGH SURF ADVISORY:  1   Min.   :  0.00000   Min.   :0.0000
##  FLASH FLOOD       :  1   1st Qu.:  0.00000   1st Qu.:0.0000
##  TSTM WIND         :  1   Median :  0.00000   Median :0.0000
##  WATERSPOUT        :  1   Mean   :  1.68774   Mean   :0.1204
##  ABNORMALLY DRY    :  1   3rd Qu.:  0.00157   3rd Qu.:0.0000
##  ABNORMALLY WET    :  1   Max.   :134.69107   Max.   :9.1356
##  (Other)           :190
Now we plot the variables Crop damage against Property damage to understand which weather phenomena resulted in the highest economic loss.
plot(event.cost$PROPDMG.B, event.cost$CROPDMG.B, ylim = c(0, 10), xlim = c(0,
150), pch = 19, col = rgb(0, 1, 0, 0.5), cex = 1.5, xlab = "Property Damage (Billions of USD)",
ylab = "Crop Damage (Billions of USD)", main = "Economic losses due to severe weather phenomena")
event.cost.sub <- event.cost[which(event.cost$PROPDMG.B > 40)]
text(event.cost.sub$PROPDMG.B, event.cost.sub$CROPDMG.B + 0.5, labels = event.cost.sub$EVTYPE,
cex = 0.75)
Figure 2: Scatter plot showing the economic loss caused by severe weather phenomena between January 2000 and November 2011.
event.cost.sub
##               EVTYPE PROPDMG.B CROPDMG.B
## 1:       STORM SURGE  43.17094  0.000000
## 2:             FLOOD 134.69107  4.221934
## 3: HURRICANE/TYPHOON  69.30584  2.607873
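The table keeps property and crop damage in separate columns. A small sketch of the combined figure, re-entering the three rows by hand, is shown here; `TOTAL.B` is a hypothetical column name that does not exist in the original analysis.

```r
# Figures copied by hand from the event.cost.sub table (billions of USD).
cost <- data.frame(
  EVTYPE = c("STORM SURGE", "FLOOD", "HURRICANE/TYPHOON"),
  PROPDMG.B = c(43.17094, 134.69107, 69.30584),
  CROPDMG.B = c(0, 4.221934, 2.607873),
  stringsAsFactors = FALSE
)

# TOTAL.B (hypothetical name) is property plus crop damage per event type.
cost$TOTAL.B <- cost$PROPDMG.B + cost$CROPDMG.B

# Rank the events by combined damage, largest first.
cost_ranked <- cost[order(-cost$TOTAL.B), c("EVTYPE", "TOTAL.B")]
cost_ranked
```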
Conclusion: Severe weather events, namely storm surges, hurricanes/typhoons, and floods, cause severe economic losses to the US, with floods causing losses north of 135 billion USD in combined property and crop damage. While floods cause the greatest overall economic loss, droughts cause very high economic damage to crops.