The main aim of this project is to explore the NOAA Storm Data and establish a few facts about severe weather events and their consequences. The results of this analysis can be used to alert government agencies so that they can prepare for such events.
In this section, I read in the data and viewed its structure and the number of NAs in each column.
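# stringsAsFactors = TRUE keeps text columns as factors (not the default since R 4.0.0); the levels() recoding further down relies on this
stormdat <- read.csv("repdata%2Fdata%2FStormData.csv", na.strings = c("", "NA"), stringsAsFactors = TRUE)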
str(stormdat)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29600 levels "5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13512 1872 4597 10591 4371 10093 1972 23872 24417 4597 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 34 levels " N"," NW","E",..: NA NA NA NA NA NA NA NA NA NA ...
## $ BGN_LOCATI: Factor w/ 54428 levels "- 1 N Albion",..: NA NA NA NA NA NA NA NA NA NA ...
## $ END_DATE : Factor w/ 6662 levels "1/1/1993 0:00:00",..: NA NA NA NA NA NA NA NA NA NA ...
## $ END_TIME : Factor w/ 3646 levels " 0900CST"," 200CST",..: NA NA NA NA NA NA NA NA NA NA ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 23 levels "E","ENE","ESE",..: NA NA NA NA NA NA NA NA NA NA ...
## $ END_LOCATI: Factor w/ 34505 levels "- .5 NNW","- 11 ESE Jay",..: NA NA NA NA NA NA NA NA NA NA ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 18 levels "-","?","+","0",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 8 levels "?","0","2","B",..: NA NA NA NA NA NA NA NA NA NA ...
## $ WFO : Factor w/ 541 levels " CI","$AC","$AG",..: NA NA NA NA NA NA NA NA NA NA ...
## $ STATEOFFIC: Factor w/ 249 levels "ALABAMA, Central",..: NA NA NA NA NA NA NA NA NA NA ...
## $ ZONENAMES : Factor w/ 25111 levels " "| __truncated__,..: NA NA NA NA NA NA NA NA NA NA ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436773 levels "-2 at Deer Park\n",..: NA NA NA NA NA NA NA NA NA NA ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
colSums(is.na(stormdat))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME
## 0 0 0 0 0 1589
## STATE EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## 0 0 0 547332 287743 243411
## END_TIME COUNTY_END COUNTYENDN END_RANGE END_AZI END_LOCATI
## 238978 0 902297 0 724837 499225
## LENGTH WIDTH F MAG FATALITIES INJURIES
## 0 0 843563 0 0 0
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC
## 0 465934 0 618413 142069 248769
## ZONENAMES LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## 594029 47 0 40 0 287433
## REFNUM
## 0
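The raw NA counts above are easier to compare as a share of rows. A minimal sketch on a toy data frame (the `toy` data and the name `na_share` are invented for illustration and are not part of the analysis):

```r
# Toy data, invented for illustration only (not taken from the NOAA file).
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4
)

# is.na() on a data frame returns a logical matrix, so colMeans() gives
# the proportion of missing values in each column.
na_share <- colMeans(is.na(toy))
round(na_share, 2)
```

Applied to the storm data, the same call would be `colMeans(is.na(stormdat))`.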
storm1 <- stormdat
names(storm1) <- tolower(names(storm1))
names(storm1)
## [1] "state__" "bgn_date" "bgn_time" "time_zone" "county"
## [6] "countyname" "state" "evtype" "bgn_range" "bgn_azi"
## [11] "bgn_locati" "end_date" "end_time" "county_end" "countyendn"
## [16] "end_range" "end_azi" "end_locati" "length" "width"
## [21] "f" "mag" "fatalities" "injuries" "propdmg"
## [26] "propdmgexp" "cropdmg" "cropdmgexp" "wfo" "stateoffic"
## [31] "zonenames" "latitude" "longitude" "latitude_e" "longitude_"
## [36] "remarks" "refnum"
Two variables in the dataset relate to population health: fatalities and injuries. I first added the two variables together and then aggregated the combined variable by event type. I kept only the top 6 event types, since these are the highest priority.
storm1$fatalities[which(is.na(storm1$fatalities))] <- 0
storm1$injuries[which(is.na(storm1$injuries))] <- 0
storm1$pop.damage <- storm1$fatalities + storm1$injuries
pop.fatalities <- aggregate(pop.damage ~ evtype, data = storm1, FUN = sum)
pop.fatalities <- head(pop.fatalities[order(pop.fatalities$pop.damage, decreasing = TRUE), ])
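To make the combine, aggregate and rank steps concrete, here is a minimal sketch on a toy data frame. The values and the names `toy`, `totals` and `top` are invented for illustration; only the sequence of operations mirrors the analysis.

```r
# Toy data, invented for illustration only (not taken from the NOAA file).
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  fatalities = c(2, 0, 1, 3, 1, 0),
  injuries   = c(10, 4, 5, 0, 2, 1)
)

# Combine the two health variables into one harm measure per record.
toy$pop.damage <- toy$fatalities + toy$injuries

# Sum the combined measure within each event type.
totals <- aggregate(pop.damage ~ evtype, data = toy, FUN = sum)

# Sort from most to least harmful and keep the top 3 rows.
top <- head(totals[order(totals$pop.damage, decreasing = TRUE), ], 3)
top
```

Here `head()` is given an explicit `n = 3`; the analysis relies on the default of 6 rows.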
Two variables in the dataset capture the economic consequences: property damage and crop damage. Each damage amount is stored in two columns: a numeric value (propdmg, cropdmg) and a magnitude code (propdmgexp, cropdmgexp) such as K, M or B. I recoded each magnitude code to a numeric multiplier and multiplied it by the value to get a single cost figure. Finally, the property and crop damage costs were added together into a single column, and this total was aggregated by event type. Again, I kept only the top 6 event types, since these are the highest priority.
levels(storm1$propdmgexp)
## [1] "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K" "m"
## [18] "M"
levels(storm1$propdmgexp) <- c(0,0,0,0,10,
100,1000,10000,100000,
1000000,10000000,100000000,1000000000,
100,100,1000,1000000,1000000)
storm1$propdmgexp <- as.numeric(as.character(storm1$propdmgexp))
storm1$propdmgcount <- storm1$propdmg * storm1$propdmgexp
storm1$propdmgcount[which(is.na(storm1$propdmgcount))] <- 0
levels(storm1$cropdmgexp)
## [1] "?" "0" "2" "B" "k" "K" "m" "M"
levels(storm1$cropdmgexp) <- c(0,0,100,1000000000,1000,1000,1000000,1000000)
storm1$cropdmgexp <- as.numeric(as.character(storm1$cropdmgexp))
storm1$cropdmgcount <- storm1$cropdmg * storm1$cropdmgexp
storm1$cropdmgcount[which(is.na(storm1$cropdmgcount))] <- 0
storm1$damage <- storm1$propdmgcount + storm1$cropdmgcount
economic <- aggregate(damage ~ evtype, sum, data = storm1)
# economic already has one row per event type, so no second aggregation is needed
economic1 <- head(economic[order(economic$damage, decreasing = TRUE), ])
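The recoding above overwrites factor levels by position, so it depends on the exact order printed by `levels()`. As a hedged alternative, the same mapping can be written as a named lookup table. This is a sketch only: `exp_lookup` and `exp_multiplier` are hypothetical names, and the table simply mirrors the multipliers chosen in this analysis (including treating "0" and the symbols as zero).

```r
# Hypothetical lookup table mirroring the multipliers used in this analysis.
exp_lookup <- c("-" = 0, "?" = 0, "+" = 0, "0" = 0,
                "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
                "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
                "h" = 1e2, "k" = 1e3, "m" = 1e6, "b" = 1e9)

# Hypothetical helper: convert magnitude codes to numeric multipliers.
exp_multiplier <- function(code) {
  # Lower-case first so that "K"/"k" and "M"/"m" share one table entry.
  mult <- unname(exp_lookup[tolower(as.character(code))])
  # Missing or unrecognised codes contribute zero, as in the analysis.
  mult[is.na(mult)] <- 0
  mult
}

# Toy values, invented for illustration only.
codes  <- factor(c("K", "m", "B", NA, "2", "?"))
amount <- c(25, 2.5, 1, 10, 3, 7)
cost   <- amount * exp_multiplier(codes)
cost
```

Because the lookup is by name rather than by position, it would keep working if the set or order of levels changed, for example after reading the file with different options.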
print(pop.fatalities)
## evtype pop.damage
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
library(ggplot2)
ggplot(pop.fatalities, aes(x = reorder(evtype, -pop.damage), y = pop.damage)) + geom_bar(stat = "identity") + xlab("Event type") + ylab("Fatalities + injuries") + ggtitle("Events most harmful to population health")
Tornadoes are by far the most harmful event type, with 96,979 combined fatalities and injuries, more than eleven times the total for the next event type, excessive heat (8,428).
print(economic1)
## evtype damage
## 170 FLOOD 150319678250
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362333590
## 670 STORM SURGE 43323541000
## 244 HAIL 18761221670
## 153 FLASH FLOOD 18243990610
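# Plot the causes of the most severe economic damage
ggplot(economic1, aes(x = reorder(evtype, -damage), y = damage)) + geom_bar(stat = "identity") + xlab("Event type") + ylab("Property + crop damage") + ggtitle("Events causing the most economic damage")
The plot above shows the top 6 event types by economic damage. Floods caused the most damage (about 150 billion in combined property and crop costs), roughly twice the total for the next event type, hurricanes/typhoons (about 72 billion).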