Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

Loading Library

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.2.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.3
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Loading Data

# if (!file.exists("c:/coursera/storm.csv.bz2")) {
#     download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
#         "c:/coursera/storm.csv.bz2")
# }
# # unzip file
# if (!file.exists("c:/coursera/storm.csv")) {
#     library(R.utils)
#     bunzip2("c:/coursera/storm.csv.bz2", "c:/coursera/storm.csv", remove = FALSE)
# }
# # load data into R
#storm_data <- read.csv("c:/coursera/storm.csv")
storm_data<- read.csv("repdata_data_StormData.csv")
dim(storm_data)
## [1] 902297     37
str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

There are 902297 rows and 37 columns in storm data.

Extracting the data containing weather event, health, and economic impact information

head(storm_data,2)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                        14   100 3   0          0
## 2         NA         0                         2   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of reliable/complete records.

Adding a year variable

storm_data$year <- as.numeric(format(as.Date(storm_data$BGN_DATE,format="%m/%d/%Y %H:%M:%S"), "%Y"))
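
As a quick check of the date parsing above, the following sketch applies the same format string to the two `BGN_DATE` values shown in the `head()` output. The names `sample_dates`, `parsed`, and `sample_years` are made up for illustration.

```r
# Two BGN_DATE strings copied from the head() output above
sample_dates <- c("4/18/1950 0:00:00", "4/18/1950 0:00:00")

# as.Date() reads the string with the given format and keeps only the date part
parsed <- as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S")

# format(..., "%Y") extracts the four-digit year as text; as.numeric() converts it
sample_years <- as.numeric(format(parsed, "%Y"))
print(sample_years)
```

Both values should come out as the year 1950, which matches the first year of records in the database.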

Plotting the storms by year

ggplot(aes(x=year),data=storm_data)+geom_histogram(color="red",fill="green")+
                  xlab("Year")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Warning: position_stack requires constant width: output may be incorrect

Based on the histogram above, the number of recorded events begins to increase markedly around 1990, so we subset the data to the years 1990 through 2011.

storm_subset<- storm_data%>%filter(year>=1990)
dim(storm_subset)
## [1] 751740     38

Impact on Public Health

In this section we examine the number of fatalities and injuries caused by severe weather events. We want to identify the 10 most harmful types of weather events.

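storm_helper <- function(type, top=10, inputdata=storm_subset){
          # sum the column named by `type` for each event type
          col_id<-which(colnames(inputdata)==type)
          fields<- aggregate(inputdata[,col_id],by=list(inputdata$EVTYPE),FUN=sum)
          names(fields)<- c("EVTYPE",type)
          # sort in decreasing order of the summed column (looked up by name via [[type]])
          fields<- arrange(fields,desc(fields[[type]]))
          fields <- head(fields, n = top)
          # fix the factor level order so plots keep the sorted order
          fields <- within(fields, EVTYPE <- factor(x = EVTYPE, levels = fields$EVTYPE))
          return(fields)
}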

fatalities <- storm_helper("FATALITIES")
injuries <- storm_helper("INJURIES")
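
To illustrate what `storm_helper` is meant to compute, here is a minimal sketch on a made-up data frame. The object `toy` and all of its values are invented for illustration and are not taken from the NOAA data. It sums a column per event type, sorts in decreasing order, and keeps the top rows, using base R only.

```r
# Hypothetical toy data; values are invented for illustration
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type
totals <- aggregate(toy$FATALITIES, by = list(toy$EVTYPE), FUN = sum)
names(totals) <- c("EVTYPE", "FATALITIES")

# Sort by the total in decreasing order and keep the top 2 event types
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
top2 <- head(totals, n = 2)
print(top2)
```

In this toy case the two tornado rows are combined into a single total before ranking, which is the same grouping step the helper applies to `EVTYPE`.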

Impact on Economy

We convert the property damage and crop damage data into comparable numerical form. Both the PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: hundred (H), thousand (K), million (M), or billion (B).
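
As a worked example of the multiplier arithmetic: the first row in the `head()` output has `PROPDMG` 25 with `PROPDMGEXP` "K", so the damage is 25 × 10^3 = 25,000 dollars. The sketch below does the same calculation with a named lookup vector. The names `exp_lookup`, `dmg`, and `dmg_exp` are hypothetical, and the last two sample pairs are invented.

```r
# Map each multiplier letter to its power of ten (H, K, M, B as described above)
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# First two pairs come from the head() output; the last two are invented
dmg <- c(25, 2.5, 1.2, 3)
dmg_exp <- c("K", "K", "M", "B")

# Look up the exponent for each code and scale the reported amount
dollars <- dmg * 10^exp_lookup[toupper(dmg_exp)]
print(unname(dollars))
```

A lookup vector is only one way to express the mapping; the helper function in this report instead rewrites the codes in place and converts them with `as.numeric()`.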

convertHelper <- function (fieldName,newFieldName, dataset=storm_subset){
          totalLen <- dim(dataset)[2]
          index <- which(colnames(dataset) == fieldName)
          # replace each multiplier letter with its power of ten
          dataset[, index] <- as.character(dataset[, index])
          dataset[toupper(dataset[, index]) == "B", index] <- "9"
          dataset[toupper(dataset[, index]) == "M", index] <- "6"
          dataset[toupper(dataset[, index]) == "K", index] <- "3"
          dataset[toupper(dataset[, index]) == "H", index] <- "2"
          dataset[toupper(dataset[, index]) ==  "", index] <- "0"
          # any remaining non-numeric code becomes NA here
          dataset[, index] <- as.numeric(dataset[, index])
          # the damage amount sits in the column just before its exponent column
          dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
          names(dataset)[totalLen + 1] <- newFieldName
          return(dataset)
}

storm_subset<- convertHelper("PROPDMGEXP","propertyDamage")
## Warning in convertHelper("PROPDMGEXP", "propertyDamage"): NAs introduced by
## coercion
storm_subset<- convertHelper("CROPDMGEXP","cropDamage")
## Warning in convertHelper("CROPDMGEXP", "cropDamage"): NAs introduced by
## coercion
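
The coercion warnings above are expected: the `str()` output shows that `PROPDMGEXP` and `CROPDMGEXP` also contain codes such as "-", "?" and "+", which `as.numeric()` cannot parse. The sketch below reproduces this on a small hypothetical vector (`codes` is invented for illustration). How such rows should be treated is a judgment call that this analysis does not spell out.

```r
# Hypothetical sample of exponent codes after the letter replacement step
codes <- c("3", "6", "0", "?", "+", "-")

# Non-numeric strings become NA; the warning is suppressed here for a quiet demo
exponents <- suppressWarnings(as.numeric(codes))
print(exponents)

# Count how many codes could not be interpreted
n_bad <- sum(is.na(exponents))
print(n_bad)
```

Note that `sum()` returns `NA` for any group that contains an `NA` unless `na.rm = TRUE` is passed, so rows with unparseable codes can affect the per-event totals.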
Propety<- storm_helper("propertyDamage")
crop <- storm_helper("cropDamage")

Result

For the impact on public health, the two lists below rank severe weather event types by total fatalities and by total injuries.

fatalities
##            EVTYPE FATALITIES
## 1  EXCESSIVE HEAT       1903
## 2         TORNADO       1752
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6           FLOOD        470
## 7     RIP CURRENT        368
## 8       TSTM WIND        327
## 9       HIGH WIND        248
## 10      AVALANCHE        224
injuries   
##               EVTYPE INJURIES
## 1            TORNADO    26674
## 2              FLOOD     6789
## 3     EXCESSIVE HEAT     6525
## 4          LIGHTNING     5230
## 5          TSTM WIND     5022
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10      WINTER STORM     1321

The following pair of graphs shows total fatalities and total injuries caused by these severe weather events.

fatalitiesplot <- ggplot(aes(x=EVTYPE,weight = FATALITIES),data=fatalities)+geom_bar()+
  scale_y_continuous("Number of Fatalities") + 
  ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1990 - 2011")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
  
injuriesplot<-  ggplot(aes(x=EVTYPE,weight =INJURIES ),data = injuries)+geom_bar()+
  scale_y_continuous("Number of Injuries") + 
  ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1990 - 2011")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

grid.arrange(fatalitiesplot, injuriesplot, ncol = 2)

Based on the bar charts above, excessive heat and tornadoes caused the most fatalities, and tornadoes caused the most injuries, in the United States from 1990 to 2011.

For the impact on the economy, the two lists below rank event types by the total dollar cost of property damage and of crop damage.

Propety
##               EVTYPE propertyDamage
## 1              FLOOD   144657709807
## 2  HURRICANE/TYPHOON    69305840000
## 3        STORM SURGE    43323536000
## 4          HURRICANE    11868319010
## 5     TROPICAL STORM     7703890550
## 6       WINTER STORM     6688497251
## 7        RIVER FLOOD     5118945500
## 8           WILDFIRE     4765114000
## 9   STORM SURGE/TIDE     4641188000
## 10         TSTM WIND     4484928495
crop
##               EVTYPE  cropDamage
## 1            DROUGHT 13972566000
## 2              FLOOD  5661968450
## 3        RIVER FLOOD  5029459000
## 4          ICE STORM  5022113500
## 5               HAIL  3025954473
## 6          HURRICANE  2741910000
## 7  HURRICANE/TYPHOON  2607872800
## 8        FLASH FLOOD  1421317100
## 9       EXTREME COLD  1292973000
## 10      FROST/FREEZE  1094086000

The following graphs show total property damage and total crop damage caused by these severe weather events.

propertyplot <- ggplot(aes(x=EVTYPE,weight = propertyDamage),data=Propety)+geom_bar()+
  scale_y_continuous("Property Damage in US dollars") + 
  ggtitle("Total Property Damage by Severe Weather\n Events in the U.S.\n from 1990 - 2011")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  xlab("Severe Weather Type")
  
cropplot<-  ggplot(aes(x=EVTYPE,weight =cropDamage ),data = crop)+geom_bar()+
  scale_y_continuous("Crop Damage in US dollars") + 
  ggtitle("Total Crop Damage by Severe Weather\n Events in the U.S.\n from 1990 - 2011")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

grid.arrange(propertyplot, cropplot, ncol = 2)

Based on the bar charts above, floods and hurricanes/typhoons caused the most property damage, while drought and floods caused the most crop damage, in the United States from 1990 to 2011.

Conclusion

From these data, we found that excessive heat and tornadoes are most harmful with respect to population health, while floods, drought, and hurricanes/typhoons have the greatest economic impact.