Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site: Storm Data [47Mb]. There is also some documentation of the database available, where you will find how some of the variables are constructed/defined:

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.


CONTENTS
Synopsis
Data Processing
+Download data
+Loading data
+Make the data tidy
+Explore data
Results

Synopsis

The NOAA Storm Database contains information that can help us assess severe weather in terms of economic impact and public safety. This may help us improve our planning and preparation for severe weather. Specifically, we want to answer the following questions:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

Download data

if(!file.exists("stormData1.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormData1.csv.bz2")
}

Loading data

The data are compressed with bzip2, so load the package “R.utils” and use the function “bunzip2” to decompress the file.

library(R.utils)
bunzip2("stormData1.csv.bz2", "stormData.csv", remove = FALSE, skip = TRUE)
## [1] "stormData.csv"
## attr(,"temporary")
## [1] FALSE
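# Note: quote = "" turns off quote handling, so the literal double quotes
# remain in the data (hence names like X.EVTYPE. and values like "\"TORNADO\"").
storm <- read.csv("stormData.csv",
                  header = TRUE,
                  quote = "",
                  strip.white = TRUE,
                  stringsAsFactors = FALSE,
                  sep = ",")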
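
As a side note, base R's read.csv can usually read a bzip2-compressed file directly, because file connections detect the compression when a file is opened for reading, so the explicit bunzip2 step is optional. A minimal sketch on a tiny made-up file (the file and its contents are hypothetical, not the storm data):

```r
# Write a tiny bzip2-compressed CSV to a temporary file (made-up contents).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# read.csv opens the path through a file connection, which detects bzip2.
small <- read.csv(tmp, stringsAsFactors = FALSE)
print(small)
```

Decompressing first, as done above, still has the advantage of leaving a plain CSV on disk that other tools can open.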

Make the data tidy

The data set is huge and messy, and there are many ways to preprocess it. The way I dealt with it is probably not ideal, but it still shows what I need. The R file here won't show all the preprocessing because it would be far too long. If you are interested in it, feel free to check out my blog: Storm Data preprocessing

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion

You can leave your advice on the STORM DATA PREPROCESSING post. After preprocessing, this is what the data looks like:

dim(healthData)
## [1] 2644    5
str(healthData)
## 'data.frame':    2644 obs. of  5 variables:
##  $ X.EVTYPE.  : chr  "\"TORNADO\"" "\"TORNADO\"" "\"TORNADO\"" "\"TORNADO\"" ...
##  $ X.BGN_DATE.: chr  "2/13/1952 0:00:00" "2/13/1952 0:00:00" "3/22/1952 0:00:00" "2/20/1953 0:00:00" ...
##  $ FATALITIES : num  1 1 4 1 6 7 2 5 25 2 ...
##  $ INJURIES   : num  14 26 50 8 195 12 3 20 200 90 ...
##  $ total      : num  15 27 54 9 201 19 5 25 225 92 ...
dim(propData)
## [1] 239412      5

In healthData, I only keep the records where INJURIES and FATALITIES are greater than 0.
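
The preprocessing itself is not shown here. As a rough sketch of the filtering step only, and not the actual code from the blog post, the subset could be built along these lines. The toy data frame `raw` is made up for illustration, and the condition assumes that both counts must be positive:

```r
# Made-up stand-in for the loaded storm data (hypothetical values).
raw <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "TORNADO"),
  FATALITIES = c(1, 0, 2, 0),
  INJURIES   = c(14, 0, 3, 5),
  stringsAsFactors = FALSE
)

# Keep rows where both counts are positive (assumed reading of the filter).
health <- raw[raw$FATALITIES > 0 & raw$INJURIES > 0, ]

# Combined casualties per record, mirroring the `total` column shown above.
health$total <- health$FATALITIES + health$INJURIES
print(health)
```

If the intent were to keep records with either injuries or fatalities, the `&` would become `|`, which would retain more rows.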

Explore data

# Sum fatalities, injuries, and their combined total by event type
totFatalities <- aggregate(healthData$FATALITIES, by = list(healthData$X.EVTYPE.), FUN = sum)
totInjuries   <- aggregate(healthData$INJURIES, by = list(healthData$X.EVTYPE.), FUN = sum)
totTotal      <- aggregate(healthData$total, by = list(healthData$X.EVTYPE.), FUN = sum)

# Keep the 10 event types with the largest sums
topFatalities <- totFatalities[order(-totFatalities$x), ][1:10, ]
topInjuries   <- totInjuries[order(-totInjuries$x), ][1:10, ]
topTotal      <- totTotal[order(-totTotal$x), ][1:10, ]

# Same for property damage
totalData <- aggregate(propData$PropertyDamage, by = list(propData$X.EVTYPE.), FUN = sum)
topprop   <- totalData[order(-totalData$x), ][1:10, ]


topFatalities
##             Group.1    x
## 70        "TORNADO" 5199
## 11 "EXCESSIVE HEAT"  402
## 50      "LIGHTNING"  283
## 73      "TSTM WIND"  199
## 16    "FLASH FLOOD"  171
## 17          "FLOOD"  104
## 39      "HIGH WIND"   97
## 83   "WINTER STORM"   85
## 30           "HEAT"   73
## 80       "WILDFIRE"   55
topInjuries
##                Group.1     x
## 70           "TORNADO" 59729
## 11    "EXCESSIVE HEAT"  4791
## 17             "FLOOD"  2679
## 44         "ICE STORM"  1720
## 30              "HEAT"  1420
## 43 "HURRICANE/TYPHOON"  1219
## 3           "BLIZZARD"   718
## 50         "LIGHTNING"   649
## 73         "TSTM WIND"   646
## 16       "FLASH FLOOD"   641
topTotal
##                Group.1     x
## 70           "TORNADO" 64928
## 11    "EXCESSIVE HEAT"  5193
## 17             "FLOOD"  2783
## 44         "ICE STORM"  1755
## 30              "HEAT"  1493
## 43 "HURRICANE/TYPHOON"  1251
## 50         "LIGHTNING"   932
## 73         "TSTM WIND"   845
## 16       "FLASH FLOOD"   812
## 3           "BLIZZARD"   766
topprop
##                 Group.1            x
## 67              "FLOOD" 144621292807
## 182 "HURRICANE/TYPHOON"  69305840000
## 342           "TORNADO"  56851050779
## 286       "STORM SURGE"  43323536000
## 56        "FLASH FLOOD"  16115317867
## 114              "HAIL"  15731737043
## 181         "HURRICANE"  11868319010
## 350    "TROPICAL STORM"   7703890550
## 406      "WINTER STORM"   6688497251
## 163         "HIGH WIND"   5268996295
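
The event names in these tables still carry literal double quotes, a side effect of reading the file with quote = "". A small sketch of how the labels could be cleaned before plotting, using a made-up copy of the first rows of topFatalities:

```r
# Made-up copy of the first rows of topFatalities, quotes included.
top <- data.frame(
  Group.1 = c("\"TORNADO\"", "\"EXCESSIVE HEAT\"", "\"LIGHTNING\""),
  x       = c(5199, 402, 283),
  stringsAsFactors = FALSE
)

# Remove every literal double-quote character from the labels.
top$Group.1 <- gsub("\"", "", top$Group.1, fixed = TRUE)
print(top$Group.1)
```

The same gsub call could be applied to the Group.1 column of each top-10 table so the bar labels read cleanly.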

Results

Plot health impact

par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 3, cex = 0.7, cex.main = 1.4, cex.lab = 1.2)
barplot(topFatalities$x, names.arg = topFatalities$Group.1, col = 'red',
        main = 'Top 10 Weather Events for Fatalities', ylab = 'Number of Fatalities')
barplot(topInjuries$x, names.arg = topInjuries$Group.1, col = 'orange',
        main = 'Top 10 Weather Events for Injuries', ylab = 'Number of Injuries')

barplot(topTotal$x, names.arg = topTotal$Group.1, col = 'blue',
        main = 'Top 10 Weather Events for Injuries & Fatalities', ylab = 'Number of Injuries and Fatalities')

As you can see from the plots above, across the United States, TORNADO, EXCESSIVE HEAT, FLOOD, and LIGHTNING are the most harmful with respect to population health.

Plot property damage

barplot(topprop$x, names.arg = topprop$Group.1, col = 'blue',
        main = 'Top 10 Weather Events for Property Damage', ylab = 'Property Damage (USD)')

As you can see from the plot above, across the United States, FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, and FLASH FLOOD have the greatest economic consequences.