Synopsis

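The NOAA Storm Database records severe weather events in the United States together with estimates of the fatalities, injuries, and property damage they caused. Assessing these records can help communities plan and prepare for severe weather. Specifically, we want to answer the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

Because the earlier years of the database are less completely recorded, the analysis uses only events from 2000 onward and sums fatalities, injuries, and property damage (PROPDMG) by event type. Over that period, tornadoes caused the most fatalities (1,193) along with 15,213 injuries, followed by excessive heat with 1,013 fatalities. Flash floods, tornadoes, and thunderstorm winds rank highest by summed property damage, although that ranking does not apply the PROPDMGEXP magnitude codes and should be treated as approximate.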

Data Processing

Get the Data

Storm Data

## create a local directory for the data
localDir <- "data"
if (!file.exists(localDir)) {
  dir.create(localDir)
}

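## download the compressed data (read.csv reads .bz2 files directly, so no unzip step is needed)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file <- file.path(localDir, basename(url))
if (!file.exists(file)) {
  download.file(url, file, mode = "wb")  # binary mode avoids corrupting the archive on Windows
}

## read the raw CSV; as.is = TRUE keeps text columns as character
storm_data <- read.csv(file, as.is = TRUE)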
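Reading all 37 columns of the 902,297-row file is slow. A minimal sketch, assuming only the columns used later in this report are needed, skips the others by giving them the class "NULL" in colClasses. The helper name read_storm_subset and its default column list are illustrative choices, not part of the original analysis.

```r
# Hypothetical helper: read only the named columns of the raw CSV (.csv or
# .csv.bz2). Every other column gets class "NULL", so read.csv skips it.
read_storm_subset <- function(path,
                              keep = c("BGN_DATE", "EVTYPE",
                                       "FATALITIES", "INJURIES",
                                       "PROPDMG", "PROPDMGEXP",
                                       "CROPDMG", "CROPDMGEXP")) {
  # Read the header plus one row to learn the column names
  header <- names(read.csv(path, nrows = 1))
  # NA lets read.csv guess the column type; "NULL" drops the column
  classes <- rep(NA_character_, length(header))
  classes[!header %in% keep] <- "NULL"
  read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
}
```

Usage would be storm_data <- read_storm_subset(file). The knitr option cache = TRUE on this chunk is another way to avoid re-reading the file on every knit.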

## list the files in the local data directory
library(knitr)
downloaded_files <- as.data.frame(list.files(localDir))
names(downloaded_files) <- 'Files'
kable(downloaded_files)

Files

repdata%2Fdata%2FStormData.csv.bz2

Explore/Profile the Data

Additional information about the data is available from the National Weather Service: Storm Data Documentation

Data Dictionary

str(storm_data)

## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

library(lubridate)

## derive the event year from BGN_DATE (e.g. "4/18/1950 0:00:00"); as.Date ignores the trailing time
storm_data$YEAR <- year(as.Date(storm_data$BGN_DATE, '%m/%d/%Y'))
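EVTYPE is free text, so the same phenomenon can appear under several labels; the property-damage results below, for instance, list THUNDERSTORM WIND and TSTM WIND separately. A minimal sketch of label clean-up is shown here. The function name normalize_evtype and its rules are illustrative assumptions, not an official NWS mapping, and applying it would change the totals reported in the Results.

```r
# Hypothetical helper: tidy free-text EVTYPE labels before grouping.
# The rules are illustrative, not an official event-type mapping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                          # unify case, strip outer spaces
  x <- gsub("\\s+", " ", x)                        # collapse repeated inner spaces
  x <- sub("^TSTM WIND", "THUNDERSTORM WIND", x)   # expand a common abbreviation
  x
}
```

Usage would be storm_data$EVTYPE <- normalize_evtype(storm_data$EVTYPE), placed before the grouping queries.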

Histogram of Data Density

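hist(storm_data$YEAR, breaks = 62,
     main = "Storm events recorded per year",
     xlab = "Year", ylab = "Number of events")

Figure 1: Number of events recorded in the database for each year of BGN_DATE, 1950 to 2011.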

From the assignment:

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Because the earlier records are less complete, this analysis focuses on events that began in 2000 or later.
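To put a number on that cutoff, a short sketch tabulates events per year and reports how many records, and what share of all records, the 2000-onward subset keeps. The function name summarise_cutoff is hypothetical.

```r
# Hypothetical helper: per-year event counts plus the size of the subset
# kept by a year cutoff.
summarise_cutoff <- function(years, cutoff = 2000) {
  recent <- years >= cutoff
  list(
    per_year   = table(years),               # events recorded in each year
    kept       = sum(recent, na.rm = TRUE),  # records at or after the cutoff
    share_kept = mean(recent, na.rm = TRUE)  # fraction of all records kept
  )
}
```

Usage would be summarise_cutoff(storm_data$YEAR)$share_kept.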

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

library(sqldf)

sql <- "
  select
    EVTYPE as Event,
    sum(FATALITIES) as Fatalities,
    sum(INJURIES) as Injuries
  from
    storm_data
  where
    YEAR >= 2000
  group by
    EVTYPE
  order by
    sum(FATALITIES) desc,
    sum(INJURIES) desc
"
tbl_ds <- sqldf::sqldf(sql)

library(knitr)
knitr::kable(tbl_ds[1:10,])

Event	Fatalities	Injuries
TORNADO	1193	15213
EXCESSIVE HEAT	1013	3708
FLASH FLOOD	600	812
LIGHTNING	466	2993
RIP CURRENT	340	208
FLOOD	266	315
HEAT	231	1222
AVALANCHE	179	126
HIGH WIND	131	677
THUNDERSTORM WIND	130	1400

The table above shows the ten most harmful events from the year 2000 forward, ordered first by the number of fatalities then by the number of injuries.
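As a cross-check of the SQL query, the same totals can be computed in base R with aggregate(). This is a minimal sketch; the function name harm_by_event is hypothetical, and it assumes the YEAR column derived during data processing.

```r
# Hypothetical helper: total fatalities and injuries per event type for
# events from min_year onward, ranked by fatalities, then injuries.
harm_by_event <- function(df, min_year = 2000) {
  recent <- df[which(df$YEAR >= min_year), ]
  totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                      data = recent, FUN = sum)
  totals <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]
  rownames(totals) <- NULL
  totals
}
```

Usage would be head(harm_by_event(storm_data), 10), which should match the table above.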

Across the United States, which types of events have the greatest economic consequences?

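library(sqldf)

## sum the raw PROPDMG values by event type (the PROPDMGEXP magnitude codes are not applied here)
sql <- "
  select
    EVTYPE as Event,
    sum(PROPDMG) as PropertyDamage
  from
    storm_data
  where
    YEAR >= 2000
  group by
    EVTYPE
  order by
    sum(PROPDMG) desc
"
tbl_ds <- sqldf(sql)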

names(tbl_ds) <- c('Event', 'Property Damage')
library(knitr)
knitr::kable(tbl_ds[1:10,])

Event	Property Damage
FLASH FLOOD	999333.42
TORNADO	907111.70
THUNDERSTORM WIND	862257.36
TSTM WIND	811528.22
FLOOD	671747.56
HAIL	452533.47
LIGHTNING	395884.69
HIGH WIND	247108.53
WINTER STORM	97093.93
WILDFIRE	83007.34

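The table above shows the ten event types with the largest summed property damage (PROPDMG) from 2000 onward. These are raw PROPDMG sums: the query does not apply the PROPDMGEXP magnitude codes ("K" for thousands, "M" for millions, "B" for billions of dollars), so the values mix units and are not dollar amounts. Crop damage (CROPDMG) is not included, and the same phenomenon can appear under more than one label (THUNDERSTORM WIND and TSTM WIND), so this ranking should be treated as approximate.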
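A minimal sketch of converting damage to dollars before ranking follows. It assumes only the documented magnitude codes K, M, and B are scaled and leaves blank or undocumented codes at a multiplier of 1, which is a simplification. The names exp_multiplier, damage_by_event, and DAMAGE_USD are hypothetical, and the sketch also adds crop damage.

```r
# Hypothetical helper: map a PROPDMGEXP/CROPDMGEXP code to a dollar multiplier.
# Only K, M and B (any case) are scaled; blanks, NA and other codes stay at 1
# (a simplifying assumption).
exp_multiplier <- function(code) {
  code <- toupper(trimws(code))
  mult <- rep(1, length(code))
  mult[code %in% "K"] <- 1e3
  mult[code %in% "M"] <- 1e6
  mult[code %in% "B"] <- 1e9
  mult
}

# Hypothetical helper: property plus crop damage in dollars per event type
# for events from min_year onward, largest first.
damage_by_event <- function(df, min_year = 2000) {
  recent <- df[which(df$YEAR >= min_year), ]
  recent$DAMAGE_USD <- recent$PROPDMG * exp_multiplier(recent$PROPDMGEXP) +
    recent$CROPDMG * exp_multiplier(recent$CROPDMGEXP)
  totals <- aggregate(DAMAGE_USD ~ EVTYPE, data = recent, FUN = sum)
  totals <- totals[order(-totals$DAMAGE_USD), ]
  rownames(totals) <- NULL
  totals
}
```

Usage would be head(damage_by_event(storm_data), 10). The resulting ranking may differ from the raw-sum table above.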

Appendix A – Assignment and Evaluation Rubric

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb]. There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Questions

Your data analysis must address the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.

Requirements

For this assignment you will need some specific tools:

RStudio: You will need RStudio to publish your completed analysis document to RPubs. You can also use RStudio to edit/write your analysis.

knitr: You will need the knitr package in order to compile your R Markdown document and convert it to HTML.

Document Layout

Language: Your document should be written in English.

Title: Your document should have a title that briefly summarizes your data analysis.

Synopsis: Immediately after the title, there should be a synopsis which describes and summarizes your analysis in at most 10 complete sentences.

There should be a section titled Data Processing which describes (in words and code) how the data were loaded into R and processed for analysis. In particular, your analysis must start from the raw CSV file containing the data. You cannot do any preprocessing outside the document. If preprocessing is time-consuming you may consider using the cache = TRUE option for certain code chunks.

There should be a section titled Results in which your results are presented.

You may have other sections in your analysis, but Data Processing and Results are required.

The analysis document must have at least one figure containing a plot.

Your analysis must have no more than three figures. Figures may have multiple plots in them (i.e. panel plots), but there cannot be more than three figures total.

You must show all your code for the work in your analysis document. This may make the document a bit verbose, but that is okay. In general, you should ensure that echo = TRUE for every code chunk (this is the default setting in knitr).

Publishing Your Analysis

For this assignment you will need to publish your analysis on RPubs.com. If you do not already have an account, then you will have to create a new account. After you have completed writing your analysis in RStudio, you can publish it to RPubs by doing the following:

In RStudio, make sure your R Markdown document (.Rmd) is loaded in the editor.

Click the Knit HTML button in the doc toolbar to preview your document.

In the preview window, click the Publish button.

Once your document is published to RPubs, you should get a unique URL to that document. Make a note of this URL as you will need it to submit your assignment.

NOTE: If you are having trouble connecting with RPubs due to proxy-related or other issues, you can upload your final analysis document file as a PDF to Coursera instead.

Submitting Your Assignment

In order to submit this assignment, you must copy the RPubs URL for your completed data analysis document into the peer assessment question.

Please submit the URL from RPubs that points to your full report for this assignment.

NOTE: The URL for the RPubs document should begin with http:// (not https://)

NOTE: If you are having trouble connecting with RPubs due to proxy-related or other issues, you can upload your final analysis document file as a PDF to Coursera instead.

Evaluation/feedback on the above work

Note: this section can only be filled out during the evaluation phase. Has either a (1) valid RPubs URL pointing to a data analysis document for this assignment been submitted; or (2) a complete PDF file presenting the data analysis been uploaded?

Overall evaluation/feedback

Note: this section can only be filled out during the evaluation phase. Is the document written in English?

Does the document have a title that briefly summarizes the data analysis?

Does the document have a synopsis that describes and summarizes the data analysis in less than 10 sentences?

Is there a section titled “Data Processing” that describes how the data were loaded into R and processed for analysis?

Is there a section titled “Results” where the main results are presented?

Is there at least one figure in the document that contains a plot?

Are there at most 3 figures in this document?

Does the analysis start from the raw data file (i.e. the original .csv.bz2 file)?

Does the analysis address the question of which types of events are most harmful to population health?

Does the analysis address the question of which types of events have the greatest economic consequences?

Do all the results of the analysis (i.e. figures, tables, numerical summaries) appear to be reproducible?

Do the figure(s) have descriptive captions (i.e. there is a description near the figure of what is happening in the figure)?

Does the analysis include description and justification for any data transformations?

As far as you can determine, does it appear that the work submitted for this project is the work of the student who submitted it?

Use this space to provide constructive feedback to the student who submitted the work. Point out both strengths and weaknesses in the submission and provide advice about how the work could be improved in the future.

Analysis of Severe Weather Events Using NOAA Storm Database

John M. Hay

2016-02-01
