U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database - Report (1950 to 2011)

SYNOPSIS:-

This project examines in detail the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The data for this project and the related documents are available here:

The STORM DATA records characteristics of major storms and weather events in the United States from 1950 to November 2011, including when and where they occurred, as well as estimates of any fatalities, injuries, and property damage. The main objective of this project is to analyze the data and answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

After loading and processing the data, the answers are estimated as follows:

  1. Across the U.S., TORNADO is the most harmful event type in terms of human fatalities and injuries;

  2. Across the U.S., FLOOD is the event type with the greatest economic consequences.

DATA PROCESSING :-

  • Set or create the working directory

  • Download the STORM DATA into the working directory

  • Read the STORM DATA

  • Analyze data

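# Create the data directory only if it does not already exist
if (!dir.exists("STORM_DATA")) dir.create("STORM_DATA")
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest <- "STORM_DATA/repdata-data-StormData.csv.bz2"
# Download once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists(dest)) download.file(URL, destfile = dest, mode = "wb")
# Read from the same path the file was saved to (read.csv handles .bz2 directly)
storm_data <- read.csv(dest, header = TRUE, sep = ",", stringsAsFactors = FALSE)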
dim(storm_data)
## [1] 902297     37
summary(storm_data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI       
##  Min.   :   0.000   Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character  
##  Mean   :   1.484                                        
##  3rd Qu.:   1.000                                        
##  Max.   :3749.000                                        
##                                                          
##    END_DATE           END_TIME           COUNTY_END COUNTYENDN    
##  Length:902297      Length:902297      Min.   :0    Mode:logical  
##  Class :character   Class :character   1st Qu.:0    NA's:902297   
##  Mode  :character   Mode  :character   Median :0                  
##                                        Mean   :0                  
##                                        3rd Qu.:0                  
##                                        Max.   :0                  
##                                                                   
##    END_RANGE          END_AZI           END_LOCATI       
##  Min.   :  0.0000   Length:902297      Length:902297     
##  1st Qu.:  0.0000   Class :character   Class :character  
##  Median :  0.0000   Mode  :character   Mode  :character  
##  Mean   :  0.9862                                        
##  3rd Qu.:  0.0000                                        
##  Max.   :925.0000                                        
##                                                          
##      LENGTH              WIDTH                F               MAG         
##  Min.   :   0.0000   Min.   :   0.000   Min.   :0.0      Min.   :    0.0  
##  1st Qu.:   0.0000   1st Qu.:   0.000   1st Qu.:0.0      1st Qu.:    0.0  
##  Median :   0.0000   Median :   0.000   Median :1.0      Median :   50.0  
##  Mean   :   0.2301   Mean   :   7.503   Mean   :0.9      Mean   :   46.9  
##  3rd Qu.:   0.0000   3rd Qu.:   0.000   3rd Qu.:1.0      3rd Qu.:   75.0  
##  Max.   :2315.0000   Max.   :4400.000   Max.   :5.0      Max.   :22000.0  
##                                         NA's   :843563                    
##    FATALITIES          INJURIES            PROPDMG       
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Median :  0.0000   Median :   0.0000   Median :   0.00  
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##                                                          
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000                     
##                                                         
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

There are 902297 observations of 37 variables, as the dimensions and summary above show. To list the names of the 37 variables:

names(storm_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

storm_data has many variables. Among them, “EVTYPE”, “FATALITIES”, “INJURIES”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP” are the seven variables required to analyze which events are most harmful to population health and which have the greatest economic consequences.

Question-1:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

-To find which events are harmful to human health, the three variables “EVTYPE”, “FATALITIES” and “INJURIES” are selected from the data (“storm_data”).

event <- c("EVTYPE", "FATALITIES", "INJURIES")
mydata<-storm_data[event]

Aggregate the data by event

fatal <- aggregate(FATALITIES ~ EVTYPE, data = mydata, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, data = mydata, FUN = sum)
healthData <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data=mydata, FUN=sum)
View(healthData)

List the 10 EVTYPE values with the highest FATALITIES and the 10 with the highest INJURIES

highest_fatal <- fatal[order(-fatal$FATALITIES), ][1:10, ]
highest_injury <- injury[order(-injury$INJURIES), ][1:10, ]
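
As a minimal sketch of the aggregate-then-rank pattern used above, the following runs on a small invented data frame rather than on storm_data. The toy values and the helper name top_n_by are assumptions made for illustration only; they are not part of the original analysis.

```r
# Toy data standing in for storm_data (values are invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both outcome columns per event type in a single call
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical helper: sort by one column (descending) and keep the first n rows
top_n_by <- function(df, column, n) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

top_fatal <- top_n_by(totals, "FATALITIES", 2)
print(top_fatal)
```

Sorting with order(..., decreasing = TRUE) gives the same ranking as negating the column, and head() avoids rows of NA when fewer than n event types exist.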

Comparing the two lists, remove from each the three event types that do not appear in the other, so that only the common events remain

rm_uncomfatal<- highest_fatal[ ! ( ( highest_fatal$EVTYPE =="RIP CURRENT" & highest_fatal$FATALITIES==368) | ( highest_fatal$EVTYPE =="HIGH WIND" & highest_fatal$FATALITIES==248 )|(highest_fatal$EVTYPE=="AVALANCHE" & highest_fatal$FATALITIES==224) ) , ] 

rm_uncominjury<- highest_injury[ ! ( ( highest_injury$EVTYPE =="ICE STORM" & highest_injury$INJURIES==1975) | ( highest_injury$EVTYPE =="THUNDERSTORM WIND" & highest_injury$INJURIES==1488 )|(highest_injury$EVTYPE=="HAIL" & highest_injury$INJURIES==1361) ) , ]
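
The filtering above hard-codes event names and counts. One possible alternative, sketched here on invented toy data, is to compute the common event types with intersect() so the step does not depend on specific values. The object names toy_fatal, toy_injury and toy_table, and all numbers below, are assumptions for illustration; base merge() is swapped in for the join so the sketch needs no extra packages.

```r
# Toy top lists (invented values, not taken from the storm data)
toy_fatal <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "RIP CURRENT"),
  FATALITIES = c(50, 20, 5),
  stringsAsFactors = FALSE
)
toy_injury <- data.frame(
  EVTYPE   = c("TORNADO", "HAIL", "HEAT"),
  INJURIES = c(900, 40, 30),
  stringsAsFactors = FALSE
)

# Event types present in both lists
common <- intersect(toy_fatal$EVTYPE, toy_injury$EVTYPE)

# Keep only the common rows in each list
common_fatal  <- toy_fatal[toy_fatal$EVTYPE %in% common, ]
common_injury <- toy_injury[toy_injury$EVTYPE %in% common, ]

# Join the two lists on EVTYPE (base R merge instead of plyr::join_all)
toy_table <- merge(common_fatal, common_injury, by = "EVTYPE")
print(toy_table)
```

Because the common set is derived from the data, the same lines keep working if the input file or the size of the top lists changes.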

library(plyr)
table1<-join_all(list(rm_uncomfatal,rm_uncominjury),by="EVTYPE")
 View(table1)

library(reshape2)
melt_table1<- melt(table1, id.vars="EVTYPE")

PLOT:-1

library(ggplot2) 

Create chart

chart<-ggplot(melt_table1, aes(x=reorder(EVTYPE, -value), y=value))

Plot data as bar chart

chart<- chart + geom_bar(stat="identity", aes(fill=variable), position="dodge",
                         color="black")+ scale_fill_manual(values=c("sky blue", "purple"))

Format y-axis scale and set y-axis label

chart<- chart + scale_y_sqrt("Frequency Count")

Set x-axis label

chart <-chart + xlab("Event Type")

Rotate x-axis tick labels

chart <- chart + theme(axis.text.x = element_text(angle=25, hjust=1))

Set chart title

chart<-chart + ggtitle("US Storm Health Impacts")

Display the chart

chart

Question-2

Across the United States, which types of events have the greatest economic consequences?

-To find which events had the greatest adverse economic effect across the United States, the five variables “EVTYPE”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP” are selected from the data (“storm_data”).

event_eco <-c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
data_eco <- storm_data[event_eco ]

A brief summary of the data_eco

summary(data_eco)
##     EVTYPE             PROPDMG         PROPDMGEXP           CROPDMG       
##  Length:902297      Min.   :   0.00   Length:902297      Min.   :  0.000  
##  Class :character   1st Qu.:   0.00   Class :character   1st Qu.:  0.000  
##  Mode  :character   Median :   0.00   Mode  :character   Median :  0.000  
##                     Mean   :  12.06                      Mean   :  1.527  
##                     3rd Qu.:   0.50                      3rd Qu.:  0.000  
##                     Max.   :5000.00                      Max.   :990.000  
##   CROPDMGEXP       
##  Length:902297     
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
unique(data_eco$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(data_eco$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

The unique values show that the “CROPDMGEXP” and “PROPDMGEXP” codes are not in a uniform format (some are lower case and some are upper case). To make them uniform, first convert everything to upper case

data_eco$PROPDMGEXP <- toupper(data_eco$PROPDMGEXP)
data_eco$CROPDMGEXP <- toupper(data_eco$CROPDMGEXP)
unique(c(data_eco$PROPDMGEXP, data_eco$CROPDMGEXP))
##  [1] "K" "M" ""  "B" "+" "0" "5" "6" "?" "4" "2" "3" "H" "7" "-" "1" "8"

Replace the empty string and the symbols “+”, “-” and “?” with “0”

data_eco[data_eco$PROPDMGEXP %in% c("", "+", "-", "?"), "PROPDMGEXP"] <- "0"
data_eco[data_eco$CROPDMGEXP %in% c("", "+", "-", "?"), "CROPDMGEXP"] <- "0"
unique(c(data_eco$PROPDMGEXP, data_eco$CROPDMGEXP))
## [1] "K" "M" "0" "B" "5" "6" "4" "2" "3" "H" "7" "1" "8"

Substitute the power-of-ten exponent for each letter code: “B” = billion (9), “M” = million (6), “K” = thousand (3) and “H” = hundred (2)

data_eco[data_eco$PROPDMGEXP == "B", "PROPDMGEXP"] <- 9
data_eco[data_eco$CROPDMGEXP == "B", "CROPDMGEXP"] <- 9
data_eco[data_eco$PROPDMGEXP == "M", "PROPDMGEXP"] <- 6
data_eco[data_eco$CROPDMGEXP == "M", "CROPDMGEXP"] <- 6
data_eco[data_eco$PROPDMGEXP == "K", "PROPDMGEXP"] <- 3
data_eco[data_eco$CROPDMGEXP == "K", "CROPDMGEXP"] <- 3
data_eco[data_eco$PROPDMGEXP == "H", "PROPDMGEXP"] <- 2
data_eco[data_eco$CROPDMGEXP == "H", "CROPDMGEXP"] <- 2
unique(c(data_eco$PROPDMGEXP, data_eco$CROPDMGEXP))
##  [1] "3" "6" "0" "9" "5" "4" "2" "7" "1" "8"

Now convert each exponent into its power-of-ten multiplier

data_eco$PROPDMGEXP <- 10^(as.numeric(data_eco$PROPDMGEXP))
data_eco$CROPDMGEXP <- 10^(as.numeric(data_eco$CROPDMGEXP))
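
The recoding steps above (upper-casing, replacing symbols with 0, substituting exponents for letters, then raising 10 to that power) can also be expressed as a single lookup. The sketch below is one way to do it under the same rules as the text; the names exp_lookup and exp_to_multiplier are hypothetical and not part of the original analysis.

```r
# Exponent for each letter code, as described in the text
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: turn raw exponent codes into power-of-ten multipliers
exp_to_multiplier <- function(code) {
  code <- toupper(code)
  # Digit codes ("0".."8") parse directly; everything else becomes NA for now
  exponent <- suppressWarnings(as.numeric(code))
  # Letter codes take their exponent from the lookup table
  known <- code %in% names(exp_lookup)
  exponent[known] <- exp_lookup[code[known]]
  # Empty strings and symbols such as "+", "-", "?" are treated as exponent 0
  exponent[is.na(exponent)] <- 0
  10^exponent
}

codes <- c("K", "m", "B", "h", "", "+", "-", "?", "0", "5")
multipliers <- exp_to_multiplier(codes)
print(data.frame(code = codes, multiplier = multipliers))
```

Keeping the mapping in one named vector makes the treatment of each code visible in a single place and avoids repeating the assignment for both damage columns.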

Replace missing damage values with 0

data_eco[is.na(data_eco$PROPDMG), "PROPDMG"] <- 0
data_eco[is.na(data_eco$CROPDMG), "CROPDMG"] <- 0

Calculate the total damage in USD as property damage plus crop damage, each multiplied by its power-of-ten multiplier

data_eco<- within(data_eco,total_damage <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
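
To make the arithmetic concrete, here is a small worked example on invented rows, assuming the PROPDMGEXP and CROPDMGEXP columns already hold multipliers such as 1000 or 1e6 (as they do at this point in the analysis). The data frame toy_eco and its values are illustrative assumptions only.

```r
# Invented records; the *EXP columns already contain power-of-ten multipliers
toy_eco <- data.frame(
  PROPDMG    = c(25, 1.5, 0),
  PROPDMGEXP = c(1e3, 1e6, 1),
  CROPDMG    = c(0, 2, 500),
  CROPDMGEXP = c(1, 1e3, 1e3)
)

# Total damage in USD = property damage + crop damage, each scaled by its multiplier
toy_eco <- within(toy_eco, total_damage <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)

# Row 1: 25 * 1000 + 0 * 1        = 25000
# Row 2: 1.5 * 1e6 + 2 * 1000     = 1502000
# Row 3: 0 * 1     + 500 * 1000   = 500000
print(toy_eco)
```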

Aggregate damage by Event-type

damage_EVTYPE <- aggregate(data_eco$total_damage, by = list(EVTYPE = data_eco$EVTYPE),FUN = sum)

damage_EVTYPE <- damage_EVTYPE[order(damage_EVTYPE$x, decreasing = TRUE), ]

head(damage_EVTYPE, 10)
##                EVTYPE            x
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57362333947
## 670       STORM SURGE  43323541000
## 244              HAIL  18761221986
## 153       FLASH FLOOD  18243991079
## 95            DROUGHT  15018672000
## 402         HURRICANE  14610229010
## 590       RIVER FLOOD  10148404500
## 427         ICE STORM   8967041360

PLOT:-2

chart2<- ggplot(damage_EVTYPE[1:10,], aes(reorder(EVTYPE,-x), y = x/1000000)) 

Plot data as bar chart

chart2<-chart2+ geom_bar(stat = "identity",aes(fill=EVTYPE))

Set the x-axis and y-axis labels

chart2 <- chart2 + xlab("Event Type") + ylab("Total Damage (millions of USD)")

Tilt the x-axis tick labels

  chart2<-chart2+ theme(axis.text.x = element_text(angle = 45, size=9, hjust = 1, vjust = 1))

Set Title to chart2

 chart2<-chart2+ggtitle("US STORM ECONOMIC IMPACT")

Display chart2

chart2

RESULT:-

Question 1:
  str(melt_table1)
## 'data.frame':    14 obs. of  3 variables:
##  $ EVTYPE  : chr  "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT" ...
##  $ variable: Factor w/ 2 levels "FATALITIES","INJURIES": 1 1 1 1 1 1 1 2 2 2 ...
##  $ value   : num  5633 1903 978 937 816 ...
  dim(melt_table1)
## [1] 14  3
  head(melt_table1,5)
##           EVTYPE   variable value
## 1        TORNADO FATALITIES  5633
## 2 EXCESSIVE HEAT FATALITIES  1903
## 3    FLASH FLOOD FATALITIES   978
## 4           HEAT FATALITIES   937
## 5      LIGHTNING FATALITIES   816

Ranked by fatalities, “TORNADO”, “EXCESSIVE HEAT”, “FLASH FLOOD”, “HEAT” and “LIGHTNING” are the most harmful events with respect to population health.

Question 2:

  str(damage_EVTYPE)
## 'data.frame':    985 obs. of  2 variables:
##  $ EVTYPE: chr  "FLOOD" "HURRICANE/TYPHOON" "TORNADO" "STORM SURGE" ...
##  $ x     : num  1.50e+11 7.19e+10 5.74e+10 4.33e+10 1.88e+10 ...
  dim(damage_EVTYPE)
## [1] 985   2
  head(damage_EVTYPE,5)
##                EVTYPE            x
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57362333947
## 670       STORM SURGE  43323541000
## 244              HAIL  18761221986

“FLOOD”, “HURRICANE/TYPHOON”, “TORNADO”, “STORM SURGE” and “HAIL” are the five event types with the greatest economic consequences across the United States.