This project discusses in detail the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The data for this project and other related documents are available here:
NOAA Storm Database: STORM DATA [47 MB]
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The STORM DATA records characteristics of major storms and weather events in the United States from 1950 to November 2011, including when and where they occurred, as well as estimates of any fatalities, injuries, and property damage. The main objective of this project is to analyze the data and answer the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
After loading and processing the data, the answers are as follows:
Across the U.S., TORNADO is the most harmful event type in terms of human fatalities and injuries;
Across the U.S., FLOOD is the event type with the greatest economic consequences.
Set or create a new working directory
Download the STORM DATA in working directory
Read the STORM DATA
Analyze data
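# Create the data directory only if it does not already exist
if (!dir.exists("./STORM_DATA")) dir.create("./STORM_DATA")
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "STORM_DATA/repdata-data-StormData.csv.bz2"
# Download once, in binary mode; read.csv() reads the bzip2 file directly from the same path
if (!file.exists(destfile)) download.file(URL, destfile = destfile, mode = "wb")
storm_data <- read.csv(destfile, header = TRUE, sep = ",", stringsAsFactors = FALSE)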
dim(storm_data)
## [1] 902297 37
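read.csv() can read the .bz2 archive without a separate decompression step, because R's file() connection detects bzip2 (and gzip/xz) compression when a file is opened for reading. A minimal self-contained check, using a tiny temporary file with made-up rows in place of the real 47 MB download:

```r
# Write a tiny CSV into a bzip2-compressed temporary file (toy rows, not NOAA data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,2", "HAIL,0"), con)
close(con)

# read.csv() decompresses the file transparently while reading it
toy <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
print(dim(toy))  # 2 rows, 2 columns
```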
summary(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI
## Min. : 0.000 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character
## Median : 0.000 Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_DATE END_TIME COUNTY_END COUNTYENDN
## Length:902297 Length:902297 Min. :0 Mode:logical
## Class :character Class :character 1st Qu.:0 NA's:902297
## Mode :character Mode :character Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
##
## END_RANGE END_AZI END_LOCATI
## Min. : 0.0000 Length:902297 Length:902297
## 1st Qu.: 0.0000 Class :character Class :character
## Median : 0.0000 Mode :character Mode :character
## Mean : 0.9862
## 3rd Qu.: 0.0000
## Max. :925.0000
##
## LENGTH WIDTH F MAG
## Min. : 0.0000 Min. : 0.000 Min. :0.0 Min. : 0.0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.:0.0 1st Qu.: 0.0
## Median : 0.0000 Median : 0.000 Median :1.0 Median : 50.0
## Mean : 0.2301 Mean : 7.503 Mean :0.9 Mean : 46.9
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.:1.0 3rd Qu.: 75.0
## Max. :2315.0000 Max. :4400.000 Max. :5.0 Max. :22000.0
## NA's :843563
## FATALITIES INJURIES PROPDMG
## Min. : 0.0000 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.0000 Median : 0.0000 Median : 0.00
## Mean : 0.0168 Mean : 0.1557 Mean : 12.06
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.50
## Max. :583.0000 Max. :1700.0000 Max. :5000.00
##
## PROPDMGEXP CROPDMG CROPDMGEXP
## Length:902297 Min. : 0.000 Length:902297
## Class :character 1st Qu.: 0.000 Class :character
## Mode :character Median : 0.000 Mode :character
## Mean : 1.527
## 3rd Qu.: 0.000
## Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
The summary shows that the data contain 902,297 observations of 37 variables. To list the names of the 37 variables:
names(storm_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Of these variables, the seven variables “EVTYPE”, “FATALITIES”, “INJURIES”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP” are needed to determine which events are most harmful to population health and which have the greatest economic consequences.
- To find which events are most harmful to human health, the three variables “EVTYPE”, “FATALITIES” and “INJURIES” are extracted from the data (“storm_data”):
event <- c("EVTYPE", "FATALITIES", "INJURIES")
mydata<-storm_data[event]
Aggregate the data by event
fatal <- aggregate(FATALITIES ~ EVTYPE, data = mydata, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, data = mydata, FUN = sum)
healthData <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data=mydata, FUN=sum)
View(healthData)
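The single call aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, ...) returns the same per-event sums as the two separate aggregate() calls, side by side in one data frame. Note that the formula interface drops any row with an NA in the listed columns by default (na.action = na.omit). A sketch on a small made-up data frame (toy values, not taken from the NOAA data):

```r
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1, 3, 0),
  INJURIES   = c(10, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)

# Separate sums per event type
fatal_toy  <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
injury_toy <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)

# Both sums at once: cbind() on the left-hand side aggregates several columns
health_toy <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(health_toy)  # one row per event type, sorted alphabetically
```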
List the 10 event types with the highest FATALITIES and the 10 with the highest INJURIES
highest_fatal <- fatal[order(-fatal$FATALITIES), ][1:10, ]
highest_injury <- injury[order(-injury$INJURIES), ][1:10, ]
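Indexing with [1:10, ] assumes there are at least ten event types; with fewer rows, R pads the result with all-NA rows. head(..., 10) returns at most ten rows, so it is a slightly safer way to take the top n. A small illustration on toy data with only three event types:

```r
toy_fatal <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  FATALITIES = c(3, 0, 5),
  stringsAsFactors = FALSE
)

# Sort in decreasing order of fatalities (order(-x) matches order(x, decreasing = TRUE))
sorted <- toy_fatal[order(-toy_fatal$FATALITIES), ]

padded <- sorted[1:10, ]     # 10 rows, 7 of them entirely NA
safe   <- head(sorted, 10)   # only the 3 rows that exist
print(safe)
```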
Comparing the two lists, keep only the event types that appear in both; this removes the three event types in each list that the other list does not contain:
# Keep only the event types common to both top-10 lists (no hard-coded names or counts)
common <- intersect(highest_fatal$EVTYPE, highest_injury$EVTYPE)
rm_uncomfatal <- highest_fatal[highest_fatal$EVTYPE %in% common, ]
rm_uncominjury <- highest_injury[highest_injury$EVTYPE %in% common, ]
library(plyr)
table1<-join_all(list(rm_uncomfatal,rm_uncominjury),by="EVTYPE")
View(table1)
library(reshape2)
melt_table1<- melt(table1, id.vars="EVTYPE")
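melt(table1, id.vars = "EVTYPE") turns the wide table (one row per event type, one column per measure) into a long table with one row per event type and measure, which is the shape ggplot2 needs for fill = variable. The base-R sketch below builds the same long layout by hand on toy values to make that shape explicit; it is an illustration, not a replacement for reshape2::melt():

```r
wide <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT"),
  FATALITIES = c(5, 2),
  INJURIES   = c(40, 7),
  stringsAsFactors = FALSE
)

measures <- c("FATALITIES", "INJURIES")

# One row per (event type, measure) pair; the measure name goes into "variable"
long <- data.frame(
  EVTYPE   = rep(wide$EVTYPE, times = length(measures)),
  variable = factor(rep(measures, each = nrow(wide)), levels = measures),
  value    = unlist(wide[measures], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)  # all FATALITIES rows first, then all INJURIES rows
```

This is why melt_table1 has twice as many rows as table1 (14 = 7 event types × 2 measures).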
library(ggplot2)
Create chart
chart<-ggplot(melt_table1, aes(x=reorder(EVTYPE, -value), y=value))
Plot data as bar chart
chart<- chart + geom_bar(stat="identity", aes(fill=variable), position="dodge",
color="black")+ scale_fill_manual(values=c("sky blue", "purple"))
Format y-axis scale and set y-axis label
chart<- chart + scale_y_sqrt("Frequency Count")
Set x-axis label
chart <-chart + xlab("Event Type")
Rotate x-axis tick labels
chart <- chart + theme(axis.text.x = element_text(angle=25, hjust=1))
Set chart title
chart<-chart + ggtitle("US Storm Health Impacts")
Display the chart
chart
- To find which events have the greatest economic consequences across the United States, the five variables “EVTYPE”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP” are extracted from the data (“storm_data”).
event_eco <-c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
data_eco <- storm_data[event_eco ]
A brief summary of data_eco:
summary(data_eco)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG
## Length:902297 Min. : 0.00 Length:902297 Min. : 0.000
## Class :character 1st Qu.: 0.00 Class :character 1st Qu.: 0.000
## Mode :character Median : 0.00 Mode :character Median : 0.000
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
## CROPDMGEXP
## Length:902297
## Class :character
## Mode :character
##
##
##
unique(data_eco$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(data_eco$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
The unique values show that “CROPDMGEXP” and “PROPDMGEXP” are not in a consistent format: some codes are lower case and some are upper case. To make them uniform, first convert everything to upper case:
data_eco$PROPDMGEXP <- toupper(data_eco$PROPDMGEXP)
data_eco$CROPDMGEXP <- toupper(data_eco$CROPDMGEXP)
unique(c(data_eco$PROPDMGEXP, data_eco$CROPDMGEXP))
## [1] "K" "M" "" "B" "+" "0" "5" "6" "?" "4" "2" "3" "H" "7" "-" "1" "8"
Replace the empty string and the symbols “+”, “-” and “?” with “0”, i.e. an exponent of 0 (a multiplier of 10^0 = 1):
data_eco[data_eco$PROPDMGEXP %in% c("", "+", "-", "?"), "PROPDMGEXP"] <- "0"
data_eco[data_eco$CROPDMGEXP %in% c("", "+", "-", "?"), "CROPDMGEXP"] <- "0"
unique(c(data_eco$PROPDMGEXP, data_eco$CROPDMGEXP))
## [1] "K" "M" "0" "B" "5" "6" "4" "2" "3" "H" "7" "1" "8"
Substitute the exponent for each letter code: “B” = billion (10^9), “M” = million (10^6), “K” = thousand (10^3) and “H” = hundred (10^2):
data_eco[data_eco$PROPDMGEXP == "B", "PROPDMGEXP"] <- 9
data_eco[data_eco$CROPDMGEXP == "B", "CROPDMGEXP"] <- 9
data_eco[data_eco$PROPDMGEXP == "M", "PROPDMGEXP"] <- 6
data_eco[data_eco$CROPDMGEXP == "M", "CROPDMGEXP"] <- 6
data_eco[data_eco$PROPDMGEXP == "K", "PROPDMGEXP"] <- 3
data_eco[data_eco$CROPDMGEXP == "K", "CROPDMGEXP"] <- 3
data_eco[data_eco$PROPDMGEXP == "H", "PROPDMGEXP"] <- 2
data_eco[data_eco$CROPDMGEXP == "H", "CROPDMGEXP"] <- 2
unique(c(data_eco$PROPDMGEXP, data_eco$CROPDMGEXP))
## [1] "3" "6" "0" "9" "5" "4" "2" "7" "1" "8"
Now convert each exponent to a multiplier (10^exponent) so that it can be combined with the damage value:
data_eco$PROPDMGEXP <- 10^(as.numeric(data_eco$PROPDMGEXP))
data_eco$CROPDMGEXP <- 10^(as.numeric(data_eco$CROPDMGEXP))
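The recoding steps above (upper-casing, replacing symbols with 0, substituting letters, then 10^exponent) can also be collapsed into one vectorized lookup. The helper below, exp_to_multiplier(), is hypothetical and not part of the original analysis; one assumption differs slightly: any code it does not recognize is treated as a multiplier of 1, whereas the step-by-step version would produce NA for such a code.

```r
# Hypothetical helper (not from the original analysis): map exponent codes to multipliers
exp_to_multiplier <- function(code) {
  code <- toupper(code)
  letter_exp <- c(H = 2, K = 3, M = 6, B = 9)
  power <- rep(0, length(code))                    # "", "+", "-", "?" and unknown codes -> 10^0 = 1
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])    # "5" -> 10^5, as in the steps above
  is_letter <- code %in% names(letter_exp)
  power[is_letter] <- letter_exp[code[is_letter]]  # "K" -> 10^3, "B" -> 10^9, ...
  10^power
}

exp_to_multiplier(c("K", "m", "", "?", "5", "B"))
```

For that input the multipliers are 1e3, 1e6, 1, 1, 1e5 and 1e9. Applied to the raw codes, it could be used as, for example, data_eco$PROPDMGEXP <- exp_to_multiplier(data_eco$PROPDMGEXP).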
Replace missing damage values with “0”:
data_eco[is.na(data_eco$PROPDMG), "PROPDMG"] <- 0
data_eco[is.na(data_eco$CROPDMG), "CROPDMG"] <- 0
Calculate the total damage (property damage plus crop damage) in USD:
data_eco<- within(data_eco,total_damage <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
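As a worked example of the total_damage formula, take one made-up event with PROPDMG = 25 and PROPDMGEXP = "K", plus CROPDMG = 2.5 and CROPDMGEXP = "M". After the conversions above the multipliers are 10^3 and 10^6, so the total damage is 25 × 10^3 + 2.5 × 10^6 = 25,000 + 2,500,000 = 2,525,000 USD. The same calculation with within() on toy rows:

```r
toy_eco <- data.frame(
  EVTYPE     = c("HAIL", "FLOOD"),
  PROPDMG    = c(25, 1.2),
  PROPDMGEXP = c(1e3, 1e9),   # multipliers already converted from "K" and "B"
  CROPDMG    = c(2.5, 0),
  CROPDMGEXP = c(1e6, 1),     # multipliers already converted from "M" and ""
  stringsAsFactors = FALSE
)

# Property damage plus crop damage, both in USD
toy_eco <- within(toy_eco, total_damage <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
print(toy_eco$total_damage)  # 2,525,000 for HAIL and 1.2 billion for FLOOD
```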
Aggregate damage by Event-type
damage_EVTYPE <- aggregate(data_eco$total_damage, by = list(EVTYPE = data_eco$EVTYPE),FUN = sum)
damage_EVTYPE <- damage_EVTYPE[order(damage_EVTYPE$x, decreasing = TRUE), ]
head(damage_EVTYPE, 10)
## EVTYPE x
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362333947
## 670 STORM SURGE 43323541000
## 244 HAIL 18761221986
## 153 FLASH FLOOD 18243991079
## 95 DROUGHT 15018672000
## 402 HURRICANE 14610229010
## 590 RIVER FLOOD 10148404500
## 427 ICE STORM 8967041360
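Because aggregate() is called with the by = interface on a bare vector, the summed column comes back named x, which is why the plotting code refers to x. A sketch of the formula interface, which keeps the name total_damage, shown on toy values rather than the NOAA data:

```r
toy_damage <- data.frame(
  EVTYPE       = c("FLOOD", "HAIL", "FLOOD", "TORNADO"),
  total_damage = c(5e6, 2e5, 1e6, 3e6),
  stringsAsFactors = FALSE
)

# by = interface on a bare vector: the result column is named "x"
by_x <- aggregate(toy_damage$total_damage, by = list(EVTYPE = toy_damage$EVTYPE), FUN = sum)

# Formula interface: the result column keeps the name "total_damage"
by_name <- aggregate(total_damage ~ EVTYPE, data = toy_damage, FUN = sum)
by_name <- by_name[order(by_name$total_damage, decreasing = TRUE), ]
print(by_name)
```

With the formula interface, the chart code would refer to total_damage instead of x.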
Plot 2:
chart2<- ggplot(damage_EVTYPE[1:10,], aes(reorder(EVTYPE,-x), y = x/1000000))
Plot data as bar chart
chart2<-chart2+ geom_bar(stat = "identity",aes(fill=EVTYPE))
Set the x-axis and y-axis labels
chart2<-chart2 + xlab("Event Type") +ylab("Total Damage (millions of USD)")
Tilt the x-axis labels
chart2<-chart2+ theme(axis.text.x = element_text(angle = 45, size=9, hjust = 1, vjust = 1))
Set the title of chart2
chart2<-chart2+ggtitle("US STORM ECONOMIC IMPACT")
chart2
Question 1:
str(melt_table1)
## 'data.frame': 14 obs. of 3 variables:
## $ EVTYPE : chr "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT" ...
## $ variable: Factor w/ 2 levels "FATALITIES","INJURIES": 1 1 1 1 1 1 1 2 2 2 ...
## $ value : num 5633 1903 978 937 816 ...
dim(melt_table1)
## [1] 14 3
head(melt_table1,5)
## EVTYPE variable value
## 1 TORNADO FATALITIES 5633
## 2 EXCESSIVE HEAT FATALITIES 1903
## 3 FLASH FLOOD FATALITIES 978
## 4 HEAT FATALITIES 937
## 5 LIGHTNING FATALITIES 816
Ranked by fatalities, “TORNADO”, “EXCESSIVE HEAT”, “FLASH FLOOD”, “HEAT” and “LIGHTNING” are the most harmful events with respect to population health.
Question 2:
str(damage_EVTYPE)
## 'data.frame': 985 obs. of 2 variables:
## $ EVTYPE: chr "FLOOD" "HURRICANE/TYPHOON" "TORNADO" "STORM SURGE" ...
## $ x : num 1.50e+11 7.19e+10 5.74e+10 4.33e+10 1.88e+10 ...
dim(damage_EVTYPE)
## [1] 985 2
head(damage_EVTYPE,5)
## EVTYPE x
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362333947
## 670 STORM SURGE 43323541000
## 244 HAIL 18761221986
“FLOOD”, “HURRICANE/TYPHOON”, “TORNADO”, “STORM SURGE” and “HAIL” are the five event types with the greatest economic consequences across the United States.