Synopsis
The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is used to study the impact of weather events on the health and economy of the country.
Note: Some of the output has been removed or commented out to keep the document readable.
Import Library
Download the compressed data and decompress it
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Read Data in CSV
1. Load the data
2. Process/transform the data (if necessary) into a format suitable for your analysis
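As a minimal sketch of the loading step: `read.csv()` can read a bzip2-compressed CSV directly, so a separate decompression step is optional. The file below is a hypothetical toy file written to a temporary path, not the actual storm data, and the name `toy` is only illustrative.

```r
# Hypothetical stand-in for the storm data: a tiny CSV written through a bzip2 connection.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,15",
             "HAIL,0,0"), con)
close(con)

# read.csv() opens the file in text mode, where R detects the bzip2
# compression itself, so the file can be read without unzipping it first.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
str(toy)
```

The same call should work on the downloaded storm file once its local path is substituted for `tmp`.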
Variables of interest
Health variables:
- FATALITIES
- INJURIES
Economic variables:
- PROPDMG
- PROPDMGEXP
- CROPDMG
- CROPDMGEXP
Events - target variable:
- EVTYPE
Checking for NA values
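One common way to run this check is to count the missing values per column. The sketch below uses a hypothetical toy data frame (`toy`) with a few of the columns listed above; it is an illustration, not the report's original code.

```r
# Hypothetical toy rows; NA marks a missing value.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, NA, 0),
  INJURIES   = c(15, 2, 0),
  PROPDMGEXP = c("K", NA, "M"),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)
```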
Divide the whole data set into the following groups
- Hail
- Heat
- Flood
- Wind
- Storm
- Snow
- Tornado
- Winter
- Rain
data$EVENT <- "OTHER"
data$EVENT[grep("HAIL", data$EVTYPE, ignore.case = TRUE)] <- "HAIL"
data$EVENT[grep("HEAT", data$EVTYPE, ignore.case = TRUE)] <- "HEAT"
data$EVENT[grep("FLOOD", data$EVTYPE, ignore.case = TRUE)] <- "FLOOD"
data$EVENT[grep("WIND", data$EVTYPE, ignore.case = TRUE)] <- "WIND"
data$EVENT[grep("STORM", data$EVTYPE, ignore.case = TRUE)] <- "STORM"
data$EVENT[grep("SNOW", data$EVTYPE, ignore.case = TRUE)] <- "SNOW"
data$EVENT[grep("TORNADO", data$EVTYPE, ignore.case = TRUE)] <- "TORNADO"
data$EVENT[grep("WINTER", data$EVTYPE, ignore.case = TRUE)] <- "WINTER"
data$EVENT[grep("RAIN", data$EVTYPE, ignore.case = TRUE)] <- "RAIN"
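Because the assignments above run in sequence, an event name that contains several keywords ends up in the group of the last keyword that matches. The sketch below replays the same keywords, in the same order, on a few hypothetical event names to make that visible; the vectors `evtype`, `keywords` and `event` are illustrative names.

```r
# Hypothetical event names; some contain more than one keyword.
evtype <- c("THUNDERSTORM WIND", "WINTER STORM", "HEAVY RAIN", "DROUGHT", "hail")

# Same keywords and same order as the assignments in the report.
keywords <- c("HAIL", "HEAT", "FLOOD", "WIND", "STORM",
              "SNOW", "TORNADO", "WINTER", "RAIN")

event <- rep("OTHER", length(evtype))
for (k in keywords) {
  # A later keyword overwrites the group set by an earlier one.
  event[grepl(k, evtype, ignore.case = TRUE)] <- k
}

print(data.frame(evtype, event))
```

Here "THUNDERSTORM WIND" is first labelled WIND and then relabelled STORM, while "WINTER STORM" ends up as WINTER. Reordering the keywords would change these groupings.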
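data$PROPDMGEXP <- as.character(data$PROPDMGEXP)
If any NA, replace it with 0
data$PROPDMGEXP[is.na(data$PROPDMGEXP)] <- 0
Remove all the values except thousand (K), million (M) and billion (B) dollars
data$PROPDMGEXP[!grepl("K|M|B", data$PROPDMGEXP, ignore.case = TRUE)] <- 0
Convert character to numeric for analysis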
data$PROPDMGEXP[grep("K", data$PROPDMGEXP, ignore.case = TRUE)] <- "3"
data$PROPDMGEXP[grep("M", data$PROPDMGEXP, ignore.case = TRUE)] <- "6"
data$PROPDMGEXP[grep("B", data$PROPDMGEXP, ignore.case = TRUE)] <- "9"
data$PROPDMGEXP <- as.numeric(as.character(data$PROPDMGEXP))
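To illustrate what the exponent codes do to the dollar amounts, here is a small sketch on hypothetical rows. It uses a named lookup vector instead of the repeated `grep()` replacements, which is a swapped-in technique rather than the report's own code. Unlike `grep()`, it only accepts codes that are exactly K, M or B in either case, and treats everything else as exponent 0.

```r
# Hypothetical toy rows: damage value plus its exponent code.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 5),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Lookup table from exponent code to power of ten.
exponent <- c(K = 3, M = 6, B = 9)

# Unknown or empty codes are not found in the table and come back as NA.
exp_num <- unname(exponent[toupper(toy$PROPDMGEXP)])
exp_num[is.na(exp_num)] <- 0

# Damage in dollars: 25K -> 25000, 2.5M -> 2500000, 1B -> 1e9.
toy$property.damage <- toy$PROPDMG * 10^exp_num
print(toy)
```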
data$property.damage <- data$PROPDMG * 10^data$PROPDMGEXP
Same preprocessing of data (CROPDMGEXP)
data$CROPDMGEXP <- as.character(data$CROPDMGEXP)
data$CROPDMGEXP[is.na(data$CROPDMGEXP)] <- 0
data$CROPDMGEXP[!grepl("K|M|B", data$CROPDMGEXP, ignore.case = TRUE)] <- 0
data$CROPDMGEXP[grep("K", data$CROPDMGEXP, ignore.case = TRUE)] <- "3"
data$CROPDMGEXP[grep("M", data$CROPDMGEXP, ignore.case = TRUE)] <- "6"
data$CROPDMGEXP[grep("B", data$CROPDMGEXP, ignore.case = TRUE)] <- "9"
data$CROPDMGEXP <- as.numeric(as.character(data$CROPDMGEXP))
data$crop.damage <- data$CROPDMG * 10^data$CROPDMGEXP
1. Across the United States, which types of events are most harmful with respect to population health?
Plot the top 10 causes of death and injury
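The aggregate, rank and plot pattern used in this section can be tried on toy data first. The sketch below is an assumption about how such a plot could be drawn with base R's `barplot()`, since the report's own plotting code is not shown; the data frame `toy` and its numbers are hypothetical. It uses `head()` for the top rows, which returns fewer rows instead of NA-filled ones when there are fewer groups than requested.

```r
# Hypothetical toy data: one row per recorded event.
toy <- data.frame(
  EVENT      = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "OTHER"),
  FATALITIES = c(10, 4, 5, 2, 3, 0),
  stringsAsFactors = FALSE
)

# Total fatalities per event group, largest first.
deaths <- aggregate(FATALITIES ~ EVENT, data = toy, FUN = sum)
deaths <- deaths[order(deaths$FATALITIES, decreasing = TRUE), ]
top <- head(deaths, 3)

# When run as a script, draw to a null device instead of writing Rplots.pdf.
if (!interactive()) pdf(NULL)
barplot(top$FATALITIES, names.arg = top$EVENT,
        main = "Fatalities by event (toy data)", ylab = "Fatalities")
if (!interactive()) invisible(dev.off())

print(top)
```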
Death <- aggregate(data$FATALITIES, by = list(data$EVENT), FUN = sum)
colnames(Death) <- c("EVENT", "Death")
Death <- Death[order(Death$Death, decreasing = TRUE),][1:10,]
INJURIES <- aggregate(data$INJURIES, by = list(data$EVENT), FUN = sum)
colnames(INJURIES) <- c("EVENT", "INJURIES")
INJURIES <- INJURIES[order(INJURIES$INJURIES, decreasing = TRUE),][1:10,]
2. Across the United States, which types of events have the greatest economic consequences?
Two types of damage, property and crop, are plotted.
Property <- aggregate(data$property.damage, by = list(data$EVENT), FUN = sum)
colnames(Property) <- c("EVENT", "Property")
Property <- Property[order(Property$Property, decreasing = TRUE),][1:10,]
# Crop damage is aggregated by the raw EVTYPE rather than the grouped EVENT
C <- aggregate(data$crop.damage, by = list(data$EVTYPE), FUN = sum)
colnames(C) <- c("EVENT", "Crop")
C <- C[order(C$Crop, decreasing = TRUE),][1:10,]
Result
- The first and second plots show that tornadoes cause the most deaths and injuries.
- The third plot shows that most property damage is due to floods.
- The fourth plot shows that drought is the event that causes the most crop damage.