The basic goal of this assignment is to explore the NOAA Storm Database.
NCDC receives Storm Data from the National Weather Service.
The National Weather Service receives its information from a variety of sources, including but not limited to: county, state and federal emergency management officials, local law enforcement officials, SKYWARN spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.
The data is extracted from the bz2 file; the size of the data is 46.9 MB.
The data comprises 37 variables and 902,297 observations.
The purpose of the analysis is to answer the following two questions for the United States:
1- Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2- Across the United States, which types of events have the greatest economic consequences?
Finally, I would like to mention that this report is purely an analysis and is not intended to make any recommendations.
This document was generated from an R Markdown file and is published on RPubs for your reference.
fileURL<-"http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL,destfile = "./stormData.csv.bz2",method = "libcurl")
NOAA <- read.csv(bzfile("stormData.csv.bz2"), sep = ",", header = TRUE)
dim(NOAA)
## [1] 902297 37
str(NOAA)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
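The download step above fetches the archive every time the report is knitted. A minimal sketch of one way to skip the download when the file is already on disk is shown here; `fetch_once` is a hypothetical helper name and is not part of the original analysis.

```r
# Hypothetical helper (not part of the original report): download the
# archive only when it is not already present on disk.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Demonstration with a file that already exists, so no network access occurs
existing <- tempfile(fileext = ".csv.bz2")
file.create(existing)
result <- fetch_once("http://example.invalid/data.csv.bz2", existing)
```

In the report itself the call would presumably take `fileURL` and `"./stormData.csv.bz2"` as its two arguments.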
We now make the data usable for our analysis. The data has 37 columns, but we need only a few of them; for now we keep the following 7 columns:
variables<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
NOAA_DATA<-NOAA[variables]
dim(NOAA_DATA)
## [1] 902297 7
names(NOAA_DATA)
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "PROPDMGEXP"
## [6] "CROPDMG" "CROPDMGEXP"
head(NOAA_DATA)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
1- Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
The analysis aggregates the data by EVTYPE and sorts the totals in decreasing order; we keep the top 15 rows for the charts.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
totfatalities<-aggregate(FATALITIES~EVTYPE,data = NOAA_DATA,sum)
totinjuries<-aggregate(INJURIES~EVTYPE,data = NOAA_DATA,sum)
totfatalities<-totfatalities[order(totfatalities$FATALITIES,decreasing = TRUE),]
fatalities<-totfatalities[1:15,]
head(fatalities,15)
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
## 972 WINTER STORM 206
## 586 RIP CURRENTS 204
## 278 HEAT WAVE 172
## 140 EXTREME COLD 160
## 760 THUNDERSTORM WIND 133
totinjuries<-totinjuries[order(totinjuries$INJURIES,decreasing = TRUE),][1:15,]
head(totinjuries,15)
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
## 972 WINTER STORM 1321
## 411 HURRICANE/TYPHOON 1275
## 359 HIGH WIND 1137
## 310 HEAVY SNOW 1021
## 957 WILDFIRE 911
dim(totinjuries)
## [1] 15 2
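The two tables above list TSTM WIND and THUNDERSTORM WIND, and RIP CURRENT and RIP CURRENTS, as separate events, so their totals are split. A rough sketch of how such labels could be merged before aggregating is given below. `normalize_evtype` is a hypothetical helper, the mapping covers only the variants visible above, and the toy counts are made up to show the mechanics.

```r
# Hypothetical helper: map a few spelling variants seen in the tables above
# onto a single label. The mapping is illustrative, not exhaustive.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))
  x[x == "TSTM WIND"] <- "THUNDERSTORM WIND"
  x[x == "RIP CURRENTS"] <- "RIP CURRENT"
  x
}

# Toy data with made-up counts, only to show the mechanics
toy_events <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "RIP CURRENT",
             "RIP CURRENTS", " tornado"),
  FATALITIES = c(2, 3, 1, 1, 10)
)
toy_events$EVTYPE <- normalize_evtype(toy_events$EVTYPE)

# Same aggregate-then-sort pattern as in the report
toy_totals <- aggregate(FATALITIES ~ EVTYPE, data = toy_events, sum)
toy_totals <- toy_totals[order(toy_totals$FATALITIES, decreasing = TRUE), ]
print(toy_totals)
```

Whether labels such as HEAT and EXCESSIVE HEAT should also be merged is a judgment call that this sketch deliberately leaves open.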
Now plotting the results. We use ggplot2 and choose different colours for the injuries and fatalities graphs.
library(ggplot2)
# theme_bw() is added as a layer (not inside aes()); reorder() sorts the bars by total
ggplot(fatalities, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "blue") +
  theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 6)) +
  xlab("Events") + ylab("Total Fatalities") + ggtitle("Top 15 Events That Caused Fatalities")
ggplot(totinjuries, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity", fill = "green") +
  theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 6)) +
  xlab("Events") + ylab("Total Injuries") + ggtitle("Top 15 Events That Caused Injuries")
2- Across the United States, which types of events have the greatest economic consequences?
We now convert the PROPDMGEXP and CROPDMGEXP fields to numeric multipliers, where H = hundreds (10^2), K = thousands (10^3), M = millions (10^6) and B = billions (10^9). Rows with any other exponent code keep a cost of 0.
We first convert the property damage, then the crop damage, and finally find the total cost of damage to crops and property.
library(dplyr)
NOAA_DATA$PROPDMGCOST = 0
NOAA_DATA[NOAA_DATA$PROPDMGEXP=="H",]$PROPDMGCOST= NOAA_DATA[NOAA_DATA$PROPDMGEXP=="H",]$PROPDMG*10^2
NOAA_DATA[NOAA_DATA$PROPDMGEXP=="K",]$PROPDMGCOST= NOAA_DATA[NOAA_DATA$PROPDMGEXP=="K",]$PROPDMG*10^3
NOAA_DATA[NOAA_DATA$PROPDMGEXP=="M",]$PROPDMGCOST= NOAA_DATA[NOAA_DATA$PROPDMGEXP=="M",]$PROPDMG*10^6
NOAA_DATA[NOAA_DATA$PROPDMGEXP=="B",]$PROPDMGCOST= NOAA_DATA[NOAA_DATA$PROPDMGEXP=="B",]$PROPDMG*10^9
NOAA_DATA$CROPDMGCOST = 0
NOAA_DATA[NOAA_DATA$CROPDMGEXP=="H",]$CROPDMGCOST= NOAA_DATA[NOAA_DATA$CROPDMGEXP=="H",]$CROPDMG*10^2
NOAA_DATA[NOAA_DATA$CROPDMGEXP=="K",]$CROPDMGCOST= NOAA_DATA[NOAA_DATA$CROPDMGEXP=="K",]$CROPDMG*10^3
NOAA_DATA[NOAA_DATA$CROPDMGEXP=="M",]$CROPDMGCOST= NOAA_DATA[NOAA_DATA$CROPDMGEXP=="M",]$CROPDMG*10^6
NOAA_DATA[NOAA_DATA$CROPDMGEXP=="B",]$CROPDMGCOST= NOAA_DATA[NOAA_DATA$CROPDMGEXP=="B",]$CROPDMG*10^9
head(NOAA_DATA)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
## PROPDMGCOST CROPDMGCOST
## 1 25000 0
## 2 2500 0
## 3 25000 0
## 4 2500 0
## 5 2500 0
## 6 2500 0
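The eight assignments above repeat the same pattern for each exponent code. As an alternative, here is a minimal sketch using a named lookup vector. `multiplier` and `damage_cost` are hypothetical names, and the toy data frame is made up for illustration.

```r
# Multipliers for the exponent codes handled in this report
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale damage values by their exponent code.
# Unrecognised codes (empty, lowercase, digits, symbols) give 0,
# matching the behaviour of the step-by-step assignments above.
damage_cost <- function(value, exp_code) {
  m <- unname(multiplier[as.character(exp_code)])
  ifelse(is.na(m), 0, value * m)
}

# Toy data, only to show the mechanics
toy_damage <- data.frame(
  PROPDMG = c(25, 2.5, 1.2, 3, 7),
  PROPDMGEXP = c("K", "M", "B", "", "k")
)
toy_damage$PROPDMGCOST <- damage_cost(toy_damage$PROPDMG, toy_damage$PROPDMGEXP)
print(toy_damage)
```

The same helper would apply to CROPDMG and CROPDMGEXP. Treating lowercase codes such as "k" as thousands would be a change to the analysis, so the sketch leaves them at 0 as the report does.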
Now plotting the damage results.
totaldamage<-aggregate(PROPDMGCOST + CROPDMGCOST ~ EVTYPE ,data = NOAA_DATA,sum)
names(totaldamage)<-c("Event","TotDamageCost")
totaldamage<-totaldamage[order(- totaldamage$TotDamageCost),][1:15,]
totaldamage$Event<-factor(totaldamage$Event,levels = totaldamage$Event)
head(totaldamage)
## Event TotDamageCost
## 170 FLOOD 150319678250
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57340613590
## 670 STORM SURGE 43323541000
## 244 HAIL 18752904670
## 153 FLASH FLOOD 17562128610
# theme_bw() is added as a layer rather than passed inside aes()
ggplot(totaldamage, aes(x = Event, y = TotDamageCost)) +
  geom_bar(stat = "identity", fill = "pink") +
  theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Events") + ylab("Total Damage Cost in Dollars") + ggtitle("Total Damage Cost of Properties and Crops, Top 15")
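Dollar totals of this size are hard to read on an axis or in a table. A small sketch of expressing them in billions of dollars follows, using three totals copied from the output above; the column name `TotDamageBillions` is a hypothetical choice.

```r
# Three totals copied from the head(totaldamage) output above
top_damage <- data.frame(
  Event = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  TotDamageCost = c(150319678250, 71913712800, 57340613590)
)

# Express the totals in billions of dollars, rounded to one decimal place
top_damage$TotDamageBillions <- round(top_damage$TotDamageCost / 1e9, 1)
print(top_damage)
```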
Based on our research, we have come to the following results:
Tornadoes are the most harmful weather events to population health in the USA, with the highest number of both fatalities and injuries. The severe weather events with the greatest economic consequences, according to our analysis, are floods.