Synopsis:

The basic goal of this assignment is to explore the NOAA Storm Database.

NCDC receives Storm Data from the National Weather Service.

The National Weather Service receives its information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, SKYWARN spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.

The data are extracted from a bz2 file, and the size of the data is 46.9 MB.

The data comprise 37 variables and 902,297 observations.

The purpose of the analysis is to answer the following two questions about the United States:

1- Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

2- Across the United States, which types of events have the greatest economic consequences?

Finally, I would like to mention that this report is purely an analysis and makes no recommendations.

This document was generated from an R Markdown file and is published on RPubs for your reference.

fileURL<-"http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL,destfile = "./stormData.csv.bz2",method = "libcurl")

NOAA <- read.csv(bzfile("stormData.csv.bz2"), sep=",", header=T)
dim(NOAA)
## [1] 902297     37
str(NOAA)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
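
Because the download is large, it can help to skip it when the file is already on disk. The following is a minimal sketch of that idea; `fetch_once` is a hypothetical helper name that does not appear in the original analysis, and the commented usage reuses the URL and file name from the code above.

```r
# Hypothetical helper: download a file only when it is not already on disk.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows.
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage with the URL from this report (uncomment to run; needs network access):
# path <- fetch_once(
#   "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "stormData.csv.bz2"
# )
# NOAA <- read.csv(bzfile(path), sep = ",", header = TRUE)
```

With this guard, re-knitting the report reuses the local copy instead of downloading the file again.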

DATA PROCESSING

We now prepare the data for our analysis. The data set has 37 columns, but we need only a few of them; for now we keep the following 7 columns:

variables<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
NOAA_DATA<-NOAA[variables]
dim(NOAA_DATA)
## [1] 902297      7
names(NOAA_DATA)
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
## [6] "CROPDMG"    "CROPDMGEXP"
head(NOAA_DATA)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

Part 1

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The analysis aggregates the data by EVTYPE and sorts the totals in decreasing order; we keep the top 15 rows for the charts.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
totfatalities<-aggregate(FATALITIES~EVTYPE,data = NOAA_DATA,sum)
totinjuries<-aggregate(INJURIES~EVTYPE,data = NOAA_DATA,sum)
totfatalities<-totfatalities[order(totfatalities$FATALITIES,decreasing = TRUE),]
fatalities<-totfatalities[1:15,]
head(fatalities,15)
##                EVTYPE FATALITIES
## 834           TORNADO       5633
## 130    EXCESSIVE HEAT       1903
## 153       FLASH FLOOD        978
## 275              HEAT        937
## 464         LIGHTNING        816
## 856         TSTM WIND        504
## 170             FLOOD        470
## 585       RIP CURRENT        368
## 359         HIGH WIND        248
## 19          AVALANCHE        224
## 972      WINTER STORM        206
## 586      RIP CURRENTS        204
## 278         HEAT WAVE        172
## 140      EXTREME COLD        160
## 760 THUNDERSTORM WIND        133
totinjuries<-totinjuries[order(totinjuries$INJURIES,decreasing = TRUE),][1:15,]
head(totinjuries,15)
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
## 972      WINTER STORM     1321
## 411 HURRICANE/TYPHOON     1275
## 359         HIGH WIND     1137
## 310        HEAVY SNOW     1021
## 957          WILDFIRE      911
dim(totinjuries)
## [1] 15  2
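
The aggregate, sort and top-15 steps above are the same for fatalities and injuries, so they could be wrapped in one function. This is only a sketch under assumptions: `top_events` is a hypothetical helper that is not part of the original analysis, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: sum one column per event type and keep the n largest totals.
top_events <- function(data, column, n = 15) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data (made up for illustration, not taken from the storm database).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(3, 1, 4, 0, 2),
  INJURIES   = c(10, 0, 5, 1, 2)
)

top_events(toy, "FATALITIES", n = 2)
top_events(toy, "INJURIES", n = 2)
```

On the real data, the equivalent calls would be `top_events(NOAA_DATA, "FATALITIES")` and `top_events(NOAA_DATA, "INJURIES")`.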

Now we plot the results. We use ggplot2 and choose different colours for the fatalities and injuries graphs.

library(ggplot2)
# Bar chart of the 15 event types with the most fatalities
ggplot(fatalities,aes(x=EVTYPE,y=FATALITIES))+geom_bar(stat="identity",fill="blue")+theme_bw()+theme(axis.text.x = element_text(angle = 90,hjust = 1,size = 6))+xlab("Events")+ylab("Total Fatalities")+ggtitle("Top 15 Events That Caused Fatalities")

# Bar chart of the 15 event types with the most injuries
ggplot(totinjuries,aes(x=EVTYPE,y=INJURIES))+geom_bar(stat="identity",fill="green")+theme_bw()+theme(axis.text.x = element_text(angle = 90,hjust = 1,size = 6))+xlab("Events")+ylab("Total Injuries")+ggtitle("Top 15 Events That Caused Injuries")
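
In the two charts above the x axis follows the alphabetical factor levels of EVTYPE, not the sorted totals. A possible way to keep the sorted order is to re-level the factor in row order, as is done for the damage plot in Part 2. The sketch below uses a made-up toy data frame (`fatalities_toy` is a hypothetical name) and only draws the plot if ggplot2 is installed.

```r
# Toy totals (made up for illustration), already sorted from largest to smallest.
fatalities_toy <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
  FATALITIES = c(30, 20, 10),
  stringsAsFactors = TRUE
)

# Default factor levels are alphabetical, so bars would not follow the totals.
levels(fatalities_toy$EVTYPE)

# Re-level the factor in row order so the x axis keeps the sorted order.
fatalities_toy$EVTYPE <- factor(fatalities_toy$EVTYPE,
                                levels = as.character(fatalities_toy$EVTYPE))
levels(fatalities_toy$EVTYPE)

# Draw the chart only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(fatalities_toy, aes(x = EVTYPE, y = FATALITIES)) +
    geom_bar(stat = "identity", fill = "blue") +
    theme_bw()
  print(p)
}
```

The same re-leveling line could be applied to `fatalities` and `totinjuries` before plotting.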

Part 2

2- Across the United States, which types of events have the greatest economic consequences?

We now convert the PROPDMGEXP and CROPDMGEXP fields to numeric multipliers, where H = hundreds (10^2), K = thousands (10^3), M = millions (10^6), and B = billions (10^9). Rows with any other exponent code keep a cost of 0.

We first convert the property damage and then the crop damage, so that the total cost of damage to property and crops can be computed.

library(dplyr)
NOAA_DATA$PROPDMGCOST = 0
NOAA_DATA[NOAA_DATA$PROPDMGEXP=="H",]$PROPDMGCOST= NOAA_DATA[NOAA_DATA$PROPDMGEXP=="H",]$PROPDMG*10^2
NOAA_DATA[NOAA_DATA$PROPDMGEXP=="K",]$PROPDMGCOST= NOAA_DATA[NOAA_DATA$PROPDMGEXP=="K",]$PROPDMG*10^3
NOAA_DATA[NOAA_DATA$PROPDMGEXP=="M",]$PROPDMGCOST= NOAA_DATA[NOAA_DATA$PROPDMGEXP=="M",]$PROPDMG*10^6
NOAA_DATA[NOAA_DATA$PROPDMGEXP=="B",]$PROPDMGCOST= NOAA_DATA[NOAA_DATA$PROPDMGEXP=="B",]$PROPDMG*10^9


NOAA_DATA$CROPDMGCOST = 0
NOAA_DATA[NOAA_DATA$CROPDMGEXP=="H",]$CROPDMGCOST= NOAA_DATA[NOAA_DATA$CROPDMGEXP=="H",]$CROPDMG*10^2
NOAA_DATA[NOAA_DATA$CROPDMGEXP=="K",]$CROPDMGCOST= NOAA_DATA[NOAA_DATA$CROPDMGEXP=="K",]$CROPDMG*10^3
NOAA_DATA[NOAA_DATA$CROPDMGEXP=="M",]$CROPDMGCOST= NOAA_DATA[NOAA_DATA$CROPDMGEXP=="M",]$CROPDMG*10^6
NOAA_DATA[NOAA_DATA$CROPDMGEXP=="B",]$CROPDMGCOST= NOAA_DATA[NOAA_DATA$CROPDMGEXP=="B",]$CROPDMG*10^9


head(NOAA_DATA)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0           
##   PROPDMGCOST CROPDMGCOST
## 1       25000           0
## 2        2500           0
## 3       25000           0
## 4        2500           0
## 5        2500           0
## 6        2500           0
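
The conversion above repeats one assignment per exponent code. As a sketch only, the same mapping can be written with a named lookup vector. `multiplier` and `damage_cost` are hypothetical names that are not in the original analysis. The sketch also assumes that lowercase codes such as "k" should count like their uppercase form, which the code above does not do, so totals on the real data could differ slightly.

```r
# Multipliers for the exponent codes handled in this report.
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage amount and its exponent code to dollars.
# Codes outside H/K/M/B give a cost of 0, as in the code above.
# Assumption: lowercase codes are upper-cased first (not done in the code above).
damage_cost <- function(amount, exp_code) {
  m <- multiplier[toupper(as.character(exp_code))]
  m[is.na(m)] <- 0
  unname(amount * m)
}

# Toy values (made up for illustration).
damage_cost(c(25, 2.5, 1, 3, 7), c("K", "M", "B", "", "k"))
```

On the real data this would be called as `damage_cost(NOAA_DATA$PROPDMG, NOAA_DATA$PROPDMGEXP)`, and likewise for the crop columns.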

Now we aggregate the total damage cost by event type and plot the results for the top 15 events.

totaldamage<-aggregate(PROPDMGCOST + CROPDMGCOST ~ EVTYPE ,data = NOAA_DATA,sum)
names(totaldamage)<-c("Event","TotDamageCost")
totaldamage<-totaldamage[order(- totaldamage$TotDamageCost),][1:15,]
totaldamage$Event<-factor(totaldamage$Event,levels = totaldamage$Event)
head(totaldamage)
##                 Event TotDamageCost
## 170             FLOOD  150319678250
## 411 HURRICANE/TYPHOON   71913712800
## 834           TORNADO   57340613590
## 670       STORM SURGE   43323541000
## 244              HAIL   18752904670
## 153       FLASH FLOOD   17562128610
# Bar chart of the 15 event types with the highest combined property and crop damage
ggplot(totaldamage,aes(x=Event,y=TotDamageCost))+geom_bar(stat = "identity",fill="pink")+theme_bw()+theme(axis.text.x = element_text(angle = 90,hjust = 1))+xlab("Events")+ylab("Total Damage Cost in Dollars")+ggtitle("Top 15 Events by Total Property and Crop Damage Cost")

RESULTS

Based on our analysis, we have come to the following results:

Tornadoes are the most harmful weather events to population health in the USA, with the highest numbers of both fatalities and injuries. Floods are the most expensive severe weather events, with the greatest economic consequences according to our analysis.