Synopsis

In this project we read storm data describing the consequences of severe weather events and try to answer two questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

We first look at the data and decide which columns matter for this analysis. Then we group the data set by event type and sum the selected columns to answer the stated questions.

Data

We will analyze government storm data. Related resources:

Dataset URL
National Weather Service URL
National Climatic Data Center Storm Events FAQ

Load Data

We can read the data without unpacking it first, because the read.csv function reads bz2-compressed files directly.

dane <- read.csv("repdata_data_StormData.csv.bz2")
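This works because read.csv opens the file through a connection that detects bzip2 compression on its own. A minimal, self-contained sketch of that behaviour on a temporary toy file (the file contents and the names tmp and toy are made up for illustration):

```r
# Write a tiny CSV through a bzip2 connection, then read it back directly.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HAIL,0"), con)
close(con)

# No manual decompression step is needed before read.csv.
toy <- read.csv(tmp)
print(toy)
```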

We can take a quick look at the table to check that it loaded correctly.

head(dane)

##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

And check the column names.

names(dane)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

From the column names we see that we need EVTYPE, FATALITIES and INJURIES for the first question, and PROPDMG and CROPDMG for the second.

Data Processing

Fatalities and injuries

First we need the total fatalities and injuries for every type of event. For this we group the data frame by EVTYPE, sum both columns, and save the result in the d2 variable.

library(tidyr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

d2 <- dane %>% group_by(EVTYPE)  %>%
  summarize( fat = sum(FATALITIES,na.rm=T) , inj = sum(INJURIES ,na.rm=T) ) %>%
  ungroup() %>%
  arrange(desc(fat))
head(d2)

## # A tibble: 6 × 3
##   EVTYPE           fat   inj
##   <chr>          <dbl> <dbl>
## 1 TORNADO         5633 91346
## 2 EXCESSIVE HEAT  1903  6525
## 3 FLASH FLOOD      978  1777
## 4 HEAT             937  2100
## 5 LIGHTNING        816  5230
## 6 TSTM WIND        504  6957
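To make the grouping step easier to follow, here is the same group_by and summarize pattern on a small toy data frame. The rows are made up; only the column names match the storm data. Note how na.rm = TRUE lets the sums skip missing values.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 3, NA, 1),
  INJURIES   = c(10, 1, 5, 0, NA)
)

# One row per event type, with NA values ignored in the sums.
toy_sum <- toy %>%
  group_by(EVTYPE) %>%
  summarize(fat = sum(FATALITIES, na.rm = TRUE),
            inj = sum(INJURIES, na.rm = TRUE)) %>%
  arrange(desc(fat))

print(toy_sum)
```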

We will keep the fatalities in a separate variable.

dane_fatal <- d2 %>% select(type=EVTYPE,fatalities=fat) %>% arrange(desc(fatalities))
head(dane_fatal)

## # A tibble: 6 × 2
##   type           fatalities
##   <chr>               <dbl>
## 1 TORNADO              5633
## 2 EXCESSIVE HEAT       1903
## 3 FLASH FLOOD           978
## 4 HEAT                  937
## 5 LIGHTNING             816
## 6 TSTM WIND             504

We do the same for the injuries.

dane_injuries <- d2 %>% select(type=EVTYPE,injuries=inj) %>% arrange(desc(injuries))
head(dane_injuries)

## # A tibble: 6 × 2
##   type           injuries
##   <chr>             <dbl>
## 1 TORNADO           91346
## 2 TSTM WIND          6957
## 3 FLOOD              6789
## 4 EXCESSIVE HEAT     6525
## 5 LIGHTNING          5230
## 6 HEAT               2100

Property and crop damage

As we saw in the data overview, we can use PROPDMG and CROPDMG to answer the second question. For this we create a summary of the main data set, grouped by event type.

d3 <- dane %>% group_by(EVTYPE)  %>%
  summarize( dmg = sum(PROPDMG,na.rm=T) , farms = sum(CROPDMG  ,na.rm=T) ) %>%
  ungroup() %>%
  arrange(desc(dmg))
head(d3)

## # A tibble: 6 × 3
##   EVTYPE                 dmg   farms
##   <chr>                <dbl>   <dbl>
## 1 TORNADO           3212258. 100019.
## 2 FLASH FLOOD       1420125. 179200.
## 3 TSTM WIND         1335966. 109203.
## 4 FLOOD              899938. 168038.
## 5 THUNDERSTORM WIND  876844.  66791.
## 6 HAIL               688693. 579596.
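One caveat about these sums: the head(dane) output shows a PROPDMGEXP column holding "K" next to PROPDMG, and there is a matching CROPDMGEXP column. In this data set those letters are magnitude codes (K for thousands, M for millions, B for billions), so the raw sums above mix units. A hedged sketch of how the values might be scaled before summing, on made-up rows; the helper name exp_to_multiplier is hypothetical, and treating every other code as a multiplier of 1 is an assumption:

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1   # assumption: blank or unknown codes mean "no scaling"
  unname(mult)
}

# Toy rows with invented values; only the column names match the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(25, 1.5, 2.5, 10),
  PROPDMGEXP = c("K", "M", "k", ""),
  stringsAsFactors = FALSE
)

# Scale each row first, then sum per event type.
toy$prop_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
totals <- aggregate(prop_usd ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$prop_usd), ]
print(totals)
```

In the toy data a single flood recorded in millions outweighs both tornado rows, whereas the raw PROPDMG sums (27.5 for tornado against 1.5 for flood) would rank them the other way round.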

We will also create a data frame that addresses the property damage.

dane_dmg <- d3 %>% select(type=EVTYPE,property_dmg=dmg   ) %>% arrange(desc(property_dmg))
head(dane_dmg)

## # A tibble: 6 × 2
##   type              property_dmg
##   <chr>                    <dbl>
## 1 TORNADO               3212258.
## 2 FLASH FLOOD           1420125.
## 3 TSTM WIND             1335966.
## 4 FLOOD                  899938.
## 5 THUNDERSTORM WIND      876844.
## 6 HAIL                   688693.

And one for the crop damage.

dane_farms <- d3 %>% select(type=EVTYPE,crop_dmg=farms   ) %>% arrange(desc(crop_dmg))
head(dane_farms)

## # A tibble: 6 × 2
##   type              crop_dmg
##   <chr>                <dbl>
## 1 HAIL               579596.
## 2 FLASH FLOOD        179200.
## 3 FLOOD              168038.
## 4 TSTM WIND          109203.
## 5 TORNADO            100019.
## 6 THUNDERSTORM WIND   66791.
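The tables list TSTM WIND and THUNDERSTORM WIND as separate rows, although both labels appear to describe the same kind of event. A sketch of how such labels could be merged before ranking, using the property damage sums from the head(d3) output. The mapping of TSTM WIND to THUNDERSTORM WIND is an assumption made for illustration, and the column name type_clean is hypothetical.

```r
# Property damage sums copied from the head(d3) output.
dmg <- data.frame(
  type = c("TORNADO", "FLASH FLOOD", "TSTM WIND", "FLOOD",
           "THUNDERSTORM WIND", "HAIL"),
  property_dmg = c(3212258, 1420125, 1335966, 899938, 876844, 688693),
  stringsAsFactors = FALSE
)

# Assumed mapping: treat "TSTM WIND" as an abbreviation of "THUNDERSTORM WIND".
dmg$type_clean <- sub("^TSTM WIND$", "THUNDERSTORM WIND",
                      trimws(toupper(dmg$type)))

# Re-aggregate on the cleaned label and sort in descending order.
merged <- aggregate(property_dmg ~ type_clean, data = dmg, FUN = sum)
merged <- merged[order(-merged$property_dmg), ]
print(merged)
```

Under this assumption the two wind rows combine into a single entry that ranks second, behind tornado.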

Results

To reproduce the plots you need the ggplot2 package.

library(ggplot2)

Answer to question 1

Two measures of harm to population health are recorded in the data:

fatalities
injuries

The event types with the most fatalities are:

head(dane_fatal)

## # A tibble: 6 × 2
##   type           fatalities
##   <chr>               <dbl>
## 1 TORNADO              5633
## 2 EXCESSIVE HEAT       1903
## 3 FLASH FLOOD           978
## 4 HEAT                  937
## 5 LIGHTNING             816
## 6 TSTM WIND             504

The event types with the most injuries are:

head(dane_injuries)

## # A tibble: 6 × 2
##   type           injuries
##   <chr>             <dbl>
## 1 TORNADO           91346
## 2 TSTM WIND          6957
## 3 FLOOD              6789
## 4 EXCESSIVE HEAT     6525
## 5 LIGHTNING          5230
## 6 HEAT               2100

We can see both of them in one plot:

ggplot(d2[1:20, ], aes(x = EVTYPE)) +
  # Bars are overlaid, not stacked: fatalities are drawn on top of injuries
  geom_col(aes(y = inj, fill = "Injuries")) +
  geom_col(aes(y = fat, fill = "Fatalities")) +
  theme(axis.text.x = element_text(angle = 90)) +
  # Named values tie each colour to its legend entry explicitly
  scale_fill_manual(values = c(Fatalities = "red", Injuries = "blue")) +
  labs(title = "Fatalities and Injuries by Disaster Type", x = "Disaster Type", y = "Number of people", fill = "")
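An alternative way to draw this kind of plot is to reshape the summary into long format first, so that a single layer can place the two measures side by side. A sketch under stated assumptions: the six rows are copied from the head(d2) output, and the names top, long, measure and count are made up for illustration.

```r
suppressPackageStartupMessages({
  library(tidyr)
  library(ggplot2)
})

# First six rows of d2, copied from the output shown earlier.
top <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD",
             "HEAT", "LIGHTNING", "TSTM WIND"),
  fat = c(5633, 1903, 978, 937, 816, 504),
  inj = c(91346, 6525, 1777, 2100, 5230, 6957)
)

# One row per (event type, measure) pair instead of one column per measure.
long <- pivot_longer(top, cols = c(fat, inj),
                     names_to = "measure", values_to = "count")

# A single layer with dodged bars replaces the two overlaid layers.
p <- ggplot(long, aes(x = EVTYPE, y = count, fill = measure)) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Disaster Type", y = "Number of people", fill = "")
print(p)
```

With the fill mapped to a real column, the legend is built from the data and no manual matching of colours to labels is needed.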

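As we can see, the main disasters contributing to this are:

tornado (by far the largest, first in both fatalities and injuries)
excessive heat (second in fatalities)
TSTM wind (second in injuries)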
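As a rough cross-check, fatalities and injuries can be added per event type. The sketch below uses only the six rows shown in the head(d2) output, so event types outside that list (FLOOD, for example, whose fatality count is not shown) are left out. Adding the two counts with equal weight is itself an assumption.

```r
# Values copied from the head(d2) output.
fat <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903, `FLASH FLOOD` = 978,
         HEAT = 937, LIGHTNING = 816, `TSTM WIND` = 504)
inj <- c(TORNADO = 91346, `EXCESSIVE HEAT` = 6525, `FLASH FLOOD` = 1777,
         HEAT = 2100, LIGHTNING = 5230, `TSTM WIND` = 6957)

# Element-wise sum (the names line up), sorted from largest to smallest.
casualties <- sort(fat + inj, decreasing = TRUE)
print(casualties)
```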

Answer to question 2

There are two elements in this data frame that relate to economic consequences:

property damage
crop damage

The event types with the highest property damage are:

head(dane_dmg)

## # A tibble: 6 × 2
##   type              property_dmg
##   <chr>                    <dbl>
## 1 TORNADO               3212258.
## 2 FLASH FLOOD           1420125.
## 3 TSTM WIND             1335966.
## 4 FLOOD                  899938.
## 5 THUNDERSTORM WIND      876844.
## 6 HAIL                   688693.

The event types with the highest crop damage are:

head(dane_farms)

## # A tibble: 6 × 2
##   type              crop_dmg
##   <chr>                <dbl>
## 1 HAIL               579596.
## 2 FLASH FLOOD        179200.
## 3 FLOOD              168038.
## 4 TSTM WIND          109203.
## 5 TORNADO            100019.
## 6 THUNDERSTORM WIND   66791.

We can see both of them combined in a bar plot:

ggplot(d3[1:20, ], aes(x = EVTYPE)) +
  # Bars are overlaid, not stacked: crop damage is drawn on top of property damage
  geom_col(aes(y = dmg, fill = "Property")) +
  geom_col(aes(y = farms, fill = "Crops")) +
  theme(axis.text.x = element_text(angle = 90)) +
  # Named values tie each colour to its legend entry explicitly
  scale_fill_manual(values = c(Crops = "red", Property = "blue")) +
  labs(title = "Property and Crop Damage by Disaster Type", x = "Disaster Type", y = "Damage (sum of PROPDMG / CROPDMG values)", fill = "")

As we can see in the plot, the main disasters contributing to economic consequences are (property damage + crop damage):

tornado (3212258 + 100019 ≈ 3312277)
flash flood (1420125 + 179200 ≈ 1599325)
TSTM wind (1335966 + 109203 ≈ 1445169)
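The combined totals can be reproduced from the summary tables. The sketch below uses the six rows from the head(d3) output; those printed values are rounded, so the sums are approximate.

```r
# Values copied from the head(d3) output (rounded in the printout).
dmg <- c(TORNADO = 3212258, `FLASH FLOOD` = 1420125, `TSTM WIND` = 1335966,
         FLOOD = 899938, `THUNDERSTORM WIND` = 876844, HAIL = 688693)
farms <- c(TORNADO = 100019, `FLASH FLOOD` = 179200, `TSTM WIND` = 109203,
           FLOOD = 168038, `THUNDERSTORM WIND` = 66791, HAIL = 579596)

# Property plus crop damage per event type, largest first.
total <- sort(dmg + farms, decreasing = TRUE)
print(total)
```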

Storm Data Analysis - Coursera Project

Tom

2024-05-05