Public Health and Economic Effects of Severe Weather Events in the US

Synopsis

The goal of this report is to show the effects of severe weather on public health and the economy of the US. With that information, government authorities can plan preventive and corrective actions to reduce the impact of severe weather events such as tornadoes and floods. The U.S. National Oceanic and Atmospheric Administration (NOAA) has gathered weather data since 1950, so an exploratory data analysis (EDA) is carried out to better understand the effects of weather, which include fatalities, injuries, property damage, and crop damage.

Data Processing

The data was obtained from the Peer-graded Assignment: Course Project 2 page of the Reproducible Research course web site. The file is a bz2-compressed CSV, which read.csv can read directly once it has been downloaded.

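library(dplyr)      # %>%, group_by, summarize, as_tibble
library(lubridate)  # year
library(ggplot2)    # plots in the Results section

fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "dataset.csv.bz2", mode = "wb")
data_1 <- read.csv("dataset.csv.bz2", stringsAsFactors = FALSE)
data_1 <- as_tibble(data_1)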
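
Because the archive is large, re-running the report need not repeat the download. The following is a minimal sketch, assuming a hypothetical helper `fetch_once()` that is not part of the original analysis; it skips the download when the file already exists. The second half uses a tiny temporary file to illustrate that `read.csv()` reads bz2-compressed files directly, with no manual decompression step. The placeholder URL is never contacted.

```r
# Hypothetical helper: download only when the local copy is missing.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Toy demonstration that read.csv() handles bz2 compression transparently.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

toy <- read.csv(tmp, stringsAsFactors = FALSE)
toy

# The file already exists, so no download is attempted here.
fetch_once("https://example.invalid/unused.csv.bz2", tmp)
```

In the report itself, the call would presumably be `fetch_once(fileURL, "dataset.csv.bz2")` in place of the bare `download.file()` call.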

The original data set covers 1950 to 2011, more than 60 years of data, but the early years contain few records. BGN_DATE is a character variable, so it is converted to a date-time class and the year is extracted to compare the number of records in the early years with that of the last 30 years.

new_date <- strptime(data_1$BGN_DATE, "%m/%d/%Y %H:%M:%S")
data_1$YEAR <- year(new_date)
year_table <- table(data_1$YEAR)
y1 <- sum(year_table)
y2 <- sum(head(year_table, 32))
y3 <- sum(tail(year_table, 30))
y3/y1
## [1] 0.9046556

About 90% of the records fall in the last 30 years. A further statistical analysis would be needed to decide which years to keep. That analysis is outside the scope of this report, so the entire data set is used.
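
The year extraction and the share calculation can be checked on toy data. This sketch uses base R only and a hypothetical helper `share_recent()`; the four date strings are invented for illustration and follow the same layout as BGN_DATE. Note that `table()` counts only years that actually occur, so "last n years" here means the last n distinct years present in the data.

```r
# Toy BGN_DATE strings in the same "%m/%d/%Y %H:%M:%S" layout as the data set.
bgn <- c("4/18/1950 0:00:00", "6/1/1985 0:00:00",
         "7/4/1996 0:00:00", "11/28/2011 0:00:00")

# Parse to a date-time, then pull out the four-digit year as an integer.
parsed <- strptime(bgn, "%m/%d/%Y %H:%M:%S", tz = "UTC")
years <- as.integer(format(parsed, "%Y"))

# Hypothetical helper: share of records in the last `n_years` distinct years.
share_recent <- function(years, n_years) {
  counts <- table(years)
  sum(tail(counts, n_years)) / sum(counts)
}

share_recent(years, 2)
```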

Public Health Analysis

The first question concerns the health problems caused by weather. The EVTYPE (type of weather event), FATALITIES, and INJURIES variables are taken into account in this analysis. The first subset considers only FATALITIES, the second only INJURIES, and the third the sum of both variables. In each case, the data is grouped by EVTYPE and the variable is summed per event type. The result is then sorted in descending order, reduced to the two relevant columns, and its first ten rows are presented as a table.

data_2 <- data_1 %>% group_by(EVTYPE) %>% summarize(FATALITIES = sum(FATALITIES)) %>% arrange(desc(FATALITIES)) %>% select(EVTYPE, FATALITIES)
data_3 <- head(data_2, 10)
data_3
## # A tibble: 10 x 2
##    EVTYPE         FATALITIES
##    <chr>               <dbl>
##  1 TORNADO              5633
##  2 EXCESSIVE HEAT       1903
##  3 FLASH FLOOD           978
##  4 HEAT                  937
##  5 LIGHTNING             816
##  6 TSTM WIND             504
##  7 FLOOD                 470
##  8 RIP CURRENT           368
##  9 HIGH WIND             248
## 10 AVALANCHE             224

The top ten shows that the three leading causes of fatalities were TORNADO, EXCESSIVE HEAT, and FLASH FLOOD.

data_4 <- data_1 %>% group_by(EVTYPE) %>% summarize(INJURIES = sum(INJURIES)) %>% arrange(desc(INJURIES)) %>% select(EVTYPE, INJURIES)
data_5 <- head(data_4, 10)
data_5
## # A tibble: 10 x 2
##    EVTYPE            INJURIES
##    <chr>                <dbl>
##  1 TORNADO              91346
##  2 TSTM WIND             6957
##  3 FLOOD                 6789
##  4 EXCESSIVE HEAT        6525
##  5 LIGHTNING             5230
##  6 HEAT                  2100
##  7 ICE STORM             1975
##  8 FLASH FLOOD           1777
##  9 THUNDERSTORM WIND     1488
## 10 HAIL                  1361

The top ten shows that the three leading causes of injuries were TORNADO, TSTM WIND, and FLOOD.
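
The group, sum, sort, and truncate pattern used for both tables can be wrapped in a single function. This is a sketch only, assuming a hypothetical helper `top_events()` and a toy tibble standing in for data_1; unlike the original code, it passes `na.rm = TRUE` to `sum()`, which is an added assumption.

```r
library(dplyr)

# Hypothetical helper: total `value_col` per event type and keep the top `n`.
top_events <- function(data, value_col, n = 10) {
  data %>%
    group_by(EVTYPE) %>%
    summarize(total = sum(.data[[value_col]], na.rm = TRUE), .groups = "drop") %>%
    arrange(desc(total)) %>%
    head(n)
}

# Toy data standing in for data_1 (values invented for illustration).
toy <- tibble(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(5, 1, 2, 4),
  INJURIES   = c(40, 3, 10, 0)
)

top_fatalities <- top_events(toy, "FATALITIES", n = 2)
top_injuries   <- top_events(toy, "INJURIES", n = 2)
top_fatalities
top_injuries
```

Passing the column name as a string keeps the helper usable for any numeric column without repeating the pipeline.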

data_6 <- data_1 %>% group_by(EVTYPE) %>% summarize(HEALTH_PROBLEMS = sum(FATALITIES, INJURIES)) %>% arrange(desc(HEALTH_PROBLEMS)) %>% select(EVTYPE, HEALTH_PROBLEMS)
data_7 <- head(data_6, 10)
data_7
## # A tibble: 10 x 2
##    EVTYPE            HEALTH_PROBLEMS
##    <chr>                       <dbl>
##  1 TORNADO                     96979
##  2 EXCESSIVE HEAT               8428
##  3 TSTM WIND                    7461
##  4 FLOOD                        7259
##  5 LIGHTNING                    6046
##  6 HEAT                         3037
##  7 FLASH FLOOD                  2755
##  8 ICE STORM                    2064
##  9 THUNDERSTORM WIND            1621
## 10 WINTER STORM                 1527

The top ten (fatalities + injuries) shows that the three leading causes of health problems were TORNADO, EXCESSIVE HEAT, and TSTM WIND.
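
The combined column relies on `sum()` accepting several vectors at once: inside `summarize()`, `sum(FATALITIES, INJURIES)` adds every value of both columns within a group. The short sketch below, with invented toy vectors, shows that this equals the sum of the two separate totals, and checks the same relation against the TORNADO figures reported in the tables above.

```r
fatalities <- c(2, 0, 1)
injuries   <- c(10, 5, 0)

# sum() with several vectors adds every element of every vector.
combined <- sum(fatalities, injuries)
combined == sum(fatalities) + sum(injuries)

# By contrast, `+` works row by row and gives one total per record.
per_record <- fatalities + injuries
per_record

# TORNADO: fatalities plus injuries equals the combined table entry.
5633 + 91346 == 96979
```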

Economic Analysis

The second question concerns the economic damage caused by weather. The EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP variables are taken into account in this analysis. PROPDMGEXP and CROPDMGEXP hold an alphabetical code that must be considered to determine the actual amounts in PROPDMG and CROPDMG: “K” scales the value by one thousand, “M” by one million, and “B” by one billion, while “” leaves it unscaled. The original data set is filtered to keep only the rows with “”, “K”, “M”, or “B” in PROPDMGEXP or CROPDMGEXP respectively, and rows with any other code are ignored. Next, as in the health analysis, the subset is grouped by EVTYPE and the variable is summed per event type. The result is sorted in descending order, reduced to the two relevant columns, and presented as a table in millions of USD.

x1 <- data_1 %>% filter(PROPDMGEXP == "")
x2 <- data_1 %>% filter(PROPDMGEXP == "K") %>% mutate(PROPDMG = PROPDMG*1000)
x3 <- data_1 %>% filter(PROPDMGEXP == "M") %>% mutate(PROPDMG = PROPDMG*1000000)
x4 <- data_1 %>% filter(PROPDMGEXP == "B") %>% mutate(PROPDMG = PROPDMG*1000000000)
data_10 <- rbind(x1, x2, x3, x4)

data_11 <- data_10 %>% group_by(EVTYPE) %>% summarize(PROPDMG = sum(PROPDMG)) %>% mutate(PROPDMG = PROPDMG/1000000) %>% arrange(desc(PROPDMG)) %>% select(EVTYPE, PROPDMG)
data_12 <- head(data_11, 10)
data_12
## # A tibble: 10 x 2
##    EVTYPE            PROPDMG
##    <chr>               <dbl>
##  1 FLOOD             144658.
##  2 HURRICANE/TYPHOON  69306.
##  3 TORNADO            56926.
##  4 STORM SURGE        43324.
##  5 FLASH FLOOD        16141.
##  6 HAIL               15727.
##  7 HURRICANE          11868.
##  8 TROPICAL STORM      7704.
##  9 WINTER STORM        6688.
## 10 HIGH WIND           5270.

The top ten shows that the three largest sources of property damage, in millions of USD, were FLOOD, HURRICANE/TYPHOON, and TORNADO.
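
The four filter-and-multiply steps can also be expressed as one lookup. The sketch below is an illustration under stated assumptions, not the method used in this report: `to_usd()` is a hypothetical helper, the toy values are invented, and codes outside “”, “K”, “M”, “B” are turned into NA so that they can be dropped afterwards, which mirrors the filtering above.

```r
# Multipliers for the exponent codes kept in this report; "" means no scaling.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to USD.
# Codes outside "", "K", "M", "B" yield NA so they can be dropped afterwards.
to_usd <- function(value, exp_code) {
  mult <- ifelse(exp_code == "", 1, multiplier[exp_code])
  unname(value * mult)
}

toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 500, 3),
  PROPDMGEXP = c("K", "M", "B", "", "h"),
  stringsAsFactors = FALSE
)
toy$PROPDMG_USD <- to_usd(toy$PROPDMG, toy$PROPDMGEXP)

# Keep only the rows whose code was recognized.
kept <- toy[!is.na(toy$PROPDMG_USD), ]
kept
```

A single lookup keeps every record in one data frame, which avoids building and re-binding one subset per code.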

x10 <- data_1 %>% filter(CROPDMGEXP == "")
x20 <- data_1 %>% filter(CROPDMGEXP == "K") %>% mutate(CROPDMG = CROPDMG*1000)
x30 <- data_1 %>% filter(CROPDMGEXP == "M") %>% mutate(CROPDMG = CROPDMG*1000000)
x40 <- data_1 %>% filter(CROPDMGEXP == "B") %>% mutate(CROPDMG = CROPDMG*1000000000)
data_20 <- rbind(x10, x20, x30, x40)

data_21 <- data_20 %>% group_by(EVTYPE) %>% summarize(CROPDMG = sum(CROPDMG)) %>% mutate(CROPDMG = CROPDMG/1000000) %>% arrange(desc(CROPDMG)) %>% select(EVTYPE, CROPDMG)
data_22 <- head(data_21, 10)
data_22
## # A tibble: 10 x 2
##    EVTYPE            CROPDMG
##    <chr>               <dbl>
##  1 DROUGHT            13973.
##  2 FLOOD               5662.
##  3 RIVER FLOOD         5029.
##  4 ICE STORM           5022.
##  5 HAIL                3026.
##  6 HURRICANE           2742.
##  7 HURRICANE/TYPHOON   2608.
##  8 FLASH FLOOD         1421.
##  9 EXTREME COLD        1293.
## 10 FROST/FREEZE        1094.

The top ten shows that the three largest sources of crop damage, in millions of USD, were DROUGHT, FLOOD, and RIVER FLOOD.

data_25 <- rbind(data_10, data_20)
data_30 <- data_25 %>% group_by(EVTYPE) %>% summarize(ECONOMIC_PROBLEMS = sum(PROPDMG, CROPDMG)) %>% mutate(ECONOMIC_PROBLEMS = ECONOMIC_PROBLEMS/1000000) %>% arrange(desc(ECONOMIC_PROBLEMS)) %>% select(EVTYPE, ECONOMIC_PROBLEMS)
data_31 <- head(data_30, 10)
data_31
## # A tibble: 10 x 2
##    EVTYPE            ECONOMIC_PROBLEMS
##    <chr>                         <dbl>
##  1 FLOOD                       150321.
##  2 HURRICANE/TYPHOON            71914.
##  3 TORNADO                      57344.
##  4 STORM SURGE                  43324.
##  5 HAIL                         18754.
##  6 FLASH FLOOD                  17564.
##  7 DROUGHT                      15019.
##  8 HURRICANE                    14610.
##  9 RIVER FLOOD                  10148.
## 10 ICE STORM                     8967.

The top ten shows that the three largest sources of combined (property + crop) damage, in millions of USD, were FLOOD, HURRICANE/TYPHOON, and TORNADO.
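
One detail of the combined calculation is worth noting: data_25 stacks data_10, where only PROPDMG has been scaled, on top of data_20, where only CROPDMG has been scaled, so `sum(PROPDMG, CROPDMG)` also picks up the unscaled column from each half. A possible alternative, sketched here with a hypothetical helper `scale_damage()` and invented toy values, scales both columns in the same rows and adds them once per record. Treating unrecognized codes as zero is an assumption chosen to mirror the filters above.

```r
# Hypothetical helper: scale a damage value by its exponent code.
# Codes other than "", "K", "M", "B" contribute zero, mirroring the filters above.
scale_damage <- function(value, exp_code) {
  mult <- c(1, 1e3, 1e6, 1e9)[match(exp_code, c("", "K", "M", "B"))]
  ifelse(is.na(mult), 0, value * mult)
}

toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMG    = c(2, 500, 0, 10),
  PROPDMGEXP = c("B", "K", "", "?"),
  CROPDMG    = c(50, 0, 1.5, 20),
  CROPDMGEXP = c("M", "", "B", "K"),
  stringsAsFactors = FALSE
)

# Scale both columns in the same row, then add them once per record.
toy$TOTAL_USD <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP) +
  scale_damage(toy$CROPDMG, toy$CROPDMGEXP)

# Total per event type in millions of USD, largest first.
totals <- aggregate(TOTAL_USD ~ EVTYPE, data = toy, FUN = sum)
totals$TOTAL_MILLIONS <- totals$TOTAL_USD / 1e6
totals <- totals[order(-totals$TOTAL_MILLIONS), c("EVTYPE", "TOTAL_MILLIONS")]
totals
```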

Results

As was mentioned before, this report considers the entire data set from 1950 to 2011. The distribution per year is presented in Figure 1.

barplot(table(data_1$YEAR), col = "green", xlab="Year", ylab="Data", main = "Barplot of Data per Year", panel.first = grid())

Figure 1. Barplot of Data per Year.

Data was gathered for more than 60 years, but the last 20 years hold far more records than the earlier decades. This may be because fewer events were recorded in the early years.

Public Health Results

The severe weather events that caused the most fatalities and injuries combined are Tornado, Excessive Heat, and Tstm Wind. Considering total health problems (fatalities + injuries), Tornado is the most dangerous event, with 96,979 cases, close to 100,000. The government should focus on tornado effects.

g3 <- ggplot(data_7, aes(x = reorder(EVTYPE, -HEALTH_PROBLEMS), y = HEALTH_PROBLEMS))
g3 + geom_bar(stat = "identity", fill = "blue") + theme(axis.text.x = element_text(angle = 45, hjust =1, size = 8)) + ggtitle("Top Ten Health Problems per Weather Event") + xlab("Weather Event") + ylab("Health Problems")

Figure 2. Top Ten Health Problems per Weather Event.

This figure shows the total fatalities and injuries caused by the most severe weather events. The Tornado total (96,979) is more than eleven times that of the second-ranked event, Excessive Heat (8,428).
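
The gap between Tornado and the other events can be quantified directly from the top-ten table. The sketch below copies the HEALTH_PROBLEMS values reported earlier into a named vector; the variable names are illustrative only.

```r
# Combined fatalities + injuries for the top ten events (from the table above).
health <- c(TORNADO = 96979, `EXCESSIVE HEAT` = 8428, `TSTM WIND` = 7461,
            FLOOD = 7259, LIGHTNING = 6046, HEAT = 3037, `FLASH FLOOD` = 2755,
            `ICE STORM` = 2064, `THUNDERSTORM WIND` = 1621, `WINTER STORM` = 1527)

# How many times larger the tornado total is than the second-ranked event.
ratio_to_second <- unname(health["TORNADO"] / health["EXCESSIVE HEAT"])

# Tornado share of the top-ten total.
tornado_share <- unname(health["TORNADO"] / sum(health))

round(c(ratio_to_second = ratio_to_second, tornado_share = tornado_share), 2)
```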

Economic Results

On the other hand, the severe weather events that caused the most property and crop damage are Flood, Hurricane/Typhoon, and Tornado. Considering total economic damage (property + crops), Flood is the costliest event, at about 150,321 million USD. The government should also focus on flood effects.

g6 <- ggplot(data_31, aes(x = reorder(EVTYPE, -ECONOMIC_PROBLEMS), y = ECONOMIC_PROBLEMS))
g6 + geom_bar(stat = "identity", fill = "red") + theme(axis.text.x = element_text(angle = 45, hjust =1, size = 8)) + ggtitle("Top Ten Economic Problems per Weather Event") + xlab("Weather Event") + ylab("Economic Problems")

Figure 3. Top Ten Economic Problems per Weather Event.

This figure shows the total economic damage caused by the most severe weather events. Flood, Hurricane/Typhoon, and Tornado together account for 279,579 million USD.
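
The three-event total can be reproduced from the top-ten table. The sketch below copies the ECONOMIC_PROBLEMS values, which the table shows rounded to whole millions, into a named vector, so the results are approximate to that rounding; the variable names are illustrative only.

```r
# Combined property + crop damage in millions of USD (from the table above).
economic <- c(FLOOD = 150321, `HURRICANE/TYPHOON` = 71914, TORNADO = 57344,
              `STORM SURGE` = 43324, HAIL = 18754, `FLASH FLOOD` = 17564,
              DROUGHT = 15019, HURRICANE = 14610, `RIVER FLOOD` = 10148,
              `ICE STORM` = 8967)

# Total for the three costliest events.
top3_total <- sum(head(economic, 3))
top3_total

# Share of the top-ten total accounted for by those three events.
top3_share <- top3_total / sum(economic)
round(top3_share, 2)
```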