About

This is one of the assignments for the Reproducible Research and Reporting of Statistical Analyses class in the Data Science specialization offered by Johns Hopkins on Coursera.

Assignment

The objective of this study is to identify the climatic events with the greatest impact on the health and safety of the population, as well as those with the greatest economic impact.

Synopsis

Climate events can harm a society's population, environment, and economy.
This project explores data from the United States between the years 1950 and 2011 to measure the magnitude of the impact of each type of event.

Data Processing

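#Read the data directly from the bz2-compressed CSV file
dados <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"), sep=",", header=TRUE)
head(dados)
#Subset the data frame to the variables of interest: EVTYPE (column 8) and FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP (columns 23 to 28)
procesed <- dados[,c(8, 23:28)]
head(procesed)
#Convert the event type to a factor
procesed$EVTYPE <- as.factor(procesed$EVTYPE)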

#Normalize the unit codes to upper case so that "k" and "K" match
procesed$PROPDMGEXP = toupper(procesed$PROPDMGEXP)
procesed$CROPDMGEXP = toupper(procesed$CROPDMGEXP)
#Convert each unit code to a numeric multiplier:
#H = hundred, K = thousand, M = million, B = billion; "?" counts as 0
#Assumption: any other code (blank, "+", "-", digits) means no scaling (multiplier 1)
multiplier <- c("?" = 0, "H" = 100, "K" = 1000, "M" = 1000000, "B" = 1000000000)
prop_mult <- unname(multiplier[procesed$PROPDMGEXP])
crop_mult <- unname(multiplier[procesed$CROPDMGEXP])
prop_mult[is.na(prop_mult)] <- 1
crop_mult[is.na(crop_mult)] <- 1
#Store the dollar value of each damage in new columns, so no row is lost to NA and no amount is counted twice
procesed$PROPCOST <- procesed$PROPDMG * prop_mult
procesed$CROPCOST <- procesed$CROPDMG * crop_mult
#Aggregate the total damage cost by event type, used later to plot the relationship between weather event and damage cost
damages <- aggregate(PROPCOST + CROPCOST ~ EVTYPE, data=procesed, sum)
names(damages) = c("Event_Type", "Total_Damage")
head(damages)
paste("This dataset contains", nrow(procesed), "records of severe weather events in the United States between the years 1950 and 2011", sep = " ")
## [1] "This dataset contains 902297 records of severe weather events in the United States between the years 1950 and 2011"
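
As a quick check of the unit conversion above, the sketch below applies the same idea to a small made-up data frame. The data frame `toy`, the lookup vector `unit_multiplier`, and the helper `damage_cost` are hypothetical names introduced only for illustration. Treating blank or unrecognized codes as a multiplier of 1 is an assumption, not something confirmed by the data documentation.

```r
# Hypothetical toy data: damage amounts and their unit codes
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Lookup of unit codes to multipliers
# (H = hundred, K = thousand, M = million, B = billion)
unit_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its unit code to dollars
damage_cost <- function(amount, code) {
  mult <- unname(unit_multiplier[toupper(code)])
  mult[is.na(mult)] <- 1  # assumption: blank or unknown codes mean "no scaling"
  amount * mult
}

toy$PROPCOST <- damage_cost(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

Keeping the multiplier in a named vector means a new code can be handled by adding one entry rather than one more assignment line per column.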

Results

Climate events in the United States that harm the population and the economy

Total fatalities by weather event

library(ggplot2)
  • Aggregate the two variables of interest.
  • Order the events by number of fatalities and keep the 15 highest.
  • Generate a plot.
fatais <- aggregate(FATALITIES ~ EVTYPE, data=procesed, sum)
fatais <- fatais[order(-fatais$FATALITIES), ][1:15, ]
head(fatais)
ggplot(fatais, aes(x = EVTYPE, y = FATALITIES)) +
    geom_bar(stat = "identity", fill = "Pink") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Types of events") + ylab("Total No. Fatalities") +
    ggtitle("Fatalities by Weather Events (Between 1950-2011)")

Total injuries by weather event

  • Aggregate the two variables of interest.
  • Order the events by number of injuries and keep the 15 highest.
  • Generate a plot.
lesoes <- aggregate(INJURIES ~ EVTYPE, data=procesed, sum)
lesoes <- lesoes[order(-lesoes$INJURIES), ][1:15, ]
head(lesoes)
ggplot(lesoes, aes(x = EVTYPE, y = INJURIES)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Types of events") + ylab("Total No. Injuries") +
    ggtitle("Injuries by Weather Events (Between 1950-2011)")
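
The fatalities and injuries steps differ only in the column being summed, so they could share one helper. The sketch below is a minimal illustration on made-up data. The function `top_events` and the data frame `toy` are hypothetical names, not part of the original analysis.

```r
# Hypothetical helper: sum `column` by EVTYPE and keep the n largest totals
top_events <- function(data, column, n = 15) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  totals <- totals[order(-totals[[column]]), ]
  head(totals, n)
}

# Made-up records for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 7, 4, 1, 0),
  INJURIES   = c(30, 3, 12, 8, 0, 2)
)

top_fatal <- top_events(toy, "FATALITIES", n = 3)
print(top_fatal)
```

With the real data, the same helper would be called as `top_events(procesed, "FATALITIES")` and `top_events(procesed, "INJURIES")`.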

Which weather events cause the greatest economic loss ($)

damages <- damages[order(-damages$Total_Damage),]
damages <- damages[1:15,]
damages <- damages[order(damages$Total_Damage),]
damages$Event_Type <- factor(damages$Event_Type, levels = damages$Event_Type)
head(damages)
#Disable scientific notation so the axis shows full numbers
options(scipen=999)
ggplot(damages, aes(x = Event_Type, y = Total_Damage)) +
    geom_bar(stat = "identity", fill = "Green") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Types of events") + ylab("Damage cost (US$)") + ggtitle("Events with economic consequences in USA between 1950-2011")
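
ggplot2 draws the bars of a discrete axis in the order of the factor levels, which is why the code above re-levels `Event_Type` after sorting. The base R sketch below shows the effect of that step on made-up values. The data frame `toy` is hypothetical.

```r
# Made-up totals for illustration only
toy <- data.frame(
  Event_Type   = c("HAIL", "FLOOD", "TORNADO"),
  Total_Damage = c(20, 150, 60),
  stringsAsFactors = FALSE
)

# Without explicit levels, a factor orders its levels alphabetically
default_levels <- levels(factor(toy$Event_Type))
print(default_levels)

# Sort by damage, then fix the levels in that sorted order
toy <- toy[order(toy$Total_Damage), ]
toy$Event_Type <- factor(toy$Event_Type, levels = toy$Event_Type)
print(levels(toy$Event_Type))
```

After the re-leveling, the bars run from the smallest to the largest total instead of alphabetically.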

Event with the greatest economic loss ($)

damages[15,]

Event with the lowest economic loss in this top 15 subset

damages[1,]

Review criteria

Has either a (1) valid RPubs URL pointing to a data analysis document for this assignment been submitted; or (2) a complete PDF file presenting the data analysis been uploaded?
Is the document written in English?
Does the analysis include description and justification for any data transformations?
Does the document have a title that briefly summarizes the data analysis?
Does the document have a synopsis that describes and summarizes the data analysis in less than 10 sentences?
Is there a section titled “Data Processing” that describes how the data were loaded into R and processed for analysis?
Is there a section titled “Results” where the main results are presented?
Is there at least one figure in the document that contains a plot?
Are there at most 3 figures in this document?
Does the analysis start from the raw data file (i.e. the original .csv.bz2 file)?
Does the analysis address the question of which types of events are most harmful to population health?
Does the analysis address the question of which types of events have the greatest economic consequences?
Do all the results of the analysis (i.e. figures, tables, numerical summaries) appear to be reproducible?
Do the figure(s) have descriptive captions (i.e. there is a description near the figure of what is happening in the figure)?
As far as you can determine, does it appear that the work submitted for this project is the work of the student who submitted it?