This is one of the exercise assignments for the Reproducible Research and Reporting of Statistical Analyses class in the Data Science specialization offered by Johns Hopkins through Coursera.
The objective of this study is to identify the weather events with the greatest impact on the health and safety of the population, as well as those with the greatest economic impact.
Severe weather events can harm the population, the environment, and the economy.
This project explores data from the United States between 1950 and 2011 to measure the magnitude of the impact of each type of event.
# Read the data directly from the bzip2-compressed CSV file
dados <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"), sep = ",", header = TRUE)
head(dados)
# Keep only the variables of interest: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP
procesed <- dados[,c(8, 23:28)]
head(procesed)
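Selecting columns by position depends on the column order of the file. Selecting them by name is a more robust alternative. The following is a minimal sketch on made-up data, not the analysis code itself; the data frame `toy` and the vector `keep` are hypothetical names introduced for illustration.

```r
# Made-up rows with a few of the column names used in this analysis
toy <- data.frame(
  STATE      = c("AL", "TX"),
  EVTYPE     = c("TORNADO", "HAIL"),
  FATALITIES = c(0, 1),
  INJURIES   = c(15, 0),
  stringsAsFactors = FALSE
)

# Select by name, so the result does not change if columns are reordered
keep <- c("EVTYPE", "FATALITIES", "INJURIES")
subset_by_name <- toy[, keep]
```

The same pattern would apply to the full data frame, provided the names in `keep` match the column names of the file.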
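# Convert the event type to a factor
procesed$EVTYPE <- as.factor(procesed$EVTYPE)
# Upper-case the exponent codes so that lower-case and upper-case variants (e.g. "k" and "K") match the same comparison
procesed$PROPDMGEXP = toupper(procesed$PROPDMGEXP)
procesed$CROPDMGEXP = toupper(procesed$CROPDMGEXP)
# Replace each exponent code with the dollar amount: the damage figure times the code's multiplier
# ("?" = 0, "H" = 100, "K" = 1,000, "M" = 1,000,000, "B" = 1,000,000,000)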
procesed[procesed$PROPDMGEXP == "?", ]$PROPDMGEXP = procesed[procesed$PROPDMGEXP == "?", ]$PROPDMG * 0
procesed[procesed$PROPDMGEXP == "H", ]$PROPDMGEXP = procesed[procesed$PROPDMGEXP == "H", ]$PROPDMG * 100
procesed[procesed$PROPDMGEXP == "K", ]$PROPDMGEXP = procesed[procesed$PROPDMGEXP == "K", ]$PROPDMG * 1000
procesed[procesed$PROPDMGEXP == "M", ]$PROPDMGEXP = procesed[procesed$PROPDMGEXP == "M", ]$PROPDMG * 1000000
procesed[procesed$PROPDMGEXP == "B", ]$PROPDMGEXP = procesed[procesed$PROPDMGEXP == "B", ]$PROPDMG * 1000000000
procesed[procesed$CROPDMGEXP == "?", ]$CROPDMGEXP = procesed[procesed$CROPDMGEXP == "?", ]$CROPDMG * 0
procesed[procesed$CROPDMGEXP == "H", ]$CROPDMGEXP = procesed[procesed$CROPDMGEXP == "H", ]$CROPDMG * 100
procesed[procesed$CROPDMGEXP == "K", ]$CROPDMGEXP = procesed[procesed$CROPDMGEXP == "K", ]$CROPDMG * 1000
procesed[procesed$CROPDMGEXP == "M", ]$CROPDMGEXP = procesed[procesed$CROPDMGEXP == "M", ]$CROPDMG * 1000000
procesed[procesed$CROPDMGEXP == "B", ]$CROPDMGEXP = procesed[procesed$CROPDMGEXP == "B", ]$CROPDMG * 1000000000
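# Convert the dollar amounts (currently stored as text) to numeric; non-numeric leftovers such as "" become NA, hence the warning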
procesed$PROPDMGEXP <- as.numeric(procesed$PROPDMGEXP)
## Warning: NAs introduced by coercion
procesed$CROPDMGEXP <- as.numeric(procesed$CROPDMGEXP)
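The exponent codes can also be mapped to multipliers with a named lookup vector, which keeps the original code column intact. The following is a minimal sketch on made-up data, not the analysis code itself; the names `multipliers`, `toy`, `mult`, and `PROPDMG_USD` are hypothetical, and treating every code missing from the table as zero is an assumption.

```r
# Hypothetical lookup table: exponent code -> multiplier
multipliers <- c("?" = 0, "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Made-up sample rows mimicking the PROPDMG / PROPDMGEXP columns
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Look up each upper-cased code; codes absent from the table yield NA
mult <- unname(multipliers[toupper(toy$PROPDMGEXP)])

# Assumption: unknown or empty codes contribute no damage
mult[is.na(mult)] <- 0

# Dollar amount = damage figure * multiplier
toy$PROPDMG_USD <- toy$PROPDMG * mult
```

Adding a new code then only requires a new entry in `multipliers`, and no coercion warning is produced because the conversion never passes through text.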
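# Sum property and crop damage (in dollars) per event type for the damage-cost plot.
# PROPDMGEXP and CROPDMGEXP now hold dollar amounts; entries that could not be converted (NA) count as zero.
procesed$Total_Damage <- ifelse(is.na(procesed$PROPDMGEXP), 0, procesed$PROPDMGEXP) +
  ifelse(is.na(procesed$CROPDMGEXP), 0, procesed$CROPDMGEXP)
damages <- aggregate(Total_Damage ~ EVTYPE, data = procesed, sum)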
names(damages) = c("Event_Type", "Total_Damage")
head(damages)
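One point to watch with the formula interface of aggregate(): by default (na.action = na.omit) any row with an NA in one of the variables is dropped before summing. The following is a minimal sketch on made-up data showing the effect; the data frame `toy` and its columns `prop`, `crop`, and `total` are hypothetical.

```r
# Made-up damage amounts; one crop value is missing
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "HAIL"),
  prop   = c(100, 200, 50),
  crop   = c(10, NA, 5)
)

# Formula interface: the second FLOOD row is dropped because crop is NA
dropped <- aggregate(prop + crop ~ EVTYPE, data = toy, sum)

# Replacing NA with 0 before aggregating keeps every row in the sum
toy$total <- ifelse(is.na(toy$prop), 0, toy$prop) +
  ifelse(is.na(toy$crop), 0, toy$crop)
kept <- aggregate(total ~ EVTYPE, data = toy, sum)
```

In this sketch the FLOOD total is 110 when the incomplete row is dropped and 310 when the missing value is counted as zero.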
paste("This dataset contains", nrow(procesed), "records of severe weather events in the United States between 1950 and 2011", sep = " ")
## [1] "This dataset contains 902297 records of severe weather events in the United States between 1950 and 2011"
library(ggplot2)
fatais <- aggregate(FATALITIES ~ EVTYPE, data=procesed, sum)
fatais <- fatais[order(-fatais$FATALITIES), ][1:15, ]
head(fatais)
ggplot(fatais, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "Pink") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Types of events") + ylab("Total number of fatalities") +
  ggtitle("Fatalities by Weather Events (Between 1950-2011)")
lesoes <- aggregate(INJURIES ~ EVTYPE, data=procesed, sum)
lesoes <- lesoes[order(-lesoes$INJURIES), ][1:15, ]
head(lesoes)
ggplot(lesoes, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Types of events") + ylab("Total number of injuries") +
  ggtitle("Injuries by Weather Events (Between 1950-2011)")
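The bar charts for fatalities and injuries place the event types alphabetically on the x axis, because ggplot2 orders a discrete axis by factor level. One way to order the bars by height is reorder(). The following is a minimal sketch on made-up totals, not values from the storm data; the data frame `toy` and the plot object `p` are hypothetical.

```r
library(ggplot2)

# Made-up totals, not values from the storm data
toy <- data.frame(
  EVTYPE     = c("HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(30, 90, 60)
)

# reorder() sorts the factor levels by the given value;
# the minus sign puts the event with the largest total first
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$FATALITIES)

# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(toy, aes(x = EVTYPE, y = FATALITIES)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

The damage plot in this report reaches the same effect by sorting the rows and then rebuilding the factor with explicit levels.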
damages <- damages[order(-damages$Total_Damage),]
damages <- damages[1:15,]
damages <- damages[order(damages$Total_Damage),]
damages$Event_Type <- factor(damages$Event_Type, levels = damages$Event_Type)
head(damages)
# Turn off scientific notation so that dollar amounts are printed in full
options(scipen=999)
ggplot(damages, aes(x = Event_Type, y = Total_Damage)) +
  geom_bar(stat = "identity", fill = "Green") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Types of events") + ylab("Damage cost (US$)") +
  ggtitle("Events with the greatest economic consequences in the USA (1950-2011)")
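# Event type with the highest total damage (last row after the ascending sort)
damages[15,]
# Event type with the lowest total damage among the top 15
damages[1,]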