STAT451 Final Exam
(please refer to test for questions)
References and Resources:
R Color Cheatsheet https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf
When to Use Sequential and Diverging Palettes https://everydayanalytics.ca/2017/03/when-to-use-sequential-and-diverging-palettes.html
Understanding Sequential and Diverging Palettes in Tableau https://interworks.com/blog/rrouse/2014/12/15/understanding-sequential-and-diverging-color-palettes-tableau/
How to Pick the Perfect Color Combination for Your Data Visualization https://blog.hubspot.com/marketing/color-combination-data-visualization
Statistical Language - What are Variables http://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+variables
Wiki Categorical Variable https://en.wikipedia.org/wiki/Categorical_variable
Data from https://earthquake.usgs.gov/earthquakes/search/: global records for the past 30 days, minimum magnitude 1.0. Create a CSV query on the site (or load the CSV file from the email), then set the working directory to the folder containing it.
# Read the USGS export and keep only the columns used for mapping
data <- read.csv("query.csv")
keeps <- c("time", "latitude", "longitude", "mag")
data1 <- data[keeps]
library(maps)
library(ggplot2)
world_map <- map_data("world")
# Base map: world polygons with a fixed aspect ratio and no axis labels
p1 <- ggplot() + coord_fixed() +
  xlab("") + ylab("") + labs(title = "Global Earthquakes Over 30 Days")
base_world1 <- p1 + geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
                                 colour = "lightgreen", fill = "lightgreen")
# Overlay one point per earthquake (shape 21 is a filled circle with an outline)
etqk_data <-
  base_world1 +
  geom_point(data = data1,
             aes(x = longitude, y = latitude), colour = "deeppink",
             fill = "pink", shape = 21, alpha = 0.7)
etqk_data
# Second map: same base, but with point size mapped to magnitude
p2 <- ggplot() + coord_fixed() +
  xlab("") + ylab("") + labs(title = "Global Earthquakes and Magnitude (approx.) Over 30 Days")
base_world2 <- p2 + geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
                                 colour = "lightgreen", fill = "lightgreen")
# Strip grid lines, ticks, and axis text, and use a black background
cleanup <-
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = "black", colour = "black"),
        axis.line = element_line(colour = "black"),
        axis.ticks = element_blank(), axis.text.x = element_blank(),
        legend.key = element_rect(fill = "black", colour = "black"),
        axis.text.y = element_blank())
base_world_clean <- base_world2 + cleanup
# A lower alpha lets overlapping points build up where earthquakes cluster
etqk_data_enhanced <-
  base_world_clean +
  geom_point(data = data1,
             aes(x = longitude, y = latitude, size = mag), colour = "deeppink",
             fill = "pink", shape = 21, alpha = 0.2)
etqk_data_enhanced
In the end, the last graphic was a nice way to see how often earthquakes of a given magnitude hit the same areas. The coasts of the Pacific and Indian Oceans are clearly being hit by large earthquakes, whereas the North American Pacific coast seems to be feeling smaller earthquakes at a much greater frequency.
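The visual impression above could also be checked with a quick count. The following is only a minimal sketch: `quakes_demo` is a made-up stand-in for `data1`, and both the magnitude cut points and the rough bounding box for the North American Pacific coast are my own assumptions, not values from the USGS data.

```r
# Toy stand-in for data1 (invented rows, NOT the USGS records);
# with the real data, use data1 in place of quakes_demo.
quakes_demo <- data.frame(
  latitude  = c(36.1, 37.5, -6.2, 38.0, -20.4, 35.9),
  longitude = c(-117.9, -121.8, 130.5, -122.1, -175.3, -118.2),
  mag       = c(1.4, 2.1, 5.6, 1.8, 6.1, 2.9)
)

# Hypothetical magnitude bands; the cut points are an assumption
mag_band <- cut(quakes_demo$mag,
                breaks = c(1, 2.5, 4.5, Inf),
                labels = c("minor (1-2.5)", "light (2.5-4.5)", "strong (4.5+)"),
                right = FALSE)

# Overall frequency of each band
band_counts <- table(mag_band)
print(band_counts)

# Hypothetical bounding box roughly covering the North American Pacific coast
in_box <- quakes_demo$longitude > -130 & quakes_demo$longitude < -110 &
  quakes_demo$latitude > 30 & quakes_demo$latitude < 50

# Cross-tabulate: is a quake inside the box, and which band is it in?
region_by_band <- table(west_coast = in_box, band = mag_band)
print(region_by_band)
```

With the real records, a table like `region_by_band` would show whether small earthquakes really dominate inside the box while the strong ones fall elsewhere.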
References and Resources:
Plotting Data Points on Maps with R https://sarahleejane.github.io/learning/r/2014/09/21/plotting-data-points-on-maps-with-r.html
Plotting Beautiful Clear Maps with R http://sarahleejane.github.io/learning/r/2014/09/20/plotting-beautiful-clear-maps-with-r.html
Modify Components of a Theme https://ggplot2.tidyverse.org/reference/theme.html
R Plot PCH Symbols Chart http://www.endmemo.com/program/R/pchsymbols.php
When choosing my data, it was hard to find a dataset I was interested in. I asked my girlfriend what data she would be interested in, and she mentioned “Titanic”, as in the movie. When I asked why, she replied that her students had been obsessing over the movie. She is a 3rd and 4th grade special education teacher with some awesome kids in her class, so I figured that was as good a reason as any. I looked up data on the Titanic and found a .CSV dataset at:
https://vincentarelbundock.github.io/Rdatasets/datasets.html
…It’s titled TitanicSurvival. It is a bit morbid, but I was interested at that point.
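# Set the working directory to the folder containing TitanicSurvival.csv first
data2 <- read.csv("TitanicSurvival.csv")
# Rename two columns to shorter, friendlier names
colnames(data2)[colnames(data2) == "sex"] <- "gender"
colnames(data2)[colnames(data2) == "passengerClass"] <- "class"
# Survival counts, split by gender and then by class
ggplot(data2, aes(x = survived, fill = gender)) + geom_bar()
ggplot(data2, aes(x = survived, fill = class)) + geom_bar()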
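The two bar charts show raw counts, so groups of different sizes are hard to compare. A survival rate per group is one way around that. This is only a sketch: `titanic_demo` is a made-up stand-in that borrows the column names of `data2`, and its values are invented for illustration.

```r
# Toy stand-in for data2 (invented rows, NOT the TitanicSurvival data);
# with the real data, use data2 in place of titanic_demo.
titanic_demo <- data.frame(
  survived = c("yes", "no", "no", "yes", "no", "yes", "no", "no"),
  gender   = c("female", "male", "male", "female", "male", "female", "female", "male"),
  class    = c("1st", "1st", "3rd", "2nd", "3rd", "3rd", "3rd", "2nd")
)

# Counts of survived (columns) within each gender (rows)
gender_counts <- table(titanic_demo$gender, titanic_demo$survived)

# Convert each row to proportions so the rows sum to 1
gender_rates <- prop.table(gender_counts, margin = 1)
print(round(gender_rates, 2))

# Same idea for passenger class
class_rates <- prop.table(table(titanic_demo$class, titanic_demo$survived), margin = 1)
print(round(class_rates, 2))
```

In the plots themselves, `geom_bar(position = "fill")` gives the same proportional view, with every bar stretched to the same height.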
c) Again, my movie knowledge and its historical accuracy hold, so I asked one last question: what was the average age of the people who died in each class?
ggplot(data2, aes(x=age, y=class, shape=gender, color=survived)) +
geom_point(aes(size=survived))
## Warning: Using size for a discrete variable is not advised.
## Warning: Removed 263 rows containing missing values (geom_point).
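# qplot() is deprecated in recent ggplot2 releases; ggplot() builds the same boxplot
p <- ggplot(data2, aes(x = class, y = age, shape = gender, colour = survived, fill = gender)) +
  geom_boxplot() +
  labs(title = "Survival Average by Gender and Age", x = "Class", y = "Age")
p2 <- p + theme_classic()
p2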
## Warning: Removed 263 rows containing non-finite values (stat_boxplot).
Honestly, I think the final plot takes a second to grasp, as it holds all the data from the dataset. I struggled with the overall look, but I think it tells the story well, and it points out details I would not have assumed. For instance, the typical age (the median line in each box) of males who died in first class was around 45 to 50, whereas for males in 3rd class it was about 25 to 30. Something a bit more intriguing is that, on average, the males who died were a bit older than the women who survived. That is interesting, seeing as Jack Dawson was 3 years older than Rose in the movie. I don’t know if James Cameron meant to get that right, but it was pretty cool to find that out through the data visualization.
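The ages read off the boxplot could be confirmed numerically. Below is a minimal sketch using base R's `aggregate()`. Again, `titanic_demo` is a made-up stand-in with the same column names as `data2`; its ages are invented and say nothing about the real passengers.

```r
# Toy stand-in for data2 (invented rows, NOT the TitanicSurvival data);
# with the real data, use data2 in place of titanic_demo.
titanic_demo <- data.frame(
  survived = c("no", "no", "no", "no", "yes", "yes", "no", "no"),
  gender   = c("male", "male", "male", "male", "female", "female", "male", "male"),
  class    = c("1st", "1st", "3rd", "3rd", "1st", "3rd", "1st", "3rd"),
  age      = c(45, 51, 24, 30, 36, NA, NA, 27)
)

# How many rows have no recorded age (these are the rows ggplot2 drops)
n_missing <- sum(is.na(titanic_demo$age))
print(n_missing)

# Mean and median age per class/gender/survival group.
# The formula interface of aggregate() omits rows with NA by default.
mean_age   <- aggregate(age ~ class + gender + survived, data = titanic_demo, FUN = mean)
median_age <- aggregate(age ~ class + gender + survived, data = titanic_demo, FUN = median)
print(mean_age)
print(median_age)
```

Printing the mean and the median side by side also shows where the two differ, which matters because a boxplot draws the median rather than the mean.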
References and Resources:
Datasets https://vincentarelbundock.github.io/Rdatasets/datasets.html
Chapter 2 R ggplot2 Examples http://www.stat.wisc.edu/~larget/stat302/chap2.pdf
Ggplot2 Scatter Plots http://www.sthda.com/english/wiki/ggplot2-scatter-plots-quick-start-guide-r-software-and-data-visualization
Advanced Data Visualization with Ggplot2 https://4va.github.io/biodatasci/r-viz-gapminder.html
Understanding and Interpreting Box Plots https://www.wellbeingatschool.org.nz/information-sheet/understanding-and-interpreting-box-plots
Plot multiple boxplots https://stackoverflow.com/questions/14604439/plot-multiple-boxplot-in-one-graph
Rose Bukater https://jamescameronstitanic.fandom.com/wiki/Rose_DeWitt_Bukater
Jack Dawson https://jamescameronstitanic.fandom.com/wiki/Jack_Dawson