Question 1

When using color in visualizations, there are different kinds of palettes that can be used. For a)-c), describe the palette and explain when you would use such a palette.

The color palette displayed in part a is a “sequential color scheme”. You can tell this because each bar of color follows a continuous scale that runs from light to dark. This works when you have ordered or continuous data, where the light-to-dark scale represents low to high values. For example, you can color the points on a map based on population, with dark brown for higher populations and light brown for lower populations.
The middle color palette is a “diverging color scheme”. You can tell this because each bar of color has two separate hues, separated by a white square in the middle. This scheme works when the data has a two-sided quality to it, and it should only be used when there is a meaningful central value around which the data are compared. Zero is a common example of the middle value, with the color on one side of white representing positive values and the color on the other side representing negative values.
The color palette in part c is “qualitative”. It makes the most sense when the data is qualitative, with classes or categories, and you want a distinct color for each one. For example, on a map of America, you could color each state by its top product (such as wheat, corn, cellphones, etc.).
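As a rough illustration of the three palette types, the sketch below builds one of each with base R's hcl.colors() (available in R 3.6.0 and later). The palette names "Blues 3", "Blue-Red", and "Dark 3" are picked here only as examples; they are not the palettes shown in the question.

```r
# Sequential: lightness steps of a single hue, for ordered low-to-high values
sequential <- hcl.colors(5, palette = "Blues 3")

# Diverging: two hues meeting at a neutral midpoint, for values around a center such as zero
diverging <- hcl.colors(5, palette = "Blue-Red")

# Qualitative: distinct hues with no implied order, for categories
qualitative <- hcl.colors(5, palette = "Dark 3")

# Draw the three palettes as rows of colored bars
op <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
barplot(rep(1, 5), col = sequential, axes = FALSE, main = "Sequential")
barplot(rep(1, 5), col = diverging, axes = FALSE, main = "Diverging")
barplot(rep(1, 5), col = qualitative, axes = FALSE, main = "Qualitative")
par(op)
```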

Question 2

Here is the link to the USGS website where the worldwide earthquake data can be downloaded. Download all earthquake data for the past 30 days in .csv format. Using R, make a map of the world with points where the earthquakes occurred. In addition, make a useful bubble map for the data. Thoroughly discuss your visualizations.

library(tidyverse)

## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

The earthquake data has been imported as a dataframe called all_month_1_… I would like to change this to something easier to type, called ‘earthquake’.

earthquake <- read_csv("all_month.csv")

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   time = col_datetime(format = ""),
##   magType = col_character(),
##   net = col_character(),
##   id = col_character(),
##   updated = col_datetime(format = ""),
##   place = col_character(),
##   type = col_character(),
##   status = col_character(),
##   locationSource = col_character(),
##   magSource = col_character()
## )

## See spec(...) for full column specifications.

The following code eliminates any rows with missing latitude or longitude data from my dataframe. This way, there won’t be any random, incorrect points.

earthquake <- earthquake %>%
  filter(!is.na(latitude) & !is.na(longitude))
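To show what this filter does, here is a small sketch on a made-up data frame (the values are hypothetical, not taken from the USGS file). It uses base R's complete.cases() as an equivalent to the dplyr filter above.

```r
# Hypothetical toy data with two incomplete rows
toy <- data.frame(
  latitude  = c(34.1, NA, -12.5, 61.0),
  longitude = c(-117.3, 140.2, NA, -150.1),
  mag       = c(1.2, 4.5, 5.1, 2.3)
)

# Keep only rows where both coordinates are present
toy_clean <- toy[complete.cases(toy[, c("latitude", "longitude")]), ]

nrow(toy)        # 4 rows before filtering
nrow(toy_clean)  # 2 rows after filtering
```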

Globe

Data with latitude and longitude can be mapped onto a globe. The code below reads in a list of points on a map and colors each one yellow. The following globe shows every earthquake in the data as a yellow point.

library(threejs)

## Loading required package: igraph

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union

## The following objects are masked from 'package:purrr':
## 
##     compose, simplify

## The following object is masked from 'package:tidyr':
## 
##     crossing

## The following object is masked from 'package:tibble':
## 
##     as_data_frame

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

globejs(lat = earthquake$latitude,
       lon = earthquake$longitude,
       color = "yellow")

Next, I am going to make a bubble map with the same data.

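# Bin magnitudes into quartiles; include.lowest = TRUE keeps the minimum in the first bin
color <- cut(earthquake$mag,
             breaks = quantile(earthquake$mag, c(0, 0.25, 0.5, 0.75, 1)),
             include.lowest = TRUE)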

The first thing I am making is a color set, which breaks the earthquakes’ magnitudes into quartiles. This way, when I make the map of the world, each earthquake gets a color for its magnitude quartile, so color reinforces what the size of the square already shows. With this many points on the map, it is really hard to distinguish earthquakes by square size alone.
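One detail of cut() is worth checking when the breaks come from quantile(): by default each interval is open on the left, so the smallest value is not assigned to any bin unless include.lowest = TRUE is set. The sketch below shows this on hypothetical magnitudes, not the USGS data.

```r
# Hypothetical magnitudes, including one negative value
mags <- c(-0.5, 0.8, 1.2, 1.9, 2.4, 3.1, 4.6, 5.8)
breaks <- quantile(mags, c(0, 0.25, 0.5, 0.75, 1))

# Default: intervals are (lower, upper], so the minimum falls outside the first bin
default_bins <- cut(mags, breaks = breaks)
sum(is.na(default_bins))   # 1

# include.lowest = TRUE closes the first interval on the left
closed_bins <- cut(mags, breaks = breaks, include.lowest = TRUE)
sum(is.na(closed_bins))    # 0
table(closed_bins)         # two values in each of the four bins
```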

earthquake2 <- earthquake
# Square-root scaling for symbol size; sqrt() returns NaN for negative magnitudes
radius <- sqrt(earthquake$mag)

## Warning in sqrt(earthquake$mag): NaNs produced
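The warning appears because sqrt() is undefined for negative numbers, so any earthquake with a negative magnitude gets NaN as its size. A minimal sketch of one possible workaround, on hypothetical values, is to clamp magnitudes at zero before taking the square root. Treating negative magnitudes as zero is an assumption made here for illustration, not part of the original analysis.

```r
# Hypothetical magnitudes, one of them negative
mags <- c(-0.3, 0, 1, 4)

sqrt_raw <- suppressWarnings(sqrt(mags))  # first element is NaN
size <- sqrt(pmax(mags, 0))               # negatives are treated as zero
size                                      # 0 0 1 2
```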

symbols(earthquake$longitude, earthquake$latitude,
        squares = radius, inches = 0.15,
        fg = "white", bg = color,
        xlab = "longitude", ylab = "latitude")

This map is great, except that it doesn’t have the outline of the world. However, the color and size of each square show which magnitude quartile the point falls in. The blue squares represent the highest quartile of magnitude, while the green and the red squares represent progressively lower quartiles.

We will try something different. The following code and package give us a map of the world with all of the points from the last thirty days. We can tell from the previous map what each color represents in terms of earthquake size. Combining the three visualizations, we can spin the globe to see where all the earthquakes happened, compare them all by size, and see where earthquakes of each size occurred on Earth.

library(rworldmap)
# get map
worldmap <- getMap(resolution = "coarse")
plot(worldmap, xlim = c(-80, 160), ylim = c(-50, 100), 
     asp = 1, bg = "lightblue", col = "black")
# add points
points(earthquake$longitude, earthquake$latitude, 
       col = color, cex = .01)
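Because the colors above come from R's default palette, the reader has to infer which color belongs to which quartile. One way to make that explicit, sketched here on hypothetical data and not on the USGS file, is to index a set of colors by the quartile factor and add a legend. The color choices and the toy data frame name quakes_toy are arbitrary.

```r
# Hypothetical earthquakes
set.seed(1)
quakes_toy <- data.frame(
  longitude = runif(40, -180, 180),
  latitude  = runif(40, -60, 70),
  mag       = runif(40, 0, 6)
)

# Quartile bins of magnitude, keeping the minimum in the first bin
quartile <- cut(quakes_toy$mag,
                breaks = quantile(quakes_toy$mag, c(0, 0.25, 0.5, 0.75, 1)),
                include.lowest = TRUE)

# One explicit color per quartile, lowest to highest
quartile_cols <- c("lightyellow", "gold", "orange", "red")
point_cols <- quartile_cols[as.integer(quartile)]

# Plot the points and label each color with its magnitude range
plot(quakes_toy$longitude, quakes_toy$latitude,
     pch = 21, bg = point_cols, cex = 1.5,
     xlab = "longitude", ylab = "latitude")
legend("bottomleft", legend = levels(quartile),
       pt.bg = quartile_cols, pch = 21, title = "Magnitude")
```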

Question 3

The data set we have chosen for this question is all_seasons.csv, a historical dataset from Kaggle. The dataset contains all NBA players from each season from 1996 to 2019; the 2019 season was not complete at the time of the upload. It captures demographic variables such as age, height, weight, and place of birth, and biographical details like the team played for, draft year, and draft round. In addition, it has basic box score statistics such as games played, average number of points, rebounds, assists, etc. We want to see the 25 oldest players in the NBA, and the 25 teams with the highest average points.

library(rvest)

## Loading required package: xml2

## 
## Attaching package: 'rvest'

## The following object is masked from 'package:purrr':
## 
##     pluck

## The following object is masked from 'package:readr':
## 
##     guess_encoding

library(tidyverse)
library(magrittr)

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract

library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

library(knitr)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:igraph':
## 
##     %--%, union

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(devtools)

## Loading required package: usethis

library(portfolio)

## Loading required package: grid

## Loading required package: lattice

## 
## Attaching package: 'portfolio'

## The following object is masked from 'package:devtools':
## 
##     create

bb <- read_csv("all_seasons.csv")

## Warning: Missing column names filled in: 'X1' [1]

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   player_name = col_character(),
##   team_abbreviation = col_character(),
##   college = col_character(),
##   country = col_character(),
##   draft_year = col_character(),
##   draft_round = col_character(),
##   draft_number = col_character(),
##   season = col_character()
## )

## See spec(...) for full column specifications.

head(bb,5)

The first plot shows the 25 oldest players in the NBA.

bb %>% 
 group_by(player_name) %>% 
 summarise(Age = max(age)) %>% 
 arrange(desc(Age)) %>%
 top_n(25, Age) %>%
 ggplot() +
 geom_col(aes(x = reorder(player_name, Age), y = Age), fill = "blue") +
 coord_flip() +
 # label the age axis with its unit
 scale_y_continuous(labels = unit_format(unit = "Age", scale = 1))

## `summarise()` ungrouping output (override with `.groups` argument)

The next plot shows the 25 teams with the highest average points. We can see that the New Orleans Pelicans, Golden State Warriors, and Sacramento Kings have the highest points average across all the NBA seasons in the data.

bb %>% 
 group_by(team_abbreviation) %>% 
 summarise(points = mean(pts)) %>% 
 arrange(desc(points)) %>%
 top_n(25, points) %>%
 ggplot() +
 geom_col(aes(x = reorder(team_abbreviation, points), y = points), fill = "darkgreen") +
 coord_flip() +
 # label the points axis with its unit
 scale_y_continuous(labels = unit_format(unit = "Points", scale = 1))

## `summarise()` ungrouping output (override with `.groups` argument)
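The group, summarise, and rank steps used in both plots can be traced on a small example. The sketch below uses base R on a made-up table (the team codes and point values are hypothetical) to compute each team's mean points and keep the top two.

```r
# Hypothetical player rows: two players per team
toy <- data.frame(
  team_abbreviation = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  pts               = c(10, 20, 5, 7, 30, 10)
)

# Mean points per team
team_means <- aggregate(pts ~ team_abbreviation, data = toy, FUN = mean)

# Sort from highest to lowest and keep the top 2 teams
top_teams <- head(team_means[order(-team_means$pts), ], 2)
top_teams$team_abbreviation  # "CCC" "AAA"
```

One difference to keep in mind: head() always returns exactly the requested number of rows, while top_n() keeps ties, so a top_n(25, ...) call can return more than 25 rows when values are tied at the cutoff.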

STAT 451 Final Take Home

Kenneth Lefin and Kevin Teresi

11/21/2020
