Clone the provided repository (1 point) Write a vignette using one TidyVerse package (15 points) Write a vignette using more than one TidyVerse packages (+ 2 points) Make a pull request on the shared repository (1 point) Update the README.md file with your example (2 points) Submit your GitHub handle name & link to Peergrade (1 point) Grade your 3 peers and provide the feedback in Peergrade (2 points) Submit the best peer link & your link to Blackboard (1 point
“The Tidyverse” is a collection of packages that work in tandem since they all “share common data representations and API design” \(^1\). Using the command “library(tidyverse)”, we can see all packages included within the tidyverse.
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.0
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(dplyr)
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
library(mapproj)
In this assignment, the packages ggplot and dpylr will be exemplified.
ggplot is utilized for visualizations by declaring the input data for a graphic, then specifying the set of plot aesthetics to enhance the figure.
data('possum', package = "openintro")
head(possum, 4)
## # A tibble: 4 x 8
## site pop sex age head_l skull_w total_l tail_l
## <int> <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 Vic m 8 94.1 60.4 89 36
## 2 1 Vic f 6 92.5 57.6 91.5 36.5
## 3 1 Vic f 6 94 60 95.5 39
## 4 1 Vic f 6 93.2 57.1 92 38
Graphing data:
Plotting a historgram of possums’ tail lengths
ggplot(data = possum, aes(x=tail_l)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
He would probably plot his data like so:
This is a data visual by plotting polygons as data points. It’s a comparison between the age of possums to the length of their tails. Not particularly easy to read, but nice to look at.
To interpret this cubism graph, each edge is a coordinate of \((age, tail)\). Compare with the histogram below:
ggplot(data = possum, aes(x=age, y = tail_l)) + geom_polygon(aes(x = age, y = tail_l))
Here is the a scatterplot of the same data as above in the form of a histogram with a line of best fit.
ggplot(data = possum, aes(x = age, y = tail_l)) + geom_point() + geom_smooth(aes(x = age, y = tail_l), method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
ggplot also gives us the ability to plot geographical coordinates, which provides us a visual location of variables. To illustrate an example, a dataset from the “maps” library can be utilized to plot various portions of the World (according to Countries and political boundaries). This will be done in 2D, and as it is on a globe.
2-D map of the world
the_world <- map_data("world") %>% as_tibble()
head(the_world, 5)
## # A tibble: 5 x 6
## long lat group order region subregion
## <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 -69.9 12.5 1 1 Aruba <NA>
## 2 -69.9 12.4 1 2 Aruba <NA>
## 3 -69.9 12.4 1 3 Aruba <NA>
## 4 -70.0 12.5 1 4 Aruba <NA>
## 5 -70.1 12.5 1 5 Aruba <NA>
globe <- the_world %>% ggplot() + geom_map(
aes(long, lat, map_id = region),
map = the_world,
color = "darkolivegreen", fill = "darkolivegreen", size = 0.3)
## Warning: Ignoring unknown aesthetics: x, y
globe
The globe: Viewing the eastern parts of Africa and Europe, Asia, and Australia. This is made possible by adding the function \(coordmap("ortho", orientation = c(x,y,z))\). This gives the appearance as if one were looking at one side of the globe, yet unable to rotate it.
the_globe <- the_world %>% ggplot() + geom_map(
aes(long, lat, map_id = region),
map = the_world,
color = "chartreuse4", fill = "chartreuse4", size = 0.1) +
coord_map("ortho", orientation = c(10, 95, 0))
the_globe
Map of Italy as it is on the globe
tavola_italiana <- map_data("italy") %>% as_tibble()
tavola_italiana %>%
ggplot(aes(long, lat, map_id = region)) +
geom_map(
map = tavola_italiana,
color = "brown3", fill = "brown3", size = 0.3
) + coord_map("ortho", orientation = c(41, 12, 0))
Excludes AK and HI
usa_map <- map_data("state") %>% as_tibble()
usa_map %>%
ggplot(aes(long, lat, map_id = region)) +
geom_map(
map = usa_map,
color = "firebrick2", fill = "deepskyblue", size = 0.3
) + coord_map("ortho", orientation = c(39, -98, 0))
In this example, voting data regarding the 1940 election between Franklin D. Roosevelt and Wendell Willkie is projected onto the above map of the United States as a choropleth map.
From the built in Maps library, there exists a dataset that contains republican election data up to 1976. This is entitled “votes.repub”. \(^2\)
republicans <- maps::votes.repub %>%
as.tibble(rownames = "state") %>%
select(state, '1940') %>%
rename(repub_prop = '1940') %>%
mutate(repub_prop = repub_prop / 100) %>%
mutate(state = str_to_lower(state))
republican_voting_table <- usa_map %>%
left_join(republicans, by = c("region"="state"))
head(republican_voting_table,3)
## # A tibble: 3 x 7
## long lat group order region subregion repub_prop
## <dbl> <dbl> <dbl> <int> <chr> <chr> <dbl>
## 1 -87.5 30.4 1 1 alabama <NA> 0.143
## 2 -87.5 30.4 1 2 alabama <NA> 0.143
## 3 -87.5 30.4 1 3 alabama <NA> 0.143
Plotting the voting map via ggplot
republican_voting_table %>%
ggplot(aes(long,lat, group = subregion)) +
geom_map(
aes( map_id = region),
map = usa_map,
color = "gray80", fill = 'gray30', size = 0.3
) +
coord_map("ortho", orientation = c(39, -98, 0)) +
geom_polygon(aes(group = group, fill = repub_prop), color = 'gray5') +
scale_fill_gradient2(low = "blue3", mid = "white", high = "darkred", midpoint = 0.5, labels = scales::label_pvalue()) +
labs(
title = "1940 Voting Outcome",
x = "(D): Blue; (R): Red", y = "", fill = " "
) +
theme (
plot.title = element_text(size = 28, face = "bold", color = "black"),
legend.position = "right")
To conclude, ggplot can be used to plot all sorts of information. From graphs globes, this is a very versatile tool to use. With the help of DPLYR, we can also combine two data sets to have them work in tandem. Utilizing multiple packages within the tidyverse allows one to expand their data functions and come up with elaborate and interesting projects.
\(^1\) https://tidyverse.tidyverse.org/#:~:text=The%20tidyverse%20is%20a%20set%20of%20packages%20that,packages%20from%20the%20tidyverse%20in%20a%20single%20command.