Lab 2 Assignment

Complete ALL of the essentials below correctly to earn an ‘S’ on the lab.
Complete the Depth portion successfully to earn credit toward a depth boost (every 2 lab depth assignments completed earns a 1/3 letter grade boost to your final grade).

Render your document as a .pdf or .html file and submit it to the Google folder on Moodle for grading.

Load Packages

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.5      ✔ purrr   1.0.1 
✔ tibble  3.1.6      ✔ dplyr   1.0.10
✔ tidyr   1.3.0      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.1 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(patchwork)
library(ggsci)
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

Load data to use!

We can use this fun ferris wheels data I found online to make some practice graphs!

wheels <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-09/wheels.csv')

kable(head(wheels)) %>%
  kable_styling()
X name height diameter opened closed country location number_of_cabins passengers_per_cabin seating_capacity hourly_capacity ride_duration_minutes climate_controlled construction_cost status design_manufacturer type vip_area ticket_cost_to_ride official_website turns
1 360 Pensacola Beach 200.00 NA 2012-07-03 2013-01-01 USA Pensacola Beach; Florida 42 6 252 1260 12.0 Yes Unknown Moved Realty Masters of FL Transportable Yes NA NA 4
2 Amuran 303.00 199.8 2004-01-01 NA Japan Kagoshima; Kyushu 36 NA NA NA 14.5 Yes Unknown Operating NA NA NA NA NA 1
3 Asiatique Sky 200.00 200.0 2012-12-15 NA Tailand Asiatique the Riverfront 42 NA NA NA NA Yes Unknown Operating Dutch Wheels (Vekoma) NA NA NA http://www.asiatiquesky.com/ NA
4 Aurora Wheel 295.00 272.0 NA NA Japan Nagashima Spa Land; Mie; Honshu NA NA NA NA NA NA Unknown Operating NA Fixed NA NA http://www.nagashima-onsen.co.jp/ NA
5 Baghdad Eye 180.00 NA 2011-01-01 NA Iraq Al-Zawraa Park; Baghdad 40 6 240 960 15.0 NA $6 million USD Operating NA NA NA 3.5 NA NA
6 Beijing Great Wheel 692.64 642.7 NA NA China Chaoyang Park; Beijing 48 40 1920 5760 20.0 yes $290 million USD Delayed The Great Wheel Corporation Fixed NA NA NA 1

Essentials

1.) What’s in a graph?

Write a paragraph explaining some tenets of good vs. bad graphics. Be specific!

The point of a graph is to convey data to an audience visually. A good graph therefore limits the number of elements so that it stays simple yet informative; if you try to show too many variables at once, it becomes difficult for the reader to understand. A good graph also shows the underlying data (avoid bar graphs, which generally obscure the data) and contains a clear legend. The y-axis should start at 0, and both axes should be labelled with a clear scale. It should have a figure caption or title so that people know what they are looking at. A good graph also includes indicators of significance.

A bad graph is one that does not do the things listed above, so it is difficult to read and does not fulfill its purpose. A bad graph may also lack color or grouping, or its grouping may be very difficult to distinguish.

These are all general aspects of what makes a good or bad graph, but I think it does vary slightly depending on the situation, what type of graph, and what information one is trying to convey. Also, as we saw in lab, you can have a very effective graph that is also not very good.
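As a rough illustration of "showing the data", the sketch below overlays the raw observations on a boxplot instead of drawing a bar of means. The `toy` data frame and its `group` and `value` columns are made up for this example and are not part of the wheels data.

```r
library(ggplot2)

# Hypothetical toy data (not from the wheels dataset): two groups of measurements
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B"), each = 20),
  value = c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 12, sd = 4))
)

# The boxplot summarizes each group; the jittered points show every observation.
# outlier.shape = NA hides the boxplot's own outlier markers so that no point
# is drawn twice.
p_show <- ggplot(toy, aes(x = group, y = value, color = group)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, alpha = 0.7) +
  labs(x = "Group", y = "Value", title = "Boxplot with raw data overlaid") +
  theme_classic()

p_show
```

A bar of means for the same data would show two bar heights and nothing about the spread within each group.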

2.) Graphs

Make the following plots with the ferris wheel data: histogram, boxplot, bar graph, line graph, scatterplot

Histogram

wheels3 <- filter(wheels, country %in% c("Japan", "China", "USA"))

#Playing around with trying to get head() in a prettier table
kable(head(wheels3)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
X name height diameter opened closed country location number_of_cabins passengers_per_cabin seating_capacity hourly_capacity ride_duration_minutes climate_controlled construction_cost status design_manufacturer type vip_area ticket_cost_to_ride official_website turns
1 360 Pensacola Beach 200.00 NA 2012-07-03 2013-01-01 USA Pensacola Beach; Florida 42 6 252 1260 12.0 Yes Unknown Moved Realty Masters of FL Transportable Yes NA NA 4
2 Amuran 303.00 199.8 2004-01-01 NA Japan Kagoshima; Kyushu 36 NA NA NA 14.5 Yes Unknown Operating NA NA NA NA NA 1
4 Aurora Wheel 295.00 272.0 NA NA Japan Nagashima Spa Land; Mie; Honshu NA NA NA NA NA NA Unknown Operating NA Fixed NA NA http://www.nagashima-onsen.co.jp/ NA
6 Beijing Great Wheel 692.64 642.7 NA NA China Chaoyang Park; Beijing 48 40 1920 5760 20.0 yes $290 million USD Delayed The Great Wheel Corporation Fixed NA NA NA 1
8 Big O 197.00 200.0 2006-01-01 NA Japan Tokyo Dome City NA NA NA NA 15.0 Yes Unknown Operating NA Centerless NA 7.87 http://www.tokyo-dome.co.jp/e/attractions/ NA
10 Changsha Ferris Wheel 394.00 325.0 2004-10-01 NA China Changsha; Hunan 48 8 384 1152 20.0 Yes Unknown Operating NA Fixed NA 8.15 http://changshahua.com/entertainment/outdoor-activities/changsha-ferris-wheel/ NA
phist <- ggplot(data = wheels3, aes(x = height)) +
  geom_histogram(bins = 10) +
  theme_bw()

phist
Warning: Removed 1 rows containing non-finite values (stat_bin).
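The warning above means that one height value was missing (or otherwise non-finite) and was dropped before binning. A minimal sketch of how that count could be checked is below; the `heights` vector is a hypothetical stand-in for `wheels3$height`.

```r
# Hypothetical heights standing in for wheels3$height
heights <- c(200, 303, NA, 295, 692.64)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, which are the values
# that a histogram cannot place in a bin
n_dropped <- sum(!is.finite(heights))
n_dropped

# Keep only the values that can actually be plotted
heights_clean <- heights[is.finite(heights)]
```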

Boxplot

pbox <- ggplot(data = wheels3, aes(x = country, y = height, fill = country)) +
  geom_boxplot() +
  theme_classic() +
  scale_fill_lancet()
pbox
Warning: Removed 1 rows containing non-finite values (stat_boxplot).

Bar graph

pbar <- ggplot(data = wheels3, aes(x = country, fill = country)) +
  geom_bar() +
  theme_classic() +
  scale_fill_lancet()


pbar

Line graph

pline <- ggplot(data = wheels3, aes(x = height, y = diameter, color = country)) +
  geom_line() +
  theme_classic() +
  scale_color_lancet()

pline
Warning: Removed 2 row(s) containing missing values (geom_path).

Scatterplot

ggplot(data = wheels3, aes(x = height, y = diameter, color = country)) +
  geom_point() +
  theme_classic() +
  scale_color_lancet()
Warning: Removed 15 rows containing missing values (geom_point).

3.) Backgrounds

Using your scatterplot from #2, remove the gray background from the plot. Continue using this same plot for 3-6.

See above: the scatterplot in #2 already uses theme_classic(), which removes the gray background.

4.) Colors!

Change the colors away from default colors. Show me an example of manually changing the colors and an example of you using ggsci to change the colors.

Changing color manually

ggplot(data = wheels3, aes(x = height, y = diameter, color = country)) +
  geom_point() +
  theme_classic() +
  scale_color_manual(values = c('red', 'black', 'blue'))
Warning: Removed 15 rows containing missing values (geom_point).

Changing color using ggsci

ppoint <- ggplot(data = wheels3, aes(x = height, y = diameter, color = country)) +
  geom_point() +
  theme_classic() +
  scale_color_lancet()

ppoint
Warning: Removed 15 rows containing missing values (geom_point).

5.) Shape and size

Change the shape and size of your points

ggplot(data = wheels3, aes(x = height, y = diameter, color = country, shape = country, size = country)) +
  geom_point() +
  theme_classic() +
  scale_color_manual(values = c('hotpink', 'darkgreen', 'darkorchid'))
Warning: Using size for a discrete variable is not advised.
Warning: Removed 15 rows containing missing values (geom_point).
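The first warning appears because `size` is mapped to a discrete variable inside `aes()`. One possible alternative, sketched below, keeps `shape` mapped to country but sets a single point size outside `aes()`. The `toy_wheels` data frame is a hypothetical stand-in for `wheels3` with made-up values.

```r
library(ggplot2)

# Hypothetical toy data standing in for wheels3 (values are made up)
toy_wheels <- data.frame(
  country  = c("China", "China", "Japan", "Japan", "USA", "USA"),
  height   = c(400, 450, 300, 330, 200, 280),
  diameter = c(350, 420, 270, 300, 180, 250)
)

# shape varies by country (inside aes); size is one fixed value (outside aes),
# so no "size for a discrete variable" warning is raised
p_shape <- ggplot(toy_wheels, aes(x = height, y = diameter,
                                  color = country, shape = country)) +
  geom_point(size = 4) +
  theme_classic()

p_shape
```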

6.) Lines

Add lines to connect your points to one another.

ggplot(data = wheels3, aes(x = height, y = diameter, color = country)) +
  geom_point() +
  geom_line() +
  theme_classic() +
  scale_color_lancet()
Warning: Removed 15 rows containing missing values (geom_point).
Warning: Removed 2 row(s) containing missing values (geom_path).

7.) Color vs Fill

Using your bar graph, change both the color and fill and see how those are different.

Changing fill

wheels4 <- wheels3 %>%
  drop_na(climate_controlled)

ggplot(data = wheels4, aes(x = country, fill = climate_controlled)) +
  geom_bar(color = "black") +
  theme_classic() +
  scale_fill_npg()

Changing color

wheels4 <- wheels3 %>%
  drop_na(climate_controlled)

ggplot(data = wheels4, aes(x = country, color = climate_controlled)) +
  geom_bar(fill = "white") +
  theme_classic() +
  scale_color_npg()
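The head() output earlier shows climate_controlled recorded as both "Yes" and "yes", and ggplot would treat these as two separate groups. A minimal sketch of one way to normalize the case before plotting is below; the `climate` vector is a hypothetical stand-in for `wheels3$climate_controlled`.

```r
# Hypothetical values standing in for wheels3$climate_controlled
climate <- c("Yes", "yes", "No", NA, "Yes")

# tolower() maps "Yes" and "yes" to the same label; missing values stay NA
# because the vector is character
climate_clean <- tolower(climate)

# Count each label, including missing values
table(climate_clean, useNA = "ifany")
```

In the lab's pipeline this could be done with mutate() before drop_na(), so that the bar graph has one group per answer.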

8.) Facet_wrap!

Use facet_wrap to alter one of your graphs!

ggplot(data = wheels3, aes(x = height, y = diameter, color = country)) +
  geom_point() +
  facet_wrap(~country, nrow = 3) +
  theme_bw() +
  scale_color_lancet()
Warning: Removed 15 rows containing missing values (geom_point).

Depth

1.) Patchwork

Using patchwork, put all of your graphs from Essentials #2 on the same output page!

library(patchwork)
(phist + pbox)/(ppoint + pline)/pbar
Warning: Removed 1 rows containing non-finite values (stat_bin).
Warning: Removed 1 rows containing non-finite values (stat_boxplot).
Warning: Removed 15 rows containing missing values (geom_point).
Warning: Removed 2 row(s) containing missing values (geom_path).

2.) Pipes!

Using the wheels data, group data, calculate an average, and plot data with error bars! Ask for help if you need it. We may not have learned all of this in class just yet.

wheels5 <- wheels3 %>% 
  group_by(country) %>% 
  drop_na(height) %>%
  summarize(meanheight = mean(height), sd = sd(height), n = n(), se = sd/sqrt(n))
wheels5
# A tibble: 3 × 5
  country meanheight    sd     n    se
  <chr>        <dbl> <dbl> <int> <dbl>
1 China         432. 110.      9  36.6
2 Japan         317.  74.1    12  21.4
3 USA           282. 184.     19  42.1
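As a quick sanity check on the se column, the standard error can be recomputed by hand as sd / sqrt(n). The sketch below uses the rounded values printed in the tibble, so its results can differ from the printed se in the last digit.

```r
# Values copied from the rounded tibble output above
sd_height <- c(China = 110, Japan = 74.1, USA = 184)
n_wheels  <- c(China = 9,   Japan = 12,   USA = 19)

# Standard error of the mean: standard deviation divided by sqrt(sample size)
se_height <- sd_height / sqrt(n_wheels)
round(se_height, 1)
```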
#Plot the data

ggplot(wheels5, aes(x = country, y = meanheight, color = country)) +
  geom_point() +
  # data and x are inherited from ggplot(), so only ymin and ymax are needed here
  geom_errorbar(aes(ymin = meanheight - se, ymax = meanheight + se), width = 0.2) +
  scale_color_lancet()

3.) Labels

Using theme() and labs() add custom labels to your X and Y axis. Add a title. Change the size of the text on both axes. This may be beyond our tutorial in class, so ask Justin and/or use google or resources linked above. Theme() is extremely powerful and will always be useful for us!

ggplot(wheels5, aes(x = country, y = meanheight, color = country)) +
  geom_point() +
  geom_errorbar(aes(ymin = meanheight - se, ymax = meanheight + se), width = 0.2) +
  scale_color_lancet() +
  theme_classic() +
  labs(title = "Mean Ferris Wheel Height by Country", x = "Country", y = "Mean height") +
  theme(
    axis.text.x = element_text(vjust = 0.5, size = 14),
    axis.title = element_text(size = 20),
    plot.title = element_text(size = 26),
    legend.text = element_text(size = 12),
    legend.title = element_text(size = 14)
  )

When finished, render as html or pdf and confirm that your file looks the way it should. Then submit on Moodle (via the Google form).