Lab 2 Assignment

Author

Maya Frey

Complete ALL of the essentials below correctly to earn an ‘S’ on the lab.
Complete the Depth portion successfully to earn credit toward a depth boost (every 2 completed lab depth assignments earn a 1/3 letter grade boost to your final grade).

Render your document as a .pdf or .html file and submit it to the Google folder on Moodle for grading.

Load Packages

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.3.0      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(patchwork)
library(ggsci)

Load data to use!

We can use this fun Ferris wheel dataset I found online to make some practice graphs!

wheels <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-09/wheels.csv')

head(wheels)
  X                name height diameter     opened     closed country
1 1 360 Pensacola Beach 200.00       NA 2012-07-03 2013-01-01     USA
2 2              Amuran 303.00    199.8 2004-01-01       <NA>   Japan
3 3       Asiatique Sky 200.00    200.0 2012-12-15       <NA> Tailand
4 4        Aurora Wheel 295.00    272.0       <NA>       <NA>   Japan
5 5         Baghdad Eye 180.00       NA 2011-01-01       <NA>    Iraq
6 6 Beijing Great Wheel 692.64    642.7       <NA>       <NA>   China
                         location number_of_cabins passengers_per_cabin
1        Pensacola Beach; Florida               42                    6
2               Kagoshima; Kyushu               36                   NA
3        Asiatique the Riverfront               42                   NA
4 Nagashima Spa Land; Mie; Honshu               NA                   NA
5         Al-Zawraa Park; Baghdad               40                    6
6          Chaoyang Park; Beijing               48                   40
  seating_capacity hourly_capacity ride_duration_minutes climate_controlled
1              252            1260                  12.0                Yes
2               NA              NA                  14.5                Yes
3               NA              NA                    NA                Yes
4               NA              NA                    NA               <NA>
5              240             960                  15.0               <NA>
6             1920            5760                  20.0                yes
  construction_cost    status         design_manufacturer          type
1           Unknown     Moved        Realty Masters of FL Transportable
2           Unknown Operating                        <NA>          <NA>
3           Unknown Operating       Dutch Wheels (Vekoma)          <NA>
4           Unknown Operating                        <NA>         Fixed
5    $6 million USD Operating                        <NA>          <NA>
6  $290 million USD   Delayed The Great Wheel Corporation         Fixed
  vip_area ticket_cost_to_ride                  official_website turns
1      Yes                <NA>                              <NA>     4
2     <NA>                <NA>                              <NA>     1
3     <NA>                <NA>      http://www.asiatiquesky.com/    NA
4     <NA>                <NA> http://www.nagashima-onsen.co.jp/    NA
5     <NA>                 3.5                              <NA>    NA
6     <NA>                <NA>                              <NA>     1

Essentials

1.) What’s in a graph?
Write a paragraph explaining some tenets of good vs. bad graphics. Be specific!

A good graph clearly conveys the data by making its patterns easy to see and understand. Beyond showing the trends, a good graph keeps the actual data visible, so a reader can see the distribution of data underlying the trends and judge whether the data has been interpreted accurately. The magnitudes of the values should also be represented honestly. For example, a bad bar graph might have its y-axis manipulated so that it does not start at 0, making values look smaller or larger relative to each other than they actually are. In short, the data should be represented in a way that is honest and accurate.
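To make the axis point concrete, here is a minimal ggplot2 sketch that plots the same two values twice: once with the y-axis starting at 0 and once zoomed to 90-100 with coord_cartesian(). The values 95 and 100 and the object names are hypothetical, chosen only for illustration.

```r
library(ggplot2)
library(patchwork)

# Hypothetical values for illustration only; not taken from the wheels data
toy <- data.frame(group = c("A", "B"), value = c(95, 100))

# Bars start at 0: a ~5% difference looks like a ~5% difference
honest <- ggplot(toy, aes(x = group, y = value)) +
  geom_col() +
  theme_bw() +
  labs(title = "Axis starts at 0")

# Zooming the y-axis to 90-100 exaggerates the same difference
truncated <- honest +
  coord_cartesian(ylim = c(90, 100)) +
  labs(title = "Axis starts at 90")

honest + truncated
```

In the zoomed panel, bar B looks roughly twice as tall as bar A even though its value is only about 5% larger.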

2.) Make the following plots with the Ferris wheel data: histogram, boxplot, bar graph, line graph, scatterplot

# Histogram
ggplot(data=wheels, aes(height))+geom_histogram()+theme_bw()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
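The stat_bin() message above is ggplot2 asking for an explicit bin size. A minimal sketch of setting one, shown with R's built-in mtcars data (an assumption so the example runs without downloading anything; the same arguments work with aes(height) on wheels): binwidth sets the width of each bin, and boundary sets where the bin edges fall.

```r
library(ggplot2)

# mtcars ships with R; swap in wheels and aes(height) for the lab data
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5, boundary = 10) +  # 5-mpg bins with edges at 10, 15, 20, ...
  theme_bw()

# layer_data() returns the computed bins, which is handy for checking the choice
bins <- layer_data(p)
bins[, c("xmin", "xmax", "count")]
```

Because the bin size is set explicitly, the "Pick better value with `binwidth`" message no longer appears.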

#Boxplot
ggplot(data=wheels, aes(x=status, y= height)) + geom_boxplot() +theme_bw()+ theme(axis.text.x=element_text(angle=90, vjust=0.5))
Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).

#Bar graph
ggplot(data=wheels, aes(country))+geom_bar()+theme_classic()+theme(axis.text.x=element_text(angle=90, vjust=0.5))

#Line graph
ggplot(data=wheels, aes(x= height, y= number_of_cabins))+geom_line()+theme_bw()
Warning: Removed 2 rows containing missing values (`geom_line()`).

#Scatterplot
ggplot(data=wheels, aes(x = height, y = seating_capacity)) + geom_point()+theme_bw()
Warning: Removed 18 rows containing missing values (`geom_point()`).

3.) Using your scatterplot from #2, remove the gray background from the plot. Continue using this same plot for 3-6.

ggplot(data=wheels, aes(x=height, y=seating_capacity))+geom_point()+theme_bw()
Warning: Removed 18 rows containing missing values (`geom_point()`).
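theme_bw() removes the gray background in one step. An alternative sketch, assuming you want to control each part of the background yourself, blanks individual theme elements with theme() and element_blank(); mtcars stands in for the wheels data so the example runs offline.

```r
library(ggplot2)

# mtcars stands in for wheels so the example runs without a download
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(
    panel.background = element_blank(),         # remove the gray panel fill
    panel.grid = element_blank(),               # remove the white grid lines
    axis.line = element_line(colour = "black")  # draw axis lines so the axes stay visible
  )
p
```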

4.) Change the colors away from the default colors. Show me an example of manually changing the colors and an example of using ggsci to change the colors.

#Manually changing colors
wheels2 <- filter(wheels, country == "USA" | country == "China")
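# color is mapped, so the matching scale is scale_color_manual() (scale_fill_manual() has no effect on these points)
ggplot(data=wheels2, aes(x=height, y=seating_capacity, color=country))+geom_point()+theme_classic()+scale_color_manual(values=c('red', 'blue'))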
Warning: Removed 2 rows containing missing values (`geom_point()`).

#ggsci to change colors
ggplot(data=wheels2, aes(x=height, y=seating_capacity, color=country))+geom_point()+theme_classic()+scale_color_nejm()
Warning: Removed 2 rows containing missing values (`geom_point()`).
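One detail worth knowing when colors are mapped with aes(color = ...): the matching manual scale is scale_color_manual(), and giving values a named vector ties each color to a specific level instead of relying on alphabetical level order. A minimal sketch with the built-in mtcars data; the transmission column is created here purely for illustration.

```r
library(ggplot2)

# Hypothetical two-level grouping built from mtcars (am: 0 = automatic, 1 = manual)
cars <- transform(mtcars, transmission = ifelse(am == 1, "manual", "automatic"))

p <- ggplot(cars, aes(x = wt, y = mpg, color = transmission)) +
  geom_point() +
  theme_classic() +
  # Named values: each level keeps its color even if the levels are reordered
  scale_color_manual(values = c(automatic = "red", manual = "blue"))

pts <- layer_data(p)  # one row per plotted point, with the resolved colour
```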

5.) Change the shape and size of your points!

#Change shape of points
ggplot(data=wheels2, aes(x=height, y=seating_capacity, shape= country))+geom_point()+theme_classic()
Warning: Removed 2 rows containing missing values (`geom_point()`).

#Change size of points
ggplot(data=wheels, aes(x=height, y=seating_capacity, size=number_of_cabins))+geom_point()+theme_classic()
Warning: Removed 18 rows containing missing values (`geom_point()`).
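The plots above map shape and size to columns inside aes(). To give every point the same shape or size instead, a sketch can set them as fixed values outside aes(); here 17 is R's filled-triangle shape code, and the built-in mtcars data again stands in for wheels.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 17, size = 3) +  # fixed values outside aes(): the same for every point
  theme_classic()

pts <- layer_data(p)
```

Mapping inside aes() adds a legend because the values encode data; fixed values outside aes() do not.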

6.) Add lines to connect your points to one another.

ggplot(data=wheels, aes(x=height, y=seating_capacity))+geom_point()+theme_bw()+geom_line()
Warning: Removed 18 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_line()`).
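geom_line() joins points in order of their x values, while geom_path() joins them in the order the rows appear in the data. The difference is easiest to see with a hypothetical three-row data frame stored out of x order; layer_data() shows the order each layer will connect.

```r
library(ggplot2)

# Hypothetical data, deliberately stored out of x order
d <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))

by_x   <- ggplot(d, aes(x, y)) + geom_point() + geom_line()  # connects in order of x
by_row <- ggplot(d, aes(x, y)) + geom_point() + geom_path()  # connects in row order

layer_data(by_x, 2)$x    # 1 2 3
layer_data(by_row, 2)$x  # 3 1 2
```

For a scatterplot like height vs. seating_capacity, geom_line() is usually the intended choice, since it draws left to right.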

7.) Using your bar graph, change both the color and fill and see how those are different.

#Change color
# Apply theme_classic() before theme() so the complete theme does not reset the rotated axis labels
ggplot(data=wheels2, aes(country, color=status))+geom_bar()+theme_classic()+theme(axis.text.x=element_text(angle=90, vjust=0.5))

#Change fill
# theme_classic() first, so the rotated labels set in theme() are kept
ggplot(data=wheels2, aes(country, fill=status))+geom_bar()+theme_classic()+theme(axis.text.x=element_text(angle=90, vjust=0.5))
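To see exactly what each aesthetic changes, layer_data() can inspect the built bars. In this sketch (built-in mtcars, with gear standing in for status), mapping color varies only the bar outlines while every bar keeps one default fill, and mapping fill varies the interiors.

```r
library(ggplot2)

p_colour <- ggplot(mtcars, aes(x = factor(cyl), color = factor(gear))) + geom_bar()
p_fill   <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) + geom_bar()

bars_colour <- layer_data(p_colour)
bars_fill   <- layer_data(p_fill)

# colour mapping: several outline colours, but a single shared fill
length(unique(bars_colour$colour))
length(unique(bars_colour$fill))
# fill mapping: several interior colours
length(unique(bars_fill$fill))
```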

8.) Use facet_wrap to alter one of your graphs!

ggplot(data=wheels2, aes(x=status, y= height)) + geom_boxplot() +theme_bw()+ theme(axis.text.x=element_text(angle=90, vjust=0.5))+facet_wrap(~country)
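facet_wrap() also accepts layout arguments. A sketch, assuming you want the panels in a single row with each panel's y-axis scaled to its own data, using mtcars so it runs offline:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(gear), y = mpg)) +
  geom_boxplot() +
  theme_bw() +
  facet_wrap(~cyl, nrow = 1, scales = "free_y")  # one row of panels, independent y ranges

built <- layer_data(p)
```

Free scales help when groups sit at very different values, but they make panels harder to compare by eye, so the default shared scales are often the more honest choice.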

Depth

1.) Using patchwork, put all of your graphs from Essentials #2 on the same output page!

library(patchwork)
#Histogram
hist <- ggplot(data=wheels, aes(height))+geom_histogram()+theme_bw()
#Boxplot
box <- ggplot(data=wheels, aes(x=status, y= height)) + geom_boxplot() +theme_bw()+ theme(axis.text.x=element_text(angle=90, vjust=0.5))
#Bar graph
bar <- ggplot(data=wheels, aes(country))+geom_bar()+theme_classic()+theme(axis.text.x=element_text(angle=90, vjust=0.5))
#Line graph
line <- ggplot(data=wheels, aes(x= height, y= number_of_cabins))+geom_line()+theme_bw()
#Scatterplot
scatter <- ggplot(data=wheels, aes(x = height, y = seating_capacity)) + geom_point()+theme_bw()
hist + box + bar + line + scatter
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1 rows containing non-finite values (`stat_bin()`).
Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 2 rows containing missing values (`geom_line()`).
Warning: Removed 18 rows containing missing values (`geom_point()`).
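By default patchwork chooses the grid itself. A sketch of controlling it, assuming a two-column layout and a shared title are wanted: plot_layout() sets the grid and plot_annotation() adds a title and A/B/C panel tags. Three mtcars plots stand in for the five wheels plots.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(mpg)) + geom_histogram(binwidth = 5) + theme_bw()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot() + theme_bw()
p3 <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_bw()

# plot_layout() fixes the grid; plot_annotation() adds a shared title and panel tags
combined <- (p1 + p2 + p3) +
  plot_layout(ncol = 2) +
  plot_annotation(title = "Three views of mtcars", tag_levels = "A")
combined
```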

2.) Using the wheels data, group the data, calculate an average, and plot the data with error bars! Ask for help if you need it. We may not have learned all of this in class just yet.

#Create new df with mean height by (filtered) country
wheels2_meanheight <- wheels2 %>%
  group_by(country) %>%
  drop_na(height) %>%
  summarize(meanheight = mean(height), sd= sd(height), n = n(), se = sd/sqrt(n))
#Graph mean height by (filtered) country with error bars
ggplot(data=wheels2_meanheight, aes(x=country, y=meanheight))+geom_point()+geom_errorbar(aes(ymin=meanheight-se, ymax=meanheight+se))+theme_classic()
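The same group, summarize, plot pattern is sketched below with the built-in mtcars data (cylinder count stands in for country). Setting width in geom_errorbar() narrows the caps. One caveat worth checking in the wheels summary: a group with only one non-missing height has sd = NA, so no error bar can be drawn for it.

```r
library(dplyr)
library(ggplot2)

cyl_summary <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), sd = sd(mpg), n = n(), se = sd / sqrt(n))

p <- ggplot(cyl_summary, aes(x = factor(cyl), y = mean_mpg)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean_mpg - se, ymax = mean_mpg + se), width = 0.2) +  # mean +/- 1 SE
  theme_classic()
```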

3.) Using theme() and labs(), add custom labels to your X and Y axes. Add a title. Change the size of the text on both axes. This may be beyond our tutorial in class, so ask Justin and/or use Google or the resources linked above. theme() is extremely powerful and will always be useful for us!

ggplot(data=wheels2, aes(x=height, y=seating_capacity, color=country))+geom_point()+theme_classic()+scale_color_nejm()+ theme(axis.text = element_text(size=12), axis.title = element_text(size=14))+labs(x='Height', y = 'Seating Capacity', title = 'Height versus Seating Capacity by Country', color = 'Country') 
Warning: Removed 2 rows containing missing values (`geom_point()`).
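theme() can also target each axis and the title separately. A sketch, with mtcars standing in for wheels2 and labels chosen only for this example, that sets different text sizes for the x and y axes and enlarges and centers the title (hjust = 0.5):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_classic() +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon", title = "Weight versus fuel economy") +
  theme(
    axis.text.x = element_text(size = 12),             # x tick labels only
    axis.text.y = element_text(size = 10),             # y tick labels only
    plot.title = element_text(size = 16, hjust = 0.5)  # larger, centered title
  )
p
```

As with the bar graphs, theme() comes after theme_classic() so these tweaks are not overwritten.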

When finished, render as HTML or PDF and confirm that your file looks the way it should. Then submit it on Moodle (via the Google form).