Lab 2 Assignment

Complete ALL of the essentials below correctly to earn an ‘S’ on the lab.
Complete the Depth portion successfully to earn credit toward a depth boost (every 2 lab depth assignments completed earns a 1/3 letter grade boost to your final grade).

Render your document as a .pdf or .html and submit it to the google folder on Moodle for grading.

Load Packages

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.3.0      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(patchwork)
library(ggsci)

Load data to use!

We can use this fun Ferris wheel data set I found online to make some practice graphs!

wheels <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-09/wheels.csv')

head(wheels)
  X                name height diameter     opened     closed country
1 1 360 Pensacola Beach 200.00       NA 2012-07-03 2013-01-01     USA
2 2              Amuran 303.00    199.8 2004-01-01       <NA>   Japan
3 3       Asiatique Sky 200.00    200.0 2012-12-15       <NA> Tailand
4 4        Aurora Wheel 295.00    272.0       <NA>       <NA>   Japan
5 5         Baghdad Eye 180.00       NA 2011-01-01       <NA>    Iraq
6 6 Beijing Great Wheel 692.64    642.7       <NA>       <NA>   China
                         location number_of_cabins passengers_per_cabin
1        Pensacola Beach; Florida               42                    6
2               Kagoshima; Kyushu               36                   NA
3        Asiatique the Riverfront               42                   NA
4 Nagashima Spa Land; Mie; Honshu               NA                   NA
5         Al-Zawraa Park; Baghdad               40                    6
6          Chaoyang Park; Beijing               48                   40
  seating_capacity hourly_capacity ride_duration_minutes climate_controlled
1              252            1260                  12.0                Yes
2               NA              NA                  14.5                Yes
3               NA              NA                    NA                Yes
4               NA              NA                    NA               <NA>
5              240             960                  15.0               <NA>
6             1920            5760                  20.0                yes
  construction_cost    status         design_manufacturer          type
1           Unknown     Moved        Realty Masters of FL Transportable
2           Unknown Operating                        <NA>          <NA>
3           Unknown Operating       Dutch Wheels (Vekoma)          <NA>
4           Unknown Operating                        <NA>         Fixed
5    $6 million USD Operating                        <NA>          <NA>
6  $290 million USD   Delayed The Great Wheel Corporation         Fixed
  vip_area ticket_cost_to_ride                  official_website turns
1      Yes                <NA>                              <NA>     4
2     <NA>                <NA>                              <NA>     1
3     <NA>                <NA>      http://www.asiatiquesky.com/    NA
4     <NA>                <NA> http://www.nagashima-onsen.co.jp/    NA
5     <NA>                 3.5                              <NA>    NA
6     <NA>                <NA>                              <NA>     1

Essentials

1.) What’s in a graph? Write a paragraph explaining some tenets of good vs. bad graphics. Be specific!

#A good graph shows the data, makes patterns easy to see, represents magnitudes honestly, and uses clear graphical elements. A bad graph ignores how much data there is, manipulates patterns to promote what its author believes, distorts the magnitude of the data, and is hard to understand.

2.) Make the following plots with the ferris wheel data: histogram, boxplot, bar graph, line graph, scatterplot

#Histogram
ggplot(data=wheels,aes(x=number_of_cabins))+geom_histogram(binwidth=5)
Warning: Removed 11 rows containing non-finite values (`stat_bin()`).

#boxplot
ggplot(data=wheels,aes(x=country,y=height))+geom_boxplot()+theme(axis.text.x = element_text(angle=90,vjust=0.5,size=14))
Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).

#theme() is used above to rotate the text on the x axis

#bar graph
ggplot(data=wheels, aes(country))+geom_bar()+theme(axis.text.x = element_text(angle=90,vjust=0.5,size=10)) 

#line graph
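#Piped alternative that drops the NA rows first. The pipe already passes the data to ggplot(), so no data= argument is needed (passing data=wheels2 inside the pipe is why the first attempt failed):
#wheels %>% 
#drop_na(diameter,number_of_cabins) %>%
#ggplot(aes(x=diameter,y=number_of_cabins))+geom_line()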

ggplot(data=wheels,aes(x=diameter,y=number_of_cabins))+geom_line()
Warning: Removed 35 rows containing missing values (`geom_line()`).

#scatterplot
ggplot(data=wheels,aes(x=diameter,y=number_of_cabins))+geom_point()
Warning: Removed 38 rows containing missing values (`geom_point()`).

3.) Using your scatterplot from #2, remove the gray background from the plot. Continue using this same plot for 3-6.

#scatterplot
ggplot(data=wheels,aes(x=diameter,y=number_of_cabins))+geom_point()+theme_bw()
Warning: Removed 38 rows containing missing values (`geom_point()`).

4.) Change the colors away from default colors. Show me an example of manually changing the colors and an example of you using ggsci to change the colors.

#Changing the colors by mapping color to country (this still draws from the default palette)
ggplot(data=wheels,aes(x=diameter,y=number_of_cabins,color=country))+geom_point()+theme_bw()
Warning: Removed 38 rows containing missing values (`geom_point()`).
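
Mapping color=country on its own keeps ggplot2's default palette. A minimal sketch of picking the colors by hand with scale_color_manual() follows. It assumes a small made-up data frame (toy, not the wheels data) with three countries, and the color names are arbitrary choices.

```r
library(ggplot2)

# Hypothetical toy data standing in for wheels (all values are made up)
toy <- data.frame(
  diameter = c(100, 150, 200, 250, 300, 350),
  number_of_cabins = c(20, 28, 36, 40, 48, 60),
  country = c("Japan", "Japan", "USA", "USA", "China", "China")
)

# One hand-picked color per country; the names must match the values in the data
my_colors <- c(China = "firebrick", Japan = "darkorange", USA = "steelblue")

p_manual <- ggplot(data=toy, aes(x=diameter, y=number_of_cabins, color=country)) +
  geom_point() +
  theme_bw() +
  scale_color_manual(values=my_colors)
p_manual
```

With the full wheels data the named vector would need one color for every country. To give all points a single color instead, set it outside aes(), for example geom_point(color="steelblue").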

#Using ggsci to change colors
ggplot(data=wheels,aes(x=diameter,y=number_of_cabins,color=country))+geom_point()+theme_bw()+scale_color_ucscgb()
Warning: Removed 38 rows containing missing values (`geom_point()`).

5.) Change the shape and size of your points!

#Change the shape
ggplot(data=wheels,aes(x=diameter,y=number_of_cabins,color=country))+geom_point(shape=18)+theme_bw()
Warning: Removed 38 rows containing missing values (`geom_point()`).

#Change the size (geom_jitter() also adds a small random offset to each point)
ggplot(data=wheels,aes(x=diameter,y=number_of_cabins,color=country))+geom_jitter(size=2)+theme_bw()
Warning: Removed 38 rows containing missing values (`geom_point()`).

6.) Add lines to connect your points to one another.

ggplot(data=wheels,aes(x=diameter,y=number_of_cabins,color=country))+geom_point(shape=18)+theme_bw()+geom_line()
Warning: Removed 38 rows containing missing values (`geom_point()`).
Warning: Removed 35 rows containing missing values (`geom_line()`).

7.) Using your bar graph, change both the color and fill and see how those are different.

#bar graph, fill each bar by the status of the ferris wheels
ggplot(data=wheels, aes(country))+geom_bar(aes(fill=status))+theme(axis.text.x = element_text(angle=90,vjust=0.5,size=10))
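
To see how color and fill differ on a bar graph, the sketch below sets both at once. It assumes a made-up data frame (toy, not the wheels data). In geom_bar(), fill controls the inside of each bar and color controls only the outline.

```r
library(ggplot2)

# Hypothetical toy data (made-up rows, not the wheels data)
toy <- data.frame(
  country = c("Japan", "Japan", "USA", "USA", "USA", "China"),
  status  = c("Operating", "Closed", "Operating", "Operating", "Closed", "Operating")
)

# fill colors the inside of each bar by status; color draws a black outline
p_bars <- ggplot(data=toy, aes(country)) +
  geom_bar(aes(fill=status), color="black") +
  theme(axis.text.x = element_text(angle=90, vjust=0.5, size=10))
p_bars
```

Swapping the two (aes(color=status) with a fixed fill) would instead give every bar the same interior and a status-colored border.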

8.) Use facet_wrap to alter one of your graphs!

#Histogram
ggplot(data=wheels,aes(x=number_of_cabins,fill=country))+geom_histogram(binwidth=5)+facet_wrap(~status)+theme_bw()
Warning: Removed 11 rows containing non-finite values (`stat_bin()`).

Depth

1.) Using patchwork, put all of your graphs from Essentials #2 on the same output page!

#Histogram
p1<-ggplot(data=wheels,aes(x=number_of_cabins))+geom_histogram(binwidth=5)

#boxplot
p2<-ggplot(data=wheels,aes(x=country,y=height))+geom_boxplot()+theme(axis.text.x = element_text(angle=90,vjust=0.5,size=5))

#bar graph
p3<-ggplot(data=wheels, aes(country))+geom_bar()+theme(axis.text.x = element_text(angle=90,vjust=0.5,size=5))

#line graph
p4<-ggplot(data=wheels,aes(x=diameter,y=number_of_cabins))+geom_line()

#scatterplot
p5<-ggplot(data=wheels,aes(x=diameter,y=number_of_cabins))+geom_point()

#patchwork them together
p1+p2+p3+p4+p5
Warning: Removed 11 rows containing non-finite values (`stat_bin()`).
Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 35 rows containing missing values (`geom_line()`).
Warning: Removed 38 rows containing missing values (`geom_point()`).
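
Adding plots with + lets patchwork choose the grid. The sketch below shows how the layout can be controlled explicitly, assuming three small made-up plots (q1, q2, q3 are hypothetical names, not the plots above).

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data and plots (made up for illustration)
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 3, 5))
q1 <- ggplot(toy, aes(x=x)) + geom_histogram(binwidth=1)
q2 <- ggplot(toy, aes(x=x, y=y)) + geom_line()
q3 <- ggplot(toy, aes(x=x, y=y)) + geom_point()

# | places plots side by side, / stacks them; plot_annotation() adds a shared title
combined <- (q1 | q2) / q3 + plot_annotation(title="Toy plots combined with patchwork")
combined
```

The same idea should carry over to p1 through p5, for example p1+p2+p3+p4+p5+plot_layout(ncol=2) to force two columns.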

2.) Using the wheels data, group data, calculate an average, and plot data with error bars! Ask for help if you need it. We may not have learned all of this in class just yet.

#Group data
#Calculate an average
#Plot data with error bars

wheels2 <- wheels %>% 
group_by(country) %>% #group by countries
drop_na(number_of_cabins) %>% #omit na
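summarize(meancabins = mean(number_of_cabins),sd=sd(number_of_cabins),n=n(),se=sd/sqrt(n)) %>% #mean, standard deviation, sample size, standard error
drop_na(sd) #sd is NA for countries with only one wheel, so drop them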
wheels2
# A tibble: 7 × 5
  country   meancabins    sd     n    se
  <chr>          <dbl> <dbl> <int> <dbl>
1 Australia       33   10.8      3  6.24
2 China           55.8  7.50     9  2.50
3 Japan           50.2 14.0      9  4.67
4 Malaysia        48    8.49     2  6   
5 Taiwan          44.7  7.57     3  4.37
6 UK              37    3.83     4  1.91
7 USA             36.1  6.49    18  1.53
ggplot(data=wheels2,aes(x=country,y=meancabins))+geom_point()+geom_errorbar(aes(x=country,ymin=meancabins-se,ymax=meancabins+se))
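
The error bars above span one standard error (se = sd/sqrt(n)) on either side of the mean. As a quick check of that arithmetic, the sketch below uses two made-up cabin counts chosen so the results match the Malaysia row of the table (mean 48, sd 8.49, n 2, se 6).

```r
# Two illustrative cabin counts for one group
cabins <- c(42, 54)

n  <- length(cabins)   # group size
m  <- mean(cabins)     # group mean
s  <- sd(cabins)       # sample standard deviation
se <- s / sqrt(n)      # standard error of the mean

# Lower and upper ends of the error bar: mean minus and plus one standard error
c(mean=m, sd=s, n=n, se=se, ymin=m-se, ymax=m+se)
```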

3.) Using theme() and labs(), add custom labels to your X and Y axes. Add a title. Change the size of the text on both axes. This may be beyond our tutorial in class, so ask Justin and/or use Google or the resources linked above. theme() is extremely powerful and will always be useful for us!

ggplot(data=wheels2,aes(x=country,y=meancabins))+geom_point()+geom_errorbar(aes(x=country,ymin=meancabins-se,ymax=meancabins+se))+labs(y="Mean number of cabins of ferris wheels",x="Country",title="Mean number of cabins of ferris wheels in different countries")+theme(title =element_text(size=12, face='bold'))
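
In theme(), the title element controls titles (plot and axis titles), while the tick-label text on each axis is set through axis.text.x and axis.text.y. A minimal sketch of sizing both, assuming a made-up summary table (toy_summary is a hypothetical stand-in with the same columns as wheels2) and arbitrary sizes:

```r
library(ggplot2)

# Hypothetical summary table (made-up numbers, same column names as wheels2)
toy_summary <- data.frame(
  country = c("A", "B", "C"),
  meancabins = c(30, 45, 50),
  se = c(2, 3, 4)
)

p_theme <- ggplot(data=toy_summary, aes(x=country, y=meancabins)) +
  geom_point() +
  geom_errorbar(aes(ymin=meancabins-se, ymax=meancabins+se), width=0.2) +
  labs(x="Country", y="Mean number of cabins", title="Toy example") +
  theme(axis.text.x = element_text(size=12),   # tick labels on the x axis
        axis.text.y = element_text(size=12),   # tick labels on the y axis
        axis.title  = element_text(size=14))   # both axis titles
p_theme
```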

When finished, render as html or pdf and confirm that your file looks the way it should. Then submit on Moodle (via the google form).