For these exercises, you will explore the Titanic data from kaggle.com, which was downloaded from here: https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/problem12.html. You will need to download the data and load it into R. Because this is a comma-separated file, you will need to explore the read.csv() function.
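As a warm-up for read.csv(), here is a minimal sketch that writes a tiny comma-separated file to a temporary location and reads it back. The file contents and the object name toy are made up for illustration; they are not the Titanic data.

```r
# Write a small comma-separated file to a temporary path (illustrative data only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,Fare",
             "A,22,7.25",
             "B,38,71.28"), tmp)

# By default read.csv() expects a header row and comma separators.
toy <- read.csv(tmp)
toy
```

The same call pattern should apply to a downloaded file, with the path to that file in place of tmp.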

Load ggplot2

library(ggplot2)

Exercise 1 Questions

Question 1.1

Load titanic.csv and save to an object named titanic.

Question 1.2

Explore the data. What is the structure of the data? Try str(). What are the column names? Try colnames(). How can you get help if you do not know how to use these functions?
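The functions mentioned above can be tried first on a dataset that ships with R. This sketch uses the built-in iris data purely as a stand-in; the object name cols is hypothetical.

```r
# str() prints a compact summary: dimensions, column types, and the first few values.
str(iris)

# colnames() returns the column names as a character vector.
cols <- colnames(iris)
cols

# Help pages can be opened from the console, for example:
#   ?str
#   help("colnames")
```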

Answers to question 1.2:

Question 1.3

Make a simple scatter plot of passenger age and passenger fare. Is there a relationship between the two?

Answer to question 1.3:

Question 1.4

Color the points from question 1.3 by Pclass. Remember that Pclass is a proxy for socioeconomic status. While the values are treated as numeric upon loading, they are really categorical and should be treated as such. You will need to coerce Pclass into a categorical (factor) variable. See factor() and as.factor().
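The coercion step can be sketched on a small made-up data frame. The values below are invented, and the column names only mimic those in the exercise; this is one possible approach, not the only one.

```r
library(ggplot2)

# Hypothetical toy data (not the real Titanic file).
toy <- data.frame(Age    = c(22, 38, 26, 35, 54, 2),
                  Fare   = c(7.25, 71.28, 7.93, 53.10, 51.86, 21.08),
                  Pclass = c(3, 1, 3, 1, 1, 3))

# Pclass starts out numeric; as.factor() turns it into a categorical variable.
toy$Pclass <- as.factor(toy$Pclass)
levels(toy$Pclass)

# A factor mapped to color gives a discrete legend rather than a gradient.
p <- ggplot(toy, aes(x = Age, y = Fare, color = Pclass)) +
  geom_point()
p
```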

Question 1.5

Manually scale the colors in question 1.4: 1st class = yellow, 2nd class = purple, 3rd class = seagreen. Also change the legend labels (1 = 1st Class, 2 = 2nd Class, 3 = 3rd Class).
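Manual color scales in ggplot2 are usually set with scale_color_manual(). The sketch below uses an invented three-level factor (grp) and arbitrary colors to show the values, labels, and name arguments; adapting it to Pclass is left as the exercise.

```r
library(ggplot2)

# Hypothetical toy data with a three-level factor.
toy <- data.frame(x = c(1, 2, 3),
                  y = c(3, 1, 2),
                  grp = factor(c("a", "b", "c")))

p <- ggplot(toy, aes(x = x, y = y, color = grp)) +
  geom_point(size = 3) +
  scale_color_manual(
    # Naming the values ties each color to a specific factor level.
    values = c("a" = "red", "b" = "blue", "c" = "darkgreen"),
    # Labels are given in the same order as the factor levels.
    labels = c("Group A", "Group B", "Group C"),
    name   = "Group"
  )
p
```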

Question 1.6

Facet the plot made in question 1.5 by the column ‘Sex’.

Question 1.7

Let’s use some other geoms. Plot the number of passengers (a simple count) that survived by ticket class and facet by sex.
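For counts, geom_bar() is a common choice because it tallies rows per category by default, and facet_wrap() splits the plot into panels. A minimal sketch on invented data (the columns grp and panel are hypothetical) might look like this:

```r
library(ggplot2)

# Hypothetical toy data: one row per observation.
toy <- data.frame(grp   = factor(c("a", "a", "b", "b", "b", "a")),
                  panel = c("x", "x", "x", "y", "y", "y"))

p <- ggplot(toy, aes(x = grp)) +
  geom_bar() +            # uses stat = "count", so no y aesthetic is needed
  facet_wrap(~ panel)     # one panel per value of the panel column
p
```

To count only a subset of rows, the data frame can be filtered before plotting, for example with subset().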

Question 1.8

Add a variable to the data frame called age_cat (child = <12, adolescent = 12-17, adult = 18+). Plot the number of passengers (a simple count) that survived by age_cat, fill by Sex, and facet by passenger class and survival.
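One way to bin a numeric variable into categories is cut(). The sketch below applies the cut points described above to a made-up vector of ages; the vector name ages is hypothetical, and other approaches (for example ifelse()) would also work.

```r
# Invented ages, including boundary values and a missing value.
ages <- c(5, 12, 17, 18, 40, NA)

# right = FALSE makes each interval closed on the left: [lower, upper).
age_cat <- cut(ages,
               breaks = c(-Inf, 12, 18, Inf),
               labels = c("child", "adolescent", "adult"),
               right  = FALSE)

# Missing ages stay NA after binning.
table(age_cat, useNA = "ifany")
```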

Exercise 2 Questions

Let’s use the mtcars dataset. According to the help documentation (?mtcars), “the data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).” Each question below builds on the code from the previous question.

Question 2.1

Let’s check out the structure of the data.

Question 2.2

How might we plot automobile weight (wt) versus miles per gallon (mpg)?

Question 2.3

What if we want to represent the number of cylinders (cyl) by color and shape?
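A hint for this one: cyl is stored as a number, and ggplot2 cannot map a continuous variable to shape, so it typically needs to be wrapped in factor() first. The following is one possible sketch, not the only solution.

```r
library(ggplot2)

# factor(cyl) makes the cylinder count discrete for both color and shape.
p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        color = factor(cyl),
                        shape = factor(cyl))) +
  geom_point()
p
```

Because both aesthetics use the same expression, ggplot2 draws a single combined legend.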

Question 2.4

Make the size of the points change by the quarter mile time (qsec).

Question 2.5

Create subplots by transmission (am).

Question 2.6

Model the trend using geom_smooth(). What is the default method used by geom_smooth()?
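As a point of comparison, the smoothing method can also be set explicitly. This sketch fits a straight line with method = "lm"; it is an illustration of the argument, not the default behavior the question asks about.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Explicit linear fit; se = TRUE draws the confidence band around the line.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)
p
```

When method is left unset, geom_smooth() prints a message naming the method and formula it chose, which is a useful clue for the question above; ?geom_smooth gives the details.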

Answer to question 2.6: