Question 1 (2pts)

Load the data below

REMEMBER TO PUT THE DATA IN THE SAME FOLDER WITH YOUR Rmd FILE JUST LIKE CLASS LAST WEEK

titanic <- read.csv("titanic.csv")

If you can see the data in your Environment tab, you have completed question 1! If you still have difficulty reading the data this way, you can pull a data set into R by going to File > Import Dataset > From Text (base), as Ruth mentioned in class.
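A quick way to confirm the load worked, beyond looking at the Environment tab, is to check that the columns used in the later questions are present. The sketch below is only an illustration: `missing_columns` is a made-up helper name, and the column list is taken from the code that appears further down in this assignment.

```r
# Hypothetical helper (not part of the assignment): report which of the
# columns used in later questions are missing from a data frame.
missing_columns <- function(df, needed = c("Age", "Fare", "Pclass", "Survived")) {
  setdiff(needed, names(df))
}

# Usage, assuming titanic was loaded with read.csv as above:
# missing_columns(titanic)   # character(0) means every column was found
# str(titanic)               # lists each column with its type
```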

Question 2 (6pts)

Create a histogram of the ages from the titanic dataset. What kind of distribution does it resemble? Does it have a skew? (Use the hist function in R)

Answer question 2 in the space below

hist(titanic$Age, main = "Number of passengers by age", xlab = "Age", ylab = "Number of passengers")

This histogram is skewed to the right: it has a longer tail toward the older ages.
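A rough numeric check on the skew is to compare the mean with the median, since a right skew tends to pull the mean above the median. The helper name `mean_minus_median` below is invented for illustration, and the comparison is a rule of thumb rather than a formal test.

```r
# Hypothetical helper: mean minus median, ignoring missing values.
# A clearly positive result hints at a right skew; a clearly negative
# one hints at a left skew. This is a heuristic, not a formal test.
mean_minus_median <- function(x) {
  mean(x, na.rm = TRUE) - median(x, na.rm = TRUE)
}

# Usage, assuming titanic is loaded:
# mean_minus_median(titanic$Age)
```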

Question 3 (6pts)

Create a scatter plot that shows the relationship between age and fare. If I am interested in answering the question “Did older people buy more expensive tickets on the Titanic?”, which variable should be on the x-axis and which one should be on the y-axis? Make sure to get the axes right after answering this question. Finally, what is your answer to this question? (Use the plot function in R)

Answer question 3 in the space below

plot(titanic$Age, titanic$Fare, main = "Age and Ticket Fare", xlab = "Age", ylab = "Fare")

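Age should be on the x-axis and fare on the y-axis, because age is the variable we think might explain the fare. There seems to be no clear correlation between age and fare paid.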
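One way to put a number on this reading of the scatter plot is the Pearson correlation coefficient, where values near 0 indicate little linear relationship. The sketch below uses base R's `cor`; the wrapper name `age_fare_cor` is made up, and it assumes the data frame has `Age` and `Fare` columns, with rows containing missing values dropped.

```r
# Hypothetical helper: Pearson correlation between Age and Fare,
# using only the rows where both values are present.
age_fare_cor <- function(df) {
  cor(df$Age, df$Fare, use = "complete.obs")
}

# Usage, assuming titanic is loaded:
# age_fare_cor(titanic)
```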

Question 4 (6pts)

Below I have written some code for you. All you need to do is install the package dplyr; look for the Packages tab in the bottom right. The output should be a bar chart that shows you the survival rate by class. What do you notice?

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
titanicClass <- titanic %>% 
  group_by(class = Pclass) %>% 
  summarise(frequency = n())

titanicSurvival <- titanic %>% 
                   select(Survived, Pclass) %>% 
                   filter(Survived == 1) %>% 
                   group_by(Pclass) %>% 
                   summarise(survive = n())
    
perc_surv <- titanicSurvival$survive/titanicClass$frequency

barplot(perc_surv,names.arg = c("First Class","Second Class","Third Class"))

Answer question 4 in the space below

I notice that the survival rate goes down as the class goes down: first class has the highest survival rate and third class has the lowest.
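The same rates can be cross-checked in one step, because the mean of a 0/1 column is the proportion of 1s. The sketch below uses base R's `tapply` instead of dplyr; the function name `survival_rate_by_class` is invented for illustration, and it assumes `Survived` is coded as 0 or 1, as the `filter(Survived == 1)` line above suggests.

```r
# Hypothetical helper: survival rate per class, computed as the mean of
# the 0/1 Survived column within each Pclass group.
survival_rate_by_class <- function(df) {
  tapply(df$Survived, df$Pclass, mean)
}

# Usage, assuming titanic is loaded:
# rates <- survival_rate_by_class(titanic)
# barplot(rates, names.arg = c("First Class", "Second Class", "Third Class"))
```

Unlike dividing one summary table by another, this version does not rely on the two tables listing the classes in the same order.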

Now that you are done, please knit the file (try to knit to HTML) and submit the resulting HTML file to the Canvas assignment.