Describe why you picked that particular variable at the beginning of your report. I chose Ship Id because I thought it would be easy to work with and because the variable had to be numerical, not categorical.

There are other ways we can display a variable that are different because they do not focus on individual observations. For example, we could use a histogram. In that case we need a new plot object that only has an x variable.

Previously we had made a histogram that included lines showing the 25th, 50th and 75th percentiles, also known as the first, second and third quartiles.

Let’s do this in a slightly different way by making objects for each of our vertical lines.

You have to do the other two; follow the same naming pattern.

Notice that we’re adding a v to the end of the name so we know that these lines are vertical.

quartile1v <- 
  geom_vline(xintercept=quantile(ships$`Ship Id`, probs=c(.25),
                                 na.rm=TRUE),  color="red", linetype="dashed", size=2) 

quartile2v <- 
  geom_vline(xintercept=quantile(ships$`Ship Id`, probs=c(.50),
                                 na.rm=TRUE),  color="blue", linetype="dashed", size=2) 

quartile3v <- 
  geom_vline(xintercept=quantile(ships$`Ship Id`, probs=c(.75),
                                 na.rm=TRUE),  color="pink", linetype="dashed", size=2) 
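The three objects above differ only in the probability and the color, so the pattern could also be wrapped in a small helper. This is a minimal sketch, assuming ggplot2 is installed; `quartile_vline` is a hypothetical name (not a ggplot2 function), and `ids` is a stand-in for the Ship Id column, whose values run from 1 to 18.

```r
library(ggplot2)

# Hypothetical helper: builds a dashed vertical line at the requested quantile.
quartile_vline <- function(values, prob, color) {
  geom_vline(
    xintercept = quantile(values, probs = prob, na.rm = TRUE),
    color = color,
    linetype = "dashed"
  )
}

# Stand-in for the Ship Id column (values 1 through 18).
ids <- 1:18

# Same naming pattern as above: the trailing "v" marks a vertical line.
q1v <- quartile_vline(ids, 0.25, "red")
q2v <- quartile_vline(ids, 0.50, "blue")
q3v <- quartile_vline(ids, 0.75, "pink")
```

Each result is an ordinary layer, so it can be added to a plot with `+` exactly like `quartile1v`.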

Now let’s make our histogram again using these objects.

plot1 <- ggplot(ships, aes(x=`Ship Id`)) 
# Ship Id only runs from 1 to 18, so binwidth = 100 would put every ship in a single bar
plot1 + geom_histogram(binwidth = 1) + quartile1v + quartile2v + quartile3v +
  ggtitle("Figure 1: Distribution of Ship Id")

For comparison’s sake, let’s also get the summary of the distribution of Ship Id and list the actual values (since there are only 18, it’s not too many).

ships$`Ship Id`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
summary(ships$`Ship Id`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    5.25    9.50    9.50   13.75   18.00

Make sure you can see how the histogram relates to the summary and raw data.
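One way to check the connection is to reproduce the quartiles from the summary directly with `quantile()`. A small sketch, assuming the 18 values listed above; `ids` is a stand-in name for the Ship Id column.

```r
# Stand-in for the Ship Id column: the 18 values printed above.
ids <- 1:18

# The same probabilities used for the three vertical lines.
q <- quantile(ids, probs = c(0.25, 0.50, 0.75))
q
##   25%   50%   75% 
##  5.25  9.50 13.75

# Number of ships at or below the first quartile line.
below_q1 <- sum(ids <= q[["25%"]])
```

These are the 1st Qu., Median and 3rd Qu. values from `summary()`, and they are the x positions where the three dashed lines cross the histogram.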

Box plots

Let’s create horizontal lines for the 3 quartiles. (You have to do the other 2.)

quartile1h <- 
  geom_hline(yintercept=quantile(ships$`Ship Id`, probs=c(.25),
                                 na.rm=TRUE),  color="yellow", linetype="dashed", size=1) 

quartile2h <- 
  geom_hline(yintercept=quantile(ships$`Ship Id`, probs=c(.50),
                                 na.rm=TRUE),  color="orange", linetype="dashed", size=1) 

quartile3h <- 
  geom_hline(yintercept=quantile(ships$`Ship Id`, probs=c(.75),
                                 na.rm=TRUE),  color="brown", linetype="dashed", size=1) 

What changes did you have to make to get the horizontal lines?

Now make the box plot but include the 3 horizontal lines.

        plot2<- ggplot(ships, aes(x = factor(0), y= `Ship Id`))
        
        plot2+  geom_boxplot()+ quartile1h+ quartile2h+
          quartile3h+
          ggtitle("Ship Id")

How do the lines relate to the box plot?

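The lines show us Q1, the median (Q2) and Q3. They fall on the bottom edge of the box, the line in the middle of the box and the top edge of the box.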
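The match between the lines and the box can also be checked numerically. This is a sketch under assumptions: `ids` is a stand-in data frame for the Ship Id column, and it relies on ggplot2 drawing the box edges at the first and third quartiles for unweighted data, as its documentation describes.

```r
library(ggplot2)

# Stand-in for the Ship Id column (values 1 through 18).
ids <- data.frame(id = 1:18)

p <- ggplot(ids, aes(x = factor(0), y = id)) + geom_boxplot()

# layer_data() returns the statistics ggplot2 computed for the first layer.
box <- layer_data(p, 1)
box_edges <- c(box$lower, box$middle, box$upper)

# Heights at which the three horizontal quartile lines are drawn.
line_heights <- unname(quantile(ids$id, probs = c(0.25, 0.50, 0.75)))

# Compare the box edges with the line heights.
all.equal(box_edges, line_heights)
```

If the comparison reports that the two vectors are equal, the dashed lines sit exactly on the box edges and the middle bar.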

How do the box plots with the lines relate to the histogram with the lines?

    The box plots with lines relate to the histogram with the lines because they give us the same quartile values in a different display: the lines are horizontal on the box plot's y axis and vertical on the histogram's x axis.
    

Now let’s make boxplots with lines and titles for different x variables:

    Survived, Quick, Cause, `Women and children first`
    
        plot4<-ggplot(ships, aes(x=factor(Survived) , y=`Ship Id` ))
        
        plot4+  geom_boxplot()+ quartile1h+ quartile2h+
        quartile3h+
        ggtitle("Survived")

        plot5<-ggplot(ships, aes(x=factor(Quick) , y=`Ship Id` ))
        plot5+  geom_boxplot()+ quartile1h+ quartile2h+
        quartile3h+
        ggtitle("Quick")

        plot6<-ggplot(ships, aes(x=factor(Cause) , y=`Ship Id` ))
        plot6+  geom_boxplot()+ quartile1h+ quartile2h+
        quartile3h+
        ggtitle("Cause")

        plot7<-ggplot(ships, aes(x=factor(`Women and children first`) , y=`Ship Id` ))
        plot7+  geom_boxplot()+ quartile1h+ quartile2h+
        quartile3h+
        ggtitle("Women and children first")

        # Display plot3 using geom_boxplot().

Write a few sentences about how each of the nominal variables relates to Number of Passengers.

    Each nominal variable relates to the number of passengers because it affects the mean and the range of the extreme values.
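A statement like this could be backed up with numbers by computing a summary inside each category. A minimal sketch using made-up toy values (not the ships data); `toy`, `group` and `value` are hypothetical names, and with the real data the same calls would take the numeric column and one of the nominal columns.

```r
# Made-up toy data: two categories with clearly different values.
toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 10, 20, 30)
)

# Median of the numeric variable within each category.
group_medians <- tapply(toy$value, toy$group, median)

# Distance between the smallest and largest value within each category.
group_ranges <- tapply(toy$value, toy$group, function(v) diff(range(v)))

# Median of all values together, ignoring the categories.
overall_median <- median(toy$value)
```

Comparing the per-category medians with the overall median shows how far each group's box sits from the overall quartile lines.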