There is always more R to learn.

In BS1040, BS1070 and MB1080, I try to give a well rounded introduction to Statistics and R. But there is always more to learn. In this practical, you have to learn some new R. Some of this is trivial (setting arbitrary axes limits etc.) but some is important (what’s a paired t-test).

So first let’s set everything up and get the data into R.

library(ggplot2)
library(dplyr)  # you need this for filtering

blood <- read.csv("~/Dropbox/Teaching/volko/Blood Pressure Data-Long.csv", stringsAsFactors = TRUE)  # you need to change the bit in quotations to your data. You learned how to load data here https://rpubs.com/eamonnmallon/644715

Question 2: Histograms

There are three new things here:

filter(blood, Position == "Sitting") %>% #only using the sitting data, then "piping" it into ggplot. You learned about filtering here https://rpubs.com/eamonnmallon/527169
  ggplot( aes(x=BP, fill=Type)) +
  geom_histogram(binwidth = 5, center = 2.5, closed = "left") +   #using boundary has the same effect, closed determines treatment of boundary data
  scale_x_continuous (limits = c(40, 180)) + #making the x axes the same in both histograms
  labs(
    x = "Blood Pressure (sitting, mm Hg)",
    y = "Count (n)"
  )+
 xlim(40,180)+
  ylim(0,50)+
  facet_wrap(~Type) # this creates two histograms, one for each type of blood pressure reading

ggsave("~/Dropbox/Teaching/volko/histo.pdf") # This will save the histograms as a pdf. Change this to save it where you want

Question 3: Descriptive statistics

The new thing below is separating the data by Position and Type on the fly again using pipes and the new group_by command.

blood %>%
    group_by(Position, Type) %>%
    summarise(median_BP = median(BP), mean_BP = mean(BP), standard_deviation_BP = sd(BP))

Question 4: Boxplot

In BS1040 session 2 you learned how to make a boxplot to show two variables. Here you have to add something extra to include 3 variables (BP, Type and Position). We do that by adding a color aes.

ggplot(data = blood, aes(x=Type, y=BP)) + geom_boxplot(aes(color=Position),coef =1000) + theme_bw() +
# coef = 1000 is so that the whiskers do the maximum and minimum values. This is non standard, normally it defaults to 1.5 times the inter-quartile range.
  scale_y_continuous (limits = c(40, 180)) + #limits requested
  labs(title="Blood pressure boxplot",x="Blood pressure type", y="Blood pressure (mm Hg)")

ggsave("~/Dropbox/Teaching/volko/blood.pdf") # This will save the boxplot as a pdf. Change this to save it where you want.

Question 7: Paired t-tests

We can’t use the t-test we learned in BS1040. Its because the data is repeated. Its the same group of people having their blood pressured measured standing and sitting. In the language we learned in the ANOVA lecture, the data is not independent. That’s why we use the paired t-test. Rather than comparing the data randomly, it compares sitting and standing data for each person and considers this when calculating the t-value. Can you see the small change in the code to make it paired?

# Separating the data into diastolic and systolic. Note I didn't use a pipe
# here as above. I could have but its not the only way to do it.
dia <- filter(blood, Type == "Diastolic")
sys <- filter(blood, Type == "Systolic")
# Diastolic t-test
t.test(dia$BP[dia$Position == "Sitting"], dia$BP[dia$Position == "Standing"], paired = TRUE)
# Systolic t-test
t.test(sys$BP[dia$Position == "Sitting"], sys$BP[dia$Position == "Standing"], paired = TRUE)