In BS1040, BS1070 and MB1080, I try to give a well rounded introduction to Statistics and R. But there is always more to learn. In this practical, you have to learn some new R. Some of this is trivial (setting arbitrary axes limits etc.) but some is important (what’s a paired t-test).
So first let’s set everything up and get the data into R.
library(ggplot2)
library(dplyr) # you need this for filtering
blood <- read.csv("~/Dropbox/Teaching/volko/Blood Pressure Data-Long.csv", stringsAsFactors = TRUE) # you need to change the bit in quotations to your data. You learned how to load data here https://rpubs.com/eamonnmallon/644715
There are three new things here:
The first new thing below is separating the data by Position on the fly. We do this using pipes we briefly learned about in BS1040.
R and especially the tidyverse (including ggplot) is very opinionated about how things should be done (its defaults). But it is also almost infinitely customisable. This practical requires a customised histogram that starts at 40 on the x axis scale and wants the first bin to go from 40 to 45. By default, R centres the first bin on 40 to include data between 37.5 and 42.5. I’ve commented the code below that shows the extra commands required to do this.The instruction page shows you all the things you can change witha histogram.
Rather than making two separate histograms by repeating code we can just use the command facet_wrap to produce a histogram for each typle of blood pressure in the same image.
filter(blood, Position == "Sitting") %>% #only using the sitting data, then "piping" it into ggplot. You learned about filtering here https://rpubs.com/eamonnmallon/527169
ggplot( aes(x=BP, fill=Type)) +
geom_histogram(binwidth = 5, center = 2.5, closed = "left") + #using boundary has the same effect, closed determines treatment of boundary data
scale_x_continuous (limits = c(40, 180)) + #making the x axes the same in both histograms
labs(
x = "Blood Pressure (sitting, mm Hg)",
y = "Count (n)"
)+
xlim(40,180)+
ylim(0,50)+
facet_wrap(~Type) # this creates two histograms, one for each type of blood pressure reading
ggsave("~/Dropbox/Teaching/volko/histo.pdf") # This will save the histograms as a pdf. Change this to save it where you want
The new thing below is separating the data by Position and Type on the fly again using pipes and the new group_by command.
blood %>%
group_by(Position, Type) %>%
summarise(median_BP = median(BP), mean_BP = mean(BP), standard_deviation_BP = sd(BP))
In BS1040 session 2 you learned how to make a boxplot to show two variables. Here you have to add something extra to include 3 variables (BP, Type and Position). We do that by adding a color aes.
ggplot(data = blood, aes(x=Type, y=BP)) + geom_boxplot(aes(color=Position),coef =1000) + theme_bw() +
# coef = 1000 is so that the whiskers do the maximum and minimum values. This is non standard, normally it defaults to 1.5 times the inter-quartile range.
scale_y_continuous (limits = c(40, 180)) + #limits requested
labs(title="Blood pressure boxplot",x="Blood pressure type", y="Blood pressure (mm Hg)")
ggsave("~/Dropbox/Teaching/volko/blood.pdf") # This will save the boxplot as a pdf. Change this to save it where you want.
We can’t use the t-test we learned in BS1040. Its because the data is repeated. Its the same group of people having their blood pressured measured standing and sitting. In the language we learned in the ANOVA lecture, the data is not independent. That’s why we use the paired t-test. Rather than comparing the data randomly, it compares sitting and standing data for each person and considers this when calculating the t-value. Can you see the small change in the code to make it paired?
# Separating the data into diastolic and systolic. Note I didn't use a pipe
# here as above. I could have but its not the only way to do it.
dia <- filter(blood, Type == "Diastolic")
sys <- filter(blood, Type == "Systolic")
# Diastolic t-test
t.test(dia$BP[dia$Position == "Sitting"], dia$BP[dia$Position == "Standing"], paired = TRUE)
# Systolic t-test
t.test(sys$BP[dia$Position == "Sitting"], sys$BP[dia$Position == "Standing"], paired = TRUE)