3. Descriptive statistics
R makes many basic calculations straightforward. For example, you can obtain the mean and standard deviation with the mean() and sd() functions:
mean(mydata$sqw2, na.rm = TRUE)
## [1] 1.78919
mean(mydata$sqw1, na.rm = TRUE)
## [1] 1.680725
sd(mydata$sqw2, na.rm = TRUE)
## [1] 0.5633153
For other basic functions, you can consult the R reference card. On page 2, under the “Math” heading, you will find some descriptive statistics of interest.
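As a minimal sketch of a few other descriptive functions in base R, the example below uses a small made-up vector (scores is hypothetical and is not taken from mydata). As with mean() and sd(), most of these need na.rm = TRUE when the data contain missing values.

```r
# Hypothetical toy scores with one missing value; not values from mydata
scores <- c(1.2, 1.8, NA, 2.4, 1.6, 2.0)

median(scores, na.rm = TRUE)                            # middle value
var(scores, na.rm = TRUE)                               # variance
range(scores, na.rm = TRUE)                             # minimum and maximum
quantile(scores, probs = c(0.25, 0.75), na.rm = TRUE)   # lower and upper quartiles
summary(scores)                                         # quartiles, mean and count of NAs
```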
Because this is a cluster randomised controlled trial, you may also want to know the means and standard deviations by class.
Task 1: Statistics by group
Create a summary dataset containing the mean and standard deviation of the outcome, grouped by class.
For this, you’ll need to load the dplyr package:
library(dplyr)
summary_data <- mydata %>%
  group_by(class) %>%
  summarise(mean_sqw2 = mean(sqw2),
            stdev_sqw2 = sd(sqw2))
You can browse your newly created dataset by clicking on it in the Environment tab, or you can type summary_data to print it to the console.
summary_data
## # A tibble: 22 x 3
##    class mean_sqw2 stdev_sqw2
##    <int>     <dbl>      <dbl>
##  1     1      1.54      0.685
##  2     2      2.05      0.547
##  3     3      1.79      0.542
##  4     4      1.90      0.470
##  5     5      1.78      0.583
##  6     6      2.07      0.389
##  7     7      1.68      0.602
##  8     8      1.80      0.546
##  9     9      1.53      0.572
## 10    10      1.87      0.537
## # ... with 12 more rows
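The grouped summary calls mean() and sd() without na.rm, so any class containing a missing outcome would get NA for both statistics. The sketch below shows one way to guard against that and to report how many observations each class contributes. It uses a made-up data frame (toy and the column n_obs are hypothetical), not the trial data.

```r
library(dplyr)

# Hypothetical toy data with one missing outcome; not the trial dataset
toy <- data.frame(
  class = c(1, 1, 1, 2, 2, 2),
  sqw2  = c(1.5, NA, 2.1, 1.9, 2.3, 1.7)
)

toy_summary <- toy %>%
  group_by(class) %>%
  summarise(
    n_obs      = sum(!is.na(sqw2)),         # non-missing observations per class
    mean_sqw2  = mean(sqw2, na.rm = TRUE),  # mean, ignoring missing values
    stdev_sqw2 = sd(sqw2, na.rm = TRUE)     # standard deviation, ignoring missing values
  )
toy_summary
```

Reporting the count alongside the mean makes it easier to spot classes whose summary rests on very few pupils.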
If you want to export this dataset and use it in another software package, like Excel, you can use the write family of functions.
write.csv(summary_data, "summary_data.csv")
That will save a CSV file (which you can open in Excel) in your working directory.
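By default write.csv() also writes the row names as an unnamed first column, which usually shows up in Excel as an extra column of row numbers. A small sketch of how to leave that column out and check the result follows. The data frame toy_summary and the temporary file location are assumptions made for illustration only.

```r
# Hypothetical summary table standing in for summary_data
toy_summary <- data.frame(class = 1:3, mean_sqw2 = c(1.54, 2.05, 1.79))

# Write to a temporary folder here; a bare file name would go to the working directory
out_file <- file.path(tempdir(), "toy_summary.csv")
write.csv(toy_summary, out_file, row.names = FALSE)  # omit the row-number column

getwd()                       # shows the current working directory
check <- read.csv(out_file)   # read the file back to confirm its contents
check
```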
Task 2: Relationship between two variables
You can obtain the Pearson correlation by typing:
cor(mydata$sqw2, mydata$sqw1, use = "complete.obs") # use only observations with both values present; "comp" is accepted as an abbreviation
## [1] 0.7648664
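To see what the use argument does, here is a small sketch with made-up vectors (before and after are hypothetical, not the trial data). Without it, cor() returns NA as soon as either variable has a missing value. With "complete.obs", only the pairs where both values are present are used.

```r
# Hypothetical before/after scores with missing values; not the trial data
before <- c(1.0, 1.5, NA, 2.0, 2.5, 3.0)
after  <- c(1.2, 1.4, 1.9, NA, 2.6, 3.1)

cor(before, after)                        # NA, because of the missing values
cor(before, after, use = "complete.obs")  # uses only pairs observed on both variables

# The same calculation after dropping incomplete pairs by hand
ok <- complete.cases(before, after)
cor(before[ok], after[ok])
sum(ok)                                   # number of pairs actually used
```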
A basic bivariate plot made with ggplot2 names the dataset, maps a y and an x variable, and adds a geom: the figure or symbol used to represent the values.
First, we need to load the ggplot2 package. The plot then has the following form:
library(ggplot2)
plot1 <- ggplot(mydata,
                aes(y = sqw2, x = sqw1)) +
  geom_point()
plot1

Saving a plot to an object lets you redisplay or extend it later without retyping the code. It is good practice, although not essential.
You can modify plot1 by including a colour argument inside geom_point(). For example: geom_point(colour = "red")
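A minimal sketch of that change, using a made-up data frame (toy, base_plot, plot_black and plot_red are hypothetical names, not objects from the course files). Keeping the data and axis mappings in one object and adding the geom separately makes it easy to try different settings.

```r
library(ggplot2)

# Hypothetical toy data; not the trial dataset
toy <- data.frame(sqw1 = c(1.0, 1.5, 2.0, 2.5),
                  sqw2 = c(1.2, 1.4, 2.1, 2.4))

# Save the shared part (data and axes) once, then add a layer to it
base_plot  <- ggplot(toy, aes(y = sqw2, x = sqw1))
plot_black <- base_plot + geom_point()                # default point colour
plot_red   <- base_plot + geom_point(colour = "red")  # same points, all drawn in red
plot_red
```

Because colour is set outside aes() here, it applies one fixed colour to every point rather than mapping colour to a variable.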
For more tips and example code, you can visit the following link: http://www.cookbook-r.com/Graphs/
ggplot2 works with “layers”. The basic plot above can be enhanced by adding a different geom (layer) that gives further instructions to ggplot. Below we’ll see some examples:
Task 3: Fitted line
Add a single fitted line to plot1:
plot2 <- ggplot(mydata, aes(y = sqw2, x = sqw1)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) # inherits the x and y mapping; se = TRUE adds a confidence band
plot2
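With method = "lm", geom_smooth() fits a linear regression of y on x (its default formula is y ~ x) and draws the fitted line. As a rough sketch of the numbers behind such a line, the same model can be fitted directly with lm(). The data frame toy and the object fit are made up for illustration.

```r
# Hypothetical toy data; not the trial dataset
toy <- data.frame(sqw1 = c(1.0, 1.5, 2.0, 2.5, 3.0),
                  sqw2 = c(1.1, 1.6, 1.9, 2.6, 2.9))

fit <- lm(sqw2 ~ sqw1, data = toy)  # least-squares regression of "after" on "before"
coef(fit)                           # intercept and slope of the fitted line

# Fitted value and confidence interval at sqw1 = 2,
# the kind of interval the shaded band shows when se = TRUE
predict(fit, newdata = data.frame(sqw1 = 2), interval = "confidence")
```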

Task 4: Fitted line per class
Add fitted lines per class. To do this, you do not need a different geom; instead, add a group argument inside aes().
plot3 <- ggplot(mydata, aes(y = sqw2, x = sqw1, group = factor(class))) +
  geom_point() +
  geom_smooth(method = "lm", aes(colour = factor(class)), se = FALSE)
plot3
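Each coloured line comes from a separate regression fitted to one class. One possible way to obtain those intercepts and slopes as numbers, sketched with base R on made-up data (toy and per_class are hypothetical), is to split the data by class and fit lm() to each piece.

```r
# Hypothetical toy data with two classes; not the trial dataset
toy <- data.frame(
  class = rep(c(1, 2), each = 4),
  sqw1  = c(1, 2, 3, 4, 1, 2, 3, 4),
  sqw2  = c(1.1, 2.0, 2.9, 4.2, 1.5, 1.9, 2.6, 2.8)
)

# Fit one regression per class and collect the coefficients;
# the result has one column per class and one row per coefficient
per_class <- sapply(split(toy, toy$class),
                    function(d) coef(lm(sqw2 ~ sqw1, data = d)))
per_class
```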

Bonus track
It’s easy to produce more elaborate graphics with ggplot2. Look at the example below. Try to identify the different layers that were added to plot3.
plot4 <- ggplot(mydata, aes(y = sqw2, x = sqw1, group = factor(class))) +
  geom_point(aes(colour = factor(class))) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~class) +
  ylab("After") + xlab("Before") +
  theme_minimal() +
  theme(legend.position = "none") + # placed after theme_minimal(), which would otherwise restore the legend
  ggtitle("Relationship between exercise before and after treatment by class")
plot4

