Plotting Timeseries

This document contains code for plotting time series data. The data come from sleep-wake recordings performed in more than 1000 mice belonging to a single strain. Each record gives the daily sleep percent for one mouse, and the recordings were collected over a period of two years.

The data file has already been cleaned. The first step is to load the required packages and the file containing the data. The plots will be made using ggplot2.

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.1.3

#load file
KOMP2_con<-read.csv("KOMP2_con.csv")
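One thing worth checking after loading: read.csv() does not parse dates, so TestDate arrives as text (or a factor in older R versions) and ggplot2 would treat it as a discrete axis. The sketch below shows one way to convert it. The toy data frame and the year-month-day format are assumptions for illustration only; the actual format in KOMP2_con.csv is not shown here and the format string may need adjusting.

```r
# Toy stand-in for KOMP2_con (made-up values; the real file is not reproduced here)
toy <- data.frame(
  TestDate = c("2014-03-01", "2014-03-02", "2015-01-15"),
  Sleep.Daily.Percent = c(42.1, 45.3, 40.8),
  stringsAsFactors = FALSE
)

# Assumed format: ISO year-month-day. Change `format` to match the real file.
toy$TestDate <- as.Date(toy$TestDate, format = "%Y-%m-%d")

# Check that the conversion worked and produced no missing dates
class(toy$TestDate)
anyNA(toy$TestDate)
```

With a real Date column, ggplot2 spaces the points by calendar time instead of giving every distinct date string an equal-width slot.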

The next step is to prepare the plots. The X axis shows the test dates and the Y axis shows the daily sleep percent. The data points are differentiated by sex through both color and shape (females: red circles; males: green triangles).

p<-ggplot(KOMP2_con, aes(x = TestDate, y = Sleep.Daily.Percent, colour=Sex, shape=Sex))+geom_point()
p

The resulting graph is just a cloud of data points. It hints at a sex difference (the females appear to sleep less than the males), but beyond that it is not very informative. Next, we take the mean of the sleep percentages for each date and see whether it tells us more.

#fun replaces fun.y, which was deprecated in ggplot2 3.3.0; group=1 pools both sexes into one line
q<-p+ stat_summary(aes(group=1), fun=mean, colour="blue", geom="line", size=1.2)
q
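To see what the summary line represents, the same per-date means can be computed directly in base R. This is a minimal sketch on a made-up toy data frame, not the KOMP2 data; only the column names are taken from the code above.

```r
# Toy data: two dates, two mice per date (made-up values)
toy <- data.frame(
  TestDate = as.Date(c("2014-03-01", "2014-03-01", "2014-03-02", "2014-03-02")),
  Sleep.Daily.Percent = c(40, 44, 46, 50)
)

# One mean per date, pooling all mice tested on that date
daily_mean <- aggregate(Sleep.Daily.Percent ~ TestDate, data = toy, FUN = mean)
daily_mean
```

Each row of daily_mean corresponds to one vertex of the blue line that stat_summary() draws.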

One of the aims of this exercise is to decide whether the data are consistent over time. Let’s add a horizontal line at the mean of the whole data set and two more lines at +/- 1 SD (standard deviation) from that mean.

#Add an intercept of the mean for whole data
r<-q+geom_hline(yintercept=mean(KOMP2_con$Sleep.Daily.Percent),size=1,color="grey") 

#Add intercepts for standard deviations above and below the mean
s<-r+
  geom_hline(yintercept=mean(KOMP2_con$Sleep.Daily.Percent)+sd(KOMP2_con$Sleep.Daily.Percent),size=1,color="grey")+
  geom_hline(yintercept=mean(KOMP2_con$Sleep.Daily.Percent)-sd(KOMP2_con$Sleep.Daily.Percent),size=1,color="grey")
s

Now we can see that, except for a few dates towards the beginning and end, the majority of the recordings are fairly consistent and fall between the two standard deviation lines.
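The visual impression can be backed by a number: the share of recordings that fall within one standard deviation of the mean. The sketch below uses a short made-up vector in place of KOMP2_con$Sleep.Daily.Percent; for roughly normal data this share is expected to be about 68%.

```r
# Made-up sleep percentages standing in for KOMP2_con$Sleep.Daily.Percent
sleep <- c(38, 41, 42, 43, 44, 45, 46, 47, 52)

m <- mean(sleep)
s <- sd(sleep)

# TRUE for every value that lies between the two standard deviation lines
within_1sd <- sleep >= m - s & sleep <= m + s

# Proportion of values inside the band
mean(within_1sd)
```

If the real column contains missing values, mean() and sd() would need na.rm = TRUE.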

We already know that there is a difference between the sleep percentages of males and females. Let’s add a trend line for each sex to see how they differ.

t<-s+geom_smooth(method="lm",se=FALSE,aes(group=Sex))
t
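The trend lines compare each sex against the population average visually. A quick numeric counterpart is the difference between each sex's mean and the overall mean. This is a sketch on made-up values; the labels "Female" and "Male" are an assumption about how the Sex column is coded.

```r
# Toy data (made-up values; Sex labels are assumed)
toy <- data.frame(
  Sex = c("Female", "Female", "Male", "Male"),
  Sleep.Daily.Percent = c(40, 42, 46, 48)
)

# Mean sleep percent per sex
sex_means <- tapply(toy$Sleep.Daily.Percent, toy$Sex, mean)

# Offset of each sex from the overall mean (negative = sleeps less than average)
sex_offsets <- sex_means - mean(toy$Sleep.Daily.Percent)
sex_offsets
```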

There is a marked difference between the males and females: males sleep more than the population average and females less. Though this plot is informative, I find it very cluttered. Let’s make it less cluttered by removing the data points, which is done by dropping geom_point() from the code. I am a minimalist, so I will change the background to white as well.

Below is the complete rewritten code, without the data points and with a white background.

library(ggplot2)

#load file
KOMP2_con<-read.csv("KOMP2_con.csv")
p<-ggplot(KOMP2_con, aes(x = TestDate, y = Sleep.Daily.Percent, colour=Sex, shape=Sex))
#geom_point() removed

#Daily mean across both sexes (fun replaces fun.y, deprecated in ggplot2 3.3.0)
q<-p+ stat_summary(aes(group=1), fun=mean, colour="blue", geom="line", size=1.2)
#Add an intercept of the mean for whole data
r<-q+geom_hline(yintercept=mean(KOMP2_con$Sleep.Daily.Percent),size=1,color="grey") 

#Add intercepts for standard deviations above and below the mean
s<-r+
  geom_hline(yintercept=mean(KOMP2_con$Sleep.Daily.Percent)+sd(KOMP2_con$Sleep.Daily.Percent),size=1,color="grey")+
  geom_hline(yintercept=mean(KOMP2_con$Sleep.Daily.Percent)-sd(KOMP2_con$Sleep.Daily.Percent),size=1,color="grey")

#Adding trend lines for male and female sleep percent
t<-s+geom_smooth(method="lm",se=FALSE,aes(group=Sex))
t+theme(panel.background = element_rect(fill = 'white', colour = 'black'))

This is still a bit cluttered, but less so than the earlier plot, and it provides all the information I am interested in.

Shreyas Joshi

Tuesday, March 22, 2016