When to use population cosinor analysis

Population cosinor analysis should be performed when data are collected as a function of time on 3 or more individuals (Corneilissen, 2014, “Cosinor-based rhythmometry”). This will be the case almost all of the time you are using the cosinor method for actigraphy data.

Data must be cleaned and imputed - only complete datasets can be analysed. The functions used in this analysis come primarily from the package ‘cosinor2’ (package ‘cosinor’ is a dependency).

The reference material for tests included is as follows:

Load/install the packages you need

ipak <- function(pkg){
        new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
        if (length(new.pkg)) 
                install.packages(new.pkg, dependencies = TRUE)
        sapply(pkg, require, character.only = TRUE)
}

packages <- c("cosinor2", "tidyverse", "data.table", "stringr", "NISTunits")
ipak(packages)

Load in your data and calculate number of rows

Create a new folder for each group and place all files you wish to process inside their respective folders. When running this first chunk of code for the Group 1, you will be prompted to select a file. Select any file from the folder directory. Before running each code chunk, you must set your working directory to the relevant group directory. A setup for testing between two groups is provided below:

Group 1

data <- read.csv(file.choose(), header = T) # load in a dummy dataset

# Set your working directory to the folders with the Group_1 data
group_1 = list.files(pattern="Clinstag")
# Rename the files with the group extension
sapply(group_1, FUN = function(eachPath) {
        file.rename(from = eachPath, to = sub(pattern = "Clinstag", replacement = "Group1_Clinstag", eachPath))
})

Group 2

# Set your working directory to the folders with the Eod group data
group_2 = list.files(pattern="Clinstag")
# Rename the files with the group extension
sapply(group_2, FUN = function(eachPath) {
        file.rename(from = eachPath, to = sub(pattern = "Clinstag", replacement = "Group2_Clinstag", eachPath))
})

After you have run the above scripts, put all (newly renamed) files in the one directory, and then reassign the groups:

group_1 = list.files(pattern="Group1")
group_2 = list.files(pattern="Group2")

Calculate minimum number of rows

Calculate the number of days in each file in each group, as all files will need to be trimmed to the same length in the analyis. Hence, the output of the function below determines the sampling period in your paper. If there are any files with less than 10 days worth of data included, a message will be printed asking you to check how many files have less than 10 days worth of coverage. Consider excluding these files if it means that your sampling period goes to a minimum of 13 days (for example). The median in the database containing all the actigaphy files is 13 days, and the mean is 12.22 days, making it likely that if you are willing to exclude a few you can nearly double your sampling period from 7 to 13 days.

# Takes values for groups 1 and 2. Should be lists of files
find_min_row <- function(group1, group2){ # Just put name of group
        
        nrows <- sapply(group1, function(x) nrow(read.csv(x))) ## Group 1
        
        nrows_2 <- sapply(group2, function(x) nrow(read.csv(x))) ## Group 2
        
        minrows <<- min(nrows, nrows_2)
        paste0(as.table(nrows, nrows_2))
        ifelse(minrows < 28799, paste0("There are some cases with only ", (minrows/2880), " days of coverage. Consider deleting short cases."), paste0(minrows/2880, " Each case has at least 10 days worth of data"))
}
find_min_row(group_1, group_2) # Usage

Extract data for period and cosinor analysis

Determining the period of the data for each group enables us to construct the cosinor model so our derived estimates have reasonable error bounds, lowering the likelihood of failing the zero-amplitude test which indicates adequate model fit.

Seven days of data should be the minimum amount of data used to predict the population period. As one day = 2,880 epochs (at 30 seconds), however this is extremely time consuming depending on your computational resources (usually >24hours of continuous computing is required to finish one small batch). 3 days = 8,640 epochs is selected by default.

Create data frames for the population cosinor analysis (cosinor_data), and for the periodogram (period_data).

# Takes arguments minrows, and values for groups 1 and 2
extract_data <- function(minrows, group_1, group_2){ 
        
library(tidyverse)
library(data.table)
library(stringr)
library(cosinor2)
library(NISTunits)

# extracts minrows of data.Activity
cosinor_data <- read.csv(file = paste0(group_1[1])) %>% as.data.table() # Group 1
cosinor_data <- cosinor_data[1:minrows, names(cosinor_data)[names(cosinor_data)%like%"data.Activity"], with = F]

cosinor_data_2 <- read.csv(file = paste0(group_2[1])) %>% as.data.table() # Group 2
cosinor_data_2 <- cosinor_data[1:minrows, names(cosinor_data_2)[names(cosinor_data_2)%like%"data.Activity"], with = F]

# Loop over the files and create 'cosinor_data' for each group
for(file in group_1[2:length(group_1)]){ # Group 1
  temp1 <- read.csv(file = paste0(file)) %>% as.data.table()
  temp11 <- temp1[1:minrows, names(temp1)[names(temp1)%like%"data.Activity"], with = F]
  cosinor_data <- cbind(cosinor_data, temp11)
}

for(file in group_2[2:length(group_2)]){ # Group 2
  temp1_2 <- read.csv(file = paste0(file)) %>% as.data.table()
  temp11_2 <- temp1_2[1:minrows, names(temp1_2)[names(temp1_2)%like%"data.Activity"], with = F]
  cosinor_data_2 <- cbind(cosinor_data_2, temp11_2)
}


# Group 1
colnames <- as.list(group_1) # generate object with filenames (in order)
colnames <- str_replace(colnames, pattern = ".csv", replacement = "") #  Remove ".csv" to clean naming 
names(cosinor_data) <- paste0(colnames) # Replace the existing dataframe names with list colnames

# Group 2
colnames_2 <- as.list(group_2) 
colnames_2 <- str_replace(colnames_2, pattern = ".csv", replacement = "")  
names(cosinor_data_2) <- paste0(colnames_2)


time_limit <- nrow(cosinor_data) # Create time vals and order to be at top for main dataset
time_lim_data <- data[1:minrows, ]
cosinor_data$Time <- time_lim_data$X # Group 1
cosinor_data_2$Time <- time_lim_data$X # Group 2


# Re-order Group 1
cosinor_data <<- cosinor_data %>%
  dplyr::select(Time, everything())

# Re-order Group 2
cosinor_data_2 <<- cosinor_data_2 %>%
  dplyr::select(Time, everything())


# Create time vals and order to be at top for period dataset - This extracts 3 days of data to determine period
period_time_lim <- data[1:8640, ] 
  
  # Group 1
  period_data <- cosinor_data[1:8640, ] 
  period_data$Time <- period_time_lim$X
  period_data <<- period_data %>%
    dplyr::select(Time, everything())
  
  # Group 2
  period_data_2 <- cosinor_data_2[1:8640, ] 
  period_data_2$Time <- period_time_lim$X
  period_data_2 <<- period_data_2 %>%
    dplyr::select(Time, everything())
  
}
extract_data(minrows, group_1, group_2)

Run the analysis to detect period

Adjust the following code to fit your data.

Timecol will always be ‘1’ in our data. In this example, if we have 5 included datasets, columns 2 and 6 contain first and last subjects. This will take well over an hour to run, perhaps several hours. The output (tau) will be given in number of 30-second epochs.

# Periodograms for each group
periodogram(timecol = 1, firstsubj = 2, lastsubj = 5, data = period_data) # Group 1
periodogram(timecol = 1, firstsubj = 2, lastsubj = 4, data = period_data_2) # Group 2

Run the population cosinor Analysis

To try and maximise the accuracy of our estimates, enter the period for each group in the code below (period = XXXX)

# Run the analysis for each group, making sure you input the period length from the ouput of periodogram.
fit_population_cosinor_Group1 <- population.cosinor.lm(firstsubj = 2, lastsubj = 5, # Group 1
                                                       timecol = 1, period = 2896, 
                                                       na.action = "na.exclude", data=cosinor_data)

fit_population_cosinor_Group2 <- population.cosinor.lm(firstsubj = 2, lastsubj = 4, # Group 2
                                                       timecol = 1, period = 2820, 
                                                       na.action = "na.exclude", data=cosinor_data_2)

Convert acrophase to decimal hours

Acrophase are expressed in radians. The ‘-’ indicates a clockwise direction.

Radians are first converted to degrees, and based on the period length of the group, decimal time in hours is returned.

This function requires 3 arguments. The first requires the acrophase statistic in the population cosinor model (which by default is set to Group_1’s acrophase), followed by tau (period), and the amount of digits to round to, which is default dp = 2.

library(NISTunits)
# Convert radians to decimal clock hours
radians_to_hours <- function(data = fit_population_cosinor_Group1$coefficients$Acrophase, tau, dp = 2) {
        degrees <- NISTradianTOdeg(fit_population_cosinor_Group1$coefficients$Acrophase)
        seconds <- tau*30
        minutes <- seconds/60
        hours <- minutes/60
        hours <- round(hours, dp)
        elapsed_hours <- hours/360
        clocktime <- elapsed_hours*degrees
        clocktime <- round(clocktime, dp)
        clocktime
}

radians_to_hours(data, 2896, 7) # Group 1
radians_to_hours(data = fit_population_cosinor_Group2, 2820, 7) # Group 2

Parameter tests

Zero-amplitude test

To assess the ‘explanatory power’ of our model, we must ensure that there is a detectable rhythm through the noise that has accompanied the data. The zero-amplitude test (also called the rhythm detection text) is applied to the activity-rhythm for this purpose. The value should be significant in order to interpret the model statistics.

The following explains the rationale for the zero-amplitude test (reproduced from Cornélissen, G. (2014)):

A cosine curve with a given period is fitted to the data (top) by least squares. This approach consists of minimizing the sum of squared deviations between the data and the fitted cosine curve. The larger this residual sum of squares is, the greater the uncertainty of the estimated parameters is. This is illustrated by the elliptical 95% confidence region for the amplitude-acrophase pair (bottom). When the error ellipse does not cover the pole, the zero-amplitude (no-rhythm) test is rejected and the alternative hypothesis holds that a rhythm with the given period is present in the data (left). Conservative 95% confidence limits for the amplitude and acrophase can then be obtained by drawing concentric circles and radii tangent to the error ellipse, respectively. When the error ellipse covers the pole, the null hypothesis of no-rhythm (zero amplitude) is accepted (right). Results (P-value from the zero-amplitude test, percentage rhythm or proportion of the overall variance accounted for by the fitted model, MESOR ± SE, amplitude and 95% confidence limits, acrophase and 95% confidence limits) are listed in each case.

# Zero-amplitude test for each group
cosinor.detect(fit_population_cosinor_Group1)
cosinor.detect(fit_population_cosinor_Group2)

Percent rhythm

To measure the relative strength of the rhythm, percent rhythm is calculated. Percent Rhythm is the coefficient of determination obtained by squaring the correlation between observed and estimated values.

# Percent rhythym for each group
cosinor.PR(fit_population_cosinor_Group1)
cosinor.PR(fit_population_cosinor_Group2)

Compare each group

The following runs tests that compare MESOR’s, amplitudes, and acrophases of two different populations, and creates an excel .csv in the working directory with the results.

# Test differences between groups
results <- cosinor.poptests(fit_population_cosinor_Group1, fit_population_cosinor_Group2)
cosinor.poptests(fit_population_cosinor_Group1, fit_population_cosinor_Group2)
write.csv(as.matrix(results), file = "Population_Cosinor_Results_.csv")

The previous method does not compare percent rhythm (PR) between groups. The following creates .csv files for each group, with percent rhythm values for all participants in those groups. After this code has been run you can run t-test/f-test in R or other software (such as SPSS).

Adjust the period statistic for each group to the population tau, known from previous periodogram outputs.

The .csv’s are labelled ‘Group 1 Percent Rhythm’ (i.e. by their group) and automatically save to the working directory.

For Group 1

# Comparing average Percent Rhythm for each group
# Group 1
fit_pr_Group1 <- function(data, period) {
        library(cosinor)
        library(cosinor2)
        cosinor <- cosinor.lm(data.Activity ~ time(X), data = data, period= period)
        per_rhythm <- cosinor.PR(cosinor)
        per_rhythm <- per_rhythm[, 2]
        write.table(per_rhythm, file = "Group 1 Percent Rhythm.csv", row.names = paste(file), col.names = FALSE, 
                    append = T, sep = ",")
}

for(file in group_1){ # Group 1
        data <- read.csv(file = paste0(file)) %>% as.data.table()
        pr_test <- fit_pr_Group1(data = data, period = 2880)
}

For Group 2

# Group 2
fit_pr_Group2 <- function(data, period) {
        library(cosinor)
        library(cosinor2)
        cosinor <- cosinor.lm(data.Activity ~ time(X), data = data, period= period)
        per_rhythm <- cosinor.PR(cosinor)
        per_rhythm <- per_rhythm[, 2]
        write.table(per_rhythm, file = "Group 2 Percent Rhythm.csv", row.names = paste(file), col.names = FALSE, 
                    append = T, sep = ",")
}
for(file in group_2){ # Group 2
        data <- read.csv(file = paste0(file)) %>% as.data.table()
        pr_test <- fit_pr_Group2(data = data, period = 2880)
}

If the zero-amplitude test is not significant…

If the zero-amplitude test is not significant, typically this indicates excessive noise in some of the data, from clipping and other artefacts during data collection. To isolate the case(s) which may be contributing significantly to excess noise, a script below can be run to loop over each file in each of the groups (i.e. in objects ‘group_1’ and ‘group_2’), and return a readout of: the case ID, whether the zero-amplitude test was ‘Passed!’ (i.e. p value <.05) or ‘Failed’ (p value >.049), the r value if r > 0.30, or whether the files need to be checked for noise (‘CHECK NOISE’) due to an r value < 0.30.

Run ‘fit_test’ to load the function into the R environment.

# If zero-amplitude test fails:
# Function to fit cosinor.lm and return a string based on passing of failing the zero_amplitude test and percent rhythm test.
fit_test <- function(data, period) {
  library(cosinor)
  library(cosinor2)
  cosinor <- cosinor.lm(data.Activity ~ time(X), data = data, period = period)
  per_rhythm <- cosinor.PR(cosinor)
  zero_amp <- cosinor.detect(cosinor)
  results_zero_amp <- ifelse(zero_amp[, 4] > 0.049, paste0(zero_amp[, 4], "  Failed"), paste0(zero_amp[, 4], "  Passed!"))
  results_per_rhythm <- ifelse(per_rhythm[, 1] < 0.30, 
                               paste0(per_rhythm[, 1], "   CHECK NOISE"), paste0("  r = ", round(per_rhythm[, 1], 2)))
  as.matrix(paste0(results_zero_amp, results_per_rhythm))
}

As we want to know what cases are contributing to the population cosinor noise in each group, we use the same period output from the previous periodogram before running the for loop. This is defined in the object ‘noisy_tau’. The results of the loop are printed in the console.

For Group 1:

noisy_tau_Group1 <- 2880 # Assign period from population periodogram

for(file in group_1[2:length(group_1)]){ # Group 1
        temp1 <- read.csv(file = paste0(file)) %>% as.data.table()
        results <- fit_test(data = temp1, period = noisy_tau_Group1)
        print(as.matrix(paste0(file, results)))
}

For Group 2:

noisy_tau_Group2 <- 2880 # Assign period from population periodogram

for(file in group_2[2:length(group_1)]){ # Group 1
        temp1 <- read.csv(file = paste0(file)) %>% as.data.table()
        results <- fit_test(data = temp1, period = noisy_tau_Group2)
        print(as.matrix(paste0(file, results)))
}