How does sleep and physical activity affect the grades of college students in their courses for a single semester?
We will use mean minutes sleeping per day in a semester and mean daily sleep efficiency in a semester to evaluate how sleep affects college student GPAs.
We will use mean daily active minutes and mean daily steps to evaluate how activity affects college GPAs.
causal_diagram_1 <- dagitty('dag{
SemesterGPA <- Activity <- SemesterSeason;
SemesterGPA <- Sleep <- SemesterSeason;
SemesterGPA <- Semester
}')
gg_dag(causal_diagram_1)
In our project, we think sleep and activity are both mediator in relationship of semester season and semester GPA. Because sleep and activity is the descendant of the exposure semester season and an ancestor of the outcome GPA. Therefore, we think in different seasons, there will be a difference in people’s activity level and their sleep. Also, being in certain semester will affect one’s GPA as well, because we believe that the demands and workload of different semesters can vary, which can impact their GPA. And that is the main reason we will try to assign random intercept for each semester by using hierarchical model.
The priors for each of our beta coefficients are all Normal(0, 0.15). We chose a small standard deviation because our response variable is very small; GPAs can only be between 0 and 4. Because of this, our spread has to be very small. We chose to have the effects to be centered at zero because we standardized our variables and because we were not sure how large of an effect each of the variables will have. For example, if individuals are too active, to a point where they do not study as much as they should, this could negatively impact GPA. However, research also shows that those that are more active tend to have higher GPAs. Same for sleep variables. Students need to strike a healthy balance and get enough sleep, because too much sleep might not be helpful.
\[\text{semesterGPA} \sim \text{Normal}(\mu_i, \sigma)\]
\[z0 \sim \text{Normal}(0, 1) \]
\[ \mu_i \sim \text{mu_semester} + z0\text{[semester[i]]} * \text{sigma_semester} + \beta_1 * \text{stdmeanSteps[i]} + \beta_2 * \text{stdmeanActivity[i]} + \beta_3 * \text{stdmeanSleepEfficiency} + \beta_4 * \text{stdmeanSleep} + \beta_{semesterSeason[i]}\]
\[\beta_1 \sim Normal(mean = 0, sd = 0.15) \]
\[\beta_2 \sim Normal(mean = 0, sd = 0.15) \]
\[\beta_3 \sim Normal(mean = 0, sd = 0.15) \]
\[\beta_4 \sim Normal(mean = 0, sd = 0.15) \] \[\beta_{semesterSeason[i]=j} \sim Normal(mean = 0, sd = 0.15)\ for \ j \ = 1 \ and \ 2 \]
\[\sigma \sim Lognormal(mean = 0.5, sd= 1) \]
\[\text{mu_semester} \sim Normal(mean = 2, sd = 0.5) \]
\[\text{sigma_semester} \sim Lognormal(mean = 0.5, sd = 1) \]
We used name b5 for the prior of
semesterSeason because it is neater.
final <- read.csv('final.csv')
set.seed(15)
final_gpas <- c()
for (x in 1:50) {
b0 <- rnorm(1, 0, 0.15)
b1 <- rnorm(1, 0, 0.15)
b2 <- rnorm(1, 0, 0.15)
b3 <- rnorm(1, 0, 0.15)
b4 <- rnorm(1, 0, 0.15)
b5 <- rnorm(2,0,.15)
sigma <- rlnorm(1, mean = 0, sd = .25)
mu_semester <- rnorm(1, mean = 2.5, sd =0.5)
sigma_semester <- rlnorm(1, mean = 0, sd = 1)
z0 <- rnorm(7, mean = 0, sd = 1)
prior_sim <- tibble(
stdmeanSteps = final$stdmeanSteps,
stdmeanActivity = final$stdmeanActivity,
stdmeanEff = final$stdmeanEff,
stdmeanSleep = final$stdmeanSleep,
semester = final$semester,
semesterSeason = final$semesterSeason,
Sim.final = rnorm(nrow(final),
mean = mu_semester + (z0[1]*ifelse(semester == 1, 1, 0) + z0[2]*ifelse(semester == 2, 1, 0) +
z0[3]*ifelse(semester == 3, 1, 0) + z0[4]*ifelse(semester == 4, 1, 0) +
z0[5]*ifelse(semester == 5, 1, 0) + z0[6]*ifelse(semester == 6, 1, 0) +
z0[7]*ifelse(semester == 7, 1, 0)) * sigma_semester + b1 * stdmeanSteps +
b2 * stdmeanActivity + b3 * stdmeanEff + b4 * stdmeanSleep + b5[1]*ifelse(semesterSeason == 1, 1, 0) + b5[2]*ifelse(semesterSeason == 2, 1, 0), sd = sigma))
final_gpas <- append(final_gpas, prior_sim$Sim.final)
}
mean(final_gpas)
## [1] 2.802452
sd(final_gpas)
## [1] 2.812662
gf_dens(~final_gpas)
This indicates that our priors are not fantastic, but they are not terrible either. The majority of our data is between 0 and 4, which is what we are looking for. However, our priors do generate values that are impossible in our situation. For example, there are values that are less than 0 and greater than 4. This is because we are using a normal model to model the situation. However, when combined with the likelihood, the values of the posterior should be reasonable values.
gpa_data <- final
semester_data <- gpa_data |>
select(semester) |>
group_by(semester) |>
summarise() |>
ungroup()
stangpadata <- compose_data(semester_data = semester_data,
gpa_data)
gpa_stan_hierarchical <- '
data {
int<lower=1> n;
vector[n] semesterGPA;
vector[n] stdmeanSteps;
vector[n] stdmeanActivity;
vector[n] stdmeanEff;
vector[n] stdmeanSleep;
array[n] int semesterSeason;
array[n] int<lower=1> semester;
// Building level data
int<lower=1> n_semester_data;
//int<lower=1, upper=2> semesterSeason[n]; // 1: Fall, 2: Spring
//int<lower=1> semester[n];
}
parameters {
real b1; // slope for meanSteps
real b2; // slope for meanActivity
real b3; // slope for meanEff
real b4; // slope for meanSleep
vector[2] b5;
real mu_semester; // mean of semester responses
vector[n_semester_data] z0; // building specific intercepts
real<lower=0> sigma; // residual standard deviation
real<lower=0> sigma_semester; // standard deviation between buildings
}
model {
vector[n] mu;
for (i in 1:n) {
mu[i] = mu_semester + z0[semester[i]] * sigma_semester + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i] + b5[semesterSeason[i]];
}
// priors
b1 ~ normal(0, 0.15); // for the slope for meanSteps
b2 ~ normal(0, 0.15); // for the slope for meanActivity
b3 ~ normal(0, 0.15); // for the slope for meanEff
b4 ~ normal(0, 0.15); // for the slope for meanSleep
b5 ~ normal(0, 0.15);
sigma ~ lognormal(0, 0.25); // for residual standard deviation
mu_semester ~ normal(2.5, 0.5); // for mean across semesters
sigma_semester ~ lognormal(0, 1); // for sd between semesters
// defining the likelihood
z0 ~ normal(0, 1);
semesterGPA ~ normal(mu, sigma);
}
generated quantities {
vector[n_semester_data] a0;
a0 = mu_semester + z0 * sigma_semester;
}
'
stan_gpa <- 'stan_gpa.RDS'
if (file.exists(stan_gpa)){
gpa_stan_model_hierarchical <- readRDS(stan_gpa)
}else{
gpa_stan_model_hierarchical <- stan(model_code = gpa_stan_hierarchical,
chains = 4, iter = 2000,
data = stangpadata)
saveRDS(gpa_stan_model_hierarchical, file = stan_gpa)
}
gpa_stan_hierarchical_Lik <- '
data {
int<lower=1> n;
vector[n] semesterGPA;
vector[n] stdmeanSteps;
vector[n] stdmeanActivity;
vector[n] stdmeanEff;
vector[n] stdmeanSleep;
array[n] int semesterSeason;
array[n] int<lower=1> semester;
// Building level data
int<lower=1> n_semester_data;
//int<lower=1, upper=2> semesterSeason[n]; // 1: Fall, 2: Spring
//int<lower=1> semester[n];
}
parameters {
real b1; // slope for meanSteps
real b2; // slope for meanActivity
real b3; // slope for meanEff
real b4; // slope for meanSleep
vector[2] b5;
real mu_semester; // mean of semester responses
vector[n_semester_data] z0; // building specific intercepts
real<lower=0> sigma; // residual standard deviation
real<lower=0> sigma_semester; // standard deviation between buildings
}
model {
vector[n] mu;
for (i in 1:n) {
mu[i] = mu_semester + z0[semester[i]] * sigma_semester + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i] + b5[semesterSeason[i]];
}
// priors
b1 ~ normal(0, 0.15); // for the slope for meanSteps
b2 ~ normal(0, 0.15); // for the slope for meanActivity
b3 ~ normal(0, 0.15); // for the slope for meanEff
b4 ~ normal(0, 0.15); // for the slope for meanSleep
b5 ~ normal(0, 0.15);
sigma ~ lognormal(0, 0.25); // for residual standard deviation
mu_semester ~ normal(2.5, 0.5); // for mean across semesters
sigma_semester ~ lognormal(0,1); // for sd between semesters
// defining the likelihood
z0 ~ normal(0, 1);
semesterGPA ~ normal(mu, sigma);
}
generated quantities {
vector[n] log_lik;
for (i in 1:n) {
log_lik[i] = normal_lpdf(semesterGPA[i] | mu_semester + z0[semester[i]] * sigma_semester + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i] + b5[semesterSeason[i]], // fill in ... to match your model
sigma);
vector[n_semester_data] a0;
a0 = mu_semester + z0 * sigma_semester;
}
}
'
stan_gpa_Lik <- 'stan_gpa_Lik.RDS'
# the fitted model object in this R session will be called stan_fit
# change these names to something more specific and informative please!
if (file.exists(stan_gpa_Lik)){
gpa_stan_model_hierarchical_Lik <- readRDS(stan_gpa_Lik)
}else{ # if the file doesn't exist then run stan and save results
gpa_stan_model_hierarchical_Lik <- stan(model_code = gpa_stan_hierarchical_Lik,
chains = 4, iter = 2000,
data = stangpadata) # your code to fit your stan model goes here
saveRDS(gpa_stan_model_hierarchical_Lik, file = stan_gpa_Lik)
}
gpa_stan_model_hierarchical
## Inference for Stan model: rt_cmdstanr_aa499b92a5d6a61c631cd11a72bc7a54-202304261654-1-6bfe70.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff
## b1 0.02 0.00 0.01 -0.01 0.01 0.02 0.03 0.05 2959
## b2 -0.02 0.00 0.02 -0.05 -0.03 -0.02 -0.01 0.01 3067
## b3 0.06 0.00 0.01 0.03 0.05 0.06 0.07 0.08 4496
## b4 0.08 0.00 0.01 0.06 0.07 0.08 0.09 0.10 4472
## b5[1] 0.03 0.00 0.11 -0.19 -0.04 0.03 0.11 0.25 2216
## b5[2] 0.06 0.00 0.12 -0.16 -0.01 0.06 0.14 0.28 2330
## mu_semester 3.59 0.00 0.11 3.37 3.51 3.59 3.67 3.82 2152
## z0[1] 0.22 0.01 0.55 -0.86 -0.15 0.23 0.60 1.29 2255
## z0[2] -1.00 0.02 0.67 -2.38 -1.45 -0.98 -0.53 0.23 1550
## z0[3] 0.02 0.01 0.55 -1.07 -0.33 0.03 0.40 1.06 2232
## z0[4] 0.91 0.01 0.60 -0.22 0.50 0.89 1.30 2.18 2079
## z0[5] -0.04 0.01 0.63 -1.28 -0.46 -0.04 0.39 1.17 2131
## z0[6] -0.12 0.01 0.61 -1.34 -0.52 -0.10 0.29 1.04 2149
## z0[7] 0.41 0.01 0.62 -0.79 0.01 0.39 0.81 1.67 2222
## sigma 0.34 0.00 0.01 0.33 0.34 0.34 0.35 0.35 4471
## sigma_semester 0.11 0.00 0.06 0.04 0.07 0.09 0.13 0.26 1046
## a0[1] 3.61 0.00 0.12 3.40 3.53 3.61 3.69 3.84 2300
## a0[2] 3.50 0.00 0.11 3.28 3.43 3.50 3.58 3.73 2236
## a0[3] 3.60 0.00 0.11 3.37 3.52 3.60 3.67 3.82 2248
## a0[4] 3.68 0.00 0.11 3.46 3.60 3.67 3.75 3.91 2267
## a0[5] 3.59 0.00 0.12 3.36 3.51 3.59 3.67 3.82 2321
## a0[6] 3.59 0.00 0.12 3.36 3.51 3.58 3.66 3.81 2330
## a0[7] 3.63 0.00 0.12 3.41 3.56 3.63 3.71 3.87 2355
## lp__ 652.11 0.12 3.46 644.52 650.01 652.40 654.61 657.87 903
## Rhat
## b1 1
## b2 1
## b3 1
## b4 1
## b5[1] 1
## b5[2] 1
## mu_semester 1
## z0[1] 1
## z0[2] 1
## z0[3] 1
## z0[4] 1
## z0[5] 1
## z0[6] 1
## z0[7] 1
## sigma 1
## sigma_semester 1
## a0[1] 1
## a0[2] 1
## a0[3] 1
## a0[4] 1
## a0[5] 1
## a0[6] 1
## a0[7] 1
## lp__ 1
##
## Samples were drawn using NUTS(diag_e) at Wed Apr 26 16:55:57 2023.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
bayesplot::mcmc_trace(gpa_stan_model_hierarchical)
bayesplot::mcmc_rank_overlay(gpa_stan_model_hierarchical)
base_psamp <- as.data.frame(gpa_stan_model_hierarchical) |>
select(-lp__)
gf_dens(~base_psamp$b1, data = base_psamp, color = 'blue') |>
gf_dens(~base_psamp$b2, data = base_psamp, color = 'red') |>
gf_dens(~base_psamp$b3, data = base_psamp, color = 'black') |>
gf_dens(~base_psamp$b4, data = base_psamp, color = 'green') |>
gf_labs(title = "Quantitative Posteriors",
x = "Values of Beta",
subtitle= "Blue: Mean Steps, Red: Mean Activity, Black: Mean Sleep Efficiency, Green: Mean Sleep")
gf_dens(~base_psamp$`b5[1]`, data = base_psamp, color = 'purple') |>
gf_dens(~base_psamp$`b5[2]`, data = base_psamp, color = 'orange') |>
gf_labs(title = "Categorical Posteriors",
x = "Values of b5",
subtitle = "Purple: Fall, Orange: Spring")
The mean effect of standardized mean daily steps is 0.02 with a standard deviation of 0.01. However, the posterior does have plausible values of zero and negative values, so it is possible that standardized mean daily steps is zero or less than zero, indicating that there may not be an effect on semester grades.
The mean effect of standardized mean daily activity is -0.02 with a standard deviation of 0.02. The posterior does have plausible values of zero and positive values, so it is possible that standardized mean daily activity does not have an effect on semester grades.
The mean effect of standardized mean daily sleep efficiency is 0.06 with a standard deviation of 0.01. The entirety of the posterior falls above zero indicating that standardized mean daily sleep efficiency does have an effect on semester grades.
The mean effect of standardized mean daily minutes sleeping is 0.08 with a standard deviation of 0.01. The entirety of the posterior falls above zero indicating that standardized mean daily minutes sleeping does have an effect on semester grades.
The mean effect of the semester being fall is 0.03 with a standard deviation of 0.11. The posterior does have plausible values of zero and negative values, so it is possible that the semester being fall has no effect on semester grades.
The mean effect of the semester being spring is 0.06 with a standard deviation of 0.11. The posterior does have plausible values of zero and negative values, so it is possible that the semester being spring has no effect on semester grades.
The results are quite interesting. First, we did not get the results that we expected for the activity metrics. We expected the effect of activity to be significant and also non-negative. We were surprised that mean daily activity produced more plausible negative values than positive ones. We were curious if this is due to students who are more active, such a student athletes, having less time in the day to focus on their studies.
We were not as surprised by the effects of the sleep metrics. More sleep leads to more mental clarity and better recall, which in turn leads to better grades. The benefits of sleep are well documented and our research supports it.
gpa_stan_hierarchical_no_categorical <- '
data {
int<lower=1> n;
vector[n] semesterGPA;
vector[n] stdmeanSteps;
vector[n] stdmeanActivity;
vector[n] stdmeanEff;
vector[n] stdmeanSleep;
array[n] int<lower=1> semester;
// Building level data
int<lower=1> n_semester_data;
//int<lower=1> semester[n];
}
parameters {
real b1; // slope for meanSteps
real b2; // slope for meanActivity
real b3; // slope for meanEff
real b4; // slope for meanSleep
real mu_semester; // mean of semester responses
vector[n_semester_data] z0; // building specific intercepts
real<lower=0> sigma; // residual standard deviation
real<lower=0> sigma_semester; // standard deviation between buildings
}
model {
vector[n] mu;
for (i in 1:n) {
mu[i] = mu_semester + z0[semester[i]] * sigma_semester + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i];
}
// priors
b1 ~ normal(0, 0.15); // for the slope for meanSteps
b2 ~ normal(0, 0.15); // for the slope for meanActivity
b3 ~ normal(0, 0.15); // for the slope for meanEff
b4 ~ normal(0, 0.15); // for the slope for meanSleep
sigma ~ lognormal(0,.25); // for residual standard deviation
mu_semester ~ normal(2.5, 0.5); // for mean across semesters
sigma_semester ~ lognormal(0,1); // for sd between semesters
// defining the likelihood
z0 ~ normal(0, 1);
semesterGPA ~ normal(mu, sigma);
}
generated quantities {
vector[n_semester_data] a0;
a0 = mu_semester + z0 * sigma_semester;
}
'
stan_gpa_hierarchical_no_categorical <- 'stan_gpa_hierarchical_no_categorical.RDS'
if (file.exists(stan_gpa_hierarchical_no_categorical)){
gpa_stan_model_hierarchical_no_categorical <- readRDS(stan_gpa_hierarchical_no_categorical)
}else{
gpa_stan_model_hierarchical_no_categorical <- stan(model_code = gpa_stan_hierarchical_no_categorical,
chains = 4, iter = 2000,
data = stangpadata)
saveRDS(gpa_stan_model_hierarchical_no_categorical, file = stan_gpa_hierarchical_no_categorical)
}
gpa_stan_hierarchical_no_categorical_Lik <- '
data {
int<lower=1> n;
vector[n] semesterGPA;
vector[n] stdmeanSteps;
vector[n] stdmeanActivity;
vector[n] stdmeanEff;
vector[n] stdmeanSleep;
array[n] int<lower=1> semester;
// Building level data
int<lower=1> n_semester_data;
//int<lower=1> semester[n];
}
parameters {
real b1; // slope for meanSteps
real b2; // slope for meanActivity
real b3; // slope for meanEff
real b4; // slope for meanSleep
real mu_semester; // mean of semester responses
vector[n_semester_data] z0; // building specific intercepts
real<lower=0> sigma; // residual standard deviation
real<lower=0> sigma_semester; // standard deviation between buildings
}
model {
vector[n] mu;
for (i in 1:n) {
mu[i] = mu_semester + z0[semester[i]] * sigma_semester + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i];
}
// priors
b1 ~ normal(0, 0.15); // for the slope for meanSteps
b2 ~ normal(0, 0.15); // for the slope for meanActivity
b3 ~ normal(0, 0.15); // for the slope for meanEff
b4 ~ normal(0, 0.15); // for the slope for meanSleep
sigma ~ lognormal(0,.25); // for residual standard deviation
mu_semester ~ normal(2.5, 0.5); // for mean across semesters
sigma_semester ~ lognormal(0,1); // for sd between semesters
// defining the likelihood
z0 ~ normal(0, 1);
semesterGPA ~ normal(mu, sigma);
}
generated quantities {
vector[n] log_lik;
for (i in 1:n) {
log_lik[i] = normal_lpdf(semesterGPA[i] | mu_semester + z0[semester[i]] * sigma_semester + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i],
sigma);
vector[n_semester_data] a0;
a0 = mu_semester + z0 * sigma_semester;
}
}
'
stan_gpa_hierarchical_no_categorical_Lik <- 'stan_gpa_hierarchical_no_categorical_Lik.RDS'
if (file.exists(stan_gpa_hierarchical_no_categorical_Lik)){
gpa_stan_model_hierarchical_no_categorical_Lik <- readRDS(stan_gpa_hierarchical_no_categorical_Lik)
}else{
gpa_stan_model_hierarchical_no_categorical_Lik <- stan(model_code = gpa_stan_hierarchical_no_categorical_Lik,
chains = 4, iter = 2000,
data = stangpadata)
saveRDS(gpa_stan_model_hierarchical_no_categorical_Lik, file = stan_gpa_hierarchical_no_categorical_Lik)
}
gpa_stan_model_hierarchical_no_categorical
## Inference for Stan model: rt_cmdstanr_fb96b5954e1a1213aac6a181ebbec830-202304261705-1-1a81b0.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff
## b1 0.02 0.00 0.01 -0.01 0.01 0.02 0.03 0.05 3082
## b2 -0.02 0.00 0.01 -0.05 -0.03 -0.02 -0.01 0.01 2948
## b3 0.06 0.00 0.01 0.03 0.05 0.06 0.07 0.08 3981
## b4 0.08 0.00 0.01 0.06 0.07 0.08 0.09 0.11 3454
## mu_semester 3.64 0.00 0.04 3.55 3.62 3.64 3.66 3.72 1194
## z0[1] 0.09 0.01 0.50 -0.90 -0.24 0.10 0.42 1.10 1664
## z0[2] -1.23 0.02 0.64 -2.61 -1.63 -1.20 -0.79 -0.06 1275
## z0[3] -0.13 0.01 0.50 -1.12 -0.45 -0.12 0.21 0.85 1516
## z0[4] 0.83 0.01 0.54 -0.16 0.45 0.81 1.18 1.95 1798
## z0[5] 0.16 0.01 0.50 -0.85 -0.17 0.17 0.49 1.12 1627
## z0[6] 0.07 0.01 0.48 -0.88 -0.26 0.07 0.40 0.99 1533
## z0[7] 0.64 0.01 0.53 -0.36 0.29 0.63 0.98 1.72 1545
## sigma 0.34 0.00 0.01 0.33 0.34 0.34 0.35 0.36 3853
## sigma_semester 0.09 0.00 0.04 0.04 0.07 0.08 0.11 0.20 1160
## a0[1] 3.65 0.00 0.03 3.60 3.63 3.65 3.66 3.70 5001
## a0[2] 3.54 0.00 0.02 3.49 3.52 3.54 3.56 3.59 3759
## a0[3] 3.63 0.00 0.03 3.58 3.61 3.63 3.65 3.68 5601
## a0[4] 3.71 0.00 0.03 3.65 3.69 3.71 3.73 3.76 5339
## a0[5] 3.65 0.00 0.03 3.60 3.64 3.65 3.67 3.70 5146
## a0[6] 3.65 0.00 0.02 3.60 3.63 3.65 3.66 3.69 5778
## a0[7] 3.69 0.00 0.03 3.64 3.67 3.69 3.71 3.75 4845
## lp__ 652.35 0.10 3.34 644.93 650.41 652.63 654.72 657.93 1019
## Rhat
## b1 1
## b2 1
## b3 1
## b4 1
## mu_semester 1
## z0[1] 1
## z0[2] 1
## z0[3] 1
## z0[4] 1
## z0[5] 1
## z0[6] 1
## z0[7] 1
## sigma 1
## sigma_semester 1
## a0[1] 1
## a0[2] 1
## a0[3] 1
## a0[4] 1
## a0[5] 1
## a0[6] 1
## a0[7] 1
## lp__ 1
##
## Samples were drawn using NUTS(diag_e) at Wed Apr 26 17:05:56 2023.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
bayesplot::mcmc_trace(gpa_stan_model_hierarchical_no_categorical)
bayesplot::mcmc_rank_overlay(gpa_stan_model_hierarchical_no_categorical)
base_psamp_hierarchical_no_categorical <- as.data.frame(gpa_stan_model_hierarchical_no_categorical) |>
select(-lp__)
gf_dens(~base_psamp_hierarchical_no_categorical$b1, data = base_psamp, color = 'blue') |>
gf_dens(~base_psamp_hierarchical_no_categorical$b2, data = base_psamp, color = 'red') |>
gf_dens(~base_psamp_hierarchical_no_categorical$b3, data = base_psamp, color = 'black') |>
gf_dens(~base_psamp_hierarchical_no_categorical$b4, data = base_psamp, color = 'green')|>
gf_labs(title = "Quantitative Posteriors",
x = "Values of Beta",
subtitle= "Blue: Mean Steps, Red: Mean Activity, Black: Mean Sleep Efficiency, Green: Mean Sleep")
gpa_stan_categorical <- '
data {
int<lower=1> n;
vector[n] semesterGPA;
vector[n] stdmeanSteps;
vector[n] stdmeanActivity;
vector[n] stdmeanEff;
vector[n] stdmeanSleep;
array[n] int semesterSeason;
array[n] int<lower=1> semester;
// Building level data
int<lower=1> n_semester_data;
//int<lower=1, upper=2> semesterSeason[n]; // 1: Fall, 2: Spring
//int<lower=1> semester[n];
}
parameters {
real b0; // intercept
real b1; // slope for meanSteps
real b2; // slope for meanActivity
real b3; // slope for meanEff
real b4; // slope for meanSleep
vector[2] b5;
real<lower=0> sigma; // residual standard deviation
}
model {
vector[n] mu;
for (i in 1:n) {
mu[i] = b0 + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i] + b5[semesterSeason[i]];
}
// priors
b0 ~ normal(3, 0.75); // for the intercept
b1 ~ normal(0, 0.15); // for the slope for meanSteps
b2 ~ normal(0, 0.15); // for the slope for meanActivity
b3 ~ normal(0, 0.15); // for the slope for meanEff
b4 ~ normal(0, 0.15); // for the slope for meanSleep
b5 ~ normal(0, 0.15);
sigma ~ lognormal(0, 0.25); // for residual standard deviation
// defining the likelihood
semesterGPA ~ normal(mu, sigma);
}
'
stan_gpa_categorical <- 'stan_gpa_categorical.RDS'
if (file.exists(stan_gpa_categorical)){
gpa_stan_model_categorical <- readRDS(stan_gpa_categorical)
}else{
gpa_stan_model_categorical <- stan(model_code = gpa_stan_categorical,
chains = 4, iter = 2000,
data = stangpadata)
saveRDS(gpa_stan_model_categorical, file = stan_gpa_categorical)
}
gpa_stan_categorical_Lik <- '
data {
int<lower=1> n;
vector[n] semesterGPA;
vector[n] stdmeanSteps;
vector[n] stdmeanActivity;
vector[n] stdmeanEff;
vector[n] stdmeanSleep;
array[n] int semesterSeason;
array[n] int<lower=1> semester;
// Building level data
int<lower=1> n_semester_data;
//int<lower=1, upper=2> semesterSeason[n]; // 1: Fall, 2: Spring
//int<lower=1> semester[n];
}
parameters {
real b0; // intercept
real b1; // slope for meanSteps
real b2; // slope for meanActivity
real b3; // slope for meanEff
real b4; // slope for meanSleep
vector[2] b5;
real<lower=0> sigma; // residual standard deviation
}
model {
vector[n] mu;
for (i in 1:n) {
mu[i] = b0 + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i] + b5[semesterSeason[i]];
}
// priors
b0 ~ normal(3, 0.75); // for the intercept
b1 ~ normal(0, 0.15); // for the slope for meanSteps
b2 ~ normal(0, 0.15); // for the slope for meanActivity
b3 ~ normal(0, 0.15); // for the slope for meanEff
b4 ~ normal(0, 0.15); // for the slope for meanSleep
b5 ~ normal(0, 0.15);
sigma ~ lognormal(0, 0.25); // for residual standard deviation
// defining the likelihood
semesterGPA ~ normal(mu, sigma);
}
generated quantities {
vector[n] log_lik;
for (i in 1:n) {
log_lik[i] = normal_lpdf(semesterGPA[i] | b0 + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i] + b5[semesterSeason[i]], // fill in ... to match your model
sigma);
}
}
'
stan_gpa_categorical_Lik <- 'stan_gpa_categorical_Lik.RDS'
if (file.exists(stan_gpa_categorical_Lik)){
gpa_stan_model_categorical_Lik <- readRDS(stan_gpa_categorical_Lik)
}else{
gpa_stan_model_categorical_Lik <- stan(model_code = gpa_stan_categorical_Lik,
chains = 4, iter = 2000,
data = stangpadata)
saveRDS(gpa_stan_model_categorical_Lik, file = stan_gpa_categorical_Lik)
}
gpa_stan_model_categorical
## Inference for Stan model: rt_cmdstanr_b5e12d5068b2cdb13ae34adb9aacc686-202304261709-1-7afe4e.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## b0 3.63 0.00 0.11 3.43 3.56 3.63 3.70 3.83 1555 1
## b1 0.03 0.00 0.02 0.00 0.02 0.03 0.04 0.06 2720 1
## b2 -0.03 0.00 0.02 -0.06 -0.04 -0.03 -0.02 0.00 2957 1
## b3 0.06 0.00 0.02 0.03 0.05 0.06 0.07 0.08 3597 1
## b4 0.08 0.00 0.01 0.06 0.07 0.08 0.09 0.10 3068 1
## b5[1] -0.01 0.00 0.11 -0.21 -0.08 -0.01 0.07 0.20 1512 1
## b5[2] 0.03 0.00 0.11 -0.17 -0.04 0.03 0.11 0.24 1493 1
## sigma 0.34 0.00 0.01 0.33 0.34 0.34 0.35 0.36 3671 1
## lp__ 647.98 0.05 2.04 643.05 646.82 648.34 649.47 650.94 1576 1
##
## Samples were drawn using NUTS(diag_e) at Wed Apr 26 17:10:16 2023.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
bayesplot::mcmc_trace(gpa_stan_model_categorical)
bayesplot::mcmc_rank_overlay(gpa_stan_model_categorical)
base_psamp_categorical <- as.data.frame(gpa_stan_model_categorical) |>
select(-lp__)
gf_dens(~base_psamp_categorical$b1, data = base_psamp, color = 'blue') |>
gf_dens(~base_psamp_categorical$b2, data = base_psamp, color = 'red') |>
gf_dens(~base_psamp_categorical$b3, data = base_psamp, color = 'black') |>
gf_dens(~base_psamp_categorical$b4, data = base_psamp, color = 'green')|>
gf_labs(title = "Quantitative Posteriors",
x = "Values of Beta",
subtitle= "Blue: Mean Steps, Red: Mean Activity, Black: Mean Sleep Efficiency, Green: Mean Sleep")
gf_dens(~base_psamp_categorical$`b5[1]`, data = base_psamp, color = 'purple') |>
gf_dens(~base_psamp_categorical$`b5[2]`, data = base_psamp, color = 'orange')|>
gf_labs(title = "Categorical Posteriors",
x = "Values of b5",
subtitle = "Purple: Fall, Orange: Spring")
gpa_stan_standard <- '
data {
int<lower=1> n;
vector[n] semesterGPA;
vector[n] stdmeanSteps;
vector[n] stdmeanActivity;
vector[n] stdmeanEff;
vector[n] stdmeanSleep;
}
parameters {
real b0; // intercept
real b1; // slope for meanSteps
real b2; // slope for meanActivity
real b3; // slope for meanEff
real b4; // slope for meanSleep
real<lower=0> sigma; // residual standard deviation
}
model {
vector[n] mu;
for (i in 1:n) {
mu[i] = b0 + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i];
}
// priors
b0 ~ normal(3, 0.75); // for the intercept
b1 ~ normal(0, 0.15); // for the slope for meanSteps
b2 ~ normal(0, 0.15); // for the slope for meanActivity
b3 ~ normal(0, 0.15); // for the slope for meanEff
b4 ~ normal(0, 0.15); // for the slope for meanSleep
sigma ~ lognormal(0,.25); // for residual standard deviation
// defining the likelihood
semesterGPA ~ normal(mu, sigma);
}
'
stan_gpa_standard <- 'stan_gpa_standard.RDS'
if (file.exists(stan_gpa_standard)){
gpa_stan_model_standard <- readRDS(stan_gpa_standard)
}else{
gpa_stan_model_standard <- stan(model_code = gpa_stan_standard,
chains = 4, iter = 2000,
data = stangpadata)
saveRDS(gpa_stan_model_standard, file = stan_gpa_standard)
}
gpa_stan_standard_Lik <- '
data {
int<lower=1> n;
vector[n] semesterGPA;
vector[n] stdmeanSteps;
vector[n] stdmeanActivity;
vector[n] stdmeanEff;
vector[n] stdmeanSleep;
}
parameters {
real b0; // intercept
real b1; // slope for meanSteps
real b2; // slope for meanActivity
real b3; // slope for meanEff
real b4; // slope for meanSleep
real<lower=0> sigma; // residual standard deviation
}
model {
vector[n] mu;
for (i in 1:n) {
mu[i] = b0 + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i];
}
// priors
b0 ~ normal(3, 0.75); // for the intercept
b1 ~ normal(0, 0.15); // for the slope for meanSteps
b2 ~ normal(0, 0.15); // for the slope for meanActivity
b3 ~ normal(0, 0.15); // for the slope for meanEff
b4 ~ normal(0, 0.15); // for the slope for meanSleep
sigma ~ lognormal(0,.25); // for residual standard deviation
// defining the likelihood
semesterGPA ~ normal(mu, sigma);
}
generated quantities {
vector[n] log_lik;
for (i in 1:n) {
log_lik[i] = normal_lpdf(semesterGPA[i] | b0 + b1 * stdmeanSteps[i] + b2 * stdmeanActivity[i] + b3 * stdmeanEff[i] + b4 * stdmeanSleep[i],
sigma);
}
}
'
stan_gpa_standard_Lik <- 'stan_gpa_standard_Lik.RDS'
if (file.exists(stan_gpa_standard_Lik)){
gpa_stan_model_standard_Lik <- readRDS(stan_gpa_standard_Lik)
}else{
gpa_stan_model_standard_Lik <- stan(model_code = gpa_stan_standard_Lik,
chains = 4, iter = 2000,
data = stangpadata)
saveRDS(gpa_stan_model_standard_Lik, file = stan_gpa_standard_Lik)
}
gpa_stan_model_standard
## Inference for Stan model: rt_cmdstanr_22ec0651d52736d11ad6239d2b231365-202304261726-1-38ecac.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## b0 3.64 0.00 0.01 3.62 3.63 3.64 3.65 3.66 4514 1
## b1 0.03 0.00 0.01 0.00 0.02 0.03 0.04 0.06 2538 1
## b2 -0.03 0.00 0.02 -0.06 -0.04 -0.03 -0.02 0.00 2678 1
## b3 0.06 0.00 0.01 0.03 0.05 0.06 0.07 0.08 4245 1
## b4 0.08 0.00 0.01 0.06 0.07 0.08 0.09 0.10 3713 1
## sigma 0.34 0.00 0.01 0.33 0.34 0.34 0.35 0.36 3700 1
## lp__ 647.07 0.04 1.75 642.83 646.14 647.34 648.36 649.51 1968 1
##
## Samples were drawn using NUTS(diag_e) at Wed Apr 26 17:27:01 2023.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
bayesplot::mcmc_trace(gpa_stan_model_standard)
bayesplot::mcmc_rank_overlay(gpa_stan_model_standard)
base_psamp_standard <- as.data.frame(gpa_stan_model_standard) |>
select(-lp__) #|>
#pivot_longer(cols = everything(),
#names_to = 'parameter',
#values_to = 'value')
gf_dens(~base_psamp_standard$b1, data = base_psamp, color = 'blue') |>
gf_dens(~base_psamp_standard$b2, data = base_psamp, color = 'red') |>
gf_dens(~base_psamp_standard$b3, data = base_psamp, color = 'black') |>
gf_dens(~base_psamp_standard$b4, data = base_psamp, color = 'green')|>
gf_labs(title = "Quantitative Posteriors",
x = "Values of Beta",
subtitle= "Blue: Mean Steps, Red: Mean Activity, Black: Mean Sleep Efficiency, Green: Mean Sleep")
#compare(gpa_stan_model_standard_Lik, gpa_stan_model_categorical_Lik, gpa_stan_model_hierarchical_no_categorical_Lik, gpa_stan_model_hierarchical_Lik, func = WAIC)
error when knitting and unsure of reason:
“Quitting from lines 891-892 (FinalProject.Rmd) Error in
compare(gpa_stan_model_standard_Lik, gpa_stan_model_categorical_Lik, :
unused argument (gpa_stan_model_hierarchical_Lik) Calls:
Here is an included image. Also error with LaTex for pdf knitting so it is in HTML.
WAIC Results
For a comparison of models, we compared 4 different models. The first model is our main model which is a normal distribution model with a hierarchical intercept of semester and then 5 predictor variables. The second model is the same model but without the categorical predictor of Fall versus Spring. The third model is the same as the first model but removes the hierarchical piece of it. The final model is the same as the third model but once again, without the categorical predictor of Fall versus Spring. Using the compare function in r, our first and second models perform better than our third and fourth models based on a lower WAIC score which indicates lower out of sample deviance. The first two models also have much smaller standard error of the difference in WAIC values compared to the third and fourth models, as well as higher estimated effective number of parameters. This gives us confidence in the hierarchical piece of the model and that it was a good decision. The lack of difference in comparison metrics between models 1 and 2 as well as the difference between models 3 and 4, does indicate that the categorical variable had a small effect on the model and probably doesn’t do all that much.
Given our research question regarding the effect of sleep and physical activity on college students’ semester GPA, our model found slight indications of meaning that there is much effect. The variables meanSteps and meanActivity, while standardized, had effects of 0.02 and -0.02 respectively. The most interesting part of these results isn’t how small they are but rather that they are directly inverted, meaning that if a student had equal relative amount of steps and activity, their GPA would remain the same, but more steps and less activity would increase their GPA but the less steps and more activity would in fact decrease their GPA. Both SleepEfficiency and meanSleep had more effect on GPA but not by much but the amount of sleep does appear to be slightly more important than the efficiency of sleep. The categorical predictor also had minimal effect on the model but the Spring semester had twice as much impact on GPA than the Fall given their mean slopes of 0.06 and 0.03 respectively. Given these results, it is hard to conclude much. This is likely due to the data set and how much we minimized the day to day data by summarizing it as a mean across an entire semester. Potential future research should look into a way to use the daily results to look at GPA over time. Another improvement to the data could be using Semester as the semester a student is in school rather than an arbitrary semester. Instead of Spring 2015 and Fall 2015 being semester 1 and semester 2, A senior in the final semester in Spring 2015 should have semester 8 and a freshman in their first semester in Fall 2015 should have semester 1. Although these things weren’t given in our data set, it would potentially make the results more interesting. Another piece of information that would’ve been beneficial would’ve been to include measurement error calculations on the data since the fitbit data wouldn’t always be accurate and it might’ve helped to account that the data was being measured in such a way. Finally, fitting a gamma model opposed to a normal model could’ve helped as well by allowing us to bound the model to greater than 0 since gamma doesn’t work with negative numbers and we could’ve been more successful in constraining the model to real possible data of GPA between 0 and 4 instead of the slightly negative lower bound and around 10 for an upper bound we had. Overall, the research question was interesting and the model fitting was successful as well as the opportunity to compare multiple models and the importance of the hierarchical piece, but in the end, the model shows minimal effects on GPA besides the intercept.
gpa_stan_model_hierarchical
## Inference for Stan model: rt_cmdstanr_aa499b92a5d6a61c631cd11a72bc7a54-202304261654-1-6bfe70.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff
## b1 0.02 0.00 0.01 -0.01 0.01 0.02 0.03 0.05 2959
## b2 -0.02 0.00 0.02 -0.05 -0.03 -0.02 -0.01 0.01 3067
## b3 0.06 0.00 0.01 0.03 0.05 0.06 0.07 0.08 4496
## b4 0.08 0.00 0.01 0.06 0.07 0.08 0.09 0.10 4472
## b5[1] 0.03 0.00 0.11 -0.19 -0.04 0.03 0.11 0.25 2216
## b5[2] 0.06 0.00 0.12 -0.16 -0.01 0.06 0.14 0.28 2330
## mu_semester 3.59 0.00 0.11 3.37 3.51 3.59 3.67 3.82 2152
## z0[1] 0.22 0.01 0.55 -0.86 -0.15 0.23 0.60 1.29 2255
## z0[2] -1.00 0.02 0.67 -2.38 -1.45 -0.98 -0.53 0.23 1550
## z0[3] 0.02 0.01 0.55 -1.07 -0.33 0.03 0.40 1.06 2232
## z0[4] 0.91 0.01 0.60 -0.22 0.50 0.89 1.30 2.18 2079
## z0[5] -0.04 0.01 0.63 -1.28 -0.46 -0.04 0.39 1.17 2131
## z0[6] -0.12 0.01 0.61 -1.34 -0.52 -0.10 0.29 1.04 2149
## z0[7] 0.41 0.01 0.62 -0.79 0.01 0.39 0.81 1.67 2222
## sigma 0.34 0.00 0.01 0.33 0.34 0.34 0.35 0.35 4471
## sigma_semester 0.11 0.00 0.06 0.04 0.07 0.09 0.13 0.26 1046
## a0[1] 3.61 0.00 0.12 3.40 3.53 3.61 3.69 3.84 2300
## a0[2] 3.50 0.00 0.11 3.28 3.43 3.50 3.58 3.73 2236
## a0[3] 3.60 0.00 0.11 3.37 3.52 3.60 3.67 3.82 2248
## a0[4] 3.68 0.00 0.11 3.46 3.60 3.67 3.75 3.91 2267
## a0[5] 3.59 0.00 0.12 3.36 3.51 3.59 3.67 3.82 2321
## a0[6] 3.59 0.00 0.12 3.36 3.51 3.58 3.66 3.81 2330
## a0[7] 3.63 0.00 0.12 3.41 3.56 3.63 3.71 3.87 2355
## lp__ 652.11 0.12 3.46 644.52 650.01 652.40 654.61 657.87 903
## Rhat
## b1 1
## b2 1
## b3 1
## b4 1
## b5[1] 1
## b5[2] 1
## mu_semester 1
## z0[1] 1
## z0[2] 1
## z0[3] 1
## z0[4] 1
## z0[5] 1
## z0[6] 1
## z0[7] 1
## sigma 1
## sigma_semester 1
## a0[1] 1
## a0[2] 1
## a0[3] 1
## a0[4] 1
## a0[5] 1
## a0[6] 1
## a0[7] 1
## lp__ 1
##
## Samples were drawn using NUTS(diag_e) at Wed Apr 26 16:55:57 2023.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
bayesplot::mcmc_rank_overlay(gpa_stan_model_hierarchical)
bayesplot::mcmc_trace(gpa_stan_model_hierarchical)
We can see that all the model we fitted above converged, from the all the trace plots, they are moving randomly around some single mean value for 4 chains which looks like a dense scribble rather than a meandering stream. Also, for all of our rank plots, all of the 4 chains are swapping ranks a lot, which means the different chains are exploring similar areas of the posterior space. And 4 model summaries have Rhat value of 1 for the vast majority of b1,b2 and so on, their effective number of samples is ideal comparing to the the actual number of iterations.
No
Yes
No
Yes
No
No