Overview of R Package ‘predictmeans’

Author

Dr. Dongwen Luo
AgResearch NZ Ltd

Published

June 9, 2025

Code
library(tidyverse)
library(predictmeans)
library(HRW)
library(gridExtra)
library(car)
library(DT)
library(gt)
library(knitr)

# Data copper can be download at Example 8
copper <- read.csv("copper.csv", header=TRUE)

DT_df <- function(df){
  require(DT)
  stopifnot(is.data.frame(df))
  
  df |>
    DT::datatable(
      extensions = 'Buttons',
      options = list(
        dom = 'Blfrtip', # set DOM
        buttons = c('csv'),
        lengthMenu = list(
          c(10, 20, -1), # declare values
          c(10, 20, "All") # declare titles
        ), # end of lengthMenu customization
        pageLength = 10
      ), rownames= FALSE)
}

Outline

  • Motivation

  • Overview

  • Examples

  • Future Plan

Motivation

  • Mixed Effects Models: Essential for statistical analysis (e.g., split-plot experiments, repeated measurements, meta-analysis). However, popular R packages like ‘nlme’ and ‘lme4’ only provide model effects and ANOVA outputs, lacking further inference (predicted means, multiple comparisons, confidence intervals).

  • Permutation Tests: Ideal for small sample sizes, non-normal distributions, or when parametric test assumptions are unmet. They offer robust and flexible hypothesis testing in such scenarios.

  • Semiparametric Regression: Combines parametric and non-parametric effects for predictive variables. Widely valuable across astronomy, biology, medicine, economics, and finance applications.

  • Package ‘predictmeans’: Offers diagnostic and inference functions for various models (‘aov’, ‘lm’, ‘glm’, etc.). Inferences include predicted means, standard errors, contrasts, multiple comparisons, permutation tests, adjusted R-square, and graphical representations. https://cran.r-project.org/web/packages/predictmeans/index.html.

Overview

Main Functions Flowchart

Contents

Example_1 Split-plot Design (Oats data)

In the split-plot design shown here, the treatments are three varieties of oats (Victory, Golden rain and Marvellous) and four levels of nitrogen (0, 0.2, 0.4 and 0.6 cwt). As it is feasible to work with smaller plots for fertiliser than for varieties, the six blocks were initially split into three whole-plots and then each whole-plot was split into four subplots. The varieties were allocated (at random) to the whole-plots within each block, and the nitrogen levels (at random) to the subplots within each whole-plot. In a randomized-block design, we have a hierarchical structure with blocks and then plots within blocks.

Data Info

Modelling

ANOVA and \(R^2\)

  • ANOVA
Code
kable(anova(mod))
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
nitro 20020.5001 6673.5000 3 45.00004 37.685704 0.0000000
Variety 526.0575 263.0288 2 10.00002 1.485341 0.2723866
nitro:Variety 321.7500 53.6250 6 45.00004 0.302824 0.9321985
  • \(R^2\)
Code
R2_glmm(mod)
# Adjusted R2 for Mixed Models
           Total            Fixed           Random Whole_plot:Block 
        "75.81%"         "37.19%"         "38.63%"         "13.87%" 
           Block 
        "24.75%" 

Predicted Means

Predicted Means for “nitro”

Predicted Means for “nitro:Variety”

Highlights

Options in function ‘predictmens’:

  • {pair=TRUE} is for multiple comparison without p-value adjusted (‘adj’ has default value ‘none’).

  • {adj=“BH”} is for multiple comparison with p-value adjusted by “BH” method (‘pair’ with be TRUE automatically when ‘adj’ is not none).

  • {bar=TRUE} is for producing bar chart.

  • {meandecr=TRUE} is for descending means in ‘mean_table’.

  • {meandecr=FALSE} is for ascending means in ‘mean_table’.

  • Pairwise LSD is available only when adj=‘none’ or ‘bonferroni’.

Permutation Test

Example_2 Repeated Measurements (ATP data)

ATP containing data from an experiment to study the effects of preserving liquids on the enzyme content of dog hearts. There were 23 hearts and two treatment factors, A and B, each at two levels. Measurements were made of ATP as a percentage of total enzyme in the heart, at one and two hourly intervals during a twelve hour period following initial preservation.

Data Info

Modelling

ANOVA and \(R^2\)

  • ANOVA
Code
kable(anova(mod))
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
A 9372.542 9372.542 1 19 1.1970159 0.2875959
B 9666.108 9666.108 1 19 1.2345088 0.2803950
time 4338231.562 482025.729 9 171 61.5620077 0.0000000
A:B 13034.579 13034.579 1 19 1.6647138 0.2124504
A:time 299791.243 33310.138 9 171 4.2542106 0.0000540
B:time 168897.036 18766.337 9 171 2.3967463 0.0139325
A:B:time 25392.525 2821.392 9 171 0.3603346 0.9522486
  • \(R^2\)
Code
R2_glmm(mod)
# Adjusted R2 for Mixed Models
   Total    Fixed   Random    heart 
"74.28%" "64.23%" "10.04%" "10.04%" 

Predicted Means

Predicted Means for “A:time”

Predicted Means for “B:time”

Predicted Means for “A:B:time”

Highlights

Options in function ‘predictmens’:

  • {atvar=“time”} sets multiple comparison within each level of ‘time’, it also infects the mean_table and mean plots arrangement. When ‘atvar’ is not NULL ‘pair’ turns to be TRUE automatically. Please try to see any effects for ‘atvar=c(“A”, “B”)’ or ‘atvar=c(“B”, “A”)’.

  • {trans=function(x) x^(1/1.5)} is a function for back transformation e.g. ‘trans=exp’ for log, ‘trans=function(x) x^2’ for sqrt.

  • For any GLM, the function produces associated back transformation automatically.

Example_3 GLM with Covariate (Drug data)

Comparison of the effectiveness of three analgesic drugs to a standard drug, morphine (Finney, Probit analysis, 3rd Edition 1971, p.103). 14 groups of mice were tested for response to the drugs at a range of doses. N is total number of mice in each group, R is number responding.

Data Info

Modelling

ANOVA

Code
kable(car::Anova(mod))
LR Chisq Df Pr(>Chisq)
log10Dose 221.822660 1 0.0000000
Drug 206.682057 3 0.0000000
log10Dose:Drug 1.533629 3 0.6745307

Covariate Means

Predicted Means for “Treatment”

Highlights

Options in function ‘covariatemeans’:

  • {level = 0.166} produces 83.4% confidence interval (CI) for predicted means. If a straightforward multiple comparison between predicted means is not feasible, an alternative method involves utilizing predicted means with 83.4% CIs. When the CIs of two predicted means overlap, the associated means are not significantly different at the 0.05 level; otherwise, they are considered significant. The function ‘ci_mcp(LL, UL)’ is used to produce indicating letters.

DT_df is a function for producing data table in html with ‘CSV’ download button. To download the whole data, you need to select ‘All’ entries.

Example_4 Semiparametric Regression with Group (WarsawApts data)

‘WarsawApts’ is a subset of the data set ‘apartments’ in the R package PBImisc. This dataset contains the prices of the apartments which were sold in Warsaw, Poland, during the calendar years 2007 to 2009.

Data Info

Smooth without Group

ANOVA, RANOVA and \(R^2\)

  • ANOVA
Code
kable(anova(mod1$semer))
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
construction.date 500.9997 500.9997 1 1871.012 1.579118 0.2090447
  • RANOVA
Code
kable(ranova(mod1$semer))
npar logLik AIC LRT Df Pr(>Chisq)
4 -1772.452 3552.905 NA NA NA
(1 | grp) 3 -1844.882 3695.763 144.859 1 0
  • \(R^2\)
Code
R2_glmm(mod1$semer)
# Adjusted R2 for Mixed Models
   Total    Fixed   Random      grp 
"86.14%"  "0.03%" "86.11%" "86.11%" 

Prediction

Smooth with Group

ANOVA, RANOVA and \(R^2\)

  • ANOVA
Code
kable(anova(mod2$semer))
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
construction.date 448.0818 448.0818 1 2331.885 1.621461 0.2030151
district 4444.3098 1481.4366 3 2832.229 5.360832 0.0011129
construction.date:district 1582.3866 527.4622 3 1983.797 1.908713 0.1260932
  • RANOVA
Code
kable(ranova(mod2$semer))
npar logLik AIC LRT Df Pr(>Chisq)
13 -1751.525 3529.051 NA NA NA
(1 | grp.Mokotow) 12 -1812.510 3649.019 121.9686 1 0
(1 | grp.Srodmiescie) 12 -1812.510 3649.019 121.9686 1 0
(1 | grp.Wola) 12 -1812.510 3649.019 121.9686 1 0
(1 | grp.Zoliborz) 12 -1812.510 3649.019 121.9686 1 0
  • \(R^2\)
Code
R2_glmm(mod2$semer)
# Adjusted R2 for Mixed Models
          Total           Fixed          Random     grp.Mokotow grp.Srodmiescie 
       "89.98%"         "0.69%"        "89.29%"        "31.79%"        "55.41%" 
       grp.Wola    grp.Zoliborz 
        "0.27%"         "1.81%" 

Prediction with Group

Example_5 Semiparametric Regression with Multiple Covariates (Caschool data)

A data is collected from 420 schools in United States cross-section from 1998-1999, including county, grade span of district, percent qualifying for CalWORKS, computer per student, district average income, average math score.

Data Info

Smooth Modelling

ANOVA and \(R^2\)

  • ANOVA
Code
kable(Anova(mod$semer))
Chisq Df Pr(>Chisq)
calwpct 35.440316 1 0.0000000
log.avginc 91.867899 1 0.0000000
compstu 1.858835 1 0.1727593
  • \(R^2\)
Code
R2_glmm(mod$semer)
# Adjusted R2 for Mixed Models
       Total        Fixed       Random   1 | county   1 | grspan  1 | cal_grp 
    "72.42%"     "51.35%"     "21.06%"      "8.74%"         "0%"      "8.98%" 
 1 | avg_grp 1 | comp_grp 
     "2.84%"      "0.51%" 

Prediction

Example_6 Semiparametric Regression with Binomial Distribution (indonRespir data)

Indonesian Children’s Health Study of respiratory infections for a cohort of 275 Indonesian children. The data are longitudinal with each child having between 1 and 6 repeated measurements.

Data Info

Smooth without Group

ANOVA and \(R^2\)

  • ANOVA
Code
kable(car::Anova(mod1$semer))
Chisq Df Pr(>Chisq)
age 15.5042755 1 0.0000823
vitAdefic 1.7004790 1 0.1922253
sex 3.6344414 1 0.0565956
height 1.4757295 1 0.2244439
stunted 0.9710527 1 0.3244178
visit2 7.8003972 1 0.0052235
visit3 2.2951131 1 0.1297818
visit4 7.6878903 1 0.0055593
visit5 2.0929208 1 0.1479829
visit6 0.0000626 1 0.9936854
  • \(R^2\)
Code
R2_glmm(mod1$semer)
# Adjusted R2 for Mixed Models
    Total     Fixed    Random 1 | idnum   1 | grp 
  "33.8%"  "17.33%"  "16.47%"  "13.99%"   "2.48%" 

Prediction

Smooth with Group

ANOVA and \(R^2\)

  • ANOVA
Code
kable(Anova(mod2$semer))
Chisq Df Pr(>Chisq)
age 20.0274492 1 0.0000076
vitAdefic 1.5265392 1 0.2166326
sex 0.9017615 1 0.3423098
height 1.6630360 1 0.1971939
stunted 0.9686910 1 0.3250069
visit2 7.5723914 1 0.0059269
visit3 2.0072509 1 0.1565488
visit4 7.1443814 1 0.0075199
visit5 2.6819018 1 0.1014945
visit6 0.0305583 1 0.8612294
  • \(R^2\)
Code
R2_glmm(mod2$semer)
# Adjusted R2 for Mixed Models
         Total          Fixed         Random      1 | idnum   1 | grp.male 
      "34.21%"          "17%"       "17.21%"       "14.57%"        "2.64%" 
1 | grp.female 
          "0%" 

Prediction with Group

Example_7 Semiparametric Regression with Negtive Binomial Distribution (ragweed data)

The ragweed data frame has data on ragweed levels and meteorological variables for 334 days in Kalamazoo, Michigan, U.S.A.

Data Info

Smooth Modelling

ANOVA and \(R^2\)

  • ANOVA
Code
kable(car::Anova(mod$semer))
Chisq Df Pr(>Chisq)
rain 28.697489 1 0.0000001
year 6.902495 3 0.0750714
temperatureResidual 70.281585 1 0.0000000
dayInSeason 34.539924 1 0.0000000
windSpeed 76.531043 1 0.0000000
  • \(R^2\)
Code
R2_glmm(mod$semer)
# Adjusted R2 for Mixed Models
         Total          Fixed         Random 1 | s_day.1991 1 | s_day.1992 
       "1.37%"        "0.22%"        "1.15%"        "0.34%"        "0.35%" 
1 | s_day.1993 1 | s_day.1994     1 | s_temp     1 | s_wind 
       "0.26%"        "0.19%"           "0%"           "0%" 

Prediction by Day in Season

Example_8 Semiparametric Regression with Spatial Smooth (copper data)

Data copper records the copper grade quality associated with various sampling points’ X, Y, and Z coordinates within a designated area.

Data Info

Modelling

ANOVA, RANOVA and \(R^2\)

  • ANOVA
Code
kable(anova(mod$semer))
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
xcoord 0.0156795 0.0156795 1 2470291720.7005 0.1872202 0.6652403
ycoord 0.0000031 0.0000031 1 1562420796.9768 0.0000373 0.9951299
zcoord 0.4661560 0.4661560 1 1058.4382 5.5661082 0.0184923
xcoord:ycoord 0.0238179 0.0238179 1 1185516.1837 0.2843961 0.5938343
xcoord:zcoord 0.3288000 0.3288000 1 941.0343 3.9260172 0.0478351
ycoord:zcoord 2.1794253 2.1794253 1 870.7646 26.0232993 0.0000004
  • RANOVA
Code
kable(ranova(mod$semer))
npar logLik AIC LRT Df Pr(>Chisq)
10 -159.2031 338.4061 NA NA NA
(1 | grp_z) 9 -235.9377 489.8754 153.4693 1 0
(1 | grp_geo) 9 -236.6876 491.3752 154.9691 1 0
  • \(R^2\)
Code
R2_glmm(mod$semer)
# Adjusted R2 for Mixed Models
     Total      Fixed     Random    grp_geo      grp_z 
   "79.2%" "-135.63%"  "214.83%"  "196.83%"      "18%" 

The negative \(R^2\) indictates we may fit the model with intercept only.

Code
mod <- semireg(log(grade) ~ 1,
                smoothZ=list(
                  grp_z=smZ(zcoord),
                  grp_geo=smZ(cbind(xcoord, ycoord), type="Ztps")
                  ), 
                data=copper)
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: log(grade) ~ 1 + (1 | grp_z) + (1 | grp_geo)
   Data: copper
REML criterion at convergence: 255.1183
Random effects:
 Groups   Name        Std.Dev. 
 grp_geo  (Intercept) 0.0009664
 grp_z    (Intercept) 0.0159936
 Residual             0.3009139
Number of obs: 439, groups:  grp_geo, 36; grp_z, 8
Fixed Effects:
(Intercept)  
     0.1598  
Code
R2_glmm(mod$semer)
# Adjusted R2 for Mixed Models
   Total    Fixed   Random  grp_geo    grp_z 
"87.71%"     "0%" "87.71%" "78.17%"  "9.54%" 

Semiparametric prediction

Future Plan

  • Extend ‘predictmeans’ to more popular models in R,

  • Build a comprehensive Shiny App based on ‘predictmeans’,

  • Make functions in ‘predictmeans’ working properly for glmmTMB model with a complex variance-covariance structure.

Reference

  • Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x. https://www.jstor.org/stable/2346101.

  • Cave, V. (2021). Confidence tricks: the 83.4% confidence interval for comparing means. https://vsni.co.uk/blogs/confidence_trick.

  • Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

  • Harezlak J., Ruppert, D. & Wand, M.P. (2018). Semiparametric Regression with R. Springer; ISBN: 978-1-4939-8851-8.

  • Luo, D., Ganesh, S., & Koolaard, J. (2022). Predictmeans: calculate predicted means for linear models [R package version 1.0.8]. https://CRAN.R-project.org/package=predictmeans.

  • Piepho HP. An adjusted coefficient of determination (R2 ) for generalized linear mixed models in one go. Biom J. 2023 Oct;65(7):e2200290. doi: 10.1002/bimj.202200290. Epub 2023 May 1. PMID: 37127864.

  • R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.