library(tidyverse)library(predictmeans)library(HRW)library(gridExtra)library(car)library(DT)library(gt)library(knitr)# Data copper can be download at Example 8copper <-read.csv("copper.csv", header=TRUE)DT_df <-function(df){require(DT)stopifnot(is.data.frame(df)) df |> DT::datatable(extensions ='Buttons',options =list(dom ='Blfrtip', # set DOMbuttons =c('csv'),lengthMenu =list(c(10, 20, -1), # declare valuesc(10, 20, "All") # declare titles ), # end of lengthMenu customizationpageLength =10 ), rownames=FALSE)}
Outline
Motivation
Overview
Examples
Future Plan
Motivation
Mixed Effects Models: Essential for statistical analysis (e.g., split-plot experiments, repeated measurements, meta-analysis). However, popular R packages like ‘nlme’ and ‘lme4’ only provide model effects and ANOVA outputs, lacking further inference (predicted means, multiple comparisons, confidence intervals).
Permutation Tests: Ideal for small sample sizes, non-normal distributions, or when parametric test assumptions are unmet. They offer robust and flexible hypothesis testing in such scenarios.
Semiparametric Regression: Combines parametric and non-parametric effects for predictive variables. Widely valuable across astronomy, biology, medicine, economics, and finance applications.
Package ‘predictmeans’: Offers diagnostic and inference functions for various models (‘aov’, ‘lm’, ‘glm’, etc.). Inferences include predicted means, standard errors, contrasts, multiple comparisons, permutation tests, adjusted R-square, and graphical representations. https://cran.r-project.org/web/packages/predictmeans/index.html.
Overview
Main Functions Flowchart
Contents
Example_1 Split-plot Design (Oats data)
In the split-plot design shown here, the treatments are three varieties of oats (Victory, Golden rain and Marvellous) and four levels of nitrogen (0, 0.2, 0.4 and 0.6 cwt). As it is feasible to work with smaller plots for fertiliser than for varieties, the six blocks were initially split into three whole-plots and then each whole-plot was split into four subplots. The varieties were allocated (at random) to the whole-plots within each block, and the nitrogen levels (at random) to the subplots within each whole-plot. In a randomized-block design, we have a hierarchical structure with blocks and then plots within blocks.
# Permutation Test for model termssystem.time( permlme <-permmodels(model=mod, nperm=999))
Coefficients of (fixed) effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 80.0000 9.1070 16.0816 8.7844 0.0000
nitro0.2 18.5000 7.6829 45.0000 2.4079 0.0202
nitro0.4 34.6667 7.6829 45.0000 4.5122 0.0000
nitro0.6 44.8333 7.6829 45.0000 5.8354 0.0000
VarietyMarvellous 6.6667 9.7150 30.2308 0.6862 0.4978
VarietyVictory -8.5000 9.7150 30.2308 -0.8749 0.3885
nitro0.2:VarietyMarvellous 3.3333 10.8653 45.0000 0.3068 0.7604
nitro0.4:VarietyMarvellous -4.1667 10.8653 45.0000 -0.3835 0.7032
nitro0.6:VarietyMarvellous -4.6667 10.8653 45.0000 -0.4295 0.6696
nitro0.2:VarietyVictory -0.3333 10.8653 45.0000 -0.0307 0.9757
nitro0.4:VarietyVictory 4.6667 10.8653 45.0000 0.4295 0.6696
nitro0.6:VarietyVictory 2.1667 10.8653 45.0000 0.1994 0.8428
Perm_p_value
(Intercept) NA
nitro0.2 0.022
nitro0.4 0.001
nitro0.6 0.001
VarietyMarvellous 0.488
VarietyVictory 0.362
nitro0.2:VarietyMarvellous 0.752
nitro0.4:VarietyMarvellous 0.686
nitro0.6:VarietyMarvellous 0.671
nitro0.2:VarietyVictory 0.974
nitro0.4:VarietyVictory 0.686
nitro0.6:VarietyVictory 0.857
Note: Perm_p_value of t test is obtained using '999' permutations.
ANOVA:
Sum Sq Mean Sq NumDF DenDF F value Pr(>F) Perm_p_value
nitro 20020.5001 6673.5000 3 45 37.6857 0.0000 0.001
Variety 526.0575 263.0288 2 10 1.4853 0.2724 0.244
nitro:Variety 321.7500 53.6250 6 45 0.3028 0.9322 0.935
Note: Perm_p_value of F test is obtained using '999' permutations.
user system elapsed
0.35 0.06 7.12
Code
# Permutation Test for multiple comparisonspredictmeans(model=mod, modelterm="nitro:Variety", atvar="Variety", adj="BH", permlist=permlme, plot=FALSE)
$`Predicted Means`
Variety Golden Rain Marvellous Victory
nitro
0 80.0000 86.6667 71.5000
0.2 98.5000 108.5000 89.6667
0.4 114.6667 117.1667 110.8333
0.6 124.8333 126.8333 118.5000
$`Standard Error of Means`
All means have the same SE
9.10701
$`Standard Error of Differences`
Max.SED Min.SED Aveg.SED
9.715020 7.682948 9.160819
attr(,"For the Same Level of Factor")
nitro Variety
Aveg.SED 9.71502 7.682948
Min.SED 9.71502 7.682948
Max.SED 9.71502 7.682948
$`Approximated LSD`
Max.LSD Min.LSD Aveg.LSD
19.43004 15.36590 18.32164
attr(,"For the Same Level of Factor")
nitro Variety
Aveg.LSD 19.43004 15.3659
Min.LSD 19.43004 15.3659
Max.LSD 19.43004 15.3659
attr(,"Note")
[1] "This is a approximate LSD (i.e. 2*SED) at 0.05 level."
$`Pairwise '999' times permuted p-value (adjusted by 'BH' method)`
[1] "For variable 'nitro' at each level of 'Variety'"
$`Golden Rain`
0 0.2 0.4 0.6 Group
0 1 A
0.2 0.033 1 B
0.4 0.003 0.0348 1 C
0.6 0.003 0.008 0.201 1 C
$Marvellous
0 0.2 0.4 0.6 Group
0 1 A
0.2 0.016 1 B
0.4 0.003 0.265 1 BC
0.6 0.003 0.03 0.265 1 C
$Victory
0 0.2 0.4 0.6 Group
0 1 A
0.2 0.0324 1 B
0.4 0.002 0.0105 1 C
0.6 0.002 0.002 0.312 1 C
$mean_table
$mean_table$Table
Variety nitro Mean SE Df LL(95%) UL(95%) LetterGrp
1 Golden Rain 0 80.0000 9.10701 NA 61.7860 98.2140 A
4 Golden Rain 0.2 98.5000 9.10701 NA 80.2860 116.7140 B
7 Golden Rain 0.4 114.6667 9.10701 NA 96.4526 132.8807 C
10 Golden Rain 0.6 124.8333 9.10701 NA 106.6193 143.0474 C
2 Marvellous 0 86.6667 9.10701 NA 68.4526 104.8807 A
5 Marvellous 0.2 108.5000 9.10701 NA 90.2860 126.7140 B
8 Marvellous 0.4 117.1667 9.10701 NA 98.9526 135.3807 BC
11 Marvellous 0.6 126.8333 9.10701 NA 108.6193 145.0474 C
3 Victory 0 71.5000 9.10701 NA 53.2860 89.7140 A
6 Victory 0.2 89.6667 9.10701 NA 71.4526 107.8807 B
9 Victory 0.4 110.8333 9.10701 NA 92.6193 129.0474 C
12 Victory 0.6 118.5000 9.10701 NA 100.2860 136.7140 C
$mean_table$Note
[1] "Letter-based representation of pairwise comparisons at significant level '0.05' at each level of Variety"
ATP containing data from an experiment to study the effects of preserving liquids on the enzyme content of dog hearts. There were 23 hearts and two treatment factors, A and B, each at two levels. Measurements were made of ATP as a percentage of total enzyme in the heart, at one and two hourly intervals during a twelve hour period following initial preservation.
$`Predicted Means`
A 1 2
B 1 2 1 2
time
0 683.4993 772.1393 745.8641 747.3915
1 625.6698 733.6232 775.2654 741.9678
2 707.7245 752.9475 694.2764 661.7776
3 651.3920 716.2343 651.9301 657.8931
4 672.0561 653.3022 653.8890 582.3405
5 619.0203 609.8928 629.9470 485.6291
6 522.4710 487.5464 559.3260 458.9779
8 448.9900 331.8658 573.1090 441.5482
10 342.3595 300.0973 482.2612 404.8767
12 294.1297 256.8516 429.4629 380.4315
$`Standard Error of Means`
A 1 2
B 1 2 1 2
time
0 42.83950 46.92832 42.83950 42.83950
1 42.83950 46.92832 42.83950 42.83950
2 42.83950 46.92832 42.83950 42.83950
3 42.83950 46.92832 42.83950 42.83950
4 42.83950 46.92832 42.83950 42.83950
5 42.83950 46.92832 42.83950 42.83950
6 42.83950 46.92832 42.83950 42.83950
8 42.83950 46.92832 42.83950 42.83950
10 42.83950 46.92832 42.83950 42.83950
12 42.83950 46.92832 42.83950 42.83950
$`Standard Error of Differences`
Max.SED Min.SED Aveg.SED
63.54125 51.08791 59.81139
attr(,"For the Same Level of Factor")
A B time
Aveg.SED 57.44156 57.44156 62.06273
Min.SED 51.08791 51.08791 60.58421
Max.SED 63.54125 63.54125 63.54125
$LSD
Max.LSD Min.LSD Aveg.LSD
125.4262 100.8441 118.0637
attr(,"For the Same Level of Factor")
A B time
Aveg.LSD 113.3859 113.3859 122.5077
Min.LSD 100.8441 100.8441 119.5892
Max.LSD 125.4262 125.4262 125.4262
attr(,"Significant level")
[1] 0.05
attr(,"Degree of freedom")
[1] 171
Code
gt(predm_out$mean_table$Table, caption ="Group: Letter-based representation of pairwise comparisons at significant level '0.05' at each 'time'")
Group: Letter-based representation of pairwise comparisons at significant level '0.05' at each 'time'
time
A
B
Mean
SE
Df
LL(95%)
UL(95%)
Bk_Mean
Bk_LL(95%)
Bk_UL(95%)
LetterGrp
0
1
1
683.4993
42.83950
108.49
598.5883
768.4103
77.5935
71.0262
83.8936
A
0
1
2
772.1393
46.92832
108.49
679.1240
865.1546
84.1648
77.2620
90.7951
A
0
2
1
745.8641
42.83950
108.49
660.9531
830.7751
82.2444
75.8776
88.3736
A
0
2
2
747.3915
42.83950
108.49
662.4805
832.3025
82.3567
75.9945
88.4818
A
1
1
1
625.6698
42.83950
108.49
540.7588
710.5808
73.1527
66.3747
79.6298
A
1
1
2
733.6232
46.92832
108.49
640.6079
826.6385
81.3421
74.3124
88.0800
A
1
2
1
775.2654
42.83950
108.49
690.3544
860.1764
84.3918
78.1115
90.4465
A
1
2
2
741.9678
42.83950
108.49
657.0568
826.8788
81.9578
75.5791
88.0970
A
2
1
1
707.7245
42.83950
108.49
622.8135
792.6355
79.4163
72.9299
85.6477
A
2
1
2
752.9475
46.92832
108.49
659.9322
845.9628
82.7643
75.7995
89.4474
A
2
2
1
694.2764
42.83950
108.49
609.3654
779.1874
78.4070
71.8762
84.6762
A
2
2
2
661.7776
42.83950
108.49
576.8666
746.6886
75.9407
69.2974
82.3050
A
3
1
1
651.3920
42.83950
108.49
566.4810
736.3029
75.1441
68.4632
81.5401
A
3
1
2
716.2343
46.92832
108.49
623.2190
809.2496
80.0516
72.9615
86.8404
A
3
2
1
651.9301
42.83950
108.49
567.0192
736.8411
75.1855
68.5065
81.5798
A
3
2
2
657.8931
42.83950
108.49
572.9821
742.8040
75.6433
68.9860
82.0193
A
4
1
1
672.0561
42.83950
108.49
587.1451
756.9671
76.7250
70.1181
83.0586
A
4
1
2
653.3022
46.92832
108.49
560.2868
746.3175
75.2909
67.9632
82.2777
A
4
2
1
653.8890
42.83950
108.49
568.9780
738.7999
75.3360
68.6642
81.7243
A
4
2
2
582.3405
42.83950
108.49
497.4295
667.2515
69.7351
62.7800
76.3589
A
5
1
1
619.0203
42.83950
108.49
534.1093
703.9313
72.6334
65.8294
79.1323
A
5
1
2
609.8928
46.92832
108.49
516.8775
702.9081
71.9177
64.4058
79.0556
A
5
2
1
629.9470
42.83950
108.49
545.0360
714.8580
73.4857
66.7242
79.9490
A
5
2
2
485.6291
42.83950
108.49
400.7181
570.5401
61.7831
54.3533
68.7898
A
6
1
1
522.4710
42.83950
108.49
437.5600
607.3819
64.8696
57.6357
71.7202
A
6
1
2
487.5464
46.92832
108.49
394.5311
580.5617
61.9456
53.7924
69.5930
A
6
2
1
559.3260
42.83950
108.49
474.4151
644.2370
67.8854
60.8283
74.5928
A
6
2
2
458.9779
42.83950
108.49
374.0669
543.8888
59.5015
51.9158
66.6306
A
8
1
1
448.9900
42.83950
108.49
364.0791
533.9010
58.6351
50.9875
65.8123
AB
8
1
2
331.8658
46.92832
108.49
238.8504
424.8811
47.9338
38.4962
56.5169
A
8
2
1
573.1090
42.83950
108.49
488.1980
658.0200
68.9961
62.0008
75.6530
B
8
2
2
441.5482
42.83950
108.49
356.6372
526.4592
57.9854
50.2903
65.1993
AB
10
1
1
342.3595
42.83950
108.49
257.4486
427.2705
48.9390
40.4694
56.7286
AB
10
1
2
300.0973
46.92832
108.49
207.0820
393.1126
44.8237
35.0022
53.6634
A
10
2
1
482.2612
42.83950
108.49
397.3502
567.1722
61.4971
54.0483
68.5188
B
10
2
2
404.8767
42.83950
108.49
319.9657
489.7877
54.7287
46.7809
62.1353
AB
12
1
1
294.1297
42.83950
108.49
209.2187
379.0407
44.2275
35.2425
52.3750
AB
12
1
2
256.8516
46.92832
108.49
163.8363
349.8669
40.4068
29.9415
49.6518
A
12
2
1
429.4629
42.83950
108.49
344.5520
514.3739
56.9225
49.1477
64.1977
B
12
2
2
380.4315
42.83950
108.49
295.5205
465.3425
52.5030
44.3668
60.0503
AB
Code
predm_out$predictmeansPlot$meanPlot
Code
predm_out$predictmeansPlot$ciPlot
Pairwise p-value matrix has p-value (adjusted by ‘BH’ method) below the diagonal at each ‘time’
Code
for (i in6:15){if (i==1) {cat(paste("\n", predm_out[[i]], "\n\n", sp="")) }else{print(names(predm_out[i]))print(gt(as.data.frame.matrix(predm_out[[i]], check.names=FALSE),rownames_to_stub =TRUE)) }}
[1] “0”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
A
1 : 2
0.6112
1
A
2 : 1
0.6112
0.8372
1
A
2 : 2
0.6112
0.8372
0.9799
1
A
[1] “1”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
A
1 : 2
0.1844
1
A
2 : 1
0.0906
0.7005
1
A
2 : 2
0.1726
0.8958
0.7005
1
A
[1] “2”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
A
1 : 2
0.7113
1
A
2 : 1
0.8248
0.7113
1
A
2 : 2
0.7113
0.7113
0.7113
1
A
[1] “3”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
A
1 : 2
0.7211
1
A
2 : 1
0.9929
0.7211
1
A
2 : 2
0.9929
0.7211
0.9929
1
A
[1] “4”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
A
1 : 2
0.9221
1
A
2 : 1
0.9221
0.9926
1
A
2 : 2
0.5331
0.5331
0.5331
1
A
[1] “5”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
A
1 : 2
0.886
1
A
2 : 1
0.886
0.886
1
A
2 : 2
0.0894
0.1062
0.0894
1
A
[1] “6”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
A
1 : 2
0.6539
1
A
2 : 1
0.6539
0.5939
1
A
2 : 2
0.5939
0.6539
0.5939
1
A
[1] “8”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
AB
1 : 2
0.102
1
A
2 : 1
0.0858
0.0015
1
B
2 : 2
0.9025
0.1046
0.0858
1
AB
[1] “10”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
AB
1 : 2
0.5074
1
A
2 : 1
0.0685
0.0299
1
B
2 : 2
0.3653
0.2041
0.3063
1
AB
[1] “12”
1 : 1
1 : 2
2 : 1
2 : 2
Group
1 : 1
1
AB
1 : 2
0.5586
1
A
2 : 1
0.0826
0.0461
1
B
2 : 2
0.2358
0.1088
0.5041
1
AB
Code
PMplot(predm_out$p_valueMatrix)
Highlights
Options in function ‘predictmens’:
{atvar=“time”} sets multiple comparison within each level of ‘time’, it also infects the mean_table and mean plots arrangement. When ‘atvar’ is not NULL ‘pair’ turns to be TRUE automatically. Please try to see any effects for ‘atvar=c(“A”, “B”)’ or ‘atvar=c(“B”, “A”)’.
{trans=function(x) x^(1/1.5)} is a function for back transformation e.g. ‘trans=exp’ for log, ‘trans=function(x) x^2’ for sqrt.
For any GLM, the function produces associated back transformation automatically.
Example_3 GLM with Covariate (Drug data)
Comparison of the effectiveness of three analgesic drugs to a standard drug, morphine (Finney, Probit analysis, 3rd Edition 1971, p.103). 14 groups of mice were tested for response to the drugs at a range of doses. N is total number of mice in each group, R is number responding.
'data.frame': 14 obs. of 5 variables:
$ Drug : Factor w/ 4 levels "Morphine","Amidone",..: 1 1 1 2 2 2 3 3 3 4 ...
$ Dose : num 1.5 3 6 1.5 3 6 0.75 1.5 3 5 ...
$ N : int 103 120 123 60 110 100 90 80 90 60 ...
$ R : int 19 53 83 14 54 81 31 54 80 13 ...
$ log10Dose: num 0.176 0.477 0.778 0.176 0.477 ...
Code
summary(Drug)
Drug Dose N R
Morphine :3 Min. : 0.750 Min. : 60.00 Min. :13.00
Amidone :3 1st Qu.: 1.875 1st Qu.: 65.00 1st Qu.:28.00
Phenadoxone:3 Median : 4.000 Median : 90.00 Median :48.50
Pethidine :5 Mean : 5.982 Mean : 87.93 Mean :45.71
3rd Qu.: 7.125 3rd Qu.:102.25 3rd Qu.:54.75
Max. :20.000 Max. :123.00 Max. :83.00
log10Dose
Min. :-0.1249
1st Qu.: 0.2513
Median : 0.5880
Mean : 0.6030
3rd Qu.: 0.8508
Max. : 1.3010
{level = 0.166} produces 83.4% confidence interval (CI) for predicted means. If a straightforward multiple comparison between predicted means is not feasible, an alternative method involves utilizing predicted means with 83.4% CIs. When the CIs of two predicted means overlap, the associated means are not significantly different at the 0.05 level; otherwise, they are considered significant. The function ‘ci_mcp(LL, UL)’ is used to produce indicating letters.
DT_df is a function for producing data table in html with ‘CSV’ download button. To download the whole data, you need to select ‘All’ entries.
Example_4 Semiparametric Regression with Group (WarsawApts data)
‘WarsawApts’ is a subset of the data set ‘apartments’ in the R package PBImisc. This dataset contains the prices of the apartments which were sold in Warsaw, Poland, during the calendar years 2007 to 2009.
mod1 <-semireg(apartment.price ~ construction.date,smoothZ=list(grp=smZ(construction.date, k=25) ), # Use of a O'Sullivan spline with 25 knots # to model the non-linear relationshipdata = WarsawApts)
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: apartment.price ~ construction.date + (1 | grp)
Data: WarsawApts
REML criterion at convergence: 3544.905
Random effects:
Groups Name Std.Dev.
grp (Intercept) 1.191
Residual 17.812
Number of obs: 409, groups: grp, 27
Fixed Effects:
(Intercept) construction.date
109.38694 0.06692
Code
residplot(mod1$semer)
ANOVA, RANOVA and \(R^2\)
ANOVA
Code
kable(anova(mod1$semer))
Sum Sq
Mean Sq
NumDF
DenDF
F value
Pr(>F)
construction.date
500.9997
500.9997
1
1871.012
1.579118
0.2090447
RANOVA
Code
kable(ranova(mod1$semer))
npar
logLik
AIC
LRT
Df
Pr(>Chisq)
4
-1772.452
3552.905
NA
NA
NA
(1 | grp)
3
-1844.882
3695.763
144.859
1
0
\(R^2\)
Code
R2_glmm(mod1$semer)
# Adjusted R2 for Mixed Models
Total Fixed Random grp
"86.14%" "0.03%" "86.11%" "86.11%"
Example_5 Semiparametric Regression with Multiple Covariates (Caschool data)
A data is collected from 420 schools in United States cross-section from 1998-1999, including county, grade span of district, percent qualifying for CalWORKS, computer per student, district average income, average math score.
county grspan calwpct compstu
Sonoma : 29 KK-06: 61 Min. : 0.000 Min. :0.00000
Kern : 27 KK-08:359 1st Qu.: 4.395 1st Qu.:0.09377
Los Angeles: 27 Median :10.520 Median :0.12546
Tulare : 24 Mean :13.246 Mean :0.13593
San Diego : 21 3rd Qu.:18.981 3rd Qu.:0.16447
Santa Clara: 20 Max. :78.994 Max. :0.42083
(Other) :272
avginc mathscr log.avginc
Min. : 5.335 Min. :605.4 Min. :1.674
1st Qu.:10.639 1st Qu.:639.4 1st Qu.:2.365
Median :13.728 Median :652.4 Median :2.619
Mean :15.317 Mean :653.3 Mean :2.645
3rd Qu.:17.629 3rd Qu.:665.8 3rd Qu.:2.870
Max. :55.328 Max. :709.5 Max. :4.013
Example_6 Semiparametric Regression with Binomial Distribution (indonRespir data)
Indonesian Children’s Health Study of respiratory infections for a cohort of 275 Indonesian children. The data are longitudinal with each child having between 1 and 6 repeated measurements.
idnum respirInfec age vitAdefic female
1 : 6 Min. :0.00000 Min. :0.3333 Min. :0.00000 0:708
2 : 6 1st Qu.:0.00000 1st Qu.:1.7500 1st Qu.:0.00000 1:492
3 : 6 Median :0.00000 Median :3.2500 Median :0.00000
5 : 6 Mean :0.08917 Mean :3.2669 Mean :0.04583
10 : 6 3rd Qu.:0.00000 3rd Qu.:4.5833 3rd Qu.:0.00000
11 : 6 Max. :1.00000 Max. :7.1667 Max. :1.00000
(Other):1164
height stunted visit2 visit3
Min. :-23.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.: -3.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median : 1.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean : 0.9058 Mean :0.1233 Mean :0.1783 Mean :0.1475
3rd Qu.: 4.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. : 25.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
visit4 visit5 visit6 sex
Min. :0.0000 Min. :0.0000 Min. :0.0000 male :708
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 female:492
Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.1525 Mean :0.1625 Mean :0.1675
3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000
pollenCount year dayInSeason temperature temperatureResidual
Min. : 0.00 1991:92 Min. : 1.00 Min. :37.31 Min. :-13.98210
1st Qu.: 1.00 1992:81 1st Qu.:22.00 1st Qu.:55.47 1st Qu.: -4.31318
Median : 9.00 1993:87 Median :43.50 Median :64.66 Median : -0.05665
Mean : 44.45 1994:74 Mean :43.29 Mean :63.29 Mean : 0.01246
3rd Qu.: 56.00 3rd Qu.:64.00 3rd Qu.:71.89 3rd Qu.: 4.16160
Max. :440.00 Max. :92.00 Max. :84.33 Max. : 16.43900
rain windSpeed
Min. :0.0000 Min. : 0.000
1st Qu.:1.0000 1st Qu.: 6.850
Median :1.0000 Median : 9.200
Mean :0.8982 Mean : 8.865
3rd Qu.:1.0000 3rd Qu.:10.800
Max. :1.0000 Max. :17.800
xcoord ycoord zcoord grade
Min. : 65.0 Min. :220.0 Min. : 77.0 Min. :0.370
1st Qu.:365.0 1st Qu.:320.0 1st Qu.: 87.0 1st Qu.:0.740
Median :465.0 Median :425.0 Median :102.0 Median :0.970
Mean :469.5 Mean :427.8 Mean :100.4 Mean :1.058
3rd Qu.:610.0 3rd Qu.:525.0 3rd Qu.:112.0 3rd Qu.:1.245
Max. :665.0 Max. :625.0 Max. :122.0 Max. :3.200
mod <-semireg(log(grade) ~ (xcoord+ycoord+zcoord)^2,smoothZ=list(grp_z=smZ(zcoord),grp_geo=smZ(cbind(xcoord, ycoord), type="Ztps") ), data=copper)
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: log(grade) ~ (xcoord + ycoord + zcoord)^2 + (1 | grp_z) + (1 |
grp_geo)
Data: copper
REML criterion at convergence: 318.4061
Random effects:
Groups Name Std.Dev.
grp_geo (Intercept) 0.001134
grp_z (Intercept) 0.016245
Residual 0.289394
Number of obs: 439, groups: grp_geo, 36; grp_z, 8
Fixed Effects:
(Intercept) xcoord ycoord zcoord xcoord:ycoord
0.003085275 -0.001736149 -0.000024458 0.002502775 -0.000002455
xcoord:zcoord ycoord:zcoord
-0.000013942 -0.000040482
fit warnings:
Some predictor variables are on very different scales: consider rescaling
Code
residplot(mod$semer)
ANOVA, RANOVA and \(R^2\)
ANOVA
Code
kable(anova(mod$semer))
Sum Sq
Mean Sq
NumDF
DenDF
F value
Pr(>F)
xcoord
0.0156795
0.0156795
1
2470291720.7005
0.1872202
0.6652403
ycoord
0.0000031
0.0000031
1
1562420796.9768
0.0000373
0.9951299
zcoord
0.4661560
0.4661560
1
1058.4382
5.5661082
0.0184923
xcoord:ycoord
0.0238179
0.0238179
1
1185516.1837
0.2843961
0.5938343
xcoord:zcoord
0.3288000
0.3288000
1
941.0343
3.9260172
0.0478351
ycoord:zcoord
2.1794253
2.1794253
1
870.7646
26.0232993
0.0000004
RANOVA
Code
kable(ranova(mod$semer))
npar
logLik
AIC
LRT
Df
Pr(>Chisq)
10
-159.2031
338.4061
NA
NA
NA
(1 | grp_z)
9
-235.9377
489.8754
153.4693
1
0
(1 | grp_geo)
9
-236.6876
491.3752
154.9691
1
0
\(R^2\)
Code
R2_glmm(mod$semer)
# Adjusted R2 for Mixed Models
Total Fixed Random grp_geo grp_z
"79.2%" "-135.63%" "214.83%" "196.83%" "18%"
The negative \(R^2\) indictates we may fit the model with intercept only.
Code
mod <-semireg(log(grade) ~1,smoothZ=list(grp_z=smZ(zcoord),grp_geo=smZ(cbind(xcoord, ycoord), type="Ztps") ), data=copper)
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: log(grade) ~ 1 + (1 | grp_z) + (1 | grp_geo)
Data: copper
REML criterion at convergence: 255.1183
Random effects:
Groups Name Std.Dev.
grp_geo (Intercept) 0.0009664
grp_z (Intercept) 0.0159936
Residual 0.3009139
Number of obs: 439, groups: grp_geo, 36; grp_z, 8
Fixed Effects:
(Intercept)
0.1598
Code
R2_glmm(mod$semer)
# Adjusted R2 for Mixed Models
Total Fixed Random grp_geo grp_z
"87.71%" "0%" "87.71%" "78.17%" "9.54%"
Cave, V. (2021). Confidence tricks: the 83.4% confidence interval for comparing means. https://vsni.co.uk/blogs/confidence_trick.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Harezlak J., Ruppert, D. & Wand, M.P. (2018). Semiparametric Regression with R. Springer; ISBN: 978-1-4939-8851-8.
Piepho HP. An adjusted coefficient of determination (R2 ) for generalized linear mixed models in one go. Biom J. 2023 Oct;65(7):e2200290. doi: 10.1002/bimj.202200290. Epub 2023 May 1. PMID: 37127864.
R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Source Code
---title: "Overview of R Package 'predictmeans'"author: "Dr. Dongwen Luo<br>AgResearch NZ Ltd"date: '`r Sys.Date()`'toc: TRUEtoc-location: lefttoc-depth: 4format: html-document: grid: sidebar-width: 300px body-width: 1100px margin-width: 100px gutter-width: 1.5rem code-fold: true css: AgRtstyles.css self-contained: true code-overflow: wrap code-block-bg: true code-block-border-left: "#31BAE9" code-tools: source: true toggle: false caption: noneexecute: echo: true warning: false message: false fig-align: centereditor_options: chunk_output_type: console---```{r}#| label: setuplibrary(tidyverse)library(predictmeans)library(HRW)library(gridExtra)library(car)library(DT)library(gt)library(knitr)# Data copper can be download at Example 8copper <-read.csv("copper.csv", header=TRUE)DT_df <-function(df){require(DT)stopifnot(is.data.frame(df)) df |> DT::datatable(extensions ='Buttons',options =list(dom ='Blfrtip', # set DOMbuttons =c('csv'),lengthMenu =list(c(10, 20, -1), # declare valuesc(10, 20, "All") # declare titles ), # end of lengthMenu customizationpageLength =10 ), rownames=FALSE)}```## **Outline**::: {style="font-size: 200%;"}- Motivation- Overview - Examples - Future Plan::: ## **Motivation**::: {style="font-size: 150%;"}- **Mixed Effects Models:** Essential for statistical analysis (e.g., split-plot experiments, repeated measurements, meta-analysis). However, popular R packages like 'nlme' and 'lme4' only provide model effects and ANOVA outputs, lacking further inference (predicted means, multiple comparisons, confidence intervals).- **Permutation Tests:** Ideal for small sample sizes, non-normal distributions, or when parametric test assumptions are unmet. They offer robust and flexible hypothesis testing in such scenarios.- **Semiparametric Regression:** Combines parametric and non-parametric effects for predictive variables. Widely valuable across astronomy, biology, medicine, economics, and finance applications.- **Package 'predictmeans':** Offers diagnostic and inference functions for various models ('aov', 'lm', 'glm', etc.). Inferences include predicted means, standard errors, contrasts, multiple comparisons, permutation tests, adjusted R-square, and graphical representations. <https://cran.r-project.org/web/packages/predictmeans/index.html>.::: ## **Overview**### **Main Functions Flowchart**### Contents## Example_1 Split-plot Design (Oats data)In the split-plot design shown here, the treatments are three varieties of oats (Victory, Golden rain and Marvellous) and four levels of nitrogen (0, 0.2, 0.4 and 0.6 cwt). As it is feasible to work with smaller plots for fertiliser than for varieties, the six blocks were initially split into three whole-plots and then each whole-plot was split into four subplots. The varieties were allocated (at random) to the whole-plots within each block, and the nitrogen levels (at random) to the subplots within each whole-plot. In a randomized-block design, we have a hierarchical structure with blocks and then plots within blocks.### Data Info::: {.panel-tabset .nav-pills}## Structure```{r}data(Oats, package="nlme")Oats$Block <-factor(Oats$Block, levels=unique(Oats$Block), ordered =TRUE)Oats <-data.frame(Oats)str(Oats)```## Plot```{r}#| fig-width: 8#| fig-height: 7Oats %>%ggplot(aes(x=nitro, y=yield, col=Variety))+geom_point(size=2)+geom_line(linewidth=1)+facet_wrap(vars(Block))+theme_bw(9)```:::### Modelling::: {.panel-tabset .nav-pills}## Model```{r}Oats$nitro <-factor(Oats$nitro)Oats$Whole_plot <- Oats$Varietymod <-lmer(yield ~ nitro*Variety+(1|Block/Whole_plot), data=Oats)summary(mod, corr=FALSE)```## Residual Plot```{r}residplot(mod, group="nitro") # id=TRUE```### HighlightsOptions in function 'residplot':- **{id=TRUE}** is useful to identify any outliers.- **{group="nitro"}** is useful to identify variation pattern along 'nitro' groups.## Test```{r}#perform shapiro-wilk testshapiro.test(resid(mod))#perform bartlett testbartlett.test(yield ~interaction(nitro:Variety), data = Oats)#perform levene test car::leveneTest(yield ~ nitro:Variety, data = Oats)```:::### ANOVA and $R^2$- **ANOVA**```{r}kable(anova(mod))```- **$R^2$**```{r}R2_glmm(mod)```### Predicted Means#### Predicted Means for "nitro"::: {.panel-tabset .nav-pills}## Mean Table```{r}predm_out <-predictmeans(mod, "nitro", meandecr=TRUE, bar=TRUE, adj="BH", prtplt=FALSE)print(predm_out[1:4])gt(predm_out$mean_table$Table, caption ="Group: Letter-based representation of pairwise comparisons at significant level '0.05'")```## Mean Plot```{r}#| fig-width: 8#| fig-height: 7predm_out$predictmeansPlot$meanPlotpredm_out$predictmeansPlot$ciPlotpredm_out$predictmeansBarPlot```## p-value Table & Plot```{r}gt(data.frame(predm_out$`Pairwise p-value`, check.names=FALSE), rownames_to_stub=TRUE, caption ="The matrix has t-value above the diagonal, p-value (adjusted by 'BH' method) below the diagonal") %>%cols_width(everything() ~px(130))``````{r}#| fig-width: 7#| fig-height: 6PMplot(predm_out$p_valueMatrix)```:::#### Predicted Means for "nitro:Variety"::: {.panel-tabset .nav-pills}## Mean Table```{r}predm_out <-predictmeans(mod, "nitro:Variety", meandecr=TRUE, bar=TRUE, adj="BH", prtplt=FALSE)print(predm_out[1:4])gt(predm_out$mean_table$Table, caption ="Group: Letter-based representation of pairwise comparisons at significant level '0.05'")```## Mean Plot```{r}#| fig-width: 8#| fig-height: 7predm_out$predictmeansPlot$meanPlotpredm_out$predictmeansPlot$ciPlotpredm_out$predictmeansBarPlot```## p-value Table & Plot```{r}gt(data.frame(predm_out$`Pairwise p-value`, check.names=FALSE), rownames_to_stub=TRUE,caption ="The matrix has t-value above the diagonal, p-value (adjusted by 'BH' method) below the diagonal") %>%cols_width(everything() ~px(130))``````{r}#| fig-width: 9#| fig-height: 8PMplot(predm_out$p_valueMatrix)```:::#### HighlightsOptions in function 'predictmens':- **{pair=TRUE}** is for multiple comparison without p-value adjusted ('adj' has default value 'none'). - **{adj="BH"}** is for multiple comparison with p-value adjusted by "BH" method ('pair' with be TRUE automatically when 'adj' is not none).- **{bar=TRUE}** is for producing bar chart.- **{meandecr=TRUE}** is for descending means in 'mean_table'.- **{meandecr=FALSE}** is for ascending means in 'mean_table'.- **Pairwise LSD** is available only when adj='none' or 'bonferroni'.### Permutation Test::: {.panel-tabset .nav-pills}## For Model Term```{r}# Permutation Test for model termssystem.time( permlme <-permmodels(model=mod, nperm=999))```## For Multiple Comparisons```{r}# Permutation Test for multiple comparisonspredictmeans(model=mod, modelterm="nitro:Variety", atvar="Variety", adj="BH", permlist=permlme, plot=FALSE)```## For Contrasts```{r}# Permutation Test for specified contrastscm <-rbind(c(-1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),c(0, 0, 1, 0, 0, 0, 0, -1, 0, 0, 0, 0))contrastmeans(model=mod, modelterm="nitro:Variety", ctrmatrix=cm, permlist=permlme)```:::## Example_2 Repeated Measurements (ATP data)ATP containing data from an experiment to study the effects of preserving liquids on the enzyme content of dog hearts. There were 23 hearts and two treatment factors, A and B, each at two levels. Measurements were made of ATP as a percentage of total enzyme in the heart, at one and two hourly intervals during a twelve hour period following initial preservation.### Data Info::: {.panel-tabset .nav-pills}## Structure```{r}data(ATP, package="predictmeans")ATP$time.v <-as.numeric(as.character(ATP$time))str(ATP)```## Plot```{r}#| fig-width: 8#| fig-height: 7ATP %>%ggplot(aes(x=time.v, y=ATP, col=A:B))+geom_point(size=2)+geom_line(linewidth=1)+facet_wrap(vars(heart))``````{r}#| fig-width: 8#| fig-height: 7ATP %>%ggplot(aes(x=time.v, y=ATP, col=A:B))+geom_point()+geom_smooth()+facet_grid(vars(A), vars(B))```:::### Modelling::: {.panel-tabset .nav-pills}## Model```{r}mod <-lmer(ATP^1.5~ A*B*time+(1| heart), ATP)summary(mod)```## Residual Plot```{r}residplot(mod, group="A") ```## Test```{r}#perform shapiro-wilk testshapiro.test(resid(mod))#perform bartlett testbartlett.test((ATP)^1.5~interaction(A, B), data = ATP)#perform levene testcar::leveneTest((ATP)^1.5~ A:B, data = ATP)```:::### ANOVA and $R^2$- **ANOVA**```{r}kable(anova(mod))```- **$R^2$**```{r}R2_glmm(mod)```### Predicted Means#### Predicted Means for "A:time"::: {.panel-tabset .nav-pills}## Mean Table```{r}predm_out <-predictmeans(mod, "A:time", atvar="time", adj="BH", bar=TRUE, trans=function(x) x^(1/1.5), prtplt=FALSE)print(predm_out[1:4])gt(predm_out$mean_table$Table, caption ="Group: Letter-based representation of pairwise comparisons at significant level '0.05' at each 'time'")```## Mean Plot```{r}#| fig-width: 8#| fig-height: 7predm_out$predictmeansPlot$meanPlotpredm_out$predictmeansPlot$ciPlotpredm_out$predictmeansBarPlot```## p-value Table- **Pairwise p-value matrix has p-value (adjusted by 'BH' method) below the diagonal at each 'time'**```{r}#| results='asis'for (i in6:15){if (i==1) {cat(paste("\n", predm_out[[i]], "\n\n", sp="")) }else{print(names(predm_out[i]))print(gt(as.data.frame.matrix(predm_out[[i]], check.names=FALSE), rownames_to_stub =TRUE)) }}```:::#### Predicted Means for "B:time"::: {.panel-tabset .nav-pills}## Mean Table```{r}predm_out <-predictmeans(mod, "B:time", atvar="time", adj="BH", bar=TRUE, trans=function(x) x^(1/1.5), prtplt=FALSE)print(predm_out[1:4])gt(predm_out$mean_table$Table, caption ="Group: Letter-based representation of pairwise comparisons at significant level '0.05' at each 'time'")```## Mean Plot```{r}#| fig-width: 8#| fig-height: 7predm_out$predictmeansPlot$meanPlotpredm_out$predictmeansPlot$ciPlotpredm_out$predictmeansBarPlot```## p-value Table- **Pairwise p-value matrix has p-value (adjusted by 'BH' method) below the diagonal at each 'time'**```{r}#| results='asis'for (i in6:15){if (i==1) {cat(paste("\n", predm_out[[i]], "\n\n", sp="")) }else{print(names(predm_out[i]))print(gt(as.data.frame.matrix(predm_out[[i]], check.names=FALSE),rownames_to_stub =TRUE)) }}```:::#### Predicted Means for "A:B:time"::: {.panel-tabset .nav-pills}## Mean Table```{r}predm_out <-predictmeans(mod, "A:B:time", atvar="time", adj="BH", trans=function(x) x^(1/1.5), prtplt=FALSE)print(predm_out[1:4])gt(predm_out$mean_table$Table, caption ="Group: Letter-based representation of pairwise comparisons at significant level '0.05' at each 'time'")```## Mean Plot```{r}#| fig-width: 8#| fig-height: 7predm_out$predictmeansPlot$meanPlotpredm_out$predictmeansPlot$ciPlot```## p-value Table- **Pairwise p-value matrix has p-value (adjusted by 'BH' method) below the diagonal at each 'time'**```{r}#| results='asis'for (i in6:15){if (i==1) {cat(paste("\n", predm_out[[i]], "\n\n", sp="")) }else{print(names(predm_out[i]))print(gt(as.data.frame.matrix(predm_out[[i]], check.names=FALSE),rownames_to_stub =TRUE)) }}```## p-value Plot```{r}#| fig-width: 9#| fig-height: 8PMplot(predm_out$p_valueMatrix)```:::#### HighlightsOptions in function 'predictmens':- **{atvar="time"}** sets multiple comparison within each level of 'time', it also infects the mean_table and mean plots arrangement. When 'atvar' is not NULL 'pair' turns to be TRUE automatically. Please try to see any effects for 'atvar=c("A", "B")' or 'atvar=c("B", "A")'.- **{trans=function(x) x^(1/1.5)}** is a function for back transformation e.g. 'trans=exp' for log, 'trans=function(x) x^2' for sqrt. - For any GLM, the function produces associated back transformation automatically.## Example_3 GLM with Covariate (Drug data)Comparison of the effectiveness of three analgesic drugs to a standard drug, morphine (Finney, Probit analysis, 3rd Edition 1971, p.103). 14 groups of mice were tested for response to the drugs at a range of doses. N is total number of mice in each group, R is number responding.### Data Info::: {.panel-tabset .nav-pills}## Structure```{r}data(Drug, package="predictmeans")str(Drug)```## Summary```{r}summary(Drug)```:::### Modelling::: {.panel-tabset .nav-pills}## Model```{r}mod <-glm(cbind(R, N-R) ~ log10Dose*Drug, family=binomial(link ="probit"), data=Drug)summary(mod)```## Residual Plot```{r}residplot(mod) ```:::### ANOVA```{r}kable(car::Anova(mod))```### Covariate Means#### Predicted Means for "Treatment"::: {.panel-tabset .nav-pills}## Mean Table```{r}covariate_out <-covariatemeans(model=mod, modelterm="Drug", covariate="log10Dose",level =0.166, # get 83.4% CInewwd =TRUE) covariate_out$pltdf %>%select(Drug, xvar, Mean, ses, LL, UL) %>%mutate_if(is.numeric, round, digits =4) %>%rename(log10Dose=xvar, Probability=Mean) %>%arrange(log10Dose, Drug) %>%group_by(log10Dose) %>%mutate(Letter=ci_mcp(LL, UL)) %>%# get multi-compare letterungroup() %>%DT_df()```## Mean Plot```{r}#| fig-width: 8#| fig-height: 7covariatemeans(model=mod, modelterm="Drug", covariate="log10Dose", level =0.166, trellis =FALSE)covariatemeans(model=mod, modelterm="Drug", covariate="log10Dose", level =0.166, trans=I, trellis =FALSE, ci=F)```:::### HighlightsOptions in function 'covariatemeans':- **{level = 0.166}** produces 83.4% confidence interval (CI) for predicted means. If a straightforward multiple comparison between predicted means is not feasible, an alternative method involves utilizing predicted means with 83.4% CIs. When the CIs of two predicted means overlap, the associated means are not significantly different at the 0.05 level; otherwise, they are considered significant. The function **'ci_mcp(LL, UL)'** is used to produce indicating letters.**DT_df** is a function for producing data table in html with 'CSV' download button. To download the whole data, you need to select 'All' entries.## Example_4 Semiparametric Regression with Group (WarsawApts data)'WarsawApts' is a subset of the data set 'apartments' in the R package PBImisc. This dataset contains the prices of the apartments which were sold in Warsaw, Poland, during the calendar years 2007 to 2009.### Data Info::: {.panel-tabset .nav-pills}## Structure```{r}data(WarsawApts, package="HRW")WarsawApts$district <-factor(WarsawApts$district)names(WarsawApts)[c(6)] <-c("apartment.price")str(WarsawApts)```## Plot```{r}#| fig-width: 9#| fig-height: 7ggplot(WarsawApts, aes(x=construction.date, y=apartment.price, color=district))+geom_point(size=1.6)+theme_bw(15)```:::### Smooth without Group::: {.panel-tabset .nav-pills}## Model```{r}mod1 <-semireg(apartment.price ~ construction.date,smoothZ=list(grp=smZ(construction.date, k=25) ), # Use of a O'Sullivan spline with 25 knots # to model the non-linear relationshipdata = WarsawApts)```## Residual Plot```{r}residplot(mod1$semer) ```:::### ANOVA, RANOVA and $R^2$- **ANOVA**```{r}kable(anova(mod1$semer))```- **RANOVA**```{r}kable(ranova(mod1$semer))```- **$R^2$**```{r}R2_glmm(mod1$semer)```### Prediction::: {.panel-tabset .nav-pills}## Predict Table```{r}pred_out1 <-semipred(mod1, covariate="construction.date", prt=FALSE)DT_df(pred_out1$pred_df %>%mutate_if(is.numeric, round, digits =4))```## Predict Plot```{r}#| fig-width: 9#| fig-height: 7pred_out1$plt```:::### Smooth with Group::: {.panel-tabset .nav-pills}## Model```{r}mod2 <-semireg(apartment.price ~ construction.date*district, smoothZ=list(grp=smZ(construction.date, k=10, by = district, group=TRUE) ),data=WarsawApts)```## Residual Plot```{r}residplot(mod2$semer, group="district") ```:::### ANOVA, RANOVA and $R^2$- **ANOVA**```{r}kable(anova(mod2$semer))```- **RANOVA**```{r}kable(ranova(mod2$semer))```- **$R^2$**```{r}R2_glmm(mod2$semer)```### Prediction with Group::: {.panel-tabset .nav-pills}## Predict Table```{r}pred_out2_1 <-semipred(mod2, "district", "construction.date", prt=FALSE)DT_df(pred_out2_1$pred_df %>%mutate_if(is.numeric, round, digits =4))```## Predict Plot```{r}#| fig-width: 9#| fig-height: 7pred_out2_1$plt```## Contrast TableMean Contrast table for District 'Mokotow' vs 'Srodmiescie' ```{r}pred_out2_2 <-semipred(mod2, "district", "construction.date", contr=c('Mokotow', 'Srodmiescie'), prt=FALSE)pred_out2_2$pred_df %>%mutate(P_value=ifelse(LL*UL >0, "p-value < 0.05", "p-value > 0.05")) %>%mutate_if(is.numeric, round, digits =4) %>%DT_df()```## Contrast Plot```{r}#| fig-width: 8#| fig-height: 7pred_out2_2$plt```:::## Example_5 Semiparametric Regression with Multiple Covariates (Caschool data)A data is collected from 420 schools in United States cross-section from 1998-1999, including county, grade span of district, percent qualifying for CalWORKS, computer per student, district average income, average math score.### Data Info::: {.panel-tabset .nav-pills}## Structure```{r}data(Caschool, package="Ecdat")Caschool <- Caschool %>%select(county, grspan, calwpct, compstu, avginc, mathscr) %>%mutate(log.avginc =log(Caschool$avginc))str(Caschool)```## Summary```{r}summary(Caschool)```:::### Smooth Modelling::: {.panel-tabset .nav-pills}## Model```{r}mod <-semireg_tmb(mathscr ~ calwpct+ log.avginc+ compstu+(1|county)+(1|grspan),smoothZ=list(cal_grp=smZ(calwpct, k=12), avg_grp=smZ(log.avginc, k=5),comp_grp=smZ(compstu, k=12) ), data=Caschool)```## Residual Plot```{r}residplot(mod$semer) ```:::### ANOVA and $R^2$- **ANOVA**```{r}kable(Anova(mod$semer))```- **$R^2$**```{r}R2_glmm(mod$semer)```### Prediction::: {.panel-tabset .nav-pills}## calwpct```{r}pred_out1 <-semipred(mod, covariate ="calwpct", prt=FALSE)DT_df(pred_out1$pred_df %>%mutate_if(is.numeric, round, digits =4))pred_out1$plt```## log.avginc```{r}pred_out2 <-semipred(mod, covariate ="log.avginc", prt=FALSE)DT_df(pred_out2$pred_df %>%mutate_if(is.numeric, round, digits =4))pred_out2$plt```## compstu```{r}pred_out3 <-semipred(mod, covariate ="compstu", prt=FALSE)DT_df(pred_out3$pred_df %>%mutate_if(is.numeric, round, digits =4))pred_out3$plt```## Joint Plot```{r}#| fig-width: 9#| fig-height: 8gridExtra::grid.arrange(pred_out1$plt, pred_out2$plt, pred_out3$plt, nrow =1)```:::## Example_6 Semiparametric Regression with Binomial Distribution (indonRespir data)Indonesian Children's Health Study of respiratory infections for a cohort of 275 Indonesian children. The data are longitudinal with each child having between 1 and 6 repeated measurements.### Data Info::: {.panel-tabset .nav-pills}## Structure```{r}data(indonRespir, package="HRW")indonRespir[, c(1,5)] <-lapply(indonRespir[, c(1,5)], factor)indonRespir$sex <- indonRespir$femalelevels(indonRespir$sex) <-c("male", "female")str(indonRespir)```## Summary```{r}summary(indonRespir)```:::### Smooth without Group::: {.panel-tabset .nav-pills}## Model```{r}mod1 <-semireg_tmb(respirInfec ~ age+vitAdefic + sex + height + stunted + visit2 + visit3 + visit4 + visit5 + visit6+(1|idnum),smoothZ=list(grp=smZ(age, k=25) ),family = binomial,data = indonRespir)```## Residual Plot```{r}residplot(mod1$semer) ```:::### ANOVA and $R^2$- **ANOVA**```{r}kable(car::Anova(mod1$semer))```- **$R^2$**```{r}R2_glmm(mod1$semer)```### Prediction::: {.panel-tabset .nav-pills}## Predict Table```{r}pred_out1 <-semipred(mod1, "age", "age", prt=FALSE)DT_df(pred_out1$pred_df %>%mutate_if(is.numeric, round, digits =4))```## Predict Plot```{r}#| fig-width: 8#| fig-height: 7pred_out1$plt+geom_rug(data =subset(indonRespir, respirInfec==0), sides ="b", col="deeppink")+geom_rug(data =subset(indonRespir, respirInfec==1), sides ="t", col="deeppink")+ylim(0, 0.18)```:::### Smooth with Group::: {.panel-tabset .nav-pills}## Model```{r}mod2 <-semireg_tmb(respirInfec ~ age+vitAdefic + sex + height + stunted + visit2 + visit3 + visit4 + visit5 + visit6+(1|idnum),smoothZ=list(grp=smZ(age, k=25, by=sex, group=TRUE) ),family = binomial,data = indonRespir)```## Residual Plot```{r}residplot(mod2$semer) ```:::### ANOVA and $R^2$- **ANOVA**```{r}kable(Anova(mod2$semer))```- **$R^2$**```{r}R2_glmm(mod2$semer)```### Prediction with Group::: {.panel-tabset .nav-pills}## Predict Table```{r}pred_out2_1 <-semipred(mod2, "sex", "age", prt=FALSE)DT_df(pred_out2_1$pred_df %>%mutate_if(is.numeric, round, digits =4))```## Predict Plot```{r}#| fig-width: 8#| fig-height: 7pred_out2_1$plt+geom_rug(data =subset(indonRespir, respirInfec==0), sides ="b", col="deeppink") +geom_rug(data =subset(indonRespir, respirInfec==1), sides ="t", col="deeppink")+ylim(0, 0.24)```## Contrast TableMean Contrast table for Sex 'male' vs 'female'```{r}pred_out2_2 <-semipred(mod2, "sex", "age", contr=c('male', 'female'), prt=FALSE)pred_out2_2$pred_df %>%mutate(P_value=ifelse(LL*UL >0, "p-value < 0.05", "p-value > 0.05")) %>%mutate_if(is.numeric, round, digits =4) %>%DT_df()```## Contrast Plot```{r}#| fig-width: 8#| fig-height: 7pred_out2_2$plt```:::## Example_7 Semiparametric Regression with Negtive Binomial Distribution (ragweed data)The ragweed data frame has data on ragweed levels and meteorological variables for 334 days in Kalamazoo, Michigan, U.S.A.### Data Info::: {.panel-tabset .nav-pills}## Structure```{r}data(ragweed, package="HRW")ragweed$year <-factor(ragweed$year)str(ragweed)```## Summary```{r}summary(ragweed)```:::### Smooth Modelling::: {.panel-tabset .nav-pills}## Model```{r}mod <-semireg_tmb(pollenCount ~ rain+year+temperatureResidual+dayInSeason+windSpeed,smoothZ=list(s_day=smZ(dayInSeason, by=year, group=TRUE),s_temp=smZ(temperatureResidual),s_wind=smZ(windSpeed) ),family = nbinom1,data=ragweed)```## Residual Plot```{r}residplot(mod$semer) ```:::### ANOVA and $R^2$- **ANOVA**```{r}kable(car::Anova(mod$semer))```- **$R^2$**```{r}R2_glmm(mod$semer)```### Prediction by Day in Season::: {.panel-tabset .nav-pills}## Predict Table```{r}pred_out1_1 <-semipred(mod, "year", "dayInSeason", sm_term="s_day", prt=FALSE)DT_df(pred_out1_1$pred_df %>%mutate_if(is.numeric, round, digits =4))```## Predict Plot```{r}#| fig-width: 8#| fig-height: 7pred_out1_1$plt```:::## Example_8 Semiparametric Regression with Spatial Smooth (copper data)Data copper records the copper grade quality associated with various sampling points' X, Y, and Z coordinates within a designated area.### Data Info::: {.panel-tabset .nav-pills}## Structure```{r}str(copper)```## Summary```{r}summary(copper)```## Data```{r}DT_df(copper)```:::### Modelling::: {.panel-tabset .nav-pills}## Model```{r}mod <-semireg(log(grade) ~ (xcoord+ycoord+zcoord)^2,smoothZ=list(grp_z=smZ(zcoord),grp_geo=smZ(cbind(xcoord, ycoord), type="Ztps") ), data=copper)```## Residual Plot```{r}residplot(mod$semer) ```:::### ANOVA, RANOVA and $R^2$- **ANOVA**```{r}kable(anova(mod$semer))```- **RANOVA**```{r}kable(ranova(mod$semer))```- **$R^2$**```{r}R2_glmm(mod$semer)```The negative $R^2$ indictates we may fit the model with intercept only.```{r}mod <-semireg(log(grade) ~1,smoothZ=list(grp_z=smZ(zcoord),grp_geo=smZ(cbind(xcoord, ycoord), type="Ztps") ), data=copper)``````{r}R2_glmm(mod$semer)```### Semiparametric prediction::: {.panel-tabset .nav-pills}## Z Table```{r}pred_out1 <-semipred(mod, covariate="zcoord", trans=exp, prt=FALSE)DT_df(pred_out1$pred_df %>%mutate_if(is.numeric, round, digits =4))```## Z Plot```{r}#| fig-width: 8#| fig-height: 7pred_out1$plt```## X, Y Table```{r}pred_out2 <-semipred(mod, covariate=c("xcoord", "ycoord"), trans=exp, point =TRUE, prt=FALSE)DT_df(pred_out2$pred_df %>%mutate_if(is.numeric, round, digits =4))```## X, Y 2D Plot```{r}pred_out2$plt```## X, Y 3D Plot```{r}pred_out3 <-semipred(mod, covariate=c("xcoord", "ycoord"), threeD =TRUE, point=FALSE, trans=exp, prt=FALSE)pred_out3$plt```:::## **Future Plan**::: {style="font-size: 200%;"}- Extend **'predictmeans'** to more popular models in R,- Build a comprehensive Shiny App based on **'predictmeans'**,- Make functions in **'predictmeans'** working properly for **glmmTMB** model with a complex variance-covariance structure.:::## **Reference**- Benjamini, Y., and Hochberg, Y. (1995). Controlling the falsediscovery rate: a practical and powerful approach to multipletesting. Journal of the Royal Statistical Society Series B, 57,289--300. <doi:10.1111/j.2517-6161.1995.tb02031.x>.<https://www.jstor.org/stable/2346101>.- Cave, V. (2021). Confidence tricks: the 83.4% confidence interval for comparing means. https://vsni.co.uk/blogs/confidence_trick.- Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).Fitting Linear Mixed-Effects Models Using lme4. Journal ofStatistical Software, 67(1), 1-48. <doi:10.18637/jss.v067.i01>.- Harezlak J., Ruppert, D. & Wand, M.P. (2018). Semiparametric Regression with R. Springer; ISBN: 978-1-4939-8851-8.- Luo, D., Ganesh, S., & Koolaard, J. (2022). Predictmeans: calculatepredicted means for linear models [R package version 1.0.8].<https://CRAN.R-project.org/package=predictmeans>.- Piepho HP. An adjusted coefficient of determination (R2 ) for generalized linear mixed models in one go. Biom J. 2023 Oct;65(7):e2200290. doi: 10.1002/bimj.202200290. Epub 2023 May 1. PMID: 37127864.- R Core Team (2022). R: A language and environment for statisticalcomputing. R Foundation for Statistical Computing, Vienna, Austria.URL <https://www.R-project.org/>.