These commands help visualize the F distribution, make F probability calculations, e.g., to calculate p-values, and run F tests in the context of analysis of variance (ANOVA). ANOVA can be used to analyze how the mean of a quantitative response variable varies with the (>2) levels of a categorical predictor variable. Some commands require mosaic (noted in comments) so you should install that package.
library(mosaic)
Visualizing the F distribution using plotDist():
# plotDist() is in the mosaic package
plotDist(dist='f', df1=5, df2=79, col="blue")
plotDist(dist='f', df1=3, df2=79, col="red", add=TRUE)
Making F cummulative probability calculations using pf():
# Note that we are usually use positive F values and
# are interested in upper tail probabilities.
# so use lower.tail=FALSE
F.prob <- pf(q=2.82, df1=3, df2=46, lower.tail=FALSE)
F.prob
## [1] 0.04925044
Making and visualizing \(F\) cummulative probability calculations using xpf():
# xpf() is in the mosaic package
xpf(q=2.82, df1=3, df2=46, lower.tail=FALSE)
## [1] 0.04925044
Finding critical \(F^∗\) values using qt():
# since F tests are always upper, we want 0.5 in the upper tail
F.crit <- qf(p=0.05, df1=3, df2=46, lower.tail=FALSE)
F.crit
## [1] 2.806845
Finding and visualizing critical \(F^∗\) values using xqt():
# xpf() is in the mosaic package
# since F tests are always upper, we want 0.5 in the upper tail
# same as 0.95 in the low end
xqf(p=0.95, df1=3, df2=46)
## [1] 2.806845
lm()# We will use the dataset "chickwts" from the 'datasets' package included with R
# Chick weights reared on six feed types
data(chickwts)
head(chickwts) #extra
## weight feed
## 1 179 horsebean
## 2 160 horsebean
## 3 136 horsebean
## 4 227 horsebean
## 5 217 horsebean
## 6 168 horsebean
Run an ANOVA to test the null hypothesis that \(\mu_1 = \mu_2 = ... = \mu_6\) using lm():
# lm() constructs a linear model of a quantitative variable
# if predictor is a categorical factor, this is ANOVA
# if predictor is quantitative, this is regression
chick.mod <- lm(weight~feed, data=chickwts)
# lm() just makes the model, to look at an ANOVA table we can use anova()
anova(chick.mod)
## Analysis of Variance Table
##
## Response: weight
## Df Sum Sq Mean Sq F value Pr(>F)
## feed 5 231129 46226 15.365 5.936e-10 ***
## Residuals 65 195556 3009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Summarize the model and examine \(t\)-tests for the individual feed types compared to the baseline factor level using summary():
summary(chick.mod)
##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.909 -34.413 1.571 38.170 103.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.583 15.834 20.436 < 2e-16 ***
## feedhorsebean -163.383 23.485 -6.957 2.07e-09 ***
## feedlinseed -104.833 22.393 -4.682 1.49e-05 ***
## feedmeatmeal -46.674 22.896 -2.039 0.045567 *
## feedsoybean -77.155 21.578 -3.576 0.000665 ***
## feedsunflower 5.333 22.393 0.238 0.812495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5064
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
Perform multiple comparisons for all pairwise levels of the predictor variable using TukeyHSD():
TukeyHSD(chick.mod)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = x)
##
## $feed
## diff lwr upr p adj
## horsebean-casein -163.383333 -232.346876 -94.41979 0.0000000
## linseed-casein -104.833333 -170.587491 -39.07918 0.0002100
## meatmeal-casein -46.674242 -113.906207 20.55772 0.3324584
## soybean-casein -77.154762 -140.517054 -13.79247 0.0083653
## sunflower-casein 5.333333 -60.420825 71.08749 0.9998902
## linseed-horsebean 58.550000 -10.413543 127.51354 0.1413329
## meatmeal-horsebean 116.709091 46.335105 187.08308 0.0001062
## soybean-horsebean 86.228571 19.541684 152.91546 0.0042167
## sunflower-horsebean 168.716667 99.753124 237.68021 0.0000000
## meatmeal-linseed 58.159091 -9.072873 125.39106 0.1276965
## soybean-linseed 27.678571 -35.683721 91.04086 0.7932853
## sunflower-linseed 110.166667 44.412509 175.92082 0.0000884
## soybean-meatmeal -30.480519 -95.375109 34.41407 0.7391356
## sunflower-meatmeal 52.007576 -15.224388 119.23954 0.2206962
## sunflower-soybean 82.488095 19.125803 145.85039 0.0038845
Make a figure of weight as a function of feed type.
# can include 95% CI for means if we include the "Hmisc" package
library(Hmisc)
ggplot(chickwts, aes(x=feed, y=weight)) +
geom_jitter(width=0.1) +
stat_summary(fun.data="mean_cl_normal", col="blue")
Reorder the feed typed based on the mean response and add groups from the Tukey HSD multiple comparison test.
ggplot(chickwts, aes(x=reorder(feed,weight,mean), y=weight)) +
geom_jitter(width=0.1) +
stat_summary(fun.data="mean_cl_normal", col="blue") +
annotate("text", x=1:6, y=420, col="black", label=c("A","AB", "B", "BC", "C", "C")) +
labs(x="feed type", y="chick weight")
Note that these confidence intervals are not adjusted for the multiple comparisons!