Clean up the R environment using the code below.
rm(list=ls())
Ward and Quinn (1988) investigated differences in the fecundity (as measured by egg production) of a predatory intertidal gastropod (Lepsiella vinosa) in two different intertidal zones (mussel zone and higher littorinid zone) (Box 3.2 of Quinn and Keough (2002)).
Using the code below, set the working directory for this activity, then read the data file and create the data object ward.
Step 1
setwd("C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_7 T-Test")
getwd()
## [1] "C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_7 T-Test"
ward <- read.csv("ward.csv", header = TRUE, sep = ",")
head(ward)
## ZONE EGGS
## 1 Mussel 11
## 2 Mussel 8
## 3 Mussel 18
## 4 Mussel 10
## 5 Mussel 9
## 6 Mussel 13
tail(ward)
## ZONE EGGS
## 74 Littor 9
## 75 Littor 6
## 76 Littor 9
## 77 Littor 12
## 78 Littor 10
## 79 Littor 7
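The str() output in Step 2a lists ZONE as a factor, which is what read.csv() returned by default before R 4.0. The following is a minimal sketch, assuming R 4.0 or later, of requesting factors explicitly. It writes three head rows and three tail rows from the output above to a temporary file; the file and the object name ward_demo are illustrative and not part of the activity.

```r
# Illustrative only: six of the 79 rows, copied from the head()/tail() output above
demo_file <- tempfile(fileext = ".csv")
writeLines(c("ZONE,EGGS",
             "Mussel,11", "Mussel,8", "Mussel,18",
             "Littor,12", "Littor,10", "Littor,7"), demo_file)

# stringsAsFactors = TRUE converts text columns such as ZONE to factors
ward_demo <- read.csv(demo_file, header = TRUE, stringsAsFactors = TRUE)
str(ward_demo)
levels(ward_demo$ZONE)
```

With the real file, the same argument can be added to the read.csv("ward.csv", ...) call.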
Step 2a
Assess assumptions of normality and homogeneity of variance for the null hypothesis that the population mean egg production is the same for both littorinid and mussel zone Lepsiella. Use the code below:
code
summary(ward)
## ZONE EGGS
## Littor:37 Min. : 5.00
## Mussel:42 1st Qu.: 8.50
## Median :10.00
## Mean :10.11
## 3rd Qu.:12.00
## Max. :18.00
str(ward)
## 'data.frame': 79 obs. of 2 variables:
## $ ZONE: Factor w/ 2 levels "Littor","Mussel": 2 2 2 2 2 2 2 2 2 2 ...
## $ EGGS: int 11 8 18 10 9 13 15 12 9 15 ...
To view the differences between zones clearly, create a boxplot using the code below:
codes
library(ggplot2)
ggplot(data=ward, aes(x=ZONE, y=EGGS)) +
geom_boxplot(outlier.colour = "red", outlier.shape = 8, outlier.size = 4)
Description:
The outlier is marked with a red asterisk. An outlier is a value that lies far from the other values in the data set.
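As a rough sketch of how a point comes to be flagged: by default geom_boxplot() marks values lying more than 1.5 times the interquartile range beyond the quartiles. The vector x below is hypothetical and invented for illustration; it is not the ward data.

```r
# Hypothetical egg counts, invented for illustration (not the ward data)
x <- c(5, 8, 9, 9, 10, 10, 11, 12, 13, 25)

q     <- unname(quantile(x, c(0.25, 0.75)))  # first and third quartiles
iqr   <- q[2] - q[1]                         # interquartile range
lower <- q[1] - 1.5 * iqr                    # lower fence
upper <- q[2] + 1.5 * iqr                    # upper fence

# Values outside the fences are the ones drawn as outlier symbols
outliers <- x[x < lower | x > upper]
outliers
```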
Assess normality of the data using a quantile-quantile (Q-Q) plot
We were instructed to compare the base R script and output below with the Q-Q plot generated through ggplot.
qqplot
qqnorm(ward$EGGS, col=ward$EGGS)
qqline(ward$EGGS, lty=2)
ggplot
p <-ggplot(data=ward, aes(sample=EGGS))
p + stat_qq() + stat_qq_line()
Levene’s test
This is used to answer the following question:
Is the assumption of equal variances valid?
Package
pacman::p_load(car)
leveneTest(ward$EGGS,ward$ZONE)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.2012 0.655
## 77
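For intuition, Levene's test with center = median amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below uses base R only and just the twelve observations visible in the head() and tail() output, so its F value will not match the full-data result above. The object names are illustrative.

```r
# Twelve of the 79 observations, copied from the head()/tail() output in Step 1
eggs <- c(11, 8, 18, 10, 9, 13,   9, 6, 9, 12, 10, 7)
zone <- factor(rep(c("Mussel", "Littor"), each = 6))

# Absolute deviation of each observation from its own group median
abs_dev <- abs(eggs - ave(eggs, zone, FUN = median))

# One-way ANOVA on those deviations gives the Levene (center = median) F test
levene_tab <- anova(lm(abs_dev ~ zone))
levene_tab
```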
We were instructed that the test of homogeneity of variance (HOV) can be skipped altogether by always running Welch’s t-test, which does not assume homogeneity of variance (Mark White). This is the default for t.test() in R; it can be overridden with var.equal = TRUE to obtain the pooled variances test.
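A small sketch of that default, using simulated data (the values, the seed and the names y and g are assumptions made for illustration only):

```r
set.seed(1)  # simulated data, for illustration only
y <- c(rnorm(10, mean = 10, sd = 1), rnorm(15, mean = 11, sd = 3))
g <- factor(rep(c("a", "b"), times = c(10, 15)))

welch  <- t.test(y ~ g)                    # default: var.equal = FALSE
pooled <- t.test(y ~ g, var.equal = TRUE)  # pooled variances test

welch$method
pooled$method

# Welch degrees of freedom are adjusted downwards; pooled df = n1 + n2 - 2
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
```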
Notes from http://www.statmethods.net/stats/withby.html on using with() and by(): these two functions can help write simpler and more efficient code. The with() function applies an expression to a data set, similar to DATA= in SAS. Its form is with(data, expression); for example, with(mydata, t.test(y ~ group)) applies a t-test to the data frame mydata. The rbind() function combines vectors, matrices or data frames by rows.
We calculated the mean, variance and standard deviation of each group using the functions with() and tapply().
code
with(ward, rbind(MEAN=tapply(EGGS, ZONE, mean),
VAR=tapply(EGGS, ZONE, var),
SD=tapply(EGGS, ZONE, sd)))
## Littor Mussel
## MEAN 8.702703 11.357143
## VAR 4.103604 5.357143
## SD 2.025735 2.314550
Result from Quinn:
Note that standard deviations (and therefore the variances) are similar and boxplots do not suggest any asymmetry, so a parametric t-test is appropriate. From Logan p. 143, Conclusions: There was no evidence of non-normality (boxplots not grossly asymmetrical) or unequal variances (boxplots very similar in size and variances very similar). Hence, the simple studentized (pooled variances) t-test is likely to be reliable.
Step 3
We performed a pooled variances t-test to test the null hypothesis that the population mean egg production is the same for Lepsiella in the littorinid and mussel zones.
t.test(EGGS~ZONE, data=ward, var.equal= TRUE)
##
## Two Sample t-test
##
## data: EGGS by ZONE
## t = -5.3899, df = 77, p-value = 7.457e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.63511 -1.67377
## sample estimates:
## mean in group Littor mean in group Mussel
## 8.702703 11.357143
Description
Conclusions:
The null hypothesis was rejected because the p-value (7.457e-07) is far below 0.05, so the population mean egg production is unlikely to be the same in the littorinid and mussel zones. As shown in the Two Sample t-test output, the mean in the littorinid group is ~9 while that in the mussel group is ~11, and the 95% confidence interval for the difference (-3.64 to -1.67) does not include 0, consistent with the alternative hypothesis.
Egg production by predatory gastropods (Lepsiella vinosa) was significantly greater (t = -5.39, df = 77, P < 0.001) in mussel zones than in littorinid zones on rocky intertidal shores.
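As a cross-check, the pooled variances t statistic can be rebuilt from the group sizes, means and variances reported in Step 2a. This is a sketch of the arithmetic rather than a replacement for t.test(); the object names are illustrative.

```r
# Summary statistics copied from the summary() and tapply() output above
n <- c(Littor = 37, Mussel = 42)               # group sizes
m <- c(Littor = 8.702703, Mussel = 11.357143)  # group means
v <- c(Littor = 4.103604, Mussel = 5.357143)   # group variances

df     <- sum(n) - 2                 # degrees of freedom
s2_p   <- sum((n - 1) * v) / df      # pooled variance
se     <- sqrt(s2_p * sum(1 / n))    # standard error of the difference in means
d      <- unname(m["Littor"] - m["Mussel"])
t_stat <- d / se
p_val  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
ci     <- d + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df), 4)
p_val
ci
```

The values should agree with the t.test() output in Step 3 up to rounding of the inputs.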
Furness and Bryant (1996) measured the metabolic rates of eight male and six female breeding northern fulmars and were interested in testing the null hypothesis that there was no difference in metabolic rate between the sexes (Box 3.2 of Quinn and Keough (2002)).
To start this activity, I first reset R’s memory using the code below:
rm(list=ls())
Step 1
I imported the Furness and Bryant (1996) data set used in this activity with the code below.
setwd("C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_7 T-Test")
getwd()
## [1] "C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_7 T-Test"
furness <- read.csv("furness.csv", header = TRUE, sep = ",")
head(furness)
## SEX METRATE BODYMASS
## 1 Male 2950.0 875
## 2 Female 1956.1 635
## 3 Male 2308.7 765
## 4 Male 2135.6 780
## 5 Male 1945.6 790
## 6 Female 1490.5 635
tail(furness)
## SEX METRATE BODYMASS
## 9 Female 1091.0 645
## 10 Male 1195.5 788
## 11 Female 727.7 635
## 12 Male 843.3 855
## 13 Male 525.8 860
## 14 Male 605.7 1005
str(furness)
## 'data.frame': 14 obs. of 3 variables:
## $ SEX : Factor w/ 2 levels "Female ",..: 2 1 2 2 2 1 1 1 1 2 ...
## $ METRATE : num 2950 1956 2309 2136 1946 ...
## $ BODYMASS: int 875 635 765 780 790 635 668 640 645 788 ...
Step 2
In this activity, I assessed assumptions of normality and homogeneity of variance for the null hypothesis that the population mean metabolic rate is the same for male and female breeding northern fulmars.
boxplot(METRATE~SEX, furness)
with(furness, rbind(MEAN=tapply(METRATE, SEX, mean),
VAR=tapply(METRATE, SEX, var),
SD=tapply(METRATE, SEX, sd)))
## Female Male
## MEAN 1285.5167 1563.7750
## VAR 177209.4177 799902.5250
## SD 420.9625 894.3727
To check normality visually, I drew a Q-Q plot by running the code below.
qqnorm(furness$METRATE)
qqline(furness$METRATE)
We were instructed to use ggplot to produce both a boxplot and a quantile-quantile plot to check normality visually.
In this activity, we used the Shapiro-Wilk test to test for normality.
shapiro.test(furness$METRATE)
##
## Shapiro-Wilk normality test
##
## data: furness$METRATE
## W = 0.94392, p-value = 0.4709
Step 2b
Levene’s test can be used to answer the following question:
Is the assumption of equal variances valid?
We used the car package:
pacman::p_load(car)
leveneTest(furness$METRATE,furness$SEX)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 7.4057 0.01856 *
## 12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion
While there is no evidence of non-normality (boxplots not grossly asymmetrical; Shapiro-Wilk P = 0.471), the variances are unequal (Levene’s test, P = 0.019), although perhaps not grossly so: neither boxplot is more than three times smaller than the other. Hence, a separate variances t-test is more appropriate than a pooled variances t-test.
Step 3
Welch’s T-test
We were instructed to perform a separate variances (Welch’s) t-test to test the null hypothesis that the population mean metabolic rate is the same for both male and female breeding northern fulmars.
t.test(METRATE~SEX, furness, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: METRATE by SEX
## t = -0.77317, df = 10.468, p-value = 0.4565
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1075.3208 518.8042
## sample estimates:
## mean in group Female mean in group Male
## 1285.517 1563.775
Conclusion
Do not reject the null hypothesis. The metabolic rate of male breeding northern fulmars was not found to differ significantly (t = -0.773, df = 10.468, P = 0.457) from that of females.
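The fractional degrees of freedom come from the Welch-Satterthwaite approximation. As a sketch of the arithmetic, the statistic and its df can be rebuilt from the group sizes stated in the introduction and the means and variances reported in Step 2; the object names are illustrative.

```r
# Summary statistics copied from the text and the tapply() output above
n <- c(Female = 6, Male = 8)                      # group sizes
m <- c(Female = 1285.5167, Male = 1563.7750)      # group means
v <- c(Female = 177209.4177, Male = 799902.5250)  # group variances

se2    <- v / n                                   # squared SE of each group mean
t_stat <- unname((m["Female"] - m["Male"]) / sqrt(sum(se2)))
df_w   <- sum(se2)^2 / sum(se2^2 / (n - 1))       # Welch-Satterthwaite df
p_val  <- 2 * pt(-abs(t_stat), df_w)              # two-sided p-value

round(c(t = t_stat, df = df_w, p = p_val), 4)
```

Unlike the pooled test, each group keeps its own variance, which is why the much larger male variance does not get averaged away.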
To investigate the effects of lighting conditions on orb-spinning spider webs, Elgar et al. (1996) measured the horizontal (width) and vertical (height) dimensions of the webs made by 17 spiders under light and dim conditions. Accepting that the webs of individual spiders vary considerably, Elgar et al. (1996) employed a paired design in which each individual spider effectively acts as its own control. A paired t-test performs a one sample t-test on the differences between dimensions under light and dim conditions (Box 3.3 of Quinn and Keough (2002)).
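The equivalence between a paired t-test and a one sample t-test on the differences can be checked directly. The sketch below uses only the first six spiders, copied from the head(elgar) output shown in Step 1 of this example, so the numbers differ from the full analysis; the vector names are illustrative.

```r
# First six spiders only, copied from the head(elgar) output
horiz_dim <- c(295, 260, 280, 250, 160, 150)  # web width, dim conditions
horiz_lig <- c(60, 140, 160, 120, 180, 90)    # web width, light conditions

paired     <- t.test(horiz_lig, horiz_dim, paired = TRUE)
one_sample <- t.test(horiz_lig - horiz_dim, mu = 0)

# Both calls give the same t statistic, df and p-value
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
c(paired = paired$p.value, one_sample = one_sample$p.value)
```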
Reset R’s memory using the code:
rm(list=ls())
Step 1
Import the Elgar et al. (1996) data set using the code below.
setwd("C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_7 T-Test")
getwd()
## [1] "C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_7 T-Test"
elgar <- read.csv("elgar.csv", header = TRUE, sep = ",")
head(elgar)
## PAIR VERTDIM HORIZDIM VERTLIGH HORIZLIG
## 1 K 300 295 80 60
## 2 M 240 260 120 140
## 3 N 250 280 170 160
## 4 O 220 250 90 120
## 5 P 160 160 150 180
## 6 R 170 150 110 90
tail(elgar)
## PAIR VERTDIM HORIZDIM VERTLIGH HORIZLIG
## 12 F 270 270 300 330
## 13 G 130 150 160 100
## 14 I 190 210 300 240
## 15 J 190 200 280 190
## 16 U 120 160 190 170
## 17 W 180 160 100 100
str(elgar)
## 'data.frame': 17 obs. of 5 variables:
## $ PAIR : Factor w/ 17 levels "A","B","D+","E",..: 9 10 11 12 13 14 15 1 2 3 ...
## $ VERTDIM : int 300 240 250 220 160 170 300 180 200 80 ...
## $ HORIZDIM: int 295 260 280 250 160 150 290 120 210 120 ...
## $ VERTLIGH: int 80 120 170 90 150 110 260 240 190 120 ...
## $ HORIZLIG: int 60 140 160 120 180 90 120 220 210 150 ...
Description
Note that rather than being organized in the usual long format, in which variables are represented in columns and rows represent individual replicates, these data have been organized in wide format. Wide format is often used for data containing repeated measures from individuals or other sampling units. Whilst this is not necessary (paired t-tests can also be performed on long format data), it traditionally allowed more compact data management and made it easier to calculate the differences between repeated measurements on each individual.
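To make the two layouts concrete, here is a minimal sketch that stacks the width columns into long format and confirms that the paired test gives the same answer either way. It uses only the first six spiders from the head(elgar) output above; the column names LIGHT and WIDTH are hypothetical labels chosen for illustration.

```r
# First six spiders only, copied from the head(elgar) output above
elgar_wide <- data.frame(PAIR     = c("K", "M", "N", "O", "P", "R"),
                         HORIZDIM = c(295, 260, 280, 250, 160, 150),
                         HORIZLIG = c(60, 140, 160, 120, 180, 90))

# Stack the two width columns and record the lighting condition of each row
elgar_long <- data.frame(
  PAIR  = rep(elgar_wide$PAIR, times = 2),
  LIGHT = rep(c("dim", "light"), each = nrow(elgar_wide)),
  WIDTH = c(elgar_wide$HORIZDIM, elgar_wide$HORIZLIG)
)

# Spiders stay in the same order within each condition, so pairs still line up
long_test <- with(elgar_long,
                  t.test(WIDTH[LIGHT == "light"], WIDTH[LIGHT == "dim"],
                         paired = TRUE))
wide_test <- with(elgar_wide, t.test(HORIZLIG, HORIZDIM, paired = TRUE))

c(long = unname(long_test$statistic), wide = unname(wide_test$statistic))
```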
Step 2
In this activity, we also assessed whether the differences in web width (and height) between light and dim conditions are normally distributed.
with(elgar, boxplot(HORIZLIG-HORIZDIM))
with(elgar, boxplot(VERTLIGH-VERTDIM))
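One possible ggplot version of these boxplots is sketched below. It assumes the ggplot2 package is installed (the plotting step is skipped otherwise) and uses only the first six spiders from the head(elgar) output in Step 1, so the picture is illustrative rather than a substitute for the full-data plots. The names elgar_demo, diffs, DIMENSION and DIFFERENCE are hypothetical.

```r
# First six spiders only, copied from the head(elgar) output in Step 1
elgar_demo <- data.frame(
  VERTDIM  = c(300, 240, 250, 220, 160, 170),
  HORIZDIM = c(295, 260, 280, 250, 160, 150),
  VERTLIGH = c(80, 120, 170, 90, 150, 110),
  HORIZLIG = c(60, 140, 160, 120, 180, 90)
)

# Light-minus-dim differences, stacked so both dimensions share one plot
diffs <- data.frame(
  DIMENSION  = rep(c("Width", "Height"), each = nrow(elgar_demo)),
  DIFFERENCE = with(elgar_demo, c(HORIZLIG - HORIZDIM, VERTLIGH - VERTDIM))
)

# Draw the plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(diffs, aes(x = DIMENSION, y = DIFFERENCE)) +
    geom_boxplot() +
    labs(x = "Web dimension", y = "Light minus dim difference")
  print(p)
}
```

With the full data set, replacing elgar_demo by elgar should give the ggplot counterpart of the two base R boxplots.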
Conclusion
There is no evidence of non-normality for either the difference in widths or the difference in heights of webs under light and dim ambient conditions. Therefore, paired t-tests are likely to be reliable tests of the hypotheses that the mean web dimensional differences are equal to zero.
Step 3
We were instructed to perform two separate paired t-tests to test the respective null hypotheses.
Using the code below, we tested the effect of lighting on web width.
with(elgar, t.test(HORIZLIG,HORIZDIM,paired = TRUE))
##
## Paired t-test
##
## data: HORIZLIG and HORIZDIM
## t = -2.1482, df = 16, p-value = 0.04735
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -91.7443687 -0.6085725
## sample estimates:
## mean of the differences
## -46.17647
The code below was run to test the effect of lighting on web height.
with(elgar,t.test(VERTLIGH,VERTDIM,paired = TRUE))
##
## Paired t-test
##
## data: VERTLIGH and VERTDIM
## t = -0.96545, df = 16, p-value = 0.3487
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -65.79532 24.61885
## sample estimates:
## mean of the differences
## -20.58824
Conclusion
Orb-spinning spider webs were found to be significantly wider (t = -2.148, df = 16, P = 0.047) under dim lighting conditions than under light conditions, yet were not found to differ in height (t = -0.965, df = 16, P = 0.349).
Sokal and Rohlf (1997) presented a dataset comprising the lengths of cheliceral bases (in µm) from two samples of chigger (Trombicula lipovskyi) nymphs. These data were used to illustrate two equivalent tests (Mann-Whitney U-test and Wilcoxon two-sample test) of location equality (Box 13.7 of Sokal and Rohlf (1997)).
Reset R’s memory
rm(list=ls())
Step 1
Import the nymph data set using the code below.
setwd("C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_7 T-Test")
getwd()
## [1] "C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_7 T-Test"
nymphs <- read.csv("nymphs.csv", header = TRUE, sep = ",")
head(nymphs)
## SAMPLE LENGTH
## 1 Sample A 104
## 2 Sample A 109
## 3 Sample A 112
## 4 Sample A 114
## 5 Sample A 116
## 6 Sample A 118
tail(nymphs)
## SAMPLE LENGTH
## 21 Sample B 108
## 22 Sample B 111
## 23 Sample B 116
## 24 Sample B 120
## 25 Sample B 121
## 26 Sample B 123
str(nymphs)
## 'data.frame': 26 obs. of 2 variables:
## $ SAMPLE: Factor w/ 2 levels "Sample A","Sample B": 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH: int 104 109 112 114 116 118 118 119 121 123 ...
Step 2
We assessed assumptions of normality and homogeneity of variance for the null hypothesis.
boxplot(LENGTH~SAMPLE, data=nymphs)
In your PLP, use ggplot to produce both boxplot and quantile-quantile plot to check for normality, see scripts from Example 1. Label the plots.
with(nymphs,rbind(MEAN=tapply(LENGTH,SAMPLE,mean),
VAR=tapply(LENGTH,SAMPLE,var),
SD=tapply(LENGTH,SAMPLE,sd)))
## Sample A Sample B
## MEAN 119.68750 111.800000
## VAR 53.29583 60.177778
## SD 7.30040 7.757434
pacman::p_load(car)
leveneTest(nymphs$LENGTH, nymphs$SAMPLE)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.0749 0.7867
## 24
Conclusions
While there is no evidence of unequal variance, there is some (possible) evidence of non-normality (boxplots slightly asymmetrical). These data will therefore be analysed using the non-parametric Mann-Whitney-Wilcoxon rank sum test.
Step 3
Perform a Mann-Whitney-Wilcoxon test to investigate the null hypothesis that the length of cheliceral bases has the same location in the two samples of chigger (Trombicula lipovskyi) nymphs.
wilcox.test(LENGTH~SAMPLE,nymphs)
## Warning in wilcox.test.default(x = c(104L, 109L, 112L, 114L, 116L, 118L, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: LENGTH by SAMPLE
## W = 123.5, p-value = 0.0232
## alternative hypothesis: true location shift is not equal to 0
Conclusions
Reject the null hypothesis. The cheliceral base is significantly longer in nymphs from Sample A (W = 123.5, P = 0.023) than in those from Sample B.
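To show where W comes from, the sketch below computes it from ranks and compares it with wilcox.test(). Only part of the data is visible in this report (the ten Sample A values printed by str() and the six Sample B values printed by tail()), so the result here will not equal the W = 123.5 obtained from all 26 observations. The names a, b and w_manual are illustrative.

```r
# Partial data only: ten Sample A values from str() and six Sample B values
# from tail(); the full data set has 26 observations
a <- c(104, 109, 112, 114, 116, 118, 118, 119, 121, 123)
b <- c(108, 111, 116, 120, 121, 123)

# Rank all observations together; tied values receive the average rank
r <- rank(c(a, b))

# W is the rank sum of the first sample minus its smallest possible value
w_manual <- sum(r[seq_along(a)]) - length(a) * (length(a) + 1) / 2

# exact = FALSE asks for the normal approximation directly, which is what R
# falls back to (with a warning) when ties are present
w_test <- wilcox.test(a, b, exact = FALSE)

c(manual = w_manual, wilcox.test = unname(w_test$statistic))
```

The tie warning in the output above has the same cause: tied lengths make an exact p-value unavailable, so the normal approximation with continuity correction is reported instead.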