Probabilities
For a normal random variable X with mean 5 and standard deviation 2, find the probability that X is less than 3. Find the probability that X is greater than 4.5.
pnorm(3.0, 5,2)
## [1] 0.1586553
pnorm(4.5, 5,2, lower.tail=FALSE)
## [1] 0.5987063
#or
1-pnorm(4.5,5,2)
## [1] 0.5987063
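As a cross-check, the same probabilities can be obtained by standardizing X to a standard normal Z = (X - mean) / sd. This is a minimal sketch; the variable names (mu, sigma, p_less, p_greater) are introduced here for illustration only.

```r
# Standardize by hand, then use the standard normal (mean 0, sd 1)
mu <- 5
sigma <- 2

# P(X < 3) = P(Z < (3 - 5) / 2) = P(Z < -1)
p_less <- pnorm((3 - mu) / sigma)

# P(X > 4.5) = P(Z > (4.5 - 5) / 2) = P(Z > -0.25)
p_greater <- pnorm((4.5 - mu) / sigma, lower.tail = FALSE)

print(c(p_less = p_less, p_greater = p_greater))
```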
Find the value K so that P(X > K) = 0.05.
qnorm(0.95, 5, 2)
## [1] 8.289707
#or
qnorm(0.05, 5, 2, lower.tail=FALSE)
## [1] 8.289707
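One way to confirm the answer is to plug K back into pnorm and check that the upper-tail probability is 0.05. The sketch below also rebuilds K from the standard normal quantile; the names K, K_manual and tail_prob are illustrative.

```r
# K is the 95th percentile of N(mean = 5, sd = 2)
K <- qnorm(0.95, mean = 5, sd = 2)

# Same value from the standard normal quantile: K = mean + sd * z
K_manual <- 5 + 2 * qnorm(0.95)

# Plugging K back in should return the requested upper-tail probability
tail_prob <- pnorm(K, mean = 5, sd = 2, lower.tail = FALSE)

print(c(K = K, K_manual = K_manual, tail_prob = tail_prob))
```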
When tossing a fair coin 10 times, find the probability of seeing no heads. Find the probability of seeing exactly 5 heads. Find the probability of seeing more than 7 heads.
dbinom(x = 0, size = 10, prob = 0.5)
## [1] 0.0009765625
dbinom(x = 5, size = 10, prob = 0.5)
## [1] 0.2460938
1-pbinom(q=7, size=10, prob=0.5)
## [1] 0.0546875
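The binomial answers can also be reproduced from the probability mass function directly, which may help show what dbinom and pbinom compute. This is a sketch with illustrative variable names (n, p, p_none, p_five, p_more_than_7, p_tail).

```r
n <- 10   # number of tosses
p <- 0.5  # probability of heads for a fair coin

# P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k)
p_none <- choose(n, 0) * p^0 * (1 - p)^n
p_five <- choose(n, 5) * p^5 * (1 - p)^(n - 5)

# "More than 7 heads" means 8, 9 or 10 heads
p_more_than_7 <- sum(dbinom(8:10, size = n, prob = p))

# Equivalent upper-tail call, avoiding the explicit 1 - pbinom(...)
p_tail <- pbinom(7, size = n, prob = p, lower.tail = FALSE)

print(c(p_none, p_five, p_more_than_7, p_tail))
```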
Univariate Distributions
Simulate a sample of 100 random data points from a normal distribution with mean 100 and standard deviation 5, and store the result in a vector.
Plot a histogram and a boxplot of the vector you just created.
Calculate the sample mean and standard deviation.
Calculate the median and interquartile range.
Using the data above, test the hypothesis that the mean equals 100 (using t.test).
Test the hypothesis that the mean equals 90.
Repeat the above two tests using a Wilcoxon signed rank test. Compare the p-values with those from the t-tests you just did.
x <- rnorm(n=100, mean=100, sd=5)
par(mfrow=c(1,2))
hist(x)
boxplot(x)
mean(x)
## [1] 100.334
sd(x)
## [1] 4.981055
median(x)
## [1] 100.3069
IQR(x)
## [1] 7.454873
t.test(x, mu=100)
##
## One Sample t-test
##
## data: x
## t = 0.67044, df = 99, p-value = 0.5041
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 99.3456 101.3223
## sample estimates:
## mean of x
## 100.334
t.test(x, mu=90)
##
## One Sample t-test
##
## data: x
## t = 20.747, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 90
## 95 percent confidence interval:
## 99.3456 101.3223
## sample estimates:
## mean of x
## 100.334
wilcox.test(x, mu=100)
##
## Wilcoxon signed rank test with continuity correction
##
## data: x
## V = 2684, p-value = 0.5858
## alternative hypothesis: true location is not equal to 100
wilcox.test(x, mu=90)
##
## Wilcoxon signed rank test with continuity correction
##
## data: x
## V = 5045, p-value < 2.2e-16
## alternative hypothesis: true location is not equal to 90
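To compare the p-values side by side, they can be pulled out of the test objects with $p.value. The sketch below assumes a fixed seed (set.seed(42), chosen arbitrarily) so the sample is reproducible; the numbers will therefore differ from the output shown above, which came from an unseeded sample.

```r
set.seed(42)  # assumption: any seed works, this only makes the run repeatable
x <- rnorm(n = 100, mean = 100, sd = 5)

# Collect the p-values of all four tests in one small table
p_values <- data.frame(
  mu       = c(100, 90),
  t_test   = c(t.test(x, mu = 100)$p.value,
               t.test(x, mu = 90)$p.value),
  wilcoxon = c(wilcox.test(x, mu = 100)$p.value,
               wilcox.test(x, mu = 90)$p.value)
)

print(p_values)
```

For normally distributed data like this, the two tests would typically lead to the same conclusion, with the Wilcoxon p-value close to, but not identical to, the t-test p-value.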
Use the t.test function to compare PupalWeight by T_treatment.
Repeat the above using a Wilcoxon rank sum test.
pupae <- read.csv("pupae.csv")
t.test(PupalWeight~T_treatment, data=pupae, var.equal=TRUE)
##
## Two Sample t-test
##
## data: PupalWeight by T_treatment
## t = 1.4385, df = 82, p-value = 0.1541
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.007715698 0.048012420
## sample estimates:
## mean in group ambient mean in group elevated
## 0.3222973 0.3021489
wilcox.test(PupalWeight~T_treatment, data=pupae)
## Warning in wilcox.test.default(x = c(0.244, 0.319, 0.221, 0.28, 0.257,
## 0.333, : cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: PupalWeight by T_treatment
## W = 1017.5, p-value = 0.1838
## alternative hypothesis: true location shift is not equal to 0
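The warning about ties appears because wilcox.test attempts an exact p-value for small samples and falls back to the normal approximation when tied values are present. The sketch below uses small made-up vectors (ambient and elevated are hypothetical values, not the pupae data) to show that requesting the approximation explicitly with exact = FALSE gives the same p-value without the warning.

```r
# Hypothetical toy values with ties across the two groups (not the pupae data)
ambient  <- c(0.24, 0.32, 0.22, 0.28, 0.26, 0.33)
elevated <- c(0.28, 0.26, 0.31, 0.35, 0.24, 0.30)

# Default call: record whether a warning is raised, then silence it
warned <- FALSE
res_default <- withCallingHandlers(
  wilcox.test(ambient, elevated),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

# Ask for the normal approximation up front: no warning is issued
res_approx <- wilcox.test(ambient, elevated, exact = FALSE)

print(c(default = res_default$p.value, approx = res_approx$p.value))
```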
Run the following code to generate some data:
base <- rnorm(20, 20, 5)
x <- base + rnorm(20,0,0.5)
y <- base + rnorm(20,1,0.5)
Using a two-sample t-test, compare the means of x and y, assuming that the variance is equal for the two samples.
t.test(x,y, var.equal=TRUE)
##
## Two Sample t-test
##
## data: x and y
## t = -0.44588, df = 38, p-value = 0.6582
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.491474 2.870057
## sample estimates:
## mean of x mean of y
## 20.99835 21.80906
Repeat the above using a paired t-test. How has the p-value changed?
t.test(x,y, paired=TRUE)
##
## Paired t-test
##
## data: x and y
## t = -6.6634, df = 19, p-value = 2.26e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.0653594 -0.5560568
## sample estimates:
## mean of the differences
## -0.8107081
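Which test is most appropriate? The paired t-test is more appropriate because x and y are not independent: both are built from the same base values, so each element of x is naturally paired with the corresponding element of y. Accounting for this pairing lowers the p-value from 0.6582 to 2.26e-06.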
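To see why pairing matters, note that a paired t-test is equivalent to a one-sample t-test on the differences x - y. The sketch below regenerates the data with an arbitrary seed (an assumption, so the numbers differ from the output above) and checks that equivalence, along with the strong correlation between x and y.

```r
set.seed(1)  # assumption: fixed only so the sketch is repeatable
base <- rnorm(20, 20, 5)
x <- base + rnorm(20, 0, 0.5)
y <- base + rnorm(20, 1, 0.5)

# x and y share the large 'base' component, so they are highly correlated
r_xy <- cor(x, y)

# Paired test and one-sample test on the differences
paired     <- t.test(x, y, paired = TRUE)
one_sample <- t.test(x - y, mu = 0)

print(r_xy)
print(c(paired = paired$p.value, one_sample = one_sample$p.value))
```

Taking differences removes the shared between-pair variation contributed by base, leaving only the small noise terms, which is why the paired test detects the shift of about 1 while the unpaired test does not.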
Simple Linear Regression
Perform a simple linear regression of Frass on PupalWeight. Produce and inspect the following:
Plots of the data
plot(Frass ~ PupalWeight, data = pupae)
Summary of the model
model <- lm(Frass ~ PupalWeight, data = pupae)
summary(model)
##
## Call:
## lm(formula = Frass ~ PupalWeight, data = pupae)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77463 -0.21560 -0.01064 0.26259 0.89392
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5046 0.1838 2.745 0.00746 **
## PupalWeight 4.2994 0.5773 7.448 9.1e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3332 on 81 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.4065, Adjusted R-squared: 0.3991
## F-statistic: 55.47 on 1 and 81 DF, p-value: 9.1e-11
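Two identities that hold for a regression with a single predictor can be used to sanity-check this summary: the F-statistic equals the squared t value of the slope, and R-squared equals F / (F + residual degrees of freedom). The sketch below only redoes the arithmetic on the rounded numbers printed above; the variable names are illustrative.

```r
# Values copied from the summary output above (rounded as printed)
t_slope  <- 7.448   # t value for PupalWeight
f_stat   <- 55.47   # F-statistic on 1 and 81 DF
df_resid <- 81      # residual degrees of freedom

# With one predictor, F is the square of the slope's t value
f_from_t <- t_slope^2

# R-squared recovered from F and the residual degrees of freedom
r_squared <- f_stat / (f_stat + df_resid)

print(c(f_from_t = f_from_t, r_squared = r_squared))
```

Both results agree with the printed F-statistic and Multiple R-squared up to rounding.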
Diagnostic plots.
par(mfrow=c(1,2))
plot(model)
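By default, plot() on an lm object draws four diagnostic plots, so with a 1 x 2 layout they are spread over two pages. If all four are wanted on one page, a 2 x 2 layout is one option. The sketch below uses R's built-in cars dataset as a stand-in, since pupae.csv is not assumed to be available; fit and old_par are illustrative names.

```r
# Stand-in model on a built-in dataset (not the pupae data)
fit <- lm(dist ~ speed, data = cars)

# 2 x 2 layout so all four default diagnostic plots fit on one page;
# par() returns the previous settings so they can be restored afterwards
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```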
Repeat all of the above for the subset of the data where Gender is 0 and CO2_treatment is 400.
plot(Frass ~ PupalWeight, data = pupae, subset=Gender==0 & CO2_treatment == 400)
model <- lm(Frass ~ PupalWeight, data = pupae, subset=Gender==0 & CO2_treatment == 400)
summary(model)
##
## Call:
## lm(formula = Frass ~ PupalWeight, data = pupae, subset = Gender ==
## 0 & CO2_treatment == 400)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.26720 -0.08526 -0.01585 0.13171 0.28181
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.6751 0.1845 3.660 0.00156 **
## PupalWeight 4.1189 0.6430 6.405 3.01e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1657 on 20 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.6723, Adjusted R-squared: 0.6559
## F-statistic: 41.03 on 1 and 20 DF, p-value: 3.006e-06
par(mfrow=c(1,2))
plot(model)
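An alternative to passing subset= to every call is to build the subset once with subset() and reuse it. The sketch below illustrates that the two approaches give the same fit, using the built-in mtcars dataset as a stand-in (the columns mpg, wt, am and cyl belong to mtcars, not to the pupae data).

```r
# Approach 1: pass the condition through the subset argument of lm()
fit_arg <- lm(mpg ~ wt, data = mtcars, subset = am == 0 & cyl == 8)

# Approach 2: create the subset first, then fit on it
cars_sub <- subset(mtcars, am == 0 & cyl == 8)
fit_df   <- lm(mpg ~ wt, data = cars_sub)

# Both fits use the same rows, so the coefficients agree
print(rbind(subset_arg = coef(fit_arg), subset_df = coef(fit_df)))
```

Creating the subset once also keeps the plot and the model guaranteed to use the same rows.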