This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.
11.S.16: Three college students collected several pillbugs from a woodpile and used them in an experiment in which they measured the time, in seconds, that it took a bug to move 6 inches within an apparatus they had created. There were three groups of bugs: one group was exposed to strong light, for one group the stimulus was moisture, and a third group served as a control. The data are shown in the table of the HW6 file. Clearly the SDs show that the variability is not constant between groups, so a transformation is needed. Taking the natural logarithm of each observation results in the dotplots and summary statistics shown in the HW6 file. For the transformed data, the ANOVA SS(between) is 53.1103 and the SS(within) is 23.5669. (a) State the null hypothesis in symbols. (b) Construct the ANOVA table and test the null hypothesis. Let α = 0.05.
#(a) H0: µ1 = µ2 = µ3, where µ1 = Light, µ2 = Moisture, and µ3 = Control (population means of the log-transformed times)
#(b) ANOVA table
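#Note: the "_" placeholder used for the empty Total MS cell coerces the matrix to character, which is why the entries print with quotes below; the F test itself follows the table.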
Pillbug_Speed = matrix(c(2, 57, 59, 53.1103, 23.5669, 76.6772, 26.5552, 0.4135, "_"), nrow = 3, ncol = 3)
colnames(Pillbug_Speed) = c("df", "SS", "MS")
rownames(Pillbug_Speed) = c("Between", "Within", "Total")
Pillbug_Speed
## df SS MS
## Between "2" "53.1103" "26.5552"
## Within "57" "23.5669" "0.4135"
## Total "59" "76.6772" "_"
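#To finish part (b), the F statistic is F = MS(between)/MS(within), compared against an F distribution with df1 = 2 and df2 = 57. A minimal check from the table values:
MS_between = 53.1103 / 2    # 26.5552
MS_within = 23.5669 / 57    # 0.4135
F_stat = MS_between / MS_within                     # roughly 64.2
qf(0.95, df1 = 2, df2 = 57)                         # critical value, roughly 3.16
pf(F_stat, df1 = 2, df2 = 57, lower.tail = FALSE)   # p-value, far below 0.05
#Since F_stat greatly exceeds the critical value (p-value << 0.05), we reject H0 at α = 0.05 and conclude that the mean log times differ among the three groups.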
#EXTRA
#Without Transformation
#We can reject H0 (µ1 = µ2 = µ3) because the p-value = 1.19e-09 < α = 0.05.
speed = c(23, 12, 29, 12, 5, 47, 18, 30, 8, 45, 36, 27, 29, 33, 24, 17, 11, 25, 6, 34, 170, 182, 286, 103, 330, 55, 49, 31, 132, 150, 165, 206, 200, 270, 298, 100, 162, 126, 229, 140, 229, 126, 140, 260, 330, 310, 45, 248, 280, 140, 160, 192, 159, 62, 180, 32, 54, 149, 201, 173)
#Group labels: 1 = Light, 2 = Moisture, 3 = Control (20 bugs each)
group = factor(rep(1:3, each = 20))
boxplot(speed~group, names = c("Light","Moisture","Control"))
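#A quick check of the unequal variability noted in the problem: per the problem statement, the group SDs differ widely on the raw scale and are much closer after the natural log transformation (a sketch using the data above):
tapply(speed, group, sd)        # raw-scale SDs by group
tapply(log(speed), group, sd)   # SDs after the log transformation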
fit = aov(speed~group)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 291449 145725 30.1 1.19e-09 ***
## Residuals 57 275925 4841
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
FIT_ONE = lm(speed~group)
par(mfrow = c(1,2))
#Checking Normality
qqnorm(residuals(FIT_ONE), main = "QQ plot", ylab = "Residual")
#Checking variance homogeneity
plot(fitted.values(FIT_ONE), residuals(FIT_ONE), xlab = "Fitted values", ylab = "Residuals", main = "Residual vs Fitted")
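#For completeness, the transformed analysis can be reproduced directly from the raw data; the resulting sums of squares should match the values quoted in the problem (SS(between) = 53.1103, SS(within) = 23.5669):
fit_log = aov(log(speed)~group)
summary(fit_log)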
12.2.1: Arrange the plots in the HW6 document in order of their correlations (from closest to -1 to closest to +1).
#We're looking at r, which measures the strength and direction (+ or -) of the linear relationship between Y and X.
#(d) is the closest to -1
#(a) is close to -0.4
#(b) seems to be close to 0.0
#(c) is close to +0.4
#(e) is the closest to +1
#In ascending order from -1 to +1: (d), (a), (b), (c), (e)
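#As an illustrative sketch (not the homework plots): simulating pairs with a target correlation gives a feel for what r near -1, -0.4, 0, +0.4, and +1 looks like. The helper sim_corr is hypothetical, not part of the assignment.
set.seed(1)
sim_corr = function(r, n = 50) {
  x = rnorm(n)
  y = r * x + sqrt(1 - r^2) * rnorm(n)   # constructed so cor(x, y) is about r
  cor(x, y)
}
sapply(c(-0.95, -0.4, 0, 0.4, 0.95), sim_corr)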
12.2.2: Consider the following data. (a) Plot the data. Does there appear to be a relationship between X and Y? Is it linear or nonlinear? Weak or strong? (b) Compute the sample correlation coefficient between X and Y. (c) Is there significant evidence that X and Y are correlated? Construct a test using α = 0.05.
#Samples
x_data <- c(6, 1, 3, 2, 5)
sd(x_data)
## [1] 2.073644
mean(x_data)
## [1] 3.4
y_data <- c(6, 7, 3, 2, 14)
sd(y_data)
## [1] 4.722288
mean(y_data)
## [1] 6.4
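#As a check on part (b) below, r can be computed by hand from these summary statistics: r = sum((x - xbar)*(y - ybar)) / ((n - 1)*sx*sy) = 17.2 / (4 * 2.073644 * 4.722288).
sum((x_data - mean(x_data)) * (y_data - mean(y_data))) /
  ((length(x_data) - 1) * sd(x_data) * sd(y_data))    # 0.4391186, matching cor() below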
#EXTRA
#HW6 handout TABLE
Unknown_Data_Source = matrix(c(6, 1, 3, 2, 5, 3.4, 2.1, 6, 7, 3, 2, 14, 6.4, 4.7), nrow = 7, ncol = 2)
colnames(Unknown_Data_Source) = c("X", "Y")
rownames(Unknown_Data_Source) = c("", "", "", "", "", "Mean", "SD")
Unknown_Data_Source
## X Y
## 6.0 6.0
## 1.0 7.0
## 3.0 3.0
## 2.0 2.0
## 5.0 14.0
## Mean 3.4 6.4
## SD 2.1 4.7
#(a) Plot of Data --> There appears to be a weak, positive, roughly linear relationship between X and Y. With only five points, more data would be very useful to confirm this preliminary (weak, yet present) correlation between X & Y.
plot(x_data, y_data, main = "Scatterplot", xlab = "x data", ylab = "y data", las = 1, xlim = c(0, 20), ylim = c(0, 20), cex = 3, col = 4)
#Fitted regression line of y on x (note: the response must come first in lm, so lm(y_data ~ x_data), not lm(x_data ~ y_data))
abline(lm(y_data ~ x_data), col = 2)
#(b) The correlation coefficient r = 0.4391186 tells us that the linear relationship is not strong, but that there is a slight positive correlation between the values of x_data and the values of y_data.
cor(x_data, y_data)
## [1] 0.4391186
#(c) We test H0 with Pearson's correlation test at α = 0.05. cor.test() reports the sample correlation r together with a 95% confidence interval for the population correlation ρ, which lets us examine the strength of the linear relationship between the two numeric variables.
#H0: ρ = 0 vs. HA: ρ ≠ 0; equivalently, we can check whether the 95% confidence interval for ρ contains 0.
#p-value = 0.4594 > α = 0.05, so we fail to reject the null hypothesis that ρ = 0. The sample correlation of +0.44 is suggestive but not statistically significant with n = 5.
cor.test(x_data, y_data)
##
## Pearson's product-moment correlation
##
## data: x_data and y_data
## t = 0.84656, df = 3, p-value = 0.4594
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7234117 0.9524048
## sample estimates:
## cor
## 0.4391186
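#The t statistic above can be verified by hand: t = r*sqrt(n - 2)/sqrt(1 - r^2) with df = n - 2 = 3, and the two-sided p-value follows from the t distribution.
r = cor(x_data, y_data)
r * sqrt(3) / sqrt(1 - r^2)                                       # about 0.84656
2 * pt(r * sqrt(3) / sqrt(1 - r^2), df = 3, lower.tail = FALSE)   # about 0.4594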
#We'll also conduct the Spearman rank correlation test as an alternative to the Pearson test. The Spearman correlation coefficient (ρs) is obtained by replacing the actual values of the two data sets (X and Y) with their ranks, so it is a helpful tool when the data are non-normal or contain extreme values.
#Because ranks dampen the influence of the extreme point (x = 5, y = 14), the Spearman coefficient here is smaller than the Pearson coefficient: 0.2 vs. 0.4391186.
#All in all, there is at most a weak correlation between X and Y, as shown by both the Pearson and Spearman coefficients, and neither test is significant at α = 0.05.
cor.test(x_data, y_data, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: x_data and y_data
## S = 16, p-value = 0.7833
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.2
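#The Spearman statistic can be checked by hand from the ranks: S = sum(d^2), where d is the difference in ranks of each pair, and rho = 1 - 6*S/(n*(n^2 - 1)). There are no ties in either sample, so the formula applies exactly.
d = rank(x_data) - rank(y_data)
sum(d^2)                              # S = 16, matching the output above
1 - 6 * sum(d^2) / (5 * (5^2 - 1))    # rho = 0.2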