Read the question, then restate the question in literate documentation and state briefly how you intend to solve the problem. Then fill in the code chunks with R code to produce the solution. The variables assign infinite Inf values are place holders for unit testing.
The unit test will count the number of points you earn from computing the correct solution, the remaining points will given for providing documentation for the solution, excepting Exercises 4 and 7; these will be scored manually. I count 75 points for the exercise; 40 points are guaranteed if you pass the unit tests. The unit tests will print a cumulative point total, or a message if the test fails. If you’re not sure what the question is asking, use the unit test to reverse engineer the expected answer.
The first exercise has been completed for you. I’m using a verbose style, you can work on developing your own style of restating the problem. You don’t need to interpret the results for this homework; your task is primarily to translate mathematical notation to computer form.
Finally, this homework is largely testing your ability to work with mathematical formula in R and to understand how R uses vectors as atomic (individual) units. I am not asking you to write functions or to use iterative constructs (for loops). You should be able to solve all the exercises by creating, assigning and indexing vectors. Note - in R, a number such as 0.124611 is not a scalar value, it’s a vector of length 1.
Write the R code to calculate Fisher’s LSD, given by
\[ LSD_{i, j} = t_{\alpha / 2, d.f.} \times \sqrt{RMS \left(\frac{1}{n_i} + \frac{1}{n_j} \right)} \]
Let $ RMS = 0.124611 $ , \(n_i = n_j = 6\) and \(t_{\alpha / 2, d.f.} = 2.085963447\).
We are asked to compute Fisher’s LSD for parameters
| Parameter | Value |
|---|---|
| \(RMS\) | \(0.124611\) |
| \(n_i\) | \(6\) |
| \(n_j\) | \(6\) |
| \(t_{\alpha / 2, d.f.}\) | \(2.085963447\) |
We calculate this using \[ LSD_{i, j} = t_{\alpha / 2, d.f.} \times \sqrt{RMS \left(\frac{1}{n_i} + \frac{1}{n_j} \right)} \]
n_i <- 6
n_j <- n_i
print(lsd <- 2.085963447*sqrt(0.124611*(1/n_i + 1/n_j)))
## [1] 0.425132451539
## [1] 8
The probablity of an observation \(x\), when taken from a normal population with mean \(\mu\) and variance \(\sigma^2\) is calculated by \[
L (x ; \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}^{}} e^{- \frac{(x - \mu)^2}{2 \sigma^2}}
\] For values of \(x = \{ 0.1, 0.2 \}\), write R code to calculate \(L (x ; \mu = 0.1, \sigma^2 = 2)\). Assign the values to l.1 and l.2.
Use a variable named x, you will be reusing this formula later. You should also declare variables for \(\mu\) and \(\sigma^2\)
We are asked to calculate the probability of \(x\) from a normal population with a mean \(\mu\) and variance \(\sigma^2\).
We will set \(x_1 = 0.1\), \(x_2 = 0.2\), \(\mu = 0.1\), and \(\sigma_2 = 2\). We then compute the liklihood of \(x_1\) and \(x_2\) using the following formula:
\[ L (x ; \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}^{}} e^{- \frac{(x - \mu)^2}{2 \sigma^2}} \]
| Parameter | Value |
|---|---|
| \(x\) | \({0.1, 0.2}\) |
| \(\mu\) | \(0.1\) |
| \(\sigma^2\) | \(2\) |
l.1 <- Inf
l.2 <- Inf
#Set variables
x_1 <- 0.1
x_2 <- 0.2
mu <- 0.1
sigma_2 <- 2
#Compute liklihood with provided formula
l.1 <- (1/(sqrt(sigma_2)*sqrt(2*pi)))*exp(-((x_1-mu)^2)/(2*sigma_2))
l.2 <- (1/(sqrt(sigma_2)*sqrt(2*pi)))*exp(-((x_2-mu)^2)/(2*sigma_2))
#Print the liklihood to the console
print(l.1)
## [1] 0.282094791774
print(l.2)
## [1] 0.281390435607
## [1] 12
## [1] 16
A common task for a statistician supporting experimental science is to determine the number of replicates required to achieve a desired statistical power and significance. This simplest method is to calculate
\[ r \geqslant 2 [z_{\alpha / 2} + z_{\beta}]^2 \left( \frac{\sigma}{\delta} \right)^2 \]
For \(z_{\alpha / 2} = 1.959964\) and \(z_{\beta} = 0.8416212\), write the R code to compute required replicates (\(r\)) when \(\sigma = \{4, 8 \}\) and \(\delta = \{ 5, 10 \}\), Compute the decimal value and assign the next largest integer values for \(r\) to variables r.4.5.integer, r.4.10.integer, r.8.5.integer, r.8.10.integer. There is an R function to find next largest integer, you may need to search help.
We are asked to calculate the number of replicates required to achieve a desired statistical power and signficance. We are given \(z_{\alpha / 2} = 1.959964\), \(z_{\beta} = 0.8416212\), \(\sigma = \{4, 8 \}\), and \(\delta = \{ 5, 10 \}\). We are then asked to calculate the next largest integer values for \(r\). We will use base R’s ceiling() function to do this. To calculate \(r\) we will use the following formula:
\[ r \geqslant 2 [z_{\alpha / 2} + z_{\beta}]^2 \left( \frac{\sigma}{\delta} \right)^2 \]
| Parameter | Value |
|---|---|
| \(z_{\alpha / 2}\) | \(1.959964\) |
| \(z_{\beta}\) | \(0.8416212\) |
| \(\sigma_1\) | \(4\) |
| \(\sigma_2\) | \(8\) |
| \(\delta_1\) | \(5\) |
| \(\delta_2\) | \(10\) |
r.4.5.integer <- Inf
r.4.10.integer <- Inf
r.8.5.integer <- Inf
r.8.10.integer <- Inf
#Assign variable values
z_alpha <- 1.959964
z_beta <- 0.8416212
sigma_1 <- 4
sigma_2 <- 8
delta_1 <- 5
delta_2 <- 10
#Calculate the number of replicates required and assign the next highest integer value
r.4.5.integer <- ceiling(2 * (z_alpha + z_beta)^2 * (sigma_1/delta_1)^2)
r.4.10.integer <- ceiling(2 * (z_alpha + z_beta)^2 * (sigma_1/delta_2)^2)
r.8.5.integer <- ceiling(2 * (z_alpha + z_beta)^2 * (sigma_2/delta_1)^2)
r.8.10.integer <- ceiling(2 * (z_alpha + z_beta)^2 * (sigma_2/delta_2)^2)
## [1] 18
## [1] 20
## [1] 22
## [1] 24
Write a test showing when the next largest integer value is different from the rounded integer value. That is, programmatically compare each pair of values.
We are asked to write a test to show when the next largest integer value is different from the rounded integer value. To do this we will bind the \(r\) vectors into a data.frame and use the dplyr package’s mutate() function with an ifelse() to determine if the rounded value is the same as the next highest integer or different.
require(dplyr)
r.4.5 <- (2 * (z_alpha + z_beta)^2 * (sigma_1/delta_1)^2)
r.4.10 <- (2 * (z_alpha + z_beta)^2 * (sigma_1/delta_2)^2)
r.8.5 <- (2 * (z_alpha + z_beta)^2 * (sigma_2/delta_1)^2)
r.8.10 <- (2 * (z_alpha + z_beta)^2 * (sigma_2/delta_2)^2)
#Bind the vectors into a dataframe
df_compare <- data.frame(operation = c("r.4.5","r.4.10","r.8.5","r.8.10"),rounded = c(round(r.4.5),round(r.4.10),round(r.8.5),round(r.8.10)),integer = c(r.4.5.integer,r.4.10.integer,r.8.5.integer,r.8.10.integer))
#Compare the rounded values to the integer values
df_compare <- df_compare %>%
mutate(test = ifelse(rounded == integer,"The Same","Different"))
#Show the data.frame with the test results
df_compare
## operation rounded integer test
## 1 r.4.5 10 11 Different
## 2 r.4.10 3 3 The Same
## 3 r.8.5 40 41 Different
## 4 r.8.10 10 11 Different
Write R code to compute
\[7 - 1 \times 0 + 3 \div 3\]
Type this in verbatim, using only numbers, -,* and /, with no parenthesis. Do you agree with the result? Explain why, one or two sentences.
We are asked to compute \(7 - 1 \times 0 + 3 \div 3\) in R.
7-1*0+3/3
## [1] 8
I do agree with the result. Order of operations states that the multiplicatio and division needed to be computed before the addition and subtraction so we get 7-0+1 or 8 as our result. ## Part 2 (3 pts)
According to “Why Did 74% of Facebook Users Get This Wrong?” (http://www.classroomprofessor.com/teaching-math/why-did-74pc-of-facebook-users-get-this-wrong/), most people would compute the result as 1. Use parenthesis ( ) to force R to produce this result.
The parenthesis mustbe placed around \(7-1\) for the equation to compute to 1. \[0 \times (7-1) + 3 \div 3 = 1\]
(7-1)*0+3/3
## [1] 1
Several respondents to the survey cited in Part 2 gave the answer 6. Add one set of parenthesis to force R to produce this result.
To get 6 we must place the parenthesis on the 0+3 argument to produce 7-1*1 or 6.
We must place the parenthesis around \(0+3\) for the equation to compute to 6.
\[7 - 1 \times (0 + 3) \div 3\]
7-1*(0+3)/3
## [1] 6
Create a sequence of values from -3 to 3, inclusive, incremented by 0.1. Do this programmatically, not by writing c(-1.0, 0.9, …). You may do this in a sequence of steps and print each step, if that helps.
Assign the sequence to a variable x and use this in the likelihood formula from Exercise 1. Assign the results a variable lik.vec.
We are asked to create a sequence from -3 to 3 by increments of 0.1. To do this we will use R’s base seq() function with the by argument. We then are asked to compute the liklihood of this vector of \(x\) values using the below formula:
\[ L (x ; \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}^{}} e^{- \frac{(x - \mu)^2}{2 \sigma^2}} \]
| Parameter | Value |
|---|---|
| \(x\) | \({-3,-2.9,...2.9,3.0}\) |
| \(\mu\) | \(0.1\) |
| \(\sigma^2\) | \(2\) |
x <- c(Inf)
lik.vec <- c(Inf)
#Declare mu and sigma_2
mu <- 0.1
sigma_2 <- 2
#Set x to the sequence -3:3 by 0.1
x<-seq.int(-3,3,by=.1)
#Compute the liklihood vector using the equation from Exercise 2
lik.vec <- (1/(sqrt(sigma_2)*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma_2))
## [1] 26
## [1] 28
## [1] 30
## [1] 32
if(!is.infinite(x)) {
plot(x,lik.vec)
}
## Warning in if (!is.infinite(x)) {: the condition has length > 1 and only
## the first element will be used
Calculate values of LSD from exercise 1, for values of \(n_i\) from \({2,\ldots,8}\) and \(n_j\) from \({8,\ldots,2}\), by creating two different vectors for \(n_i\) and \(n_j\). Use these to compute LSD for combinations of \((n_i, n_j)={(2,8), (3,7), \ldots , (7,3), (8,2)}\). You should be able to do this by redefining n_i and n_j from the first exercise and copying the code that assigns a value to lsd.
Write code to determine the combination of \(n_i\) and \(n_j\) that produce the smallest LSD, and assign these to the variable best.n_i and best.n_j. Assign the minimum LSD value to best.LSD. (Hint - use the functions min and which).
We are asked to calculate values of LSD using the formula in Exercise 1. We are given \(n_i = 2:8\) and \(n_j = 8:2\). Once we have assigned those values we are asked to determine the combination of \(n_i\) and \(n_j\) to calculate the minimum LSD value. To do this we will bind the result vector to the \(n_i\) and \(n_j\) vectors into a data.frame. We will then subset the data.frame using the base min() and which() functions. We then need to simply assign the associated values to the variables best.n_i, best.n_j, and best.lsd.
best.lsd <- Inf
best.n_i <- Inf
best.n_j <- Inf
#Assign the variables
n_i <- 2:8
n_j <- 8:2
#Compute the lsd vector
lsd <- 2.085963447*sqrt(0.124611*(1/n_i + 1/n_j))
#Assign the associated vectors to lsd_df
lsd_df <- data.frame(n_i,n_j,lsd)
#Subset lsd_df to find the minimum combination of n_i and n_j
lsd_df <- lsd_df[which.min(lsd_df$lsd),]
#Assign the minimum values to the associated variable names
best.lsd <- lsd_df$lsd
best.n_i <- as.numeric(lsd_df$n_i)
best.n_j <- as.numeric(lsd_df$n_j)
## [1] 34
## [1] 36
## [1] 38
## [1] 39
## [1] 40
This is an exercise in Software Archaeology. See https://media.pragprog.com/articles/mar_02_archeology.pdf.
Repeat this block of code in literate form, placing each line in a code chunk, so that the results of the function call is written to the literate document. Find each function in R Help, and after the code chunk provide a brief description of the purpose of the function and describe the parameters, where possible using mathematical notation.
You may include additional print, plot, or summary function calls if this helps you explain the code. For example, you may rewrite x <-runif(50,0,20) as print(x <-runif(50,0,20)) to show the value(s) assigned to x
set.seed(0719)
x <- runif(50,0,20)
y <- 5 + 3.5*x + rnorm(50)
mod <- lm(y~x)
summary(mod)
plot(x,y)
abline(mod)
This chunk of code sets the random number sequence so that the linear model is reproducible across all uses. The arguments take 0719 as an integer to set the seed of the random number.
set.seed(0719)
This chunk of code generates random numbers with length \(n = 50\) from \(min = 0\) to \(max = 20\) and assigns the vector to x.
x <- runif(50,0,20)
This chunk of code creates the vector \(y\) by adding \(5 + 3.5x +\) the normal distribution with mean zero, standard deviation of 1, and with 50 observations.
y <- 5 + 3.5 * x + rnorm(50)
This chunk of coede generates a linear regression model of \(y\) to \(x\) and assigns the model to the variable \(mod\).
mod <- lm(y~x)
This chunk of code prints summary statistics of the linear model to the console.This includes the residuals, coefficients, R-squared, and the F-statistic of the model.
summary(mod)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.452516446 -0.583999104 0.222929268 0.699037299 1.548980762
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.8946297283 0.2795669124 17.50790 < 2.22e-16 ***
## x 3.5179508658 0.0241352343 145.75996 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.998357782 on 48 degrees of freedom
## Multiple R-squared: 0.99774584, Adjusted R-squared: 0.997698879
## F-statistic: 21245.967 on 1 and 48 DF, p-value: < 2.220446e-16
This chunk of code creates a plot of \(x\) to \(y\) and prints the plot to the consol. The second line of this code chunk adds the regression line of the model \(mod\) to the plot.
plot(x,y)
abline(mod)
total.points
## [1] 40