General instructions.

Read the question, then restate the question in literate documentation and state briefly how you intend to solve the problem. Then fill in the code chunks with R code to produce the solution. The variables assign infinite Inf values are place holders for unit testing.

The unit test will count the number of points you earn from computing the correct solution, the remaining points will given for providing documentation for the solution, excepting Exercises 4 and 7; these will be scored manually. I count 75 points for the exercise; 40 points are guaranteed if you pass the unit tests. The unit tests will print a cumulative point total, or a message if the test fails. If you’re not sure what the question is asking, use the unit test to reverse engineer the expected answer.

The first exercise has been completed for you. I’m using a verbose style, you can work on developing your own style of restating the problem. You don’t need to interpret the results for this homework; your task is primarily to translate mathematical notation to computer form.

Finally, this homework is largely testing your ability to work with mathematical formula in R and to understand how R uses vectors as atomic (individual) units. I am not asking you to write functions or to use iterative constructs (for loops). You should be able to solve all the exercises by creating, assigning and indexing vectors. Note - in R, a number such as 0.124611 is not a scalar value, it’s a vector of length 1.

Exercise 1 (10 points)

Write the R code to calculate Fisher’s LSD, given by

\[ LSD_{i, j} = t_{\alpha / 2, d.f.} \times \sqrt{RMS \left(\frac{1}{n_i} + \frac{1}{n_j} \right)} \]

Let $ RMS = 0.124611 $ , \(n_i = n_j = 6\) and \(t_{\alpha / 2, d.f.} = 2.085963447\).

Answer

We are asked to compute Fisher’s LSD for parameters

Parameter Value
\(RMS\) \(0.124611\)
\(n_i\) \(6\)
\(n_j\) \(6\)
\(t_{\alpha / 2, d.f.}\) \(2.085963447\)

We calculate this using \[ LSD_{i, j} = t_{\alpha / 2, d.f.} \times \sqrt{RMS \left(\frac{1}{n_i} + \frac{1}{n_j} \right)} \]

n_i <- 6
n_j <- n_i
print(lsd <- 2.085963447*sqrt(0.124611*(1/n_i + 1/n_j)))
## [1] 0.425132451539

Unit Test

## [1] 8

Exercise 2 (10 points)

The probablity of an observation \(x\), when taken from a normal population with mean \(\mu\) and variance \(\sigma^2\) is calculated by \[ L (x ; \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}^{}} e^{- \frac{(x - \mu)^2}{2 \sigma^2}} \] For values of \(x = \{ 0.1, 0.2 \}\), write R code to calculate \(L (x ; \mu = 0.1, \sigma^2 = 2)\). Assign the values to l.1 and l.2.

Use a variable named x, you will be reusing this formula later. You should also declare variables for \(\mu\) and \(\sigma^2\)

Answer

We are asked to calculate the probability of \(x\) from a normal population with a mean \(\mu\) and variance \(\sigma^2\).

We will set \(x_1 = 0.1\), \(x_2 = 0.2\), \(\mu = 0.1\), and \(\sigma_2 = 2\). We then compute the liklihood of \(x_1\) and \(x_2\) using the following formula:

\[ L (x ; \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}^{}} e^{- \frac{(x - \mu)^2}{2 \sigma^2}} \]

Parameter Value
\(x\) \({0.1, 0.2}\)
\(\mu\) \(0.1\)
\(\sigma^2\) \(2\)
l.1 <- Inf
l.2 <- Inf

#Set variables
x_1 <- 0.1
x_2 <- 0.2
mu <- 0.1
sigma_2 <- 2

#Compute liklihood with provided formula
l.1 <- (1/(sqrt(sigma_2)*sqrt(2*pi)))*exp(-((x_1-mu)^2)/(2*sigma_2))
l.2 <- (1/(sqrt(sigma_2)*sqrt(2*pi)))*exp(-((x_2-mu)^2)/(2*sigma_2))

#Print the liklihood to the console
print(l.1)
## [1] 0.282094791774
print(l.2)
## [1] 0.281390435607

Unit Test

## [1] 12
## [1] 16

Exercise 3 (12 points)

Part 1 (10 points)

A common task for a statistician supporting experimental science is to determine the number of replicates required to achieve a desired statistical power and significance. This simplest method is to calculate

\[ r \geqslant 2 [z_{\alpha / 2} + z_{\beta}]^2 \left( \frac{\sigma}{\delta} \right)^2 \]

For \(z_{\alpha / 2} = 1.959964\) and \(z_{\beta} = 0.8416212\), write the R code to compute required replicates (\(r\)) when \(\sigma = \{4, 8 \}\) and \(\delta = \{ 5, 10 \}\), Compute the decimal value and assign the next largest integer values for \(r\) to variables r.4.5.integer, r.4.10.integer, r.8.5.integer, r.8.10.integer. There is an R function to find next largest integer, you may need to search help.

Answer

We are asked to calculate the number of replicates required to achieve a desired statistical power and signficance. We are given \(z_{\alpha / 2} = 1.959964\), \(z_{\beta} = 0.8416212\), \(\sigma = \{4, 8 \}\), and \(\delta = \{ 5, 10 \}\). We are then asked to calculate the next largest integer values for \(r\). We will use base R’s ceiling() function to do this. To calculate \(r\) we will use the following formula:

\[ r \geqslant 2 [z_{\alpha / 2} + z_{\beta}]^2 \left( \frac{\sigma}{\delta} \right)^2 \]

Parameter Value
\(z_{\alpha / 2}\) \(1.959964\)
\(z_{\beta}\) \(0.8416212\)
\(\sigma_1\) \(4\)
\(\sigma_2\) \(8\)
\(\delta_1\) \(5\)
\(\delta_2\) \(10\)
r.4.5.integer <- Inf
r.4.10.integer <- Inf
r.8.5.integer <- Inf
r.8.10.integer <- Inf

#Assign variable values
z_alpha <- 1.959964
z_beta <- 0.8416212
sigma_1 <- 4
sigma_2 <- 8
delta_1 <- 5
delta_2 <- 10

#Calculate the number of replicates required and assign the next highest integer value
r.4.5.integer <- ceiling(2 * (z_alpha + z_beta)^2 * (sigma_1/delta_1)^2)
r.4.10.integer <- ceiling(2 * (z_alpha + z_beta)^2 * (sigma_1/delta_2)^2)
r.8.5.integer <- ceiling(2 * (z_alpha + z_beta)^2 * (sigma_2/delta_1)^2)
r.8.10.integer <- ceiling(2 * (z_alpha + z_beta)^2 * (sigma_2/delta_2)^2)

Unit Test

## [1] 18
## [1] 20
## [1] 22
## [1] 24

Part 2 (2 pts)

Write a test showing when the next largest integer value is different from the rounded integer value. That is, programmatically compare each pair of values.

Answer

We are asked to write a test to show when the next largest integer value is different from the rounded integer value. To do this we will bind the \(r\) vectors into a data.frame and use the dplyr package’s mutate() function with an ifelse() to determine if the rounded value is the same as the next highest integer or different.

require(dplyr)
r.4.5 <- (2 * (z_alpha + z_beta)^2 * (sigma_1/delta_1)^2)
r.4.10 <- (2 * (z_alpha + z_beta)^2 * (sigma_1/delta_2)^2)
r.8.5 <- (2 * (z_alpha + z_beta)^2 * (sigma_2/delta_1)^2)
r.8.10 <- (2 * (z_alpha + z_beta)^2 * (sigma_2/delta_2)^2)

#Bind the vectors into a dataframe
df_compare <- data.frame(operation = c("r.4.5","r.4.10","r.8.5","r.8.10"),rounded = c(round(r.4.5),round(r.4.10),round(r.8.5),round(r.8.10)),integer = c(r.4.5.integer,r.4.10.integer,r.8.5.integer,r.8.10.integer))

#Compare the rounded values to the integer values
df_compare <- df_compare %>%
  mutate(test = ifelse(rounded == integer,"The Same","Different"))

#Show the data.frame with the test results
df_compare
##   operation rounded integer      test
## 1     r.4.5      10      11 Different
## 2    r.4.10       3       3  The Same
## 3     r.8.5      40      41 Different
## 4    r.8.10      10      11 Different

Exercise 4 (9 pts)

Part 1 (3 pts)

Write R code to compute

\[7 - 1 \times 0 + 3 \div 3\]

Type this in verbatim, using only numbers, -,* and /, with no parenthesis. Do you agree with the result? Explain why, one or two sentences.

Answer

We are asked to compute \(7 - 1 \times 0 + 3 \div 3\) in R.

7-1*0+3/3
## [1] 8

I do agree with the result. Order of operations states that the multiplicatio and division needed to be computed before the addition and subtraction so we get 7-0+1 or 8 as our result. ## Part 2 (3 pts)

According to “Why Did 74% of Facebook Users Get This Wrong?” (http://www.classroomprofessor.com/teaching-math/why-did-74pc-of-facebook-users-get-this-wrong/), most people would compute the result as 1. Use parenthesis ( ) to force R to produce this result.

Answer

The parenthesis mustbe placed around \(7-1\) for the equation to compute to 1. \[0 \times (7-1) + 3 \div 3 = 1\]

(7-1)*0+3/3
## [1] 1

Part 3 (3 pts)

Several respondents to the survey cited in Part 2 gave the answer 6. Add one set of parenthesis to force R to produce this result.

To get 6 we must place the parenthesis on the 0+3 argument to produce 7-1*1 or 6.

Answer

We must place the parenthesis around \(0+3\) for the equation to compute to 6.

\[7 - 1 \times (0 + 3) \div 3\]

7-1*(0+3)/3
## [1] 6

Exercise 5 (10 points)

Create a sequence of values from -3 to 3, inclusive, incremented by 0.1. Do this programmatically, not by writing c(-1.0, 0.9, …). You may do this in a sequence of steps and print each step, if that helps.

Assign the sequence to a variable x and use this in the likelihood formula from Exercise 1. Assign the results a variable lik.vec.

Answer

We are asked to create a sequence from -3 to 3 by increments of 0.1. To do this we will use R’s base seq() function with the by argument. We then are asked to compute the liklihood of this vector of \(x\) values using the below formula:

\[ L (x ; \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}^{}} e^{- \frac{(x - \mu)^2}{2 \sigma^2}} \]

Parameter Value
\(x\) \({-3,-2.9,...2.9,3.0}\)
\(\mu\) \(0.1\)
\(\sigma^2\) \(2\)
x <- c(Inf)
lik.vec <- c(Inf)

#Declare mu and sigma_2
mu <- 0.1
sigma_2 <- 2

#Set x to the sequence -3:3 by 0.1
x<-seq.int(-3,3,by=.1)

#Compute the liklihood vector using the equation from Exercise 2
lik.vec <- (1/(sqrt(sigma_2)*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma_2))

Unit Test

## [1] 26
## [1] 28
## [1] 30
## [1] 32
if(!is.infinite(x)) {
  plot(x,lik.vec)
}
## Warning in if (!is.infinite(x)) {: the condition has length > 1 and only
## the first element will be used

Exercise 6 (10 points)

Calculate values of LSD from exercise 1, for values of \(n_i\) from \({2,\ldots,8}\) and \(n_j\) from \({8,\ldots,2}\), by creating two different vectors for \(n_i\) and \(n_j\). Use these to compute LSD for combinations of \((n_i, n_j)={(2,8), (3,7), \ldots , (7,3), (8,2)}\). You should be able to do this by redefining n_i and n_j from the first exercise and copying the code that assigns a value to lsd.

Write code to determine the combination of \(n_i\) and \(n_j\) that produce the smallest LSD, and assign these to the variable best.n_i and best.n_j. Assign the minimum LSD value to best.LSD. (Hint - use the functions min and which).

Answer

We are asked to calculate values of LSD using the formula in Exercise 1. We are given \(n_i = 2:8\) and \(n_j = 8:2\). Once we have assigned those values we are asked to determine the combination of \(n_i\) and \(n_j\) to calculate the minimum LSD value. To do this we will bind the result vector to the \(n_i\) and \(n_j\) vectors into a data.frame. We will then subset the data.frame using the base min() and which() functions. We then need to simply assign the associated values to the variables best.n_i, best.n_j, and best.lsd.

best.lsd <- Inf
best.n_i <- Inf
best.n_j <- Inf

#Assign the variables
n_i <- 2:8
n_j <- 8:2

#Compute the lsd vector
lsd <- 2.085963447*sqrt(0.124611*(1/n_i + 1/n_j))

#Assign the associated vectors to lsd_df
lsd_df <- data.frame(n_i,n_j,lsd)

#Subset lsd_df to find the minimum combination of n_i and n_j
lsd_df <- lsd_df[which.min(lsd_df$lsd),]

#Assign the minimum values to the associated variable names
best.lsd <- lsd_df$lsd
best.n_i <- as.numeric(lsd_df$n_i)
best.n_j <- as.numeric(lsd_df$n_j)
## [1] 34
## [1] 36
## [1] 38
## [1] 39
## [1] 40

Exercise 7 (14 points)

This is an exercise in Software Archaeology. See https://media.pragprog.com/articles/mar_02_archeology.pdf.

Repeat this block of code in literate form, placing each line in a code chunk, so that the results of the function call is written to the literate document. Find each function in R Help, and after the code chunk provide a brief description of the purpose of the function and describe the parameters, where possible using mathematical notation.

You may include additional print, plot, or summary function calls if this helps you explain the code. For example, you may rewrite x <-runif(50,0,20) as print(x <-runif(50,0,20)) to show the value(s) assigned to x

set.seed(0719)
x <- runif(50,0,20)
y <- 5 + 3.5*x + rnorm(50)
mod <- lm(y~x) 
summary(mod)
plot(x,y)
abline(mod)

Answer

This chunk of code sets the random number sequence so that the linear model is reproducible across all uses. The arguments take 0719 as an integer to set the seed of the random number.

set.seed(0719)

This chunk of code generates random numbers with length \(n = 50\) from \(min = 0\) to \(max = 20\) and assigns the vector to x.

x <- runif(50,0,20)

This chunk of code creates the vector \(y\) by adding \(5 + 3.5x +\) the normal distribution with mean zero, standard deviation of 1, and with 50 observations.

y <- 5 + 3.5 * x + rnorm(50)

This chunk of coede generates a linear regression model of \(y\) to \(x\) and assigns the model to the variable \(mod\).

mod <- lm(y~x) 

This chunk of code prints summary statistics of the linear model to the console.This includes the residuals, coefficients, R-squared, and the F-statistic of the model.

summary(mod)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##          Min           1Q       Median           3Q          Max 
## -2.452516446 -0.583999104  0.222929268  0.699037299  1.548980762 
## 
## Coefficients:
##                 Estimate   Std. Error   t value   Pr(>|t|)    
## (Intercept) 4.8946297283 0.2795669124  17.50790 < 2.22e-16 ***
## x           3.5179508658 0.0241352343 145.75996 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.998357782 on 48 degrees of freedom
## Multiple R-squared:  0.99774584, Adjusted R-squared:  0.997698879 
## F-statistic:  21245.967 on 1 and 48 DF,  p-value: < 2.220446e-16

This chunk of code creates a plot of \(x\) to \(y\) and prints the plot to the consol. The second line of this code chunk adds the regression line of the model \(mod\) to the plot.

plot(x,y)
abline(mod)

Total points from unit tests

total.points
## [1] 40