In this homework you will duplicate and extend results from Homework 1, and start working with lists, matrices and data frames. You are required to complete these exercises without using iteration - no for loops.

Each of the three Exercises is divided into Parts. Most parts can be completed with 1-2 lines of code. You might find the exercise easier if you included one code chuck per part. You don’t need to comment on each chunk (except where explicitly stated), unless that helps your coding, but you need to provide a summary paragraph for each function defined in this Homework. You may count Part 4 as the summary paragraph for Exercise 3.

Exercise 1 (20 points)

Part 1

Using the formula for log-likelihood from Homework 1, Exercise 2, write a function that accepts x,mu ($\mu$) and sigma.2 ($\sigma^2$) as parameters and returns the corresponding log-likelihood value. Make mu and sigma.2 optional parameters with values mu=0 and sigma.2=1.

Write a one paragraph summary of the function.

The function log_lik takes the arguments $x$, ‘sigma.2’ ($\sigma^2$), and ‘mu’ ($\mu$) where ‘sigma.2’ and ‘mu’ are optional parameters and takes the default values of 1 and 0 respectively. The function calculates the probability of an observations $x$, when taken from a normal population with mean $\mu$ and variance $\sigma^2$.

log_lik <- function(x,sigma.2=1,mu=0){
  
  (1/(sqrt(sigma.2)*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma.2))
  
}

Part 2

Create a data frame table with one column, x, containing x values from -3 to +3, incremented by 0.1.

  #create a vector of the sequence (x) from -3 to 3 by 0.1
 x <- seq(-3,3,by=0.1)
  #coerce the vector to a data.frame
 logLik.dat <- as.data.frame(x)

Part 3

Append to this table a data column logLik1. Let sigma and using mu take the default values. Compute the values for logLik1 by using apply, your data frame as a parameter, and your log-likelihood function.

#use the apply function on logLik.data with margin 1 and the function log_lik
logLik.dat$logLik1<-apply(logLik.dat,1,log_lik)

Part 4

Append a second data column logLik2, by using x from your data table as an argument to your function and use mu=0.1 and sigma.2=2 as a parameters. This data column should contain values identical to Homework1, Exercise 5.

#use apply on logLik to define logLik2
#we will need to define the first array in logLik.data as X using bracket notation [1]
#adjusting sigma and mu is completed by redefining the variables as arguments in apply
logLik.dat$logLik2<-apply(logLik.dat[1],1,log_lik,mu=0.1,sigma.2=2)

What happens if you try to use apply in this step?

As long as you let apply know to pass the first array [1] $x$ to the function and pass the arguments you can still use apply. It is probably not the easiest way to create a second column, but it calculates the correct values using the apply function asked in this exercise.

Part 5

Append a third data column logLik3, let sigma.2=2 but let mu take the default.

#Using the same premise as the previous exercise we will create logLik3 with the apply
#function
logLik.dat$logLik3<-apply(logLik.dat[1],1,log_lik,sigma.2=2)

The data frame you create should produce a plot below with three slightly different curves. Comment on the effect that the parameters have on the plot.

Log-likelihood plot Unit Test

## [1] 2

Unit Test

## [1] 4
## [1] 6
## [1] 8
## [1] 10
## [1] 12

Exercise 2

Part 1 (10 points)

Using the formula for required replicates from Homework 1, Exercise 3, write a function required.replicates that requires two parameters, delta and sigma and has two optional parameters, z.alpha = 1.959964 and z.beta=0.8416212.

Write this function to return a list with 3 elements. The first element will be a list containing the four parameters used to calculate r.exact and r.integer. You may choose your own name, but this element should be accessible by something like result[[1]] - see the unit tests. The other two elements will be values r.exact and r.integer corresponding to the decimal and next largest integer values calculated for Exercise 3.

Write a one paragraph summary of the function.

The function required.replicates takes the arguments ‘delta’ ($\delta$), sigma ($\sigma$), “z.alpha” ($z_{\alpha / 2}$) with the default of 1.959964, and ‘z.beta’ ($z_{\beta}$) with the default of 0.8416212. The required.replicates function calculates the number of replicates required to achieve a desired statistical power and significance. The output of the function is a list with elements rslt, r.exact, and r.integer. rslt is a list of the values of the arguments of the function. r.exact and r.integer are integers that display the exact number of required replicates in numeric format and r.integer will display the next highest integer.

required.replicates <- function(delta,sigma,z.alpha = 1.959964,z.beta=0.8416212){

  list(
    rslt=list(delta=delta,sigma=sigma,z.alpha=z.alpha,z.beta=z.beta) ,
    r.exact = 2 * (z.alpha + z.beta)^2 * (sigma/delta)^2,
    r.integer = ceiling(2 * (z.alpha + z.beta)^2 * (sigma/delta)^2) )

}

Unit Test

## [1] 14
## [1] 16
## [1] 17
## [1] 18
## [1] 20

Part 2 (5 points)

Create two vectors, one for values $\sigma = {2,4,6,8,10,12}$ and one for values $\delta = {1,2,5,10}$. Next, create two matrices, sigma.mat and delta.mat, by repeating (rep) the vectors for $\sigma$ and $\delta$. sigma.mat will contain $\sigma$ repeated in rows i.e.,

\[ \left\{ \begin{array}{cccccc} 2 & 4 & 6 & 8 & 10 & 12\\ 2 & 4 & 6 & 8 & 10 & 12\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \end{array} \right\} \]

delta.mat will contain $\delta$ repeated in columns:

\[ \left\{ \begin{array}{cccccc} 1 & 1 & 1 & 1 & 1 & 1\\ 2 & 2 & 2 & 2 & 2 & 2\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \end{array} \right\} \]

Don’t hardcode the number of repetitions for each vector, compute this from the dimensions of the other vector. Print both vectors and both matrices.

#create the vector sigma which is the sequence from 2:12 by 2
sigma<- seq(2,12,by=2)
sigma

## [1]  2  4  6  8 10 12

#create the sequence {1,2,5,8,10}
delta <- c(1,2,5,8,10)
delta

## [1]  1  2  5  8 10

#create the matrix sigma.mat where sigma is repeated over rows with length delta. Set 
#the number of columns to the length of sigma
sigma.mat <- matrix(rep(sigma,length(delta)),ncol=length(sigma),byrow=T)
sigma.mat

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    4    6    8   10   12
## [2,]    2    4    6    8   10   12
## [3,]    2    4    6    8   10   12
## [4,]    2    4    6    8   10   12
## [5,]    2    4    6    8   10   12

#create the matrix delta.mat where delta is repeated over columns with length sigma. 
#set the number of rows equal to the length of delta
delta.mat <- matrix(rep(delta,length(sigma)),nrow=length(delta))
delta.mat

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    1    1    1    1
## [2,]    2    2    2    2    2    2
## [3,]    5    5    5    5    5    5
## [4,]    8    8    8    8    8    8
## [5,]   10   10   10   10   10   10

(There is no unit test for Part 2)

Part 3 (10 points)

Use the two matrices as parameters to your replicate function. Assign to the variable replicates.table only the values for integer replicates (you should be able to do this with a simple $ accessor) and print the results.

#use the required replicates function and assign the parameters delta to the matrix
#delta.mat and sigma to sigma.mat. We only are concerned with the r.integer value
replicates.table <- required.replicates(delta=delta.mat,sigma=sigma.mat)$r.integer
replicates.table

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]   63  252  566 1005 1570 2261
## [2,]   16   63  142  252  393  566
## [3,]    3   11   23   41   63   91
## [4,]    1    4    9   16   25   36
## [5,]    1    3    6   11   16   23

Unit test

## [1] 22
## [1] 24
## [1] 26
## [1] 28
## [1] 30

Hint

Start with $\sigma^2 = {4,8}$ and $\delta = {5,10}$, since you’ve computed those in the Homework 1, and you should know what those values will be. When you’re satisfied with your results, replace your data vectors the complete set of values.

Exercise 3

Part 1 (10 points)

Download the data from http://www.itl.nist.gov/div898/strd/univ/michelso.html (see the link titled Data file (ASCII Format)) and convert this into a text file that can be read into R. You will need to strip the header comments. You may use any of read.table, read.delim, read.csv, but you will need to upload your data file to Dropbox. Use the same naming convention as for homework, but use an appropriate (i.e. .csv suffix). You may choose to include a header in your data file, or not, but the resulting data frame must have a column named ‘Y’ to pass unit tests.

Read this data into a variable named michelso.dat. Confirm that your data has been read properly by comparing this plot with the graph on the web site. Note that the graph won’t be plotted if there is no variable named michelso.dat with a data column Y.

Provide a brief description of the data. If you use a file, comment on the file type.

The data is a single variable of 100 observations of the speed of light (in millions of meters per second). The study was conducted by Michelson in 1879. The file is saved as a comma separated value with the header Y.

#read in the data from http://www.itl.nist.gov/div898/strd/univ/michelso.html and assign it to the object michelso.dat
michelso.dat <- read.table("C:/Users/Jeremiah Lowhorn/Desktop/STAT 700/Week 2/data.csv", header=TRUE, quote="\"", stringsAsFactors=FALSE)

Unit Test

## [1] 32
## [1] 34

Part 2 (10 points)

Write a function that is given a single vector and returns a list of three values: bar.y, s.2 and r_1. Your function will return an object of class “univariate”.

Use the formula from the web site: \[ \bar{y} = \frac{\sum^n_{i = 1} y_i} {n} \] \[ s^2 = \frac{\sum^n_{i = 1} (y_i - \bar{y})^2} {n - 1} \] \[ r_1 = \frac{\sum^n_{i = 2} (y_i - \bar{y}) (y_{i - 1} - \bar{y})} {\sum^n_{i = 1} (y_i - \bar{y})^2} \]

Inside your function, you will need to calculate the following values: 1. Calculate $\mu$ without using mean, but you can use any other (base) functions. Assign this to bar.y. 2. Calculate $s^2$, again without using the function var. Assign this to s.2 3. Calculate the autoregression coefficient $r_1$ described on the web site and assign this to r_1.

You are required to compute all three values without using for! Hint - for r_1, create two different index vectors that are offset from each other by 1.

Combine these three variables in a list, assign “univariate” to that list by assigning class(my.list) <- "univariate" inside your function, then return that list from your function.

Another tip - use the data from http://www.itl.nist.gov/div898/strd/univ/numacc1.html to test your function before you apply it to the Michelso data set. This data set has published values and only three variables.

Use the Michelso data as a parameter to your function and assign the results to a variable michelso.univ. Print the result.

univ.sum.stats <- function(x){
  
  #Set y_2 equal to 2 to the length of the input x
  y_2 <- x[2:(length(x))]
  
  #set y_1 euqal to 1 to the length of x - 1
  y_1 <- x[1:(length(x)-1)]
  
  #calculate the mean, variance, and autoregression coefficient with the provided         
  #formulas
  bar.y <- sum(x)/length(x)
  s.2 <- sum((x-(sum(x)/length(x)))^2)/(length(x)-1)
  r_1 <- sum((y_2-bar.y)*(y_1-bar.y))/sum((x-bar.y)^2) 

  #create a list of the mean, variance, and autoregression coefficient
  my.list<-  list(
  bar.y = bar.y,
  s.2 = s.2,
  r_1 = r_1
  )
  
  #set the class of the above created list to univariate and call my.list
  class(my.list) <- "univariate"
  my.list
}

#call the above function on the michelso.dat Y vector
michelso.univ <- univ.sum.stats(michelso.dat$Y)

Unit Test

## [1] 36
## [1] 38
## [1] 40
## [1] 42

Part 3 (5 points)

The code below will print a summary of michelso.univ if you’ve define the class properly. In object-oriented terms, summary.univariate overrides the generic function summary to work with instances of class univariate.

Modify this function to print a message for each value returned by your function, i.e. the output should read something like ‘estimated autocorrelation is 0.535199668621’. Your results will print as part of the unit test.

summary.univariate <- function(value) {
  #use the paste function to append the statements required in the exercise
  print(paste("Estimated mean is:",value$bar.y))
  print(paste("Estimated variance is:",value$s.2))
  print(paste("Estimated autocorrelation is:", value$r_1))
}

if(length(find("michelso.univ"))>0) {
  if(class(michelso.univ)=="univariate") {
    print(total.points <- total.points + 2)
    print("Testing summary function for class univariate")
    summary(michelso.univ)
  } else {
    print("return value not of class 'univariate'")
  }
} else {
  print("michelso.univ not defined")
}

## [1] 44
## [1] "Testing summary function for class univariate"
## [1] "Estimated mean is: 299.8524"
## [1] "Estimated variance is: 0.00624266666666649"
## [1] "Estimated autocorrelation is: 0.535199668621263"

Part 4 (5 points)

Write a brief summary of your function in R Help (Rd) format. Include in the documentation the fields:

name
title
description
arguments
value

Put your documentation below in between backticks so that it prints verbatim. You don’t need to process or typset the documentation.

See https://cran.r-project.org/doc/manuals/R-exts.html#Writing-R-documentation-files for discussion on Rd format.


\name{univ.sum.stats}
\title{Univariate Summary Statistics}
\description{
  univ.sum.stats is used to calculate the mean, variance, and autoregression coefficient of a vector. The output of the function is contained in a list and can be accessed using $bar.y for the mean, $s.2 for the variance, and $r_1 for the autoregression coefficient.
\arguments{
  \item{x}{a vector with class numeric}
}
\value{
  The value of the function is of class list. 
}
}

(There is no unit test for part 4)

Total points from Unit Tests

total.points

## [1] 44

STAT 700 Week 2

Jeremiah Lowhorn

June 14, 2017

Exercise 1 (20 points)

Part 1

Part 2

Part 3

Part 4

Part 5

Log-likelihood plot Unit Test

Unit Test

Exercise 2

Part 1 (10 points)

Unit Test

Part 2 (5 points)

Part 3 (10 points)

Unit test

Hint

Exercise 3

Part 1 (10 points)

Unit Test

Part 2 (10 points)

Unit Test

Part 3 (5 points)

Part 4 (5 points)

Total points from Unit Tests