Exercise 8

2024-03-07

Exercise 8.1.

Create a plot which has 6 lines (in different colours), representing the pdfs for 6 \(\chi^2\) distributions,
with parameters from {1, 2, 3, 4, 6, 9} over the range (0, 10].
Can you estimate from the plot how the mode relates to the parameter (it is a very simple relationship)? Compare your output with that from a well known, but not totally reliable, source of information on the web.

Code 8.1

colors <- rainbow(9)

set <- c (2, 3, 4, 6, 9)  # this is for the lines added to the plot
df_values <- c(1, set)
x <- seq (0, 10, length.out = 10000)

plot (x, dchisq (x, df = 1),
      ylim = c (0, 0.5),
      col = colors[1], type = "l", lwd = 3,
      main = substitute (chi[k]^2),
      ylab = substitute(f[k](x)))

# I draw the rest of the four lines using a for loop
for (i in set){
  lines (x, dchisq (x, df = i),
         col = colors[i], lwd = 3)
}

# Add a legend on the topright corner for explanation.
legend ("topright", legend = paste("k =", df_values),
                col = colors[df_values], lty = 1)

Plot 8.1

Exercise 8.2.

We will investigate how the t distribution approaches the normal distribution as its degree of freedom increases.
One way to measure the distance between two distributions is by the maximum absolute difference of their cumulative distribution functions.
Use loops, or array arithmetic if you prefer, to generate a matrix of the cdf evaluated at x = -4, -3.99, . . . , 3.99, 4 for t distributions with dof equals to {1,2,3,4,6,9} respectively.
Then, by comparing with the cdf of the standard normal distribution at the same x values, compute the maximum absolute difference of cdf of the standard normal with that of \(t_1, t_2, t_3, t_4, t_6, t_9\) distributions.

My Function

dof <- c (1, 2, 3, 4, 6, 9) # A vector of different degrees of freedom
x <- seq (-4, 4, by = 0.01)  # Set the given range

# Create a blank matrix with correct no. rows & columns.
cdf_matrix <- matrix (nrow = length(x),
                      ncol = length(dof))
# I have six columns in this matrix, corresponds to 6 different dof.

# for different column, output the cdf.
for (i in 1:length(dof)){
  cdf_matrix[, i] <- pt(x, dof[i])
}

# this gives me the vector of standard normal cdf in the given range.
sncf <- pnorm (x)

# create a data frame with 7 columns, same rows.
df <- as.data.frame(cdf_matrix)
abs_diff <- data.frame (df, sncf)

My Function: continue…

# subtract the first 6 rows (cdf of t-distribution) with std. normal.
for (i in 1:length(dof)){
  abs_diff[, i] <- abs(abs_diff[, i] 
                                - 
                         abs_diff[, length(abs_diff)]
                       )
}

# create a new data frame with 1:6 (meaningless numbers)
max_abs_diff <- data.frame (c (1:6))

# for every row, put in the maximum difference value
for (i in 1:length(dof)){
  max_abs_diff[i, ] <- data.frame (max(abs_diff[, i]))
}

rownames(max_abs_diff) = paste ("t =", dof)
colnames(max_abs_diff) <- c ("maximum absolute difference of 2 cdfs")

print(max_abs_diff)

The output: table

##       maximum absolute difference of 2 cdfs
## t = 1                            0.12558222
## t = 2                            0.07107015
## t = 3                            0.04929632
## t = 4                            0.03767243
## t = 6                            0.02556357
## t = 9                            0.01723211

Exercise 8.3.

What is the 95th percentile of the Student’s t distribution with one degree of freedom?
Complete the loop begun above so that it finishes by outputting to the screen a sentence explaining the value of n for which the (-1.65, 1.65) interval has been reached.

Answer to 8.3

n <- 2

while (qt (p = 0.95, df = n-1) > 1.65) {
  n <- n + 1
}

cat ("The lowest sample size for which the central 95% of 
     the corresponding student's t distribution falls within (-1.65, 1.65) is", n, ".\n")

## The lowest sample size for which the central 95% of 
##      the corresponding student's t distribution falls within (-1.65, 1.65) is 299 .

Exercise 8.4.

Write a function which takes two numbers, \(\upsilon_1\) and \(\upsilon_2\) as input and then outputs a \(\upsilon_1 \times \upsilon_2\) matrix of 97.5 percentiles from the F distribution where
the rows represent the range 1 : \(\upsilon_1\) and
the columns 1 : \(\upsilon_2\).
Make sure that the rows and columns are appropriately labelled.
Include checks that \(\upsilon_1\) and \(\upsilon_2\) are positive integers with appropriate error messages.

My Function

F_percentile_calculation <- function (v1, v2) {
  # check if v1, v2 are positive integers
  
  v_values <- c (v1, v2)
  neg_v <- (1:2) [v_values <= 0]
  non_int_v <- (1:2) [v_values %% 1 != 0]
  
  if(length(neg_v) >= 1) {
    stop ("Both vs should be positive!")
  }
  if (length(non_int_v >= 1)) {
    stop ("Both vs need to be integers!")
  }

  M <- matrix (nrow = v1, ncol = v2)
  for (i in 1:v1) {
    M[i, ] <- qf ( p = 0.975,
                   df1 = i,
                   df2 = 1:v2)
  }
  
  rownames(M) <- 1:v1
  colnames(M) <- 1:v2
  return (M)
}

Try it

v1 <- 5
v2 <- 6
F_percentile_calculation(v1,v2)

##          1        2        3         4         5        6
## 1 647.7890 38.50633 17.44344 12.217863 10.006982 8.813101
## 2 799.5000 39.00000 16.04411 10.649111  8.433621 7.259856
## 3 864.1630 39.16549 15.43918  9.979199  7.763589 6.598799
## 4 899.5833 39.24842 15.10098  9.604530  7.387886 6.227161
## 5 921.8479 39.29823 14.88482  9.364471  7.146382 5.987565

Exercise 8.5.

It can be shown that the square of a \(t_d\) random variable has an \(F_{1,d}\) distribution.
We will verify this by simulation in this exercise.
Write a function that takes d and n as the input,
generates two vectors u and v of length n from \(t_d\) and \(F_{1,d}\) distributions respectively,
and finally performs an appropriate statistical test to check if the \(u^2\) (entrywise square of u) and v follows the same distribution.

My function

tf <- function (d, n) {
  
  t_df <- d
  u <- matrix (rt (n, t_df),
                nrow = n, 
                ncol = 1)
  
  v <- matrix (rf (n, 
                    df1 = 1,
                    df2 = t_df),
                nrow = n,
                ncol = 1)
  
  test_result1 <- wilcox.test (u^2, v)
  test_result2 <- ks.test(u^2, v)
  
  cat("Wilcoxon-test results: \n")
  print(test_result1)
  
  cat("Kolmogorov-Smirnov Test Results:\n")
  print(test_result2)
}

Try it

set.seed(123)
d <- 3
n <- 1000
tf(d, n)

## Wilcoxon-test results: 
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  u^2 and v
## W = 500959, p-value = 0.9408
## alternative hypothesis: true location shift is not equal to 0
## 
## Kolmogorov-Smirnov Test Results:
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  u^2 and v
## D = 0.022, p-value = 0.9689
## alternative hypothesis: two-sided

Exercise 8.1.

Code 8.1

Plot 8.1

Exercise 8.2.

My Function

My Function: continue…

The output: table

Exercise 8.3.

Answer to 8.3

Exercise 8.4.

My Function

Try it

Exercise 8.5.

My function

Try it

Or, I can plot: