2024-03-07

Exercise 8.1.

  • Create a plot which has 6 lines (in different colours), representing the pdfs for 6 \(\chi^2\) distributions,

  • with parameters from {1, 2, 3, 4, 6, 9} over the range (0, 10].

  • Can you estimate from the plot how the mode relates to the parameter (it is a very simple relationship)? Compare your output with that from a well known, but not totally reliable, source of information on the web.

Code 8.1

colors <- rainbow(9)

set <- c (2, 3, 4, 6, 9)  # this is for the lines added to the plot
df_values <- c(1, set)
x <- seq (0, 10, length.out = 10000)

plot (x, dchisq (x, df = 1),
      ylim = c (0, 0.5),
      col = colors[1], type = "l", lwd = 3,
      main = substitute (chi[k]^2),
      ylab = substitute(f[k](x)))

# I draw the rest of the four lines using a for loop
for (i in set){
  lines (x, dchisq (x, df = i),
         col = colors[i], lwd = 3)
}

# Add a legend on the topright corner for explanation.
legend ("topright", legend = paste("k =", df_values),
                col = colors[df_values], lty = 1)

Plot 8.1

Exercise 8.2.

  • We will investigate how the t distribution approaches the normal distribution as its degree of freedom increases.

  • One way to measure the distance between two distributions is by the maximum absolute difference of their cumulative distribution functions.

  • Use loops, or array arithmetic if you prefer, to generate a matrix of the cdf evaluated at x = -4, -3.99, . . . , 3.99, 4 for t distributions with dof equals to {1,2,3,4,6,9} respectively.

  • Then, by comparing with the cdf of the standard normal distribution at the same x values, compute the maximum absolute difference of cdf of the standard normal with that of \(t_1, t_2, t_3, t_4, t_6, t_9\) distributions.

My Function

dof <- c (1, 2, 3, 4, 6, 9) # A vector of different degrees of freedom
x <- seq (-4, 4, by = 0.01)  # Set the given range

# Create a blank matrix with correct no. rows & columns.
cdf_matrix <- matrix (nrow = length(x),
                      ncol = length(dof))
# I have six columns in this matrix, corresponds to 6 different dof.

# for different column, output the cdf.
for (i in 1:length(dof)){
  cdf_matrix[, i] <- pt(x, dof[i])
}

# this gives me the vector of standard normal cdf in the given range.
sncf <- pnorm (x)

# create a data frame with 7 columns, same rows.
df <- as.data.frame(cdf_matrix)
abs_diff <- data.frame (df, sncf)

My Function: continue…

# subtract the first 6 rows (cdf of t-distribution) with std. normal.
for (i in 1:length(dof)){
  abs_diff[, i] <- abs(abs_diff[, i] 
                                - 
                         abs_diff[, length(abs_diff)]
                       )
}

# create a new data frame with 1:6 (meaningless numbers)
max_abs_diff <- data.frame (c (1:6))

# for every row, put in the maximum difference value
for (i in 1:length(dof)){
  max_abs_diff[i, ] <- data.frame (max(abs_diff[, i]))
}

rownames(max_abs_diff) = paste ("t =", dof)
colnames(max_abs_diff) <- c ("maximum absolute difference of 2 cdfs")

print(max_abs_diff)

The output: table

##       maximum absolute difference of 2 cdfs
## t = 1                            0.12558222
## t = 2                            0.07107015
## t = 3                            0.04929632
## t = 4                            0.03767243
## t = 6                            0.02556357
## t = 9                            0.01723211

Exercise 8.3.

  • What is the 95th percentile of the Student’s t distribution with one degree of freedom?

  • Complete the loop begun above so that it finishes by outputting to the screen a sentence explaining the value of n for which the (-1.65, 1.65) interval has been reached.

Answer to 8.3

n <- 2

while (qt (p = 0.95, df = n-1) > 1.65) {
  n <- n + 1
}

cat ("The lowest sample size for which the central 95% of 
     the corresponding student's t distribution falls within (-1.65, 1.65) is", n, ".\n")
## The lowest sample size for which the central 95% of 
##      the corresponding student's t distribution falls within (-1.65, 1.65) is 299 .

Exercise 8.4.

  • Write a function which takes two numbers, \(\upsilon_1\) and \(\upsilon_2\) as input and then outputs a \(\upsilon_1 \times \upsilon_2\) matrix of 97.5 percentiles from the F distribution where

  • the rows represent the range 1 : \(\upsilon_1\) and

  • the columns 1 : \(\upsilon_2\).

  • Make sure that the rows and columns are appropriately labelled.

  • Include checks that \(\upsilon_1\) and \(\upsilon_2\) are positive integers with appropriate error messages.

My Function

F_percentile_calculation <- function (v1, v2) {
  # check if v1, v2 are positive integers
  
  v_values <- c (v1, v2)
  neg_v <- (1:2) [v_values <= 0]
  non_int_v <- (1:2) [v_values %% 1 != 0]
  
  if(length(neg_v) >= 1) {
    stop ("Both vs should be positive!")
  }
  if (length(non_int_v >= 1)) {
    stop ("Both vs need to be integers!")
  }

  M <- matrix (nrow = v1, ncol = v2)
  for (i in 1:v1) {
    M[i, ] <- qf ( p = 0.975,
                   df1 = i,
                   df2 = 1:v2)
  }
  
  rownames(M) <- 1:v1
  colnames(M) <- 1:v2
  return (M)
}

Try it

v1 <- 5
v2 <- 6
F_percentile_calculation(v1,v2)
##          1        2        3         4         5        6
## 1 647.7890 38.50633 17.44344 12.217863 10.006982 8.813101
## 2 799.5000 39.00000 16.04411 10.649111  8.433621 7.259856
## 3 864.1630 39.16549 15.43918  9.979199  7.763589 6.598799
## 4 899.5833 39.24842 15.10098  9.604530  7.387886 6.227161
## 5 921.8479 39.29823 14.88482  9.364471  7.146382 5.987565

Exercise 8.5.

  • It can be shown that the square of a \(t_d\) random variable has an \(F_{1,d}\) distribution.

  • We will verify this by simulation in this exercise.

  • Write a function that takes d and n as the input,

  • generates two vectors u and v of length n from \(t_d\) and \(F_{1,d}\) distributions respectively,

  • and finally performs an appropriate statistical test to check if the \(u^2\) (entrywise square of u) and v follows the same distribution.

My function

tf <- function (d, n) {
  
  t_df <- d
  u <- matrix (rt (n, t_df),
                nrow = n, 
                ncol = 1)
  
  v <- matrix (rf (n, 
                    df1 = 1,
                    df2 = t_df),
                nrow = n,
                ncol = 1)
  
  test_result1 <- wilcox.test (u^2, v)
  test_result2 <- ks.test(u^2, v)
  
  cat("Wilcoxon-test results: \n")
  print(test_result1)
  
  cat("Kolmogorov-Smirnov Test Results:\n")
  print(test_result2)
}

Try it

set.seed(123)
d <- 3
n <- 1000
tf(d, n)
## Wilcoxon-test results: 
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  u^2 and v
## W = 500959, p-value = 0.9408
## alternative hypothesis: true location shift is not equal to 0
## 
## Kolmogorov-Smirnov Test Results:
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  u^2 and v
## D = 0.022, p-value = 0.9689
## alternative hypothesis: two-sided

Or, I can plot: