Question

A study was conducted with vegetarians to see whether the number of grams of protein each ate per day \((x_1)\) was related to their diastolic blood pressure \((x_2)\), systolic blood pressure \((x_3),\) and gender \((x_4)\). Below are the recorded data for a sample of ten vegetarian students:

\[\bf{X}=\begin{bmatrix}\bf{x_1}& \bf{x_2}& \bf{x_3}& \bf{x_4} \end{bmatrix}=\begin{bmatrix} 4.0 & 73 & 112 & 1 \\ 6.5 & 79 & 118 & 2\\ 5.0 & 83 & 123 & 1\\ 5.5 & 82 & 120 & 2 \\ 8.0 & 84 & 125 & 1\\ 10.0 & 92 & 140 & 2\\ 9.0 & 88 & 130 & 1\\ 8.2 & 86 & 126 & 2\\ 10.5 & 95 & 144 & 1 \\ 11.0 & 100 & 150 & 2 \end{bmatrix}\]

  1. [5 points] Use the data to create a data frame with column names \(x_1=protein\), \(x_2=dbp\), \(x_3=sbp\) and \(x_4=gender\), where gender is a factor variable with 1 = Female and 2 = Male.
# Vegetarian Data
X <- matrix(c( 4.0,  73, 112, 1,
               6.5,  79, 118, 2,
               5.0,  83, 123, 1,
               5.5,  82, 120, 2,
               8.0,  84, 125, 1,
              10.0,  92, 140, 2,
               9.0,  88, 130, 1,
               8.2,  86, 126, 2,
              10.5,  95, 144, 1,
              11.0, 100, 150, 2),
            nrow = 10, ncol = 4, byrow = TRUE)

# Data Processing

P <- as.data.frame(X)
colnames(P) <- c("protein", "dbp", "sbp", "gender")

# Recode gender (1 = Female, 2 = Male) as a factor
P$gender <- factor(P$gender, levels = c(1, 2), labels = c("Female", "Male"))

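As a cross-check on the matrix-to-data-frame route above, the same data frame can be sketched directly with data.frame(), one named vector per column. The name P2 is hypothetical, chosen only so that the sketch does not overwrite P.

```r
# Build the vegetarian data frame directly, one named vector per column
P2 <- data.frame(
  protein = c(4.0, 6.5, 5.0, 5.5, 8.0, 10.0, 9.0, 8.2, 10.5, 11.0),
  dbp     = c(73, 79, 83, 82, 84, 92, 88, 86, 95, 100),
  sbp     = c(112, 118, 123, 120, 125, 140, 130, 126, 144, 150),
  # gender is coded 1 = Female, 2 = Male, then stored as a factor
  gender  = factor(c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
                   levels = c(1, 2), labels = c("Female", "Male"))
)

str(P2)            # three numeric columns and one factor with 2 levels
table(P2$gender)   # 5 Female, 5 Male
```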
  2. [5 points] Use a for() loop to compute the column averages of the recorded data for the sample of vegetarians, for each column variable that is numeric. Write a related R function that computes the column averages of the numeric columns.
# Use a for() loop to compute the column averages of the numeric columns (one way)

numericvars <- NULL
for (m in names(P)) {
  # Only numeric columns get an average; the gender factor is skipped
  if (is.numeric(P[[m]])) {
    numericvars[m] <- mean(P[[m]], na.rm = TRUE)
  }
}
Mn <- numericvars

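The question also asks for an R function. A minimal sketch that wraps the same for() loop is shown below; the function name col_means_numeric and its na.rm argument are hypothetical choices, and the data are re-entered so the block runs on its own.

```r
# Hypothetical helper: mean of every numeric column of a data frame,
# skipping non-numeric columns (e.g. factors), using the same for() loop idea
col_means_numeric <- function(df, na.rm = TRUE) {
  out <- numeric(0)
  for (nm in names(df)) {
    if (is.numeric(df[[nm]])) {
      out[nm] <- mean(df[[nm]], na.rm = na.rm)
    }
  }
  out
}

# Usage on the same protein, dbp, sbp and gender data
veg <- data.frame(
  protein = c(4.0, 6.5, 5.0, 5.5, 8.0, 10.0, 9.0, 8.2, 10.5, 11.0),
  dbp     = c(73, 79, 83, 82, 84, 92, 88, 86, 95, 100),
  sbp     = c(112, 118, 123, 120, 125, 140, 130, 126, 144, 150),
  gender  = factor(rep(c("Female", "Male"), times = 5))
)
col_means_numeric(veg)   # protein 7.77, dbp 86.2, sbp 128.8
```

Unlike the loop above, the helper returns an empty numeric vector rather than NULL when no column is numeric. colMeans(veg[sapply(veg, is.numeric)]) gives the same averages without an explicit loop.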
  3. [5 points] Given that \(d_i=x_i-\bar{x}_i\) is a deviation vector, write a for loop to obtain the deviation matrix \(D\) for the variables protein, dbp and sbp, where \(D=\begin{bmatrix} d_1 & d_2 & d_3\end{bmatrix}\). Hence, compute \(\frac{D^{'}D}{n-1}\). Interpret your results.
x <- P[, 1:3]   # numeric variables: protein, dbp, sbp

# Built-in sample covariance matrix, kept for comparison with S below
cov(x)

numb_cols <- ncol(x)

# Vector to store the column averages
col_aver <- numeric(numb_cols)

# Calculate the column averages using a for loop
for (i in seq_len(numb_cols)) {
  col_aver[i] <- mean(x[, i])
}
names(col_aver) <- names(x)

# Deviation matrix: column m holds variable m minus its average
D <- matrix(0, nrow = nrow(x), ncol = ncol(x),
            dimnames = list(NULL, names(x)))

for (m in names(x)) {
  if (is.numeric(x[[m]])) {
    D[, m] <- x[[m]] - col_aver[m]
  }
}

# Sample variance-covariance matrix S = D'D / (n - 1)
S <- (t(D) %*% D) / (nrow(D) - 1)
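The loop above can also be checked with a vectorized sketch: sweep() subtracts each column mean at once, and crossprod(D) computes \(D^{'}D\). The names xn, D_vec and S_vec are hypothetical, and the data are re-entered so the block runs on its own.

```r
# Re-enter the three numeric variables so this block runs on its own
xn <- data.frame(
  protein = c(4.0, 6.5, 5.0, 5.5, 8.0, 10.0, 9.0, 8.2, 10.5, 11.0),
  dbp     = c(73, 79, 83, 82, 84, 92, 88, 86, 95, 100),
  sbp     = c(112, 118, 123, 120, 125, 140, 130, 126, 144, 150)
)

# Deviation matrix without a loop: subtract each column's mean (MARGIN = 2)
D_vec <- sweep(as.matrix(xn), 2, colMeans(xn))

# Each column of a deviation matrix sums to zero (up to rounding error)
colSums(D_vec)

# S = D'D / (n - 1); crossprod(D) computes t(D) %*% D
S_vec <- crossprod(D_vec) / (nrow(D_vec) - 1)

# Compare with the built-in sample covariance matrix
all.equal(S_vec, cov(xn))
```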

Interpretation

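We notice that the sample variance-covariance matrix of the numeric variables, protein, diastolic blood pressure (dbp) and systolic blood pressure (sbp), can be computed from the deviation matrix alone: entry \((j,k)\) of \(D^{'}D\) is \(d_j^{'}d_k\), so dividing by \(n-1\) gives the sample variances on the diagonal and the sample covariances off the diagonal, and \(S\) agrees with cov(x). All three covariances are positive (about 17.83 for protein and dbp, 27.49 for protein and sbp, and 95.27 for dbp and sbp), so in this sample students who eat more protein also tend to have higher diastolic and systolic blood pressure, and the two blood pressures rise together.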
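Because covariances depend on the units of each variable, one way to gauge how strong these positive relationships are is to rescale the covariance matrix to a correlation matrix with cov2cor() from the stats package. This is a sketch that re-enters the data so it runs on its own; the names xn and R are illustrative.

```r
# Re-enter the three numeric variables so this block runs on its own
xn <- data.frame(
  protein = c(4.0, 6.5, 5.0, 5.5, 8.0, 10.0, 9.0, 8.2, 10.5, 11.0),
  dbp     = c(73, 79, 83, 82, 84, 92, 88, 86, 95, 100),
  sbp     = c(112, 118, 123, 120, 125, 140, 130, 126, 144, 150)
)

S <- cov(xn)   # same values as D'D / (n - 1)

# Correlation r_jk = s_jk / sqrt(s_jj * s_kk); the diagonal becomes 1
R <- cov2cor(S)
round(R, 3)
```

In this sample every off-diagonal correlation is above 0.9, with dbp and sbp the most strongly related pair.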

  4. [5 points] Suppose the number of grams of protein eaten per day follows the model \(x_{i1}=\beta_0+\beta_2x_{i2}+\beta_3x_{i3}+\epsilon_i\), with coefficients estimated by \(\hat\beta=(X^{'}X)^{-1}X^{'}y\). Determine the regression models for females only, for males only and for females and males together, using the matrix multiplication approach to compute the coefficients. Write down your observations.
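Written out in the question's own symbols, the model has one row per student, a leading column of ones for \(\beta_0\), and protein as the response, so the matrix formula reads:

\[
\underbrace{\begin{bmatrix} x_{11}\\ x_{21}\\ \vdots\\ x_{n1}\end{bmatrix}}_{y}
=
\underbrace{\begin{bmatrix} 1 & x_{12} & x_{13}\\ 1 & x_{22} & x_{23}\\ \vdots & \vdots & \vdots\\ 1 & x_{n2} & x_{n3}\end{bmatrix}}_{X}
\begin{bmatrix}\beta_0\\ \beta_2\\ \beta_3\end{bmatrix}
+
\begin{bmatrix}\epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\end{bmatrix},
\qquad
\hat\beta=\begin{bmatrix}\hat\beta_0\\ \hat\beta_2\\ \hat\beta_3\end{bmatrix}=(X^{'}X)^{-1}X^{'}y .
\]

Here \(X\) denotes this design matrix rather than the data matrix \(\bf{X}\) given at the start, and \(n\) is the number of students in the group being modeled (10 overall, 5 for each gender).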
# Both Male and Female

A  <- as.matrix(P[, c("dbp", "sbp")])   # predictors dbp and sbp (no column of ones)
a  <- as.matrix(P$protein)              # response: protein
Ba <- solve(t(A) %*% A) %*% t(A) %*% a

# lm(a ~ 0 + A[, 1] + A[, 2])   # matching no-intercept lm() fit, for comparison

# Male Only
M  <- as.matrix(P[P$gender == "Male", c("dbp", "sbp")])
m  <- as.matrix(P$protein[P$gender == "Male"])
Bm <- solve(t(M) %*% M) %*% t(M) %*% m

# lm(m ~ 0 + M[, 1] + M[, 2])

# Female Only
L  <- as.matrix(P[P$gender == "Female", c("dbp", "sbp")])
l  <- as.matrix(P$protein[P$gender == "Female"])
Bl <- solve(t(L) %*% L) %*% t(L) %*% l

# lm(l ~ 0 + L[, 1] + L[, 2])
cbind(ALL = Ba, Male = Bm, Female = Bl)
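The stated model includes an intercept \(\beta_0\), but A, M and L above contain only the dbp and sbp columns, so the coefficients computed above describe fits through the origin. A minimal sketch of the same matrix computation with a column of ones added for \(\beta_0\) is given below, checked against lm(); the helper name fit_ols and the data frame name veg are hypothetical.

```r
# Re-enter the data so this block runs on its own
veg <- data.frame(
  protein = c(4.0, 6.5, 5.0, 5.5, 8.0, 10.0, 9.0, 8.2, 10.5, 11.0),
  dbp     = c(73, 79, 83, 82, 84, 92, 88, 86, 95, 100),
  sbp     = c(112, 118, 123, 120, 125, 140, 130, 126, 144, 150),
  gender  = factor(rep(c("Female", "Male"), times = 5))
)

# Hypothetical helper: beta_hat = (X'X)^{-1} X'y, where X has a leading
# column of ones so that the intercept beta_0 is estimated as well
fit_ols <- function(df) {
  X <- cbind(Intercept = 1, dbp = df$dbp, sbp = df$sbp)
  y <- df$protein
  drop(solve(t(X) %*% X) %*% t(X) %*% y)   # named vector: Intercept, dbp, sbp
}

B_all    <- fit_ols(veg)
B_male   <- fit_ols(veg[veg$gender == "Male", ])
B_female <- fit_ols(veg[veg$gender == "Female", ])
cbind(ALL = B_all, Male = B_male, Female = B_female)

# Cross-check: lm() fits an intercept by default
coef(lm(protein ~ dbp + sbp, data = veg))
```

Because the intercept absorbs the overall level of protein intake, the dbp and sbp coefficients from this version need not match the through-the-origin coefficients.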

Observation

Female and Male Model:

\[Protein = -0.1716936\,dbp + 0.1761890\,sbp\]

Male Only Model:

\[Protein = -0.6782395\,dbp + 0.5187178\,sbp\]

Female Only Model:

\[Protein = 0.3290938\,dbp - 0.1609671\,sbp\]

The model for females and males together and the male-only model agree in the direction of the relationships: diastolic blood pressure has a negative coefficient and systolic blood pressure a positive coefficient for daily protein consumption. When the model is fitted to females only, however, the signs are reversed (positive for dbp, negative for sbp). Because each gender-specific model is based on only five students, and dbp and sbp are themselves strongly correlated, these sign differences should be read with caution.