Page 228: #1

Consider a model for the long-term dining behavior of the students at College USA. It is found that 25% of the students who eat at the college’s Grease Dining Hall return to eat there again, whereas those who eat at Sweet Dining Hall have a 93% return rate. These are the only two dining halls available on campus, and assume that all students eat at one of these halls. Formulate a model to solve for the long-term percentage of students eating at each hall.


\[\text{GreaseDining}_{n + 1} = 0.25\,\text{GreaseDining}_{n} + 0.07\,\text{SweetDining}_{n}\] \[\text{SweetDining}_{n + 1} = 0.75\,\text{GreaseDining}_{n} + 0.93\,\text{SweetDining}_{n}\]


Assume that half of the diners start at Grease Dining Hall and the other half start at Sweet Dining Hall.

# 25% of Grease diners return; 7% of Sweet diners switch to Grease
grease_dining <- function(grease, sweet) {return(0.25*grease + 0.07*sweet)}
# 75% of Grease diners switch to Sweet; 93% of Sweet diners return
sweet_dining  <- function(grease, sweet) {return(0.75*grease + 0.93*sweet)}


# start with half the diners at each hall at n = 0, then iterate 20 steps
data <- data.frame(n = 0:20, Grease = 0.5, Sweet = 0.5)

for (i in 1:20) {
  
  data$Grease[i + 1] <-  grease_dining(data$Grease[i], data$Sweet[i])
  data$Sweet[i + 1]  <-  sweet_dining(data$Grease[i], data$Sweet[i])
}
Iterated Solution to Grease vs. Sweet Dining
n Grease Sweet
0 0.5000000 0.5000000
1 0.1600000 0.8400000
2 0.0988000 0.9012000
3 0.0877840 0.9122160
4 0.0858011 0.9141989
5 0.0854442 0.9145558
6 0.0853800 0.9146200
7 0.0853684 0.9146316
8 0.0853663 0.9146337
9 0.0853659 0.9146341
10 0.0853659 0.9146341
11 0.0853659 0.9146341
12 0.0853659 0.9146341
13 0.0853659 0.9146341
14 0.0853659 0.9146341
15 0.0853659 0.9146341
16 0.0853659 0.9146341
17 0.0853659 0.9146341
18 0.0853659 0.9146341
19 0.0853659 0.9146341
20 0.0853659 0.9146341
ggplot(data, aes(n)) + 
  geom_point(aes(y = Grease, colour = "Grease Dining Hall")) + 
  geom_point(aes(y = Sweet, colour = "Sweet Dining Hall")) + 
  ggtitle("Long % Dining - Grease Diner vs. Sweet Diner") + labs(y = "Percentage")

We see that over time an equilibrium is reached in which about 91.5% of diners eat at Sweet Dining Hall versus about 8.5% at Grease Dining Hall.
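As a check on the iteration, the equilibrium can also be found directly: setting \(g = 0.25g + 0.07(1 - g)\) gives \(g = 0.07/0.82 = 7/82 \approx 0.0853659\), matching the iterated values. Equivalently, the long-run shares are the eigenvector of the transition matrix associated with eigenvalue 1, rescaled to sum to 1. A minimal sketch in R:

# columns sum to 1: each column says where that hall's diners go next
A <- matrix(c(0.25, 0.07,
              0.75, 0.93), nrow = 2, byrow = TRUE)

# eigen() orders eigenvalues by decreasing modulus, so the
# steady-state eigenvector (eigenvalue 1) is the first column
ev <- eigen(A)
ev$vectors[, 1] / sum(ev$vectors[, 1])  # ~ 0.0853659, 0.9146341 (Grease, Sweet)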


Page 232: #1

Consider a stereo with CD player, FM-AM radio tuner, speakers (dual), and power amplifier (PA) components, as displayed with the reliabilities shown in Figure 6.11. Determine the system’s reliability. What assumptions are required in your model?

Subsystem 1

\(R_1 = 0.95\)

Subsystem 2

Parallel system

\(R_2 = R_a(t) + R_b(t) - R_a(t) * R_b(t) = 0.98 + 0.97 - (0.98 * 0.97) = 0.9994\), where \(R_a\) and \(R_b\) are the two parallel components.

Subsystem 3

Parallel system

\(R_3 = R_a(t) + R_b(t) - R_a(t) * R_b(t) = 0.99 + 0.99 - (0.99 * 0.99) = 0.9999\)

For total system reliability, the three subsystems are connected in series, so their reliabilities multiply. The model assumes that all components fail independently and that the parallel components within subsystems 2 and 3 are truly redundant (either one alone keeps the subsystem working).

System Reliability =

\[R_s(t) = R_{s1}(t) * R_{s2}(t) * R_{s3}(t) = 0.95 * 0.9994 * 0.9999 = 0.9493351\]
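The same computation as a short R sketch, with the component-to-subsystem assignments taken from Figure 6.11 as described above and independence assumed:

# reliability of independent components in series and in parallel
series_rel   <- function(r) prod(r)          # all must work
parallel_rel <- function(r) 1 - prod(1 - r)  # at least one must work

R1 <- 0.95                          # subsystem 1
R2 <- parallel_rel(c(0.98, 0.97))   # subsystem 2: 0.9994
R3 <- parallel_rel(c(0.99, 0.99))   # subsystem 3: 0.9999
series_rel(c(R1, R2, R3))           # 0.9493351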


Page 240: #1

Use the basic linear model \(y = ax + b\) to fit the following data sets. Provide the model, the values of SSE, SSR, SST, and \(R^2\), and a residual plot.

From Table 2.7, predict weight as a function of height.

data <- data.frame(
         height =  seq(from = 60, to = 80,  by = 1),
         weight = c( 132, 136, 141, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 
             195, 201, 206, 212, 218, 223, 229, 234) )
str(data)
## 'data.frame':    21 obs. of  2 variables:
##  $ height: num  60 61 62 63 64 65 66 67 68 69 ...
##  $ weight: num  132 136 141 145 150 155 160 165 170 175 ...
# x = height
# y = weight
height (in) weight (lb)
60 132
61 136
62 141
63 145
64 150
65 155
66 160
67 165
68 170
69 175
70 180
71 185
72 190
73 195
74 201
75 206
76 212
77 218
78 223
79 229
80 234

Plot the given data, which appears to be fairly linear.
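A minimal sketch of the scatterplot (ggplot2 assumed loaded, as in the earlier sections):

ggplot(data, aes(height, weight)) + 
  geom_point() + 
  labs(x = "Height", y = "Weight")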

Calculate the slope \(a\)

Solve the normal equation below to obtain the slope \(a\) for the least-squares best-fitting line.

\[a = \frac{m {\sum x_i y_i} - {\sum x_i}{\sum y_i}}{m {\sum x^2_i} - ({\sum x_i})^2}\]

\(m = \text{number of data points}\)

m <- nrow(data)

numerator <-  m * sum(data$height*data$weight) - sum(data$height)*sum(data$weight)  

denominator <- m * sum(data$height^2) - sum(data$height)^2

(a <- numerator/denominator)
## [1] 5.136364

The slope \(a = 5.1363636\)

Solve the normal equation below to obtain y-intercept \(b\) for the least-squares best-fitting line.

\[b = \frac { \sum x^2_i \sum y_i - \sum x_i y_i \sum x_i } {m {\sum x^2_i} - ({\sum x_i})^2} \]

\(m = \text{number of data points}\)

m <- nrow(data)

numerator <- (sum(data$height^2)*sum(data$weight)) - (sum(data$height*data$weight)*sum(data$height))

denominator <- m * sum(data$height^2) - sum(data$height)^2

(b <- numerator/denominator)
## [1] -178.4978

The y-intercept \(b = -178.4978355\)

Calculate \(SSE\), the error sum of squares

(SSE <-   sum((data$weight - (a * data$height + b))^2))
## [1] 24.6342

Calculate \(SST\), the total corrected sum of squares

(SST <- sum((data$weight - mean(data$weight))^2))
## [1] 20338.95

Calculate \(SSR\), the regression sum of squares

\[ SSR = SST - SSE \]

(SSR <- SST - SSE)
## [1] 20314.32

\(R^2\), the coefficient of determination, is a measure of fit for the regression line.

\[R^2 = 1 - \frac{SSE}{SST}\]

R2 <- 1 - (SSE/SST)

\(R^2 = 0.9987888\)

Results

Model 1: weight = 5.136364 * height - 178.4978
SSE = 24.6341991, SSR = 20314.3181818, SST = 20338.952381, \(R^2\) = 0.9987888

Residual Plot

data$model1_weight <- a * data$height + b
data$model1_residuals <- data$weight - data$model1_weight

ggplot(data, aes(height)) + geom_point(aes(y = model1_residuals)) + labs(x = "Height", y = "Residuals")

Plot the actual vs. modeled data:
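A minimal sketch, reusing the model1_weight column computed above:

ggplot(data, aes(height)) + 
  geom_point(aes(y = weight, colour = "Actual")) + 
  geom_line(aes(y = model1_weight, colour = "Model 1")) + 
  labs(x = "Height", y = "Weight", colour = NULL)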

The residual plot shows a clear systematic pattern rather than random scatter, which indicates that this model may not provide the most appropriate fit for this data. Despite the high \(R^2\) value, the residual plot suggests a better model exists.

Run the model using the lm function to compare results:

model <- lm(weight ~ height, data = data)

summary(model)
## 
## Call:
## lm(formula = weight ~ height, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4567 -0.7749 -0.3658  0.9978  2.3160 
## 
## Coefficients:
##               Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) -178.49784    2.88313  -61.91 <0.0000000000000002 ***
## height         5.13636    0.04103  125.17 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.139 on 19 degrees of freedom
## Multiple R-squared:  0.9988, Adjusted R-squared:  0.9987 
## F-statistic: 1.567e+04 on 1 and 19 DF,  p-value: < 0.00000000000000022
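As a sanity check, the coefficients reported by lm should agree with the hand-computed slope and intercept:

# compare lm's (intercept, slope) with the normal-equation results
all.equal(unname(coef(model)), c(b, a))  # expected: TRUE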

Page 240: #2

Use the basic linear model \(y = ax + b\) to fit the following data sets. Provide the model, the values of SSE, SSR, SST, and \(R^2\), and a residual plot.

From Table 2.7, predict weight as a function of the cube of the height.

data <- data.frame(
         height =  seq(from = 60, to = 80,  by = 1),
         weight = c( 132, 136, 141, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 
             195, 201, 206, 212, 218, 223, 229, 234) )

data$height_cubed <- data$height^3

str(data)
## 'data.frame':    21 obs. of  3 variables:
##  $ height      : num  60 61 62 63 64 65 66 67 68 69 ...
##  $ weight      : num  132 136 141 145 150 155 160 165 170 175 ...
##  $ height_cubed: num  216000 226981 238328 250047 262144 ...
# x = height
# y = weight

Calculate the slope \(a\)

Solve the normal equation below to obtain the slope \(a\) for the least-squares best-fitting line.

\[a = \frac{m {\sum x_i y_i} - {\sum x_i}{\sum y_i}}{m {\sum x^2_i} - ({\sum x_i})^2}\]

\(m = \text{number of data points}\)

m <- nrow(data)

numerator <-  m * sum(data$height_cubed*data$weight) - sum(data$height_cubed)*sum(data$weight)  

denominator <- m * sum(data$height_cubed^2) - sum(data$height_cubed)^2

(a <- numerator/denominator)
## [1] 0.0003467044

The slope \(a = 0.0003467\)

Solve the normal equation below to obtain y-intercept \(b\) for the least-squares best-fitting line.

\[b = \frac { \sum x^2_i \sum y_i - \sum x_i y_i \sum x_i } {m {\sum x^2_i} - ({\sum x_i})^2} \]

m <- nrow(data)

numerator <- (sum(data$height_cubed^2)*sum(data$weight)) - (sum(data$height_cubed*data$weight)*sum(data$height_cubed))

denominator <- m * sum(data$height_cubed^2) - sum(data$height_cubed)^2

(b <- numerator/denominator)
## [1] 59.4584

The y-intercept \(b = 59.4583969\)

Calculate \(SSE\), the error sum of squares

(SSE <-   sum((data$weight - (a * data$height_cubed + b))^2))
## [1] 39.86196

Calculate \(SST\), the total corrected sum of squares

(SST <- sum((data$weight - mean(data$weight))^2))
## [1] 20338.95

Calculate \(SSR\), the regression sum of squares

\[ SSR = SST - SSE \]

(SSR <- SST - SSE)
## [1] 20299.09

\(R^2\), the coefficient of determination, is a measure of fit for the regression line.

\[R^2 = 1 - \frac{SSE}{SST}\]

R2 <- 1 - (SSE/SST)

\(R^2 = 0.9980401\)

Results

Model 2: weight = 0.0003467044 * height^3 + 59.4584
SSE = 39.8619613, SSR = 20299.0904197, SST = 20338.952381, \(R^2\) = 0.9980401

Residual Plot

data$model2_weight <- a * data$height^3 + b
data$model2_residuals <- data$weight - data$model2_weight 

ggplot(data, aes(height)) + geom_point(aes(y = model2_residuals)) + labs(x = "Height", y = "Residuals")

Plot the actual vs. modeled data:
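A minimal sketch, reusing the model2_weight column computed above:

ggplot(data, aes(height)) + 
  geom_point(aes(y = weight, colour = "Actual")) + 
  geom_line(aes(y = model2_weight, colour = "Model 2")) + 
  labs(x = "Height", y = "Weight", colour = NULL)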

Looking at the results of model 2 using height cubed, note that its \(R^2\) value (0.9980) is actually slightly lower than model 1's (0.9988), and its SSE is larger. However, model 2's residual plot appears more random than model 1's, which suggests the cubic model better captures the underlying structure of the data.
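For completeness, model 2 can also be fit with lm, mirroring the check done for model 1; its reported R-squared should match the hand-computed value:

model2 <- lm(weight ~ height_cubed, data = data)
summary(model2)$r.squared  # ~ 0.9980401, matching R2 above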