Consider a model for the long-term dining behavior of the students at College USA. It is found that 25% of the students who eat at the college’s Grease Dining Hall return to eat there again, whereas those who eat at Sweet Dining Hall have a 93% return rate. These are the only two dining halls available on campus, and assume that all students eat at one of these halls. Formulate a model to solve for the long-term percentage of students eating at each hall.
\[GreaseDining_{n + 1} = 0.25GreaseDining_{n} + 0.07SweetDining_{n}\] \[SweetDining_{n + 1} = 0.75GreaseDining_{n} + 0.93SweetDining_{n}\]
Assume that half of the diners start at Grease Dinner Hall and the other half start at Sweet Dining Hall.
grease_dining <- function(grease, sweet) {return (0.25*grease + 0.07*sweet)}
sweet_dining <- function(grease, sweet) {return (0.75*grease + 0.93*sweet)}
# start with
data <- data.frame(n = 0:20, Grease = 0.5, Sweet = 0.5)
for (i in 1:20) {
data$Grease[i + 1] <- grease_dining(data$Grease[i], data$Sweet[i])
data$Sweet[i + 1] <- sweet_dining(data$Grease[i], data$Sweet[i])
}
n | Grease | Sweet |
---|---|---|
0 | 0.5000000 | 0.5000000 |
1 | 0.1600000 | 0.8400000 |
2 | 0.0988000 | 0.9012000 |
3 | 0.0877840 | 0.9122160 |
4 | 0.0858011 | 0.9141989 |
5 | 0.0854442 | 0.9145558 |
6 | 0.0853800 | 0.9146200 |
7 | 0.0853684 | 0.9146316 |
8 | 0.0853663 | 0.9146337 |
9 | 0.0853659 | 0.9146341 |
10 | 0.0853659 | 0.9146341 |
11 | 0.0853659 | 0.9146341 |
12 | 0.0853659 | 0.9146341 |
13 | 0.0853659 | 0.9146341 |
14 | 0.0853659 | 0.9146341 |
15 | 0.0853659 | 0.9146341 |
16 | 0.0853659 | 0.9146341 |
17 | 0.0853659 | 0.9146341 |
18 | 0.0853659 | 0.9146341 |
19 | 0.0853659 | 0.9146341 |
20 | 0.0853659 | 0.9146341 |
ggplot(data, aes(n)) +
geom_point(aes(y = Grease, colour = "Grease Dining Hall")) +
geom_point(aes(y = Sweet, colour = "Sweet Dining Hall")) +
ggtitle("Long % Dining - Grease Diner vs. Sweet Diner") + labs(y = "Percentage")
We see that over time that equilibrium is achieved where 91.4% of diners choose to eat at Sweet Dining Hall versus just 8.5% who choose Grease Dining.
Consider a stereo with CD player, FM-AM radio tuner, speakers (dual), and power amplifier (PA) components, as displayed with the reliabilities shown in Figure 6.11. Determine the system’s reliability. What assumptions are required in your model?
Subsystem 1
\(R_1 = 0.95\)
Subsystem 2
Parallel system
\(R_2 = R_1(t) + R_2(t) - R_1(t) * R_2(t) = 0.98 + 0.97 - (0.98 * 0.97) = 0.9994\)
Subsystem 3
Parallel system
\(R_3 = R_1(t) + R_2(t) - R_1(t) * R_2(t) = 0.99 + 0.99 - (0.99 * 0.99) = 0.9999\)
For total system reliability, I’m assuming these three are a combination of series and parallel relationships.
System Reliability =
\[R_s(t) = R_{s1}(t) * R_{s2}(t) * R_{s3}(t) = 0.95 * 0.9994 * 0.9999 = 0.9493351\]
Use the basic linear model \(y = ax + b\) to fit the following data sets. Provide the model, provide the values of SSE, SSR, SST, and R2, and provide a residual plot.
From Table 2.7, predict weight as a function of height.
data <- data.frame(
height = seq(from = 60, to = 80, by = 1),
weight = c( 132, 136, 141, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190,
195, 201, 206, 212, 218, 223, 229, 234) )
str(data)
## 'data.frame': 21 obs. of 2 variables:
## $ height: num 60 61 62 63 64 65 66 67 68 69 ...
## $ weight: num 132 136 141 145 150 155 160 165 170 175 ...
# x = height
# y = weight
height | weight |
---|---|
60 | 132 |
61 | 136 |
62 | 141 |
63 | 145 |
64 | 150 |
65 | 155 |
66 | 160 |
67 | 165 |
68 | 170 |
69 | 175 |
70 | 180 |
71 | 185 |
72 | 190 |
73 | 195 |
74 | 201 |
75 | 206 |
76 | 212 |
77 | 218 |
78 | 223 |
79 | 229 |
80 | 234 |
Plot the given data which appears to be fairly linear.
Solve the normal equation below to obtain the slope \(a\) for the least-squares best-fitting line.
\[a = \frac{m {\sum x_i y_i} - {\sum x_i}{\sum y_i}}{m {\sum x^2_i} - ({\sum x_i})^2}\]
\(m = \text{number of data points}\)
m <- nrow(data)
numerator <- m * sum(data$height*data$weight) - sum(data$height)*sum(data$weight)
denominator <- m * sum(data$height^2) - sum(data$height)^2
(a <- numerator/denominator)
## [1] 5.136364
The slope \(a = 5.1363636\)
Solve the normal equation below to obtain y-intercept \(b\) for the least-squares best-fitting line.
\[b = \frac { \sum x^2_i \sum y_i - \sum x_i y_i \sum x_i } {m {\sum x^2_i} - ({\sum x_i})^2} \]
\(m = \text{number of data points}\)
m <- nrow(data)
numerator <- (sum(data$height^2)*sum(data$weight)) - (sum(data$height*data$weight)*sum(data$height))
denominator <- m * sum(data$height^2) - sum(data$height)^2
(b <- numerator/denominator)
## [1] -178.4978
The y-intercept \(b = -178.4978355\)
(SSE <- sum((data$weight - (a * data$height + b))^2))
## [1] 24.6342
(SST <- sum((data$weight - mean(data$weight))^2))
## [1] 20338.95
\[ SSR = SST = SSE \]
(SSR <- SST - SSE)
## [1] 20314.32
\[R^2 = 1 - \frac{SSE}{SST}\]
R2 <- 1 - (SSE/SST)
\(R^2 = 0.9987888\)
Model 1 | SSE | SSR | SST | \(R^2\) |
---|---|---|---|---|
weight = 5.136364 * height -178.4978 | 24.6341991 | 20314.3181818 | 20338.952381 | 0.9987888 |
data$model1_weight <- a * data$height + b
data$model1_residuals <- data$weight - data$model1_weight
ggplot(data, aes(height)) + geom_point(aes(y = model1_residuals)) + labs(x = "Height", y = "Residuals")
Plot the actual vs. modeled data
The residual plot shows a clear diagonal line pattern which would indicate that this model may not provide the most appropriate fit for this data. Despite the high \(R^2\) value, the residual plot shows there could be a better model for this data.
Run the model using the lm
function to compare results:
model <- lm(data = data, weight ~ height)
summary(model)
##
## Call:
## lm(formula = weight ~ height, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4567 -0.7749 -0.3658 0.9978 2.3160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -178.49784 2.88313 -61.91 <0.0000000000000002 ***
## height 5.13636 0.04103 125.17 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.139 on 19 degrees of freedom
## Multiple R-squared: 0.9988, Adjusted R-squared: 0.9987
## F-statistic: 1.567e+04 on 1 and 19 DF, p-value: < 0.00000000000000022
Use the basic linear model \(y = ax + b\) to fit the following data sets. Provide the model, provide the values of SSE, SSR, SST, and R2, and provide a residual plot.
From Table 2.7, predict weight as a function of the cube of the height.
data <- data.frame(
height = seq(from = 60, to = 80, by = 1),
weight = c( 132, 136, 141, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190,
195, 201, 206, 212, 218, 223, 229, 234) )
data$height_cubed <- data$heigh^3
str(data)
## 'data.frame': 21 obs. of 3 variables:
## $ height : num 60 61 62 63 64 65 66 67 68 69 ...
## $ weight : num 132 136 141 145 150 155 160 165 170 175 ...
## $ height_cubed: num 216000 226981 238328 250047 262144 ...
# x = height
# y = weight
Solve the normal equation below to obtain the slope \(a\) for the least-squares best-fitting line.
\[a = \frac{m {\sum x_i y_i} - {\sum x_i}{\sum y_i}}{m {\sum x^2_i} - ({\sum x_i})^2}\]
\(m = \text{number of data points}\)
m <- nrow(data)
numerator <- m * sum(data$height_cubed*data$weight) - sum(data$height_cubed)*sum(data$weight)
denominator <- m * sum(data$height_cubed^2) - sum(data$height_cubed)^2
(a <- numerator/denominator)
## [1] 0.0003467044
The slope \(a = 0.0003467\)
Solve the normal equation below to obtain y-intercept \(b\) for the least-squares best-fitting line.
\[b = \frac { \sum x^2_i \sum y_i - \sum x_i y_i \sum x_i } {m {\sum x^2_i} - ({\sum x_i})^2} \]
m <- nrow(data)
numerator <- (sum(data$height_cubed^2)*sum(data$weight)) - (sum(data$height_cubed*data$weight)*sum(data$height_cubed))
denominator <- m * sum(data$height_cubed^2) - sum(data$height_cubed)^2
(b <- numerator/denominator)
## [1] 59.4584
The y-intercept \(b = 59.4583969\)
(SSE <- sum((data$weight - (a * data$height_cubed + b))^2))
## [1] 39.86196
(SST <- sum((data$weight - mean(data$weight))^2))
## [1] 20338.95
\[ SSR = SST = SSE \]
(SSR <- SST - SSE)
## [1] 20299.09
\[R^2 = 1 - \frac{SSE}{SST}\]
R2 <- 1 - (SSE/SST)
\(R^2 = 0.9980401\)
Model 2 | SSE | SSR | SST | \(R^2\) |
---|---|---|---|---|
weight = 0.0003467044 * height^3 + 59.4584 | 39.8619613 | 20299.0904197 | 20338.952381 | 0.9980401 |
data$model2_weight <- a * data$height^3 + b
data$model2_residuals <- data$weight - data$model2_weight
ggplot(data, aes(height)) + geom_point(aes(y = model2_residuals)) + labs(x = "Height", y = "Residuals")
Plot the actual vs. modeled data
Looking at the results of model 2 using height cubed, this model appears to provide a better fit to the data than model 1. Model 2 results in a slightly better \(R^2\) value plus the residual plot appears to be more random than model 1’s residual plot.