DATA 609: Homework 5

Page 228: #1

Consider a model for the long-term dining behavior of the students at College USA. It is found that 25% of the students who eat at the college’s Grease Dining Hall return to eat there again, whereas those who eat at Sweet Dining Hall have a 93% return rate. These are the only two dining halls available on campus, and assume that all students eat at one of these halls. Formulate a model to solve for the long-term percentage of students eating at each hall.

\[GreaseDining_{n + 1} = 0.25GreaseDining_{n} + 0.07SweetDining_{n}\] \[SweetDining_{n + 1} = 0.75GreaseDining_{n} + 0.93SweetDining_{n}\]

Assume that half of the diners start at Grease Dinner Hall and the other half start at Sweet Dining Hall.

grease_dining <- function(grease, sweet)  {return (0.25*grease + 0.07*sweet)}
sweet_dining  <- function(grease, sweet) {return (0.75*grease + 0.93*sweet)}


# start with 
data <- data.frame(n = 0:20, Grease = 0.5, Sweet = 0.5)

for (i in 1:20) {
  
  data$Grease[i + 1] <-  grease_dining(data$Grease[i], data$Sweet[i])
  data$Sweet[i + 1]  <-  sweet_dining(data$Grease[i], data$Sweet[i])
}

Iterated Solution to Grease vs. Sweet Dining
n	Grease	Sweet
0	0.5000000	0.5000000
1	0.1600000	0.8400000
2	0.0988000	0.9012000
3	0.0877840	0.9122160
4	0.0858011	0.9141989
5	0.0854442	0.9145558
6	0.0853800	0.9146200
7	0.0853684	0.9146316
8	0.0853663	0.9146337
9	0.0853659	0.9146341
10	0.0853659	0.9146341
11	0.0853659	0.9146341
12	0.0853659	0.9146341
13	0.0853659	0.9146341
14	0.0853659	0.9146341
15	0.0853659	0.9146341
16	0.0853659	0.9146341
17	0.0853659	0.9146341
18	0.0853659	0.9146341
19	0.0853659	0.9146341
20	0.0853659	0.9146341

ggplot(data, aes(n)) + 
  geom_point(aes(y = Grease, colour = "Grease Dining Hall")) + 
  geom_point(aes(y = Sweet, colour = "Sweet Dining Hall")) + 
  ggtitle("Long % Dining - Grease Diner vs. Sweet Diner") + labs(y = "Percentage")

We see that over time that equilibrium is achieved where 91.4% of diners choose to eat at Sweet Dining Hall versus just 8.5% who choose Grease Dining.

Page 232: #1

Consider a stereo with CD player, FM-AM radio tuner, speakers (dual), and power amplifier (PA) components, as displayed with the reliabilities shown in Figure 6.11. Determine the system’s reliability. What assumptions are required in your model?

Subsystem 1

\(R_1 = 0.95\)

Subsystem 2

Parallel system

\(R_2 = R_1(t) + R_2(t) - R_1(t) * R_2(t) = 0.98 + 0.97 - (0.98 * 0.97) = 0.9994\)

Subsystem 3

Parallel system

\(R_3 = R_1(t) + R_2(t) - R_1(t) * R_2(t) = 0.99 + 0.99 - (0.99 * 0.99) = 0.9999\)

For total system reliability, I’m assuming these three are a combination of series and parallel relationships.

System Reliability =

\[R_s(t) = R_{s1}(t) * R_{s2}(t) * R_{s3}(t) = 0.95 * 0.9994 * 0.9999 = 0.9493351\]

Page 240: #1

Use the basic linear model \(y = ax + b\) to fit the following data sets. Provide the model, provide the values of SSE, SSR, SST, and R2, and provide a residual plot.

From Table 2.7, predict weight as a function of height.

data <- data.frame(
         height =  seq(from = 60, to = 80,  by = 1),
         weight = c( 132, 136, 141, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 
             195, 201, 206, 212, 218, 223, 229, 234) )
str(data)

## 'data.frame':    21 obs. of  2 variables:
##  $ height: num  60 61 62 63 64 65 66 67 68 69 ...
##  $ weight: num  132 136 141 145 150 155 160 165 170 175 ...

# x = height
# y = weight

height	weight
60	132
61	136
62	141
63	145
64	150
65	155
66	160
67	165
68	170
69	175
70	180
71	185
72	190
73	195
74	201
75	206
76	212
77	218
78	223
79	229
80	234

Plot the given data which appears to be fairly linear.

Calcuate the slope \(a\)

Solve the normal equation below to obtain the slope \(a\) for the least-squares best-fitting line.

\[a = \frac{m {\sum x_i y_i} - {\sum x_i}{\sum y_i}}{m {\sum x^2_i} - ({\sum x_i})^2}\]

\(m = \text{number of data points}\)

m <- nrow(data)

numerator <-  m * sum(data$height*data$weight) - sum(data$height)*sum(data$weight)  

denominator <- m * sum(data$height^2) - sum(data$height)^2

(a <- numerator/denominator)

## [1] 5.136364

The slope \(a = 5.1363636\)

Solve the normal equation below to obtain y-intercept \(b\) for the least-squares best-fitting line.

\[b = \frac { \sum x^2_i \sum y_i - \sum x_i y_i \sum x_i } {m {\sum x^2_i} - ({\sum x_i})^2} \]

\(m = \text{number of data points}\)

m <- nrow(data)

numerator <- (sum(data$height^2)*sum(data$weight)) - (sum(data$height*data$weight)*sum(data$height))

denominator <- m * sum(data$height^2) - sum(data$height)^2

(b <- numerator/denominator)

## [1] -178.4978

The y-intercept \(b = -178.4978355\)

Calcuate \(SSE\) or error sum of squares

(SSE <-   sum((data$weight - (a * data$height + b))^2))

## [1] 24.6342

Calcuate \(SST\) or total corrected sum of squares

(SST <- sum((data$weight - mean(data$weight))^2))

## [1] 20338.95

Calcuate \(SSR\) or regression sum of squares

\[ SSR = SST = SSE \]

(SSR <- SST - SSE)

## [1] 20314.32

\(R^2\) Coefficient of determination which is a measure of fit for the regresssion line

\[R^2 = 1 - \frac{SSE}{SST}\]

R2 <- 1 - (SSE/SST)

\(R^2 = 0.9987888\)

Results

Model 1	SSE	SSR	SST	\(R^2\)
weight = 5.136364 * height -178.4978	24.6341991	20314.3181818	20338.952381	0.9987888

Residual Plot

data$model1_weight <- a * data$height + b
data$model1_residuals <- data$weight - data$model1_weight

ggplot(data, aes(height)) + geom_point(aes(y = model1_residuals)) + labs(x = "Height", y = "Residuals")

Plot the actual vs. modeled data

The residual plot shows a clear diagonal line pattern which would indicate that this model may not provide the most appropriate fit for this data. Despite the high \(R^2\) value, the residual plot shows there could be a better model for this data.

Run the model using the lm function to compare results:

model <- lm(data = data, weight ~ height)

summary(model)

## 
## Call:
## lm(formula = weight ~ height, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4567 -0.7749 -0.3658  0.9978  2.3160 
## 
## Coefficients:
##               Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) -178.49784    2.88313  -61.91 <0.0000000000000002 ***
## height         5.13636    0.04103  125.17 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.139 on 19 degrees of freedom
## Multiple R-squared:  0.9988, Adjusted R-squared:  0.9987 
## F-statistic: 1.567e+04 on 1 and 19 DF,  p-value: < 0.00000000000000022

Page 240: #2

Use the basic linear model \(y = ax + b\) to fit the following data sets. Provide the model, provide the values of SSE, SSR, SST, and R2, and provide a residual plot.

From Table 2.7, predict weight as a function of the cube of the height.

data <- data.frame(
         height =  seq(from = 60, to = 80,  by = 1),
         weight = c( 132, 136, 141, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 
             195, 201, 206, 212, 218, 223, 229, 234) )

data$height_cubed <- data$heigh^3

str(data)

## 'data.frame':    21 obs. of  3 variables:
##  $ height      : num  60 61 62 63 64 65 66 67 68 69 ...
##  $ weight      : num  132 136 141 145 150 155 160 165 170 175 ...
##  $ height_cubed: num  216000 226981 238328 250047 262144 ...

# x = height
# y = weight

Calcuate the slope \(a\)

Solve the normal equation below to obtain the slope \(a\) for the least-squares best-fitting line.

\[a = \frac{m {\sum x_i y_i} - {\sum x_i}{\sum y_i}}{m {\sum x^2_i} - ({\sum x_i})^2}\]

\(m = \text{number of data points}\)

m <- nrow(data)

numerator <-  m * sum(data$height_cubed*data$weight) - sum(data$height_cubed)*sum(data$weight)  

denominator <- m * sum(data$height_cubed^2) - sum(data$height_cubed)^2

(a <- numerator/denominator)

## [1] 0.0003467044

The slope \(a = 0.0003467\)

Solve the normal equation below to obtain y-intercept \(b\) for the least-squares best-fitting line.

\[b = \frac { \sum x^2_i \sum y_i - \sum x_i y_i \sum x_i } {m {\sum x^2_i} - ({\sum x_i})^2} \]

m <- nrow(data)

numerator <- (sum(data$height_cubed^2)*sum(data$weight)) - (sum(data$height_cubed*data$weight)*sum(data$height_cubed))

denominator <- m * sum(data$height_cubed^2) - sum(data$height_cubed)^2

(b <- numerator/denominator)

## [1] 59.4584

The y-intercept \(b = 59.4583969\)

Calcuate \(SSE\) or error sum of squares

(SSE <-   sum((data$weight - (a * data$height_cubed + b))^2))

## [1] 39.86196

Calcuate \(SST\) or total corrected sum of squares

(SST <- sum((data$weight - mean(data$weight))^2))

## [1] 20338.95

Calcuate \(SSR\) or regression sum of squares

\[ SSR = SST = SSE \]

(SSR <- SST - SSE)

## [1] 20299.09

\(R^2\) Coefficient of determination which is a measure of fit for the regresssion line

\[R^2 = 1 - \frac{SSE}{SST}\]

R2 <- 1 - (SSE/SST)

\(R^2 = 0.9980401\)

Results

Model 2	SSE	SSR	SST	\(R^2\)
weight = 0.0003467044 * height^3 + 59.4584	39.8619613	20299.0904197	20338.952381	0.9980401

Residual Plot

data$model2_weight <- a * data$height^3 + b
data$model2_residuals <- data$weight - data$model2_weight 

ggplot(data, aes(height)) + geom_point(aes(y = model2_residuals)) + labs(x = "Height", y = "Residuals")

Plot the actual vs. modeled data

Looking at the results of model 2 using height cubed, this model appears to provide a better fit to the data than model 1. Model 2 results in a slightly better \(R^2\) value plus the residual plot appears to be more random than model 1’s residual plot.

DATA 609: Homework 5

Keith Folsom

March 3, 2017

Page 228: #1

Page 232: #1

Page 240: #1

Calcuate the slope \(a\)

Calcuate \(SSE\) or error sum of squares

Calcuate \(SST\) or total corrected sum of squares

Calcuate \(SSR\) or regression sum of squares

\(R^2\) Coefficient of determination which is a measure of fit for the regresssion line

Results

Residual Plot

Page 240: #2

Calcuate the slope \(a\)

Calcuate \(SSE\) or error sum of squares

Calcuate \(SST\) or total corrected sum of squares

Calcuate \(SSR\) or regression sum of squares

\(R^2\) Coefficient of determination which is a measure of fit for the regresssion line

Results

Residual Plot

height	weight
60	132
61	136
62	141
63	145
64	150
65	155
66	160
67	165
68	170
69	175
70	180
71	185
72	190
73	195
74	201
75	206
76	212
77	218
78	223
79	229
80	234

height	weight
60	132
61	136
62	141
63	145
64	150
65	155
66	160
67	165
68	170
69	175
70	180
71	185
72	190
73	195
74	201
75	206
76	212
77	218
78	223
79	229
80	234

height	weight
60	132
61	136
62	141
63	145
64	150
65	155
66	160
67	165
68	170
69	175
70	180
71	185
72	190
73	195
74	201
75	206
76	212
77	218
78	223
79	229
80	234