BMGT430 Assignment 1: Simple Linear Regression

Points: 50

Submission Instructions

Please submit (1) the completed word document and (2) a formatted file generated from this provided R Markdown template (HTML) that includes both code chunks and their outputs to ELMS.

Q1. Energy cost and efficiency upgrading

An energy manager wants to understand whether investing in energy-efficiency upgrades is associated with lower annual energy costs across facilities. A random sample of facilities provided the data (in dollars) in the file “Energy.csv”, where the dependent variable is the total annual energy cost (column “EnergyCost”), and independent variable is the annual spending on energy-efficiency upgrades (column “UpgradeCost”).

Enter your code for Question 1 in the code chunks below.

Fit the regression model and print out the summary:

lr_model_energy <- lm(EnergyCost~UpgradeCost, data = energy)
summary(lr_model_energy)

## 
## Call:
## lm(formula = EnergyCost ~ UpgradeCost, data = energy)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12744  -8807  -1382   5180  21284 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 60210.1432  6934.5989   8.683 5.71e-06 ***
## UpgradeCost    -0.9578     0.2105  -4.551  0.00106 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11140 on 10 degrees of freedom
## Multiple R-squared:  0.6743, Adjusted R-squared:  0.6418 
## F-statistic: 20.71 on 1 and 10 DF,  p-value: 0.001057

Show the confidence interval of the coefficients:

library(modelsummary)
modelsummary(lr_model_energy, statistic = "conf.int", conf_level = 0.90)

	(1)
(Intercept)	60210.143
	[47641.452, 72778.834]
UpgradeCost	-0.958
	[-1.339, -0.576]
Num.Obs.	12
R2	0.674
R2 Adj.	0.642
AIC	261.5
BIC	263.0
Log.Lik.	-127.749
RMSE	10166.07

ANOVA table for F-test:

anova(lr_model_energy)

## Analysis of Variance Table
## 
## Response: EnergyCost
##             Df     Sum Sq    Mean Sq F value   Pr(>F)   
## UpgradeCost  1 2568062931 2568062931  20.707 0.001057 **
## Residuals   10 1240187069  124018707                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

qf(0.95, df1 = 1, df2 = 10)

## [1] 4.964603

Q2. E-commerce fulfillment cost estimation

An e-commerce company operates a small fulfillment center. Management believes the daily operating cost is driven by the number of customer orders packed and shipped that day. You are given 24 days of data in COSTEST3 with two columns:

y = COST: total fulfillment center cost for the day (in dollars, or in whatever currency units your dataset uses)

x = NUMBER: number of orders shipped that day

Run a simple linear regression using COST as the dependent variable and NUMBER as the independent variable.

Fit the regression model and print out the summary:

lr_model_ecom <- lm(COST~NUMBER, data = ecom)
summary(lr_model_ecom)

## 
## Call:
## lm(formula = COST ~ NUMBER, data = ecom)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3022 -2.3110  0.5253  1.8948  5.2685 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  28.3107     4.0830   6.934 5.82e-07 ***
## NUMBER        2.1549     0.1437  14.995 4.94e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.84 on 22 degrees of freedom
## Multiple R-squared:  0.9109, Adjusted R-squared:  0.9068 
## F-statistic: 224.9 on 1 and 22 DF,  p-value: 4.942e-13

Show the confidence interval of the coefficients:

modelsummary(lr_model_ecom, statistic = "conf.int", conf_level = 0.95)

	(1)
(Intercept)	28.311
	[19.843, 36.778]
NUMBER	2.155
	[1.857, 2.453]
Num.Obs.	24
R2	0.911
R2 Adj.	0.907
AIC	122.1
BIC	125.7
Log.Lik.	-58.063
RMSE	2.72

Make prediction:

new_data <- data.frame(NUMBER = 50)
predict(lr_model_ecom, newdata = new_data, interval = "prediction", level = 0.90)

##        fit      lwr      upr
## 1 136.0547 128.7122 143.3972

Q3. Clinton-Obama voter study

Select variables and inspect missing values

vars <- vote[ c("ObamaWinningMargin", "Bachelors")]
colSums(is.na(vars))

## ObamaWinningMargin          Bachelors 
##                  0                  0

Fit the model:

lr_model_vote <- lm(ObamaWinningMargin~Bachelors, data = vote)
summary(lr_model_vote)

## 
## Call:
## lm(formula = ObamaWinningMargin ~ Bachelors, data = vote)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -68.803 -20.318  -1.037  18.853  94.972 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -34.71674    1.20038  -28.92   <2e-16 ***
## Bachelors     1.57493    0.06573   23.96   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.87 on 2861 degrees of freedom
## Multiple R-squared:  0.1671, Adjusted R-squared:  0.1668 
## F-statistic: 574.1 on 1 and 2861 DF,  p-value: < 2.2e-16

modelsummary(lr_model_vote, statistic = "conf.int", conf_level = 0.95)

	(1)
(Intercept)	-34.717
	[-37.070, -32.363]
Bachelors	1.575
	[1.446, 1.704]
Num.Obs.	2863
R2	0.167
R2 Adj.	0.167
AIC	27181.6
BIC	27199.5
Log.Lik.	-13587.821
RMSE	27.86

Show the ANOVA table:

anova(lr_model_vote)

## Analysis of Variance Table
## 
## Response: ObamaWinningMargin
##             Df  Sum Sq Mean Sq F value    Pr(>F)    
## Bachelors    1  445781  445781  574.07 < 2.2e-16 ***
## Residuals 2861 2221658     777                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Create the scatter plot:

plot(vote$Bachelors, vote$ObamaWinningMargin, main = 'Obama Winning Margin vs. Education Winning Margin', xlab = 'Bachelors Degree or Higher (%)', ylab = 'Obama Winning Margin (%)', col = adjustcolor('darkred',alpha.f=0.4), pch = 16)
abline(lr_model_vote)