# Load in data in this code block
energy <- read.csv('Energy.csv')
ecom <- read.csv('COSTEST3.csv')
test <- read.csv('Clinton-Obama-testing.csv')
vote <- read.csv('Obama-Clinton-Vote.csv')
Please submit (1) the completed word document and (2) a formatted file generated from this provided R Markdown template (HTML) that includes both code chunks and their outputs to ELMS.
An energy manager wants to understand whether investing in energy-efficiency upgrades is associated with lower annual energy costs across facilities. A random sample of facilities provided the data (in dollars) in the file “Energy.csv”, where the dependent variable is the total annual energy cost (column “EnergyCost”), and independent variable is the annual spending on energy-efficiency upgrades (column “UpgradeCost”).
Enter your code for Question 1 in the code chunks below.
Fit the regression model and print out the summary:
lr_model_energy <- lm(EnergyCost~UpgradeCost, data = energy)
summary(lr_model_energy)
##
## Call:
## lm(formula = EnergyCost ~ UpgradeCost, data = energy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12744 -8807 -1382 5180 21284
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60210.1432 6934.5989 8.683 5.71e-06 ***
## UpgradeCost -0.9578 0.2105 -4.551 0.00106 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11140 on 10 degrees of freedom
## Multiple R-squared: 0.6743, Adjusted R-squared: 0.6418
## F-statistic: 20.71 on 1 and 10 DF, p-value: 0.001057
Show the confidence interval of the coefficients:
library(modelsummary)
modelsummary(lr_model_energy, statistic = "conf.int", conf_level = 0.90)
| (1) | |
|---|---|
| (Intercept) | 60210.143 |
| [47641.452, 72778.834] | |
| UpgradeCost | -0.958 |
| [-1.339, -0.576] | |
| Num.Obs. | 12 |
| R2 | 0.674 |
| R2 Adj. | 0.642 |
| AIC | 261.5 |
| BIC | 263.0 |
| Log.Lik. | -127.749 |
| RMSE | 10166.07 |
ANOVA table for F-test:
anova(lr_model_energy)
## Analysis of Variance Table
##
## Response: EnergyCost
## Df Sum Sq Mean Sq F value Pr(>F)
## UpgradeCost 1 2568062931 2568062931 20.707 0.001057 **
## Residuals 10 1240187069 124018707
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
qf(0.95, df1 = 1, df2 = 10)
## [1] 4.964603
An e-commerce company operates a small fulfillment center. Management believes the daily operating cost is driven by the number of customer orders packed and shipped that day. You are given 24 days of data in COSTEST3 with two columns:
y = COST: total fulfillment center cost for the day (in dollars, or in whatever currency units your dataset uses)
x = NUMBER: number of orders shipped that day
Run a simple linear regression using COST as the dependent variable and NUMBER as the independent variable.
Fit the regression model and print out the summary:
lr_model_ecom <- lm(COST~NUMBER, data = ecom)
summary(lr_model_ecom)
##
## Call:
## lm(formula = COST ~ NUMBER, data = ecom)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3022 -2.3110 0.5253 1.8948 5.2685
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.3107 4.0830 6.934 5.82e-07 ***
## NUMBER 2.1549 0.1437 14.995 4.94e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.84 on 22 degrees of freedom
## Multiple R-squared: 0.9109, Adjusted R-squared: 0.9068
## F-statistic: 224.9 on 1 and 22 DF, p-value: 4.942e-13
Show the confidence interval of the coefficients:
modelsummary(lr_model_ecom, statistic = "conf.int", conf_level = 0.95)
| (1) | |
|---|---|
| (Intercept) | 28.311 |
| [19.843, 36.778] | |
| NUMBER | 2.155 |
| [1.857, 2.453] | |
| Num.Obs. | 24 |
| R2 | 0.911 |
| R2 Adj. | 0.907 |
| AIC | 122.1 |
| BIC | 125.7 |
| Log.Lik. | -58.063 |
| RMSE | 2.72 |
Make prediction:
new_data <- data.frame(NUMBER = 50)
predict(lr_model_ecom, newdata = new_data, interval = "prediction", level = 0.90)
## fit lwr upr
## 1 136.0547 128.7122 143.3972
Select variables and inspect missing values
vars <- vote[ c("ObamaWinningMargin", "Bachelors")]
colSums(is.na(vars))
## ObamaWinningMargin Bachelors
## 0 0
Fit the model:
lr_model_vote <- lm(ObamaWinningMargin~Bachelors, data = vote)
summary(lr_model_vote)
##
## Call:
## lm(formula = ObamaWinningMargin ~ Bachelors, data = vote)
##
## Residuals:
## Min 1Q Median 3Q Max
## -68.803 -20.318 -1.037 18.853 94.972
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -34.71674 1.20038 -28.92 <2e-16 ***
## Bachelors 1.57493 0.06573 23.96 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.87 on 2861 degrees of freedom
## Multiple R-squared: 0.1671, Adjusted R-squared: 0.1668
## F-statistic: 574.1 on 1 and 2861 DF, p-value: < 2.2e-16
modelsummary(lr_model_vote, statistic = "conf.int", conf_level = 0.95)
| (1) | |
|---|---|
| (Intercept) | -34.717 |
| [-37.070, -32.363] | |
| Bachelors | 1.575 |
| [1.446, 1.704] | |
| Num.Obs. | 2863 |
| R2 | 0.167 |
| R2 Adj. | 0.167 |
| AIC | 27181.6 |
| BIC | 27199.5 |
| Log.Lik. | -13587.821 |
| RMSE | 27.86 |
Show the ANOVA table:
anova(lr_model_vote)
## Analysis of Variance Table
##
## Response: ObamaWinningMargin
## Df Sum Sq Mean Sq F value Pr(>F)
## Bachelors 1 445781 445781 574.07 < 2.2e-16 ***
## Residuals 2861 2221658 777
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Create the scatter plot:
plot(vote$Bachelors, vote$ObamaWinningMargin, main = 'Obama Winning Margin vs. Education Winning Margin', xlab = 'Bachelors Degree or Higher (%)', ylab = 'Obama Winning Margin (%)', col = adjustcolor('darkred',alpha.f=0.4), pch = 16)
abline(lr_model_vote)
Make prediction:
test$PredictedMargin <- predict(lr_model_vote, newdata = test)
summary(test$PredictedMargin)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -13.455 -12.195 -11.723 -8.384 -11.408 6.861
test[which.max(test$PredictedMargin), c("County","State","Bachelors","PredictedMargin")]
## County State Bachelors PredictedMargin
## 1 Beaverhead MT 26.4 6.861436
test[which.min(test$PredictedMargin), c("County","State","Bachelors","PredictedMargin")]
## County State Bachelors PredictedMargin
## 4 Miner SD 13.5 -13.45517