Linear Regression allows for prediction of future occurrences derived from one explanatory variable, and one response variable.
library(knitr)
revenue <- freeny.x
kable(head(revenue))
| lag quarterly revenue | price index | income level | market potential |
|---|---|---|---|
| 8.79636 | 4.70997 | 5.82110 | 12.9699 |
| 8.79236 | 4.70217 | 5.82558 | 12.9733 |
| 8.79137 | 4.68944 | 5.83112 | 12.9774 |
| 8.81486 | 4.68558 | 5.84046 | 12.9806 |
| 8.81301 | 4.64019 | 5.85036 | 12.9831 |
| 8.90751 | 4.62553 | 5.86464 | 12.9854 |
colnames(revenue) <- c("lag_quarterly_revenue", "price_index",
"income_level", "market_potential")
revenue <- data.frame(revenue)
model <- lm(market_potential~price_index, revenue)
cat("The Intercept =", model$coefficients[1])
## The Intercept = 15.21788
test <- data.frame(price_index=4.57592)
result <- predict(model, test)
plot(revenue$market_potential, revenue$price_index, col = "red",
main = "Price Index affecting Market Potential",
abline(lm(revenue$price_index~revenue$market_potential)),
cex = 1.3, pch = 16, xlab = "Market Potential",
ylab = "Price Index")
cat("For a Price Index of ", as.character(test), ", the predicted Market Potential = ", round(result, 2), ".", sep="")
## For a Price Index of 4.57592, the predicted Market Potential = 13.03.
Logistic Regression allows for prediction of a logical, (Yes or No), occurrence based on the effects of an explanatory variable on a response variable. For example, the probability of winning a congressional election vs campaign expenditures.
How does the amount of money spent on a campaign affect the probability that the candidate will win the election?
Source of Data Ranges: https://www.washingtonpost.com/news/the-fix/wp/2014/04/04/think-money-doesnt-matter-in-elections-this-chart-says-youre-wrong/
Expenditures <- c(1000000, 1100000, 1200000, 1300000, 1400000,
1500000, 1600000, 1700000, 1800000, 1900000,
2000000, 2100000, 2200000, 2300000, 2400000,
2000000, 2100000, 2200000, 2300000, 2400000)
ElectionResult <- c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1)
CampaignCosts <- data.frame(Expenditures, ElectionResult)
kable(CampaignCosts, caption = "Campaign Expenditures of Candidates vs Election Results")
| Expenditures | ElectionResult |
|---|---|
| 1000000 | 0 |
| 1100000 | 0 |
| 1200000 | 0 |
| 1300000 | 0 |
| 1400000 | 0 |
| 1500000 | 0 |
| 1600000 | 1 |
| 1700000 | 0 |
| 1800000 | 1 |
| 1900000 | 0 |
| 2000000 | 1 |
| 2100000 | 0 |
| 2200000 | 1 |
| 2300000 | 0 |
| 2400000 | 1 |
| 2000000 | 1 |
| 2100000 | 1 |
| 2200000 | 1 |
| 2300000 | 1 |
| 2400000 | 1 |
library(ggplot2)
ggplot(CampaignCosts, aes(Expenditures, ElectionResult)) +
geom_point(aes()) +
geom_smooth(method='glm',
method.args = list(family = "binomial"), se=FALSE) +
labs (x="Campaign Expenditures", y="Probability of Winning Election", title="Probability of Winning Election vs Campaign Costs")
The logistic regression analysis gives the following output:
model <- glm(ElectionResult ~., family=binomial(link='logit'), data=CampaignCosts)
model$coefficients
## (Intercept) Expenditures
## -7.615054e+00 4.098080e-06
The output indicates that campaign expenditures significantly affect the probability of winning the election. The output provides the coefficients for Intercept = -7.615054e+00, and Expenditures = 4.098080e-06. These coefficients are entered in the logistic regression equation to estimate the probability of winning the election:
Probability of winning election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))
For a Candidate that has $1,600,000 in expenditures:
CampaignExpenses <- 1600000
ProbabilityOfWinningElection <- 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))
cat("Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*",
CampaignExpenses, "))) = ", round(ProbabilityOfWinningElection, 2), ".", sep="")
## Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*1600000))) = 0.26.
For a Candidate that has $2,100,000 in expenditures:
CampaignExpenses <- 2100000
ProbabilityOfWinningElection <- 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))
cat("Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*",
CampaignExpenses, "))) = ", round(ProbabilityOfWinningElection, 2), ".", sep="")
## Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*2100000))) = 0.73.
ElectionWinTable <- data.frame(column1=c(1100000, 1400000,
1700000, 1900000,
2300000),
column2=
c(round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1100000))), 2),
round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1400000))), 2),
round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1700000))), 2),
round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1900000))), 2),
round(1/(1+exp(-(-7.615054e+00+4.098080e-06*2300000))), 2)))
names(ElectionWinTable) <- c("Campaign Expenses", "Probability of Winning Election")
kable(ElectionWinTable)
| Campaign Expenses | Probability of Winning Election |
|---|---|
| 1100000 | 0.04 |
| 1400000 | 0.13 |
| 1700000 | 0.34 |
| 1900000 | 0.54 |
| 2300000 | 0.86 |
Multiple Regression allows for the prediction of the future values of a response variable, based on values of multiple explanatory variables.
input <- data.frame(state.x77[,1:4])
kable(head(input))
| Population | Income | Illiteracy | Life.Exp | |
|---|---|---|---|---|
| Alabama | 3615 | 3624 | 2.1 | 69.05 |
| Alaska | 365 | 6315 | 1.5 | 69.31 |
| Arizona | 2212 | 4530 | 1.8 | 70.55 |
| Arkansas | 2110 | 3378 | 1.9 | 70.66 |
| California | 21198 | 5114 | 1.1 | 71.71 |
| Colorado | 2541 | 4884 | 0.7 | 72.06 |
colnames(input) <- c("Population", "Income", "Illiteracy", "Life_Exp")
# Create the relationship model.
model <- lm(Life_Exp~Population+Income+Illiteracy, data=input)
# Show the model.
print(model)
##
## Call:
## lm(formula = Life_Exp ~ Population + Income + Illiteracy, data = input)
##
## Coefficients:
## (Intercept) Population Income Illiteracy
## 7.120e+01 -1.024e-05 2.477e-04 -1.179e+00
# Multiple Regression Plot
par(mfrow=c(1,3))
termplot(model)
a <- coef(model)[1]
cat("The Multiple Regression Intercept = ", a, ".", sep="")
## The Multiple Regression Intercept = 71.2023.
XPopulation <- coef(model)[2]
XIncome <- coef(model)[3]
XIlliteracy <- coef(model)[4]
modelCoef <- data.frame(XPopulation, XIncome, XIlliteracy)
colnames(modelCoef) <- c("Population", "Income", "Illiteracy")
row.names(modelCoef) <- c("Coefficients")
kable(modelCoef)
| Population | Income | Illiteracy | |
|---|---|---|---|
| Coefficients | -1.02e-05 | 0.0002477 | -1.178788 |
popl <- 3100
Incm <- 5348
Illt <- 1.1
Y = a + popl * XPopulation + Incm * XIncome + Illt * XIlliteracy
cat("For a City where Population = ", popl, ", Income = ", Incm, ", and Illiteracy = ", Illt, ",
the predicted Life Expectancy is: ", round(Y, 2), ".", sep="")
## For a City where Population = 3100, Income = 5348, and Illiteracy = 1.1,
## the predicted Life Expectancy is: 71.2.