ContextBase Predictive Analytics

This document contains examples of the Predictive Analytics capabilities of ContextBase, http://contextbase.github.io.

Predictive Analytics Example 1: Linear Regression

Linear Regression allows for prediction of future occurrences derived from one explanatory variable, and one response variable.

library(knitr)
revenue <- freeny.x
kable(head(revenue))

lag quarterly revenue	price index	income level	market potential
8.79636	4.70997	5.82110	12.9699
8.79236	4.70217	5.82558	12.9733
8.79137	4.68944	5.83112	12.9774
8.81486	4.68558	5.84046	12.9806
8.81301	4.64019	5.85036	12.9831
8.90751	4.62553	5.86464	12.9854

colnames(revenue) <- c("lag_quarterly_revenue", "price_index",
                       "income_level", "market_potential")
revenue <- data.frame(revenue)
model <- lm(market_potential~price_index, revenue)

cat("The Intercept =", model$coefficients[1])

## The Intercept = 15.21788

test <- data.frame(price_index=4.57592)
result <- predict(model, test)

plot(revenue$market_potential, revenue$price_index, col = "red",
     main = "Price Index affecting Market Potential",
     abline(lm(revenue$price_index~revenue$market_potential)),
     cex = 1.3, pch = 16, xlab = "Market Potential",
     ylab = "Price Index")

Example 1 - Linear Regression Conclusion:

cat("For a Price Index of ", as.character(test), ", the predicted Market Potential = ", round(result, 2), ".", sep="")

## For a Price Index of 4.57592, the predicted Market Potential = 13.03.

In conclusion to ContextBase Predictive Analytics Example 1, a direct correlation of Price Index to Market Potential was found, (see above graph). As a test of the Predictive Algorithm, a Price Index of 4.57592 was processed, and a Market Potential of 13.03 was predicted. The source R dataset shows this prediction to be accurate.

Predictive Analytics Example 2: Logistic Regression

Logistic Regression allows for prediction of a logical, (Yes or No), occurrence based on the effects of an explanatory variable on a response variable. For example, the probability of winning a congressional election vs campaign expenditures.

How does the amount of money spent on a campaign affect the probability that the candidate will win the election?

Source of Data Ranges: https://www.washingtonpost.com/news/the-fix/wp/2014/04/04/think-money-doesnt-matter-in-elections-this-chart-says-youre-wrong/

Expenditures <- c(1000000, 1100000, 1200000, 1300000, 1400000,
                  1500000, 1600000, 1700000, 1800000, 1900000,
                  2000000, 2100000, 2200000, 2300000, 2400000,
                  2000000, 2100000, 2200000, 2300000, 2400000)
ElectionResult <- c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,    1, 1, 1, 1, 1)

CampaignCosts <- data.frame(Expenditures, ElectionResult)

kable(CampaignCosts, caption = "Campaign Expenditures of Candidates vs Election Results")

Campaign Expenditures of Candidates vs Election Results
Expenditures	ElectionResult
1000000	0
1100000	0
1200000	0
1300000	0
1400000	0
1500000	0
1600000	1
1700000	0
1800000	1
1900000	0
2000000	1
2100000	0
2200000	1
2300000	0
2400000	1
2000000	1
2100000	1
2200000	1
2300000	1
2400000	1

library(ggplot2)
ggplot(CampaignCosts, aes(Expenditures, ElectionResult)) +
  geom_point(aes()) +
  geom_smooth(method='glm',
              method.args = list(family = "binomial"), se=FALSE) +
  labs (x="Campaign Expenditures", y="Probability of Winning Election", title="Probability of Winning Election vs Campaign Costs")

The logistic regression analysis gives the following output:

model <- glm(ElectionResult ~., family=binomial(link='logit'), data=CampaignCosts)
model$coefficients

##   (Intercept)  Expenditures 
## -7.615054e+00  4.098080e-06

The output indicates that campaign expenditures significantly affect the probability of winning the election. The output provides the coefficients for Intercept = -7.615054e+00, and Expenditures = 4.098080e-06. These coefficients are entered in the logistic regression equation to estimate the probability of winning the election:

Probability of winning election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))

For a Candidate that has $1,600,000 in expenditures:

CampaignExpenses <- 1600000
ProbabilityOfWinningElection <- 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))

cat("Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*",
    CampaignExpenses, "))) = ", round(ProbabilityOfWinningElection, 2), ".", sep="")

## Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*1600000))) = 0.26.

For a Candidate that has $2,100,000 in expenditures:

CampaignExpenses <- 2100000
ProbabilityOfWinningElection <- 1/(1+exp(-(-7.615054e+00+4.098080e-06*CampaignExpenses)))

cat("Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*",
    CampaignExpenses, "))) = ", round(ProbabilityOfWinningElection, 2), ".", sep="")

## Probability of winning Election = 1/(1+exp(-(-7.615054e+00+4.098080e-06*2100000))) = 0.73.

Example 2 - Logistic Regression Conclusion:

ElectionWinTable <- data.frame(column1=c(1100000, 1400000,
                                         1700000, 1900000,
                                         2300000),
                               column2=
      c(round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1100000))), 2),
      round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1400000))), 2),
      round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1700000))), 2),
      round(1/(1+exp(-(-7.615054e+00+4.098080e-06*1900000))), 2),
      round(1/(1+exp(-(-7.615054e+00+4.098080e-06*2300000))), 2)))
names(ElectionWinTable) <- c("Campaign Expenses", "Probability of Winning Election")
kable(ElectionWinTable)

Campaign Expenses	Probability of Winning Election
1100000	0.04
1400000	0.13
1700000	0.34
1900000	0.54
2300000	0.86

In conclusion to ContextBase Predictive Analytics Example 2, a direct correlation of Campaign Expenditures to Election Performance was verified. The above table displays corresponding probablities of winning an election to campaign expenses.

Predictive Analytics Example 3: Multiple Regression

Multiple Regression allows for the prediction of the future values of a response variable, based on values of multiple explanatory variables.

input <- data.frame(state.x77[,1:4])
kable(head(input))

	Population	Income	Illiteracy	Life.Exp
Alabama	3615	3624	2.1	69.05
Alaska	365	6315	1.5	69.31
Arizona	2212	4530	1.8	70.55
Arkansas	2110	3378	1.9	70.66
California	21198	5114	1.1	71.71
Colorado	2541	4884	0.7	72.06

colnames(input) <- c("Population", "Income", "Illiteracy", "Life_Exp")

# Create the relationship model.
model <- lm(Life_Exp~Population+Income+Illiteracy, data=input)

# Show the model.
print(model)

## 
## Call:
## lm(formula = Life_Exp ~ Population + Income + Illiteracy, data = input)
## 
## Coefficients:
## (Intercept)   Population       Income   Illiteracy  
##   7.120e+01   -1.024e-05    2.477e-04   -1.179e+00

# Multiple Regression Plot
par(mfrow=c(1,3))
termplot(model)

a <- coef(model)[1]
cat("The Multiple Regression Intercept = ", a, ".", sep="")

## The Multiple Regression Intercept = 71.2023.

XPopulation <- coef(model)[2]
XIncome <- coef(model)[3]
XIlliteracy <- coef(model)[4]

modelCoef <- data.frame(XPopulation, XIncome, XIlliteracy)
colnames(modelCoef) <- c("Population", "Income", "Illiteracy")
row.names(modelCoef) <- c("Coefficients")
kable(modelCoef)

	Population	Income	Illiteracy
Coefficients	-1.02e-05	0.0002477	-1.178788

Multiple Regression Conclusion:

popl <- 3100
Incm <- 5348
Illt <- 1.1

Y = a + popl * XPopulation + Incm * XIncome + Illt * XIlliteracy
cat("For a City where Population = ", popl, ", Income = ", Incm,  ", and Illiteracy = ", Illt, ",
the predicted Life Expectancy is: ", round(Y, 2), ".", sep="")

## For a City where Population = 3100, Income = 5348, and Illiteracy = 1.1,
## the predicted Life Expectancy is: 71.2.

In conclusion to ContextBase Predictive Analytics Example 3, the multiple variables of “Population”, “Income”, and “Illiteracy” were used to determine the predicted “Life Expectancy” of an area corresponding to a USA State. For an area with a Population of 3100, a per capita Income Rate of 5348, and an Illiteracy Rate of 1.1, a Life Expectancy of 71.2 years was predicted.