DATA609 Course Project Summary

Sreejaya Vasudevannair, Suman K Polavarapu
05/22/2016

  • Monte Carlo Simulation

  • Decision Theory

  • SIR Model (Differential Equations)

Monte Carlo Simulation

Problem Definition

Horse Race – Construct and perform a Monte Carlo simulation of a horse race. You can be creative and use odds from the newspaper, or simulate the Mathematical Derby with the entries and odds shown in the table.

Entry's name Odds
Euler's Folly 7-1
Leapin' Leibniz 5-1
Newton Lobell 9-1
Count Cauchy 12-1
Pumped up Poisson 4-1
Loping L'Hopital 35-1
Steamin' Stokes 15-1
Dancin' Dantzig 4-1

Sum of the probabilities ...

Odds of a-1 correspond to a win probability of 1/(a + 1). The implied probabilities do not sum exactly to 1, so they are rescaled before sampling:

# implied win probabilities from the odds above
probabilities <- c(1/8,1/6,1/10,1/13,1/5,1/36,1/16,1/5)
(sum(probabilities))
[1] 0.9588675
# normalizing factor so the probabilities sum to exactly 1
(factor <- (1/sum(probabilities)))
[1] 1.042897
(sum((probabilities.modified <- probabilities * factor)))
[1] 1

Data frame...
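The chunk that built this data frame is not shown; a minimal sketch of how it could be constructed follows (the column names match the output below, but the frame name odds.df is our own):

horses <- c("Euler's Folly", "Leapin' Leibniz", "Newton Lobell", "Count Cauchy",
            "Pumped up Poisson", "Loping L'Hopital", "Steamin' Stokes", "Dancin' Dantzig")
given_odds <- c("7-1", "5-1", "9-1", "12-1", "4-1", "35-1", "15-1", "4-1")
# combine the names, the quoted odds, and the normalized probabilities computed above
(odds.df <- data.frame(horses, given_odds, probabilities.modified))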

horses given_odds probabilities.modified
Euler's Folly 7-1 0.1303621
Leapin' Leibniz 5-1 0.1738162
Newton Lobell 9-1 0.1042897
Count Cauchy 12-1 0.0802228
Pumped up Poisson 4-1 0.2085794
Loping L'Hopital 35-1 0.0289694
Steamin' Stokes 15-1 0.0651811
Dancin' Dantzig 4-1 0.2085794

Write a function to sample from this probability distribution, and draw 1000 samples:

Simulate...

raceSamples <- function(noOfSamples) {

  # sample race winners from the normalized probabilities, with replacement
  sample(x = c("Euler's Folly","Leapin' Leibniz","Newton Lobell","Count Cauchy","Pumped up Poisson",
               "Loping L'Hopital","Steamin' Stokes","Dancin' Dantzig"), 
                noOfSamples, 
                replace = TRUE, 
                prob = probabilities.modified)
}

set.seed(10)
horse.df <- as.data.frame(table(raceSamples(1000)))
colnames(horse.df) <- c('horseName', 'Frequency')

Plot...

horseName Frequency
Count Cauchy 91
Dancin' Dantzig 222
Euler's Folly 116
Leapin' Leibniz 164
Loping L'Hopital 42
Newton Lobell 107
Pumped up Poisson 203
Steamin' Stokes 55

From the above results, Loping L'Hopital won the fewest races and Dancin' Dantzig won the most.

[Plot: number of wins per horse in 1000 simulated races]
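The bar chart could be reproduced with something along these lines (a sketch; the original plotting chunk is not shown):

# bar chart of simulated wins per horse
barplot(horse.df$Frequency,
        names.arg = as.character(horse.df$horseName),
        las = 2, cex.names = 0.7,
        ylab = "Wins in 1000 simulated races")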

Decision Theory

Problem Definition

The NBC TV network earns an average of $400,000 from a hit show and loses an average of $100,000 on a flop (a show that cannot hold its rating and must be canceled). If the network airs a show without a market review, 25% turn out to be hits, and 75% are flops. For $40,000, a market research firm can be hired to help determine whether the show will be a hit or a flop. If the show is actually going to be a hit, there is a 90% chance that the market research firm will predict a hit. If the show is going to be a flop, there is an 80% chance that the market research firm will predict the show to be a flop. Determine how the network can maximize its profits over the long haul.

Conditional Probabilities ...

There are four outcomes: two with research and two without research (see the decision tree below).

1. Air a show that turns out to be a hit, with research

2. Air a show that turns out to be a flop, with research

3. Air a show that turns out to be a hit, without research

4. Air a show that turns out to be a flop, without research

With the market research firm ...

The NBC TV network pays $40,000 to the market research firm.

Expected value if the show is a hit: 0.25 * 0.9 * $400,000 + 0.75 * 0.2 * $400,000 = $150,000

Expected value if the show is a flop: 0.25 * 0.1 * $100,000 + 0.75 * 0.8 * $100,000 = $62,500

So the expected profit is $150,000 - $62,500 - $40,000 = $47,500.

Without the market research firm

Expected value if the show is a hit: 0.25 * $400,000 = $100,000

Expected value if the show is a flop: 0.75 * $100,000 = $75,000

So the expected profit is $100,000 - $75,000 = $25,000.
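These expected values can be checked with a few lines of R that mirror the arithmetic above:

p_hit <- 0.25; p_flop <- 0.75

## With the market research firm ($40,000 fee)
ev_hit  <- p_hit * 0.9 * 400000 + p_flop * 0.2 * 400000   # 150000
ev_flop <- p_hit * 0.1 * 100000 + p_flop * 0.8 * 100000   # 62500
(profit_with_research <- ev_hit - ev_flop - 40000)         # 47500

## Without the market research firm
(profit_without_research <- p_hit * 400000 - p_flop * 100000)  # 25000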

Based on the above results:

“We would recommend that the NBC TV network hire a market research firm before airing a show.”

Decision Tree

[Figure: decision tree for the TV show decision]

SIR Model

Problem Definition: Modern Flu Epidemic. A modern flu spreads throughout a small community with a fixed population of size n. The disease is spread through contact between infected persons and persons who are susceptible to the disease. Assume that everyone is susceptible to the disease initially and that the community is contained so that no one leaves it. The model given in (12.39) can be used to model the epidemic throughout the community. Suppose b = 0.002, a = 0.7, and n = 1000.


SIR Function

The fixed population is given as n = 1000, and the initial values are S(0) = 1000 (everyone is susceptible at the beginning), I(0) = 2, and R(0) = 0. We take the transmission coefficient b = 0.002 and the recovery coefficient a = 0.7. The SIR system solved below is dS/dt = -b*S*I, dI/dt = b*S*I - a*I, dR/dt = a*I.

#load required libraries 
library(deSolve)
# Create an SIR function 
sir <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS <- -b * S * I 
    dI <- b * S * I - a * I 
    dR <- a * I
    return(list(c(dS, dI, dR)))
  })
}

Flu Epidemic

flu_epidemic <- function(S0, I0, R0, times, trans_param, recovery_param) 
{
  ## Set the initial state
  init       <- c(S = S0, I = I0, R = R0)
  ## b: infection parameter; a: recovery parameter
  parameters <- c(b = trans_param, a = recovery_param)
  ## Solve using ode (the ordinary differential equation solver from deSolve)
  out <- ode(y = init, times = times, func = sir, parms = parameters)
  ## Change to a data frame
  out <- as.data.frame(out)
  return(out)
}

Call with I(0) = 2
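A sketch of the call that would produce the table below (the weekly time grid 0 through 30 is assumed from the output, and the object name out_i2 is our own):

times <- seq(0, 30, by = 1)
out_i2 <- flu_epidemic(S0 = 1000, I0 = 2, R0 = 0, times = times,
                       trans_param = 0.002, recovery_param = 0.7)
head(out_i2)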

time S I R
0 1000.00000 2.0000000 0.000000
1 991.84142 7.2913630 2.867215
2 963.12724 25.7233439 13.149415
3 873.14149 81.3783358 47.480175
4 668.40694 192.5928740 141.000182
5 410.36747 279.8864030 311.746129
6 235.34971 260.3115627 506.338732
7 149.96348 187.9588335 664.077683
8 110.48839 120.5149924 770.996618
9 91.34683 73.0702365 837.582938
10 81.52801 43.0883619 877.383625
11 76.27790 25.0413748 900.680726
12 73.39477 14.4388267 914.166408
13 71.78565 8.2891568 921.925193
14 70.87899 4.7470761 926.373938
15 70.36527 2.7148236 928.919906
16 70.07327 1.5513732 930.375357
17 69.90699 0.8861294 931.206882
18 69.81220 0.5060199 931.681781
19 69.75813 0.2889184 931.952949
20 69.72728 0.1649479 932.107770
21 69.70968 0.0941667 932.196158
22 69.69963 0.0537567 932.246617
23 69.69389 0.0306877 932.275422
24 69.69062 0.0175183 932.291866
25 69.68875 0.0100004 932.301253
26 69.68768 0.0057088 932.306611
27 69.68707 0.0032589 932.309670
28 69.68672 0.0018604 932.311416
29 69.68652 0.0010620 932.312413
30 69.68641 0.0006062 932.312982

Plot

Notice that the number of infected people peaks around week 4.5, at roughly 280 people.

Everyone survives the epidemic, and not everyone gets the flu.

[Plot: S, I, and R over time with I(0) = 2]
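Both observations can be checked from the numerical solution (a sketch using the out_i2 object from the call above):

# S + I + R stays at the initial total of 1002 at every time step (up to solver error),
# so no one dies or leaves the community
rowSums(out_i2[, c("S", "I", "R")])
# about 70 people remain susceptible at the end, i.e. they never catch the flu
tail(out_i2$S, 1)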

Plots: I(0) = 5 and I(0) = 10

[Plot: S, I, and R over time with I(0) = 5]

With I(0) = 5 there is not much change: the maximum number of infected people is around 277, again near week 4.5.

[Plot: S, I, and R over time with I(0) = 10]

When I(0) = 10, the maximum number of infected people is 292, occurring between weeks 3 and 4.

Plot: I(0) = 100

[Plot: S, I, and R over time with I(0) = 100]

Here the maximum number of people infected is around 382, within the first two weeks.

Although a large number of people are infected at an early stage, even in this case not everyone is infected.
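The I(0) = 5, 10, and 100 runs can be reproduced by varying the initial number of infected people (a sketch; S(0) is kept at 1000 as in the earlier run, and a half-week grid is used so the peak is not missed between whole weeks):

times <- seq(0, 30, by = 0.5)
# peak number of infected people for each initial condition
peaks <- sapply(c(5, 10, 100), function(i0) {
  out <- flu_epidemic(S0 = 1000, I0 = i0, R0 = 0, times = times,
                      trans_param = 0.002, recovery_param = 0.7)
  max(out$I)
})
peaks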

Using S(0) = n - I(0)

population (n) = susceptible (S) + infected (I)

Here we assume S(1) = n - I(1), S(2) = n - I(2), and so on.

# Discrete version of the SIR model with a step size of 1 week, where S is
# reset to n - I at every step instead of being updated from S'.
sir_custom <- function(s0, i0, r0, a, b, n)
{
  time <- numeric(25)
  S <- numeric(25)
  I <- numeric(25)
  R <- numeric(25)
  time[1] <- 0
  S[1] <- s0
  I[1] <- i0
  R[1] <- r0
  for(i in seq(2,25)) {
    s_prime <- -b * S[i-1] * I[i-1]
    i_prime <- (b * S[i-1] * I[i-1]) - (a * I[i-1])
    r_prime <- (a * I[i-1])
    I[i] <- I[i-1] + i_prime * 1
    S[i] <- n - I[i] # instead of S[i-1] + s_prime * 1, we assign S[i] as n - I[i] and check the results
    R[i] <- R[i-1] + r_prime
    time[i] <- i
  }
  return(data.frame(time, S, I, R))
}
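A sketch of a call to this function with the same coefficients as before (the exact I(0) behind the plots below is not shown; i0 = 2 is used here only for illustration, with s0 set to n - i0 per the section title):

i0 <- 2
out_custom <- sir_custom(s0 = 1000 - i0, i0 = i0, r0 = 0, a = 0.7, b = 0.002, n = 1000)
head(out_custom)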

Using S(0) = n - I(0)

[Plot: sir_custom results with S = n - I]

The infected population gradually increases and, around week 12, reaches a stable point where 350 people are susceptible and 650 people are infected. This matches the equilibrium of the model: infections stop changing when b*S*I = a*I, i.e. when S = a/b = 0.7/0.002 = 350.

Using S(0) = n - I(0)

[Plot: sir_custom results with S = n - I]

Around week 7, it reaches a point where the population always has 350 susceptible and 650 infected people.

For an epidemic like the flu, this does not seem practical.