DATA609 Course Project Summary

Sreejaya Vasudevannair, Suman K Polavarapu
05/22/2016

  • Monte Carlo Simulation

  • Decision Theory

  • SIR Model (Differential Equations)

Monte Carlo Simulation

Problem Definition

Horse Race – Construct and perform a Monte Carlo simulation of a horse race. You can be creative and use odds from the newspaper, or simulate the Mathematical Derby with the entries and odds shown in the table.

Entry's name Odds
Euler's Folly 7-1
Leapin' Leibniz 5-1
Newton Lobell 9-1
Count Cauchy 12-1
Pumped up Poisson 4-1
Loping L'Hopital 35-1
Steamin' Stokes 15-1
Dancin' Dantzig 4-1

Sum of the probabilities ...

Odds of a-1 correspond to a win probability of 1/(a + 1). The implied probabilities do not sum exactly to 1, so they are rescaled before sampling:

# implied win probabilities from the odds above
probabilities <- c(1/8,1/6,1/10,1/13,1/5,1/36,1/16,1/5)
(sum(probabilities))
[1] 0.9588675
# normalizing factor so the probabilities sum to exactly 1
(factor <- (1/sum(probabilities)))
[1] 1.042897
(sum((probabilities.modified <- probabilities * factor)))
[1] 1

Data frame...
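The chunk that built this data frame is not shown; a minimal sketch of how it could be constructed follows (the column names match the output below, but the frame name odds.df is our own):

horses <- c("Euler's Folly", "Leapin' Leibniz", "Newton Lobell", "Count Cauchy",
            "Pumped up Poisson", "Loping L'Hopital", "Steamin' Stokes", "Dancin' Dantzig")
given_odds <- c("7-1", "5-1", "9-1", "12-1", "4-1", "35-1", "15-1", "4-1")
# combine the names, the quoted odds, and the normalized probabilities computed above
(odds.df <- data.frame(horses, given_odds, probabilities.modified))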

horses given_odds probabilities.modified
Euler's Folly 7-1 0.1303621
Leapin' Leibniz 5-1 0.1738162
Newton Lobell 9-1 0.1042897
Count Cauchy 12-1 0.0802228
Pumped up Poisson 4-1 0.2085794
Loping L'Hopital 35-1 0.0289694
Steamin' Stokes 15-1 0.0651811
Dancin' Dantzig 4-1 0.2085794

Write a function to sample from this probability distribution, and draw 1000 samples:

Simulate...

raceSamples <- function(noOfSamples) {

  # sample race winners from the normalized probabilities, with replacement
  sample(x = c("Euler's Folly","Leapin' Leibniz","Newton Lobell","Count Cauchy","Pumped up Poisson",
               "Loping L'Hopital","Steamin' Stokes","Dancin' Dantzig"), 
                noOfSamples, 
                replace = TRUE, 
                prob = probabilities.modified)
}

set.seed(10)
horse.df <- as.data.frame(table(raceSamples(1000)))
colnames(horse.df) <- c('horseName', 'Frequency')

Plot...

horseName Frequency
Count Cauchy 91
Dancin' Dantzig 222
Euler's Folly 116
Leapin' Leibniz 164
Loping L'Hopital 42
Newton Lobell 107
Pumped up Poisson 203
Steamin' Stokes 55

From the above results, Loping L'Hopital won the fewest races and Dancin' Dantzig won the most.

[Plot: number of wins per horse in 1000 simulated races]
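The bar chart could be reproduced with something along these lines (a sketch; the original plotting chunk is not shown):

# bar chart of simulated wins per horse
barplot(horse.df$Frequency,
        names.arg = as.character(horse.df$horseName),
        las = 2, cex.names = 0.7,
        ylab = "Wins in 1000 simulated races")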

Decision Theory

Problem Definition

The NBC TV network earns an average of $400,000 from a hit show and loses an average of $100,000 on a flop (a show that cannot hold its rating and must be canceled). If the network airs a show without a market review, 25% turn out to be hits, and 75% are flops. For $40,000, a market research firm can be hired to help determine whether the show will be a hit or a flop. If the show is actually going to be a hit, there is a 90% chance that the market research firm will predict a hit. If the show is going to be a flop, there is an 80% chance that the market research firm will predict the show to be a flop. Determine how the network can maximize its profits over the long haul.

Conditional Probabilities ...

There are four outcomes: two with research and two without research (see the decision tree below).

1. Air a show that turns out to be a hit, with research

2. Air a show that turns out to be a flop, with research

3. Air a show that turns out to be a hit, without research

4. Air a show that turns out to be a flop, without research

With the market research firm ...

The NBC TV network pays $40,000 to the market research firm.

Expected value if the show is a hit: 0.25 * 0.9 * $400,000 + 0.75 * 0.2 * $400,000 = $150,000

Expected value if the show is a flop: 0.25 * 0.1 * $100,000 + 0.75 * 0.8 * $100,000 = $62,500

So the expected profit is $150,000 - $62,500 - $40,000 = $47,500.

Without the market research firm

Expected value if the show is a hit: 0.25 * $400,000 = $100,000

Expected value if the show is a flop: 0.75 * $100,000 = $75,000

So the expected profit is $100,000 - $75,000 = $25,000.
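These expected values can be checked with a few lines of R that mirror the arithmetic above:

p_hit <- 0.25; p_flop <- 0.75

## With the market research firm ($40,000 fee)
ev_hit  <- p_hit * 0.9 * 400000 + p_flop * 0.2 * 400000   # 150000
ev_flop <- p_hit * 0.1 * 100000 + p_flop * 0.8 * 100000   # 62500
(profit_with_research <- ev_hit - ev_flop - 40000)         # 47500

## Without the market research firm
(profit_without_research <- p_hit * 400000 - p_flop * 100000)  # 25000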

Based on the above results:

“We would recommend that the NBC TV network hire a market research firm before airing a show.”

Decision Tree

[Figure: decision tree for the TV show decision]

SIR Model

Problem Definition: Modern Flu Epidemic. A modern flu spreads throughout a small community with a fixed population of size n. The disease is spread through contact between infected persons and persons who are susceptible to the disease. Assume that everyone is susceptible to the disease initially and that the community is contained so that no one leaves it. The model given in (12.39) can be used to model the epidemic throughout the community. Suppose b = 0.002, a = 0.7, and n = 1000.


SIR Function

The fixed population is given as n = 1000, and the initial values are S(0) = 1000 (everyone is susceptible at the beginning), I(0) = 2, and R(0) = 0. We take the transmission coefficient b = 0.002 and the recovery coefficient a = 0.7. The SIR system solved below is dS/dt = -b*S*I, dI/dt = b*S*I - a*I, dR/dt = a*I.

#load required libraries 
library(deSolve)
# Create an SIR function 
sir <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS <- -b * S * I 
    dI <- b * S * I - a * I 
    dR <- a * I
    return(list(c(dS, dI, dR)))
  })
}

Flu Epidemic

flu_epidemic <- function(S0, I0, R0, times, trans_param, recovery_param) 
{
  ## Set the initial state
  init       <- c(S = S0, I = I0, R = R0)
  ## b: infection parameter; a: recovery parameter
  parameters <- c(b = trans_param, a = recovery_param)
  ## Solve using ode (the ordinary differential equation solver from deSolve)
  out <- ode(y = init, times = times, func = sir, parms = parameters)
  ## Change to a data frame
  out <- as.data.frame(out)
  return(out)
}

Call with I(0) = 2
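A sketch of the call that would produce the table below (the weekly time grid 0 through 30 is assumed from the output, and the object name out_i2 is our own):

times <- seq(0, 30, by = 1)
out_i2 <- flu_epidemic(S0 = 1000, I0 = 2, R0 = 0, times = times,
                       trans_param = 0.002, recovery_param = 0.7)
head(out_i2)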

time S I R
0 1000.00000 2.0000000 0.000000
1 991.84142 7.2913630 2.867215
2 963.12724 25.7233439 13.149415
3 873.14149 81.3783358 47.480175
4 668.40694 192.5928740 141.000182
5 410.36747 279.8864030 311.746129
6 235.34971 260.3115627 506.338732
7 149.96348 187.9588335 664.077683
8 110.48839 120.5149924 770.996618
9 91.34683 73.0702365 837.582938
10 81.52801 43.0883619 877.383625
11 76.27790 25.0413748 900.680726
12 73.39477 14.4388267 914.166408
13 71.78565 8.2891568 921.925193
14 70.87899 4.7470761 926.373938
15 70.36527 2.7148236 928.919906
16 70.07327 1.5513732 930.375357
17 69.90699 0.8861294 931.206882
18 69.81220 0.5060199 931.681781
19 69.75813 0.2889184 931.952949
20 69.72728 0.1649479 932.107770
21 69.70968 0.0941667 932.196158
22 69.69963 0.0537567 932.246617
23 69.69389 0.0306877 932.275422
24 69.69062 0.0175183 932.291866
25 69.68875 0.0100004 932.301253
26 69.68768 0.0057088 932.306611
27 69.68707 0.0032589 932.309670
28 69.68672 0.0018604 932.311416
29 69.68652 0.0010620 932.312413
30 69.68641 0.0006062 932.312982

Plot

Notice that the number of infected people peaks around week 4.5, at roughly 280 people.

Everyone survives the epidemic, and not everyone gets the flu.

[Plot: S, I, and R over time with I(0) = 2]
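Both observations can be checked from the numerical solution (a sketch using the out_i2 object from the call above):

# S + I + R stays at the initial total of 1002 at every time step (up to solver error),
# so no one dies or leaves the community
rowSums(out_i2[, c("S", "I", "R")])
# about 70 people remain susceptible at the end, i.e. they never catch the flu
tail(out_i2$S, 1)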

Plots: I(0) = 5 and I(0) = 10

[Plot: S, I, and R over time with I(0) = 5]

With I(0) = 5 there is not much change: the maximum number of infected people is around 277, again near week 4.5.

[Plot: S, I, and R over time with I(0) = 10]

When I(0) = 10, the maximum number of infected people is 292, occurring between weeks 3 and 4.

Plot: I(0) = 100

[Plot: S, I, and R over time with I(0) = 100]

Here the maximum number of people infected is around 382, within the first two weeks.

Although a large number of people are infected at an early stage, even in this case not everyone is infected.
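The I(0) = 5, 10, and 100 runs can be reproduced by varying the initial number of infected people (a sketch; S(0) is kept at 1000 as in the earlier run, and a half-week grid is used so the peak is not missed between whole weeks):

times <- seq(0, 30, by = 0.5)
# peak number of infected people for each initial condition
peaks <- sapply(c(5, 10, 100), function(i0) {
  out <- flu_epidemic(S0 = 1000, I0 = i0, R0 = 0, times = times,
                      trans_param = 0.002, recovery_param = 0.7)
  max(out$I)
})
peaks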

Using S(0) = n - I(0)

population (n) = susceptible (S) + infected (I)

Here we assume S(1) = n - I(1), S(2) = n - I(2), and so on.

# Discrete version of the SIR model with a step size of 1 week, where S is
# reset to n - I at every step instead of being updated from S'.
sir_custom <- function(s0, i0, r0, a, b, n)
{
  time <- numeric(25)
  S <- numeric(25)
  I <- numeric(25)
  R <- numeric(25)
  time[1] <- 0
  S[1] <- s0
  I[1] <- i0
  R[1] <- r0
  for(i in seq(2,25)) {
    s_prime <- -b * S[i-1] * I[i-1]
    i_prime <- (b * S[i-1] * I[i-1]) - (a * I[i-1])
    r_prime <- (a * I[i-1])
    I[i] <- I[i-1] + i_prime * 1
    S[i] <- n - I[i] # instead of S[i-1] + s_prime * 1, we assign S[i] as n - I[i] and check the results
    R[i] <- R[i-1] + r_prime
    time[i] <- i
  }
  return(data.frame(time, S, I, R))
}
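A sketch of a call to this function with the same coefficients as before (the exact I(0) behind the plots below is not shown; i0 = 2 is used here only for illustration, with s0 set to n - i0 per the section title):

i0 <- 2
out_custom <- sir_custom(s0 = 1000 - i0, i0 = i0, r0 = 0, a = 0.7, b = 0.002, n = 1000)
head(out_custom)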

Using S(0) = n - I(0)

[Plot: sir_custom results with S = n - I]

The infected population gradually increases and, around week 12, reaches a stable point where 350 people are susceptible and 650 people are infected. This matches the equilibrium of the model: infections stop changing when b*S*I = a*I, i.e. when S = a/b = 0.7/0.002 = 350.

Using S(0) = n - I(0)

[Plot: sir_custom results with S = n - I]

Around week 7, it reaches a point where the population always has 350 susceptible and 650 infected people.

For an epidemic like the flu, this does not seem practical.