Chapter 16 (Part 2)

16.7 Exercises

1. In 1999, in England, Sally Clark was found guilty of the murder of two of her sons. Both infants were found dead in the morning, one in 1996 and another in 1998. In both cases, she claimed the cause of death was sudden infant death syndrome (SIDS). No evidence of physical harm was found on the two infants, so the main piece of evidence against her was the testimony of Professor Sir Roy Meadow, who testified that the chance of two infants dying of SIDS was 1 in 73 million. He arrived at this figure by finding that the rate of SIDS was 1 in 8,500 and then calculating that the chance of two SIDS cases was 1 in \(8{,}500 \times 8{,}500 \approx 73 \text{ million}\). Which of the following do you agree with?

Sir Meadow assumed that the probability of the second son being affected by SIDS was independent of the first son being affected, thereby ignoring possible genetic causes. If genetics plays a role, then a second case is more likely once a first case has occurred: \(\Pr(\text{2nd case of SIDS} \mid \text{1st case of SIDS}) > \Pr(\text{1st case of SIDS})\).

2. Let’s assume that there is in fact a genetic component to SIDS and that \(\Pr(\text{2nd case of SIDS} \mid \text{1st case of SIDS}) = 1/100\), which is much higher than 1 in 8,500. What is the probability of both of her sons dying of SIDS?

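# Probability that the first son dies of SIDS
PR1 <- 1/8500
# Assumed probability that the second son dies of SIDS given that the first did
PR2 <- 1/100
# Multiplication rule for dependent events
PR1 * PR2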

## [1] 1.176471e-06
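To see how much the independence assumption matters, the short sketch below compares the two joint probabilities. It uses only the rates quoted in the exercises; the variable names are chosen here for illustration.

```r
# Rates taken from the exercise text
p_first <- 1 / 8500              # Pr(first case of SIDS)
p_second_given_first <- 1 / 100  # assumed Pr(second case | first case)

# Joint probability if the two deaths were independent (the expert's approach)
p_independent <- p_first^2
# Joint probability under the assumed genetic dependence
p_dependent <- p_first * p_second_given_first

# Express both as "1 in N" and compare them
1 / p_independent
1 / p_dependent
p_dependent / p_independent
```

Under these assumptions the dependent calculation gives about 1 in 850,000, which is 85 times more likely than the independent figure of roughly 1 in 72 million.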

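3. Many press reports stated that the expert claimed the probability of Sally Clark being innocent was 1 in 73 million. Perhaps the jury and judge also interpreted the testimony this way. This probability can be written as the probability that a mother is a son-murdering psychopath given that two of her children are found dead with no evidence of physical harm. According to Bayes’ rule, what is this?

\[
\Pr(\text{mother is a murderer} \mid \text{two children found dead with no evidence of harm}) = \frac{\Pr(\text{two children found dead with no evidence of harm} \mid \text{mother is a murderer}) \, \Pr(\text{mother is a murderer})}{\Pr(\text{two children found dead with no evidence of harm})}
\]

In short, this is \(\frac{\Pr(b \mid a)\Pr(a)}{\Pr(b)}\), where \(a\) is the event that the mother is a murderer and \(b\) is the event that two children are found dead with no evidence of harm.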

4. Assume that the chance of a son-murdering psychopath finding a way to kill her children, without leaving evidence of physical harm, is \(\Pr(A \mid B) = 0.50\), with A = two of her children are found dead with no evidence of physical harm and B = a mother is a son-murdering psychopath. Assume that the rate of son-murdering psychopath mothers is 1 in 1,000,000. According to Bayes’ theorem, what is \(\Pr(B \mid A)\)?

# Probability that the first son dies of SIDS
Pr1 <- 1/8500
# Assumed probability that the second son dies of SIDS given that the first did
Pr2 <- 1/100
# Pr(A | not B): both sons die of SIDS, so no evidence of harm is found
PrA_notB <- Pr1 * Pr2
# Pr(A | B): a murdering mother leaves no evidence of physical harm
PrA_B <- 0.5
# Pr(B): rate of son-murdering psychopath mothers
PrB <- 1/10^6
# Pr(A) by the law of total probability
PrA <- PrA_B * PrB + PrA_notB * (1 - PrB)
# Bayes' rule: Pr(B | A)
PrB_A <- PrA_B * PrB / PrA
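
One way to write out this calculation by hand is sketched below. It assumes that two deaths without evidence of harm can arise only from murder (\(B\)) or from two SIDS cases (\(B^c\)), and it takes \(\Pr(A \mid B^c) = 1/8500 \times 1/100 = 1/850{,}000\) from problem 2. The denominator then follows from the law of total probability:

\[
\Pr(B \mid A) = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A \mid B)\Pr(B) + \Pr(A \mid B^c)\Pr(B^c)} = \frac{0.5 \times 10^{-6}}{0.5 \times 10^{-6} + \frac{1}{850{,}000}\left(1 - 10^{-6}\right)} \approx 0.30
\]

Dropping the murder term from the denominator, that is, using \(\Pr(A) \approx 1/850{,}000\), gives 0.425 instead. Either way, the implied probability of innocence is nowhere near 1 in 73 million.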

5. After Sally Clark was found guilty, the Royal Statistical Society issued a statement saying that there was “no statistical basis” for the expert’s claim. They expressed concern at the “misuse of statistics in the courts”. Eventually, Sally Clark was acquitted in June 2003. What did the expert miss? He made two mistakes. First, he misused the multiplication rule by treating the two deaths as independent. Second, he did not take into account how rare it is for a mother to murder her children. After applying Bayes’ rule, the probability that the mother is a murderer comes out below 0.5, so the probability of innocence is nowhere near 1 in 73 million.

6. Florida is one of the most closely watched states in the U.S. election because it has many electoral votes, the election there is generally close, and it tends to be a swing state that can vote either way. Create the following table with the polls taken during the last two weeks. Take the average spread of these polls. The CLT tells us this average is approximately normal. Calculate an average and provide an estimate of the standard error. Save your results in an object called results.

library(tidyverse)

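library(dslabs)
data(polls_us_election_2016)
# Florida polls ending on or after 2016-11-04; spread is Clinton minus Trump
polls <- polls_us_election_2016 |>
  filter(state == "Florida" & enddate >= "2016-11-04") |>
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
# Average spread and its estimated standard error
results <- summarize(polls, avg = mean(spread), se = sd(spread)/sqrt(n()))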

7. Now assume a Bayesian model that sets the prior distribution for Florida’s election night spread \(d\) to be normal with expected value \(\mu\) and standard deviation \(\tau\). What are the interpretations of \(\mu\) and \(\tau\)? \(\mu\) and \(\tau\) summarize what we would predict for Florida before seeing any polls: \(\mu\) is our best guess for the spread and \(\tau\) describes how uncertain that guess is.
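
Written out, the model described here has two levels. In the sketch below, \(y\) for the observed average spread from problem 6 and \(\sigma\) for its standard error are notation assumed for illustration:

\[
d \sim N(\mu, \tau^2), \qquad y \mid d \sim N(d, \sigma^2)
\]

The first statement is the prior belief about the true spread before any polls are seen. The second describes how the poll average varies around that true spread.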

8. The CLT tells us that our estimate of the spread \(\hat{d}\) has an approximately normal distribution with expected value \(d\) and standard deviation \(\sigma\), calculated in problem 6. Use the formulas we showed for the posterior distribution to calculate the expected value of the posterior distribution if we set \(\mu = 0\) and \(\tau = 0.01\).

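mu <- 0
tau <- 0.01
# Standard error and average spread from problem 6
std <- results[1, 2]
y <- results[1, 1]
# Weight given to the prior mean
B <- std^2/(std^2 + tau^2)
# Posterior expected value
expd <- B*mu + (1 - B)*y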

9. Now compute the standard deviation of the posterior distribution.

se<-(1/(1/std^2+1/tau^2))^.5
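
The code above corresponds to the standard posterior formulas for a normal prior combined with normally distributed data. They are restated here with \(y\) for the observed average spread and \(\sigma\) for its standard error (`std` in the code):

\[
E(d \mid y) = B\mu + (1 - B)\,y, \qquad B = \frac{\sigma^2}{\sigma^2 + \tau^2}, \qquad SE(d \mid y) = \sqrt{\frac{1}{1/\sigma^2 + 1/\tau^2}}
\]

With \(\mu = 0\), the posterior expected value is \((1 - B)\,y\), the poll average shrunk toward zero. The smaller \(\tau\) is relative to \(\sigma\), the closer \(B\) is to 1 and the stronger the shrinkage.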

10. Using the fact that the posterior distribution is normal, create an interval centered at the posterior expected value that has a 95% probability of containing \(d\). Note that we call these credible intervals.

lower<-expd-qnorm(.975)*se
upper<-expd+qnorm(.975)*se
ci<-c(lower, upper)
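
The three steps above (posterior mean, posterior standard deviation, credible interval) can be bundled into one function. The following is a minimal sketch: `posterior_normal` is a hypothetical helper, not part of any package, and the inputs are made-up numbers rather than the Florida results.

```r
# Hypothetical helper: posterior summary for a normal prior and normal data
posterior_normal <- function(y, sigma, mu = 0, tau = 0.01, level = 0.95) {
  B <- sigma^2 / (sigma^2 + tau^2)          # weight on the prior mean
  post_mean <- B * mu + (1 - B) * y
  post_se <- sqrt(1 / (1 / sigma^2 + 1 / tau^2))
  z <- qnorm(1 - (1 - level) / 2)           # about 1.96 for a 95% interval
  list(mean = post_mean,
       se = post_se,
       ci = c(post_mean - z * post_se, post_mean + z * post_se))
}

# Made-up inputs for illustration, not the Florida results
post <- posterior_normal(y = 0.004, sigma = 0.005)
post$ci
```

Keeping `level` as an argument makes it easy to compare, say, 80% and 95% credible intervals for the same prior.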

11. According to this analysis, what was the probability that Trump wins Florida?

pnorm(0, expd, se)

## [1] 0.3203769
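
The spread is defined as Clinton minus Trump, so Trump wins Florida when \(d < 0\). With \(y\) again denoting the observed average spread (notation assumed here), the `pnorm` call evaluates the posterior probability

\[
\Pr(d < 0 \mid y) = \Phi\!\left(\frac{0 - E(d \mid y)}{SE(d \mid y)}\right)
\]

where \(\Phi\) is the standard normal cumulative distribution function.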

12. Now use the sapply function to vary the prior standard deviation \(\tau\) over seq(0.005, 0.05, len = 100) and observe how the probability changes by making a plot.

library(ggplot2)
library(dplyr)
library(dslabs)
data(polls_us_election_2016) 
polls<-polls_us_election_2016|>filter(state == "Florida" & enddate >= "2016-11-04")|>  mutate(spread=rawpoll_clinton/100 - rawpoll_trump/100)
results<-summarize(polls, avg=mean(spread), sd(spread)/sqrt(n()))
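# Candidate values for the prior standard deviation
mtaus <- seq(0.005, 0.05, len = 100)
mu <- 0
sig <- results[1, 2]
y <- results[1, 1]

# Posterior probability that the spread is negative (Trump wins) for a given tau
pcalc <- function(tau) {
  B <- sig^2/(sig^2 + tau^2)
  se <- sqrt(1/(1/sig^2 + 1/tau^2))
  expd <- B*mu + (1 - B)*y
  pnorm(0, expd, se)
}
ps <- sapply(mtaus, pcalc)
chancedf <- data.frame(mtaus, ps)
chancedf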

##           mtaus        ps
## 1   0.005000000 0.3715680
## 2   0.005454545 0.3643095
## 3   0.005909091 0.3577231
## 4   0.006363636 0.3517554
## 5   0.006818182 0.3463529
## 6   0.007272727 0.3414632
## 7   0.007727273 0.3370371
## 8   0.008181818 0.3330285
## 9   0.008636364 0.3293949
## 10  0.009090909 0.3260978
## 11  0.009545455 0.3231022
## 12  0.010000000 0.3203769
## 13  0.010454545 0.3178937
## 14  0.010909091 0.3156276
## 15  0.011363636 0.3135562
## 16  0.011818182 0.3116598
## 17  0.012272727 0.3099206
## 18  0.012727273 0.3083230
## 19  0.013181818 0.3068530
## 20  0.013636364 0.3054982
## 21  0.014090909 0.3042476
## 22  0.014545455 0.3030914
## 23  0.015000000 0.3020207
## 24  0.015454545 0.3010277
## 25  0.015909091 0.3001054
## 26  0.016363636 0.2992476
## 27  0.016818182 0.2984486
## 28  0.017272727 0.2977034
## 29  0.017727273 0.2970073
## 30  0.018181818 0.2963564
## 31  0.018636364 0.2957469
## 32  0.019090909 0.2951756
## 33  0.019545455 0.2946393
## 34  0.020000000 0.2941353
## 35  0.020454545 0.2936613
## 36  0.020909091 0.2932148
## 37  0.021363636 0.2927939
## 38  0.021818182 0.2923967
## 39  0.022272727 0.2920215
## 40  0.022727273 0.2916667
## 41  0.023181818 0.2913309
## 42  0.023636364 0.2910129
## 43  0.024090909 0.2907113
## 44  0.024545455 0.2904251
## 45  0.025000000 0.2901533
## 46  0.025454545 0.2898950
## 47  0.025909091 0.2896493
## 48  0.026363636 0.2894154
## 49  0.026818182 0.2891925
## 50  0.027272727 0.2889800
## 51  0.027727273 0.2887774
## 52  0.028181818 0.2885839
## 53  0.028636364 0.2883990
## 54  0.029090909 0.2882223
## 55  0.029545455 0.2880533
## 56  0.030000000 0.2878915
## 57  0.030454545 0.2877366
## 58  0.030909091 0.2875881
## 59  0.031363636 0.2874458
## 60  0.031818182 0.2873092
## 61  0.032272727 0.2871781
## 62  0.032727273 0.2870523
## 63  0.033181818 0.2869313
## 64  0.033636364 0.2868150
## 65  0.034090909 0.2867032
## 66  0.034545455 0.2865956
## 67  0.035000000 0.2864920
## 68  0.035454545 0.2863922
## 69  0.035909091 0.2862961
## 70  0.036363636 0.2862034
## 71  0.036818182 0.2861140
## 72  0.037272727 0.2860278
## 73  0.037727273 0.2859445
## 74  0.038181818 0.2858641
## 75  0.038636364 0.2857865
## 76  0.039090909 0.2857115
## 77  0.039545455 0.2856389
## 78  0.040000000 0.2855688
## 79  0.040454545 0.2855009
## 80  0.040909091 0.2854353
## 81  0.041363636 0.2853717
## 82  0.041818182 0.2853101
## 83  0.042272727 0.2852505
## 84  0.042727273 0.2851927
## 85  0.043181818 0.2851367
## 86  0.043636364 0.2850823
## 87  0.044090909 0.2850296
## 88  0.044545455 0.2849785
## 89  0.045000000 0.2849289
## 90  0.045454545 0.2848807
## 91  0.045909091 0.2848340
## 92  0.046363636 0.2847885
## 93  0.046818182 0.2847444
## 94  0.047272727 0.2847015
## 95  0.047727273 0.2846598
## 96  0.048181818 0.2846192
## 97  0.048636364 0.2845798
## 98  0.049090909 0.2845414
## 99  0.049545455 0.2845040
## 100 0.050000000 0.2844677

plot(chancedf$mtaus, chancedf$ps)
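
The table shows the probability falling from about 0.37 to about 0.28 as the prior standard deviation grows. The self-contained sketch below reproduces that pattern with made-up inputs (the values of `y` and `sigma` are assumptions, not the Florida results) and adds a reference line for the limiting case in which the prior is so wide that only the data matter.

```r
# Made-up inputs for illustration, not the Florida results
y <- 0.004      # observed average spread
sigma <- 0.007  # standard error of the average
mu <- 0         # prior mean

# Posterior Pr(d < 0) for one value of the prior standard deviation tau
prob_negative <- function(tau) {
  B <- sigma^2 / (sigma^2 + tau^2)
  pnorm(0, B * mu + (1 - B) * y, sqrt(1 / (1 / sigma^2 + 1 / tau^2)))
}

taus <- seq(0.005, 0.05, length.out = 100)
ps <- sapply(taus, prob_negative)

plot(taus, ps, type = "l",
     xlab = "Prior standard deviation (tau)",
     ylab = "Pr(d < 0 | data)")
# Limit as tau grows: the prior is effectively ignored
abline(h = pnorm(0, y, sigma), lty = 2)
```

With a prior mean of 0 and a positive poll average, the curve decreases toward the dashed line: a wider prior pulls the posterior less strongly toward a tied race.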

Dimple K. Patel

2024-04-27