Statistics & Data Analysis - 101

Mohar Guha
HealthTap

Making Data Meaningful

  • Present and describe information from data

  • Draw conclusions about large population based on samples

  • Make decisions / Conclusions

  • Detect changes in a process

  • Obtain forecasts

Making Data Meaningful

  • Present and describe information from data

    • Exploratory Analysis, Statistical Summary
  • Estimating large population characteristics based on samples

    • Point and Interval Estimates of Population Parameters
  • Make Decisions based on Samples

    • Statistical Inference
  • Detect changes in a process

    • Statistical Inference, Time Series Analysis
  • Obtain forecasts

    • Regression, Time Series analysis

Statistics

  • Descriptive Statistics: Feel for the data

    • Measures of central tendency (mean, median, mode)
    • Measures of dispersion (variance, standard deviation)
    • How many people bought a product? Customer profiles?

  • Inferential Statistics: Help decision makers

    • Where should we sell the product to make the most profit?
  • Probability as a bridge

    • Probability measures the accuracy of our inference.

Probability vs Statistical Reasoning

Example: Find the probability that the first car I see in the morning is a Tesla.

  • Scenario I: Suppose I know the exact proportion of each car make in California. Then I can compute the probability exactly.

    • Probabilistic Reasoning: Know the population and predict the sample
  • Scenario II: I do not have this information.

    • Statistical Reasoning: Observe a sample and infer the population
    • Collect a random sample of \(n\) cars on the street
    • Measure "how often" you see a Tesla, where \(f\) is the number of Teslas in the sample: \[\text{Relative Frequency}=\frac{f}{n}\]
    • As \(n\) increases, \[\begin{aligned} \text{Sample}&\rightarrow\text{Population}\\ \text{Relative Frequency}&\rightarrow\text{Probability} \end{aligned}\]
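
The convergence of relative frequency to probability can be sketched with a small simulation. This is only an illustration: the "true" Tesla proportion `p_true = 0.08` is a made-up value, not a figure from the talk.

```r
# Hypothetical setup: assume the true proportion of Teslas is 0.08 (made up).
set.seed(1)
p_true <- 0.08
n_values <- c(10, 100, 1000, 10000, 100000)

# For each sample size n, observe n cars and compute the relative frequency f / n
rel_freq <- sapply(n_values, function(n) {
  cars <- rbinom(n, size = 1, prob = p_true)  # 1 = Tesla, 0 = any other make
  sum(cars) / n
})

data.frame(n = n_values, relative_frequency = rel_freq)
```

The relative frequency is noisy for small \(n\) and settles near `p_true` as \(n\) grows.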

Random Variables and Probability Distributions

  • \(X\): number of heads in 10 tosses of an unbiased coin - Binomial Random Variable

  • \(X\): number of phone calls arriving at your help desk in a 12-hour period - Poisson Random Variable
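
A minimal sketch of these two random variables in R. The coin example uses the numbers from the slide; the average of 6 calls per 12-hour period is a hypothetical value chosen only for illustration.

```r
# Binomial: P(X = k) for the number of heads in 10 tosses of an unbiased coin
heads <- 0:10
binom_pmf <- dbinom(heads, size = 10, prob = 0.5)

# Poisson: P(X = k) for calls in a 12-hour period,
# assuming (hypothetically) 6 calls on average
calls <- 0:20
pois_pmf <- dpois(calls, lambda = 6)

p_five_heads <- binom_pmf[heads == 5]           # probability of exactly 5 heads
p_at_most_3_calls <- sum(pois_pmf[calls <= 3])  # probability of at most 3 calls
c(p_five_heads, p_at_most_3_calls)
```

Calling `barplot(binom_pmf, names.arg = heads)` gives a quick picture of the distribution.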

Why is Normal Distribution So Important?

  • Many years ago I called the Laplace–Gaussian curve the normal curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another 'abnormal'. - Pearson

  • The position of a particle undergoing diffusion follows a normal distribution exactly

  • The logarithm of the size of living tissue is often assumed to follow a normal distribution

  • For large sample sizes, binomial and Poisson random variables are approximately normally distributed

  • The normal distribution owes much of its popularity to the Central Limit Theorem

Central Limit Theorem

  • The distribution of the sum (or mean) of a large number of random variables is approximately normal

  • The random variables should be independent and come from the same distribution, with finite variance

  • This result holds no matter what the underlying distribution is.

  • Application demonstrating the Central Limit Theorem (changes to be made to the app): http://guhapp.shinyapps.io/myapp/
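
Independently of the app, the theorem can be checked with a short simulation. This sketch assumes an Exponential(1) population (strongly skewed, mean 1, standard deviation 1); the sample size and number of repetitions are arbitrary choices.

```r
set.seed(42)
n <- 50        # size of each sample
reps <- 5000   # number of repeated samples

# Draw many samples from a skewed population and keep the mean of each one
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # should be close to the population mean, 1
sd(sample_means)     # should be close to 1 / sqrt(50)
```

A histogram, `hist(sample_means, breaks = 40)`, looks roughly bell-shaped even though the exponential population is far from normal.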

Sampling Distribution of Sample Mean

  • Take samples of size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\)

  • The sample mean \(\bar{X}\) of each sample is one draw from the sampling distribution of the mean

  • \(E[\bar{X}]=\mu\) and \(\text{SD}[\bar{X}]=\frac{\sigma}{\sqrt{n}}\)

  • For large enough \(n\), the distribution of \(\bar{X}\) is approximately normal
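
The formula \(\text{SD}[\bar{X}]=\sigma/\sqrt{n}\) can be compared against simulation. This sketch assumes a Uniform(0, 1) population, whose standard deviation is \(\sqrt{1/12}\); the sample sizes are arbitrary.

```r
set.seed(7)
sigma <- sqrt(1 / 12)    # SD of a Uniform(0, 1) population
sizes <- c(4, 16, 64)    # sample sizes to compare

# For each n, simulate 4000 sample means and measure their spread
empirical_sd <- sapply(sizes, function(n) {
  sd(replicate(4000, mean(runif(n))))
})

data.frame(n = sizes,
           empirical_sd = empirical_sd,
           theoretical_sd = sigma / sqrt(sizes))
```

Each fourfold increase in \(n\) roughly halves the spread of the sample mean.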

How large is large enough?

  • If the population distribution is normal: any sample size (\(n>1\)) works

  • If the population distribution is symmetric, unimodal, without outliers: a sample size of 15 or less can be enough

  • If the population distribution is moderately skewed, unimodal, without outliers: a sample size between 16 and 40

  • Otherwise: a sample size greater than 40, still without outliers

Interval Estimation

  • Point estimate: is the sample mean (\(\bar{X}\)) alone the best estimate of \(\mu\)?

  • Interval estimate: or should we use the sample mean (\(\bar{X}\)) and provide an interval, centered around \(\bar{X}\), of likely values of the population mean \(\mu\)?

Example

Ghirardelli Chocolate Company claims that a 20oz gift bag contains 50 squares. To test the claim, we conduct a study by sampling 60 bags out of 200 sent by the company.

  • Estimate the average number of squares \(\mu\) in the "population" of all gift bags.

  • The observed sample mean is \(\bar{x}=47.72\) (not good: below the claimed 50)

  • Sample variance \(s^2=9.58\)

  • Confidence Interval: \((47.72-\text{error},\,47.72+\text{error})\)

Intuition on how to compute error

  • How good is the sample mean estimate \(\bar{x}\) in estimating \(\mu\)?

  • If the sample size \(n\) is large, the estimate is good, so the error should shrink as \(n\) grows - \[\text{error}\propto\frac{1}{n}\]

  • If the variance in the sample is high, the estimate is not as good, so the error should grow with \(s^2\) - \[\text{error}\propto\frac{s^2}{n}\]

  • Close enough. The actual formula is \[\text{error}=z^{*}\frac{s}{\sqrt{n}},\] where \(z^*\) is a critical value from the standard normal distribution.

95% Confidence Interval for \(\mu\)

  • By the Central Limit Theorem, \(\bar{X}\) is approximately \(N\left(\mu,\frac{\sigma}{\sqrt{n}}\right)\), and for large \(n\) we estimate \(\sigma\) by \(s\).

  • \(P\left(\left|\frac{\bar{X}-\mu}{s/\sqrt{n}}\right|>z^{*}\right)=0.05 \implies z^{*}=z^{*}_{0.025}=1.96\), so \(\mu\) lies within \(1.96\,\frac{s}{\sqrt{n}}\) of \(\bar{X}\) with probability about 0.95

  • The 95% confidence interval for the mean is \[\bar{x}\pm z^{*}\frac{s}{\sqrt{n}}= \left(47.72 - 1.96\frac{\sqrt{9.58}}{\sqrt{60}},47.72 + 1.96\frac{\sqrt{9.58}}{\sqrt{60}}\right)=(46.94,48.50)\]
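
The interval formula \(\bar{x}\pm z^{*}\frac{s}{\sqrt{n}}\) can be wrapped in a small helper. This is a sketch: `z_interval` is a hypothetical function name, and the inputs are the sample mean (47.72), sample variance (9.58) and sample size (60) reported in the chocolate example.

```r
# z-based confidence interval for a mean, computed from summary statistics
z_interval <- function(xbar, s2, n, level = 0.95) {
  z_star <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  margin <- z_star * sqrt(s2 / n)        # z* times s / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

ci <- z_interval(xbar = 47.72, s2 = 9.58, n = 60)
ci
```

Passing `level = 0.90` or `level = 0.99` shows how the width changes with the confidence level.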

Interpretation of 95% confidence interval

  • If we repeat the following experiment 100 times

    • Collect a sample of size \(n\) from the population.
    • Calculate the confidence interval
  • Then we expect about 95 of the 100 confidence intervals to contain the population parameter.

  • The calculation of the confidence interval uses the normal distribution (justified by the CLT)

  • Assumptions: the sample size is large and the population variance is known
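
This repeated-sampling interpretation can be checked by simulation. The sketch below assumes a hypothetical normal population with \(\mu=50\) and \(\sigma=3\); these values are for illustration only and are not taken from the chocolate data.

```r
set.seed(2024)
mu <- 50; sigma <- 3; n <- 60   # hypothetical population and sample size
reps <- 10000                   # number of repeated experiments

# For each experiment: draw a sample, build the 95% interval, check if it contains mu
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  margin <- qnorm(0.975) * sd(x) / sqrt(n)
  (mean(x) - margin <= mu) && (mu <= mean(x) + margin)
})

coverage <- mean(covered)   # fraction of intervals that contain mu
coverage
```

The fraction comes out close to 0.95. It tends to fall slightly below 0.95 because \(s\) is used in place of the unknown \(\sigma\).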

How large sample size do I need?

  • How accurately do you need the answer?

    • "We need a margin of error less than 4%"
  • What level of confidence do you intend to use?

    • "95% confidence interval"
  • Any historical knowledge about the data?

    • "From a previous study, the variance of the number of chocolate squares is around 3"
  • Answer: taking the margin of error as 0.04 and \(s=\sqrt{3}\), \(\text{error}=z^*\frac{s}{\sqrt{n}}\leq 0.04\implies n\geq\left(\frac{1.96\sqrt{3}}{0.04}\right)^2\approx 7203\)
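
The same calculation in R, as a minimal sketch. Solving \(z^*\frac{s}{\sqrt{n}}\leq m\) for \(n\) gives \(n\geq (z^* s/m)^2\); the variable names are illustrative.

```r
z_star <- qnorm(0.975)   # critical value for 95% confidence
s2 <- 3                  # variance from the previous study
margin <- 0.04           # required margin of error

# Smallest whole n with z* * sqrt(s2 / n) <= margin
n_required <- ceiling(z_star^2 * s2 / margin^2)
n_required   # 7203
```

Halving the margin of error multiplies the required sample size by four.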

Testing of Hypothesis

  • Find out if the data confirm a specific hypothesis.

  • Null Hypothesis : \(H_{0}\) - status quo - initially assumed true

  • Alternative Hypothesis : \(H_{A}\) - the researcher's proposal - what you hope to show.

  • Main idea : Reject the null hypothesis in favor of the alternative only with significant evidence.

    • Cannot say : We accept the alternative hypothesis
    • Can Say : Significant evidence to reject the null hypothesis

Errors in hypothesis testing

  • Consider mistakes in a jury trial.

  • Null Hypothesis \(H_0\): The defendant is not guilty.

  • An innocent man is pronounced guilty: Reject \(H_0\) when it is true : TYPE I ERROR

  • Probability of Type I Error = \(\alpha\) = SIGNIFICANCE LEVEL

  • A guilty man is pronounced innocent : Fail to reject \(H_0\) when it is false : TYPE II ERROR

  • Probability of Type II Error = \(\beta\); 1 - \(\beta\) (probability of rejecting \(H_0\) when it is false) = POWER

Steps in Hypothesis Testing

  • Define parameter

  • Give null and alternative hypothesis

  • Select significance level \(\alpha\) (typical values 0.05, 0.01, 0.10)

  • Give test statistic formula - \(\frac{\text{Observed}-\text{Expected}}{\text{Standard Error}}\)

  • Verify the conditions of the test

  • Compute the p-value - the probability of getting a value at least as extreme as the test statistic, assuming \(H_0\) is true.

  • State conclusion - If p-value \(\leq \alpha\), there is significant evidence to reject \(H_0\); otherwise, fail to reject \(H_0\).

Example

In a study of math students at a high school, the backgrounds of students successful in entry-level courses were checked. For 30 students from city backgrounds, the average test score was 78 with a standard deviation of 10; and for 25 students from rural backgrounds, the average test score was 85 with a standard deviation of 15. Is there evidence that the average test score differs between the two groups of students?

  • Let \(\mu_1\) be the mean test score of all city students who succeed in the course, and \(\mu_2\) be the mean test score of all rural students who succeed.

  • \(H_{0}:\mu_{1}-\mu_{2}=0\), \(H_{A}:\mu_{1}- \mu_{2}\neq 0\)

  • For this analysis set \(\alpha=0.10\)

  • \(\text{SE}=\sqrt{\frac{s_{1}^2}{n_1}+\frac{s_{2}^2}{n_2}}=3.51\) and \(t=\frac{\bar{x}_{1}-\bar{x}_{2}-0}{\text{SE}}=-1.99\) with \(\text{df}=40.47\).

  • P-value \(= P(t<-1.99)+P(t>1.99)=0.054\)

  • Interpret results: Since the P-value is less than the significance level \(\alpha=0.10\), we have statistically significant evidence to reject the null hypothesis that the two group means are equal.
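
The numbers in this example can be reproduced from the summary statistics alone. This sketch uses the unequal-variance (Welch) form of the two-sample \(t\) test, which is what the fractional degrees of freedom suggest; the variable names are illustrative.

```r
n1 <- 30; xbar1 <- 78; s1 <- 10   # city students
n2 <- 25; xbar2 <- 85; s2 <- 15   # rural students

v1 <- s1^2 / n1
v2 <- s2^2 / n2

se <- sqrt(v1 + v2)                  # standard error of the difference in means
t_stat <- (xbar1 - xbar2 - 0) / se   # observed difference minus hypothesized difference

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

p_value <- 2 * pt(-abs(t_stat), df)  # two-sided P-value

round(c(se = se, t = t_stat, df = df, p = p_value), 3)
```

With the raw scores available, `t.test(city, rural)` would perform the same Welch test directly (here `city` and `rural` are hypothetical vectors of scores).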

To P-val or not to P-val

  • A very small P-value indicates that a result as extreme as the one observed would be unlikely to occur purely by chance if the null hypothesis were true

    • A P-value below a predefined limit (the significance level) provides statistically significant evidence against the null hypothesis.
  • If the P-value is moderately large, it is incorrect to interpret it as "there is evidence that the intervention has no effect". The P-value also does not say how plausible the alternative hypothesis is given the data.

Take a Break

Confidence Interval for Hypothesis testing

  • Compute the \(100(1-\alpha)\%\) confidence interval for the difference in means

  • Check if the hypothesized value is in the interval

  • The \(100(1-\alpha)\%\) confidence interval gives the range of hypothesized values that would not be rejected by a two-sided level-\(\alpha\) test.

  • If P-value of the test is less than \(\alpha\) (it is significant), the confidence interval will NOT contain the hypothesized mean.

  • If P-value of the test is greater than \(\alpha\) (it is not significant), the confidence interval will contain the hypothesized mean.
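
Applied to the test-score example (30 city students, mean 78, SD 10; 25 rural students, mean 85, SD 15; \(\alpha=0.10\)), the idea can be sketched as follows. The Welch degrees of freedom are assumed, as in the earlier calculation.

```r
v1 <- 10^2 / 30   # s1^2 / n1 for city students
v2 <- 15^2 / 25   # s2^2 / n2 for rural students
se <- sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2 / 29 + v2^2 / 24)

alpha <- 0.10
margin <- qt(1 - alpha / 2, df) * se   # t critical value times standard error
ci <- c(lower = (78 - 85) - margin, upper = (78 - 85) + margin)
ci

# Is the hypothesized difference of 0 inside the 90% interval?
zero_inside <- unname(ci["lower"] <= 0 && 0 <= ci["upper"])
zero_inside
```

The interval lies entirely below 0, so the hypothesized difference of 0 is rejected at the 10% level, in agreement with the P-value of 0.054.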

Checking Assumptions of \(t\) test

  • Each of the two populations being compared should follow a normal distribution.

  • Tests to check normality

    • Shapiro–Wilk test
    • Kolmogorov–Smirnov test
    • Graphical assessment using a normal quantile (Q-Q) plot.
  • Data for classical t-tests should be sampled independently from the two populations being compared.
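
A minimal sketch of these normality checks in base R. The data are simulated and purely hypothetical: one roughly normal sample and one right-skewed sample.

```r
set.seed(10)
scores <- rnorm(30, mean = 78, sd = 10)   # hypothetical, roughly normal data
skewed <- rexp(30, rate = 1)              # hypothetical, right-skewed data

# Shapiro-Wilk: a small P-value would suggest the data are not normal
sw_scores <- shapiro.test(scores)$p.value
sw_skewed <- shapiro.test(skewed)$p.value

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD
# (the P-value is only approximate when the parameters are estimated from the data)
ks_scores <- ks.test(scores, "pnorm", mean(scores), sd(scores))$p.value

c(shapiro_scores = sw_scores, shapiro_skewed = sw_skewed, ks_scores = ks_scores)
```

For the graphical check, `qqnorm(scores); qqline(scores)` draws the normal quantile plot; points close to the line are consistent with normality.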

Methods to overcome non-normality

  • Investigate the reasons for non-normality: outliers, sampling error

  • Depending on the shape of the sample distribution, a log, square root, or reciprocal transformation can reduce the skewness of the data.

  • For non-normal data, conduct non-parametric tests (e.g. Mann–Whitney)
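
Both options can be sketched on simulated data. The two log-normal groups below are hypothetical and chosen only because they are right-skewed.

```r
set.seed(3)
group_a <- rlnorm(40, meanlog = 1.0, sdlog = 0.8)   # hypothetical right-skewed data
group_b <- rlnorm(40, meanlog = 1.4, sdlog = 0.8)

# Option 1: log-transform to reduce skewness, then run a t test on the log scale
t_log <- t.test(log(group_a), log(group_b))

# Option 2: Mann-Whitney test (R calls it the Wilcoxon rank-sum test)
mw <- wilcox.test(group_a, group_b)

c(t_test_on_logs = t_log$p.value, mann_whitney = mw$p.value)
```

The log transformation suits these data because they are log-normal by construction; for real data, the choice of transformation depends on the shape of the sample.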

Appendix

A/B Testing Scenario : When to end the test?

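  • Consider conversion rates \(c_A\) and \(c_B\) for two versions \(A\) and \(B\), e.g. a green versus a red signup button.

  • Simulate an experiment with population conversion rates \(c_{A}=0.5\) and \(c_B = 0.55\)

set.seed(456)
c_A <- 0.5    # population conversion rate of version A
c_B <- 0.55   # population conversion rate of version B
n <- 500      # number of visitors per version
x.A <- sum(rbinom(n, 1, c_A))   # conversions observed for A
x.B <- sum(rbinom(n, 1, c_B))   # conversions observed for B
# Two-sided test of equal proportions
pval1 <- prop.test(c(x.A, x.B), c(n, n))$p.value

The two-sided proportion test gives a p-value of 0.658, so we cannot reject the null hypothesis at the 5% significance level.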

A/B Testing Scenario : When to end the test?

  • We repeat the experiment with 5000 points and get a significant p-value of \(3.6271\times 10^{-10}\).

  • What is the optimal \(n\)?

Power analysis

  • How many samples do we need to detect the difference?

  • Set a level for power, say 80%: the probability of rejecting the null hypothesis when it is false.

  • Say \(p_1\) is the current conversion rate and \(p_2\) is the conversion rate you wish to detect.

test <- power.prop.test(p1=0.5, p2=0.55, sig.level=0.05, power=0.8)

The sample size needed to pick up a 10% relative increase in conversion rate (from 0.50 to 0.55) at a 5% significance level and 80% power is 1565 per group.

What more can we do?

  • Using frequentist test we can make a statement like:

    • "We rejected the null hypothesis that A=B with a P-value of 0.039"
  • We cannot make a statement like:

    • "There is a 60% chance that A has 10% lift over B"
  • We can take a Bayesian approach.
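
One common way to get such a statement is a Beta-Binomial model. This is a sketch under stated assumptions, not necessarily the approach the talk has in mind: the conversion counts are hypothetical, the prior is a uniform Beta(1, 1), and the posterior probabilities are approximated by Monte Carlo draws.

```r
set.seed(99)
# Hypothetical observed data: conversions out of visitors for each version
n_A <- 500; x_A <- 250
n_B <- 500; x_B <- 275

# With a uniform Beta(1, 1) prior, the posterior of each rate is Beta(x + 1, n - x + 1)
draws <- 100000
post_A <- rbeta(draws, x_A + 1, n_A - x_A + 1)
post_B <- rbeta(draws, x_B + 1, n_B - x_B + 1)

prob_B_better <- mean(post_B > post_A)         # P(B converts better than A | data)
prob_B_lift10 <- mean(post_B > 1.10 * post_A)  # P(B has at least a 10% lift over A | data)
c(prob_B_better, prob_B_lift10)
```

The two numbers are direct probability statements about the conversion rates, which a frequentist P-value does not provide.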

Bayesian and frequentist reasoning in plain English

  • I have misplaced my phone somewhere in the home. I can use the phone locator on the base of the instrument: when I press it, the phone starts beeping.

  • Problem: Which area of my home should I search?

  • Frequentist Reasoning:

    I can hear the phone beeping. I also have a mental model which helps me identify the area the sound is coming from. Therefore, upon hearing the beep, I infer the area of my home I must search to locate the phone.

  • Bayesian Reasoning:

    I can hear the phone beeping. Now, apart from a mental model which helps me identify the area the sound is coming from, I also know the locations where I have misplaced the phone in the past. So, I combine my inferences from the beeps with my prior information about those locations to identify the area I must search to locate the phone.

One more

Let us say a man rolls a six-sided die with outcomes 1, 2, 3, 4, 5, or 6. Furthermore, he says that if it lands on a 3, he'll give you a free textbook.

  • Frequentist : Each outcome has an equal 1 in 6 chance of occurring. Probability is viewed as a long-run relative frequency.

  • Bayesian : Hang on a second, I know that man, he's David Blaine, a famous trickster! I have a feeling he's up to something. I'm going to say that there's only a 1% chance of it landing on a 3 BUT I'll re-evaluate that belief and change it the more times he rolls the die. If I see the other numbers come up equally often, then I'll iteratively increase the chance from 1% to something slightly higher, otherwise I'll reduce it even further. Probability is viewed as degrees of belief in a proposition.

Quiz 1

P value is

(A) The critical value of a test

(B) The estimate of the population parameter

(C) Probability that the null hypothesis is true

(D) Percentages of experiments in which the sample differences would be larger or smaller than we observed.

Quiz 2

A biologist has taken a random sample of a specific type of fish from a large lake. A 95 percent confidence interval was calculated to be [5.6,8] pounds. Which of the following is true?

(A) 95 percent of all the fish in the lake weigh between 5.6 and 8 pounds.

(B) In repeated sampling, 95 percent of the sample proportions will fall within 5.6 and 8 pounds.

(C) In repeated sampling, 95% of the time the true population mean of fish weights will be equal to 6.8 pounds.

(D) In repeated sampling, 95% of the time the true population mean of fish weight will be captured in the constructed interval.

(E) We are 95 percent confident that all the fish weigh less than 8 pounds in this lake.

Quiz 3

A manufacturer claims that a particular automobile model will get 50 miles per gallon on the highway. The researchers at a consumer-oriented magazine believe that this claim is high and plan a test with a simple random sample of 30 cars. Assuming the standard deviation between individual cars is 2.3 miles per gallon, what should the researchers conclude if the sample mean is 49 miles per gallon and the P-value for the test is 0.0087?

(A) There is not sufficient evidence to reject the manufacturer’s claim; 49 miles per gallon is too close to the claimed 50 miles per gallon.

(B) The manufacturer’s claim should not be rejected because the P-value of .0087 is too small.

(C) The manufacturer’s claim should be rejected because the sample mean is less than the claimed mean.

(D) The P-value of .0087 is sufficient evidence to reject the manufacturer’s claim.

(E) The P-value of .0087 is sufficient evidence to prove that the manufacturer’s claim is false.

Quiz 4

A 90% confidence interval for a population mean \(\mu\) is determined to be (800,900). If the confidence level is increased to 95% while the sample statistics and sample size remain the same, the confidence interval for \(\mu\) becomes

(A) narrower

(B) 0.05

(C) wider

(D) 0.025

(E) doesn't change since the sample doesn't change.