Corinne Riddell
August 22, 2018
Problem: A clear statement of what we are trying to achieve.
Plan: The procedures we use to carry out the study.
Data: The data collected according to the Plan.
Analysis: The data are summarized and analysed to answer the questions posed by the Problem.
Conclusion: Conclusions are drawn about what has been learned in answering the Problem.
Problem: Suppose we wish to study the smoking behaviour of California residents aged 14-20 years. In particular, we are interested in the prevalence of current smoking by gender.
Plan: We first need to choose a time period, because smoking behaviour has changed immensely over time. It is infeasible to gather these data for all residents of California who are 14-20 years old. Instead we draw a random sample of \(n\) persons and record each person's age, gender, and smoking status.
Note that we need to decide how large \(n\) should be, and how to obtain the random sample. The latter question is, in particular, very important if we want to ensure that our sample is representative of the population of interest. Time and money also constrain how the sample will be collected.
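To make "obtain the random sample" concrete, here is a minimal sketch of a simple random sample drawn without replacement. It assumes we already have a sampling frame (a complete list of eligible residents), which is often the hard part in practice. The function name `draw_simple_random_sample`, the frame of 10,000 ID numbers, and the seed are all hypothetical and chosen only for illustration.

```python
import random


def draw_simple_random_sample(frame, n, seed=None):
    """Return n distinct units chosen uniformly at random from the frame."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: ID numbers standing in for eligible residents.
frame = list(range(1, 10001))

sample_ids = draw_simple_random_sample(frame, n=200, seed=1)
print(len(sample_ids))  # 200
```

Every unit in the frame has the same chance of selection, which is what makes the sample representative on average. If the frame itself misses part of the population, no sampling method can repair that.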
Data: Suppose that a random sample of 200 persons aged 14-20 was selected, yielding these data:
Gender | Number of smokers | Number of non-smokers | Total |
---|---|---|---|
Teen girls and women | 32 | 66 | 98 |
Teen boys and men | 27 | 75 | 102 |
Total | 59 | 141 | 200 |
Analysis: The proportion of women in the sample who smoke is 32/98 = 33%. The proportion of men in the sample who smoke is 27/102 = 26%.
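The arithmetic above can be reproduced directly from the table. This is a small sketch in Python; the dictionary layout and the function name `sample_proportion` are illustrative choices, not part of the original notes.

```python
# Counts taken from the table of 200 sampled persons.
counts = {
    "Teen girls and women": {"smokers": 32, "non_smokers": 66},
    "Teen boys and men": {"smokers": 27, "non_smokers": 75},
}


def sample_proportion(smokers, non_smokers):
    """Proportion of smokers among everyone sampled in the group."""
    return smokers / (smokers + non_smokers)


proportions = {group: sample_proportion(**c) for group, c in counts.items()}

for group, p in proportions.items():
    print(f"{group}: {p:.1%}")
# Teen girls and women: 32.7%
# Teen boys and men: 26.5%
```

Rounded to whole percentages, these are the 33% and 26% reported above.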
We would also like some idea of how close these estimates are likely to be to the actual proportions in the population. If we selected a second random sample of the same size, we would likely estimate different proportions for men and women. We will learn how to estimate the precision of these estimates.
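To see why a second sample would likely give a different answer, we can simulate repeated sampling. The sketch below assumes a hypothetical true smoking proportion of 30% (a number chosen only for illustration, not taken from any data) and draws several samples of 98 persons each. The function name `simulate_sample_proportions` and the seed are likewise assumptions.

```python
import random


def simulate_sample_proportions(true_p, n, repeats, seed=None):
    """Simulate `repeats` samples of size n and return each sample's proportion."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        # Each sampled person is a smoker with probability true_p.
        smokers = sum(rng.random() < true_p for _ in range(n))
        estimates.append(smokers / n)
    return estimates


# Hypothetical population proportion of 0.30; n = 98 matches the women's group size.
estimates = simulate_sample_proportions(true_p=0.30, n=98, repeats=5, seed=2018)
print([f"{p:.0%}" for p in estimates])
```

The printed proportions scatter around 30% rather than all equalling it. The spread of that scatter is what a measure of precision is meant to describe.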
Conclusion: We estimate that, in California in 2018, 33% of girls and women aged 14-20 and 26% of boys and men in the same age group were current smokers (plus a measure of uncertainty).
This description of the PPDAC method is based on the Spring 2006 course packet for STAT 231 at the University of Waterloo (Ontario, Canada).