Homework 1: Introduction to Statistics

Author: Linnea Martin


Question 4.) Identify (i) the cases, (ii) the variables and their types, and (iii) the main research question of the studies described below.

Question 4a.) While obesity is measured based on body fat percentage (more than 35% body fat for women and more than 25% for men), precisely measuring body fat percentage is difficult. Body mass index (BMI), calculated as the ratio weight/height², is often used as an alternative indicator for obesity. A common criticism of BMI is that it assumes the same relative body fat percentage regardless of age, sex, or ethnicity. In order to determine how useful BMI is for predicting body fat percentage across age, sex and ethnic groups, researchers studied 202 black and 504 white adults who resided in or near New York City, were ages 20-94 years old, had BMIs of 18-35 kg/m², and who volunteered to be a part of the study. Participants reported their age, sex, and ethnicity and were measured for weight and height. Body fat percentage was measured by submerging the participants in water.

  1. The cases are each of the 706 adults who volunteered.
  2. The variables are age (quantitative), sex (categorical and nominal), ethnicity (categorical and nominal), weight (quantitative), height (quantitative), BMI (quantitative, calculated from weight and height), and body fat percentage (quantitative, measured by underwater weighing).
  3. The main research question is how well BMI predicts body fat percentage, and whether that relationship differs across age, sex, and ethnic groups.
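A minimal Python sketch of the BMI formula from the question. The weight and height are invented for illustration, the units are assumed to be kilograms and metres (which is what gives kg/m²), and the helper in_study_range is hypothetical; treating the 18-35 kg/m² eligibility range as inclusive is also an assumption.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared, in kg/m^2."""
    return weight_kg / height_m ** 2


def in_study_range(bmi_value: float, low: float = 18.0, high: float = 35.0) -> bool:
    """Hypothetical check against the 18-35 kg/m^2 eligibility range (assumed inclusive)."""
    return low <= bmi_value <= high


# Invented participant, for illustration only.
value = bmi(weight_kg=70.0, height_m=1.75)
print(round(value, 1))        # 22.9
print(in_study_range(value))  # True
```

Because BMI uses only weight and height, two people with the same BMI can still differ in body fat percentage, which is what the study checks across groups.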

Question 4b.) In a study of the relationship between socio-economic class and unethical behavior, 129 University of California undergraduates at Berkeley were asked to identify themselves as having low or high social-class by comparing themselves to others with the most (least) money, most (least) education, and most (least) respected jobs. They were also presented with a jar of individually wrapped candies and informed that they were for children in a nearby laboratory, but that they could take some if they wanted. Participants completed unrelated tasks and then reported the number of candies they had taken. It was found that those in the upper-class rank condition took more candy than did those in the lower-rank condition.

  1. The cases are each of the 129 University of California undergraduates at Berkeley.
  2. The variables are self-identified social class (categorical and ordinal: low or high, chosen by comparing one's money, education, and job to those of others) and the number of candies taken (quantitative, discrete).
  3. The main research question is whether social class is related to unethical behavior; specifically, whether students who identified as upper class took more of the candy meant for children than students who identified as lower class.

Question 6.) A survey was conducted to study the smoking habits of UK residents. Below is a data matrix displaying a portion of the data collected in this survey. Note that “£” stands for British Pounds Sterling, “cig” stands for cigarettes, and “N/A” refers to a missing component of the data.

Question 6a.) What does each row of the data matrix represent?

Each row of the data matrix represents an individual participant in the smoking habits survey.

Question 6b.) How many participants were included in the survey?

The survey included 1,691 participants, because the data matrix has 1,691 rows and each row is one participant.

Question 6c.) Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.

The variables in the study are:

  1. Gender (categorical and nominal)
  2. Age (quantitative; discrete, since it is recorded in whole years)
  3. Marital status (categorical and nominal)
  4. Gross income (categorical and ordinal, since it is recorded in income brackets)
  5. Smoking or nonsmoking (categorical and nominal)
  6. Number of cigarettes smoked on weekends (quantitative, discrete)
  7. Number of cigarettes smoked on weekdays (quantitative, discrete)


Question 14.) Sampling strategies. A statistics student who is curious about the relationship between the amount of time students spend on social networking sites and their performance at school decides to conduct a survey. Three research strategies for collecting data are described below. In each, name the sampling method proposed and any bias you might expect.

Question 14a.) He randomly samples 40 students from the study’s population, gives them the survey, asks them to fill it out and bring it back the next day.

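This research strategy is a simple random sample. Non-response bias may be a problem: the short turnaround makes it likely that some students will not bring the survey back the next day, and those who do not return it may differ systematically from those who do (for example, in how much time they spend on social networking sites).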
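In a simple random sample, every group of 40 students in the population is equally likely to be chosen. A minimal sketch using Python's standard random module, assuming a hypothetical population of 2,000 student IDs (the question does not give the population size):

```python
import random

# Hypothetical sampling frame: one ID per student in the population.
population = [f"student_{i:04d}" for i in range(1, 2001)]

rng = random.Random(2024)  # seeded only so the draw can be reproduced
sample = rng.sample(population, k=40)  # 40 distinct students, drawn without replacement

print(len(sample), len(set(sample)))  # 40 40
```

Random selection does not protect against non-response: if only some of the 40 return the survey, the analysis is limited to whoever brought it back.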

Question 14b.) He gives out the survey only to his friends, and makes sure each one of them fills out the survey.

This research strategy is a convenience sample. There could be selection bias in this strategy because the student was only using his friends and so the sample may not be representative of the larger population. There will be less non-response bias because he makes sure each student fills out the survey.

Question 22.) Chia Pets – those terra-cotta figurines that sprout fuzzy green hair – made the chia plant a household name. But chia has gained an entirely new reputation as a diet supplement. In one 2009 study, a team of researchers recruited 38 men and divided them evenly into two groups: treatment or control. They also recruited 38 women, and they randomly placed half of these participants into the treatment group and the other half into the control group. One group was given 25 grams of chia seeds twice a day, and the other was given a placebo. The subjects volunteered to be a part of the study. After 12 weeks, the scientists found no significant difference between the groups in appetite or weight loss.

Question 22a.) What type of study is this?

This is a randomized, blinded experiment.

Question 22b.) What are the experimental and control treatments in this study?

The experimental treatment in this study is 25 grams of chia seeds twice a day for 12 weeks. The control treatment in this study is a placebo for the chia seeds.

Question 22c.) Has blocking been used in this study? If so, what is the blocking variable?

Blocking has been used in this study. The volunteers were first separated by gender (men and women) and then assigned to the treatment and control groups within each gender, so the blocking variable is gender.
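The blocked assignment can be sketched in Python: participants are first split by sex, and each block is then shuffled and halved between treatment and control. The participant IDs and the function name block_randomize are hypothetical; only the group sizes (38 men, 38 women) come from the study description.

```python
import random


def block_randomize(participants, block_of, rng):
    """Randomly assign participants to treatment or control separately within each block."""
    blocks = {}
    for p in participants:
        blocks.setdefault(block_of(p), []).append(p)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # random order within this block only
        half = len(shuffled) // 2
        for p in shuffled[:half]:
            assignment[p] = "treatment"
        for p in shuffled[half:]:
            assignment[p] = "control"
    return assignment


# Hypothetical IDs: 38 men and 38 women, as in the study description.
men = [("M", i) for i in range(38)]
women = [("F", i) for i in range(38)]
groups = block_randomize(men + women, block_of=lambda p: p[0], rng=random.Random(1))

for sex in ("M", "F"):
    n_treat = sum(1 for p, g in groups.items() if p[0] == sex and g == "treatment")
    print(sex, n_treat)  # 19 in treatment for each sex
```

Randomizing within each block guarantees that men and women are split evenly between the groups, so any difference between the sexes cannot pile up in one arm by chance.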

Question 22d.) Has blinding been used in this study?

Blinding has been used in this experiment. Because the control group received a placebo, the volunteers did not know whether they were receiving the chia seeds. The description does not say whether the researchers were also blinded, so we can only confirm that the subjects were.

Question 22e.) Comment on whether or not we can make a causal statement, and indicate whether or not we can generalize the conclusion to the population at large.

Because this is a randomized experiment (with both a treatment and a control group), a causal statement is reasonable: random assignment should balance possible confounding variables such as age, exercise, and diet between the two groups, so the lack of a significant difference suggests that chia seeds had no detectable effect on appetite or weight loss over the 12 weeks. However, the subjects volunteered rather than being randomly sampled, and the sample is relatively small (76 people), so the conclusion should not be generalized to the population at large. A larger study of randomly sampled participants would be needed before generalizing.


Question 26.) Parameters and statistics. Identify which value represents the sample mean and which value represents the claimed population mean.

Question 26a.) A recent article in a college newspaper stated that college students get an average of 5.5 hrs of sleep each night. A student who was skeptical about this value decided to conduct a survey by randomly sampling 25 students. On average, the sampled students slept 6.25 hours per night.

An average of 5.5 hours of sleep represents the claimed population mean. 6.25 hours of sleep (the average hours slept for the 25 students) represents the sample mean.

Question 26b.) American households spent an average of about $52 in 2007 on Halloween merchandise such as costumes, decorations and candy. To see if this number had changed, researchers conducted a new survey in 2008 before industry numbers were reported. The survey included 1,500 households and found that average Halloween spending was $58 per household.

$52 per household in 2007 represents the claimed population mean. $58 per household (the average Halloween spending of the 1,500 households in 2008) represents the sample mean.

Question 26c.) The average GPA of students in 2001 at a private university was 3.37. A survey on a sample of 203 students from this university yielded an average GPA of 3.59 in Spring semester of 2012.

The 2001 average GPA of 3.37 at the private university represents the claimed population mean. The average GPA of 3.59 (for the 203 sampled students in the Spring semester of 2012) represents the sample mean.
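The distinction can be made concrete by storing each claimed population mean (a parameter, often written μ) next to the matching sample mean (a statistic, x̄) from Questions 26a-26c. This short Python sketch only computes the gap x̄ - μ; the function name mean_gaps is hypothetical, and deciding whether a gap is large enough to doubt the claim would need a hypothesis test, which is not shown here.

```python
# (claimed population mean, sample mean) for each part of Question 26.
cases = {
    "26a sleep (hours per night)": (5.5, 6.25),
    "26b Halloween spending ($ per household)": (52.0, 58.0),
    "26c GPA": (3.37, 3.59),
}


def mean_gaps(cases):
    """Return sample mean minus claimed population mean for each case."""
    return {name: sample_mean - claimed for name, (claimed, sample_mean) in cases.items()}


for name, gap in mean_gaps(cases).items():
    print(f"{name}: {gap:+.2f}")
# 26a sleep (hours per night): +0.75
# 26b Halloween spending ($ per household): +6.00
# 26c GPA: +0.22
```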