Chapter 1

Chapter 1: #14

  1. Number of sexual partners in a year - discrete numerical.
  2. Petal area of rose flowers - continuous numerical.
  3. Heartbeats per minute of a Tour de France cyclist, averaged over the duration of a race - continuous numerical.
  4. Birth weight - continuous numerical.
  5. Stage of fruit ripeness - ordinal categorical.
  6. Angle of flower orientation relative to position of the sun - continuous numerical.
  7. Tree species - nominal categorical.
  8. Year of birth - discrete numerical.
  9. Gender - nominal categorical; the categories have no numerical magnitude or inherent order.

Chapter 1: #16

  1. Leaving cell phones out of the random sample of voters or consumers would bias the sample towards people with landlines, such as elderly people and people who are less busy. It could also violate the independence criterion for random sampling: families share landlines, so more respondents would be related to one another than in a sample that included both cell phones and landlines.
  2. Probably the equal chance criterion, as landline use is becoming increasingly rare.
  3. Accuracy, because the landline sample would be more homogeneous and less representative of the population overall; and precision, because the landline sample would include more non-independent observations, which would deviate from the population average as well as the sample average.

Chapter 1: #17

  1. Population of interest – all piñon pine trees in the coast ranges of California.
  2. The trees within each plot are relatively close to each other and thereby influence each other, so each tree in a plot would not be an independent observation.

Chapter 1: #18

  1. There is always some sampling error in an estimate based on a sample, but I don’t think there’s reason to believe it is large in the tree survey, unless 500 plots is too small a portion of the species’ cover in California or the surveying protocol was inconsistent. Tree groups had an equal and independent chance of being plotted.
  2. There would probably be more sampling error if there were only 100 plots sampled, as the sample average would be more influenced by less representative or outlier plots.
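
The effect of sample size on sampling error can be sketched with a small simulation. This is only an illustration under stated assumptions: the population of cone counts is simulated from a Poisson distribution with an arbitrary mean, and the names `population` and `sample_means` are hypothetical, not taken from the survey.

```r
set.seed(1)

# Hypothetical population of cone counts per plot (simulated, not survey data)
population <- rpois(10000, lambda = 30)

# Draw many random samples of size n and record the mean of each one
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(sample(population, n)))
}

means_100 <- sample_means(100)
means_500 <- sample_means(500)

# The spread of the sample means around the population mean
# reflects the sampling error for each sample size
sd(means_100)
sd(means_500)
```

Under these assumptions the means from 100-plot samples scatter more widely than those from 500-plot samples, which is the loss of precision described in part 2.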

Chapter 1: #19

  1. Explanatory - mutation presence, response - mortality.
  2. Explanatory - anxiety treatment type, response - anxiety score.
  3. Explanatory - reward sensitivity, response - brain activity.
  4. Explanatory - endostatin dose, response - tumor growth rate.

Chapter 1: #20

  1. Observational.
  2. Experimental.
  3. Observational.
  4. Experimental.

Chapter 1: #24

  1. Plant species (categorical, explanatory) and leaves removed (numerical, response).
  2. The leaves are not a random sample because they’re likely not independent (many of the leaves probably come from the same trees) and because the sample is limited to trees in the immediate forest, which could differ from the general tree population.
  3. It would likely affect the precision because leaves had a high chance of coming from the same trees, which could be unrepresentative of the leaves in the whole forest. So a variety of less representative leaves would skew the sample.
  4. Ant colonies were randomly selected.

Chapter 2

Chapter 2: #20

  1. 12 mm.
  2. 55%.
  3. Make the bin width smaller.
  4. Bimodal.
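
The effect of bin width in part 3 can be sketched as follows. The measurements are simulated purely for illustration (they are not the data from the exercise), and the two bin widths of 4 mm and 1 mm are arbitrary choices.

```r
set.seed(2)

# Hypothetical bimodal measurements in mm (simulated for illustration only)
x <- c(rnorm(100, mean = 8, sd = 1), rnorm(100, mean = 13, sd = 1))

# Bin the same data with two widths: 4 mm bins versus 1 mm bins
wide   <- hist(x, breaks = seq(0, 20, by = 4), plot = FALSE)
narrow <- hist(x, breaks = seq(0, 20, by = 1), plot = FALSE)

# Number of bins available to show the shape of the distribution
length(wide$counts)
length(narrow$counts)

# Counts per bin; plot(wide) and plot(narrow) would draw the histograms
wide$counts
narrow$counts
```

With the wide bins the two peaks can blur together, while the narrower bins give enough resolution to see both modes.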

Chapter 2: #23

  1. Two numerical variables: temperature is continuous but appears to have been recorded as discrete, and flicker is discrete.
  2. Scatter plot with strip-like quality.
  3. Nonlinear.
  4. No, because the values aren’t independent (many come from single fish) and the 20 fish might be non-representative, i.e., from the same region/related.

Chapter 2: #29

  1. Line plot.
  2. There is a correlation between cases of influenza reported and number of Google searches for “flu” or “influenza”. Because searches increase significantly when influenza cases begin to increase and decrease significantly when new cases are no longer reported, it may be the case that influenza activity causes people to look up the virus on Google.
  3. Clarify what ILI stands for, and perhaps overlay the different years to compare them and to see the data more closely.

Chapter 2: #32

  1. Contingency table.
  2. Explanatory - Infection status, response - accident status.
  3. Infection status is compared between people who had driving accidents and people who did not. A higher proportion of infected people had accidents compared to uninfected people.

Chapter 2: #33

mydata <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02q33BirthMonthADHD.csv")
head(mydata)
##   birthMonth diagnosis frequencies
## 1    January      ADHD        2219
## 2    January   no ADHD       36917
## 3   December      ADHD        2870
## 4   December   no ADHD       36107
str(mydata)
## 'data.frame':    4 obs. of  3 variables:
##  $ birthMonth : Factor w/ 2 levels "December","January": 2 2 1 1
##  $ diagnosis  : Factor w/ 2 levels "ADHD","no ADHD": 1 2 1 2
##  $ frequencies: int  2219 36917 2870 36107
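# Arrange the counts in a 2 x 2 matrix, filled column by column:
# column 1 = January (ADHD, no ADHD), column 2 = December (ADHD, no ADHD)
mydataMat <- matrix(mydata$frequencies, byrow = FALSE, nrow = 2)

colnames(mydataMat) <- c("January Birth", "December Birth")
rownames(mydataMat) <- c("ADHD", "no ADHD")

# Transpose so that birth month runs along the horizontal axis
mosaicplot(t(mydataMat), 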
           col = c("burlywood2", "coral"), 
           sub = "Birth Month", 
           ylab = "Relative Frequency", 
           cex.axis = 1.1, 
           main = "")
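
As a numerical companion to the mosaic plot, the relative frequencies can also be computed directly. This is a minimal sketch that re-enters the four counts shown in the `head(mydata)` output rather than downloading the file; the object names `adhd` and `props` are introduced here for illustration.

```r
# Counts copied from the head(mydata) output above
adhd <- matrix(c(2219, 36917, 2870, 36107), nrow = 2,
               dimnames = list(diagnosis  = c("ADHD", "no ADHD"),
                               birthMonth = c("January", "December")))

# Proportion of each diagnosis within each birth month (each column sums to 1)
props <- prop.table(adhd, margin = 2)

# Proportion diagnosed with ADHD, by birth month
round(props["ADHD", ], 4)
```

The proportions come out at roughly 5.7% for January births and 7.4% for December births.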

Chapter 2: #34

  1. Box plot.
  2. Symmetric - boxes A, D and E.
  3. Positive skew - B.
  4. Negative skew - C.
  5. Largest upper quartile - D.
  6. Smallest median - E.
  7. Most extreme observation - C.

Chapter 2: #35

foodData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02q35FoodReductionLifespan.csv")
head(foodData)
##      sex foodTreatment lifespan
## 1 female       reduced     16.5
## 2 female       reduced     18.9
## 3 female       reduced     22.6
## 4 female       reduced     27.8
## 5 female       reduced     30.2
## 6 female       reduced     30.7
str(foodData)
## 'data.frame':    34 obs. of  3 variables:
##  $ sex          : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ foodTreatment: Factor w/ 2 levels "control","reduced": 2 2 2 2 2 2 2 1 1 1 ...
##  $ lifespan     : num  16.5 18.9 22.6 27.8 30.2 30.7 35.9 23.7 24.5 24.7 ...
table(foodData$foodTreatment)
## 
## control reduced 
##      17      17
table(foodData$sex)
## 
## female   male 
##     15     19
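# Draw only the left and bottom sides of the plot frame
par(bty = "l")
# One box for each combination of food treatment and sex
boxplot(lifespan ~ foodTreatment*sex, 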
        data = foodData,
        col = "lightblue3",
        boxwex = 0.5, 
        whisklty = 1, 
        outcol = "black", 
        outcex = 1, 
        outlty = "blank", 
        las = 1, 
        xlab="Diet/Sex Groupings", 
        ylab = "Lifespan")

  1. Lifespan differed more between the sexes than between the diet groups.

Chapter 2: #36

  1. Box plot.
  2. Swearing seems to be associated with higher latency, at least as per the quartiles.
  3. The whiskers extend to the most extreme values that lie within 1.5 times the interquartile range of the box. The whisker ranges are relatively similar in both groups.
  4. Other appropriate graphs - Strip chart, multiple histograms.
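
The whisker rule in part 3 can be checked with base R's `boxplot.stats()`, which applies the same 1.5 times the interquartile range convention that `boxplot()` uses by default. The values below are made up for illustration and are not the latency data from the exercise.

```r
# Hypothetical values (made up for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)

# coef = 1.5 is the default multiplier of the box length used for the whiskers
stats <- boxplot.stats(x, coef = 1.5)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Points beyond the whiskers, which a box plot draws individually
stats$out
```

Here the box runs from 3 to 7, so the upper whisker can reach at most 7 + 1.5 * 4 = 13. It stops at 8, the largest value within that limit, and 30 is flagged as an extreme observation.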