Handout 1: RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

Excerpted (with permission) from Chapter 5 of Investigating Statistical Concepts, Applications and Methods, 3rd Edition, Beth L. Chance and Allan J. Rossman, 2016.

In these activities you will analyze data sets with two quantitative variables. The goal will be to describe the relationship between the variables. As always, you will start by learning some useful numerical and graphical techniques for summarizing the data. Then you will explore how to use a mathematical model of the relationship to make predictions of one variable from the other. In the next section you will then move on to inferential techniques based on simulated sampling and randomization distributions as well as a mathematical model.

Investigation 1.1: Cat Jumping

Evolutionary biologists are often interested in “form-function relationships” to help explain the evolutionary history of, say, an animal species. Harris and Steudel (2002) investigated factors that are related to the jumping ability of domestic cats. Because jumping ability and height depend largely on takeoff velocity, several traits were recorded for 18 healthy adult cats, such as relative limb length, relative extensor muscle mass, body mass, fat mass relative to lean body mass, and the percentage of fast-twitch muscle fibers, to see which might best explain maximum takeoff velocity (based on high-speed videos). In this investigation, you will examine the following data, also available in the file CatJumping.txt: [link] (http://www.rossmanchance.com/iscam2/data/CatJumping.txt)
(a) Identify the observational units and the primary response variable of interest here. Also classify this variable as quantitative or categorical.

Observational units:

Response variable:

Type:

(b) Open the CatJumping.txt data file in RStudio and produce numerical and graphical summaries of the takeoff velocity variable. Describe the distribution of takeoff velocities in this sample (shape, center, variability, unusual observations). To do this, copy and paste the following commands into the RStudio console window.
    Note: first we will load some packages that contain commands we will use. You may have to run the loading commands twice.
install.packages("mosaic")
install.packages("mosaicData")
install.packages("ggplot2")
install.packages("lattice")
install.packages("car")
install.packages("manipulate")
load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
require(mosaic)
trellis.par.set(theme=theme.mosaic()) # change default color scheme for lattice
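
Packages only need to be installed once per computer, so rerunning every install.packages() line in each session is not required. The following is a minimal sketch, assuming the same six package names listed above; the helper name missing_packages is hypothetical and not part of any package.

```r
# Hypothetical helper: return the requested packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!(pkgs %in% installed)]
}

needed <- c("mosaic", "mosaicData", "ggplot2", "lattice", "car", "manipulate")
to_install <- missing_packages(needed)
print(to_install)  # packages that still need installing (possibly none)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The require(mosaic) line is still needed in every new R session, because installing a package does not load it.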

Now copy the following R code and paste it into your Console window. DO NOT hit ENTER yet.

catjumps = read.table("clipboard", header=TRUE)

Now go back to the CatJumping text file and copy the entire page. That is, press CTRL-A, followed by CTRL-C. Then return to the R Console window and press ENTER to execute the code.
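
The "clipboard" source in read.table() relies on the Windows clipboard. As an alternative sketch, R can read the file directly from a path or web address. This assumes the file is whitespace-delimited with a header row, as the clipboard command above already implies; the helper name read_cat_data is hypothetical.

```r
# Hypothetical helper: read a whitespace-delimited text file with a header row
read_cat_data <- function(source) {
  read.table(source, header = TRUE)
}

# Requires an internet connection; uncomment to run:
# catjumps <- read_cat_data("http://www.rossmanchance.com/iscam2/data/CatJumping.txt")
```

Either route should produce the same catjumps data frame, so the remaining commands in this investigation apply unchanged.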

You can check that the data have been read into R in several ways, but the easiest is

View(catjumps)

We will “attach” this data set so that R knows which data we wish to work with.

attach(catjumps)
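
attach() is convenient, but an attached column can be hidden by another object that happens to have the same name. A hedged alternative is to name the data frame explicitly with $ or with(). The sketch below uses a small made-up data frame (the values are hypothetical, not the study's measurements) with the same column names used in this handout.

```r
# Hypothetical toy values, NOT the study data
toy <- data.frame(bodymass = c(3000, 4000, 5000, 6000),
                  velocity = c(350, 340, 330, 310))

mean(toy$velocity)          # refer to the column through the data frame
with(toy, mean(velocity))   # or evaluate an expression inside the data frame
```

Both lines compute the same mean; neither requires the data frame to be attached.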

Now let’s ask R for summary statistics and to make a dotplot of the takeoff velocities.

summary(velocity)
dotPlot(velocity, cex=0.4,data = catjumps, xlab="velocity", panel=panel.dotPlot)
(c) Based on your analysis in (b), if you were going to randomly select a domestic cat, what is your best prediction of its takeoff velocity?

(d) Do you think there will be a relationship between a cat’s takeoff velocity and its body mass? If so, do you think heavier cats will tend to have larger or smaller takeoff velocities than lighter cats?

We will need a new graphical summary to visually explore the relationship between two quantitative variables: the scatterplot. Each of the commands below makes a default scatterplot, but the results are not particularly pretty.

plot(bodymass,velocity)
plot(velocity~bodymass)

Are there any differences in the plots created by the previous two commands?

Now try this one:

plotPoints(velocity~bodymass,data = catjumps)
(e) Describe the relationship between a cat’s takeoff velocity and its body mass, as displayed in this scatterplot. Does this pattern confirm your expectation in (d)?

(f) Do any of these cats appear to be outliers in the sense that its pair of values (body mass, takeoff velocity) does not fit the pattern of the majority of cats? If so, identify the ID for that cat and describe what’s different about this cat (in context).

Terminology Detour

Scatterplots are useful for displaying the relationship between two quantitative variables. If one variable has been defined as the response variable and the other as the explanatory variable, we will put the response variable on the vertical axis and the explanatory variable along the horizontal axis.

In describing scatterplots, you will describe the overall pattern between the two variables focusing primarily on three things:

  • Direction: Is there a positive association (small values of \(y\) tend to occur with small values of \(x\) and large values of \(y\) tend to occur with large values of \(x\)) or a negative association (small values of \(y\) tend to occur with large values of \(x\) and vice versa)?

  • Linearity: Is the overall pattern in the scatterplot linear or not?

  • Strength: How closely are the observations following the observed pattern?

The above scatterplot reveals a fairly strong, negative association between body mass and takeoff velocity, meaning that heavier cats tend to have a smaller takeoff velocity than lighter cats. The relationship is somewhat linear but has a bit of a curved pattern. There is one outlier cat (cat C) with a very high takeoff velocity despite having a very large body mass.
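
One informal way to judge direction and linearity is to overlay a smooth curve on the scatterplot and see whether it rises or falls and how far it bends away from a straight line. The sketch below uses base R's lowess() smoother on made-up values (hypothetical, not the cat data); it is an optional visual aid, not a method prescribed by this investigation.

```r
# Hypothetical toy values, NOT the study data
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(9.8, 9.1, 8.7, 7.9, 7.6, 6.2, 5.9, 5.1)

plot(x, y, xlab = "explanatory variable", ylab = "response variable")
smooth <- lowess(x, y)       # locally weighted smoother from base R
lines(smooth, col = "blue")  # a falling curve suggests a negative association
```

A curve that stays close to a straight line supports describing the pattern as linear; points that hug the curve tightly indicate a stronger association.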

(g) Now produce a scatterplot of takeoff velocity vs. percentage of body fat. Describe the association. Would you say that the association with velocity is stronger than with body mass? More or less linear?

(h) For the other two variables (hind limb length and muscle mass), would you expect to see a positive or negative association with takeoff velocity? Explain. Then look at scatterplots, and comment on whether the association is as you expected.

(i) Now produce a coded scatterplot of takeoff velocity vs. body mass that uses different symbols for male and female cats. Based on this graph, do you notice any differences between male and female cats with regard to these variables? Explain.

plotPoints( velocity ~ bodymass, data=catjumps, groups=Sex, pch=20)
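
If the mosaic package is unavailable, a coded scatterplot can also be sketched in base R by mapping each group to its own plotting symbol. The example below uses a made-up data frame; the values and the "F"/"M" coding of Sex are hypothetical and may not match the actual data file.

```r
# Hypothetical toy values and group codes, NOT the study data
toy <- data.frame(bodymass = c(3000, 3500, 4200, 5100, 5600, 6300),
                  velocity = c(350, 345, 338, 330, 322, 312),
                  Sex = factor(c("F", "F", "F", "M", "M", "M")))

sym <- c(1, 17)[as.integer(toy$Sex)]  # one plotting symbol per factor level
plot(velocity ~ bodymass, data = toy, pch = sym)
legend("topright", legend = levels(toy$Sex), pch = c(1, 17))
```

The legend lists the symbols in the same order as the factor levels, so each symbol is matched to the correct group.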

Study Conclusions

These researchers reported that variation in cat maximum takeoff velocity was significantly explained by both hind limb length (cats with longer limbs tended to have higher takeoff velocities) and fat mass relative to lean body mass (cats with lower fat mass tended to have higher takeoff velocities), but not by extensor muscle mass relative to lean mass or fast-twitch fiber content. They explained the “pervasive effect” of body mass by the increase in muscle work invested in increasing the center of mass potential energy as compared with kinetic energy during takeoff.

Later you will learn how they determined the statistical significance of these relationships. First, we will examine a numerical measure of the strength of the association between two variables.

Investigation 1.2: Drive for Show, Putt for Dough

Some have cited “Drive for show, putt for dough” as the oldest clich

