August, 2014

Graphical Methods

Swimming World Records

library(lattice)     # histogram(), xyplot() and the other plotting functions
library(mosaicData)  # SwimRecords data set
data(SwimRecords)
View(SwimRecords)
help(SwimRecords)

Research Question: How have 100m records varied over the years?

Variable Analysis

Variables involved:

  • year explanatory
  • time response

Their types:

  • year numerical
  • time numerical

Are Old Methods Useful?

With numerical variables we have used:

  • densityplot()
  • bwplot()
  • histogram()
  • favstats()

Try this:

histogram(~time|year,data=SwimRecords)

Weirdness

The Problem

  • explanatory variable year is numerical, not a factor, and
  • it has LOTS of values!

We need a new graphical method to study the relationship between two numerical variables.
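One way to see why the conditioned histogram misbehaves is to count the distinct years in the data: lattice draws one panel per value of the conditioning variable. A minimal sketch, assuming the mosaicData package is installed (n_years is a name introduced here for illustration):

```r
library(mosaicData)   # assumed to be installed; supplies SwimRecords
data(SwimRecords)

# Inspect the variable types: year and time are both numerical
str(SwimRecords)

# Count the distinct years (n_years is an illustrative name)
n_years <- length(unique(SwimRecords$year))
n_years            # number of panels the conditioned histogram asks for
nrow(SwimRecords)  # total number of records to spread across those panels
```

With so few records per year, each panel holds almost nothing to summarize.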

A Scatter Plot

How to Make a Scatter Plot

Use the xyplot() function:

xyplot(time~year,data=SwimRecords,
       main="100m Swim Records",
       xlab="year",
       ylab="time (seconds)",
       pch=19)

Basic Form:

xyplot(response ~ explanatory, data = ...)
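As a sketch of the basic form on a different data set, the example below uses R's built-in cars data (not part of these slides), with stopping distance dist as the response and speed as the explanatory variable. The response goes to the left of the ~ and the explanatory variable to the right.

```r
library(lattice)  # provides xyplot()

# response ~ explanatory: stopping distance as a function of speed
p <- xyplot(dist ~ speed, data = cars,
            xlab = "speed (mph)",
            ylab = "stopping distance (ft)",
            pch = 19)
print(p)  # lattice plots are drawn when printed
```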

Relationship Terminology

Positive Linear

Rising cloud: the taller the father, the taller the son!

Negative Linear

Falling cloud: bigger x's go with smaller y's!

Curvilinear

There's a relationship, but it's not linear!

A Bit Curvilinear

See the cloud "leveling off"?

No Relationship

As you move to the right, the cloud neither rises nor falls.
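To get a feel for these patterns, it can help to simulate one cloud of each kind and plot them side by side. The sketch below uses made-up data (the slopes, noise levels, seed, and the names sim and patterns are all illustrative choices, not taken from any data set in these slides):

```r
library(lattice)

set.seed(2014)  # arbitrary seed so the simulated clouds are reproducible
n <- 100
x <- runif(n, 0, 10)

# Simulated (not real) data, one cloud per pattern
patterns <- c("positive linear", "negative linear",
              "curvilinear", "no relationship")
sim <- data.frame(
  x = rep(x, times = 4),
  y = c( 2 * x + rnorm(n),      # rising cloud
        -2 * x + rnorm(n),      # falling cloud
        (x - 5)^2 + rnorm(n),   # curved cloud
        rnorm(n)),              # flat cloud
  pattern = factor(rep(patterns, each = n), levels = patterns)
)

# One panel per pattern; each panel gets its own y-axis scale
p <- xyplot(y ~ x | pattern, data = sim,
            scales = list(y = list(relation = "free")),
            pch = 19)
print(p)
```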

Examples

Beetles and Stumps

data(stumps)
View(stumps)
help(stumps)

Research Question: Are there more larvae clusters in plots where there are more cottonwood stumps?

Practice

  • stumps numerical
  • larvae numerical

Which is explanatory?

Which is the response?

Make the scatter plot.

Describe the relationship between number of larvae and number of stumps.

Speed and Fuel Efficiency

data(fuel)
View(fuel)
help(fuel)

Research Question: How does the speed at which the Ford Escort is driven affect its fuel efficiency?

Practice

  • speed numerical
  • efficiency numerical

Which is explanatory? Which is the response?

Make the scatter plot.

Describe the relationship between speed and fuel efficiency.

Units for efficiency are "liters per 100 km". Does a high number for efficiency represent good or poor fuel efficiency?

Groups in Scatter Plots

Dealing With Groups: 1

Use a groups argument, with a key:

xyplot(time~year,data=SwimRecords,
       main="100m Swim Records",
       xlab="year",
       ylab="time (seconds)",
       pch=19,
       groups=sex,
       auto.key=TRUE)

The Result

Dealing with Groups: 2

data(TenMileRace)
View(TenMileRace)
help(TenMileRace)

Let's look at the relationship between age (explanatory) and net time (response), for both men and women.

First Pass

xyplot(net~age,data=TenMileRace,
       groups=sex,
       auto.key=TRUE)

Sexes are "Too Mixed Up"

Plot in Separate Panels

Make a scatter plot for each sex:

xyplot(net~age|sex,data=TenMileRace,
       main="Age and Race Time, by Sex",
       xlab="age (years)",
       ylab="net time (seconds)",
       layout=c(2,1))

The Result

The guys appear to run faster. Hard to tell if age makes a difference.
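The visual impression that the men run faster can be checked numerically by comparing the mean net time for each sex. A minimal sketch, assuming the mosaicData package is installed (mean_net is an illustrative name):

```r
library(mosaicData)  # assumed to be installed; supplies TenMileRace
data(TenMileRace)

# Mean net time (seconds) for each sex
mean_net <- aggregate(net ~ sex, data = TenMileRace, FUN = mean)
mean_net
```

A smaller mean net time corresponds to faster running.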

Verlander Over the Plate

help(verlander)

  • px horizontal distance of ball from center of plate (feet)
  • pz vertical distance of ball above plate (feet)

Research Question: How are these variables related?

First Try

xyplot(pz~px,data=verlander,
       main="Verlander over the Plate",
       pch=19)

Result

Over-plotting! (15,307 points sitting on top of each other.)

Second Try

xyplot(pz~px,data=verlander,
       main="Verlander over the Plate",
       pch=19,
       alpha=0.10)

With alpha set to 0.10, each point is only 10% opaque, so the color deepens only where many points are stacked on top of each other!
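How quickly translucent points build up can be estimated with a little arithmetic. The sketch below assumes the common "over" blending rule, in which n stacked points of opacity alpha give a combined opacity of 1 - (1 - alpha)^n. The exact behavior depends on the graphics device, so treat the numbers as a rough guide.

```r
alpha <- 0.10               # opacity of a single point
n <- c(1, 5, 10, 20, 40)    # number of points stacked on one spot

# Combined opacity under the assumed "over" blending rule
opacity <- 1 - (1 - alpha)^n

data.frame(n = n, opacity = round(opacity, 2))
# opacity column: 0.10 0.41 0.65 0.88 0.99
```

Under this assumption, densely hit regions stand out clearly while isolated points stay faint, which is the effect the alpha argument is used for here.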

Result

Third Try

Let's group by type of pitch:

xyplot(pz~px,data=verlander,
       main="Verlander over the Plate",
       pch=19,
       alpha=0.10,
       groups=pitch_type,
       auto.key=list(space="right"))

The Result

Hard to tell!

Fourth Try

Let's plot in separate panels for each pitch type:

xyplot(pz~px|pitch_type,data=verlander,
       main="Verlander over the Plate",
       pch=19,
       alpha=0.10)

The Result

Practice

For which types of pitch is there a relationship between px and pz?

How would you describe the relationships you see (if any)?