KITADA

Lesson #11

Exploring Two Quantitative Variables – Scatterplots

Motivation:

For the first part of this course, we’ve discussed analyses with both categorical and quantitative response variables, but only with categorical explanatory variables. What if both the response AND the explanatory variable are quantitative? The answer is to perform a simple linear regression analysis. A simple linear regression analysis starts the same way as the analyses we’ve discussed so far: by exploring the data. Scatterplots are a very useful way to display the relationship between two quantitative variables.

What you need to know from this lesson:

After completing this lesson, you should be able to

To accomplish the above “What You Need to Know”, do the following:

The Lesson

Example 1: Arby’s Sandwiches

The following table lists nutritional information, reported in 2013, for sandwiches offered by the fast food chain Arby’s. Serving size, fat, and protein are in grams; sodium and cholest (cholesterol) are in milligrams.

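# The path below points to the author's local copy of ARBYS.csv;
# change it to wherever the file is saved on your computer.
ARBYS <- read.csv("/Users/heatherhisako1/Desktop/Teaching/ST352_Summer16/ARBYS.csv",
                  header = TRUE)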
ARBYS
##                        sandwich servingsize calories fat protein sodium
## 1            Roast Beef Classic         154      360  14      31    970
## 2                Roast Beef Mid         210      460  21      34   1420
## 3                Roast Beef Max         267      560  27      45   1870
## 4        Beef-n-Cheddar Classic         195      450  20      24   1310
## 5            Beef-n-Cheddar Mid         251      560  27      34   1760
## 6     French Dip & Swiss/Au jus         327      540  11      34   2560
## 7             Junior Roast Beef          87      210   8      13    530
## 8    Angus Three Cheese & Bacon         267      630  30      47   2220
## 9                  Angus Philly         289      590  27      38   2130
## 10            Grand Turkey Club         233      480  24      31   1610
## 11   Roast Turkey Ranch & Bacon         344      800  35      48   2250
## 12         Roast Turkey & Swiss         326      700  28      39   1760
## 13       Cravin’ Chicken Crispy         221      500  22      27   1110
## 14 Chicken Bacon & Swiss Crispy         205      600  29      36   1430
## 15      Jr Ham and Cheddar Melt         115      210   6      14    900
## 16        Jr Bacon Cheddar Melt         117      280  12      17    890
## 17                       Reuben         308      640  30      32   1610
## 18                       Arby-Q         182      400  11      19   1240
## 19                 Arby's Melt          146      340  13      19    930
## 20           Ham and Swiss Melt         131      300   9      19   1030
## 21             Super Roast Beef         295      570  27      40   1720
## 22          Chicken Cordon Bleu         241      620  31      38   1700
## 23                   Jr Chicken         110      310  15      13    680
##    cholest        meat
## 1       60  roast beef
## 2       95  roast beef
## 3      145  roast beef
## 4       60  roast beef
## 5       95  roast beef
## 6      100  roast beef
## 7       30  roast beef
## 8      120       steak
## 9       95       steak
## 10      70      turkey
## 11     100      turkey
## 12      75      turkey
## 13      50     chicken
## 14      80     chicken
## 15      25         ham
## 16      40       bacon
## 17      55 corned beef
## 18      40  roast beef
## 19      40  roast beef
## 20      35         ham
## 21     105  roast beef
## 22      90     chicken
## 23      25     chicken
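ARBYS.csv lives on the author's machine, so the sketch below types three rows of the table in as text and reads them with `read.csv(text = ...)`. This shows what `header = TRUE` does. The comma-separated layout and the name `arbys_head` are assumptions made for illustration.

```r
# Three rows of ARBYS.csv typed in as text (assumed to match the file's
# comma-separated layout) so the sketch runs without the original file
arbys_text <- "sandwich,servingsize,calories,fat,protein,sodium,cholest,meat
Roast Beef Classic,154,360,14,31,970,60,roast beef
Roast Beef Mid,210,460,21,34,1420,95,roast beef
Roast Beef Max,267,560,27,45,1870,145,roast beef"

# header = TRUE: the first line holds column names, not data values
arbys_head <- read.csv(text = arbys_text, header = TRUE)

# Column names come from the header line; numeric columns are read as numbers
str(arbys_head)
```

With `header = FALSE`, the first line would be read as a data row instead, and the columns would be named V1 through V8.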

Let’s consider just the variables serving size and calories.

1. What is the response variable and what is the explanatory variable? Is each variable categorical or quantitative?

Response: Calories (quantitative)

Explanatory: Serving Size (quantitative)
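In R, one quick heuristic for telling quantitative from categorical variables is to check whether the values are stored as numbers. The sketch below types in serving size, calories, and the categorical meat variable from the table above, so it runs without ARBYS.csv. It then applies `is.numeric()` to each column. The data frame name `arbys` is just for illustration. Storage type is only a hint: a categorical variable coded with numbers would also pass this check.

```r
# Serving size (grams), calories, and meat type for the 23 sandwiches,
# typed in from the printed table so this sketch runs without ARBYS.csv
arbys <- data.frame(
  servingsize = c(154, 210, 267, 195, 251, 327, 87, 267, 289, 233, 344, 326,
                  221, 205, 115, 117, 308, 182, 146, 131, 295, 241, 110),
  calories    = c(360, 460, 560, 450, 560, 540, 210, 630, 590, 480, 800, 700,
                  500, 600, 210, 280, 640, 400, 340, 300, 570, 620, 310),
  meat        = c(rep("roast beef", 7), "steak", "steak",
                  rep("turkey", 3), "chicken", "chicken", "ham", "bacon",
                  "corned beef", "roast beef", "roast beef", "ham",
                  "roast beef", "chicken", "chicken"),
  stringsAsFactors = FALSE
)

# Numbers stored as numbers suggest a quantitative variable;
# text labels (like meat) suggest a categorical one
var_is_numeric <- sapply(arbys, is.numeric)
var_is_numeric
```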

2. Construct a scatterplot of calories versus serving size.

a. On which axis is the response variable and on which is the explanatory variable?

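y = Response: the response variable (calories) goes on the vertical (y) axis.

x = Explanatory: the explanatory variable (serving size) goes on the horizontal (x) axis.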
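R's formula notation follows the same convention. In `response ~ explanatory`, `plot()` puts the left-hand variable on the y-axis and the right-hand variable on the x-axis. The sketch below reuses the serving size and calories values from the table above. The names `arbys` and `plot_formula` are illustrative choices, not part of the original lesson.

```r
# Serving size (grams) and calories for the 23 sandwiches, from the table above
arbys <- data.frame(
  servingsize = c(154, 210, 267, 195, 251, 327, 87, 267, 289, 233, 344, 326,
                  221, 205, 115, 117, 308, 182, 146, 131, 295, 241, 110),
  calories    = c(360, 460, 560, 450, 560, 540, 210, 630, 590, 480, 800, 700,
                  500, 600, 210, 280, 640, 400, 340, 300, 570, 620, 310)
)

# Formula notation: response ~ explanatory.
# plot() puts the left-hand variable (calories) on the y-axis
# and the right-hand variable (serving size) on the x-axis.
plot_formula <- calories ~ servingsize
plot(plot_formula, data = arbys,
     xlab = "Serving Size (grams)", ylab = "Calories",
     main = "Arby's Sandwiches", pch = 16)
```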

b. Make a proper scale on each axis starting at a value just less than the minimum value and ending at a value just greater than the maximum value. Label each axis (with units if the variable has units).
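One way to get axis limits "just less than the minimum" and "just greater than the maximum" is to round outward to a convenient step size. In the sketch below, `axis_limits` is a hypothetical helper, and the step of 50 is an assumed choice. Any round number that gives a readable scale would work.

```r
# Serving size (grams) and calories for the 23 sandwiches, from the table above
servingsize <- c(154, 210, 267, 195, 251, 327, 87, 267, 289, 233, 344, 326,
                 221, 205, 115, 117, 308, 182, 146, 131, 295, 241, 110)
calories    <- c(360, 460, 560, 450, 560, 540, 210, 630, 590, 480, 800, 700,
                 500, 600, 210, 280, 640, 400, 340, 300, 570, 620, 310)

# Hypothetical helper: round outward to a multiple of `step` so the axis
# starts strictly below the minimum and ends strictly above the maximum
axis_limits <- function(v, step) {
  lower <- (ceiling(min(v) / step) - 1) * step
  upper <- (floor(max(v) / step) + 1) * step
  c(lower, upper)
}

x_lim <- axis_limits(servingsize, step = 50)  # serving size ranges 87 to 344 g
y_lim <- axis_limits(calories, step = 50)     # calories range 210 to 800
x_lim  # 50 350
y_lim  # 200 850
```

These limits could then be passed to `plot()` through its `xlim` and `ylim` arguments, for example `plot(servingsize, calories, xlim = x_lim, ylim = y_lim)`.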

c. Each sandwich has a value for the explanatory variable and the response variable. We can think of these values as forming an “ordered pair” (x, y). For each sandwich, place a dot (or some other symbol) at its (x, y) coordinate on the scatterplot.
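To make the "ordered pair" idea concrete, the sketch below pairs each sandwich's serving size (x) with its calories (y), in table order. The name `ordered_pairs` is illustrative.

```r
# Serving size (grams) and calories for the 23 sandwiches, from the table above
servingsize <- c(154, 210, 267, 195, 251, 327, 87, 267, 289, 233, 344, 326,
                 221, 205, 115, 117, 308, 182, 146, 131, 295, 241, 110)
calories    <- c(360, 460, 560, 450, 560, 540, 210, 630, 590, 480, 800, 700,
                 500, 600, 210, 280, 640, 400, 340, 300, 570, 620, 310)

# Each sandwich contributes one (x, y) pair: x = serving size, y = calories
ordered_pairs <- paste0("(", servingsize, ", ", calories, ")")
head(ordered_pairs, 3)  # "(154, 360)" "(210, 460)" "(267, 560)"
```

Each of these pairs is one dot on the scatterplot. For example, the Roast Beef Classic is plotted at x = 154, y = 360.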

d. Below is the scatterplot of calories versus serving size.

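# Scatterplot: explanatory variable (serving size) on the x-axis,
# response variable (calories) on the y-axis; pch=16 draws solid dots
plot(ARBYS$servingsize, ARBYS$calories,
     xlab="Serving Size (grams)", ylab="Calories",
     main="Arby's Sandwiches", pch=16)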

[Figure: “Arby's Sandwiches” scatterplot of Calories (y-axis) versus Serving Size (x-axis)]

3. Interpreting the scatterplot.

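a. What are the two primary things to look for in a scatterplot?

The overall pattern of the relationship (its form, direction, and strength) and any deviations from that pattern, such as outliers.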

b. For the Arby’s sandwiches, describe the pattern. Are there any “deviations” from the pattern?

The relationship is strong, positive, and linear.

There are some deviations from the linear pattern for serving sizes between about 200 and 350 grams.
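To see which sandwiches fall in that range, the sketch below lists those with serving sizes from 200 to 350 grams, sorted by serving size. Within this band, calories do not rise steadily. For example, the French Dip & Swiss/Au jus (327 g, 540 calories) has fewer calories than the Chicken Bacon & Swiss Crispy (205 g, 600 calories). The cutoffs follow the sentence above. The names `in_band` and `band_sorted` are illustrative.

```r
# Sandwich names, serving size (grams), and calories, from the table above
arbys <- data.frame(
  sandwich = c("Roast Beef Classic", "Roast Beef Mid", "Roast Beef Max",
               "Beef-n-Cheddar Classic", "Beef-n-Cheddar Mid",
               "French Dip & Swiss/Au jus", "Junior Roast Beef",
               "Angus Three Cheese & Bacon", "Angus Philly",
               "Grand Turkey Club", "Roast Turkey Ranch & Bacon",
               "Roast Turkey & Swiss", "Cravin' Chicken Crispy",
               "Chicken Bacon & Swiss Crispy", "Jr Ham and Cheddar Melt",
               "Jr Bacon Cheddar Melt", "Reuben", "Arby-Q", "Arby's Melt",
               "Ham and Swiss Melt", "Super Roast Beef",
               "Chicken Cordon Bleu", "Jr Chicken"),
  servingsize = c(154, 210, 267, 195, 251, 327, 87, 267, 289, 233, 344, 326,
                  221, 205, 115, 117, 308, 182, 146, 131, 295, 241, 110),
  calories    = c(360, 460, 560, 450, 560, 540, 210, 630, 590, 480, 800, 700,
                  500, 600, 210, 280, 640, 400, 340, 300, 570, 620, 310),
  stringsAsFactors = FALSE
)

# Keep sandwiches with serving sizes between 200 and 350 grams
in_band <- arbys$servingsize >= 200 & arbys$servingsize <= 350
band <- arbys[in_band, ]

# Sort by serving size to see how unevenly calories change within the band
band_sorted <- band[order(band$servingsize), ]
band_sorted
```

To mark these sandwiches on the scatterplot itself, `text(arbys$servingsize, arbys$calories, labels = arbys$sandwich, pos = 4, cex = 0.6)` can be called after `plot()` to write each sandwich's name beside its point.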