KITADA
Lesson #11
Motivation:
For the first part of this course, we’ve discussed analyses with either categorical or quantitative response variables, but only with categorical explanatory variables. What if both the response AND the explanatory variable are quantitative? Then we perform a simple linear regression analysis. A simple linear regression analysis starts the same way as the analyses we’ve discussed so far: by exploring the data. Scatterplots are a very useful way to display the relationship between two quantitative variables.
What you need to know from this lesson:
After completing this lesson, you should be able to
To accomplish the above “What You Need to Know”, do the following:
The Lesson
Example 1: Arby’s Sandwiches
The following table lists nutritional information, reported in 2013, for 23 sandwiches offered by the fast-food chain Arby’s. Serving size, fat, and protein are in grams; sodium and cholest (cholesterol) are in milligrams.
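# Read the Arby's data (assumes ARBYS.csv is in the current working directory;
# the original notes used an absolute path on the instructor's computer)
ARBYS <- read.csv("ARBYS.csv", header = TRUE)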
ARBYS
##    sandwich                     servingsize calories fat protein sodium cholest meat
##  1 Roast Beef Classic                   154      360  14      31    970      60 roast beef
##  2 Roast Beef Mid                       210      460  21      34   1420      95 roast beef
##  3 Roast Beef Max                       267      560  27      45   1870     145 roast beef
##  4 Beef-n-Cheddar Classic               195      450  20      24   1310      60 roast beef
##  5 Beef-n-Cheddar Mid                   251      560  27      34   1760      95 roast beef
##  6 French Dip & Swiss/Au jus            327      540  11      34   2560     100 roast beef
##  7 Junior Roast Beef                     87      210   8      13    530      30 roast beef
##  8 Angus Three Cheese & Bacon           267      630  30      47   2220     120 steak
##  9 Angus Philly                         289      590  27      38   2130      95 steak
## 10 Grand Turkey Club                    233      480  24      31   1610      70 turkey
## 11 Roast Turkey Ranch & Bacon           344      800  35      48   2250     100 turkey
## 12 Roast Turkey & Swiss                 326      700  28      39   1760      75 turkey
## 13 Cravin' Chicken Crispy               221      500  22      27   1110      50 chicken
## 14 Chicken Bacon & Swiss Crispy         205      600  29      36   1430      80 chicken
## 15 Jr Ham and Cheddar Melt              115      210   6      14    900      25 ham
## 16 Jr Bacon Cheddar Melt                117      280  12      17    890      40 bacon
## 17 Reuben                               308      640  30      32   1610      55 corned beef
## 18 Arby-Q                               182      400  11      19   1240      40 roast beef
## 19 Arby's Melt                          146      340  13      19    930      40 roast beef
## 20 Ham and Swiss Melt                   131      300   9      19   1030      35 ham
## 21 Super Roast Beef                     295      570  27      40   1720     105 roast beef
## 22 Chicken Cordon Bleu                  241      620  31      38   1700      90 chicken
## 23 Jr Chicken                           110      310  15      13    680      25 chicken
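If ARBYS.csv is not at hand, a minimal sketch is to type the two variables used in this lesson directly from the table above. The data frame name `arbys_sub` is hypothetical (it is not part of the original notes); the values are copied row by row from the printed output.

```r
# Hypothetical stand-in for the CSV: serving size (g) and calories,
# copied in row order (sandwiches 1-23) from the printed ARBYS table
arbys_sub <- data.frame(
  servingsize = c(154, 210, 267, 195, 251, 327, 87, 267, 289, 233, 344, 326,
                  221, 205, 115, 117, 308, 182, 146, 131, 295, 241, 110),
  calories    = c(360, 460, 560, 450, 560, 540, 210, 630, 590, 480, 800, 700,
                  500, 600, 210, 280, 640, 400, 340, 300, 570, 620, 310)
)

# Quick check: number of sandwiches, plus the minimum and maximum of each variable
nrow(arbys_sub)
summary(arbys_sub)
```

The minimum and maximum reported by `summary()` are exactly what is needed later when choosing a scale for each axis of the scatterplot.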
Let’s consider just the variables serving size and calories.
1. What is the response variable and what is the explanatory variable? Is each variable categorical or quantitative?
Response: Calories (quantitative)
Explanatory: Serving size, in grams (quantitative)
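One rough way to double-check this answer in R is to ask how each column is stored. This is only a proxy: a numeric column is not automatically quantitative (a numeric ID code, for example, is categorical), so the classification still rests on what the variable measures. The sketch below uses just the first three rows of the table, in a hypothetical data frame `arbys_head`.

```r
# First three rows of the Arby's table (hypothetical small data frame)
arbys_head <- data.frame(
  sandwich    = c("Roast Beef Classic", "Roast Beef Mid", "Roast Beef Max"),
  servingsize = c(154, 210, 267),
  calories    = c(360, 460, 560),
  meat        = c("roast beef", "roast beef", "roast beef")
)

# TRUE for numeric (candidate quantitative) columns, FALSE for text columns:
# sandwich and meat give FALSE; servingsize and calories give TRUE
sapply(arbys_head, is.numeric)
```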
2. Construct a scatterplot of calories versus serving size.
a. On which axis is the response variable and on which is the explanatory variable?
y-axis (vertical) = response variable
x-axis (horizontal) = explanatory variable
b. Make a proper scale on each axis starting at a value just less than the minimum value and ending at a value just greater than the maximum value. Label each axis (with units if the variable has units).
c. Each sandwich has a value for the explanatory variable and the response variable. We can think of these values as forming an “ordered pair” (x, y). For each sandwich, place a dot (or some other symbol) at its (x, y) coordinate on the scatterplot.
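d. The R code below produces the scatterplot of calories versus serving size.
# Explanatory variable (serving size) on the x-axis, response (calories) on the y-axis
plot(ARBYS$servingsize, ARBYS$calories,
     xlab = "Serving Size (g)", ylab = "Calories",
     main = "Arby's Sandwiches", pch = 16)  # pch = 16: solid circles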
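Steps a through c can also be carried out one piece at a time in R, which makes the link between the ordered pairs and the dots explicit. This is a sketch rather than the course's method: the helper `axis_limits()` is hypothetical, and it assumes a tick spacing of 50 so that each axis starts just below the minimum and ends just above the maximum.

```r
# Serving size (g) and calories for the 23 sandwiches, in table order
servingsize <- c(154, 210, 267, 195, 251, 327, 87, 267, 289, 233, 344, 326,
                 221, 205, 115, 117, 308, 182, 146, 131, 295, 241, 110)
calories    <- c(360, 460, 560, 450, 560, 540, 210, 630, 590, 480, 800, 700,
                 500, 600, 210, 280, 640, 400, 340, 300, 570, 620, 310)

# Hypothetical helper (step b): start at the last multiple of `step` strictly
# below the minimum, end at the first multiple of `step` strictly above the maximum
axis_limits <- function(v, step = 50) {
  c((ceiling(min(v) / step) - 1) * step,
    (floor(max(v) / step) + 1) * step)
}
xlim <- axis_limits(servingsize)  # explanatory variable -> x-axis (step a)
ylim <- axis_limits(calories)     # response variable   -> y-axis (step a)

# Ordered pairs (x, y), one per sandwich (step c)
xy <- data.frame(x = servingsize, y = calories)
head(xy, 3)

# Set up empty, labeled axes, then place one dot at each ordered pair
plot(xy$x, xy$y, type = "n", xlim = xlim, ylim = ylim,
     xlab = "Serving Size (g)", ylab = "Calories",
     main = "Arby's Sandwiches")
for (i in seq_len(nrow(xy))) {
  points(xy$x[i], xy$y[i], pch = 16)
}
```

With the default `step = 50`, the limits come out as 50 to 350 grams and 200 to 850 calories; a larger or smaller `step` gives a coarser or finer scale. The loop is deliberately explicit; a single `points(xy$x, xy$y, pch = 16)` call draws the same dots.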
3. Interpreting the scatterplot.
a. What are the two primary things to look for when examining a scatterplot?
1) Direction
2) Strength (of linearity)
b. For the Arby’s sandwiches, describe the pattern. Are there any “deviations” from the pattern?
The relationship is strong, positive, and linear.
There are some deviations from the linear pattern for serving sizes between about 200 and 350 grams.
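The verbal description above can be backed with numbers. A minimal sketch, looking ahead to the regression part of this lesson: `cor()` puts a value on the direction (sign) and strength (size) of the linear relationship, and the residuals from a least-squares line fitted with `lm()` point to the sandwiches that sit farthest from the pattern. Treating the largest absolute residuals as the "deviations" is an assumption made for illustration.

```r
# Serving size (g) and calories for the 23 sandwiches, in table order
servingsize <- c(154, 210, 267, 195, 251, 327, 87, 267, 289, 233, 344, 326,
                 221, 205, 115, 117, 308, 182, 146, 131, 295, 241, 110)
calories    <- c(360, 460, 560, 450, 560, 540, 210, 630, 590, 480, 800, 700,
                 500, 600, 210, 280, 640, 400, 340, 300, 570, 620, 310)

# Correlation: the sign gives the direction, closeness to +1 or -1 the strength
r <- cor(servingsize, calories)
r

# Least-squares line and its residuals (observed minus predicted calories)
fit <- lm(calories ~ servingsize)
res <- resid(fit)

# Row numbers (as in the ARBYS table) of the three sandwiches farthest from the line
farthest <- order(abs(res), decreasing = TRUE)[1:3]
data.frame(row = farthest,
           servingsize = servingsize[farthest],
           calories = calories[farthest],
           residual = round(res[farthest], 1))
```

For these data r comes out at roughly 0.93, and the three largest residuals belong to rows 6 (French Dip & Swiss/Au jus), 14 (Chicken Bacon & Swiss Crispy), and 22 (Chicken Cordon Bleu), all with serving sizes between 200 and 350 grams.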