KITADA
Lab Activity #6
Objectives:
I. Example: Roses for Valentine’s Day!
On Valentine’s Day a lot of flowers are purchased, especially roses. In this example, we’re interested in determining if a relationship exists between the selling price of a dozen roses and the number of roses sold. To examine this issue, open the ROSES data set. The data in the data set are the number of roses sold and selling price of a dozen roses for 16 wholesalers:
Column  Variable  Description
1       number    number of roses sold around Mother’s Day a few years ago
2       price     the wholesale selling price of a dozen roses
ROSES
## number price
## 1 11484 21.25
## 2 9348 29.95
## 3 8429 32.25
## 4 10079 25.90
## 5 9240 31.00
## 6 8862 28.99
## 7 6216 33.99
## 8 8253 29.99
## 9 8038 31.35
## 10 7476 34.75
## 11 5911 35.50
## 12 7950 32.49
## 13 6134 35.59
## 14 5868 39.99
## 15 3160 48.95
## 16 5872 37.80
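If the ROSES data set is not available as a file, one way to follow along is to rebuild it from the printout above. This is only a sketch: it assumes the 16 printed rows are the complete data set, and the real lab file may be loaded differently.

```r
# Rebuild the ROSES data frame from the 16 rows printed above
# (assumption: the printout is the complete data set)
ROSES <- data.frame(
  number = c(11484, 9348, 8429, 10079, 9240, 8862, 6216, 8253,
             8038, 7476, 5911, 7950, 6134, 5868, 3160, 5872),
  price  = c(21.25, 29.95, 32.25, 25.90, 31.00, 28.99, 33.99, 29.99,
             31.35, 34.75, 35.50, 32.49, 35.59, 39.99, 48.95, 37.80)
)

# Quick structural check: 16 wholesalers, 2 variables
dim(ROSES)
str(ROSES)
```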
Use the R commands in Part III to obtain the following output:
Use the output to discuss the following:
1. Using the scatterplot (without the fitted regression line), discuss the relationship between number of roses sold and selling price (direction, linear or curved, strength, outliers or other deviations from the pattern).
### MAKE A SCATTERPLOT
with(ROSES, plot(number, price,
main="Number of Roses vs Wholesale Price",
xlab="Number of Roses Sold Around Mother's Day",
ylab="Price of a Dozen Roses ($)",
pch=16))
From the plot it appears that:
2. When is it appropriate to use the correlation coefficient? Are the conditions satisfied in this problem to use the correlation coefficient? If so, give and interpret the correlation coefficient.
It's appropriate to use the correlation coefficient when:
Yes, these are satisfied in this problem.
### CORRELATION COEFFICIENT
with(ROSES, cor(number, price))
## [1] -0.9568285
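To see what cor() is doing, the correlation can also be computed from its definition. The sketch below rebuilds the data from the printout (an assumption that the 16 printed rows are the full data set); the names x, y, and r_manual are illustrative only.

```r
# Data rebuilt from the printout (assumed complete)
ROSES <- data.frame(
  number = c(11484, 9348, 8429, 10079, 9240, 8862, 6216, 8253,
             8038, 7476, 5911, 7950, 6134, 5868, 3160, 5872),
  price  = c(21.25, 29.95, 32.25, 25.90, 31.00, 28.99, 33.99, 29.99,
             31.35, 34.75, 35.50, 32.49, 35.59, 39.99, 48.95, 37.80)
)

x <- ROSES$number
y <- ROSES$price

# r = sum of cross-products of deviations, divided by the
# square root of the product of the two sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual
# Should agree with the built-in function
all.equal(r_manual, cor(x, y))
```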
3. Using the scatterplot with the fitted regression line, does the least-squares regression line “fit” the data well? What do you look for when answering this question?
### FITTED REGRESSION LINE
mod<-with(ROSES, lm(price~number))
with(ROSES, plot(number, price,
main="Number of Roses vs Wholesale Price",
xlab="Number of Roses Sold Around Mother's Day",
ylab="Price of a Dozen Roses ($)",
pch=16))
abline(coefficients(mod),
lwd=2, lty=2,
col="red")
4. If appropriate, interpret the simple linear regression analysis output:
### MODEL SUMMARY
summary(mod)
##
## Call:
## lm(formula = price ~ number)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2577 -0.8941 -0.3015 1.4926 2.8510
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.2515197 1.8568565 29.75 4.67e-14 ***
## number -0.0028964 0.0002351 -12.32 6.68e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.86 on 14 degrees of freedom
## Multiple R-squared: 0.9155, Adjusted R-squared: 0.9095
## F-statistic: 151.7 on 1 and 14 DF, p-value: 6.684e-09
\( \hat{y}=\beta_0+\beta_1x \)
Equation: \( \hat{y}=55.25-0.003x \)
\( x \): Number of roses sold around Mother's Day
\( \hat{y} \): Predicted wholesale price of a dozen roses ($)
Interpret the slope and y-intercept in the context of the problem. (Does the interpretation of the y-intercept have any meaning in this problem?)
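Slope: For each additional rose sold, the predicted wholesale price of a dozen roses decreases by about $0.003, on average.
Y-intercept: When zero roses are sold, the predicted wholesale price of a dozen roses is $55.25.
NOTE: This has no real-world meaning, since zero roses sold is far below the smallest number observed in the data (3,160).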
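Because a change of one rose is so small, it can help to restate the slope per 1,000 roses. The following is a sketch only: it rebuilds the data from the printout (assumed complete), and the 1,000-rose unit is an illustrative choice, not part of the lab.

```r
# Data rebuilt from the printout (assumed complete)
ROSES <- data.frame(
  number = c(11484, 9348, 8429, 10079, 9240, 8862, 6216, 8253,
             8038, 7476, 5911, 7950, 6134, 5868, 3160, 5872),
  price  = c(21.25, 29.95, 32.25, 25.90, 31.00, 28.99, 33.99, 29.99,
             31.35, 34.75, 35.50, 32.49, 35.59, 39.99, 48.95, 37.80)
)

mod <- lm(price ~ number, data = ROSES)

# Slope: predicted change in price per additional rose sold
slope <- unname(coef(mod)["number"])

# Same slope expressed per 1,000 additional roses sold
slope_per_1000 <- slope * 1000
slope_per_1000
```

Read this way, each additional 1,000 roses sold goes with a predicted drop of a little under $3 in the price of a dozen.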
### NUMBER OF ROSES FOR $35 PRICE PER DOZEN
## NOTE: THERE MAY BE SMALL ROUNDING ERRORS
(35-55.2515197)/(-0.0028964)
## [1] 6991.962
$60
It is not appropriate to calculate the number of roses for $60, since this value is outside the range of prices in our data ($21.25 to $48.95). Therefore, this would be extrapolation.
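A quick way to screen for extrapolation is to compare each target price with the range of observed prices. This is a minimal sketch with the data rebuilt from the printout (assumed complete); the names targets and inside are illustrative.

```r
# Data rebuilt from the printout (assumed complete)
ROSES <- data.frame(
  number = c(11484, 9348, 8429, 10079, 9240, 8862, 6216, 8253,
             8038, 7476, 5911, 7950, 6134, 5868, 3160, 5872),
  price  = c(21.25, 29.95, 32.25, 25.90, 31.00, 28.99, 33.99, 29.99,
             31.35, 34.75, 35.50, 32.49, 35.59, 39.99, 48.95, 37.80)
)

# Smallest and largest observed price
price_range <- range(ROSES$price)

# Prices we were asked about
targets <- c(35, 60)

# TRUE if the target lies within the observed prices
inside <- targets >= price_range[1] & targets <= price_range[2]
data.frame(price = targets, inside_observed_range = inside)
```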
Explain how to use both the least-squares regression equation AND the scatterplot to do the prediction.
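We use algebra to solve the regression equation for x. On the scatterplot, we find the target price on the vertical axis, read across to the fitted line, and then read down to the horizontal axis; the two answers should agree.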
\( \frac{\hat{y}-\beta_0}{\beta_1}=x \)
Understand how to use R to obtain these predicted (or “fitted”) values.
### TO PREDICT FITTED X VALUES FROM Y
new_y<-data.frame(c(35, 60))
(new_y-as.numeric(coefficients(mod)[1]))/as.numeric(coefficients(mod)[2])
## c.35..60.
## 1 6992.028
## 2 -1639.458
### TO PREDICT FITTED Y VALUES FROM X
predict(mod)
## 1 2 3 4 5 6 7 8
## 21.98957 28.17623 30.83799 26.05898 28.48904 29.58386 37.24767 31.34776
## 9 10 11 12 13 14 15 16
## 31.97048 33.59824 38.13106 32.22536 37.48517 38.25560 46.09898 38.24402
cbind(Number=ROSES$number[order(ROSES$number)],
Fitted=predict(mod)[order(ROSES$number)])
## Number Fitted
## 15 3160 46.09898
## 14 5868 38.25560
## 16 5872 38.24402
## 11 5911 38.13106
## 13 6134 37.48517
## 7 6216 37.24767
## 10 7476 33.59824
## 12 7950 32.22536
## 9 8038 31.97048
## 8 8253 31.34776
## 3 8429 30.83799
## 6 8862 29.58386
## 5 9240 28.48904
## 2 9348 28.17623
## 4 10079 26.05898
## 1 11484 21.98957
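predict() can also return fitted values for x values that are not in the data set, through its newdata argument. In the sketch below, the data are rebuilt from the printout (assumed complete), and 7000 and 9000 are hypothetical numbers of roses chosen only because they fall inside the observed range.

```r
# Data rebuilt from the printout (assumed complete)
ROSES <- data.frame(
  number = c(11484, 9348, 8429, 10079, 9240, 8862, 6216, 8253,
             8038, 7476, 5911, 7950, 6134, 5868, 3160, 5872),
  price  = c(21.25, 29.95, 32.25, 25.90, 31.00, 28.99, 33.99, 29.99,
             31.35, 34.75, 35.50, 32.49, 35.59, 39.99, 48.95, 37.80)
)

mod <- lm(price ~ number, data = ROSES)

# The column name must match the explanatory variable in the model
new_x <- data.frame(number = c(7000, 9000))  # hypothetical values

# Predicted price of a dozen roses at each new x
pred <- predict(mod, newdata = new_x)
cbind(new_x, Fitted = pred)
```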
### WHAT PERCENT OF VARIATION IS EXPLAINED BY THE MODEL
summary(mod)$r.squared
## [1] 0.9155208
### WHAT PERCENT IS NOT EXPLAINED
1-summary(mod)$r.squared
## [1] 0.08447919
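The R-squared value can be rebuilt from sums of squares, which makes the "explained" and "unexplained" split concrete. This is a sketch with the data rebuilt from the printout (assumed complete); sse, sst, and r2_manual are illustrative names.

```r
# Data rebuilt from the printout (assumed complete)
ROSES <- data.frame(
  number = c(11484, 9348, 8429, 10079, 9240, 8862, 6216, 8253,
             8038, 7476, 5911, 7950, 6134, 5868, 3160, 5872),
  price  = c(21.25, 29.95, 32.25, 25.90, 31.00, 28.99, 33.99, 29.99,
             31.35, 34.75, 35.50, 32.49, 35.59, 39.99, 48.95, 37.80)
)

mod <- lm(price ~ number, data = ROSES)

# Unexplained variation: sum of squared residuals
sse <- sum(resid(mod)^2)

# Total variation in price around its mean
sst <- sum((ROSES$price - mean(ROSES$price))^2)

# Proportion explained = 1 - (unexplained / total)
r2_manual <- 1 - sse / sst
r2_manual
all.equal(r2_manual, summary(mod)$r.squared)
```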
What other variables can you think of that might explain some of this “unexplained” variation? (That is, what other variables might help explain the number of roses sold besides the selling price?)
5. What is a residual? By hand, calculate the residual for the first observation in the data set.
### RESIDUAL FOR FIRST OBSERVATION
## X-VALUE
ROSES$number[1]
## [1] 11484
## OBSERVED Y
obs<-ROSES$price[1]
obs
## [1] 21.25
## EXPECTED
exp<-fitted(mod)[1]
exp
## 1
## 21.98957
obs-exp
## 1
## -0.739575
### WE CAN ALSO FIND THE RESIDUAL IN R
resid(mod)[1]
## 1
## -0.739575
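For a calculation closer to "by hand", the fitted value can be built directly from the intercept and slope rather than taken from fitted(). This is a sketch with the data rebuilt from the printout (assumed complete); b, fitted_1, and resid_1 are illustrative names.

```r
# Data rebuilt from the printout (assumed complete)
ROSES <- data.frame(
  number = c(11484, 9348, 8429, 10079, 9240, 8862, 6216, 8253,
             8038, 7476, 5911, 7950, 6134, 5868, 3160, 5872),
  price  = c(21.25, 29.95, 32.25, 25.90, 31.00, 28.99, 33.99, 29.99,
             31.35, 34.75, 35.50, 32.49, 35.59, 39.99, 48.95, 37.80)
)

mod <- lm(price ~ number, data = ROSES)
b <- unname(coef(mod))  # b[1] = intercept, b[2] = slope

# Fitted value for observation 1: intercept + slope * x
fitted_1 <- b[1] + b[2] * ROSES$number[1]

# Residual = observed y - fitted y
resid_1 <- ROSES$price[1] - fitted_1
resid_1
```

Using the rounded equation from the summary output instead of the full-precision coefficients gives nearly the same residual, up to small rounding differences.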
6. Discuss how a residual plot is constructed.
### RESIDUAL PLOT
plot(ROSES$number, ROSES$price-fitted(mod),
main="Residual Plot",
xlab="Number of Roses Sold Around Mother's Day",
ylab="Residuals",
pch=16)
abline(h=0, lwd=2,lty=2, col="blue")
## OR YOU CAN USE THE RESID FUNCTION
## plot(ROSES$number, resid(mod))
7. How can a residual plot be used to check for a linear relationship between the response and explanatory variables?
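We can check the linearity assumption by looking for a pattern in the residuals. If the points scatter randomly around the horizontal line at zero, a linear model is reasonable; a curved or otherwise systematic pattern suggests the relationship is not linear.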
8. Using the residual plot, does the relationship between number of roses sold and selling price appear to be linear? Explain.
Yes, there don't appear to be any patterns in the residual plot.
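One optional visual aid when judging patterns is to overlay a smooth curve on the residual plot; a curve that stays close to the zero line is consistent with a linear relationship. This sketch is not part of the lab's required output: it rebuilds the data from the printout (assumed complete) and uses lowess() as one possible smoother.

```r
# Data rebuilt from the printout (assumed complete)
ROSES <- data.frame(
  number = c(11484, 9348, 8429, 10079, 9240, 8862, 6216, 8253,
             8038, 7476, 5911, 7950, 6134, 5868, 3160, 5872),
  price  = c(21.25, 29.95, 32.25, 25.90, 31.00, 28.99, 33.99, 29.99,
             31.35, 34.75, 35.50, 32.49, 35.59, 39.99, 48.95, 37.80)
)

mod <- lm(price ~ number, data = ROSES)

# Residual plot with a reference line at zero
plot(ROSES$number, resid(mod),
     main = "Residual Plot with Smoother",
     xlab = "Number of Roses Sold Around Mother's Day",
     ylab = "Residuals",
     pch = 16)
abline(h = 0, lwd = 2, lty = 2, col = "blue")

# Locally weighted smooth of the residuals against x
smooth <- lowess(ROSES$number, resid(mod))
lines(smooth, lwd = 2, col = "red")
```

With only 16 points the smoother can wiggle from noise alone, so it is a guide for the eye rather than a formal test.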