library(pander)
\(~\)
Problem 1: Suppose data were collected from a sample of 10 branches of a pizza restaurant chain located near college campuses, shown below.
Restaurant | Student Population (in 1000s) | Quarterly Sales (in $1000) |
---|---|---|
1 | 2 | 58 |
2 | 6 | 105 |
3 | 8 | 88 |
4 | 8 | 118 |
5 | 12 | 117 |
6 | 16 | 137 |
7 | 20 | 157 |
8 | 20 | 169 |
9 | 22 | 149 |
10 | 26 | 202 |
\(~\)
Create data in RStudio:
popn <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
qsales <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
a.) Plot the scatter diagram. (2 pts.)
plot(popn, qsales, main="Scatter Plot of Data", xlab = "Student Population",
ylab = "Quarterly Sales")
\(~\)
b.) What does the scatter diagram indicate about the relationship between student population and quarterly sales? (1 pt.)
The scatter diagram shows an upward, roughly linear trend: quarterly sales tend to rise as student population rises. This indicates a positive linear relationship between student population and quarterly sales.
\(~\)
c.) Develop the estimated linear regression equation that could be used to predict the quarterly sales from the student population. (4 pts.)
reg.model1 <- lm(qsales~popn)
pander(summary(reg.model1))
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 60 | 9.226 | 6.503 | 0.0001874 |
popn | 5 | 0.5803 | 8.617 | 2.549e-05 |
Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
---|---|---|---|
10 | 13.83 | 0.9027 | 0.8906 |
The estimated linear regression equation is given by: \(\widehat{\text{Quarterly Sales}} = 60 + 5 \times \text{Student Population}\), with sales in $1000s and student population in 1000s.
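As a cross-check, the least-squares estimates can be reproduced by hand from the usual formulas \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below repeats the `popn` and `qsales` vectors so that it runs on its own; the names `sxy`, `sxx`, `b1`, and `b0` are illustrative and not part of the original solution.

```r
# Data from Problem 1 (repeated so the block is self-contained)
popn   <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
qsales <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)

# Sums of cross-products and squares about the means
sxy <- sum((popn - mean(popn)) * (qsales - mean(qsales)))
sxx <- sum((popn - mean(popn))^2)

# Least-squares slope and intercept
b1 <- sxy / sxx
b0 <- mean(qsales) - b1 * mean(popn)

c(intercept = b0, slope = b1)
```

These values should agree with the `(Intercept)` and `popn` estimates reported by `lm()`.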
\(~\)
d.) Compute the Pearson correlation coefficient and the coefficient of determination and interpret these. (4 pts.)
pander(cor.test(popn, qsales, method = "pearson"))
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
8.617 | 8 | 2.549e-05 * * * | two.sided | 0.9501 |
The Pearson correlation coefficient is \(0.9501\), which indicates a very strong, positive linear relationship between Quarterly Sales and Student Population. The coefficient of determination of \(0.9027\), taken from the simple linear regression results, indicates that \(90.27\%\) of the variation in the dependent variable, Quarterly Sales, is explained by its linear relationship with Student Population.
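In simple linear regression, the coefficient of determination is the square of the Pearson correlation coefficient. The short check below illustrates this for the Problem 1 data; the variable names `r`, `r_squared`, and `model_r2` are illustrative.

```r
# Data from Problem 1 (repeated so the block is self-contained)
popn   <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
qsales <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)

# Pearson correlation and its square
r         <- cor(popn, qsales, method = "pearson")
r_squared <- r^2

# R-squared as reported by the fitted regression model
model_r2 <- summary(lm(qsales ~ popn))$r.squared

# TRUE when the two quantities agree up to floating-point tolerance
isTRUE(all.equal(r_squared, model_r2))
```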
\(~\)
Problem 2: A study was made by a retail merchant to determine the relation between weekly advertising expenditure and sales. The following data were recorded:
Advertising Cost (in $) | Sales (in $) |
---|---|
40 | 385 |
20 | 400 |
25 | 395 |
20 | 365 |
30 | 475 |
50 | 440 |
40 | 490 |
20 | 420 |
50 | 560 |
40 | 525 |
25 | 480 |
50 | 510 |
\(~\)
Enter data manually in RStudio:
adcost <- c(40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50)
wksales <- c(385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510)
\(~\)
a.) Plot and interpret the scatter diagram. (3 pts.)
plot(adcost, wksales, main = "Scatter Plot of Data",
xlab = "Advertising Cost", ylab = "Weekly Sales")
\(~\)
The scatter plot shows some degree of upward linear trend as exhibited by the scatter points, which implies a positive, linear relationship between Weekly Sales and Advertising Cost.
\(~\)
b.) Find the estimated linear regression equation to predict weekly sales from weekly advertising expenditures. (4 pts.)
reg.model2 <- lm(wksales~adcost)
pander(summary(reg.model2))
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 343.7 | 44.77 | 7.678 | 1.685e-05 |
adcost | 3.221 | 1.24 | 2.598 | 0.02657 |
Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
---|---|---|---|
12 | 50.23 | 0.403 | 0.3433 |
The estimated linear regression equation is given by: \(\widehat{\text{Weekly Sales}} = 343.7 + 3.221 \times \text{Advertising Cost}\).
\(~\)
c.) Compute the coefficient of correlation. Interpret. (2 pts.)
pander(cor.test(adcost, wksales, method = "pearson"))
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
2.598 | 10 | 0.02657 * | two.sided | 0.6348 |
Assuming that both variables are approximately normally distributed, the Pearson correlation coefficient is \(0.6348\), indicating a strong, positive linear relationship between Advertising Cost and Weekly Sales.
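The test statistic reported by `cor.test()` matches the t value of the slope in the regression table because, in simple linear regression, both equal \(t = r\sqrt{n-2}/\sqrt{1-r^2}\). The sketch below verifies this for the Problem 2 data; the names `t_manual`, `t_cortest`, and `t_slope` are illustrative.

```r
# Data from Problem 2 (repeated so the block is self-contained)
adcost  <- c(40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50)
wksales <- c(385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510)

n <- length(adcost)
r <- cor(adcost, wksales)

# t statistic computed directly from r and n
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# The same statistic as reported by cor.test() and by the regression slope
t_cortest <- unname(cor.test(adcost, wksales, method = "pearson")$statistic)
t_slope   <- summary(lm(wksales ~ adcost))$coefficients["adcost", "t value"]

c(manual = t_manual, cor.test = t_cortest, slope = t_slope)
```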
\(~\)
d.) Compute the coefficient of determination. Interpret. (2 pts.)
The coefficient of determination, as already obtained from the simple linear regression analysis, is \(0.403\), indicating that \(40.3\%\) of the variation in the dependent variable, Weekly Sales, is explained by its linear relationship with Advertising Cost.
\(~\)
e.) Estimate the weekly sales when advertising costs are $35. (2 pts.)
Advertising_Cost <- 35
Weekly_Sales <- 343.7+(3.221*Advertising_Cost)
Weekly_Sales
## [1] 456.435
\(~\)
When the advertising cost is \(\$35\), the estimated weekly sales are \(\$456.44\).
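The estimate above plugs the rounded coefficients into the equation by hand. An alternative, sketched below, is to let `predict()` use the unrounded coefficients stored in the fitted model; the object name `est_sales` is illustrative.

```r
# Data from Problem 2 (repeated so the block is self-contained)
adcost  <- c(40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50)
wksales <- c(385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510)

reg.model2 <- lm(wksales ~ adcost)

# newdata must use the same predictor name as the model formula
est_sales <- predict(reg.model2, newdata = data.frame(adcost = 35))
round(unname(est_sales), 2)
```

The result differs from the hand calculation only in the third decimal place, since the hand calculation uses coefficients rounded to four significant figures.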
\(~\)
Problem 3: The paired data below consists of the costs of advertising (in thousands of dollars) and the number of products sold (in thousands).
Cost | 9 | 2 | 3 | 4 | 2 | 5 | 9 | 10 |
---|---|---|---|---|---|---|---|---|
Number | 85 | 52 | 55 | 68 | 67 | 86 | 83 | 73 |
\(~\)
Create the data manually in RStudio:
cost <- c(9, 2, 3, 4, 2, 5, 9, 10)
number <- c(85, 52, 55, 68, 67, 86, 83, 73)
\(~\)
a.) Plot and interpret the scatter diagram. (3 pts.)
plot(cost, number, main = "Scatter Plot of the Data",
xlab = "Cost of Advertising", ylab = "Number of Products Sold")
\(~\)
The scatter plot shows considerable dispersion among the points, but an overall upward trend is visible. This suggests a positive linear relationship between the Number of Products Sold and the Cost of Advertising.
\(~\)
b.) Find the estimated linear regression equation to predict number of products sold from advertising costs. (4 pts.)
reg.model3 <- lm(number~cost)
pander(summary(reg.model3))
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 55.79 | 7.187 | 7.762 | 0.0002405 |
cost | 2.788 | 1.136 | 2.454 | 0.04954 |
Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
---|---|---|---|
8 | 10.04 | 0.5009 | 0.4177 |
The estimated linear regression equation is given by: \(\widehat{\text{Number}} = 55.79 + 2.788 \times \text{Cost}\), with both variables in thousands.
\(~\)
c.) Compute the coefficient of correlation. Interpret. (2 pts.)
pander(cor.test(cost, number, method = "pearson"))
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
2.454 | 6 | 0.04954 * | two.sided | 0.7077 |
The obtained correlation coefficient of \(0.7077\) indicates a strong, positive linear relationship between the costs of advertising and the number of products sold.
\(~\)
d.) Compute the coefficient of determination. Interpret. (2 pts.)
The coefficient of determination, as obtained from the simple linear regression analysis, is \(0.5009\), which indicates that \(50.09\%\) of the variation in the dependent variable, Number of Products Sold, is explained by its linear relationship with the Cost of Advertising.
\(~\)
e.) Estimate the number of products sold when advertising costs are $4500. (2 pts.)
adcosts <- 4.5
numprod <- 55.79+(2.788*adcosts)
numprod
## [1] 68.336
\(~\)
When the advertising cost is \(\$4500\) (that is, \(4.5\) in thousands of dollars), the estimated number of products sold is \(68.336\) thousand, or about \(68{,}336\) units.
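Because both variables are recorded in thousands, the dollar amount has to be rescaled before prediction and the result rescaled afterwards. The sketch below makes both conversions explicit and uses `predict()` on the fitted model; the names `cost_dollars`, `pred_thousands`, and `pred_units` are illustrative.

```r
# Data from Problem 3 (repeated so the block is self-contained)
cost   <- c(9, 2, 3, 4, 2, 5, 9, 10)            # in thousands of dollars
number <- c(85, 52, 55, 68, 67, 86, 83, 73)     # in thousands of units

reg.model3 <- lm(number ~ cost)

cost_dollars <- 4500

# Convert dollars to thousands of dollars before predicting
pred_thousands <- predict(reg.model3,
                          newdata = data.frame(cost = cost_dollars / 1000))

# Convert the prediction from thousands of units back to units
pred_units <- unname(pred_thousands) * 1000
round(pred_units)
```

Using the unrounded coefficients can shift the final figure by about one unit relative to the hand calculation with rounded coefficients.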
\(~\)
Problem 4: An article in Business Week listed the “Best Small Companies” with their sales and earnings. A random sample of 12 companies was selected, and the sales and earnings, in millions of dollars, are reported below.
Small Company | Sales (in million $) | Earnings (in million $) |
---|---|---|
1 | 89.2 | 4.9 |
2 | 18.6 | 4.4 |
3 | 18.2 | 1.3 |
4 | 71.7 | 8.0 |
5 | 58.6 | 6.6 |
6 | 46.8 | 4.1 |
7 | 17.5 | 2.6 |
8 | 11.9 | 1.7 |
9 | 19.6 | 3.5 |
10 | 51.2 | 8.2 |
11 | 28.6 | 6.0 |
12 | 69.2 | 12.8 |
\(~\)
Create data manually in RStudio:
sales1 <- c(89.2, 18.6, 18.2, 71.7, 58.6, 46.8, 17.5, 11.9, 19.6, 51.2, 28.6, 69.2)
earnings <- c(4.9, 4.4, 1.3, 8.0, 6.6, 4.1, 2.6, 1.7, 3.5, 8.2, 6.0, 12.8)
\(~\)
a.) Plot and interpret the scatter diagram.
plot(sales1, earnings, main = "Scatter Plot of Data",
xlab = "Sales (in million $)", ylab = "Earnings (in million $)")
\(~\)
The scatter plot indicates an upward linear trend of the scatter points. This would imply a positive linear relationship between Sales and Earnings.
\(~\)
b.) Find the estimated linear regression equation to predict earnings from sales. (4 pts.)
reg.model4 <- lm(earnings~sales1)
pander(summary(reg.model4))
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 1.852 | 1.413 | 1.311 | 0.2192 |
sales1 | 0.08357 | 0.02901 | 2.881 | 0.01635 |
Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
---|---|---|---|
12 | 2.518 | 0.4536 | 0.399 |
The estimated linear regression equation is given by: \(\widehat{\text{Earnings}} = 1.852 + 0.08357 \times \text{Sales}\), with both variables in millions of dollars.
\(~\)
c.) Compute the coefficient of correlation. Interpret. (2 pts.)
pander(cor.test(sales1, earnings, method = "pearson"))
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
2.881 | 10 | 0.01635 * | two.sided | 0.6735 |
The obtained correlation coefficient of \(0.6735\) indicates a strong, positive linear relationship between Sales and Earnings.
\(~\)
d.) Compute the coefficient of determination. Interpret. (2 pts.)
The coefficient of determination of \(0.4536\), obtained from the simple linear regression analysis, indicates that \(45.36\%\) of the variation in the dependent variable, Earnings, is explained by its linear relationship with Sales.
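The "proportion of variation explained" reading comes from the decomposition \(R^2 = 1 - SSE/SST\), where \(SSE\) is the residual sum of squares and \(SST\) is the total sum of squares about the mean. The sketch below computes it directly for the Problem 4 data; the names `fit`, `sse`, `sst`, and `r2_manual` are illustrative.

```r
# Data from Problem 4 (repeated so the block is self-contained)
sales1   <- c(89.2, 18.6, 18.2, 71.7, 58.6, 46.8, 17.5, 11.9, 19.6, 51.2, 28.6, 69.2)
earnings <- c(4.9, 4.4, 1.3, 8.0, 6.6, 4.1, 2.6, 1.7, 3.5, 8.2, 6.0, 12.8)

fit <- lm(earnings ~ sales1)

# Unexplained variation: squared residuals from the fitted line
sse <- sum(residuals(fit)^2)

# Total variation: squared deviations of earnings from their mean
sst <- sum((earnings - mean(earnings))^2)

r2_manual <- 1 - sse / sst
round(r2_manual, 4)
```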
\(~\)
e.) For a small company with $50 million in sales, estimate the earnings. (2 pts.)
comsales <- 50
comearnings <- 1.852+(0.08357*comsales)
comearnings
## [1] 6.0305
\(~\)
For a company with \(\$50\) million in sales, the estimated earnings are \(\$6.0305\) million.