LEARNING REINFORCEMENT ACTIVITY NO. 11-2: Simple Linear Regression

library(readr)  # loaded but not strictly needed: read.csv() below is base R

PROBLEM 1

Suppose data were collected from a sample of 10 branches of a pizza restaurant chain located near college campuses.

Problem 1a

Plot the scatter diagram.

pizza <- read.csv("pizza.csv")  # X = student population (1000s), Y = quarterly sales ($1000s)
head(pizza)
##    X   Y
## 1  2  58
## 2  6 105
## 3  8  88
## 4  8 118
## 5 12 117
## 6 16 137
plot(pizza$X, pizza$Y, main = "Scatter Plot of Data from a Pizza Restaurant", xlab = "Student Population in 1000s", ylab = "Quarterly Sales in 1000 dollars")

Problem 1b

What does the scatter diagram indicate about the relationship between student population and quarterly sales?

The points follow an upward, roughly linear pattern. This suggests a positive linear relationship between the student population of the nearby campus and the quarterly sales of the restaurant.
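
Overlaying the least-squares line on the scatter plot makes it easier to judge the linear trend by eye. The sketch below is self-contained. It assumes only the six rows printed by head(pizza), while the full file has 10 branches, so its line will differ from the full-data fit. With the complete data, one could call abline(pizzamodel) after the plot() call above once pizzamodel has been fitted in Problem 1c. The sketch also checks a general property of least squares: the fitted line passes through the point of means (x-bar, y-bar).

```r
# Illustrative input: only the six rows shown by head(pizza)
pizza_head <- data.frame(X = c(2, 6, 8, 8, 12, 16),
                         Y = c(58, 105, 88, 118, 117, 137))
fit <- lm(Y ~ X, data = pizza_head)

# Scatter plot with the least-squares line drawn on top
plot(pizza_head$X, pizza_head$Y,
     xlab = "Student Population in 1000s",
     ylab = "Quarterly Sales in 1000 dollars")
abline(fit, col = "blue")
# Mark the point of means (x-bar, y-bar)
points(mean(pizza_head$X), mean(pizza_head$Y), pch = 4, col = "red")

# The fitted value at x-bar equals y-bar for any least-squares line with intercept
y_at_mean_x <- predict(fit, newdata = data.frame(X = mean(pizza_head$X)))
print(c(y_at_mean_x = unname(y_at_mean_x), mean_y = mean(pizza_head$Y)))
```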

Problem 1c

Develop the estimated linear regression equation that could be used to predict the quarterly sales from the student population.

pizzamodel<-lm(Y~X,data=pizza)
summary(pizzamodel)
## 
## Call:
## lm(formula = Y ~ X, data = pizza)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -21.00  -9.75  -3.00  11.25  18.00 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  60.0000     9.2260   6.503 0.000187 ***
## X             5.0000     0.5803   8.617 2.55e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.83 on 8 degrees of freedom
## Multiple R-squared:  0.9027, Adjusted R-squared:  0.8906 
## F-statistic: 74.25 on 1 and 8 DF,  p-value: 2.549e-05

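The estimated linear regression equation is Ŷ = 60 + 5X. Here X is the student population (in 1000s) and Ŷ is the predicted quarterly sales (in $1000s). Each additional 1000 students is associated with an increase of about $5000 in quarterly sales.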
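
As a cross-check on lm(), the least-squares estimates can also be computed from their closed-form definitions:

- b1 = Sxy / Sxx
- b0 = y-bar - b1 * x-bar

Here Sxy = sum((x - x-bar)(y - y-bar)) and Sxx = sum((x - x-bar)^2). This minimal sketch assumes only the six rows printed by head(pizza), so its coefficients will not be 60 and 5. Running the same lines on the full pizza data frame should reproduce the summary() output above.

```r
# Illustrative input: only the six rows shown by head(pizza)
x <- c(2, 6, 8, 8, 12, 16)          # student population (1000s)
y <- c(58, 105, 88, 118, 117, 137)  # quarterly sales ($1000s)

# Least-squares slope and intercept from sums of squares and cross-products
sxy <- sum((x - mean(x)) * (y - mean(y)))
sxx <- sum((x - mean(x))^2)
b1 <- sxy / sxx
b0 <- mean(y) - b1 * mean(x)

# lm() should agree up to floating-point error
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```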

Problem 1d

Compute the Pearson correlation coefficient and the coefficient of determination and interpret these.

cor.test(pizza$X, pizza$Y, method="pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  pizza$X and pizza$Y
## t = 8.6167, df = 8, p-value = 2.549e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7976967 0.9884414
## sample estimates:
##      cor 
## 0.950123
pizzadetermination <- 0.950123^2  # r copied from cor.test(); summary(pizzamodel)$r.squared avoids retyping it
pizzadetermination
## [1] 0.9027337

The Pearson correlation coefficient is 0.9501, and the coefficient of determination is 0.9027.

There is a very strong positive linear correlation between the student population of the nearby campus and the quarterly sales of the restaurant, as indicated by r = 0.9501 (p = 2.549e-05). As the student population increases, quarterly sales tend to increase.

The coefficient of determination means that 90.27% of the total variation in the quarterly sales can be explained by the linear relationship between the student population and the quarterly sales.
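
The solution above squares a value of r copied by hand from the cor.test() printout (0.950123), so rounding carries into the result. A minimal sketch of an alternative, again assuming only the six rows shown by head(pizza): compute r from its definition r = Sxy / sqrt(Sxx * Syy), then compare it with cor() and with the R-squared that lm() already stores. On the full data, summary(pizzamodel)$r.squared would return the coefficient of determination directly.

```r
# Illustrative input: only the six rows shown by head(pizza)
x <- c(2, 6, 8, 8, 12, 16)
y <- c(58, 105, 88, 118, 117, 137)

sxy <- sum((x - mean(x)) * (y - mean(y)))
sxx <- sum((x - mean(x))^2)
syy <- sum((y - mean(y))^2)
r <- sxy / sqrt(sxx * syy)                  # Pearson r from its definition

r_builtin <- cor(x, y)                      # same value from base R
r2_model  <- summary(lm(y ~ x))$r.squared   # R-squared stored by lm()
print(c(r = r, r_builtin = r_builtin, r_squared = r^2, r2_model = r2_model))
```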

PROBLEM 2

A retail merchant conducted a study to determine the relationship between weekly advertising expenditures and weekly sales.

Problem 2a

Plot and interpret the scatter diagram.

retail <- read.csv("retail.csv")  # X = weekly advertising cost, Y = weekly sales
head(retail)
##    X   Y
## 1 40 385
## 2 20 400
## 3 25 395
## 4 20 365
## 5 30 475
## 6 50 440
plot(retail$X, retail$Y, main = "Scatter Plot of Data from a Retail Merchant", xlab = "Weekly Advertising Costs", ylab = "Weekly Sales")

The points show a positive, roughly linear trend, but they scatter more widely around a straight line than in Problem 1. This suggests a moderate positive linear relationship between weekly advertising costs and weekly sales.

Problem 2b

Find the estimated linear regression equation to predict weekly sales from weekly advertising expenditures.

retailmodel<-lm(Y~X,data=retail)
summary(retailmodel)
## 
## Call:
## lm(formula = Y ~ X, data = retail)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -87.538 -32.700   8.566  39.118  55.774 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  343.706     44.766   7.678 1.68e-05 ***
## X              3.221      1.240   2.598   0.0266 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.23 on 10 degrees of freedom
## Multiple R-squared:  0.403,  Adjusted R-squared:  0.3433 
## F-statistic: 6.751 on 1 and 10 DF,  p-value: 0.02657

The estimated linear regression equation is Ŷ = 343.706 + 3.221X. Here X is the weekly advertising expenditure and Ŷ is the predicted weekly sales.

Problem 2c

Compute the coefficient of correlation. Interpret.

cor.test(retail$X, retail$Y, method="pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  retail$X and retail$Y
## t = 2.5983, df = 10, p-value = 0.02657
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.09586113 0.88595517
## sample estimates:
##       cor 
## 0.6348373

The coefficient of correlation is 0.6348. This indicates a moderate positive linear correlation between weekly advertising costs and weekly sales, and the correlation is significant at the 5% level (p = 0.02657). As advertising costs increase, sales tend to increase.
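
The t statistic from cor.test() (t = 2.5983, p = 0.02657) matches the t value and p-value for the slope in the regression output of Problem 2b. In simple linear regression, testing H0: rho = 0 is equivalent to testing H0: beta1 = 0. The statistic can be written as t = r * sqrt((n - 2) / (1 - r^2)); with r = 0.6348373 and n = 12 this gives about 2.598. The sketch below checks the equivalence on the six rows shown by head(retail), used here only as illustrative input.

```r
# Illustrative input: only the six rows shown by head(retail)
x <- c(40, 20, 25, 20, 30, 50)
y <- c(385, 400, 395, 365, 475, 440)
n <- length(x)

r <- cor(x, y)
t_from_r <- r * sqrt((n - 2) / (1 - r^2))   # t for H0: rho = 0, from r alone

t_cor   <- unname(cor.test(x, y)$statistic)                  # t reported by cor.test()
t_slope <- summary(lm(y ~ x))$coefficients["x", "t value"]   # t for the slope in lm()
print(c(t_from_r = t_from_r, t_cor = t_cor, t_slope = t_slope))
```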

Problem 2d

Compute the coefficient of determination. Interpret.

retaildetermination <- 0.6348373^2  # square of r from cor.test() above
retaildetermination
## [1] 0.4030184

The coefficient of determination is 0.4030, which means that 40.30% of the total variation in weekly sales can be explained by the linear relationship between weekly advertising costs and weekly sales.
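
The phrase "proportion of total variation explained" refers to the decomposition SST = SSR + SSE, where:

- SST = sum((y - y-bar)^2)
- SSR = sum((y-hat - y-bar)^2)
- SSE = sum((y - y-hat)^2)

and r^2 = SSR / SST = 1 - SSE / SST. A short sketch, assuming the six rows of head(retail) as illustrative input rather than the full data:

```r
# Illustrative input: only the six rows shown by head(retail)
x <- c(40, 20, 25, 20, 30, 50)
y <- c(385, 400, 395, 365, 475, 440)
fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)             # total variation in y
ssr <- sum((fitted(fit) - mean(y))^2)   # variation explained by the fitted line
sse <- sum(residuals(fit)^2)            # variation left unexplained

r2 <- ssr / sst
print(c(SST = sst, SSR = ssr, SSE = sse, R2 = r2))
```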

Problem 2e

Estimate the weekly sales when advertising costs are $35.

343.706 + (3.221 * 35)  # b0 + b1 * X with X = 35
## [1] 456.441

The estimated weekly sales are $456.44 when advertising costs are $35.
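
The hand calculation above uses coefficients rounded to three decimals. An alternative is predict() with a newdata data frame whose column name matches the predictor in the model formula (here X). Because predict() uses the full-precision coefficients, predict(retailmodel, newdata = data.frame(X = 35)) on the full data may differ slightly from 456.441 in the later decimals. To keep the sketch self-contained, it fits a model to the six rows of head(retail) only.

```r
# Illustrative input: only the six rows shown by head(retail)
retail_head <- data.frame(X = c(40, 20, 25, 20, 30, 50),
                          Y = c(385, 400, 395, 365, 475, 440))
fit <- lm(Y ~ X, data = retail_head)

# newdata must contain a column named X, matching the formula Y ~ X
y_hat <- predict(fit, newdata = data.frame(X = 35))

# The same prediction written out with the unrounded coefficients
y_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["X"]] * 35
print(c(predict = unname(y_hat), by_hand = y_hand))
```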

PROBLEM 3

The paired data below consist of the costs of advertising (in thousands of dollars) and the number of products sold (in thousands).

Problem 3a

Plot and interpret the scatter diagram.

cost <- read.csv("cost.csv")  # X = advertising cost ($1000s), Y = products sold (1000s)
head(cost)
##   X  Y
## 1 9 85
## 2 2 52
## 3 3 55
## 4 4 68
## 5 2 67
## 6 5 86
plot(cost$X, cost$Y, main = "Scatter Plot of Advertising Cost vs. Products Sold", xlab = "Cost of Advertising in thousands of dollars", ylab = "Products Sold in thousands")

The points show an upward, roughly linear trend with moderate scatter. This suggests a positive linear relationship between advertising costs and the number of products sold.

Problem 3b

Find the estimated linear regression equation to predict number of products sold from advertising costs.

costmodel<-lm(Y~X,data=cost)
summary(costmodel)
## 
## Call:
## lm(formula = Y ~ X, data = cost)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.673  -9.207   1.587   4.495  16.269 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   55.788      7.187   7.762  0.00024 ***
## X              2.788      1.136   2.454  0.04954 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.04 on 6 degrees of freedom
## Multiple R-squared:  0.5009, Adjusted R-squared:  0.4177 
## F-statistic: 6.021 on 1 and 6 DF,  p-value: 0.04954

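The estimated linear regression equation is Ŷ = 55.788 + 2.788X. Here X is the advertising cost (in thousands of dollars) and Ŷ is the predicted number of products sold (in thousands).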

Problem 3c

Compute the coefficient of correlation. Interpret.

cor.test(cost$X, cost$Y, method="pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  cost$X and cost$Y
## t = 2.4538, df = 6, p-value = 0.04954
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.0060812 0.9424054
## sample estimates:
##       cor 
## 0.7077214

The coefficient of correlation is 0.7077. There is a strong positive correlation between the cost of advertising and the number of products sold, as indicated by r = 0.7077. The correlation is significant at the 5% level, though only barely (p = 0.04954). As advertising costs increase, the number of products sold tends to increase.

Problem 3d

Compute the coefficient of determination. Interpret.

costdetermination <- 0.7077214^2  # square of r from cor.test() above
costdetermination
## [1] 0.5008696

The coefficient of determination is 0.5009, which means that 50.09% of the total variation in the number of products sold can be explained by the linear relationship between the cost of advertising and the number of products sold.

Problem 3e

Estimate the number of products sold when advertising costs are $4500.

55.788 + (2.788 * 4.5)  # X is in thousands of dollars, so $4500 enters as 4.5
## [1] 68.334

Because Y is measured in thousands, the estimated number of products sold is 68.334 thousand, or about 68,334 products, when advertising costs are $4500.
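
Because both variables are recorded in thousands, the input must be scaled down before it enters the equation and the prediction scaled back up afterwards. A sketch of that arithmetic using the coefficients printed in Problem 3b (the variable names are illustrative):

```r
# Coefficients as printed by summary(costmodel); X and Y are both in thousands
b0 <- 55.788
b1 <- 2.788

ad_cost_dollars <- 4500
x_thousands <- ad_cost_dollars / 1000      # $4500 -> 4.5 (thousands of dollars)
y_thousands <- b0 + b1 * x_thousands       # predicted sales in thousands of products
products_sold <- y_thousands * 1000        # back to individual products
print(c(y_thousands = y_thousands, products_sold = products_sold))
```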

PROBLEM 4

An article in Business Week listed the “Best Small Companies” along with their sales and earnings. A random sample of 12 companies was selected, and their sales and earnings, in millions of dollars, are reported below.

Problem 4a

Plot and interpret the scatter diagram.

small <- read.csv("small.csv")  # X = sales, Y = earnings, both in $ millions
head(small)
##      X   Y
## 1 89.2 4.9
## 2 18.6 4.4
## 3 18.2 1.3
## 4 71.7 8.0
## 5 58.6 6.6
## 6 46.8 4.1
plot(small$X, small$Y, main = "Scatter Plot of Data from 12 Small Companies", xlab = "Sales in million dollars", ylab = "Earnings in million dollars")

The points show an upward, roughly linear trend with considerable scatter. This suggests a moderate positive linear relationship between sales and earnings.

Problem 4b

Find the estimated linear regression equation to predict earnings from sales.

smallmodel<-lm(Y~X,data=small)
summary(smallmodel)
## 
## Call:
## lm(formula = Y ~ X, data = small)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4066 -1.2755 -0.0695  1.1848  5.1649 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  1.85174    1.41257   1.311   0.2192  
## X            0.08357    0.02901   2.881   0.0164 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.518 on 10 degrees of freedom
## Multiple R-squared:  0.4536, Adjusted R-squared:  0.399 
## F-statistic: 8.302 on 1 and 10 DF,  p-value: 0.01635

The estimated linear regression equation is Ŷ = 1.85174 + 0.08357X. Here X is sales and Ŷ is predicted earnings, both in millions of dollars.

Problem 4c

Compute the coefficient of correlation. Interpret.

cor.test(small$X, small$Y, method="pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  small$X and small$Y
## t = 2.8813, df = 10, p-value = 0.01635
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1623492 0.8996616
## sample estimates:
##       cor 
## 0.6734993

The coefficient of correlation is 0.6735. There is a moderately strong positive correlation between the sales and earnings of the companies, as indicated by r = 0.6735 (p = 0.01635). As sales increase, earnings tend to increase. The 95% confidence interval for the population correlation (0.1623 to 0.8997) is wide, which reflects the small sample of 12 companies.
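
The 95% confidence interval reported by cor.test() comes from Fisher's z-transformation. The quantity z = atanh(r) is approximately normal with standard error 1 / sqrt(n - 3), and the interval tanh(z ± 1.96 * SE) maps it back to the correlation scale. With r = 0.6734993 and n = 12, this gives roughly 0.162 to 0.900, matching the output above. The sketch below reproduces cor.test()'s interval on the six rows shown by head(small), used only as illustrative input.

```r
# Illustrative input: only the six rows shown by head(small)
x <- c(89.2, 18.6, 18.2, 71.7, 58.6, 46.8)   # sales, $ millions
y <- c(4.9, 4.4, 1.3, 8.0, 6.6, 4.1)         # earnings, $ millions
n <- length(x)

r <- cor(x, y)
z <- atanh(r)                                    # Fisher z-transform of r
se_z <- 1 / sqrt(n - 3)                          # approximate standard error of z
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)   # back-transform to the r scale

ci_builtin <- as.numeric(cor.test(x, y)$conf.int)
print(rbind(by_hand = ci, cor_test = ci_builtin))
```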

Problem 4d

Compute the coefficient of determination. Interpret.

smalldetermination <- 0.6734993^2  # square of r from cor.test() above
smalldetermination
## [1] 0.4536013

The coefficient of determination is 0.4536, which means that 45.36% of the total variation in earnings can be explained by the linear relationship between sales and earnings.

Problem 4e

For a small company with $50 million in sales, estimate the earnings.

1.85174 + (0.08357 * 50)  # b0 + b1 * X with X = 50 ($ millions)
## [1] 6.03024

The estimated earnings are about $6.03 million for a small company with sales of $50 million.