library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(skimr)
library(radiant)
## Loading required package: radiant.data
## Loading required package: magrittr
## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
## Loading required package: tidyr
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract
## 
## Attaching package: 'radiant.data'
## The following objects are masked from 'package:lubridate':
## 
##     month, wday
## The following object is masked from 'package:skimr':
## 
##     n_missing
## The following object is masked from 'package:ggplot2':
## 
##     diamonds
## The following object is masked from 'package:base':
## 
##     date
## Loading required package: radiant.design
## Loading required package: mvtnorm
## Loading required package: radiant.basics
## Loading required package: radiant.model
## Loading required package: radiant.multivariate
library(esquisse)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
library(gvlma)

Marks and Spencer

The Marks and Spencer food division has a problem and you have previously recommended a course of action for solving the problem. Their original problem was to compare a Tikka Masala and Vindaloo because of the supplier disruptions. Both products were produced by the same vendor and the vendor could no longer fully supply Marks and Spencer’s required quantities for both products. The vendor can continue to supply one of the two products but a new vendor will be needed if both products were continued. The company has decided to leave the Tikka Masala product with the existing vendor and to seek out a new vendor for the Vindaloo product. Your task is to help them choose a vendor among CRU Curries R Us and TOI Taste of India; two UK based vendors that Marks and Spencer has worked with in the past. Variables with names starting with Stacked indicate stacked/tidy versions.

The prior recipe has a trademarked sauce that is an ingredient in the current product. The product must be changed.

Ingredients

  1. The two vendors have supplied random samples of costs for their own sauce replacements Sauce.Cost and some (self-reported) customer purchase intentions (Buy/NotBuy) to help us choose. Customer opinions matter. These data are supplied in Ingredients in both stacked and unstacked form.
names(Ingredients)
## [1] "TOISauceCost"     "CRUSauceCost"     "StackedSauceCost" "StackedVendor"   
## [5] "TOIPurchase"      "CRUPurchase"      "StackedPurchase"
skim(Ingredients)
Data summary
Name Ingredients
Number of rows 145
Number of columns 7
_______________________
Column type frequency:
character 4
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
StackedVendor 0 1.00 3 3 0 2 0
TOIPurchase 81 0.44 3 6 0 2 0
CRUPurchase 64 0.56 3 6 0 2 0
StackedPurchase 0 1.00 3 6 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
TOISauceCost 81 0.44 2.46 0.25 1.95 2.29 2.42 2.63 3.00 ▂▇▇▅▃
CRUSauceCost 64 0.56 2.59 0.13 2.23 2.50 2.59 2.68 2.87 ▁▃▇▇▂
StackedSauceCost 0 1.00 2.54 0.20 1.95 2.41 2.55 2.68 3.00 ▁▃▇▇▂

Ratings

  1. Our marketing team has called upon focus groups to compare the products. These customer ratings Ratings [from 0 to 100 with 0 as worst and 100 as best] are provided for 50 consumers that evaluated both products.
names(Ratings)
## [1] "Rater"          "CRU.Ratings"    "TOI.Ratings"    "Stacked.Rating"
## [5] "Stacked.Vendor" "Stacked.Rater"
skim(Ratings)
Data summary
Name Ratings
Number of rows 100
Number of columns 6
_______________________
Column type frequency:
character 1
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Stacked.Vendor 0 1 3 3 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Rater 50 0.5 25.50 14.58 1 13.25 25.5 37.75 50 ▇▇▇▇▇
CRU.Ratings 50 0.5 76.04 3.57 68 73.00 76.0 78.75 87 ▁▇▇▃▁
TOI.Ratings 50 0.5 74.70 4.60 65 72.00 75.0 77.00 85 ▃▅▇▂▂
Stacked.Rating 0 1.0 75.37 4.15 65 73.00 75.0 78.00 87 ▂▅▇▃▁
Stacked.Rater 0 1.0 25.50 14.50 1 13.00 25.5 38.00 50 ▇▇▇▇▇

Financials

  1. Bottom line financials are most important. The product was not outstanding with the existing supplier. We expect the performance to be no better with a new vendor and a new recipe, but it is important that we pick the best vendor. These data are supplied in Financials. The data shows Costs and Boxes for each vendor over the last 52 weeks: CRU.Costs and CRU.Boxes provide the observations for Curries R Us. TOI.Costs and TOI.Boxes provide the observations for Taste of India.
names(Financials)
## [1] "Week"      "CRU.Boxes" "CRU.Costs" "TOI.Boxes" "TOI.Costs"
skim(Financials)
Data summary
Name Financials
Number of rows 52
Number of columns 5
_______________________
Column type frequency:
numeric 5
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Week 0 1 26.50 15.15 1.00 13.75 26.50 39.25 52.00 ▇▇▇▇▇
CRU.Boxes 0 1 23819.38 2685.74 17402.00 21854.75 24072.50 25475.75 29623.00 ▂▇▇▇▂
CRU.Costs 0 1 73828.91 7660.47 53113.02 68879.12 73832.18 78487.08 93313.29 ▁▃▇▆▁
TOI.Boxes 0 1 25543.46 4847.48 16364.00 21477.25 25063.00 28832.00 34973.00 ▃▇▇▆▅
TOI.Costs 0 1 64858.44 11017.27 40945.97 56265.18 65721.95 73525.62 89363.09 ▂▇▆▆▂

Instructions

Simple computer code and/or output is never an adequate response; responses should be provided and discussed in their appropriate metrics. Answer the questions clearly interpreting the evidence. You must do your own work; an honor pledge is provided and required for submission.

Ratings

  1. Provide some graphic that adequately captures and compares the core elements of the ratings data. What does it show?
ggplot(Ratings) +
 aes(x = Stacked.Rating, fill = Stacked.Vendor) +
 geom_histogram(bins = 30L) +
 scale_fill_viridis_d(option = "viridis") +
 theme_minimal()

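ANSWER: The histogram shows the distribution of ratings for each vendor. CRU’s ratings are more tightly clustered (standard deviation 3.57, range 68 to 87), so consumer feedback is more consistent. TOI’s ratings are more spread out (standard deviation 4.60, range 65 to 85) and reach lower values.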

  1. Provide summary statistics for the Ratings for each producer. Compare the means and standard deviations. What is the interquartile range for each?
summary(Ratings)
##      Rater        CRU.Ratings     TOI.Ratings   Stacked.Rating 
##  Min.   : 1.00   Min.   :68.00   Min.   :65.0   Min.   :65.00  
##  1st Qu.:13.25   1st Qu.:73.00   1st Qu.:72.0   1st Qu.:73.00  
##  Median :25.50   Median :76.00   Median :75.0   Median :75.00  
##  Mean   :25.50   Mean   :76.04   Mean   :74.7   Mean   :75.37  
##  3rd Qu.:37.75   3rd Qu.:78.75   3rd Qu.:77.0   3rd Qu.:78.00  
##  Max.   :50.00   Max.   :87.00   Max.   :85.0   Max.   :87.00  
##  NA's   :50      NA's   :50      NA's   :50                    
##  Stacked.Vendor     Stacked.Rater 
##  Length:100         Min.   : 1.0  
##  Class :character   1st Qu.:13.0  
##  Mode  :character   Median :25.5  
##                     Mean   :25.5  
##                     3rd Qu.:38.0  
##                     Max.   :50.0  
## 

ANSWER: CRU: average rating = 76.04, standard deviation = 3.57. TOI: average rating = 74.70, standard deviation = 4.60.

CRU has the higher average rating. TOI’s standard deviation is larger, so its ratings are more spread out. The standard deviations come from the skim(Ratings) output, since summary() does not report them.

CRU interquartile range: 78.75 - 73.00 = 5.75. TOI interquartile range: 77.00 - 72.00 = 5.00.

  1. What is the 95% confidence interval for average ratings for each producer?
t.test(Ratings$CRU.Ratings)
## 
##  One Sample t-test
## 
## data:  Ratings$CRU.Ratings
## t = 150.44, df = 49, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  75.02426 77.05574
## sample estimates:
## mean of x 
##     76.04
t.test(Ratings$TOI.Ratings)
## 
##  One Sample t-test
## 
## data:  Ratings$TOI.Ratings
## t = 114.85, df = 49, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  73.39291 76.00709
## sample estimates:
## mean of x 
##      74.7

ANSWER: CRU: the 95% confidence interval for the average rating is 75.02 to 77.06. TOI: the 95% confidence interval for the average rating is 73.39 to 76.01.
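The intervals can be reproduced from summary statistics alone. The following is a minimal sketch, assuming the rounded means and standard deviations reported by skim(Ratings) and n = 50 per vendor; ci_mean is a hypothetical helper name, not a function from any package loaded above.

```r
# Hypothetical helper: t-based confidence interval for a mean,
# computed from summary statistics rather than the raw data.
ci_mean <- function(mean, sd, n, level = 0.95) {
  se <- sd / sqrt(n)                             # standard error of the mean
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)   # two-sided t critical value
  c(lower = mean - tcrit * se, upper = mean + tcrit * se)
}

# Assumed inputs: rounded values from skim(Ratings). Small rounding
# differences from t.test() on the raw data are expected.
cru_ci <- ci_mean(mean = 76.04, sd = 3.57, n = 50)
toi_ci <- ci_mean(mean = 74.70, sd = 4.60, n = 50)
print(round(cru_ci, 2))
print(round(toi_ci, 2))
```

Because the inputs are rounded, the result should agree with the t.test() output only to about two decimal places.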

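  1. Is either product better with 95% confidence? Which product is better? ANSWER: CRU has the higher sample mean (76.04 versus 74.70), but the two one-sample intervals overlap (75.02 to 77.06 for CRU, 73.39 to 76.01 for TOI), so they do not by themselves show that either product is better with 95% confidence. Because the same 50 consumers rated both products, the appropriate comparison is a paired interval for the difference CRU.Ratings - TOI.Ratings, which is not computed above.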
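Since each of the 50 raters scored both products, a paired comparison is the natural check for this question. The sketch below uses simulated placeholder ratings (not the Ratings data) purely to show the mechanics. With the real data, the two vectors would be the non-missing values of Ratings$CRU.Ratings and Ratings$TOI.Ratings, assuming they are stored in the same rater order.

```r
# Simulated placeholder data: NOT the Ratings data, only an illustration.
set.seed(1)
cru <- round(rnorm(50, mean = 76, sd = 3.5))
toi <- round(cru - rnorm(50, mean = 1, sd = 4))

# A paired t-test is equivalent to a one-sample t-test on the differences.
paired <- t.test(cru, toi, paired = TRUE)
diffs  <- t.test(cru - toi)

# One product is better with 95% confidence only if the interval
# for the mean difference excludes zero.
ci <- as.numeric(paired$conf.int)
excludes_zero <- ci[1] > 0 || ci[2] < 0
print(ci)
print(excludes_zero)
```

The paired design removes rater-to-rater variation, which is why it can detect a difference that two overlapping one-sample intervals would miss.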

Ingredients

Sauce Cost

  1. Provide summary statistics and some combined graphic for the SauceCost for each vendor. Interpret it to compare the two vendors.
Ingredients%>%select(CRUSauceCost, TOISauceCost) %>% skim()
Data summary
Name Piped data
Number of rows 145
Number of columns 2
_______________________
Column type frequency:
numeric 2
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
CRUSauceCost 64 0.56 2.59 0.13 2.23 2.50 2.59 2.68 2.87 ▁▃▇▇▂
TOISauceCost 81 0.44 2.46 0.25 1.95 2.29 2.42 2.63 3.00 ▂▇▇▅▃
ggplot(Ingredients) +
 aes(x = StackedSauceCost, fill = StackedVendor) +
 geom_density(adjust = 1L) +
 scale_fill_viridis_d(option = "viridis") +
 theme_minimal()

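ANSWER: CRU: mean = 2.59, standard deviation = 0.128. TOI: mean = 2.46, standard deviation = 0.249.

On average CRU’s sauce costs more than TOI’s (2.59 versus 2.46), but CRU’s costs are more consistent: its standard deviation is about half of TOI’s.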

  1. Assume that costs are normal with mean and standard deviation exactly as you have calculated above.
    1. What is the probability of costs below $2.35 for Curries R Us?
result <- prob_norm(mean = 2.59, stdev = 0.128, ub = 2.35)
summary(result)
## Probability calculator
## Distribution: Normal
## Mean        : 2.59 
## St. dev     : 0.128 
## Lower bound : -Inf 
## Upper bound : 2.35 
## 
## P(X < 2.35) = 0.03
## P(X > 2.35) = 0.97
plot(result)

result <- single_mean(
  Ingredients, 
  var = "CRUSauceCost", 
  comp_value = 2.35, 
  alternative = "less"
)
summary(result)
## Single mean test
## Data      : Ingredients 
## Variable  : CRUSauceCost 
## Confidence: 0.95 
## Null hyp. : the mean of CRUSauceCost = 2.35 
## Alt. hyp. : the mean of CRUSauceCost is < 2.35 
## 
##   mean   n n_missing    sd    se    me
##  2.594 145        64 0.128 0.014 0.028
## 
##   diff    se t.value p.value df   0%   95%  
##  0.244 0.014  17.157       1 80 -Inf 2.618  
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result, plots = "hist", custom = FALSE)
## Warning: Removed 64 rows containing non-finite values (stat_bin).

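ANSWER: P(X < 2.35) = 0.03, so there is about a 3% probability that CRU’s cost falls below $2.35. The remaining 97% is the probability of a cost above $2.35.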

b. What is the probability of costs below \$2.35 for Taste of India?
result <- prob_norm(mean = 2.46, stdev = 0.249, ub = 2.35)
summary(result)
## Probability calculator
## Distribution: Normal
## Mean        : 2.46 
## St. dev     : 0.249 
## Lower bound : -Inf 
## Upper bound : 2.35 
## 
## P(X < 2.35) = 0.329
## P(X > 2.35) = 0.671
plot(result)

result <- single_mean(
  Ingredients, 
  var = "TOISauceCost", 
  comp_value = 2.35, 
  alternative = "less"
)
summary(result)
## Single mean test
## Data      : Ingredients 
## Variable  : TOISauceCost 
## Confidence: 0.95 
## Null hyp. : the mean of TOISauceCost = 2.35 
## Alt. hyp. : the mean of TOISauceCost is < 2.35 
## 
##   mean   n n_missing    sd    se    me
##  2.465 145        81 0.249 0.031 0.062
## 
##   diff    se t.value p.value df   0%   95%  
##  0.115 0.031   3.688       1 63 -Inf 2.517  
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result, plots = "hist", custom = FALSE)
## Warning: Removed 81 rows containing non-finite values (stat_bin).

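ANSWER: P(X < 2.35) = 0.329, so there is about a 32.9% probability that TOI’s cost falls below $2.35. The 67.1% figure is the probability of a cost above $2.35.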
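Both lower-tail probabilities can be checked with base R’s pnorm(). This is a small sketch under the same normality assumption and the same rounded means and standard deviations passed to prob_norm() above.

```r
# Assumed inputs: the rounded means and standard deviations used in prob_norm().
p_cru <- pnorm(2.35, mean = 2.59, sd = 0.128)   # P(cost < 2.35) for CRU
p_toi <- pnorm(2.35, mean = 2.46, sd = 0.249)   # P(cost < 2.35) for TOI
print(round(c(CRU = p_cru, TOI = p_toi), 3))
```

pnorm() returns the lower tail by default, which is the quantity the question asks for.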

  1. What is a 95% confidence interval for the average SauceCost for each vendor?
t.test(Ingredients$TOISauceCost)
## 
##  One Sample t-test
## 
## data:  Ingredients$TOISauceCost
## t = 79.154, df = 63, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  2.402616 2.527072
## sample estimates:
## mean of x 
##  2.464844
t.test(Ingredients$CRUSauceCost)
## 
##  One Sample t-test
## 
## data:  Ingredients$CRUSauceCost
## t = 182.19, df = 80, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  2.565983 2.622659
## sample estimates:
## mean of x 
##  2.594321

ANSWER: CRU average sauce cost: 2.566 to 2.623. TOI average sauce cost: 2.403 to 2.527.

  1. What is the 95% confidence interval for the difference in average costs?
t.test(StackedSauceCost~StackedVendor, data=Ingredients)
## 
##  Welch Two Sample t-test
## 
## data:  StackedSauceCost by StackedVendor
## t = 3.7813, df = 89.037, p-value = 0.0002819
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.06144086 0.19751362
## sample estimates:
## mean in group CRU mean in group TOI 
##          2.594321          2.464844

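ANSWER: The 95% confidence interval for the difference in average costs (CRU minus TOI) is 0.061 to 0.198.

  1. Is either vendor cheaper with 95% confidence? ANSWER: Yes, TOI is cheaper with 95% confidence. Its average cost is 2.465 compared with 2.594 for CRU, and the interval for the difference (0.061 to 0.198) lies entirely above zero.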
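The Welch interval for the difference can be rebuilt from the group summaries. The sketch below assumes the group means from the t.test() output, the rounded standard deviations from skim(), and sample sizes of 81 (CRU) and 64 (TOI); welch_ci is a hypothetical helper name.

```r
# Hypothetical helper: Welch confidence interval for a difference in means
# (group 1 minus group 2), computed from summary statistics.
welch_ci <- function(m1, s1, n1, m2, s2, n2, level = 0.95) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  tcrit <- qt(1 - (1 - level) / 2, df)
  d <- m1 - m2
  list(diff = d, df = df, lower = d - tcrit * se, upper = d + tcrit * se)
}

# Assumed inputs: CRU is group 1, TOI is group 2.
res <- welch_ci(m1 = 2.594321, s1 = 0.128, n1 = 81,
                m2 = 2.464844, s2 = 0.249, n2 = 64)
print(round(unlist(res), 4))
```

An interval that lies entirely above zero means group 1 (CRU) costs more on average, which is the basis for calling TOI cheaper.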

Purchase Intentions

  1. What proportion would buy the products of each vendor?
binom.test(table(Ingredients$TOIPurchase))
## 
##  Exact binomial test
## 
## data:  table(Ingredients$TOIPurchase)
## number of successes = 43, number of trials = 64, p-value = 0.008147
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.5431232 0.7841280
## sample estimates:
## probability of success 
##               0.671875
binom.test(table(Ingredients$CRUPurchase))
## 
##  Exact binomial test
## 
## data:  table(Ingredients$CRUPurchase)
## number of successes = 43, number of trials = 81, p-value = 0.657
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4166718 0.6427332
## sample estimates:
## probability of success 
##              0.5308642

ANSWER: The sample proportion reported as the probability of success is the share of respondents who would buy. TOI: 43 of 64 = 0.672 (67.2%). CRU: 43 of 81 = 0.531 (53.1%).

  1. The surveys were, in each case, sent to 85 individuals. With 95% confidence, what is the [two-sided] probability of responding to the survey for each product?
prop.test(x=c(64,81), n=c(85,85))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(64, 81) out of c(85, 85)
## X-squared = 12.006, df = 1, p-value = 0.0005304
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.31390985 -0.08609015
## sample estimates:
##    prop 1    prop 2 
## 0.7529412 0.9529412

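ANSWER: TOI: 64 of 85 responded, a response probability of 75.3%. CRU: 81 of 85 responded, a response probability of 95.3%. The two-sample output above gives a 95% confidence interval for the difference in response probabilities (TOI minus CRU) of -0.314 to -0.086, so CRU’s response rate is higher with 95% confidence. One-sample intervals from prop.test for each product are approximately 64.5% to 83.7% for TOI and 87.7% to 98.5% for CRU.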
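A per-product interval for the response probability comes from a one-sample test for each vendor. The following sketch assumes 64 TOI responses and 81 CRU responses out of 85 surveys each, and uses prop.test() with its default continuity correction.

```r
# One-sample tests: responses received out of surveys sent.
toi_resp <- prop.test(x = 64, n = 85)
cru_resp <- prop.test(x = 81, n = 85)

toi_ci <- as.numeric(toi_resp$conf.int)
cru_ci <- as.numeric(cru_resp$conf.int)
print(round(rbind(TOI = c(estimate = 64 / 85, lower = toi_ci[1], upper = toi_ci[2]),
                  CRU = c(estimate = 81 / 85, lower = cru_ci[1], upper = cru_ci[2])), 3))
```

binom.test() would give an exact interval instead; the choice between the two changes the bounds slightly but not the conclusion that CRU’s response rate is higher.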

  1. What is a 95% confidence interval for the probability of Buy for each product?
binom.test(table(Ingredients$TOIPurchase))
## 
##  Exact binomial test
## 
## data:  table(Ingredients$TOIPurchase)
## number of successes = 43, number of trials = 64, p-value = 0.008147
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.5431232 0.7841280
## sample estimates:
## probability of success 
##               0.671875
binom.test(table(Ingredients$CRUPurchase))
## 
##  Exact binomial test
## 
## data:  table(Ingredients$CRUPurchase)
## number of successes = 43, number of trials = 81, p-value = 0.657
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4166718 0.6427332
## sample estimates:
## probability of success 
##              0.5308642

ANSWER: TOI (95% confidence interval): 0.5431232 to 0.7841280, so between 54% and 78% would buy from TOI. CRU (95% confidence interval): 0.4166718 to 0.6427332, so between 42% and 64% would buy from CRU.
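To keep the vendor labels attached to the right interval, the exact binomial intervals can be computed from the counts directly. This sketch assumes the counts shown in the output above: 43 Buy out of 64 TOI respondents and 43 Buy out of 81 CRU respondents.

```r
# Exact (Clopper-Pearson) intervals from the Buy counts for each vendor.
toi_buy <- binom.test(x = 43, n = 64)
cru_buy <- binom.test(x = 43, n = 81)

buy_ci <- rbind(
  TOI = c(estimate = unname(toi_buy$estimate), lower = toi_buy$conf.int[1], upper = toi_buy$conf.int[2]),
  CRU = c(estimate = unname(cru_buy$estimate), lower = cru_buy$conf.int[1], upper = cru_buy$conf.int[2])
)
print(round(buy_ci, 4))
```

Labeling the rows by vendor makes it harder to swap the two intervals when reading them off.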

  1. What is a 95% confidence interval for the difference in the probability of Buy? Does it favor either product with 95% confidence? If so, which one?
prop.test(x=c(43,43), n=c(64,81))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(43, 43) out of c(64, 81)
## X-squared = 2.3904, df = 1, p-value = 0.1221
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.03122668  0.31324829
## sample estimates:
##    prop 1    prop 2 
## 0.6718750 0.5308642

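ANSWER: The 95% confidence interval for the difference in the probability of Buy (TOI minus CRU) is -0.03122668 to 0.31324829, with TOI as prop 1 (0.6718750) and CRU as prop 2 (0.5308642). The interval includes zero, so it does not favor either product with 95% confidence.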
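Whether the interval favors a product comes down to whether it contains zero. The sketch below repeats the two-sample test from the counts (43 of 64 for TOI, 43 of 81 for CRU) and makes that check explicit; the variable names are illustrative.

```r
# Two-sample test for the difference in Buy proportions (TOI minus CRU).
diff_test <- prop.test(x = c(43, 43), n = c(64, 81))
ci <- as.numeric(diff_test$conf.int)

# If zero lies inside the interval, neither product is favored at 95% confidence.
includes_zero <- ci[1] <= 0 && ci[2] >= 0
print(round(ci, 4))
print(includes_zero)
```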

Financials

  1. Summarize the data for each vendor. Include a 95% confidence interval for boxes and costs for each vendor and describe these intervals in their relevant metrics.
summary(Financials)
##       Week         CRU.Boxes       CRU.Costs       TOI.Boxes    
##  Min.   : 1.00   Min.   :17402   Min.   :53113   Min.   :16364  
##  1st Qu.:13.75   1st Qu.:21855   1st Qu.:68879   1st Qu.:21477  
##  Median :26.50   Median :24072   Median :73832   Median :25063  
##  Mean   :26.50   Mean   :23819   Mean   :73829   Mean   :25543  
##  3rd Qu.:39.25   3rd Qu.:25476   3rd Qu.:78487   3rd Qu.:28832  
##  Max.   :52.00   Max.   :29623   Max.   :93313   Max.   :34973  
##    TOI.Costs    
##  Min.   :40946  
##  1st Qu.:56265  
##  Median :65722  
##  Mean   :64858  
##  3rd Qu.:73526  
##  Max.   :89363
t.test(Financials$CRU.Boxes)
## 
##  One Sample t-test
## 
## data:  Financials$CRU.Boxes
## t = 63.954, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  23071.67 24567.10
## sample estimates:
## mean of x 
##  23819.38
t.test(Financials$TOI.Boxes)
## 
##  One Sample t-test
## 
## data:  Financials$TOI.Boxes
## t = 37.998, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  24193.91 26893.01
## sample estimates:
## mean of x 
##  25543.46
t.test(Financials$CRU.Costs)
## 
##  One Sample t-test
## 
## data:  Financials$CRU.Costs
## t = 69.498, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  71696.23 75961.60
## sample estimates:
## mean of x 
##  73828.91
t.test(Financials$TOI.Costs)
## 
##  One Sample t-test
## 
## data:  Financials$TOI.Costs
## t = 42.452, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  61791.21 67925.67
## sample estimates:
## mean of x 
##  64858.44

ANSWER: 95% confidence interval for CRU boxes: 23071.67 to 24567.10 boxes per week. 95% confidence interval for CRU costs: 71696.23 to 75961.60 per week. 95% confidence interval for TOI boxes: 24193.91 to 26893.01 boxes per week. 95% confidence interval for TOI costs: 61791.21 to 67925.67 per week.

TOI’s interval for boxes is higher than CRU’s and its mean is larger (25543 versus 23819), although the two intervals overlap slightly. TOI’s interval for costs lies entirely below CRU’s, even though it is wider. On average, TOI supplied more boxes per week at a lower weekly cost.

  1. Provide a scatterplot of Costs and Boxes for each vendor.
ggplot(Financials) +
 aes(x = CRU.Boxes, y = CRU.Costs) +
 geom_point(size = 3L, colour = "#ef562d") +
 theme_minimal()

ggplot(Financials) +
 aes(x = TOI.Boxes, y = TOI.Costs) +
 geom_point(size = 3L, colour = "#ef562d") +
 theme_minimal()

  1. Provide a regression model for the cost structure of each vendor; costs as a function of boxes.
LM.TOI <- lm(TOI.Costs ~ TOI.Boxes, data = Financials)
summary(LM.TOI)
## 
## Call:
## lm(formula = TOI.Costs ~ TOI.Boxes, data = Financials)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6817.8 -1904.7  -374.3  2156.4  7835.6 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.214e+03  2.382e+03   3.868 0.000318 ***
## TOI.Boxes   2.178e+00  9.164e-02  23.771  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3173 on 50 degrees of freedom
## Multiple R-squared:  0.9187, Adjusted R-squared:  0.9171 
## F-statistic:   565 on 1 and 50 DF,  p-value: < 2.2e-16
LM.TOI$coefficients
## (Intercept)   TOI.Boxes 
## 9213.574189    2.178439
confint.lm(LM.TOI)
##                   2.5 %       97.5 %
## (Intercept) 4429.380827 13997.767552
## TOI.Boxes      1.994365     2.362512
anova(LM.TOI)
## Analysis of Variance Table
## 
## Response: TOI.Costs
##           Df     Sum Sq    Mean Sq F value    Pr(>F)    
## TOI.Boxes  1 5687137126 5687137126  565.04 < 2.2e-16 ***
## Residuals 50  503253094   10065062                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
LM.CRU <- lm(CRU.Costs ~ CRU.Boxes, data = Financials)
summary(LM.CRU, level = 0.95)
## 
## Call:
## lm(formula = CRU.Costs ~ CRU.Boxes, data = Financials)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4246.9 -2016.1  -315.6  1936.4  5096.1 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.009e+04  3.346e+03   3.015  0.00403 ** 
## CRU.Boxes   2.676e+00  1.396e-01  19.170  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2677 on 50 degrees of freedom
## Multiple R-squared:  0.8802, Adjusted R-squared:  0.8778 
## F-statistic: 367.5 on 1 and 50 DF,  p-value: < 2.2e-16
LM.CRU$coefficients
##  (Intercept)    CRU.Boxes 
## 10087.588635     2.676027
confint.lm(LM.CRU, level = 0.95)
##                   2.5 %       97.5 %
## (Intercept) 3367.505287 16807.671984
## CRU.Boxes      2.395643     2.956411
anova(LM.CRU)
## Analysis of Variance Table
## 
## Response: CRU.Costs
##           Df     Sum Sq    Mean Sq F value    Pr(>F)    
## CRU.Boxes  1 2634390301 2634390301  367.49 < 2.2e-16 ***
## Residuals 50  358431580    7168632                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
a. Write the relevant equation using the point estimates.

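ANSWER: Using Y = b0 + b1 x with the point estimates: CRU: CRU.Costs = 10087.589 + 2.676 * CRU.Boxes. TOI: TOI.Costs = 9213.574 + 2.178 * TOI.Boxes. The standard errors of the slopes are 0.140 for CRU and 0.092 for TOI.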

b. What is your best guess of one-number fixed costs [for each vendor]?

ANSWER: CRU fixed cost: 10087.589 TOI fixed cost: 9213.574

c. What is your best guess of one-number variable costs [for each vendor]?

ANSWER: CRU variable cost: 2.676 per box. TOI variable cost: 2.178 per box.
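The fixed and variable cost estimates combine into a simple cost function for each vendor. The sketch below uses the coefficients from the lm() output; predicted_cost is a hypothetical helper, and the box volumes are illustrative values chosen to span roughly the observed range.

```r
# Hypothetical helper: predicted weekly cost = fixed cost + variable cost * boxes.
predicted_cost <- function(boxes, fixed, variable) fixed + variable * boxes

# Illustrative volumes (an assumption, not taken from the assignment).
boxes <- c(15000, 20000, 25000, 30000)

cru <- predicted_cost(boxes, fixed = 10087.588635, variable = 2.676027)
toi <- predicted_cost(boxes, fixed = 9213.574189, variable = 2.178439)

print(data.frame(boxes, CRU = round(cru), TOI = round(toi), gap = round(cru - toi)))
```

Because TOI has both the lower intercept and the lower slope, its predicted cost is below CRU’s at every positive volume, and the gap widens as volume grows.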

d. Add the regression line to your plot in question 2
# method = "lm" draws the fitted least-squares line (the default is a loess smoother)
ggplot(Financials) +
 aes(x = CRU.Boxes, y = CRU.Costs) +
 geom_point(size = 3L, colour = "#ef562d") +
 geom_smooth(method = "lm") +
 labs(x = "CRU Boxes", y = "CRU Costs", title = "CRU regression") +
 theme_minimal()
## `geom_smooth()` using formula 'y ~ x'

ggplot(Financials) +
 aes(x = TOI.Boxes, y = TOI.Costs) +
 geom_point(size = 3L, colour = "#ef562d") +
 geom_smooth(method = "lm") +
 labs(x = "TOI Boxes", y = "TOI Costs", title = "TOI regression") +
 theme_minimal()
## `geom_smooth()` using formula 'y ~ x'

  1. Estimate each regression model and summarize the relevant regression using summary().
result <- regress(Financials, rvar = "TOI.Costs", evar = "TOI.Boxes")
summary(result, sum_check = "confint") 
## Linear regression (OLS)
## Data     : Financials 
## Response variable    : TOI.Costs 
## Explanatory variables: TOI.Boxes 
## Null hyp.: the effect of TOI.Boxes on TOI.Costs is zero
## Alt. hyp.: the effect of TOI.Boxes on TOI.Costs is not zero
## 
##              coefficient std.error t.value p.value    
##  (Intercept)    9213.574  2381.903   3.868  < .001 ***
##  TOI.Boxes         2.178     0.092  23.771  < .001 ***
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-squared: 0.919,  Adjusted R-squared: 0.917 
## F-statistic: 565.037 df(1,50), p.value < .001
## Nr obs: 52 
## 
##             coefficient     2.5%     97.5%      +/-
## (Intercept)    9213.574 4429.381 13997.768 4784.193
## TOI.Boxes         2.178    1.994     2.363    0.184
plot(result, plots = "scatter", lines = "line", nrobs = -1, custom = FALSE)
## `geom_smooth()` using formula 'y ~ x'

result <- regress(Financials, rvar = "CRU.Costs", evar = "CRU.Boxes")
summary(result, sum_check = "confint")
## Linear regression (OLS)
## Data     : Financials 
## Response variable    : CRU.Costs 
## Explanatory variables: CRU.Boxes 
## Null hyp.: the effect of CRU.Boxes on CRU.Costs is zero
## Alt. hyp.: the effect of CRU.Boxes on CRU.Costs is not zero
## 
##              coefficient std.error t.value p.value    
##  (Intercept)   10087.589  3345.723   3.015   0.004 ** 
##  CRU.Boxes         2.676     0.140  19.170  < .001 ***
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-squared: 0.88,  Adjusted R-squared: 0.878 
## F-statistic: 367.489 df(1,50), p.value < .001
## Nr obs: 52 
## 
##             coefficient     2.5%     97.5%      +/-
## (Intercept)   10087.589 3367.505 16807.672 6720.083
## CRU.Boxes         2.676    2.396     2.956    0.280
plot(result, plots = "scatter", lines = "line", nrobs = -1, custom = FALSE)
## `geom_smooth()` using formula 'y ~ x'

a. Are the fixed costs zero with 95% confidence for each vendor?

ANSWER: No. The 95% confidence interval for the intercept excludes zero for both vendors, so fixed costs are not zero. TOI interval: 4429.381 to 13997.768. CRU interval: 3367.505 to 16807.672.

b. Are the variable costs zero with 95% confidence for each vendor?

ANSWER: No. The 95% confidence interval for the slope excludes zero for both vendors, so variable costs are not zero. TOI interval: 1.994 to 2.363 per box. CRU interval: 2.396 to 2.956 per box.

c. Which vendor experienced the largest cost overrun?

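ANSWER: TOI experienced the largest cost overrun. Its largest positive residual is 7835.6, compared with 5096.1 for CRU.

d. Which vendor experienced the largest cost underrun?

ANSWER: TOI also experienced the largest cost underrun. Its most negative residual is -6817.8, compared with -4246.9 for CRU.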

  1. What is the residual standard error for each vendor? What does this mean about the predictability of their incurred costs? In absolute terms, which has more predictable costs? ANSWER: Residual standard error for CRU: 2677. Residual standard error for TOI: 3173.

CRU has the more predictable costs in absolute terms: its actual weekly costs typically fall about 2677 from the regression prediction, compared with about 3173 for TOI.

  1. Which of the two regressions has a higher proportion of explained variance? Provide the relevant statistic for each vendor. ANSWER: CRU R-squared: 0.880. TOI R-squared: 0.919. TOI has the higher proportion of explained variance: boxes explain about 92% of the variation in TOI’s weekly costs, compared with about 88% for CRU.
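R-squared, the residual standard error, and the F statistic all follow from the sums of squares in the anova() tables. The following sketch copies those sums of squares by hand; the data frame and column names are illustrative.

```r
# Sums of squares copied from the anova() tables above.
ss <- data.frame(
  vendor     = c("TOI", "CRU"),
  regression = c(5687137126, 2634390301),
  residual   = c(503253094, 358431580),
  df_resid   = c(50, 50)
)

ss$r_squared <- ss$regression / (ss$regression + ss$residual)  # explained share of variance
ss$rse       <- sqrt(ss$residual / ss$df_resid)                # residual standard error
ss$f_stat    <- ss$regression / (ss$residual / ss$df_resid)    # F = MSR / MSE with 1 regression df

print(ss[, c("vendor", "r_squared", "rse", "f_stat")])
```

This makes the contrast visible: TOI has the higher R-squared but also the larger residual standard error, because its costs vary more overall.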

    1. What value of the F distribution/ratio would show that the proportion of variance explained per degree of freedom in the regression is strictly greater than the variance per degree of freedom in the residual with 1% probability?
qf(0.01, 1, 50, 565.037)
## [1] 353.9132
qf(0.01, 1, 50, 367.49)
## [1] 224.585

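ANSWER: The fourth argument to qf() is a noncentrality parameter, so 353.9132 (TOI) and 224.585 (CRU) are 1% quantiles of noncentral F distributions, not critical values. The value that the F ratio must exceed for significance at the 1% level is the 99th percentile of the central F distribution with 1 and 50 degrees of freedom, qf(0.99, 1, 50), which is about 7.17. It is the same for both vendors because both regressions have 1 and 50 degrees of freedom, and both observed F statistics (565.037 for TOI, 367.49 for CRU) far exceed it.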
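A critical value for the F ratio at the 1% level can be obtained from the central F distribution. This is a sketch under the assumption that the question asks for the usual 1% significance threshold with 1 and 50 degrees of freedom; the observed F statistics are copied from the regression output.

```r
# 99th percentile of the central F(1, 50) distribution:
# under the null, an F ratio exceeds this value with probability 0.01.
f_crit <- qf(0.99, df1 = 1, df2 = 50)

# Observed F statistics from the regression summaries.
f_obs <- c(TOI = 565.037, CRU = 367.489)

print(f_crit)
print(f_obs > f_crit)

# Equivalent check through upper-tail p-values.
p_vals <- pf(f_obs, df1 = 1, df2 = 50, lower.tail = FALSE)
print(p_vals < 0.01)
```

Passing a fourth positional argument to qf() sets a noncentrality parameter, which answers a different question than the null-hypothesis threshold.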

  1. Are the residuals normal for the regression for each vendor? Provide at least two pieces of evidence and a definitive statistical answer.
shapiro.test(LM.CRU$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  LM.CRU$residuals
## W = 0.96171, p-value = 0.09284
qqnorm(LM.CRU$residuals)

shapiro.test(LM.TOI$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  LM.TOI$residuals
## W = 0.98948, p-value = 0.9247
qqnorm(LM.TOI$residuals)

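ANSWER: The Shapiro-Wilk test has normality as its null hypothesis. Both p-values are greater than 0.05 (0.093 for CRU and 0.925 for TOI), so normality cannot be rejected for either vendor’s residuals.

A normal Q-Q plot compares the sample quantiles of the residuals with the theoretical quantiles of a normal distribution; points close to a straight line are consistent with normality. The plots lead to the same conclusion: normality cannot be rejected.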

  1. In what remains, assume that the residuals you located were normal.

    1. What is the probability that each vendor’s cost are predicted to within \(\pm \$5000\)?
# TOI: residual standard error 3173
1-(pnorm(c(5000), mean=0, sd=3173, lower.tail=FALSE)+pnorm(c(-5000), mean=0, sd=3173, lower.tail = TRUE))
## [1] 0.8849271
# CRU: residual standard error 2677; both tails beyond +/- 5000 are subtracted
round(1-(pnorm(c(5000), mean=0, sd=2677, lower.tail=FALSE)+pnorm(c(-5000), mean=0, sd=2677, lower.tail=TRUE)), 4)
## [1] 0.9382

ANSWER: TOI = 88.49% probability. CRU = 93.82% probability.
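The probability of landing within a symmetric band around the prediction is the same calculation for any vendor, so it can be wrapped in a small function. The sketch below assumes normal residuals with mean zero and the rounded residual standard errors from the regression output; p_within is a hypothetical helper name.

```r
# Hypothetical helper: P(-k < error < k) for normal errors with mean 0.
p_within <- function(k, sd) pnorm(k, mean = 0, sd = sd) - pnorm(-k, mean = 0, sd = sd)

p_toi <- p_within(5000, sd = 3173)
p_cru <- p_within(5000, sd = 2677)
print(round(c(TOI = p_toi, CRU = p_cru), 4))
```

Using one function for both vendors avoids mixing different bounds in the two tails, and the result can never exceed 1.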

  1. Predict the distribution of average and all costs for each vendor were they to produce 20000 units.
result <- regress(Financials, rvar = "TOI.Costs", evar = "TOI.Boxes")
summary(result)
## Linear regression (OLS)
## Data     : Financials 
## Response variable    : TOI.Costs 
## Explanatory variables: TOI.Boxes 
## Null hyp.: the effect of TOI.Boxes on TOI.Costs is zero
## Alt. hyp.: the effect of TOI.Boxes on TOI.Costs is not zero
## 
##              coefficient std.error t.value p.value    
##  (Intercept)    9213.574  2381.903   3.868  < .001 ***
##  TOI.Boxes         2.178     0.092  23.771  < .001 ***
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-squared: 0.919,  Adjusted R-squared: 0.917 
## F-statistic: 565.037 df(1,50), p.value < .001
## Nr obs: 52
pred <- predict(result, pred_cmd = "TOI.Boxes=(2000)")
pred <- predict(result, pred_cmd = "TOI.Boxes=(2000)", interval="prediction", level =0.95)
print(pred, n = 10)
## Linear regression (OLS)
## Data                 : Financials 
## Response variable    : TOI.Costs 
## Explanatory variables: TOI.Boxes 
## Interval             : prediction 
## Prediction command   : TOI.Boxes = (2000) 
## 
##  TOI.Boxes Prediction     2.5%     97.5%      +/-
##   2000.000  13570.452 5813.671 21327.233 7756.781
result <- regress(Financials, rvar = "CRU.Costs", evar = "CRU.Boxes")
summary(result)
## Linear regression (OLS)
## Data     : Financials 
## Response variable    : CRU.Costs 
## Explanatory variables: CRU.Boxes 
## Null hyp.: the effect of CRU.Boxes on CRU.Costs is zero
## Alt. hyp.: the effect of CRU.Boxes on CRU.Costs is not zero
## 
##              coefficient std.error t.value p.value    
##  (Intercept)   10087.589  3345.723   3.015   0.004 ** 
##  CRU.Boxes         2.676     0.140  19.170  < .001 ***
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-squared: 0.88,  Adjusted R-squared: 0.878 
## F-statistic: 367.489 df(1,50), p.value < .001
## Nr obs: 52
pred <- predict(result, pred_cmd = "CRU.Boxes=(2000)")
pred <- predict(result, pred_cmd = "CRU.Boxes=(2000)", intervals="prediction", level =0.95)
print(pred, n=10)
## Linear regression (OLS)
## Data                 : Financials 
## Response variable    : CRU.Costs 
## Explanatory variables: CRU.Boxes 
## Interval             : confidence 
## Prediction command   : CRU.Boxes = (2000) 
## 
##  CRU.Boxes Prediction     2.5%     97.5%      +/-
##   2000.000  15439.643 9276.550 21602.737 6163.093

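ANSWER: The prediction commands above were evaluated at 2,000 boxes rather than 20,000, so the output describes costs at 2,000 boxes. For TOI the predicted cost is 13570.452, with a 95% prediction interval for all (individual) costs of 5813.671 to 21327.233.

For CRU the predicted cost is 15439.643. The interval shown, 9276.550 to 21602.737, is a 95% confidence interval for the average cost, since the call passed intervals= rather than interval=; the corresponding 95% prediction interval for all costs is 7260.145 to 23619.142. At 20,000 boxes the fitted equations give point predictions of about 52782 for TOI (9213.574 + 2.178439 * 20000) and about 63608 for CRU (10087.589 + 2.676027 * 20000).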
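Intervals for the average cost and for all costs at a given volume can be approximated from the regression summaries. The sketch below is not the radiant implementation; it applies the textbook simple-regression formulas to values copied from the output (coefficients, residual standard deviation, n = 52) and the mean and standard deviation of boxes from skim(Financials). slr_interval is a hypothetical helper.

```r
# Hypothetical helper: interval for a simple linear regression at x0,
# computed from summary statistics instead of the raw data.
slr_interval <- function(x0, b0, b1, s, n, xbar, sd_x,
                         type = c("prediction", "confidence"), level = 0.95) {
  type  <- match.arg(type)
  sxx   <- (n - 1) * sd_x^2                     # sum of squared deviations of x
  lev   <- 1 / n + (x0 - xbar)^2 / sxx          # variance factor for the mean response
  extra <- if (type == "prediction") 1 else 0   # add residual variance for one new week
  half  <- qt(1 - (1 - level) / 2, df = n - 2) * s * sqrt(extra + lev)
  fit   <- b0 + b1 * x0
  c(fit = fit, lower = fit - half, upper = fit + half, half_width = half)
}

# Check against the output shown above at 2,000 boxes.
toi_2000 <- slr_interval(2000, 9213.574189, 2.178439, 3172.548, 52, 25543.46, 4847.48, "prediction")
cru_2000 <- slr_interval(2000, 10087.588635, 2.676027, 2677.43, 52, 23819.38, 2685.74, "confidence")

# The volume asked about in the question: 20,000 boxes.
toi_20000 <- slr_interval(20000, 9213.574189, 2.178439, 3172.548, 52, 25543.46, 4847.48, "prediction")
cru_20000 <- slr_interval(20000, 10087.588635, 2.676027, 2677.43, 52, 23819.38, 2685.74, "prediction")

print(round(rbind(toi_2000, cru_2000, toi_20000, cru_20000), 1))
```

Intervals at 20,000 boxes should be much narrower than those at 2,000, because 20,000 lies inside the observed range of weekly volumes while 2,000 is far below it.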

  1. Provide the 95% confidence intervals for the parameters [slope and intercept].
result <- regress(
  Financials, 
  rvar = "CRU.Costs", 
  evar = "CRU.Boxes"
)
summary(result, sum_check = c("rmse", "sumsquares", "confint"))
## Linear regression (OLS)
## Data     : Financials 
## Response variable    : CRU.Costs 
## Explanatory variables: CRU.Boxes 
## Null hyp.: the effect of CRU.Boxes on CRU.Costs is zero
## Alt. hyp.: the effect of CRU.Boxes on CRU.Costs is not zero
## 
##              coefficient std.error t.value p.value    
##  (Intercept)   10087.589  3345.723   3.015   0.004 ** 
##  CRU.Boxes         2.676     0.140  19.170  < .001 ***
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-squared: 0.88,  Adjusted R-squared: 0.878 
## F-statistic: 367.489 df(1,50), p.value < .001
## Nr obs: 52 
## 
## Prediction error (RMSE):  2625.436 
## Residual st.dev   (RSD):  2677.43 
## 
## Sum of squares:
##            df                SS
## Regression  1 2,634,390,301.486
## Error      50   358,431,580.079
## Total      51 2,992,821,881.565
## 
##             coefficient     2.5%     97.5%      +/-
## (Intercept)   10087.589 3367.505 16807.672 6720.083
## CRU.Boxes         2.676    2.396     2.956    0.280
result <- regress(
  Financials, 
  rvar = "TOI.Costs", 
  evar = "TOI.Boxes"
)
summary(result, sum_check = c("rmse", "sumsquares", "confint"))
## Linear regression (OLS)
## Data     : Financials 
## Response variable    : TOI.Costs 
## Explanatory variables: TOI.Boxes 
## Null hyp.: the effect of TOI.Boxes on TOI.Costs is zero
## Alt. hyp.: the effect of TOI.Boxes on TOI.Costs is not zero
## 
##              coefficient std.error t.value p.value    
##  (Intercept)    9213.574  2381.903   3.868  < .001 ***
##  TOI.Boxes         2.178     0.092  23.771  < .001 ***
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-squared: 0.919,  Adjusted R-squared: 0.917 
## F-statistic: 565.037 df(1,50), p.value < .001
## Nr obs: 52 
## 
## Prediction error (RMSE):  3110.939 
## Residual st.dev   (RSD):  3172.548 
## 
## Sum of squares:
##            df                SS
## Regression  1 5,687,137,126.315
## Error      50   503,253,094.298
## Total      51 6,190,390,220.613
## 
##             coefficient     2.5%     97.5%      +/-
## (Intercept)    9213.574 4429.381 13997.768 4784.193
## TOI.Boxes         2.178    1.994     2.363    0.184
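  1. Upper management prefers that we choose the vendor with the lowest cost per unit. Which vendor has a lower estimated cost per unit? Is it lower with 95% confidence? ANSWER: The estimated variable cost is 2.676 per unit for CRU and 2.178 per unit for TOI, so TOI has the lower estimated cost per unit. The 95% confidence intervals for the slopes do not overlap (1.994 to 2.363 for TOI, 2.396 to 2.956 for CRU), so TOI is lower with 95% confidence.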
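The comparison of per-unit costs can be made explicit by rebuilding the slope intervals and checking for overlap. The sketch below uses the slope estimates and standard errors from the lm() summaries with 50 residual degrees of freedom; slope_ci is a hypothetical helper, and treating non-overlap as the decision rule is a conservative simplification rather than a formal test of the difference.

```r
# Hypothetical helper: t-based confidence interval for a regression coefficient.
slope_ci <- function(est, se, df, level = 0.95) {
  est + c(-1, 1) * qt(1 - (1 - level) / 2, df) * se
}

toi_slope <- slope_ci(est = 2.178439, se = 0.09164, df = 50)
cru_slope <- slope_ci(est = 2.676027, se = 0.1396, df = 50)

# If TOI's upper bound is below CRU's lower bound, the intervals do not overlap.
no_overlap <- toi_slope[2] < cru_slope[1]
print(round(rbind(TOI = toi_slope, CRU = cru_slope), 3))
print(no_overlap)
```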

Other Concerns

Contamination is always a concern for food vendors. If contamination levels are too high, the risks are too large and the vendor must be disqualified. In this case, concerns arise if the relevant probabilities are less than 0.05. Outside information about salmonella and Listeria monocytogenes should have no bearing on your decision, only the probabilities.

  1. For Curries R Us, salmonella has been a recurring issue. If industry wide, salmonella outbreaks happen at a rate of 1.6 per year, what is the probability of the 4 [or more] outbreaks experienced by Taste of India?
result <- prob_pois(lambda = 1.6, ub = 4)
summary(result)
## Probability calculator
## Distribution: Poisson
## Lambda      : 1.6 
## Mean        : 1.6 
## Variance    : 1.6 
## Lower bound :  
## Upper bound : 4 
## 
## P(X  = 4) = 0.055
## P(X  < 4) = 0.921
## P(X <= 4) = 0.976
## P(X  > 4) = 0.024
## P(X >= 4) = 0.079
plot(result)

ANSWER: P(X >= 4) = 0.079. This is greater than 0.05, so it does not raise a concern.

  2. For Taste of India, Listeria monocytogenes has been a recurring issue. If industry wide, Listeria monocytogenes outbreaks happen at a rate of 3 per year, what is the probability of the 5 [or more] outbreaks experienced by Curries R Us?

result <- prob_pois(lambda = 3, ub = 5)
summary(result)
## Probability calculator
## Distribution: Poisson
## Lambda      : 3 
## Mean        : 3 
## Variance    : 3 
## Lower bound :  
## Upper bound : 5 
## 
## P(X  = 5) = 0.101
## P(X  < 5) = 0.815
## P(X <= 5) = 0.916
## P(X  > 5) = 0.084
## P(X >= 5) = 0.185
plot(result)

ANSWER: P(X >= 5) = 0.185. This is also greater than 0.05, so neither vendor is disqualified on contamination grounds.
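Both upper-tail Poisson probabilities can be checked in base R and compared with the 0.05 threshold stated above. This is a small sketch using ppois(); the variable names are illustrative.

```r
# P(X >= k) equals P(X > k - 1), so the upper tail starts at k - 1.
p_salmonella <- ppois(3, lambda = 1.6, lower.tail = FALSE)  # P(X >= 4) at 1.6 per year
p_listeria   <- ppois(4, lambda = 3,   lower.tail = FALSE)  # P(X >= 5) at 3 per year

threshold <- 0.05
print(round(c(salmonella = p_salmonella, listeria = p_listeria), 3))
print(c(salmonella = p_salmonella, listeria = p_listeria) < threshold)
```

The off-by-one in the first argument matters: ppois(4, 1.6, lower.tail = FALSE) would give P(X > 4), not P(X >= 4).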

The Choice and the Evidence

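  1. What vendor do you choose and why? Describe each of your findings and how they influence your recommendation. I would choose CRU:
  - CRU has a higher average rating than TOI (76.04 versus 74.70), and its ratings are more consistent.
  - CRU has more predictable costs: its residual standard error is 2677 compared with 3173 for TOI.
  - The purchase-intention data do not separate the vendors with 95% confidence: the sample share who would buy is higher for TOI (67.2% versus 53.1%), but the interval for the difference includes zero.
  - Although TOI comes out cheaper per unit (2.178 versus 2.676), CRU is the better choice because its higher and more consistent ratings give it the better chance of earning revenue.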