library(ggplot2)
library(dplyr)
library(skimr)
library(radiant)
library(esquisse)
library(janitor)
library(car)
library(gvlma)
The Marks and Spencer food division has a problem, and you have previously recommended a course of action for solving it. The original problem was to compare a Tikka Masala and a Vindaloo because of supplier disruptions. Both products were produced by the same vendor, and that vendor could no longer fully supply the quantities Marks and Spencer required for both products. The vendor can continue to supply one of the two products, but a new vendor would be needed if both products were continued. The company has decided to leave the Tikka Masala with the existing vendor and to seek a new vendor for the Vindaloo. Your task is to help them choose between Curries R Us (CRU) and Taste of India (TOI), two UK-based vendors that Marks and Spencer has worked with in the past. Variables whose names start with Stacked are stacked (tidy) versions of the vendor-specific columns.
The prior recipe uses a trademarked sauce that is an ingredient in the current product, so the product must be changed.
names(Ingredients)
## [1] "TOISauceCost" "CRUSauceCost" "StackedSauceCost" "StackedVendor"
## [5] "TOIPurchase" "CRUPurchase" "StackedPurchase"
skim(Ingredients)
| Name | Ingredients |
|---|---|
| Number of rows | 145 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| StackedVendor | 0 | 1.00 | 3 | 3 | 0 | 2 | 0 |
| TOIPurchase | 81 | 0.44 | 3 | 6 | 0 | 2 | 0 |
| CRUPurchase | 64 | 0.56 | 3 | 6 | 0 | 2 | 0 |
| StackedPurchase | 0 | 1.00 | 3 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TOISauceCost | 81 | 0.44 | 2.46 | 0.25 | 1.95 | 2.29 | 2.42 | 2.63 | 3.00 | ▂▇▇▅▃ |
| CRUSauceCost | 64 | 0.56 | 2.59 | 0.13 | 2.23 | 2.50 | 2.59 | 2.68 | 2.87 | ▁▃▇▇▂ |
| StackedSauceCost | 0 | 1.00 | 2.54 | 0.20 | 1.95 | 2.41 | 2.55 | 2.68 | 3.00 | ▁▃▇▇▂ |
names(Ratings)
## [1] "Rater" "CRU.Ratings" "TOI.Ratings" "Stacked.Rating"
## [5] "Stacked.Vendor" "Stacked.Rater"
skim(Ratings)
| Name | Ratings |
|---|---|
| Number of rows | 100 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Stacked.Vendor | 0 | 1 | 3 | 3 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Rater | 50 | 0.5 | 25.50 | 14.58 | 1 | 13.25 | 25.5 | 37.75 | 50 | ▇▇▇▇▇ |
| CRU.Ratings | 50 | 0.5 | 76.04 | 3.57 | 68 | 73.00 | 76.0 | 78.75 | 87 | ▁▇▇▃▁ |
| TOI.Ratings | 50 | 0.5 | 74.70 | 4.60 | 65 | 72.00 | 75.0 | 77.00 | 85 | ▃▅▇▂▂ |
| Stacked.Rating | 0 | 1.0 | 75.37 | 4.15 | 65 | 73.00 | 75.0 | 78.00 | 87 | ▂▅▇▃▁ |
| Stacked.Rater | 0 | 1.0 | 25.50 | 14.50 | 1 | 13.00 | 25.5 | 38.00 | 50 | ▇▇▇▇▇ |
names(Financials)
## [1] "Week" "CRU.Boxes" "CRU.Costs" "TOI.Boxes" "TOI.Costs"
skim(Financials)
| Name | Financials |
|---|---|
| Number of rows | 52 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Week | 0 | 1 | 26.50 | 15.15 | 1.00 | 13.75 | 26.50 | 39.25 | 52.00 | ▇▇▇▇▇ |
| CRU.Boxes | 0 | 1 | 23819.38 | 2685.74 | 17402.00 | 21854.75 | 24072.50 | 25475.75 | 29623.00 | ▂▇▇▇▂ |
| CRU.Costs | 0 | 1 | 73828.91 | 7660.47 | 53113.02 | 68879.12 | 73832.18 | 78487.08 | 93313.29 | ▁▃▇▆▁ |
| TOI.Boxes | 0 | 1 | 25543.46 | 4847.48 | 16364.00 | 21477.25 | 25063.00 | 28832.00 | 34973.00 | ▃▇▇▆▅ |
| TOI.Costs | 0 | 1 | 64858.44 | 11017.27 | 40945.97 | 56265.18 | 65721.95 | 73525.62 | 89363.09 | ▂▇▆▆▂ |
Simple computer code and/or output is never an adequate response; responses should be provided and discussed in their appropriate metrics. Answer the questions by clearly interpreting the evidence. You must do your own work; an honor pledge is provided and required for submission.
ggplot(Ratings) +
aes(x = Stacked.Rating, fill = Stacked.Vendor) +
geom_histogram(bins = 30L) +
scale_fill_viridis_d(option = "viridis") +
theme_minimal()
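ANSWER: The histogram shows the distribution of ratings for each vendor. CRU's ratings are clustered more tightly, indicating more consistent consumer feedback. TOI's ratings are more spread out and extend lower (minimum 65 versus 68 for CRU), indicating less consistent and occasionally poorer ratings.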
summary(Ratings)
## Rater CRU.Ratings TOI.Ratings Stacked.Rating
## Min. : 1.00 Min. :68.00 Min. :65.0 Min. :65.00
## 1st Qu.:13.25 1st Qu.:73.00 1st Qu.:72.0 1st Qu.:73.00
## Median :25.50 Median :76.00 Median :75.0 Median :75.00
## Mean :25.50 Mean :76.04 Mean :74.7 Mean :75.37
## 3rd Qu.:37.75 3rd Qu.:78.75 3rd Qu.:77.0 3rd Qu.:78.00
## Max. :50.00 Max. :87.00 Max. :85.0 Max. :87.00
## NA's :50 NA's :50 NA's :50
## Stacked.Vendor Stacked.Rater
## Length:100 Min. : 1.0
## Class :character 1st Qu.:13.0
## Mode :character Median :25.5
## Mean :25.5
## 3rd Qu.:38.0
## Max. :50.0
##
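ANSWER: CRU: mean rating = 76.04, standard deviation = 3.57. TOI: mean rating = 74.70, standard deviation = 4.60.
CRU has a slightly higher mean rating than TOI. TOI's standard deviation is larger and its ratings span a slightly wider range (65 to 85 versus 68 to 87 for CRU), so TOI's ratings are less consistent.
Interquartile range: CRU 78.75 - 73.00 = 5.75; TOI 77.0 - 72.0 = 5.00. The middle half of TOI's ratings is slightly narrower, so TOI's larger standard deviation appears to come mainly from its tails.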
t.test(Ratings$CRU.Ratings)
##
## One Sample t-test
##
## data: Ratings$CRU.Ratings
## t = 150.44, df = 49, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 75.02426 77.05574
## sample estimates:
## mean of x
## 76.04
t.test(Ratings$TOI.Ratings)
##
## One Sample t-test
##
## data: Ratings$TOI.Ratings
## t = 114.85, df = 49, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 73.39291 76.00709
## sample estimates:
## mean of x
## 74.7
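ANSWER: 95% confidence interval for the mean rating: CRU 75.024 to 77.056; TOI 73.393 to 76.007. The two intervals overlap, so these separate one-sample intervals alone do not show that CRU's mean rating is higher than TOI's.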
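As a cross-check, the one-sample intervals above can be reproduced from the summary statistics alone with mean ± t(0.975, n - 1) × sd / √n. This is a minimal base-R sketch; `ci_from_summary` is a hypothetical helper name, and because the standard deviations from `skim()` are rounded to two decimals, the limits agree with `t.test()` only to about two decimal places.

```r
# Hypothetical helper: two-sided t interval for a mean from summary statistics
ci_from_summary <- function(mean, sd, n, level = 0.95) {
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)  # critical t value with n - 1 df
  half_width <- t_crit * sd / sqrt(n)            # margin of error
  c(lower = mean - half_width, upper = mean + half_width)
}

# Summary statistics from skim(Ratings): 50 ratings per vendor
cru_ci <- ci_from_summary(mean = 76.04, sd = 3.57, n = 50)
toi_ci <- ci_from_summary(mean = 74.70, sd = 4.60, n = 50)
round(rbind(CRU = cru_ci, TOI = toi_ci), 3)
```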
Ingredients%>%select(CRUSauceCost, TOISauceCost) %>% skim()
| Name | Piped data |
|---|---|
| Number of rows | 145 |
| Number of columns | 2 |
| _______________________ | |
| Column type frequency: | |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| CRUSauceCost | 64 | 0.56 | 2.59 | 0.13 | 2.23 | 2.50 | 2.59 | 2.68 | 2.87 | ▁▃▇▇▂ |
| TOISauceCost | 81 | 0.44 | 2.46 | 0.25 | 1.95 | 2.29 | 2.42 | 2.63 | 3.00 | ▂▇▇▅▃ |
ggplot(Ingredients) +
aes(x = StackedSauceCost, fill = StackedVendor) +
geom_density(adjust = 1L, alpha = 0.5) +
scale_fill_viridis_d(option = "viridis") +
theme_minimal()
ANSWER: CRU: standard deviation = 0.128; TOI: standard deviation = 0.249.
CRU's sauce costs more on average than TOI's (\$2.59 vs \$2.46), but CRU's costs are more consistent, with roughly half of TOI's standard deviation.
result <- prob_norm(mean = 2.59, stdev = 0.128, ub = 2.35)
summary(result)
## Probability calculator
## Distribution: Normal
## Mean : 2.59
## St. dev : 0.128
## Lower bound : -Inf
## Upper bound : 2.35
##
## P(X < 2.35) = 0.03
## P(X > 2.35) = 0.97
plot(result)
result <- single_mean(
Ingredients,
var = "CRUSauceCost",
comp_value = 2.35,
alternative = "less"
)
summary(result)
## Single mean test
## Data : Ingredients
## Variable : CRUSauceCost
## Confidence: 0.95
## Null hyp. : the mean of CRUSauceCost = 2.35
## Alt. hyp. : the mean of CRUSauceCost is < 2.35
##
## mean n n_missing sd se me
## 2.594 145 64 0.128 0.014 0.028
##
## diff se t.value p.value df 0% 95%
## 0.244 0.014 17.157 1 80 -Inf 2.618
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result, plots = "hist", custom = FALSE)
## Warning: Removed 64 rows containing non-finite values (stat_bin).
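ANSWER: P(X < 2.35) = 0.03, so there is only about a 3% chance that a CRU sauce cost falls below \$2.35 (assuming normal costs with mean 2.59 and standard deviation 0.128). The one-sided single-mean test (p = 1) gives no evidence that CRU's mean sauce cost is below \$2.35.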
b. What is the probability of costs below \$2.35 for Taste of India?
result <- prob_norm(mean = 2.46, stdev = 0.249, ub = 2.35)
summary(result)
## Probability calculator
## Distribution: Normal
## Mean : 2.46
## St. dev : 0.249
## Lower bound : -Inf
## Upper bound : 2.35
##
## P(X < 2.35) = 0.329
## P(X > 2.35) = 0.671
plot(result)
result <- single_mean(
Ingredients,
var = "TOISauceCost",
comp_value = 2.35,
alternative = "less"
)
summary(result)
## Single mean test
## Data : Ingredients
## Variable : TOISauceCost
## Confidence: 0.95
## Null hyp. : the mean of TOISauceCost = 2.35
## Alt. hyp. : the mean of TOISauceCost is < 2.35
##
## mean n n_missing sd se me
## 2.465 145 81 0.249 0.031 0.062
##
## diff se t.value p.value df 0% 95%
## 0.115 0.031 3.688 1 63 -Inf 2.517
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result, plots = "hist", custom = FALSE)
## Warning: Removed 81 rows containing non-finite values (stat_bin).
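ANSWER: P(X < 2.35) = 0.329, so there is about a 33% chance that a TOI sauce cost falls below \$2.35 (assuming normal costs with mean 2.46 and standard deviation 0.249). The one-sided single-mean test (p = 1) gives no evidence that TOI's mean sauce cost is below \$2.35.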
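The `prob_norm()` results for both vendors can be checked with base R's `pnorm()`, which returns the lower-tail probability P(X < q) for a normal distribution. This minimal sketch uses the same assumed means and standard deviations as the calculator calls above.

```r
# Probability that a single sauce cost falls below $2.35, assuming
# costs are normal with the sample mean and standard deviation
threshold <- 2.35
p_cru <- pnorm(threshold, mean = 2.59, sd = 0.128)
p_toi <- pnorm(threshold, mean = 2.46, sd = 0.249)
round(c(CRU = p_cru, TOI = p_toi), 3)
```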
t.test(Ingredients$TOISauceCost)
##
## One Sample t-test
##
## data: Ingredients$TOISauceCost
## t = 79.154, df = 63, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 2.402616 2.527072
## sample estimates:
## mean of x
## 2.464844
t.test(Ingredients$CRUSauceCost)
##
## One Sample t-test
##
## data: Ingredients$CRUSauceCost
## t = 182.19, df = 80, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 2.565983 2.622659
## sample estimates:
## mean of x
## 2.594321
ANSWER: 95% confidence interval for the mean sauce cost: CRU \$2.566 to \$2.623; TOI \$2.403 to \$2.527. The intervals do not overlap, and TOI's entire interval lies below CRU's.
t.test(StackedSauceCost~StackedVendor, data=Ingredients)
##
## Welch Two Sample t-test
##
## data: StackedSauceCost by StackedVendor
## t = 3.7813, df = 89.037, p-value = 0.0002819
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.06144086 0.19751362
## sample estimates:
## mean in group CRU mean in group TOI
## 2.594321 2.464844
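ANSWER: The 95% confidence interval for the difference in mean sauce cost (CRU - TOI) is \$0.061 to \$0.198. Because the interval excludes zero (p = 0.0003), CRU's mean sauce cost is significantly higher than TOI's, by about \$0.13 on average (2.594 - 2.465).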
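The Welch interval can be reproduced from summary statistics: each group's squared standard error is sd² / n, the Welch-Satterthwaite formula gives the degrees of freedom, and the interval is the mean difference ± t × combined standard error. This is a minimal base-R sketch; `welch_ci` is a hypothetical helper, and because it uses the rounded standard deviations 0.128 and 0.249 with the non-missing counts 81 and 64, it matches `t.test()` only to about three decimals.

```r
# Hypothetical helper: Welch confidence interval for mean1 - mean2
# from summary statistics (unequal variances assumed)
welch_ci <- function(m1, s1, n1, m2, s2, n2, level = 0.95) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  se <- sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t_crit <- qt(1 - (1 - level) / 2, df)
  diff <- m1 - m2
  c(diff = diff, df = df, lower = diff - t_crit * se, upper = diff + t_crit * se)
}

# CRU is group 1 and TOI is group 2, matching the order in t.test() above
res <- welch_ci(m1 = 2.594321, s1 = 0.128, n1 = 81,
                m2 = 2.464844, s2 = 0.249, n2 = 64)
round(res, 4)
```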
binom.test(table(Ingredients$TOIPurchase))
##
## Exact binomial test
##
## data: table(Ingredients$TOIPurchase)
## number of successes = 43, number of trials = 64, p-value = 0.008147
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.5431232 0.7841280
## sample estimates:
## probability of success
## 0.671875
binom.test(table(Ingredients$CRUPurchase))
##
## Exact binomial test
##
## data: table(Ingredients$CRUPurchase)
## number of successes = 43, number of trials = 81, p-value = 0.657
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.4166718 0.6427332
## sample estimates:
## probability of success
## 0.5308642
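ANSWER: Estimated proportions: TOI 0.672 (43 of 64) and CRU 0.531 (43 of 81). The "probability of success" reported by binom.test() is the proportion of respondents who would buy each vendor's product. TOI's proportion differs significantly from 0.5 (p = 0.008); CRU's does not (p = 0.657).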
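When `binom.test()` receives a two-cell `table()`, it treats the first cell as the successes, and `table()` orders character categories alphabetically. Passing the counts explicitly makes the "success" definition visible. This sketch assumes, as the answer above does, that the 43 counted responses are the "would buy" responses.

```r
# Exact (Clopper-Pearson) binomial tests from explicit counts
toi_bt <- binom.test(x = 43, n = 64)   # TOI: 43 of 64 responses
cru_bt <- binom.test(x = 43, n = 81)   # CRU: 43 of 81 responses

summary_tab <- rbind(
  TOI = c(estimate = unname(toi_bt$estimate),
          lower = toi_bt$conf.int[1], upper = toi_bt$conf.int[2]),
  CRU = c(estimate = unname(cru_bt$estimate),
          lower = cru_bt$conf.int[1], upper = cru_bt$conf.int[2])
)
round(summary_tab, 4)
```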
prop.test(x=c(64,81), n=c(85,85))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(64, 81) out of c(85, 85)
## X-squared = 12.006, df = 1, p-value = 0.0005304
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.31390985 -0.08609015
## sample estimates:
## prop 1 prop 2
## 0.7529412 0.9529412
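ANSWER: Probability of response: TOI 75% (64 of 85), with a 95% interval of about 64.5% to 83.4%; CRU 95% (81 of 85), with a 95% interval of about 87.7% to 98.5%. The two-sample test (p = 0.0005) indicates that CRU's response rate is significantly higher than TOI's.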
binom.test(table(Ingredients$TOIPurchase))
##
## Exact binomial test
##
## data: table(Ingredients$TOIPurchase)
## number of successes = 43, number of trials = 64, p-value = 0.008147
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.5431232 0.7841280
## sample estimates:
## probability of success
## 0.671875
binom.test(table(Ingredients$CRUPurchase))
##
## Exact binomial test
##
## data: table(Ingredients$CRUPurchase)
## number of successes = 43, number of trials = 81, p-value = 0.657
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.4166718 0.6427332
## sample estimates:
## probability of success
## 0.5308642
ANSWER: TOI (95% confidence interval): 0.5431232 to 0.7841280, so roughly 54% to 78% would buy from TOI. CRU (95% confidence interval): 0.4166718 to 0.6427332, so roughly 42% to 64% would buy from CRU.
prop.test(x=c(43,43), n=c(64,81))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(43, 43) out of c(64, 81)
## X-squared = 2.3904, df = 1, p-value = 0.1221
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.03122668 0.31324829
## sample estimates:
## prop 1 prop 2
## 0.6718750 0.5308642
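ANSWER: 95% confidence interval for the difference in proportions (TOI - CRU): -0.0312 to 0.3132, with TOI at 0.672 (prop 1) and CRU at 0.531 (prop 2). Because the interval includes zero (p = 0.122), the difference in purchase proportions is not statistically significant at the 5% level.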
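To our understanding, `prop.test()` with its default continuity correction builds the interval for p1 - p2 as the difference ± (z × standard error + 0.5 × (1/n1 + 1/n2)), with the correction capped so it never exceeds the observed difference. The sketch below, using a hypothetical helper `two_prop_ci`, reproduces that calculation from the counts shown above (43 of 64 for TOI, 43 of 81 for CRU).

```r
# Hypothetical helper: CI for p1 - p2 with a continuity correction,
# mirroring the two-sample interval reported by prop.test()
two_prop_ci <- function(x1, n1, x2, n2, level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  delta <- p1 - p2
  z <- qnorm(1 - (1 - level) / 2)
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
  yates <- min(0.5, abs(delta) / (1 / n1 + 1 / n2))   # capped continuity correction
  width <- z * se + yates * (1 / n1 + 1 / n2)
  c(diff = delta, lower = max(delta - width, -1), upper = min(delta + width, 1))
}

ci <- two_prop_ci(x1 = 43, n1 = 64, x2 = 43, n2 = 81)  # TOI minus CRU
round(ci, 4)
```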
summary(Financials)
## Week CRU.Boxes CRU.Costs TOI.Boxes
## Min. : 1.00 Min. :17402 Min. :53113 Min. :16364
## 1st Qu.:13.75 1st Qu.:21855 1st Qu.:68879 1st Qu.:21477
## Median :26.50 Median :24072 Median :73832 Median :25063
## Mean :26.50 Mean :23819 Mean :73829 Mean :25543
## 3rd Qu.:39.25 3rd Qu.:25476 3rd Qu.:78487 3rd Qu.:28832
## Max. :52.00 Max. :29623 Max. :93313 Max. :34973
## TOI.Costs
## Min. :40946
## 1st Qu.:56265
## Median :65722
## Mean :64858
## 3rd Qu.:73526
## Max. :89363
t.test(Financials$CRU.Boxes)
##
## One Sample t-test
##
## data: Financials$CRU.Boxes
## t = 63.954, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 23071.67 24567.10
## sample estimates:
## mean of x
## 23819.38
t.test(Financials$TOI.Boxes)
##
## One Sample t-test
##
## data: Financials$TOI.Boxes
## t = 37.998, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 24193.91 26893.01
## sample estimates:
## mean of x
## 25543.46
t.test(Financials$CRU.Costs)
##
## One Sample t-test
##
## data: Financials$CRU.Costs
## t = 69.498, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 71696.23 75961.60
## sample estimates:
## mean of x
## 73828.91
t.test(Financials$TOI.Costs)
##
## One Sample t-test
##
## data: Financials$TOI.Costs
## t = 42.452, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 61791.21 67925.67
## sample estimates:
## mean of x
## 64858.44
ANSWER: 95% confidence intervals for the weekly means: CRU boxes 23,071.67 to 24,567.10; CRU costs 71,696.23 to 75,961.60; TOI boxes 24,193.91 to 26,893.01; TOI costs 61,791.21 to 67,925.67.
TOI's box interval and mean (25,543 vs 23,819) are higher than CRU's, although the two box intervals overlap slightly. TOI's cost interval lies entirely below CRU's (mean 64,858 vs 73,829). On average, TOI therefore delivered more boxes per week at a lower total weekly cost than CRU.
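A rough way to see the volume-for-cost trade-off is to divide each vendor's mean weekly cost by its mean weekly boxes, using the sample means from `summary(Financials)`. This ratio of means is an illustration only: it blends fixed and variable costs, which the regressions that follow separate.

```r
# Mean weekly cost divided by mean weekly boxes (sample means from summary(Financials))
cost_per_box <- c(
  CRU = 73828.91 / 23819.38,
  TOI = 64858.44 / 25543.46
)
round(cost_per_box, 3)
```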
ggplot(Financials) +
aes(x = CRU.Boxes, y = CRU.Costs) +
geom_point(size = 3L, colour = "#ef562d") +
theme_minimal()
ggplot(Financials) +
aes(x = TOI.Boxes, y = TOI.Costs) +
geom_point(size = 3L, colour = "#ef562d") +
theme_minimal()
LM.TOI <- lm(TOI.Costs ~ TOI.Boxes, data = Financials)
summary(LM.TOI)
##
## Call:
## lm(formula = TOI.Costs ~ TOI.Boxes, data = Financials)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6817.8 -1904.7 -374.3 2156.4 7835.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.214e+03 2.382e+03 3.868 0.000318 ***
## TOI.Boxes 2.178e+00 9.164e-02 23.771 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3173 on 50 degrees of freedom
## Multiple R-squared: 0.9187, Adjusted R-squared: 0.9171
## F-statistic: 565 on 1 and 50 DF, p-value: < 2.2e-16
LM.TOI$coefficients
## (Intercept) TOI.Boxes
## 9213.574189 2.178439
confint.lm(LM.TOI)
## 2.5 % 97.5 %
## (Intercept) 4429.380827 13997.767552
## TOI.Boxes 1.994365 2.362512
anova(LM.TOI)
## Analysis of Variance Table
##
## Response: TOI.Costs
## Df Sum Sq Mean Sq F value Pr(>F)
## TOI.Boxes 1 5687137126 5687137126 565.04 < 2.2e-16 ***
## Residuals 50 503253094 10065062
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
LM.CRU <- lm(CRU.Costs ~ CRU.Boxes, data = Financials)
summary(LM.CRU)
##
## Call:
## lm(formula = CRU.Costs ~ CRU.Boxes, data = Financials)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4246.9 -2016.1 -315.6 1936.4 5096.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.009e+04 3.346e+03 3.015 0.00403 **
## CRU.Boxes 2.676e+00 1.396e-01 19.170 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2677 on 50 degrees of freedom
## Multiple R-squared: 0.8802, Adjusted R-squared: 0.8778
## F-statistic: 367.5 on 1 and 50 DF, p-value: < 2.2e-16
LM.CRU$coefficients
## (Intercept) CRU.Boxes
## 10087.588635 2.676027
confint.lm(LM.CRU, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 3367.505287 16807.671984
## CRU.Boxes 2.395643 2.956411
anova(LM.CRU)
## Analysis of Variance Table
##
## Response: CRU.Costs
## Df Sum Sq Mean Sq F value Pr(>F)
## CRU.Boxes 1 2634390301 2634390301 367.49 < 2.2e-16 ***
## Residuals 50 358431580 7168632
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
a. Write the relevant equation using the point estimates.
ANSWER: General form: $\hat{Y} = b_0 + b_1 x$, where $x$ is weekly boxes and $\hat{Y}$ is predicted weekly cost.
CRU: $\hat{Y} = 10087.589 + 2.676\,x$
TOI: $\hat{Y} = 9213.574 + 2.178\,x$
b. What is your best guess of one-number fixed costs [for each vendor]?
ANSWER: CRU fixed cost: 10,087.589 per week; TOI fixed cost: 9,213.574 per week.
c. What is your best guess of one-number variable costs [for each vendor]?
ANSWER: CRU variable cost: 2.676 per box; TOI variable cost: 2.178 per box.
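The point estimates above define a fitted weekly cost function for each vendor: fixed cost plus variable cost per box times boxes. Because TOI's intercept and slope are both lower than CRU's, TOI's fitted cost is lower at every nonnegative volume. The sketch below evaluates both functions at illustrative volumes of 20,000, 25,000, and 29,000 boxes (chosen to lie within both vendors' observed weekly ranges); the coefficients are rounded to three decimals.

```r
# Fitted weekly cost functions from the point estimates (rounded coefficients)
cost_cru <- function(boxes) 10087.589 + 2.676 * boxes
cost_toi <- function(boxes) 9213.574 + 2.178 * boxes

# Illustrative weekly volumes inside both vendors' observed ranges
boxes <- c(20000, 25000, 29000)
data.frame(
  boxes,
  CRU = cost_cru(boxes),
  TOI = cost_toi(boxes),
  difference = cost_cru(boxes) - cost_toi(boxes)  # extra weekly cost with CRU
)
```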
d. Add the regression line to your plot in question 2
ggplot(Financials) +
aes(x = CRU.Boxes, y = CRU.Costs) +
geom_point(size = 3L, colour = "#ef562d") +
geom_smooth(method = "lm", formula = y ~ x) + # least-squares line, matching LM.CRU
labs(x = "CRU Boxes", y = "CRU Costs", title = "CRU regression") +
theme_minimal()
ggplot(Financials) +
aes(x = TOI.Boxes, y = TOI.Costs) +
geom_point(size = 3L, colour = "#ef562d") +
geom_smooth(method = "lm", formula = y ~ x) + # least-squares line, matching LM.TOI
labs(x = "TOI Boxes", y = "TOI Costs", title = "TOI regression") +
theme_minimal()
result <- regress(Financials, rvar = "TOI.Costs", evar = "TOI.Boxes")
summary(result, sum_check = "confint")
## Linear regression (OLS)
## Data : Financials
## Response variable : TOI.Costs
## Explanatory variables: TOI.Boxes
## Null hyp.: the effect of TOI.Boxes on TOI.Costs is zero
## Alt. hyp.: the effect of TOI.Boxes on TOI.Costs is not zero
##
## coefficient std.error t.value p.value
## (Intercept) 9213.574 2381.903 3.868 < .001 ***
## TOI.Boxes 2.178 0.092 23.771 < .001 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-squared: 0.919, Adjusted R-squared: 0.917
## F-statistic: 565.037 df(1,50), p.value < .001
## Nr obs: 52
##
## coefficient 2.5% 97.5% +/-
## (Intercept) 9213.574 4429.381 13997.768 4784.193
## TOI.Boxes 2.178 1.994 2.363 0.184
plot(result, plots = "scatter", lines = "line", nrobs = -1, custom = FALSE)
## `geom_smooth()` using formula 'y ~ x'
result <- regress(Financials, rvar = "CRU.Costs", evar = "CRU.Boxes")
summary(result, sum_check = "confint")
## Linear regression (OLS)
## Data : Financials
## Response variable : CRU.Costs
## Explanatory variables: CRU.Boxes
## Null hyp.: the effect of CRU.Boxes on CRU.Costs is zero
## Alt. hyp.: the effect of CRU.Boxes on CRU.Costs is not zero
##
## coefficient std.error t.value p.value
## (Intercept) 10087.589 3345.723 3.015 0.004 **
## CRU.Boxes 2.676 0.140 19.170 < .001 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-squared: 0.88, Adjusted R-squared: 0.878
## F-statistic: 367.489 df(1,50), p.value < .001
## Nr obs: 52
##
## coefficient 2.5% 97.5% +/-
## (Intercept) 10087.589 3367.505 16807.672 6720.083
## CRU.Boxes 2.676 2.396 2.956 0.280
plot(result, plots = "scatter", lines = "line", nrobs = -1, custom = FALSE)
## `geom_smooth()` using formula 'y ~ x'
a. Are the fixed costs zero with 95% confidence for each vendor?
ANSWER: No. Zero lies outside both 95% confidence intervals for the intercept (TOI: 4,429.381 to 13,997.768; CRU: 3,367.505 to 16,807.672), so fixed costs are nonzero for both vendors at 95% confidence.
b. Are the variable costs zero with 95% confidence for each vendor?
ANSWER: No. Zero lies outside both 95% confidence intervals for the slope (TOI: 1.994 to 2.363; CRU: 2.396 to 2.956), so variable costs are nonzero for both vendors at 95% confidence.
c. Which vendor experienced the largest cost overrun?
ANSWER: TOI experienced the largest cost overrun: its largest positive residual is 7,835.6, versus 5,096.1 for CRU.
d. Which vendor experienced the largest cost underrun?
ANSWER: TOI also experienced the largest cost underrun: its most negative residual is -6,817.8, versus -4,246.9 for CRU.
CRU has the more predictable costs: its residual standard error (2,677) is lower than TOI's (3,173), so its actual weekly costs deviate less from the fitted line.
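Which of the two regressions has a higher proportion of explained variance? Provide the relevant statistic for each vendor. ANSWER: R-squared is 0.880 for CRU and 0.919 for TOI, so the TOI regression explains a higher proportion of the variance in weekly costs (about 92% versus 88%). TOI's higher R-squared alongside its larger residual standard error reflects that TOI's weekly costs vary much more overall (standard deviation 11,017 vs 7,660), so a larger share of that variation is explained even though the unexplained scatter is larger.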
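Both fit statistics follow directly from the `anova()` sums of squares: R-squared is the regression sum of squares divided by the total, and the residual standard error is the square root of the residual sum of squares over its 50 degrees of freedom (52 weeks minus 2 coefficients). This minimal sketch recomputes them from the printed values.

```r
# Sums of squares copied from the anova() tables for each vendor
ss <- list(
  CRU = c(reg = 2634390301, resid = 358431580),
  TOI = c(reg = 5687137126, resid = 503253094)
)

fit_stats <- sapply(ss, function(s) {
  c(r_squared = s[["reg"]] / (s[["reg"]] + s[["resid"]]),  # explained / total
    rse = sqrt(s[["resid"]] / 50))                          # residual df = 52 - 2
})
round(fit_stats, 4)
```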
qf(0.01, 1, 50, 565.037)
## [1] 353.9132
qf(0.01, 1, 50, 367.49)
## [1] 224.585
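ANSWER: Value needed (the 1st percentile of the noncentral F distribution with 1 and 50 degrees of freedom, using the observed F statistic as the noncentrality parameter): TOI 353.9132; CRU 224.585.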
shapiro.test(LM.CRU$residuals)
##
## Shapiro-Wilk normality test
##
## data: LM.CRU$residuals
## W = 0.96171, p-value = 0.09284
qqnorm(LM.CRU$residuals)
qqline(LM.CRU$residuals) # reference line for a normal distribution
shapiro.test(LM.TOI$residuals)
##
## Shapiro-Wilk normality test
##
## data: LM.TOI$residuals
## W = 0.98948, p-value = 0.9247
qqnorm(LM.TOI$residuals)
qqline(LM.TOI$residuals) # reference line for a normal distribution
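ANSWER: The Shapiro-Wilk test checks whether the residuals are consistent with a normal distribution. Both p-values are greater than 0.05 (CRU 0.093, TOI 0.925), so normality cannot be rejected for either vendor's residuals.
Normal Q-Q plots compare the quantiles of the residuals against the theoretical quantiles of a normal distribution; points lying close to a straight line are consistent with normality. The plots again indicate that normality cannot be ruled out.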
In what remains, assume that the residuals you located were normal.
# Probability that a week's actual cost is within 5,000 of the fitted cost,
# treating residuals as normal with mean 0 and sd = residual standard error
1 - (pnorm(5000, mean = 0, sd = 3173, lower.tail = FALSE) + pnorm(-5000, mean = 0, sd = 3173))
## [1] 0.8849271
1 - (pnorm(5000, mean = 0, sd = 2677, lower.tail = FALSE) + pnorm(-5000, mean = 0, sd = 2677))
ANSWER: TOI = 88.49% probability; CRU = about 93.8% probability.
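Because a normal distribution centred at 0 is symmetric, the probability that a residual lies within ±5,000 simplifies to 2Φ(5000 / σ) - 1, where σ is the residual standard error. This is a minimal sketch, assuming (as the TOI calculation does) that the question concerns a band of ±5,000 around the fitted weekly cost; `within_band` is a hypothetical helper name.

```r
# Hypothetical helper: P(|residual| < band) for a normal residual with mean 0, sd = sigma
within_band <- function(band, sigma) 2 * pnorm(band / sigma) - 1

probs <- c(TOI = within_band(5000, 3173), CRU = within_band(5000, 2677))
round(probs, 4)
```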
result <- regress(Financials, rvar = "TOI.Costs", evar = "TOI.Boxes")
summary(result)
## Linear regression (OLS)
## Data : Financials
## Response variable : TOI.Costs
## Explanatory variables: TOI.Boxes
## Null hyp.: the effect of TOI.Boxes on TOI.Costs is zero
## Alt. hyp.: the effect of TOI.Boxes on TOI.Costs is not zero
##
## coefficient std.error t.value p.value
## (Intercept) 9213.574 2381.903 3.868 < .001 ***
## TOI.Boxes 2.178 0.092 23.771 < .001 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-squared: 0.919, Adjusted R-squared: 0.917
## F-statistic: 565.037 df(1,50), p.value < .001
## Nr obs: 52
pred <- predict(result, pred_cmd = "TOI.Boxes=(2000)", interval = "prediction")
print(pred, n = 10)
## Linear regression (OLS)
## Data : Financials
## Response variable : TOI.Costs
## Explanatory variables: TOI.Boxes
## Interval : prediction
## Prediction command : TOI.Boxes = (2000)
##
## TOI.Boxes Prediction 2.5% 97.5% +/-
## 2000.000 13570.452 5813.671 21327.233 7756.781
result <- regress(Financials, rvar = "CRU.Costs", evar = "CRU.Boxes")
summary(result)
## Linear regression (OLS)
## Data : Financials
## Response variable : CRU.Costs
## Explanatory variables: CRU.Boxes
## Null hyp.: the effect of CRU.Boxes on CRU.Costs is zero
## Alt. hyp.: the effect of CRU.Boxes on CRU.Costs is not zero
##
## coefficient std.error t.value p.value
## (Intercept) 10087.589 3345.723 3.015 0.004 **
## CRU.Boxes 2.676 0.140 19.170 < .001 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-squared: 0.88, Adjusted R-squared: 0.878
## F-statistic: 367.489 df(1,50), p.value < .001
## Nr obs: 52
pred <- predict(result, pred_cmd = "CRU.Boxes=(2000)", interval = "prediction")
print(pred, n = 10)
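ANSWER: For a week of 2,000 boxes, TOI's predicted cost is 13,570.452, with a 95% prediction interval of 5,813.671 to 21,327.233. CRU's predicted cost is 15,439.643, with a 95% prediction interval of 7,260.145 to 23,619.142.
Both predictions extrapolate well below the observed weekly volumes (at least 16,364 boxes for TOI and 17,402 for CRU), so they should be treated with caution.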
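The interval type matters here: a confidence interval describes the mean cost at a given volume, while a prediction interval describes a single week's cost and is wider because it also includes the residual variance. Since the vendor data are not reproduced here, this sketch uses R's built-in `cars` data purely for illustration with `predict.lm()`, and confirms that the prediction half-width equals t × √(se_fit² + σ²).

```r
# Built-in 'cars' data, used only to illustrate the two interval types
fit <- lm(dist ~ speed, data = cars)
new_obs <- data.frame(speed = 21)

conf_int <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
rbind(confidence = conf_int[1, ], prediction = pred_int[1, ])

# Prediction half-width: fit variance plus residual variance
p <- predict(fit, newdata = new_obs, se.fit = TRUE)
half_width <- qt(0.975, p$df) * sqrt(p$se.fit^2 + p$residual.scale^2)
```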
result <- regress(
Financials,
rvar = "CRU.Costs",
evar = "CRU.Boxes"
)
summary(result, sum_check = c("rmse", "sumsquares", "confint"))
## Linear regression (OLS)
## Data : Financials
## Response variable : CRU.Costs
## Explanatory variables: CRU.Boxes
## Null hyp.: the effect of CRU.Boxes on CRU.Costs is zero
## Alt. hyp.: the effect of CRU.Boxes on CRU.Costs is not zero
##
## coefficient std.error t.value p.value
## (Intercept) 10087.589 3345.723 3.015 0.004 **
## CRU.Boxes 2.676 0.140 19.170 < .001 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-squared: 0.88, Adjusted R-squared: 0.878
## F-statistic: 367.489 df(1,50), p.value < .001
## Nr obs: 52
##
## Prediction error (RMSE): 2625.436
## Residual st.dev (RSD): 2677.43
##
## Sum of squares:
## df SS
## Regression 1 2,634,390,301.486
## Error 50 358,431,580.079
## Total 51 2,992,821,881.565
##
## coefficient 2.5% 97.5% +/-
## (Intercept) 10087.589 3367.505 16807.672 6720.083
## CRU.Boxes 2.676 2.396 2.956 0.280
result <- regress(
Financials,
rvar = "TOI.Costs",
evar = "TOI.Boxes"
)
summary(result, sum_check = c("rmse", "sumsquares", "confint"))
## Linear regression (OLS)
## Data : Financials
## Response variable : TOI.Costs
## Explanatory variables: TOI.Boxes
## Null hyp.: the effect of TOI.Boxes on TOI.Costs is zero
## Alt. hyp.: the effect of TOI.Boxes on TOI.Costs is not zero
##
## coefficient std.error t.value p.value
## (Intercept) 9213.574 2381.903 3.868 < .001 ***
## TOI.Boxes 2.178 0.092 23.771 < .001 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-squared: 0.919, Adjusted R-squared: 0.917
## F-statistic: 565.037 df(1,50), p.value < .001
## Nr obs: 52
##
## Prediction error (RMSE): 3110.939
## Residual st.dev (RSD): 3172.548
##
## Sum of squares:
## df SS
## Regression 1 5,687,137,126.315
## Error 50 503,253,094.298
## Total 51 6,190,390,220.613
##
## coefficient 2.5% 97.5% +/-
## (Intercept) 9213.574 4429.381 13997.768 4784.193
## TOI.Boxes 2.178 1.994 2.363 0.184
Contamination is always a concern for food vendors. If contamination levels are too high, the risks are too large and the vendor must be disqualified. In this case, concerns arise if the relevant probabilities are less than 0.05. Outside information about salmonella and Listeria monocytogenes should have no bearing on your decision, only the probabilities.
result <- prob_pois(lambda = 1.6, ub = 4)
summary(result)
## Probability calculator
## Distribution: Poisson
## Lambda : 1.6
## Mean : 1.6
## Variance : 1.6
## Lower bound :
## Upper bound : 4
##
## P(X = 4) = 0.055
## P(X < 4) = 0.921
## P(X <= 4) = 0.976
## P(X > 4) = 0.024
## P(X >= 4) = 0.079
plot(result)
ANSWER: P(X ≥ 4) = 0.079. Because this probability is above the 0.05 threshold, it does not by itself raise a contamination concern.
2. For Taste of India, Listeria monocytogenes has been a recurring issue. If, industry-wide, Listeria monocytogenes outbreaks occur at a rate of 3 per year, what is the probability of the 5 [or more] outbreaks experienced by Curries R Us?
result <- prob_pois(lambda = 3, ub = 5)
summary(result)
## Probability calculator
## Distribution: Poisson
## Lambda : 3
## Mean : 3
## Variance : 3
## Lower bound :
## Upper bound : 5
##
## P(X = 5) = 0.101
## P(X < 5) = 0.815
## P(X <= 5) = 0.916
## P(X > 5) = 0.084
## P(X >= 5) = 0.185
plot(result)
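ANSWER: P(X ≥ 5) = 0.185. This is also above the 0.05 threshold, so by the stated criterion neither probability is low enough to disqualify the vendor.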
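Both Poisson tail probabilities can be checked with base R's `ppois()`: with `lower.tail = FALSE` it returns P(X > q), so P(X ≥ k) is `ppois(k - 1, lambda, lower.tail = FALSE)`. This minimal sketch also applies the 0.05 disqualification threshold stated above.

```r
# Upper-tail Poisson probabilities: P(X >= k) = P(X > k - 1)
p_4_or_more <- ppois(3, lambda = 1.6, lower.tail = FALSE)  # rate 1.6, 4 or more events
p_5_or_more <- ppois(4, lambda = 3, lower.tail = FALSE)    # rate 3, 5 or more events

probs <- c(four_plus = p_4_or_more, five_plus = p_5_or_more)
round(probs, 3)
probs < 0.05  # TRUE would trigger the contamination concern
```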