Setup

question 1

pizza_Toppings <- c("pineaple","pepper","pepperoni")
toppings <- c(0,1,2,3)
#toppings = (1/pizza_Toppings * 1/toppings)
prob <- (1/3 *1/4)
prob
## [1] 0.08333333

There is a probability of 0.08333333 that a random customer will ask for two toppings. Question 2

x=5
dbinom(x<= 5, size =100, prob = 0.03)
## [1] 0.1470696

There is a 0.1470696 probability that at most five tomatoes are bad.

Question 3

two.dice <- function(){
 dice <- c(3,3,5,5,7,7)
}
more.sims <- replicate(1000, two.dice())
table(more.sims)/length(more.sims)
## more.sims
##         3         5         7 
## 0.3333333 0.3333333 0.3333333
two.dice2 <- function(){
 dice <- c(2,2,4,4,9,9)
}
more.sims <- replicate(1000, two.dice2())
table(more.sims)/length(more.sims)
## more.sims
##         2         4         9 
## 0.3333333 0.3333333 0.3333333
two.dice2 <- function(){
 dice <- c(1,1,6,6,8,8)
}
more.sims <- replicate(1000, two.dice2())
table(more.sims)/length(more.sims)
## more.sims
##         1         6         8 
## 0.3333333 0.3333333 0.3333333

Question 4, binomial distribution probabilities

dbinom(5,10, prob = 0.5)
## [1] 0.2460938
dbinom(50,100, prob = 0.5)
## [1] 0.07958924
dbinom(500,1000, prob=0.5)
## [1] 0.02522502

5 heads in 10 coin flips has the highest likelihood with a 24.60938% chance of occurrence, followed by 5o heads in 100 coin flips with 7.958924% chance and 500 heads has the least chance of occurrence with a 2.522502% chance of occurrence.

Question 5

my.dice.sum <- function(n.dice, n.sides){
  dice <- sample(1:n.sides, size = n.dice, replace = TRUE)
  return(sum(dice))
}
replicate(10, my.dice.sum(3,4))
##  [1] 7 5 6 8 4 7 9 6 8 9

Question 6

experimentA <- function(){
  rolls <- sample(1:6, size = 4, replace = TRUE)
  condition <- sum(rolls == 6) > 0
  return(condition)
}

experimentB <- function(){
  first.die <- sample(1:6, size = 24, replace = TRUE)
  second.die <- sample(1:6, size = 24, replace = TRUE)
  condition <- sum((first.die == second.die) & (first.die == 6)) > 0
  return(condition)
}

simsA <- replicate(100000, experimentA())
sum(simsA)/length(simsA)
## [1] 0.51783
simsB <- replicate(100000, experimentB())
sum(simsB)/length(simsB) 
## [1] 0.49217

There is a 51.63% likelihood of getting at least one 6 when rolling a single fair six-sided die 4 times and a 48.996% for getting at least one pair of sixes when two fair , six-sided dice are thrown 24 times. It is therefore conclusive that option (A) is more likely.

Perform an analysis of the “store24” data set. Importing the data set and converting to a data frame

library(readr)
library(psych)
setwd("~/R codes and pdf/my data")
Store24 <- read_csv(data.frame("Store24.csv"))

Question 7(a) comparing the results of the summary statistics and the psych library data summary

summary <- summary(Store24);summary
##      store          Sales             Profit          MTenure      
##  Min.   : 1.0   Min.   : 699306   Min.   :122180   Min.   :  0.00  
##  1st Qu.:19.5   1st Qu.: 984579   1st Qu.:211004   1st Qu.:  6.67  
##  Median :38.0   Median :1127332   Median :265014   Median : 24.12  
##  Mean   :38.0   Mean   :1205413   Mean   :276314   Mean   : 45.30  
##  3rd Qu.:56.5   3rd Qu.:1362388   3rd Qu.:331314   3rd Qu.: 50.92  
##  Max.   :75.0   Max.   :2113089   Max.   :518998   Max.   :277.99  
##     CTenure              Pop             Comp          Visibility  
##  Min.   :  0.8871   Min.   : 1046   Min.   : 1.651   Min.   :2.00  
##  1st Qu.:  4.3943   1st Qu.: 5616   1st Qu.: 3.151   1st Qu.:3.00  
##  Median :  7.2115   Median : 8896   Median : 3.629   Median :3.00  
##  Mean   : 13.9315   Mean   : 9826   Mean   : 3.788   Mean   :3.08  
##  3rd Qu.: 17.2156   3rd Qu.:14104   3rd Qu.: 4.230   3rd Qu.:4.00  
##  Max.   :114.1519   Max.   :26519   Max.   :11.128   Max.   :5.00  
##     PedCount         Res          Hours24       CrewSkill        MgrSkill    
##  Min.   :1.00   Min.   :0.00   Min.   :0.00   Min.   :2.060   Min.   :2.957  
##  1st Qu.:2.00   1st Qu.:1.00   1st Qu.:1.00   1st Qu.:3.225   1st Qu.:3.344  
##  Median :3.00   Median :1.00   Median :1.00   Median :3.500   Median :3.589  
##  Mean   :2.96   Mean   :0.96   Mean   :0.84   Mean   :3.457   Mean   :3.638  
##  3rd Qu.:4.00   3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:3.655   3rd Qu.:3.925  
##  Max.   :5.00   Max.   :1.00   Max.   :1.00   Max.   :4.640   Max.   :4.622  
##     ServQual     
##  Min.   : 57.90  
##  1st Qu.: 78.95  
##  Median : 89.47  
##  Mean   : 87.15  
##  3rd Qu.: 99.90  
##  Max.   :100.00
library(dplyr)
library(tidyverse)
psych <- describe(Store24);psych
##            vars  n       mean        sd     median    trimmed       mad
## store         1 75      38.00     21.79      38.00      38.00     28.17
## Sales         2 75 1205413.12 304531.31 1127332.00 1182031.25 288422.04
## Profit        3 75  276313.61  89404.08  265014.00  270260.34  90532.00
## MTenure       4 75      45.30     57.67      24.12      33.58     29.67
## CTenure       5 75      13.93     17.70       7.21      10.60      6.14
## Pop           6 75    9825.59   5911.67    8896.00    9366.07   7266.22
## Comp          7 75       3.79      1.31       3.63       3.66      0.82
## Visibility    8 75       3.08      0.75       3.00       3.07      0.00
## PedCount      9 75       2.96      0.99       3.00       2.97      1.48
## Res          10 75       0.96      0.20       1.00       1.00      0.00
## Hours24      11 75       0.84      0.37       1.00       0.92      0.00
## CrewSkill    12 75       3.46      0.41       3.50       3.47      0.34
## MgrSkill     13 75       3.64      0.41       3.59       3.62      0.45
## ServQual     14 75      87.15     12.61      89.47      88.62     15.61
##                  min        max      range  skew kurtosis       se
## store           1.00      75.00      74.00  0.00    -1.25     2.52
## Sales      699306.00 2113089.00 1413783.00  0.71    -0.09 35164.25
## Profit     122180.00  518998.00  396818.00  0.62    -0.21 10323.49
## MTenure         0.00     277.99     277.99  2.01     3.90     6.66
## CTenure         0.89     114.15     113.26  3.52    15.00     2.04
## Pop          1046.00   26519.00   25473.00  0.62    -0.23   682.62
## Comp            1.65      11.13       9.48  2.48    11.31     0.15
## Visibility      2.00       5.00       3.00  0.25    -0.38     0.09
## PedCount        1.00       5.00       4.00  0.00    -0.52     0.11
## Res             0.00       1.00       1.00 -4.60    19.43     0.02
## Hours24         0.00       1.00       1.00 -1.82     1.32     0.04
## CrewSkill       2.06       4.64       2.58 -0.43     1.64     0.05
## MgrSkill        2.96       4.62       1.67  0.27    -0.53     0.05
## ServQual       57.90     100.00      42.10 -0.66    -0.72     1.46

The mean,median,min and max are the statistical information provided by the summary statistics function and the psych library description.

Question 8

Measuring the Mean and standard deviation of the store profit, management tenure (MTenure), and crew tenure (Ctenure).

library(dplyr)
importantvariables <- cbind.data.frame(Store24$Profit, Store24$MTenure, Store24$CTenure)
mean(importantvariables$`Store24$Profit`)
## [1] 276313.6
mean(importantvariables$`Store24$MTenure`)
## [1] 45.29644
mean(importantvariables$`Store24$CTenure`)
## [1] 13.9315

The means of profit,MTenure and CTenure are 276313.6, 45.29644 and 13.9315 respectively.

#Standard deviation
sd(importantvariables$`Store24$Profit`)
## [1] 89404.08
sd(importantvariables$`Store24$MTenure`)
## [1] 57.67155
sd(importantvariables$`Store24$CTenure`)
## [1] 17.69752

The standard deviation of profit,MTenure and CTenure 89404.08, 57.67155 and 17.69752 respectively.

Question 7 (C)

Sorting and Sub-setting data in R. Printing the (storeID, sales, profit, MTenure and CTenure) of the top 10 most profitable and bottom 10 least profitable stores

profitability <- Store24[order(desc(Store24$Profit)),]

#Selecting the top 10 most profitable stores
top10_profitable <- head(profitability, 10)
subset(top10_profitable, select = c("store","Sales","Profit","MTenure","CTenure"))
## # A tibble: 10 × 5
##    store   Sales Profit MTenure CTenure
##    <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
##  1    74 1782957 518998   171.    29.5 
##  2     7 1809256 476355    62.5    7.33
##  3     9 2113089 474725   109.     6.06
##  4     6 1703140 469050   150.    11.4 
##  5    44 1807740 439781   182.   114.  
##  6     2 1619874 424007    86.2    6.64
##  7    45 1602362 410149    47.6    9.17
##  8    18 1704826 394039   240.    33.8 
##  9    11 1583446 389886    44.8    2.04
## 10    47 1665657 387853    12.8    6.64
#Selecting the bottom 10 least profitable stores

bottom10_profitable <- tail(profitability,10)
subset(bottom10_profitable, select = c("store","Sales","Profit","MTenure","CTenure"))
## # A tibble: 10 × 5
##    store   Sales Profit MTenure CTenure
##    <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
##  1    37 1202917 187765  23.2      1.35
##  2    61  716589 177046  21.8     13.3 
##  3    52 1073008 169201  24.1      3.42
##  4    54  811190 159792   6.67     3.88
##  5    13  857843 152513   0.657    1.58
##  6    32  828918 149033  36.1      6.64
##  7    55  925744 147672   6.67    18.4 
##  8    41  744211 147327  14.9     11.9 
##  9    66  879581 146058 115.       3.88
## 10    57  699306 122180  24.3      2.96

Question 7(d) Correlation between different variables.

#Correlation results and scatter plot of correlation between the profit and CTenure variables 

cor.test(Store24$Profit,Store24$CTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  Store24$Profit and Store24$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03262507 0.45786339
## sample estimates:
##       cor 
## 0.2576789

correlation is a statistical measure of the interdependence of variables in a data set. A correlation coefficient of 0.2576789 indicates that there is a weak positive (negative) linear relationship via a shaky linear rule between the profit and the crew tenure.

#Scatter plot
Store24$Profit <- 0.01*(Store24$Profit)
plot(Store24$Profit,Store24$CTenure, main="Scatter plot for correlation between profits and crew tenure", xlab="profit",ylab = "CTenure")

#Correlation results and scatter plot of correlation between the profit and MTenure variables

cor.test(Store24$Profit,Store24$MTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  Store24$Profit and Store24$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2353497 0.6055175
## sample estimates:
##       cor 
## 0.4388692

correlation is a statistical measure of the interdependence of variables in a data set. A correlation coefficient of 0.4388692 indicate a moderate positive (negative) linear relationship via a fuzzy-firm linear rule. between the profit and the management tenure.

plot(Store24$Profit,Store24$MTenure, main="Scatter plot for correlation between profits and management tenure", xlab="profit",ylab = "MTenure")

Question 7(e) correlation matrix

library(Hmisc)

Correlation matrix between all variables

library(round)
matrix <- cor(as.matrix(Store24))
round(matrix, digits=2)
##            store Sales Profit MTenure CTenure   Pop  Comp Visibility PedCount
## store       1.00 -0.23  -0.20   -0.06    0.02 -0.29  0.03      -0.03    -0.22
## Sales      -0.23  1.00   0.92    0.45    0.25  0.40 -0.24       0.13     0.42
## Profit     -0.20  0.92   1.00    0.44    0.26  0.43 -0.33       0.14     0.45
## MTenure    -0.06  0.45   0.44    1.00    0.24 -0.06  0.18       0.16     0.06
## CTenure     0.02  0.25   0.26    0.24    1.00  0.00 -0.07       0.07    -0.08
## Pop        -0.29  0.40   0.43   -0.06    0.00  1.00 -0.27      -0.05     0.61
## Comp        0.03 -0.24  -0.33    0.18   -0.07 -0.27  1.00       0.03    -0.15
## Visibility -0.03  0.13   0.14    0.16    0.07 -0.05  0.03       1.00    -0.14
## PedCount   -0.22  0.42   0.45    0.06   -0.08  0.61 -0.15      -0.14     1.00
## Res        -0.03 -0.17  -0.16   -0.06   -0.34 -0.24  0.22       0.02    -0.28
## Hours24     0.03  0.06  -0.03   -0.17    0.07 -0.22  0.13       0.05    -0.28
## CrewSkill   0.05  0.16   0.16    0.10    0.26  0.28 -0.04      -0.20     0.21
## MgrSkill   -0.07  0.31   0.32    0.23    0.12  0.08  0.22       0.07     0.09
## ServQual   -0.32  0.39   0.36    0.18    0.08  0.12  0.02       0.21    -0.01
##              Res Hours24 CrewSkill MgrSkill ServQual
## store      -0.03    0.03      0.05    -0.07    -0.32
## Sales      -0.17    0.06      0.16     0.31     0.39
## Profit     -0.16   -0.03      0.16     0.32     0.36
## MTenure    -0.06   -0.17      0.10     0.23     0.18
## CTenure    -0.34    0.07      0.26     0.12     0.08
## Pop        -0.24   -0.22      0.28     0.08     0.12
## Comp        0.22    0.13     -0.04     0.22     0.02
## Visibility  0.02    0.05     -0.20     0.07     0.21
## PedCount   -0.28   -0.28      0.21     0.09    -0.01
## Res         1.00   -0.09     -0.15    -0.03     0.09
## Hours24    -0.09    1.00      0.11    -0.04     0.06
## CrewSkill  -0.15    0.11      1.00    -0.02    -0.03
## MgrSkill   -0.03   -0.04     -0.02     1.00     0.36
## ServQual    0.09    0.06     -0.03     0.36     1.00

Pearson’s correlation test between profit, CTenure and MTenure

part(i)

#Pearson's correlation test between profit and MTenure 
cor.test(Store24$Profit, Store24$MTenure, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  Store24$Profit and Store24$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2353497 0.6055175
## sample estimates:
##       cor 
## 0.4388692

A p-value is a probability that the null hypothesis is true. The p-value of the correlation between the profit and management Tenure is 8.193e-05 which is less than the significance level of 0.05, therefore we reject the null hypothesis and accept the alternative hypothesis since the correlation statistically significant.

part(ii)

#Pearsons correlation test between profit ans CTenure 
cor.test(Store24$Profit, Store24$CTenure, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  Store24$Profit and Store24$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03262507 0.45786339
## sample estimates:
##       cor 
## 0.2576789

A p-value is a probability that the null hypothesis is true. The p-value of the correlation between the profit and crew tenure is 0.02562 which is less than the significance level of 0.05, therefore we reject the null hypothesis and accept the alternative hypothesis since the correlation statistically significant.

Question 7(f)

#regression

regression <- subset(Store24, select = c("Profit","MTenure","CTenure","Comp","Pop","PedCount","Res","Hours24","Visibility"))
Model <- lm(Profit ~., data = regression)
summary(Model)
## 
## Call:
## lm(formula = Profit ~ ., data = regression)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1057.89  -359.46   -70.69   337.80  1123.90 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   76.10041  668.21994   0.114 0.909674    
## MTenure        7.60993    1.27086   5.988 9.72e-08 ***
## CTenure        9.44978    4.21687   2.241 0.028400 *  
## Comp        -252.86887   54.91937  -4.604 1.94e-05 ***
## Pop            0.03667    0.01466   2.501 0.014890 *  
## PedCount     340.87359   90.73196   3.757 0.000366 ***
## Res          915.84675  392.31283   2.334 0.022623 *  
## Hours24      632.33307  196.41114   3.219 0.001994 ** 
## Visibility   126.25447   90.87620   1.389 0.169411    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 569.7 on 66 degrees of freedom
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.594 
## F-statistic: 14.53 on 8 and 66 DF,  p-value: 5.382e-12

part(i)

Explanatory variables whose coefficients are statistically significant are; Management tenure, Crew tenure, Comp, Pop, PedCount, Res,and Hours24.

part(ii)

Explanatory variables whose coefficients are not statistically significant are: Visibility and profit

part(iii)

From the correlations results there is a positive relationship between the profit, management tenure and the crew tenure variables.Therefore, An increase in a months experience would result to an equal proportionate increase in the profits.

Question 8

passadieci <- replicate(100000, my.dice.sum(n.dice = 3, n.sides = 6))
sum(passadieci >= 11)/length(passadieci)
## [1] 0.49951
  1. There is a 50.227% probability of winning the game
sum(passadieci == 11)/length(passadieci)
## [1] 0.1243
sum(passadieci == 12)/length(passadieci)
## [1] 0.11527
  1. An 11 has a greater likelihood with 12.372% to a 11.528% likelihood of a 12.
sum((passadieci <= 7) | (passadieci >= 15))/length(passadieci)
## [1] 0.25479
  1. There is a 25.529% likelihood of getting a sum greater than 7 or no less than q5 when throwing three dice

  2. plot of the simulated probabilities of each possible sum when throwing three fair,six-sided dice.

plot(table(passadieci)/length(passadieci), xlab = 'Sum', 
     ylab = 'Relative Frequency', main = 'Passadieci Simulation: 100000 Throws')