Data Analytics Individual Exercises

Chapter 7: Statistical Inference

Exercise 13: hypothesis testing (Pg. 255)

The Excel file Sales Data provides data on a sample of customers. An industry trade publication stated that the average profit per customer for this industry was at least $4,500. Using a test of hypothesis, do the data support this claim or not?

data <- read.csv("./data/sales_data.csv")
data

summary(data)

##     Customer     Percent.Gross.Profit  Gross.Sales      Gross.Profit    
##  Min.   : 1.00   Min.   :0.0300       Min.   :   170   Min.   :   40.6  
##  1st Qu.:15.75   1st Qu.:0.1400       1st Qu.:  2646   1st Qu.:  435.1  
##  Median :30.50   Median :0.2000       Median :  6760   Median : 1662.4  
##  Mean   :30.50   Mean   :0.2119       Mean   : 25016   Mean   : 4239.2  
##  3rd Qu.:45.25   3rd Qu.:0.2450       3rd Qu.: 32171   3rd Qu.: 5690.4  
##  Max.   :60.00   Max.   :0.6000       Max.   :179101   Max.   :25379.3  
##  Industry.Code   Competitive.Rating
##  Min.   :1.000   Min.   :1         
##  1st Qu.:3.000   1st Qu.:2         
##  Median :5.000   Median :3         
##  Mean   :4.483   Mean   :3         
##  3rd Qu.:6.000   3rd Qu.:4         
##  Max.   :7.000   Max.   :5

H0: Mean(Profit) >= 4500
H1: Mean(Profit) < 4500
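The population standard deviation of profit is unknown, so the usual one-sample approach (assumed here) is a t-statistic with n - 1 degrees of freedom. n = 60 is inferred from the Customer IDs running from 1 to 60 in the summary, and the sample mean 4239.2 is the rounded value printed by summary().

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mu_0 = 4500,\; \bar{x} \approx 4239.2,\; n = 60
\text{Reject } H_0 \text{ if } t < -t_{\alpha,\,n-1}, \qquad -t_{0.05,\,59} \approx -1.671
```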

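m <- 4500                             # hypothesized mean profit per customer
xbar <- mean(data$Gross.Profit)       # sample mean profit (about 4239.2 per summary)
n <- nrow(data)                       # number of customers in the sample
s <- sd(data$Gross.Profit)            # sample SD of profit (not of Gross.Sales)
t_stat <- (xbar - m) / (s / sqrt(n))  # sigma unknown, so use t with n - 1 df
t_stat
pt(t_stat, df = n - 1)                # lower-tailed p-value

In this case, the sample mean profit of about $4,239 is roughly $261 below the hypothesized value of $4,500. Because this is a lower-tailed test, we reject H0 only if the test statistic is smaller than the critical value -t(α, n - 1), which is about -1.671 for α = 0.05 with 59 degrees of freedom; equivalently, we reject H0 only if the p-value is smaller than α. The $261 shortfall is small relative to the standard error of the mean, because profit per customer is very spread out (from $40.6 to $25,379.3), so the test statistic stays well above the critical value and we fail to reject H0. The data are therefore consistent with the publication's claim: there is not enough evidence to conclude that the average profit per customer is below $4,500.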
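The decision rule above can be packaged as a small R function and checked against base R's `t.test()`. This is a sketch: `lower_tail_t` is a hypothetical helper name, and the profit values below are made-up toy numbers for illustration, not the Sales Data file.

```r
# Hypothetical helper: one-sample, lower-tailed t-test written out by hand
lower_tail_t <- function(x, mu0, alpha = 0.05) {
  n <- length(x)
  t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # standardized distance from mu0
  p_value <- pt(t_stat, df = n - 1)              # P(T <= t_stat) under H0
  t_crit <- qt(alpha, df = n - 1)                # lower-tail critical value (negative)
  list(t = t_stat, p = p_value, crit = t_crit, reject = t_stat < t_crit)
}

# Toy profit values (made up, NOT the exercise data)
x <- c(3200, 5100, 4700, 3900, 4400, 6100, 2800, 4500)

res <- lower_tail_t(x, mu0 = 4500)
ref <- t.test(x, mu = 4500, alternative = "less")  # base R reference

# Manual and built-in results should agree
print(c(manual_t = res$t, ttest_t = unname(ref$statistic)))
print(c(manual_p = res$p, ttest_p = ref$p.value))
```

For the exercise itself, `lower_tail_t(data$Gross.Profit, mu0 = 4500)` or `t.test(data$Gross.Profit, mu = 4500, alternative = "less")` should reproduce the statistic and p-value from the manual calculation; the helper only makes the critical-value comparison explicit.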

Data Analytics Individual Exercises - Linh Vu

Chapter 7: Statistical Inference