Our objective is to use these data to assess the likelihood of fraud. Is it likely that a random sample of 253 items selected from the population of 3,005 items would yield a mean GPF of at least 50.8%? Or is it likely that two independent random samples of sizes 134 and 119 would yield mean GPFs of at least 50.6% and 51.0%, respectively? To answer these questions, we perform statistical hypothesis tests to check the validity of the alleged fraud.
We begin by stating the problem:
A wholesale furniture retailer stores in-stock items at a large warehouse located in Tampa, Florida. Early in the year, a fire destroyed the warehouse and all the furniture in it. After determining the fire was an accident, the retailer sought to recover costs by submitting a claim to its insurance company. As is typical in a fire insurance policy of this type, the furniture retailer must provide the insurance company with an estimate of “lost” profit for the destroyed items. Retailers calculate profit margin in percentage form using the Gross Profit Factor (GPF). By definition, the GPF for a single sold item is the ratio of the profit to the item's selling price measured as a percentage, that is:
\[\text{Item GPF} = \frac{\text{Profit}}{\text{Sales price}} \times 100\%\]
Of interest to both the retailer and the insurance company is the average GPF for all of the items in the warehouse. Because these furniture pieces were all destroyed, their eventual selling prices and profit values are obviously unknown. Consequently, the average GPF for all the warehouse items is unknown. One way to estimate the mean GPF of the destroyed items is to use the mean GPF of similar, recently sold items. The retailer sold 3,005 furniture items in the year prior to the fire and kept paper invoices on all sales. Rather than calculate the mean GPF for all 3,005 items (the data were not computerized), the retailer sampled a total of 253 of the invoices and computed the mean GPF for these items. The 253 items were obtained by first selecting a sample of 134 items and then augmenting this sample with a second sample of 119 items. The mean GPFs for the two subsamples were calculated to be 50.6% and 51.0%, respectively, yielding an overall average GPF of 50.8%. This average GPF can be applied to the costs of the furniture items destroyed in the fire to obtain an estimate of the “lost” profit. According to experienced claims adjusters at the insurance company, the GPF for sale items of the type destroyed in the fire rarely exceeds 48%. Consequently, the estimate of 50.8% appeared to be unusually high. (A 1% increase in GPF for items of this type equates to, approximately, an additional $16,000 in profit.) When the insurance company questioned the retailer on this issue, the retailer responded, “Our estimate was based on selecting two independent, random samples from the population of 3,005 invoices. Because the samples were selected randomly and the total sample size is large, the mean GPF estimate of 50.8% is valid…” A dispute arose between the furniture retailer and the insurance company, and a lawsuit was filed. In one portion of the suit, the insurance company accused the retailer of fraudulently representing their sampling methodology. 
Rather than selecting the samples randomly, the retailer was accused of selecting an unusual number of “high profit” items from the population in order to increase the average GPF of the overall sample. To support their claim of fraud, the insurance company hired a CPA firm to independently assess the retailer’s Gross Profit Factor. Through the discovery process, the CPA firm legally obtained the paper invoices for the entire population of 3,005 items sold and input the information into a computer.
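To put the disputed estimate in monetary terms, here is a rough calculation using only the figures quoted in the problem statement. It assumes that 48% is taken as the adjusters' benchmark and that each percentage point of GPF is worth about $16,000, so the result is an approximation, not a figure from the lawsuit:

\[(50.8 - 48) \times \$16{,}000 = 2.8 \times \$16{,}000 \approx \$44{,}800\]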
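We use a one-sample t-test for the hypothesis testing. The one-sample t-test compares the mean of a sample with a hypothesized value \(\mu_0\) of the population mean.
The t-test assumes that the observations are an independent random sample from the population and that the sample mean is approximately normally distributed, which is reasonable for samples as large as the ones used here.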
The formula for computing the t-value is:
\(\begin{aligned}&t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t_{n - 1}\ \text{under}\ H_0 \\&\textbf{where:}\\&\bar{x} = \text{mean of the sample}\\&\mu_0 = \text{hypothesized mean under}\ H_0\\&s^2 = \text{variance of the sample (with divisor}\ n - 1\text{)}\\&n = \text{number of records in the sample}\end{aligned}\)
Degrees of freedom \(= n - 1\), where \(n\) is the number of records in the sample.
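The t-value can also be computed by hand and compared with what `t.test` returns. The following is a minimal sketch under stated assumptions: the vector `gpf_demo` is simulated and hypothetical (it is not the FIRE data), and the names `mu0`, `t_manual`, and `p_manual` are introduced only for illustration.

```r
# Hypothetical GPF values, simulated for illustration only (NOT the FIRE data)
set.seed(42)
gpf_demo <- rnorm(50, mean = 49, sd = 13)
mu0 <- 50.8                        # hypothesized mean under H0

n     <- length(gpf_demo)          # number of records in the sample
x_bar <- mean(gpf_demo)            # sample mean
s     <- sd(gpf_demo)              # sample standard deviation (divisor n - 1)

# t statistic and one-sided p-value for H1: mu > mu0
t_manual <- (x_bar - mu0) / (s / sqrt(n))
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)

# The built-in test should agree with the manual computation
res <- t.test(gpf_demo, alternative = "greater", mu = mu0)
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```

Both comparisons should return `TRUE`, since `t.test` uses the same statistic with \(n - 1\) degrees of freedom.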
Let us first read the data as follows,
FIRE = read.csv("C:/Users/Lenovo/OneDrive/Desktop/FIRE.csv",
header = TRUE)
head(FIRE)
Next, we draw a random sample of 253 items from the 3,005 furniture items:
set.seed(1997)
rand = sample(nrow(FIRE), 253)
df_main = FIRE[rand,]
head(df_main)
We now calculate the GPF of each sampled item using the formula
\[\text{Item GPF} = \frac{\text{Profit}}{\text{Sales price}} \times 100\%\]
The first rows of the data with the GPF column added are:
gpf = (df_main$Profit/df_main$Sales)*100
df_main = data.frame(cbind(df_main,gpf))
head(df_main)
We first compute the average GPF for all 253 sampled observations:
# overall mean GPF
mean(df_main$gpf)
## [1] 49.19778
So, the mean GPF is \(49.19778\).
Now let us compute the average GPF for the first 134 sampled observations:
# mean GPF for the first 134 items
mean(df_main$gpf[1:134])
## [1] 49.76665
So, the mean GPF is \(49.76665\).
Now let us compute the average GPF for the remaining 119 sampled observations:
# mean GPF for the remaining 119 items
mean(df_main$gpf[135:253])
## [1] 48.55721
So, the mean GPF is \(48.55721\).
Now we test whether the mean GPF is at least 50.8% for all 253 items:
# t test for mean GPF of at least 50.8%?
t.test(df_main$gpf, alternative = "greater",mu = 50.8)
##
## One Sample t-test
##
## data: df_main$gpf
## t = -1.9938, df = 252, p-value = 0.9764
## alternative hypothesis: true mean is greater than 50.8
## 95 percent confidence interval:
## 47.87113 Inf
## sample estimates:
## mean of x
## 49.19778
For the alternative \(H_1: \mu > 50.8\), the p-value is \(0.9764 > 0.05\), so we fail to reject the null hypothesis at the \(0.05\) level. The random sample gives no evidence that the mean GPF is greater than 50.8%.
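As a check on how the reported numbers fit together, the p-value and the lower confidence bound can be recomputed from the t statistic, the degrees of freedom, and the sample mean printed in the output above. This is a sketch only; the variable names (`t_stat`, `dof`, `se`, and so on) are chosen here for illustration.

```r
# Values copied from the t.test output for the 253-item sample
t_stat <- -1.9938
dof    <- 252
x_bar  <- 49.19778
mu0    <- 50.8

# One-sided p-value for H1: mu > mu0, i.e. P(T >= t_stat)
p_value <- pt(t_stat, df = dof, lower.tail = FALSE)

# Standard error implied by the t statistic, then the one-sided 95% lower bound
se       <- (x_bar - mu0) / t_stat
ci_lower <- x_bar - qt(0.95, df = dof) * se

round(c(p_value = p_value, ci_lower = ci_lower), 4)
```

The results should agree, up to rounding, with the p-value of 0.9764 and the lower bound of 47.87113 reported by `t.test`.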
Next, we test whether the mean GPF is at least 50.6% for the first \(134\) items:
# t test for mean GPF of at least 50.6%?
t.test(df_main$gpf[1:134], alternative = "greater",mu = 50.6)
##
## One Sample t-test
##
## data: df_main$gpf[1:134]
## t = -0.79338, df = 133, p-value = 0.7855
## alternative hypothesis: true mean is greater than 50.6
## 95 percent confidence interval:
## 48.0268 Inf
## sample estimates:
## mean of x
## 49.76665
For the alternative \(H_1: \mu > 50.6\), the p-value is \(0.7855 > 0.05\), so we fail to reject the null hypothesis at the \(0.05\) level. The random sample gives no evidence that the mean GPF is greater than 50.6%.
Finally, we test whether the mean GPF is at least 51.0% for the remaining \(119\) items:
# t test for mean GPF of at least 51.0%?
t.test(df_main$gpf[135:253], alternative = "greater",mu = 51)
##
## One Sample t-test
##
## data: df_main$gpf[135:253]
## t = -1.9781, df = 118, p-value = 0.9749
## alternative hypothesis: true mean is greater than 51
## 95 percent confidence interval:
## 46.50986 Inf
## sample estimates:
## mean of x
## 48.55721
For the alternative \(H_1: \mu > 51\), the p-value is \(0.9749 > 0.05\), so we fail to reject the null hypothesis at the \(0.05\) level. The random sample gives no evidence that the mean GPF is greater than 51.0%.
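The question of how often a truly random sample of 253 items would reach a mean GPF of 50.8% can also be explored by simulation. The sketch below is not part of the original analysis and does not use the FIRE data: `pop_gpf` is a hypothetical, simulated population of 3,005 GPF values, and its mean of 48 and standard deviation of 13 are assumptions chosen only to make the example self-contained. With the real data, `pop_gpf` would be replaced by the GPF column computed from all 3,005 invoices.

```r
set.seed(1)

# Hypothetical population of 3,005 GPF values (simulated; NOT the FIRE data).
# The mean and sd below are assumptions made for illustration.
pop_gpf <- rnorm(3005, mean = 48, sd = 13)

# Draw many random samples of 253 items (without replacement) and
# record the mean GPF of each sample
n_rep <- 2000
sample_means <- replicate(n_rep, mean(sample(pop_gpf, 253)))

# Proportion of random samples whose mean GPF reaches the claimed 50.8%
prop_high <- mean(sample_means >= 50.8)
prop_high
```

A small value of `prop_high` would indicate that a mean of 50.8% is unlikely to arise from simple random sampling of such a population.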
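The conclusions are as follows. In all three tests we fail to reject the null hypothesis: a random sample drawn from the 3,005 invoices gives no evidence of a mean GPF as high as the 50.8%, 50.6%, and 51.0% reported by the retailer.
So, the insurance company's suspicion appears to be correct, i.e., the retailer's samples were likely not selected at random, which supports the accusation of fraud.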