Final Exam STA6507

Problem A

An engineer is investigating two different types of metering devices for an electronic fuel injection system to determine whether they differ in their fuel mileage performance. The system is installed on 6 different cars, and a test is run with each metering device on each car. The observed fuel mileage performance data are shown in the table below.

##### Problem A #####
Dev1 = c(17.6, 19.4, 19.5, 17.1, 15.3, 15.9)
Dev2 = c(16.8, 20.0, 18.2, 16.4, 16.0, 15.4)

For the both the Sign Test and the Wilcoxon Signed Ranks Test I am testing if:

\(H_o:\) Median fuel mileage performance for Device 1 = Device 2

\(H_a:\) Median fuel mileage performance for Device 1 \(\neq\) Device 2

I chose to do a 2 tailed test because the question asks if the median is the same for the 2 devices. It does speculate a direction. The question did not imply if one device is better than the other. I will use alpha = 0.05.

# Sign Test 
Dev1-Dev2

## [1]  0.8 -0.6  1.3  0.7 -0.7  0.5

There are 4 positives and 2 negatives. I then used the binomial test to test my hypothesis.

##### Need to add test statistic #####
# 2 tailed test since the question asked if the median for devices is the same.
# The question did not state or say if one device is better than the other.
binom.test(4, n=6, p=0.5, alternative = "t")

## 
##  Exact binomial test
## 
## data:  4 and 6
## number of successes = 4, number of trials = 6, p-value = 0.6875
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.2227781 0.9567281
## sample estimates:
## probability of success 
##              0.6666667

# p-value = 0.6875 fail to reject

The p-value is 0.6875 > 0.05 so I fail to reject the null hypothesis. I have evidence in support of the null hypothesis and the median for Device 1 and Device 2 could be the same. There is not a significant difference in the medians.

# Wilcoxon Signed Rank Test
Devtest=wilcox.test(Dev1,Dev2,exact = F,correct = T,paired = T,alternative = "t")
Devtest

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  Dev1 and Dev2
## V = 16, p-value = 0.2945
## alternative hypothesis: true location shift is not equal to 0

# p-value = 0.2945 fail to reject

With the Wilcoxon Signed Rank Test, I tested the same hypothesis stated above:

\(H_o:\) Median fuel mileage performance for Device 1 = Device 2

\(H_a:\) Median fuel mileage performance for Device 1 \(\neq\) Device 2

The test statistic is V = 16 and p-value = 0.2945. The p-value is 0.2945 > 0.05 so I will still fail to reject the null hypothesis. I have evidence in support of the null hypothesis and the median for Device 1 and Device 2 could be the same. There is not a significant difference in the medians.

Problem B

We are interested in the effectiveness of three different car waxes, A, B, and C. The data below present the number of days of protection. Is there a difference in effectiveness of the three different brands of car wax? Consider 5% is the level of significance.

For the both the Median Test and Kruskal-Wallis Test I am testing if:

\(H_o:\) All of the car waxes A, B, and C have the same effectiveness.

\(H_a:\) At least 2 of the car waxes A, B, and C do not have the same effectiveness.

I will use the suggested alpha = 0.05.

# all Data in 1 vector. Then use gl to assign 1, 2, 3 for the levels.
# v = values g = groups
v=c(44, 45, 46, 48, 40, 42, 51, 55, 50, 53, 58, 59)
g=gl(n=3, k=4)

# Need package for Median Test
library(agricolae)
Median.test(v,g) # Reject, Chi Square = 8   DF = 2   p-value 0.01831564 Median = 49

## 
## The Median Test for v ~ g 
## 
## Chi Square = 8   DF = 2   P.Value 0.01831564
## Median = 49 
## 
##   Median r Min Max   Q25   Q75
## 1   45.5 4  44  48 44.75 46.50
## 2   46.5 4  40  55 41.50 52.00
## 3   55.5 4  50  59 52.25 58.25
## 
## Post Hoc Analysis
## 
## Groups according to probability of treatment differences and alpha level.
## 
## Treatments with the same letter are not significantly different.
## 
##      v groups
## 3 55.5      a
## 2 46.5     ab
## 1 45.5      b

kruskal.test(v,g) # FTR, chi-squared = 4.9615, df = 2, p-value = 0.08368

## 
##  Kruskal-Wallis rank sum test
## 
## data:  v and g
## Kruskal-Wallis chi-squared = 4.9615, df = 2, p-value = 0.08368

With alpha of 0.05 we reject the null and the other test we fail to reject.

In the Median Test I have a test statistic = 8, DF = 2, and p-value 0.01831564 which is < 0.05. Based on the Median Test there is a significant difference in the effectiveness of the car wax brands A, B, and C.

With the Kruskal-Wallis Test I have a test statistic of chi-squared = 4.9615, df = 2, p-value = 0.08368 which is > 0.05 so I would fail to reject. Based on the Median Test there is not a significant difference in the effectiveness of the car wax brands A, B, and C. However, if the alpha had been 0.1, I would have rejected for both tests.

Based on the multiple comparison analysis groups 3 and 2 are not significantly different and groups 1 and 2 are not significantly different. However, groups 1 and 3 are significantly different.

Problem C

A manager of a Store believes that consumers from out of state (those with out-of-state license plates on their cars) are more likely to use credit cards for their purchase. He keeps a tally on all of his customers one afternoon. Is the manager correct in his thinking? Run a test both using R and calculations using the normal approximation.

\(H_o:\) Out of State Credit Card usage = In State Credit Card Usage.

\(H_a:\) Out of State Credit Card usage > In State Credit Card Usage.

I will use alpha = 0.05.

I checked by hand that all expected are > 5. Smallest is 20*24/48 = 10

CC = as.table(rbind(c(14, 10), c(6, 18)))

chisq.test(CC)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  CC
## X-squared = 4.2, df = 1, p-value = 0.04042

With the correction term on using a \(\chi^2\) Test, results in the test statistic = 5.4857, df = 1, p-value = 0.01917. I reject the null hypothesis, there is evidence in favor of the alternative hypothesis. Out of state shoppers are more likely to use a credit card rather than cash compared to in state shoppers.

I turned off the correction term and ran the test again to find the test statistic with hand calculations. Below are the 2 ways that I confirmed the hand calculations of the \(\chi^2\) test statistic = 2.34216.

# Test Statstic: Turn off correction term
chisq.test(CC, correct = F)

## 
##  Pearson's Chi-squared test
## 
## data:  CC
## X-squared = 5.4857, df = 1, p-value = 0.01917

sqrt(5.4857)

## [1] 2.342157

# to confirm:
A = sqrt(48)
B = (14*18)-(6*10)
C = sqrt(20*28*24*24)

A*B/C

## [1] 2.34216

Also need to do part c with normal calc.

Problem D

I looked back at the practice midterm problems to help me solve this problem. This number seems too low, but I was not sure how to solve this problem so I completed it last.

# Tolerance Limits
library(tolerance)

distfree.est(alpha = 0.01, P = 0.80, side = 2) # 31 samples

##      0.8
## 0.01  31

Problem E

The pain threshold values for 5 males and 5 females was recorded. Test the hypothesis of equal means. Use the Mann-Whitney test. The Mann-Whitney Test is the same as the Wilcoxon Sum Rank Test for not paired data. This data was collected from 2 independent samples. I am testing:

\(H_o:\) Male pain threshold = Female pain threshold.

\(H_a:\) Male pain threshold \(\neq\) Female pain threshold.

I chose to do a 2 tailed test because the question asks if the mean is equal for the 2 groups. It does speculate a direction. The question did not imply if male or female had a higher or lower pain threshold. I will use alpha = 0.05.

# Wilcoxon Rank Test
# Not Paired sample.  Is there a difference, so 2 tail test. MHR=Men Heart Rate
Male_Pain=c(9, 8.1, 7, 7.7, 7.9)
Female_Pain=c(6.8, 8, 7.5, 8.5, 6.9)

Paintest=wilcox.test(Male_Pain, Female_Pain, exact = F,
                   correct = T,
                   paired = F,
                   alternative = "two.sided")

Paintest # W = 17, p-value = 0.4034, FTR

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Male_Pain and Female_Pain
## W = 17, p-value = 0.4034
## alternative hypothesis: true location shift is not equal to 0

The test statistic is W = 17 and p-value is 0.4034 > 0.05 so I fail to reject the null hypothesis. I have evidence in support of the null hypothesis and the mean pain threshold for males and females could be the same. There is not a significant difference in the means.

Problem F

For this problem I was able to load the package and data. I used the previous bootstrap function that we used in class on 09-15. I then put the data into a data frame. (I could not get the data to go into my function for some reason.)

Next I used the bootstrapping function to run 500 replications. I got a weird error here, but R gave me the option to stop the error and continue so I did. Then I looked at my results to make sure that I had 500 replications of the confidence intervals.

library(carData)
View(Robey)
library(boot)

# Create a function f in R to compute the observed statistic
myf = function(DATA,i){
  d = DATA[i,]
  stat= mean(d)
  return(stat)
}

x = data.frame(Robey$contraceptors)
y = data.frame(Robey$tfr)

mydata = data.frame(y,x)
mydata

##    Robey.tfr Robey.contraceptors
## 1        4.8                  35
## 2        6.5                   9
## 3        5.9                  16
## 4        6.1                  13
## 5        6.5                  27
## 6        6.4                   6
## 7        6.8                   5
## 8        2.2                  75
## 9        7.3                   4
## 10       5.7                   6
## 11       6.4                  12
## 12       4.8                   9
## 13       5.0                  21
## 14       6.1                  10
## 15       6.1                  12
## 16       7.2                   5
## 17       6.3                  15
## 18       5.3                  45
## 19       5.5                  40
## 20       2.5                  72
## 21       4.3                  45
## 22       3.0                  50
## 23       1.7                  77
## 24       5.2                  12
## 25       4.3                  34
## 26       2.7                  62
## 27       2.3                  68
## 28       3.9                  53
## 29       4.5                  47
## 30       4.9                  32
## 31       3.6                  66
## 32       2.8                  66
## 33       3.6                  70
## 34       3.3                  56
## 35       3.8                  53
## 36       4.6                  47
## 37       5.6                  23
## 38       6.0                  10
## 39       2.9                  55
## 40       4.0                  55
## 41       4.0                  58
## 42       4.6                  48
## 43       3.5                  59
## 44       3.1                  54
## 45       4.6                  40
## 46       5.5                  35
## 47       4.0                  42
## 48       4.3                  51
## 49       3.4                  60
## 50       7.0                   7

# bootstrapping with R replications 
results_x = boot(x, myf, R = 500)
results_x

## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = x, statistic = myf, R = 500)
## 
## 
## Bootstrap Statistics :
##     original   bias    std. error
## t1*    37.44 -0.21828    3.237652

results_y = boot(y, myf, R = 500)
results_y

## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = y, statistic = myf, R = 500)
## 
## 
## Bootstrap Statistics :
##     original    bias    std. error
## t1*    4.688 -0.005064   0.2099776

# get confidence interval 
boot.ci(results_x, conf = 0.95, type=c("norm","basic","perc"))

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 500 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = results_x, conf = 0.95, type = c("norm", "basic", 
##     "perc"))
## 
## Intervals : 
## Level      Normal              Basic              Percentile     
## 95%   (31.31, 44.00 )   (30.69, 43.99 )   (30.89, 44.19 )  
## Calculations and Intervals on Original Scale

hist(results_x$t, main = "Histogram of Bootstrap CIs with 500 Replications", 
     xlab = "Mean of CIs of Contraceptors")

boot.ci(results_y, conf = 0.95, type=c("norm","basic","perc"))

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 500 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = results_y, conf = 0.95, type = c("norm", "basic", 
##     "perc"))
## 
## Intervals : 
## Level      Normal              Basic              Percentile     
## 95%   ( 4.282,  5.105 )   ( 4.304,  5.139 )   ( 4.237,  5.072 )  
## Calculations and Intervals on Original Scale

hist(results_y$t, main = "Histogram of Bootstrap CIs with 500 Replications", 
     xlab = "Mean of CIs of tfr")

lm(Robey$tfr ~ Robey$contraceptors)

## 
## Call:
## lm(formula = Robey$tfr ~ Robey$contraceptors)
## 
## Coefficients:
##         (Intercept)  Robey$contraceptors  
##             6.87509             -0.05842

At this point I have a headache and I am not really sure how to compare. I have spent more than 45 min on this one problem, and I kept getting a weird error. I also did not study this part of our notes well because I thought you said in one of our last lectures that this would not be on the exam. I am not trying to make excuses, I just wanted to let you know why this is not my best work, and I’m sorry if I let you down. I know I let myself down. I was hoping to do well on this exam and maintain my 4.0 GPA. This will be the first class since I have started on my Master’s that will score lower than an A.