MATHEMATICAL STATISTICS ASSIGNMENT

When choosing a car, its real value is not solely determined by the initial purchase cost. Instead, the best value is often found in vehicles that are both reliable and economical to own. However, reliability and affordability alone are not sufficient—a car must also perform well. To assess value comprehensively, Consumer Reports developed the Value Score, a metric based on three key factors: 5-year ownership cost, predicted reliability, and road-test performance. A Value Score of 1.0 represents an average-value car, while scores above or below 1.0 indicate cars that are better or worse values relative to the average.

Let’s begin by importing the dataset.

library(readxl)
CR <- read_excel("C:/Users/hp/Downloads/CarValues - Group 4.xlsx",
                 sheet = "Data")

Question 1

## 
##  Family Sedan   Small Sedan Upscale Sedan 
##            20            13            21

The data contains the respective values from three different types of cars: 13 Small Sedans, 20 Family Sedans, and 21 Upscale Sedans.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   16419   21923   28918   28340   34102   39850

mean(CR$`Price ($)`[CR$Size == "Small Sedan"])

## [1] 19406

mean(CR$`Price ($)`[CR$Size == "Family Sedan"])

## [1] 26886.2

mean(CR$`Price ($)`[CR$Size == "Upscale Sedan"])

## [1] 35255.86

Car prices range from $16,419 to $39,850, and the mean price rises sharply with car size: about $19,406 for Small Sedans, $26,886 for Family Sedans, and $35,256 for Upscale Sedans.
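The three separate mean() calls above can also be written as one grouped summary. The following is a minimal sketch, assuming a small made-up data frame named `toy` (hypothetical prices, not the Consumer Reports data) so that it runs on its own; only the column names `Size` and `Price ($)` are taken from the dataset used in this report.

```r
# Toy data for illustration only; these are NOT the Consumer Reports values.
toy <- data.frame(
  Size = c("Small Sedan", "Small Sedan", "Family Sedan",
           "Family Sedan", "Upscale Sedan", "Upscale Sedan"),
  `Price ($)` = c(18000, 20000, 25000, 27000, 34000, 36000),
  check.names = FALSE  # keep the column name "Price ($)" unchanged
)

# Mean price per car size in a single call
group_means <- tapply(toy$`Price ($)`, toy$Size, mean)
print(group_means)

# Same grouping, but with a full summary() per car size
print(tapply(toy$`Price ($)`, toy$Size, summary))
```

With the real data, replacing `toy` by `CR` should reproduce the three group means reported above.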

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4400  0.5700  0.6700  0.6567  0.7475  0.8300

Operating cost per mile ranges from 0.44 to 0.83, so the most expensive car to run costs almost twice as much per mile as the cheapest. This highlights the importance of considering not just the purchase price but also the long-term ownership costs.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   52.00   73.00   78.00   78.07   84.00   95.00

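Road-test scores (on a 100-point scale) range from 52 to 95, with half of the cars scoring between 73 and 84. This summary alone does not show whether higher-scoring cars are also more expensive, so price and reliability should be considered alongside the road-test score to determine the true value.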

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   3.000   3.407   4.000   5.000

## 
##  1  2  3  4  5 
##  2  7 19 19  7

##    
##     Family Sedan Small Sedan Upscale Sedan
##   1            0           2             0
##   2            0           3             4
##   3            7           3             9
##   4           11           3             5
##   5            2           2             3

Most cars are rated “Good” (3) or “Very Good” (4): 38 of the 54 cars fall into these two categories.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.820   1.173   1.335   1.354   1.502   1.990

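The distribution of value scores shows that most cars in this sample are better than average value: the median is 1.335, and at least three quarters of the cars score above 1.17, well above the average-value benchmark of 1.0. Only a few cars fall below 1.0 (the minimum is 0.82), and few approach the maximum of 1.99.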

Although most cars are rated “Good” or “Very Good”, 9 of the 54 are rated “Fair” (2) or lower, so reliability is a critical factor to examine. Reliability by Car Size:

Family Sedans show the strongest reliability: none is rated below “Good”, and 65% are rated “Very Good” or “Excellent”. Small Sedans are spread almost evenly across all five ratings, so reliability varies widely within this most affordable group. Upscale Sedans, despite having the highest prices, are rated only “Fair” in 19% of cases, suggesting that a high price does not always equate to superior reliability.


##           
## Groups      1  2  3  4  5 Sum
##   (50,60]   0  0  1  0  1   2
##   (60,70]   2  3  2  1  1   9
##   (70,80]   0  3  7  9  1  20
##   (80,90]   0  1  8  6  3  18
##   (90,100]  0  0  1  3  1   5
##   Sum       2  7 19 19  7  54

##                
##                    1    2    3    4    5
##   Family Sedan   0.0  0.0 35.0 55.0 10.0
##   Small Sedan   15.4 23.1 23.1 23.1 15.4
##   Upscale Sedan  0.0 19.0 42.9 23.8 14.3
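The two tables above are shown without the code that produced them. The following is a minimal sketch of one way such tables could be built in base R, assuming 10-point score bands and a made-up data frame `toy` (hypothetical values, not the Consumer Reports data); the column names `Size`, `Road-Test Score`, and `Predicted Reliability` are the ones used elsewhere in this report.

```r
# Toy data for illustration only; these are NOT the Consumer Reports values.
toy <- data.frame(
  Size = c("Small Sedan", "Small Sedan", "Family Sedan",
           "Family Sedan", "Upscale Sedan", "Upscale Sedan"),
  `Road-Test Score` = c(55, 68, 74, 81, 88, 95),
  `Predicted Reliability` = c(2, 3, 3, 4, 4, 5),
  check.names = FALSE  # keep the original column names
)

# Bin road-test scores into 10-point intervals: (50,60], ..., (90,100]
Groups <- cut(toy$`Road-Test Score`, breaks = seq(50, 100, by = 10))

# Counts of score band by reliability rating, with row and column totals
band_by_rel <- addmargins(table(Groups, toy$`Predicted Reliability`))
print(band_by_rel)

# Row percentages: share of each reliability rating within each car size
pct_by_size <- round(
  100 * prop.table(table(toy$Size, toy$`Predicted Reliability`), margin = 1),
  1
)
print(pct_by_size)
```

Using `margin = 1` makes each row sum to 100%, which matches the layout of the percentage table above, where each car size is a row.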

Question 2

a) First, extract the prices of Small Sedans and Family Sedans.

price_small <- CR$`Price ($)`[CR$Size == 'Small Sedan']
price_fam <- CR$`Price ($)`[CR$Size == 'Family Sedan']

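The two samples are independent. Before comparing the means, check whether the two populations can be assumed to have equal variances by using an F-test, with H0: the ratio of the variances equals 1, and Ha: the ratio differs from 1.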

var.test(price_fam,price_small,ratio = 1, alternative = 'two.sided')

## 
##  F test to compare two variances
## 
## data:  price_fam and price_small
## F = 2.4343, num df = 19, denom df = 12, p-value = 0.1181
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.7879182 6.6203507
## sample estimates:
## ratio of variances 
##           2.434334

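Since the p-value (0.1181) is greater than 0.05, H0 cannot be rejected: there is not enough evidence to conclude that the two variances are unequal. Now test the difference between the two means by using a t-test, with H0: the mean prices of Family and Small Sedans are the same, and Ha: they differ. The Welch version used here does not assume equal variances, so it remains valid whether or not the variances are equal.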

t.test(price_fam,price_small,alternative = "two.sided",var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  price_fam and price_small
## t = 7.7605, df = 31, p-value = 9.348e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  5514.346 9446.054
## sample estimates:
## mean of x mean of y 
##   26886.2   19406.0

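The p-value (9.348e-09) is far below 0.05; therefore, H0 is rejected and the two mean prices are concluded to be different. The 95% confidence interval indicates that Family Sedans cost on average between $5,514 and $9,446 more than Small Sedans.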
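To make the t-test output easier to read, the Welch statistic, its degrees of freedom, and the p-value can be recomputed by hand. The following is a minimal sketch, assuming two made-up samples `x` and `y` (hypothetical prices, not the Consumer Reports data) so that it runs on its own.

```r
# Toy samples for illustration only; these are NOT the Consumer Reports prices.
x <- c(25000, 27500, 26000, 29000, 24500, 28000)
y <- c(18000, 19500, 20500, 18500, 21000)

# Estimated variance of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means divided by its standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

print(c(t = t_stat, df = df, p = p_value))
```

These three values should agree with `t.test(x, y, var.equal = FALSE)`. The degrees of freedom are generally not a whole number, because they are an approximation rather than a simple count.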

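b) Similarly, extract the Road-Test Scores of cars that have Good (3) and Very Good (4) reliability ratings. Use an F-test to compare the variability of the two samples, with H0: $\sigma_1^2 / \sigma_2^2 = 1$ and Ha: $\sigma_1^2 / \sigma_2^2 \neq 1$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the scores for Good and Very Good cars, respectively.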

rts_good <- CR$`Road-Test Score`[CR$`Predicted Reliability` == 3]
rts_verygood <- CR$`Road-Test Score`[CR$`Predicted Reliability` == 4]
var.test(rts_good,rts_verygood,ratio = 1, alternative = 'two.sided')

## 
##  F test to compare two variances
## 
## data:  rts_good and rts_verygood
## F = 1.6946, num df = 18, denom df = 18, p-value = 0.2725
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6528633 4.3984049
## sample estimates:
## ratio of variances 
##           1.694567

Since the p-value (0.2725) is greater than 0.05, H0 is not rejected: there is not enough evidence to conclude that the two samples have unequal variances.
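Both F-tests in this question rest on the same calculation: the ratio of the two sample variances, compared with an F distribution. The following is a minimal sketch of that calculation, assuming two made-up samples `a` and `b` (hypothetical scores, not the Consumer Reports data).

```r
# Toy samples for illustration only; these are NOT the Consumer Reports scores.
a <- c(72, 85, 78, 90, 66, 81, 75)
b <- c(80, 77, 83, 79, 84, 76)

# F statistic: ratio of the two sample variances
f_stat <- var(a) / var(b)
df1 <- length(a) - 1  # numerator degrees of freedom
df2 <- length(b) - 1  # denominator degrees of freedom

# Two-sided p-value: twice the smaller of the two tail probabilities
lower_tail <- pf(f_stat, df1, df2)
p_value <- 2 * min(lower_tail, 1 - lower_tail)

print(c(F = f_stat, p = p_value))
```

The result should match `var.test(a, b)`. The degrees of freedom are the sample sizes minus one, which is why the outputs above report 19 and 12 for 20 Family Sedans and 13 Small Sedans.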

Question 3

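# t-based confidence interval for a population mean at level 1 - alpha
ci.mean <- function(sample, alpha) {
  n <- length(sample)
  # Margin of error: t critical value times the standard error of the mean
  ME <- qt(1 - alpha / 2, n - 1) * sd(sample) / sqrt(n)
  ll <- mean(sample) - ME  # lower limit
  ul <- mean(sample) + ME  # upper limit
  c(ll, ul)
}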

Apply the function to the prices of Family Sedans, with alpha = 1 - 0.95 = 0.05.

ci.mean(price_fam, 0.05)

## [1] 25306.99 28465.41

The 95% confidence interval for the mean price of Family Sedans is (25306.99, 28465.41).
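A hand-written interval function like this can be checked against the interval that t.test() reports for a single sample. The following is a minimal sketch of such a check, assuming a made-up sample `toy_prices` (hypothetical values, not the Consumer Reports data); ci.mean() is repeated in compact form so that the block runs on its own.

```r
# Toy sample for illustration only; these are NOT the Consumer Reports prices.
toy_prices <- c(25000, 27500, 26000, 29000, 24500, 28000)

# Same t-based interval as ci.mean() above, repeated so this block runs alone
ci.mean <- function(sample, alpha) {
  n <- length(sample)
  ME <- qt(1 - alpha / 2, n - 1) * sd(sample) / sqrt(n)
  c(mean(sample) - ME, mean(sample) + ME)
}

# Interval from the hand-written function
manual <- ci.mean(toy_prices, 0.05)

# Interval from the built-in one-sample t-test
builtin <- as.numeric(t.test(toy_prices, conf.level = 0.95)$conf.int)

print(manual)
print(builtin)
```

The two printed intervals should be identical, since t.test() uses the same t critical value and standard error for a one-sample confidence interval.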

GROUP 4