GROUP MEMBERS

-Duong Nguyen Anh Tuan (leader) 10622062

-Nguyễn Thành Tài 10622036

-Huỳnh Quang Khải 10622017

-Đoàn Trần Bách Việt 10322031

-Lý Lê Phương Dung 10622054

-Trần Minh Quân 10622033

I. INTRODUCTION

1. Motivation

If you want to buy a car, you may wonder which model to choose, where it was manufactured, and which brand to pick, without knowing which factors significantly affect the price. In this report, our team uses analytical methods to give a detailed view of the factors affecting car prices. Prices differ between car models, and within each model other factors, such as the number of seats, the origin, or the year of manufacture, also affect the price. This report aims to give readers an easy-to-understand view of each variable in the dataset.

2. Data analysis

We obtained this dataset from Kaggle.com. It has 8 columns and 200 rows. In the original data, the author used factors including origin, model, year of manufacture, and the specific price of each car. We therefore use the same factors to analyze car prices in our reduced dataset.

  • Figure 1: Correlation between seat and origin
pacman::p_load(tidyverse, ggplot2)
library(readxl)
dataset <- read_excel("dataset.xlsx")
ggplot(data = dataset) +
   geom_bar(mapping = aes(x = SEAT, fill=ORIGIN),width=0.8)

First, this bar chart displays the number of seats for two types of cars: domestically assembled and imported. Overall, the differences between seat counts are clear. Five-seat vehicles make up the majority, and their lead is greater among domestically assembled cars. On the other hand, the dataset contains no cars, domestic or imported, with nine to fifteen seats.

  • Figure 2: Correlation between model and year of manufacture
ggplot(data = dataset) +
   geom_bar(mapping = aes(x = YEAR, fill = MODEL))

Next, the second bar chart displays the year of manufacture for each car type: Coupe, Hatchback, Pickup truck, SUV, Truck, Van/Minivan, and Wagon. The majority of cars were produced in 2022 and 2023. The Truck and SUV lines are the ones that include the earlier production years, so their years span 2017, 2021, 2022, and 2023.

  • Figure 3: Correlation between seat and transmission
ggplot(data = dataset) +
   geom_bar(mapping = aes(x = SEAT, fill = TRANSMISSION))

Based on the number of seats, the third bar chart displays the number of cars produced with automatic and manual transmissions. Cars with five or seven seats are primarily built with automatic transmissions. Manual transmission cars, on the other hand, are few and range from 5 to 16 seats.

  • Figure 4: Percentage distribution of vehicle origin
# Column names are upper-case, so `dataset$origin` does not exist; use `dataset$ORIGIN`.
table(dataset$ORIGIN)
 soluong <- c(114,83)
 tinhtrang <- c("Domestic assembly","Imported" )
 phantram <- round(soluong/ sum(soluong)*100,2)
 tinhtrang <- paste(tinhtrang, phantram)
 tinhtrang <- paste(tinhtrang, "%", sep="" )
 pie(soluong, labels = tinhtrang, col=c("green","blue"), main= "Percentage distribution of vehicle origin")

The vehicles fall into two origin groups: domestically assembled and imported. In this dataset, 57.87% of vehicles are domestically assembled and 42.13% are imported. Domestic assembly refers to vehicles manufactured within the country, often by domestic automakers or by international companies with manufacturing plants in that country. Imported vehicles are produced outside the country’s borders and then imported for sale. This distribution shows that a substantial share of vehicles is assembled domestically, indicating a significant production presence within the country, while imported vehicles remain popular and widely available in the market.
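The percentages above can be reproduced directly from the counts, without typing them by hand. The sketch below is only an illustration: the data frame `cars` is a stand-in built from the two counts reported for Figure 4, not the real dataset. It also shows why a lower-case column name returns nothing.

```r
# Stand-in data built from the counts reported for Figure 4 (114 and 83).
cars <- data.frame(
  ORIGIN = rep(c("Domestic assembly", "Imported"), times = c(114, 83))
)

# Column names are case-sensitive: `origin` does not exist, so this is NULL.
missing_column <- is.null(cars$origin)

# With the exact column name, table() returns the counts ...
origin_counts <- table(cars$ORIGIN)

# ... and prop.table() turns them into shares of the total.
origin_pct <- round(100 * prop.table(origin_counts), 2)

print(missing_column)
print(origin_pct)
```

The same two calls, `table()` followed by `prop.table()`, would work for the model and year columns as well.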

  • Figure 5: Percentage distribution of vehicle model
# Column names are upper-case, so `dataset$model` does not exist; use `dataset$MODEL`.
table(dataset$MODEL)
 soluong <- c(1,24,16,20,35,72,18,10,1)
 tinhtrang <- c("Coupe","Crossover","Hatchback","Pickuptruck","Sedan","SUV","Truck","Van/Minivan","Wagon" )
 phantram <- round(soluong/ sum(soluong)*100,2)
 tinhtrang <- paste(tinhtrang, phantram)
 tinhtrang <- paste(tinhtrang, "%", sep="" )
 # Nine slices need nine colours; "orange" is added so that no colour is reused.
 pie(soluong, labels = tinhtrang, col=c("red","blue","gray","pink","purple","yellow","green","brown","orange"), main= "Percentage distribution of vehicle model")

The second pie chart displays the percentage of each car type in the dataset. SUVs make up the largest proportion (36.55%), while Coupe and Wagon make up the smallest (0.51% each).

  • Figure 6: Percentage distribution of year of manufacture
# Column names are upper-case, so `dataset$year` does not exist; use `dataset$YEAR`.
table(dataset$YEAR)
 soluong <- c(2,3,87,105)
 tinhtrang <- c("2017","2021","2022","2023" )
 phantram <- round(soluong/ sum(soluong)*100,2)
 tinhtrang <- paste(tinhtrang, phantram)
 tinhtrang <- paste(tinhtrang, "%", sep="" )
 pie(soluong, labels = tinhtrang, col=c("green","blue","gray","yellow"), main= "Percentage distribution of year of manufacture")

The third pie chart shows the share of each production year in the dataset: 2017, 2021, 2022, and 2023. Cars manufactured in 2023 make up the largest share (53.3%), followed closely by 2022 (44.16%). In contrast, 2017 has the smallest share (1.02%).

  • Figure 7: Relationship between car model and price
boxplot(log(dataset$PRICE)~dataset$MODEL,col = "pink", xlab="model", ylab="log(price)")

Reading the box plot, which shows log(PRICE), the typical values are about 20.5 for coupes, 20.1 for hatchbacks and sedans, 20.8 for SUVs, 19.8 for trucks, and 20.6 for wagons. These figures outline the price level of each vehicle type on the log scale: trucks tend to be priced slightly lower than the other categories, while SUVs tend to be priced highest.
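Because the box plots use log(PRICE), the numbers quoted above are on the natural-log scale, and a small gap on that scale is a large ratio in actual prices. The sketch below converts the values read from the plot back to the original price unit. The values are approximate readings from the report, not output computed from the data.

```r
# Approximate log(price) levels read off the box plot in the report.
log_price <- c(Coupe = 20.5, Hatchback = 20.1, Sedan = 20.1,
               SUV = 20.8, Truck = 19.8, Wagon = 20.6)

# exp() undoes the natural logarithm applied by log(dataset$PRICE).
price <- exp(log_price)

# A gap of d on the log scale corresponds to a ratio of exp(d) in price.
suv_vs_truck <- exp(log_price["SUV"] - log_price["Truck"])

print(round(price / 1e6))  # in millions of the original price unit
print(suv_vs_truck)
```

A difference of 1.0 on the log scale, as between SUVs and trucks here, means the typical SUV price is roughly 2.7 times the typical truck price, which is more than "slightly higher" suggests.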

  • Figure 8: Relationship between year of manufacture and price
boxplot(log(dataset$PRICE)~dataset$YEAR, col="red",xlab="year", ylab="log(price)")

Car prices in Vietnam have changed markedly in recent years. Prices of cars manufactured in 2021 decreased. According to experts, the COVID-19 epidemic caused Vietnam’s economy to decline rapidly, which affected businesses and forced them to reduce prices to attract buyers. In 2022 and 2023 the economic situation was more stable than in 2021, so car prices increased again. Because of this crisis, however, car prices remain lower than in 2017. According to experts, Vietnam’s economy is expected to grow again after 2023, which could push car prices back up or higher in the following years.

  • Figure 9: Relationship between number of seats and price
plot(log(PRICE) ~ SEAT, data = dataset, col="purple", ylab="log(price)")

The relationship between the number of seats and the price of a car can vary based on several factors. Generally, larger vehicles with more seating capacity tend to have higher prices due to increased manufacturing costs, larger size, and potentially more features. However, within a specific category, other factors like brand, model, trim level, technology, and additional amenities also influence the price. Additionally, certain high-end or luxury vehicles might have fewer seats but come with a higher price due to their advanced features and exclusivity. Ultimately, the correlation between the number of seats and the price can vary significantly depending on the specific car and its market segment.

  • Figure 10: Relationship between transmission and price
boxplot(log(dataset$PRICE)~dataset$TRANSMISSION, col="pink",xlab="transmission", ylab="log(price)")

The typical log(price) of a manual transmission car is about 19.8, while that of an automatic transmission car is slightly higher at about 20.5. According to the figure, automatic cars have a higher typical price, which is plausible because automatic transmissions are a more modern design and sell at a premium. On the other hand, manual transmission cars can save fuel, so many people still choose them for transportation, especially in the service industry. In addition, in a manual transmission vehicle the driver decides when to change gears to suit the type of road, speed, and driving purpose, rather than depending on the pre-programmed logic of an automatic transmission.

  • Figure 11: Relationship between origin and price
boxplot(log(dataset$PRICE)~dataset$ORIGIN, col="yellow",xlab="origin", ylab="log(price)")

The typical log(price) of domestically assembled cars is about 23, while that of imported cars is about 22. According to our analysis, domestically assembled cars currently sell at a higher price than imported cars, partly because the cost of producing cars in Vietnam is about 20% higher than in foreign countries, according to data from industry experts. The Vietnamese automobile market is still small, with up to a few dozen car brands, each offering dozens of models. Each car line is therefore sold in limited quantity, which increases production costs. Imported cars, by contrast, incur quite high import taxes and transportation costs; even so, their price in this dataset is slightly lower than that of domestic cars.

II. FURTHER ANALYSIS

1. General

We now apply one-way ANOVA and Tukey’s method to our dataset to find out whether factors such as seat, place of assembly, year of manufacture, or model affect price.

A. Car’s model

  • We look at whether car models affect prices

* One way Anova and Tukey

  • ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups. ANOVA tells you whether the dependent variable changes according to the level of the independent variable. The null hypothesis (H0) of ANOVA is that there is no difference among group means. The alternative hypothesis (HA) is that at least one group mean differs from the others.
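To make the test concrete, the sketch below computes the F statistic by hand and compares it with `aov()`. The data are a hypothetical toy example with three groups, not the car dataset, and the variable names are illustrative only.

```r
# Hypothetical toy data: three groups of four observations each.
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  y     = c(10, 12, 11, 13,  14, 15, 13, 16,  20, 19, 21, 22)
)

grand_mean  <- mean(toy$y)
group_means <- tapply(toy$y, toy$group, mean)
group_sizes <- tapply(toy$y, toy$group, length)

# Between-group and within-group sums of squares.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((toy$y - ave(toy$y, toy$group))^2)

df_between <- nlevels(toy$group) - 1
df_within  <- nrow(toy) - nlevels(toy$group)

# F is the ratio of the two mean squares; the p-value is its upper tail.
f_manual <- (ss_between / df_between) / (ss_within / df_within)
p_manual <- pf(f_manual, df_between, df_within, lower.tail = FALSE)

# The same F statistic taken from aov().
fit   <- aov(y ~ group, data = toy)
f_aov <- summary(fit)[[1]][["F value"]][1]

print(c(manual = f_manual, aov = f_aov, p = p_manual))
```

A large F means the group means are spread out far more than the variation inside each group would explain, which is what leads to rejecting H0.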

Hypothesis testing at a 5% significance level (α = 0.05): H0: µ1 = µ2 = µ3 = µ4 = … = µ9 versus HA: at least one pair of means differs

µ1 : the mean for the price of Coupe

µ2 : the mean for the price of Crossover

µ3: the mean for the price of Hatchback

µ4 : the mean for the price of Pickup Truck

µ5 : the mean for the price of Sedan

µ6 : the mean for the price of SUV

µ7: the mean for the price of Truck

µ8: the mean for the price of Van/ Minivan

µ9: the mean for the price of Wagon

We use R, a programming language, for the calculations. The code is as follows:

library(readxl)
dataset <- read_excel("dataset.xlsx")
one.way <- aov(PRICE ~ factor(MODEL), data = dataset)
summary(one.way)
##                Df    Sum Sq   Mean Sq F value   Pr(>F)    
## factor(MODEL)   8 6.301e+19 7.876e+18   3.976 0.000225 ***
## Residuals     188 3.724e+20 1.981e+18                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • From the table, we have:

    F-statistic:

     F = 3.976

    P-value:

     p-value = 0.000225 < 0.05
  • Since the p-value is smaller than 0.05, the null hypothesis is rejected at the 5% significance level. Thus, there is sufficient evidence that the car model has a significant effect on price.
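As a check on the table, the F statistic and p-value can be recomputed from the mean squares and degrees of freedom. This sketch uses only the rounded values printed in the output above, so small differences in the last digits are expected.

```r
# Values transcribed from the summary(one.way) output for MODEL.
ms_model <- 7.876e18
df_model <- 8
ms_resid <- 1.981e18
df_resid <- 188

# F is the ratio of the mean squares.
f_stat <- ms_model / ms_resid

# The p-value is the upper tail of the F distribution with (8, 188) df.
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```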

We further conduct the Tukey multiple comparison procedure to discover which μi are different and by how much. The R code is as follows:

 TukeyHSD(one.way)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = PRICE ~ factor(MODEL), data = dataset)
## 
## $`factor(MODEL)`
##                                diff          lwr         upr     p adj
## Crossover-Coupe         -6016666667 -10524636014 -1508697319 0.0014020
## Hatchback-Coupe         -6192812500 -10745638369 -1639986631 0.0010228
## Pickuptruck-Coupe       -5911600000 -10437565305 -1385634695 0.0019811
## Sedan-Coupe             -5564514286 -10044058207 -1084970365 0.0041910
## SUV-Coupe               -5163569444  -9611026392  -716112497 0.0102793
## Truck-Coupe             -6025055556 -10562978518 -1487132594 0.0015264
## Van/Minivan-Coupe       -5301300000  -9933773178  -668826822 0.0122574
## Wagon-Coupe             -4029000000 -10275425559  2217425559 0.5285438
## Hatchback-Crossover      -176145833  -1601690909  1249399243 0.9999852
## Pickuptruck-Crossover     105066667  -1232213152  1442346485 0.9999996
## Sedan-Crossover           452152381   -718432562  1622737324 0.9529946
## SUV-Crossover             853097222   -187973704  1894168149 0.2057938
## Truck-Crossover            -8388889  -1385596273  1368818496 1.0000000
## Van/Minivan-Crossover     715366667   -947090286  2377823619 0.9144206
## Wagon-Crossover          1987666667  -2520302681  6495636014 0.9027255
## Pickuptruck-Hatchback     281212500  -1200257400  1762682400 0.9996136
## Sedan-Hatchback           628298214   -704632715  1961229144 0.8638559
## SUV-Hatchback            1029243056   -191520815  2250006926 0.1751000
## Truck-Hatchback           167756944  -1349851679  1685365567 0.9999938
## Van/Minivan-Hatchback     891512500   -888992729  2672017729 0.8190960
## Wagon-Hatchback          2163812500  -2389013369  6716638369 0.8583069
## Sedan-Pickuptruck         347085714   -890994820  1585166249 0.9937802
## SUV-Pickuptruck           748030556   -368393636  1864454747 0.4746368
## Truck-Pickuptruck        -113455556  -1548472796  1321561685 0.9999996
## Van/Minivan-Pickuptruck   610300000  -1100354091  2320954091 0.9706383
## Wagon-Pickuptruck        1882600000  -2643365305  6408565305 0.9287187
## SUV-Sedan                 400944841   -509195133  1311084815 0.9031742
## Truck-Sedan              -460541270  -1741644809   820562269 0.9692546
## Van/Minivan-Sedan         263214286  -1320543656  1846972227 0.9998571
## Wagon-Sedan              1535514286  -2944029635  6015058207 0.9770252
## Truck-SUV                -861486111  -2025438792   302466569 0.3339281
## Van/Minivan-SUV          -137730556  -1628317280  1352856169 0.9999985
## Wagon-SUV                1134569444  -3312887503  5582026392 0.9967453
## Van/Minivan-Truck         723755556  -1018289303  2465800414 0.9291787
## Wagon-Truck              1996055556  -2541867406  6533978518 0.9039384
## Wagon-Van/Minivan        1272300000  -3360173178  5904773178 0.9945886
 plot( TukeyHSD(one.way), las= 1, col="brown")  

The 95% confidence interval for the difference between Crossover and Coupe:

µ2 - µ1 ∈ (-10524636014, -1508697319)

Since the confidence interval does not contain 0, we can conclude with 95% confidence that Crossover and Coupe have different mean prices. In addition, the mean price of Coupe is larger than that of Crossover.

The 95% confidence interval for the difference between Hatchback and Coupe:

µ3 - µ1 ∈ (-10745638369, -1639986631)

Since the confidence interval does not contain 0, we can conclude with 95% confidence that Coupe and Hatchback have different mean prices. Additionally, the mean price of Coupe is larger than that of Hatchback.

The 95% confidence interval for the difference between Pickup Truck and Coupe:

µ4 - µ1 ∈ (-10437565305, -1385634695)

Since the confidence interval does not contain 0, we can conclude with 95% confidence that Coupe and Pickup Truck have different mean prices. In addition, the mean price of Coupe is larger than that of Pickup Truck.

Because the Tukey table contains many comparisons, we apply the same method to the remaining rows. After analyzing the whole table, we found that the Coupe model has the highest mean price among the car models: its difference from every other model is significant except for Wagon (adjusted p = 0.53).
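Reading each interval by eye is error-prone with 36 comparisons, so the rule "the interval excludes 0" can be applied mechanically. The sketch below does this for a few rows transcribed from the Tukey output above. The data frame `tukey` is built by hand for illustration and covers only a subset of the table.

```r
# Rows transcribed from the TukeyHSD output above (a subset, for illustration).
tukey <- data.frame(
  pair = c("Crossover-Coupe", "Hatchback-Coupe", "Pickuptruck-Coupe",
           "Wagon-Coupe", "SUV-Crossover"),
  lwr  = c(-10524636014, -10745638369, -10437565305, -10275425559, -187973704),
  upr  = c(-1508697319, -1639986631, -1385634695, 2217425559, 1894168149)
)

# A pair differs at the 5% family-wise level when its interval excludes 0,
# that is, when both limits have the same sign.
tukey$excludes_zero <- tukey$lwr * tukey$upr > 0

print(tukey)
```

Only the first three pairs are flagged; Wagon-Coupe and SUV-Crossover have intervals that straddle 0, matching their adjusted p-values above 0.05.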

B. Year of manufacture

* One way Anova and Tukey

Hypothesis testing at a 5% significance level (α = 0.05):

H0: µ1 = µ2 = µ3 = µ4 versus HA: at least one pair of means differs

µ1 : the mean price of cars manufactured in 2017

µ2 : the mean price of cars manufactured in 2021

µ3: the mean price of cars manufactured in 2022

µ4 : the mean price of cars manufactured in 2023

We use R, a programming language, for the calculations. The code is as follows:

one.way <- aov(PRICE~factor(YEAR), data = dataset)
 summary(one.way)
##               Df    Sum Sq   Mean Sq F value Pr(>F)  
## factor(YEAR)   3 1.626e+19 5.421e+18   2.496 0.0611 .
## Residuals    193 4.191e+20 2.172e+18                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • From the table, we have:

    F-statistic:

    F = 2.496

    P-value:

    p-value = 0.0611 > 0.05
  • Since the p-value is larger than 0.05, we fail to reject the null hypothesis at the 5% significance level. Thus, there is not sufficient evidence that the year of manufacture has a significant effect on price.
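The same decision can be reached by comparing F with its critical value instead of comparing the p-value with α. The sketch below uses the F statistic and degrees of freedom printed in the output above; the variable names are illustrative.

```r
# Values transcribed from the summary(one.way) output for YEAR.
f_stat   <- 2.496
df_year  <- 3
df_resid <- 193
alpha    <- 0.05

# Critical value: H0 is rejected only if F exceeds the 95th percentile.
f_crit <- qf(1 - alpha, df_year, df_resid)

# Equivalent p-value view of the same test.
p_value <- pf(f_stat, df_year, df_resid, lower.tail = FALSE)

reject_h0 <- f_stat > f_crit
print(c(F = f_stat, critical = f_crit, p = p_value))
print(reject_h0)
```

Both views agree: F stays below the critical value exactly when the p-value stays above α.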

2. Specific situation

Suppose a customer has, after some consideration, chosen a car model and now wants to learn which factors affect the price of that specific model. As an example, we take the Crossover model and analyze whether factors such as the number of seats affect its price. The same approach can then be applied to other models and other factors.
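The report reads the Crossover rows from a separate file. An alternative, sketched below, is to filter them from the main data frame. This assumes the separate file simply holds the rows with MODEL equal to "Crossover", which the report does not state. The data frame `toy` and its values are made up for illustration.

```r
# Hypothetical stand-in for `dataset`; the values are invented.
toy <- data.frame(
  MODEL = c("Crossover", "SUV", "Crossover", "Sedan", "Crossover"),
  SEAT  = c(5, 7, 8, 5, 5),
  PRICE = c(650e6, 900e6, 820e6, 560e6, 700e6)
)

# Keep only the rows of the chosen model.
crossover <- subset(toy, MODEL == "Crossover")

# Quick check of how many cars fall in each seat configuration.
print(table(crossover$SEAT))
```

Filtering this way keeps the subset in sync with the main dataset and makes it easy to repeat the analysis for another model by changing one string.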

  • First let’s get to the CROSSOVER car model

  • Figure 12: Relationship between seat and origin of Crossover model

library(readxl)
CROSSOVERMODEL <- read_excel("CROSSOVERMODEL.xlsx")
ggplot(data = CROSSOVERMODEL) +
   geom_bar(mapping = aes(x = SEAT, fill = ORIGIN))

This bar chart shows the number of seats in Crossover cars that are imported and assembled domestically. Most Crossover cars have five seats. Seven-seat Crossovers appear only as imports and in small numbers, while eight-seat Crossovers are assembled only domestically and are more numerous than the seven-seat ones.

  • Figure 13: Relationship between seat and transmission of Crossover model

ggplot(data = CROSSOVERMODEL) +
   geom_bar(mapping = aes(x = SEAT, fill = TRANSMISSION)) 

Based on the number of seats, this bar chart displays the number of Crossover cars made with automatic and manual transmissions. In the five- and eight-seat configurations, automatic cars far outnumber manual ones, and the seven-seat configuration has only a few cars. Manual transmissions appear only in eight-seat models.

  • Figure 14: Relationship between seat and brand of Crossover model
ggplot(data = CROSSOVERMODEL) +
   geom_bar(mapping = aes(x = SEAT, fill = BRAND))

Based on the number of seats, the last bar chart displays the number of Crossover cars from Hyundai and Toyota. Overall, the two automakers differ clearly in focus: Hyundai prioritizes Crossover vehicles with a 5-seat configuration, whereas Toyota primarily builds vehicles with a 7-seat configuration.

  • Figure 15: Relationship between brand and price of Crossover model
boxplot(log(CROSSOVERMODEL$PRICE)~CROSSOVERMODEL$BRAND, col="green",xlab="brand", ylab="log(price)")

  • Figure 16: Relationship between transmission and price of Crossover model
boxplot(log(CROSSOVERMODEL$PRICE)~CROSSOVERMODEL$TRANSMISSION, col="black",xlab="transmission", ylab="log(price)")

  • Figure 17: Relationship between origin and price of Crossover model
boxplot(log(CROSSOVERMODEL$PRICE)~CROSSOVERMODEL$ORIGIN, col="gray",xlab="origin", ylab="log(price)")

  • Figure 18: Relationship between seat and price of Crossover model
boxplot(log(CROSSOVERMODEL$PRICE)~CROSSOVERMODEL$SEAT, col="orange",xlab="seat", ylab="log(price)")

A. Seat

* One way Anova and Tukey

Hypothesis testing at a 5% significance level (α = 0.05):

H0: µ1 = µ2 = µ3 versus HA: at least one pair of means differs

µ1 : the mean for the price of Crossover model cars with 5 seats

µ2 : the mean for the price of Crossover model cars with 7 seats

µ3: the mean for the price of Crossover model cars with 8 seats

We use R, a programming language, for the calculations. The code is as follows:

library(readxl)
CROSSOVERMODEL <- read_excel("CROSSOVERMODEL.xlsx")
one.way <- aov(PRICE~factor(SEAT), data = CROSSOVERMODEL)
 summary(one.way)
##              Df    Sum Sq   Mean Sq F value Pr(>F)  
## factor(SEAT)  2 3.017e+17 1.509e+17   4.442 0.0246 *
## Residuals    21 7.132e+17 3.396e+16                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • From the table, we have:

    F-statistic:

    F = 4.442

    P-value:

    p-value = 0.0246 < 0.05
  • Since the p-value is smaller than 0.05, the null hypothesis is rejected at the 5% significance level. Thus, there is sufficient evidence that the number of seats has a significant effect on the price of Crossover cars.

We further conduct the Tukey multiple comparison procedure to discover which μi are different and by how much. The R code is as follows:

TukeyHSD(one.way)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = PRICE ~ factor(SEAT), data = CROSSOVERMODEL)
## 
## $`factor(SEAT)`
##          diff        lwr       upr     p adj
## 7-5 -56000000 -539463693 427463693 0.9542005
## 8-5 219636364   25744420 413528307 0.0246460
## 8-7 275636364 -209514817 760787544 0.3431588
 plot(TukeyHSD(one.way, conf.level=.95), las = 2)

The 95% confidence interval for the difference between 7 seats and 5 seats:

µ2 - µ1 ∈ (-539463693, 427463693)

Since the confidence interval contains 0, there is no evidence at the 95% confidence level that the mean prices of 7-seat and 5-seat cars differ.

The 95% confidence interval for the difference between 8 seats and 5 seats:

µ3 - µ1 ∈ (25744420, 413528307)

Since the confidence interval does not contain 0, we can conclude with 95% confidence that 8-seat and 5-seat cars have different mean prices. Additionally, because the interval is entirely positive, the 8-seat cars have the larger mean price.

The 95% confidence interval for the difference between 8 seats and 7 seats:

µ3 - µ2 ∈ (-209514817, 760787544)

Since the confidence interval contains 0, there is no evidence at the 95% confidence level that the mean prices of 8-seat and 7-seat cars differ.

  • To conclude, a one-way ANOVA revealed a statistically significant difference in price between seat configurations of the Crossover model. The Tukey multiple comparison procedure found a significant difference in mean price only between 8 seats and 5 seats; the differences between 7 and 5 seats and between 8 and 7 seats are not significant. Among Crossover cars specifically, the 8-seat cars have the highest mean price.

B. Origin

* Two sample T-Test (two tailed)

It is a statistical hypothesis test that investigates whether there is a significant difference between the means of two independent groups that may have unequal variances. The test compares the means of the two groups while accounting for the variability within each group.

There are two hypotheses for the t-test:

  • H0: µ1 = µ2: the mean prices of the two types of origin are equal.

  • HA: µ1 ≠ µ2: the mean prices of the two types of origin are not equal.
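The unequal-variance (Welch) version of the test can be sketched by hand to show where the t statistic and the fractional degrees of freedom come from. The two samples below are hypothetical toy values, not car prices.

```r
# Hypothetical toy samples.
x <- c(12, 15, 14, 10, 13, 16)
y <- c(9, 11, 8, 12, 10)

# Variance of each sample mean.
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_manual  <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_manual <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-tailed p-value.
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# t.test() uses var.equal = FALSE by default, which is the Welch test.
res <- t.test(x, y)

print(c(t = t_manual, df = df_manual, p = p_manual))
```

The degrees of freedom are generally not a whole number, which is why the outputs in this report show values such as 19.309.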

To compare the price of Crossover cars by origin, we perform the t-test using R. The code looks like this:

t.test(PRICE ~ ORIGIN, data= CROSSOVERMODEL)
## 
##  Welch Two Sample t-test
## 
## data:  PRICE by ORIGIN
## t = 2.2196, df = 19.309, p-value = 0.0386
## alternative hypothesis: true difference in means between group Domestic assembly and group Imported is not equal to 0
## 95 percent confidence interval:
##    7873426 263319851
## sample estimates:
## mean in group Domestic assembly          mean in group Imported 
##                       741882353                       606285714

• data: the data used in the two-sample t-test (PRICE by ORIGIN, with groups Domestic assembly and Imported).

• t: the t test statistic. The positive value of 2.2196 indicates that the Domestic assembly sample mean is larger than the Imported sample mean; whether the difference is significant is decided by the p-value.

• df: the degrees of freedom associated with the t statistic (19.309).

• p-value: indicates the statistical significance of the result. The p-value is 0.0386, which is lower than alpha (0.05). If the two population means were equal, a difference at least this large would occur by chance only about 3.9% of the time.

• alternative hypothesis: the hypothesis being tested against H0. In our case, it checks whether the true difference in means is not equal to zero.

• 95 percent confidence interval: we are 95% confident that the true difference between the two population means lies within the range (7873426, 263319851).

• sample estimates: the sample means of each group. Domestic assembly and Imported have means of 741882353 and 606285714, respectively, so on average Domestic assembly has a higher price than Imported.

In conclusion, because the p-value (0.0386) is smaller than the significance level (0.05), we reject the null hypothesis (H0). The results of the Welch two-sample t-test provide evidence of a statistically significant difference in mean price between Domestic assembly and Imported Crossover cars.
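The p-value and confidence interval in the output can be recovered from the t statistic, the degrees of freedom, and the two sample means. The sketch below uses only the rounded numbers printed above, so it should agree with the output up to rounding.

```r
# Values transcribed from the t.test output for PRICE by ORIGIN.
t_stat        <- 2.2196
df            <- 19.309
mean_domestic <- 741882353
mean_imported <- 606285714

# Difference in sample means and the standard error implied by t.
diff_means <- mean_domestic - mean_imported
se         <- diff_means / t_stat

# Two-tailed p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- diff_means + c(-1, 1) * qt(0.975, df) * se

print(p_value)
print(ci)
```

Because the interval lies entirely above 0, it leads to the same decision as comparing the p-value with 0.05.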

C. Transmission

* Two sample T-Test (two tailed)

There are two hypotheses for the t-test:

  • H0: µ1 = µ2: the mean prices of the two types of transmission are equal.

  • HA: µ1 ≠ µ2: the mean prices of the two types of transmission are not equal.

To compare the price of Crossover cars by transmission, we perform the t-test using R. The code looks like this:

t.test(PRICE~TRANSMISSION, data=CROSSOVERMODEL)
## 
##  Welch Two Sample t-test
## 
## data:  PRICE by TRANSMISSION
## t = -0.94959, df = 20.204, p-value = 0.3535
## alternative hypothesis: true difference in means between group Automatic and group Manual is not equal to 0
## 95 percent confidence interval:
##  -158805974   59405974
## sample estimates:
## mean in group Automatic    mean in group Manual 
##               694050000               743750000

• data: the data used in the two-sample t-test (PRICE by TRANSMISSION, with groups Automatic and Manual).

• t: the t test statistic. The negative value of -0.94959 indicates that the Automatic sample mean is smaller than the Manual sample mean; whether the difference is significant is decided by the p-value.

• df: the degrees of freedom associated with the t statistic (20.204).

• p-value: indicates the statistical significance of the result. The p-value is 0.3535, which is larger than alpha (0.05).

• alternative hypothesis: the hypothesis being tested against H0. In our case, it checks whether the true difference in means is not equal to zero.

• 95 percent confidence interval: we are 95% confident that the true difference between the two population means lies within the range (-158805974, 59405974). This interval contains 0.

• sample estimates: the sample means of each group. Automatic and Manual have means of 694050000 and 743750000, respectively.

In conclusion, because the p-value (0.3535) is larger than the significance level (0.05), we fail to reject the null hypothesis (H0). The results of the Welch two-sample t-test do not provide sufficient evidence of a statistically significant difference in mean price between Automatic and Manual Crossover cars.

III. CONCLUSION

Methods such as ANOVA, Tukey’s procedure, and the t-test have helped us analyze the dataset and identify which factors directly affect the price of a car. For example, the car model is an important factor in whether a car is expensive. If you want to buy a low-segment car, the Coupe model is not a reasonable choice. Furthermore, once you have chosen a suitable car model, other factors such as the place of assembly and the number of seats also strongly affect which car fits your price range.

IV. APPENDIX

View(dataset)
View(CROSSOVERMODEL)