Final Project

Introduction:

This project uses fuel economy data for all cars sold in the US from 1984 to 2015, collected by the U.S. Environmental Protection Agency (EPA). The project investigates how fuel economy changed over time and how it relates to other vehicle characteristics.

https://cran.r-project.org/web/packages/fueleconomy/fueleconomy.pdf

The main questions are:

  1. Did the fuel economy of vehicles change from 1984 to 2015?
  2. What is the relationship between fuel economy and the number of engine cylinders?
  3. What is the relationship between fuel economy and fuel type?

Data:

Data collection:

library(fueleconomy)
## Warning: package 'fueleconomy' was built under R version 3.3.3
library(DT)
## Warning: package 'DT' was built under R version 3.3.3
fuel <- fueleconomy::vehicles
head(fuel)
##      id       make               model year                       class
## 1 27550 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5  1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6  1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
##             trans            drive cyl displ    fuel hwy cty
## 1 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 2 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 3 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 4 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 5 Automatic 3-spd Rear-Wheel Drive   4   2.5 Regular  17  16
## 6 Automatic 3-spd Rear-Wheel Drive   6   4.2 Regular  13  13

The fuel economy data were collected by the EPA and are observational: the vehicle characteristics were recorded as they are, not assigned as in an experiment.

str(fuel)
## Classes 'tbl_df', 'tbl' and 'data.frame':    33442 obs. of  12 variables:
##  $ id   : int  27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
##  $ make : chr  "AM General" "AM General" "AM General" "AM General" ...
##  $ model: chr  "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
##  $ year : int  1984 1984 1984 1984 1985 1985 1987 1997 1997 1997 ...
##  $ class: chr  "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" ...
##  $ trans: chr  "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" ...
##  $ drive: chr  "2-Wheel Drive" "2-Wheel Drive" "2-Wheel Drive" "2-Wheel Drive" ...
##  $ cyl  : int  4 4 6 6 4 6 6 4 4 6 ...
##  $ displ: num  2.5 2.5 4.2 4.2 2.5 4.2 3.8 2.2 2.2 3 ...
##  $ fuel : chr  "Regular" "Regular" "Regular" "Regular" ...
##  $ hwy  : int  17 17 13 13 17 13 21 26 28 26 ...
##  $ cty  : int  18 18 13 13 16 13 14 20 22 18 ...

  • id: unique EPA identifier
  • make: manufacturer
  • model: model name
  • year: model year
  • class: EPA vehicle size class (http://www.fueleconomy.gov/feg/ws/wsData.shtml#VClass)
  • trans: transmission
  • drive: drive train
  • cyl: number of cylinders
  • displ: engine displacement, in litres
  • fuel: fuel type
  • hwy: highway fuel economy, in mpg
  • cty: city fuel economy, in mpg

Cases:

The dataset contains 33,442 cases from 1984 to 2015, each described by its manufacturer, model name, year, EPA vehicle size class, transmission, drive train, number of cylinders, engine displacement, fuel type, highway fuel economy and city fuel economy.

Variables:

The response variable is fuel economy, measured as combined miles per gallon (mpg) calculated from the city and highway values. The explanatory variables are model year, number of cylinders and fuel type. This is an observational study. To understand the relationships between the explanatory variables and the response variable, the dataset is studied in several ways: exploratory data analysis with summary statistics and data visualization, statistical inference, and machine learning methods.

Scope of inference - generalizability:


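The population of interest is vehicles sold in the US from 1984 to 2015. The EPA data cover essentially every vehicle model rated in that period rather than a random sample, so the findings describe that population of vehicle models directly, and the large number of cases (33,442) gives precise estimates. Two caveats limit generalizability: each row is a vehicle model record rather than a sold vehicle, so the results are not weighted by how many of each model were sold; and EPA ratings are measured under standardized test conditions, so they exclude real-world factors such as improper driving.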

Scope of inference - causality:

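Because this is an observational study, the analysis can establish association but not causation. For example, the number of cylinders (explanatory variable) and fuel economy (response variable) are associated, but confounding factors such as vehicle size, weight and engine technology could explain part of that association, so a causal effect cannot be concluded without a controlled experiment.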

Exploratory data analysis:

The analysis needs a single response variable for fuel economy, so a combined fuel economy is calculated from the city and highway values. The weighting follows the EPA fuel economy label guidance:

https://www.fueleconomy.gov/feg/label/learn-more-gasoline-label.shtml#fuel-economy

Combined fuel economy is a weighted average of City and Highway MPG values that is calculated by weighting the City value by 55% and the Highway value by 45%.
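A minimal sketch of the 55/45 weighting as an R function, using city and highway values from rows 1, 3 and 5 of head(fuel). For comparison it also shows a harmonic weighting, which applies the same weights to fuel consumption (gallons per mile) instead of to mpg; this is believed to be how EPA computes its own combined rating, but treat that as an assumption here. The function names combined_arith and combined_harmonic are hypothetical.

```r
# Hypothetical helper: arithmetic 55% city / 45% highway weighting,
# the same formula used for Combined_mpg in this project
combined_arith <- function(cty, hwy) {
  0.55 * cty + 0.45 * hwy
}

# Hypothetical helper: harmonic weighting (assumption: the weights are
# applied to gallons per mile, then converted back to miles per gallon)
combined_harmonic <- function(cty, hwy) {
  1 / (0.55 / cty + 0.45 / hwy)
}

# City/highway values from rows 1, 3 and 5 of head(fuel)
cty <- c(18, 13, 16)
hwy <- c(17, 13, 17)
arith <- combined_arith(cty, hwy)
harm  <- combined_harmonic(cty, hwy)
print(round(data.frame(cty, hwy, arith, harm), 3))
```

The harmonic version is never larger than the arithmetic one and matches it when city and highway values are equal; for typical cars the two differ by a small fraction of an mpg.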

Transformation and summary of the dataset

# Combined mpg: weight city by 55% and highway by 45%
fuel$Combined_mpg <- fuel$cty * 0.55 + fuel$hwy * 0.45
summary(fuel)
##        id            make              model                year     
##  Min.   :    1   Length:33442       Length:33442       Min.   :1984  
##  1st Qu.: 8361   Class :character   Class :character   1st Qu.:1991  
##  Median :16724   Mode  :character   Mode  :character   Median :1999  
##  Mean   :17038                                         Mean   :1999  
##  3rd Qu.:25265                                         3rd Qu.:2008  
##  Max.   :34932                                         Max.   :2015  
##                                                                      
##     class              trans              drive                cyl        
##  Length:33442       Length:33442       Length:33442       Min.   : 2.000  
##  Class :character   Class :character   Class :character   1st Qu.: 4.000  
##  Mode  :character   Mode  :character   Mode  :character   Median : 6.000  
##                                                           Mean   : 5.772  
##                                                           3rd Qu.: 6.000  
##                                                           Max.   :16.000  
##                                                           NA's   :58      
##      displ           fuel                hwy              cty        
##  Min.   :0.000   Length:33442       Min.   :  9.00   Min.   :  6.00  
##  1st Qu.:2.300   Class :character   1st Qu.: 19.00   1st Qu.: 15.00  
##  Median :3.000   Mode  :character   Median : 23.00   Median : 17.00  
##  Mean   :3.353                      Mean   : 23.55   Mean   : 17.49  
##  3rd Qu.:4.300                      3rd Qu.: 27.00   3rd Qu.: 20.00  
##  Max.   :8.400                      Max.   :109.00   Max.   :138.00  
##  NA's   :57                                                          
##   Combined_mpg   
##  Min.   :  7.80  
##  1st Qu.: 16.70  
##  Median : 19.70  
##  Mean   : 20.22  
##  3rd Qu.: 22.70  
##  Max.   :123.15  
## 

Model year and combined fuel economy (mpg)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
fuel_year <- fuel %>%
  group_by(year) %>%
  summarise(n = n(), mean = mean(Combined_mpg), median = median(Combined_mpg), sd = sd(Combined_mpg))
fuel_year
## # A tibble: 32 × 5
##     year     n     mean median       sd
##    <int> <int>    <dbl>  <dbl>    <dbl>
## 1   1984   784 17.15944  17.25 4.182516
## 2   1985  1701 20.20212  19.60 5.320747
## 3   1986  1210 19.93054  19.60 5.253975
## 4   1987  1247 19.62097  19.35 5.135072
## 5   1988  1130 19.74969  19.25 5.041844
## 6   1989  1153 19.53877  19.15 5.175750
## 7   1990  1078 19.42032  19.05 4.955587
## 8   1991  1132 19.28101  18.70 4.916046
## 9   1992  1121 19.34095  19.05 4.894614
## 10  1993  1093 19.60018  19.05 4.869317
## # ... with 22 more rows
hist(fuel$Combined_mpg)

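Remove outliers (values more than 1.5 times the interquartile range beyond the box hinges, as flagged by boxplot.stats())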

fuel <- fuel[!fuel$Combined_mpg %in% boxplot.stats(fuel$Combined_mpg)$out, ]
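boxplot.stats() flags points lying more than coef (default 1.5) times the hinge spread below the lower hinge or above the upper hinge, with the hinges taken from fivenum(). A minimal base-R sketch of that rule, checked against boxplot.stats() on made-up mpg values; the helper name iqr_outliers is hypothetical.

```r
# Hypothetical helper reproducing the default outlier rule of boxplot.stats()
iqr_outliers <- function(x, coef = 1.5) {
  h <- fivenum(x, na.rm = TRUE)   # min, lower hinge, median, upper hinge, max
  iqr <- h[4] - h[2]              # spread between the hinges
  flagged <- x < h[2] - coef * iqr | x > h[4] + coef * iqr
  x[!is.na(flagged) & flagged]    # outliers in their original order
}

# Made-up mpg values with one very high and one very low point
mpg <- c(15, 17, 18, 19, 19.5, 20, 21, 22, 23, 60, 2)
iqr_outliers(mpg)
boxplot.stats(mpg)$out
```

Here the hinges are 17.5 and 21.5, so anything below 11.5 or above 27.5 is removed.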

Visualization of combined fuel economy after removing outliers

hist(fuel$Combined_mpg)

Graphical analysis by model year

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
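# Mean combined mpg per year with a fitted linear trend.
# Note: fuel_year was summarised before outlier removal, so these means include outliers.
# Base graphics calls are run separately (combining them with `+` printed "numeric(0)").
plot(fuel_year$year, fuel_year$mean, type = "b", lty = 6, col = "blue",
     xlab = "year", ylab = "Mean of fuel")
abline(lm(mean ~ year, data = fuel_year))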
library(ggplot2)
# Boxplots of combined mpg for each model year; the x mark shows the yearly mean
ggplot(fuel, aes(y = Combined_mpg, x = as.factor(year), fill = as.factor(year))) +
  geom_boxplot(alpha = 0.5) +
  scale_x_discrete("", labels = NULL, breaks = NULL) +
  scale_y_continuous("", limits = c(5, 35)) +
  guides(fill = guide_legend(title = "Year")) +
  theme(legend.justification = c(0, 0.75), legend.position = c(0, 0.75),
        legend.background = element_rect(fill = NA),
        legend.title = element_text(face = "bold"), legend.title.align = 0.5) +
  stat_summary(fun = mean, geom = "point", shape = 4, size = 2) +  # fun.y before ggplot2 3.3.0
  coord_flip() +
  ggtitle("Distribution of Fuel Economy by Year\n")

# Overlaid density curves of combined mpg, one per model year
ggplot(fuel, aes(x = Combined_mpg, fill = as.factor(year))) +
  geom_density(alpha = 0.5) +
  scale_y_continuous("", labels = NULL, breaks = NULL) +
  scale_x_continuous("Fuel Economy (Combined_mpg)", limits = c(5, 35)) +
  guides(fill = guide_legend(title = "Year")) +
  theme(legend.justification = c(0, 0.75), legend.position = c(0, 0.75),
        legend.background = element_rect(fill = NA),
        legend.title = element_text(face = "bold"), legend.title.align = 0.5)

Number of cylinders and combined fuel economy (mpg)

unique(fuel$cyl)
##  [1]  4  6  5  8 12 10 16 NA  3  2

Remove vehicles with a missing number of cylinders

library(dplyr)
# Inside filter(), refer to the column directly rather than through fuel$cyl
fuel_cyl <- fuel %>% filter(!is.na(cyl))

unique(fuel_cyl$cyl)
## [1]  4  6  5  8 12 10 16  3  2

Comparison of the mean, median and sd for each number of cylinders

grp_cyl <- fuel_cyl %>%
  group_by(cyl) %>%
  summarise(n = n(), mean = mean(Combined_mpg), median = median(Combined_mpg), sd = sd(Combined_mpg))
grp_cyl
## # A tibble: 9 × 5
##     cyl     n     mean median        sd
##   <int> <int>    <dbl>  <dbl>     <dbl>
## 1     2    45 18.37000  18.60 0.5289784
## 2     3    29 29.09310  29.80 1.5587726
## 3     4 11764 23.51209  23.25 3.3083728
## 4     5   718 20.85578  20.60 2.7252662
## 5     6 11884 18.90852  19.05 2.6248608
## 6     8  7550 15.48340  15.25 2.6219897
## 7    10   138 14.46920  14.60 1.8040640
## 8    12   478 13.36056  13.70 1.7624723
## 9    16     7 10.95714  11.15 0.2405351

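Graphical analysis by number of cylinders

# Mean combined mpg per cylinder count with a fitted linear trend
# (base graphics calls run separately instead of being joined with `+`)
plot(grp_cyl$cyl, grp_cyl$mean, type = "b", lty = 6, col = "blue",
     xlab = "number of cylinders", ylab = "Mean of fuel")
abline(lm(mean ~ cyl, data = grp_cyl))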

Visualization of combined fuel economy by number of cylinders

# Boxplots of combined mpg by cylinder count; the x mark shows the group mean
ggplot(fuel_cyl, aes(y = Combined_mpg, x = as.factor(cyl), fill = as.factor(cyl))) +
  geom_boxplot(alpha = 0.5) +
  scale_x_discrete("Cylinders") +
  scale_y_continuous("Fuel Economy (mpg)\n") +
  theme(legend.position = "none") +
  stat_summary(fun = mean, geom = "point", shape = 4, size = 2) +  # fun.y before ggplot2 3.3.0
  ggtitle("Fuel Economy by Number of Engine Cylinders\n")

Fuel type and combined fuel economy (mpg)

unique(fuel$fuel)
##  [1] "Regular"                    "Premium"                   
##  [3] "Premium or E85"             "Diesel"                    
##  [5] "Gasoline or E85"            "Gasoline or natural gas"   
##  [7] "CNG"                        "Electricity"               
##  [9] "Midgrade"                   "Premium Gas or Electricity"
## [11] "Gasoline or propane"        "Premium and Electricity"

Dual-fuel categories (labels containing “or” or “and”) combine two fuel types, so comparing them with single-fuel categories is not meaningful; they are removed.
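A quick check of the dual-fuel filter on the labels printed by unique(fuel$fuel). Wrapping the words in \\b (word boundaries) makes the pattern match "or" and "and" only as whole words. The label "Standard" is a made-up example of a single-fuel name that the bare pattern "or|and" would wrongly drop.

```r
# Fuel-type labels as printed by unique(fuel$fuel)
fuel_labels <- c("Regular", "Premium", "Premium or E85", "Diesel",
                 "Gasoline or E85", "Gasoline or natural gas", "CNG",
                 "Electricity", "Midgrade", "Premium Gas or Electricity",
                 "Gasoline or propane", "Premium and Electricity")

# TRUE for labels where "or"/"and" appears as a separate word
dual <- grepl("\\b(or|and)\\b", fuel_labels, perl = TRUE)
fuel_labels[!dual]   # single-fuel labels that are kept

# Hypothetical label: the bare pattern matches the "and" inside "Standard",
# the word-boundary pattern does not
grepl("or|and", "Standard")
grepl("\\b(or|and)\\b", "Standard", perl = TRUE)
```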

library(dplyr)
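# Drop dual-fuel labels; inside filter(), `fuel` refers to the fuel column.
# \\b restricts the match to the whole words "or" and "and".
fuel_clean <- fuel %>% filter(!grepl("\\b(or|and)\\b", fuel, perl = TRUE))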

unique(fuel_clean$fuel)
## [1] "Regular"     "Premium"     "Diesel"      "CNG"         "Electricity"
## [6] "Midgrade"
fuel_type <- fuel_clean %>%
  group_by(fuel) %>%
  summarise(n = n(), mean = mean(Combined_mpg), median = median(Combined_mpg), sd = sd(Combined_mpg))
fuel_type
## # A tibble: 6 × 5
##          fuel     n     mean median       sd
##         <chr> <int>    <dbl>  <dbl>    <dbl>
## 1         CNG    55 17.81364  14.25 7.047579
## 2      Diesel   699 20.40572  19.15 4.156685
## 3 Electricity     1 28.00000  28.00      NaN
## 4    Midgrade    43 18.02907  18.05 1.645384
## 5     Premium  8575 19.51809  19.60 3.785541
## 6     Regular 22091 19.89339  19.70 4.516686

Inference:

Model year and combined fuel economy (mpg)

boxplot(fuel$Combined_mpg ~ fuel$year)

by(fuel$Combined_mpg, fuel$year, mean)
## fuel$year: 1984
## [1] 17.15944
## -------------------------------------------------------- 
## fuel$year: 1985
## [1] 19.77042
## -------------------------------------------------------- 
## fuel$year: 1986
## [1] 19.45336
## -------------------------------------------------------- 
## fuel$year: 1987
## [1] 19.27115
## -------------------------------------------------------- 
## fuel$year: 1988
## [1] 19.40289
## -------------------------------------------------------- 
## fuel$year: 1989
## [1] 19.09172
## -------------------------------------------------------- 
## fuel$year: 1990
## [1] 19.05786
## -------------------------------------------------------- 
## fuel$year: 1991
## [1] 18.96407
## -------------------------------------------------------- 
## fuel$year: 1992
## [1] 18.98776
## -------------------------------------------------------- 
## fuel$year: 1993
## [1] 19.28198
## -------------------------------------------------------- 
## fuel$year: 1994
## [1] 19.25253
## -------------------------------------------------------- 
## fuel$year: 1995
## [1] 19.1124
## -------------------------------------------------------- 
## fuel$year: 1996
## [1] 19.88937
## -------------------------------------------------------- 
## fuel$year: 1997
## [1] 19.78054
## -------------------------------------------------------- 
## fuel$year: 1998
## [1] 19.71139
## -------------------------------------------------------- 
## fuel$year: 1999
## [1] 19.60817
## -------------------------------------------------------- 
## fuel$year: 2000
## [1] 19.52597
## -------------------------------------------------------- 
## fuel$year: 2001
## [1] 19.45185
## -------------------------------------------------------- 
## fuel$year: 2002
## [1] 19.29201
## -------------------------------------------------------- 
## fuel$year: 2003
## [1] 19.10971
## -------------------------------------------------------- 
## fuel$year: 2004
## [1] 19.2832
## -------------------------------------------------------- 
## fuel$year: 2005
## [1] 19.4293
## -------------------------------------------------------- 
## fuel$year: 2006
## [1] 19.32516
## -------------------------------------------------------- 
## fuel$year: 2007
## [1] 19.42826
## -------------------------------------------------------- 
## fuel$year: 2008
## [1] 19.61594
## -------------------------------------------------------- 
## fuel$year: 2009
## [1] 20.14529
## -------------------------------------------------------- 
## fuel$year: 2010
## [1] 20.8785
## -------------------------------------------------------- 
## fuel$year: 2011
## [1] 20.97287
## -------------------------------------------------------- 
## fuel$year: 2012
## [1] 21.32059
## -------------------------------------------------------- 
## fuel$year: 2013
## [1] 22.0118
## -------------------------------------------------------- 
## fuel$year: 2014
## [1] 22.31938
## -------------------------------------------------------- 
## fuel$year: 2015
## [1] 23.45281

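The yearly means stay between about 19 and 20 mpg from 1985 to 2008 (1984 is lower, at 17.2 mpg) and then rise steadily to 23.5 mpg in 2015. Whether these differences are statistically significant is tested with ANOVA below.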

Check conditions

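Before using ANOVA to compare means across multiple groups, its conditions are checked:

  1. Independence: observations should be independent within and between groups. The data are not a random sample, and some records are near-duplicates (e.g. the first two rows of head(fuel)), so independence is assumed rather than guaranteed.
  2. Nearly normal residuals: checked with the normal Q-Q plot of the residuals after each model.
  3. Constant variance: the variability should be roughly equal across groups (compare the group standard deviations in the summary tables above).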
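A commonly quoted rule of thumb for the constant-variance condition is that the largest group standard deviation should not be much more than about twice the smallest; Bartlett's test (bartlett.test() in base R) is a formal alternative, although it is sensitive to non-normality. This check is not part of the original analysis, and the data below are simulated. It is worth applying to the real groups: in the cylinder summary above, the group SDs range from about 0.24 (16 cylinders) to 3.31 (4 cylinders).

```r
set.seed(1)
# Simulated example (not the fuel data): three groups with different spreads
g <- factor(rep(c("a", "b", "c"), each = 50))
y <- c(rnorm(50, mean = 20, sd = 2),
       rnorm(50, mean = 22, sd = 2.5),
       rnorm(50, mean = 18, sd = 6))

sds <- tapply(y, g, sd)   # standard deviation within each group
sds
max(sds) / min(sds)       # rule of thumb: be cautious if this is well above 2

bartlett.test(y ~ g)      # H0: all group variances are equal
```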

Fuel economy by model year

  1. Hypotheses:
  • \(H_0\): The mean fuel economy is the same for every model year from 1984 to 2015.
  • \(H_A\): At least one model year has a different mean fuel economy.

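To perform the ANOVA test, a linear regression with year as a categorical predictor is fitted. The F-statistic at the bottom of summary() is the one-way ANOVA F-test; anova(test_year) would report the same F in table form.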
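To show why a regression on a categorical predictor gives the ANOVA result, here is a sketch on simulated data (not the fuel data) that computes the one-way F-statistic by hand, F = MSG / MSE, and compares it with the F from summary(lm()) and anova().

```r
set.seed(42)
# Simulated example: three groups of 30 observations with different means
grp <- factor(rep(c("g1", "g2", "g3"), each = 30))
y <- rnorm(90, mean = rep(c(18, 19, 21), each = 30), sd = 3)

k <- nlevels(grp)                 # number of groups
n <- length(y)                    # total sample size
grand <- mean(y)
group_means <- tapply(y, grp, mean)
group_n <- tapply(y, grp, length)

ssg <- sum(group_n * (group_means - grand)^2)  # between-group sum of squares
sse <- sum((y - ave(y, grp))^2)                # within-group sum of squares
f_manual <- (ssg / (k - 1)) / (sse / (n - k))
p_manual <- pf(f_manual, k - 1, n - k, lower.tail = FALSE)

# The same test from a regression on the factor, as in this report
fit <- lm(y ~ grp)
f_lm <- unname(summary(fit)$fstatistic["value"])
f_anova <- anova(fit)[["F value"]][1]
c(manual = f_manual, summary_lm = f_lm, anova = f_anova)
```

The same SSG / (SSG + SSE) ratio is the R-squared that summary() reports, which is why R-squared can be read as the share of variation explained by the grouping.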

test_year <- lm(Combined_mpg ~  as.factor(year), data = fuel)
summary(test_year)
## 
## Call:
## lm(formula = Combined_mpg ~ as.factor(year), data = fuel)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.8704  -3.1579  -0.1019   2.6471  14.5406 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          17.1594     0.1504 114.089   <2e-16 ***
## as.factor(year)1985   2.6110     0.1826  14.300   <2e-16 ***
## as.factor(year)1986   2.2939     0.1942  11.814   <2e-16 ***
## as.factor(year)1987   2.1117     0.1928  10.955   <2e-16 ***
## as.factor(year)1988   2.2435     0.1966  11.411   <2e-16 ***
## as.factor(year)1989   1.9323     0.1960   9.859   <2e-16 ***
## as.factor(year)1990   1.8984     0.1985   9.562   <2e-16 ***
## as.factor(year)1991   1.8046     0.1964   9.189   <2e-16 ***
## as.factor(year)1992   1.8283     0.1969   9.287   <2e-16 ***
## as.factor(year)1993   2.1225     0.1978  10.729   <2e-16 ***
## as.factor(year)1994   2.0931     0.2024  10.342   <2e-16 ***
## as.factor(year)1995   1.9530     0.2029   9.625   <2e-16 ***
## as.factor(year)1996   2.7299     0.2142  12.743   <2e-16 ***
## as.factor(year)1997   2.6211     0.2149  12.198   <2e-16 ***
## as.factor(year)1998   2.5520     0.2117  12.054   <2e-16 ***
## as.factor(year)1999   2.4487     0.2096  11.682   <2e-16 ***
## as.factor(year)2000   2.3665     0.2101  11.264   <2e-16 ***
## as.factor(year)2001   2.2924     0.2062  11.116   <2e-16 ***
## as.factor(year)2002   2.1326     0.2029  10.512   <2e-16 ***
## as.factor(year)2003   1.9503     0.1998   9.761   <2e-16 ***
## as.factor(year)2004   2.1238     0.1967  10.798   <2e-16 ***
## as.factor(year)2005   2.2699     0.1952  11.627   <2e-16 ***
## as.factor(year)2006   2.1657     0.1971  10.988   <2e-16 ***
## as.factor(year)2007   2.2688     0.1961  11.570   <2e-16 ***
## as.factor(year)2008   2.4565     0.1943  12.645   <2e-16 ***
## as.factor(year)2009   2.9859     0.1944  15.356   <2e-16 ***
## as.factor(year)2010   3.7191     0.1974  18.844   <2e-16 ***
## as.factor(year)2011   3.8134     0.1972  19.333   <2e-16 ***
## as.factor(year)2012   4.1612     0.1975  21.072   <2e-16 ***
## as.factor(year)2013   4.8524     0.1974  24.581   <2e-16 ***
## as.factor(year)2014   5.1599     0.1967  26.234   <2e-16 ***
## as.factor(year)2015   6.2934     0.3363  18.713   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.211 on 32585 degrees of freedom
## Multiple R-squared:  0.05216,    Adjusted R-squared:  0.05126 
## F-statistic: 57.84 on 31 and 32585 DF,  p-value: < 2.2e-16
library(StMoSim)
## Warning: package 'StMoSim' was built under R version 3.3.3
## Loading required package: RcppParallel
## Loading required package: Rcpp
## Warning: package 'Rcpp' was built under R version 3.3.3
## 
## Attaching package: 'Rcpp'
## The following object is masked from 'package:RcppParallel':
## 
##     LdFlags
qqnormSim(test_year$residuals)
qqline(test_year$residuals)

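The p-value is below 2.2e-16, so the null hypothesis is rejected: there is statistically significant evidence that mean fuel economy differs across model years from 1984 to 2015. The effect is modest, however: model year explains only about 5% of the variation in combined mpg (R-squared = 0.052).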

Fuel economy by fuel type

  1. Hypotheses:
  • \(H_0\): The mean fuel economy is the same for all fuel types.
  • \(H_A\): At least one fuel type has a different mean fuel economy.

To perform the ANOVA test, a linear regression with fuel type as the predictor is fitted; the F-statistic reported by summary() is the one-way ANOVA F-test.

test_fuel <- lm(Combined_mpg ~  fuel, data = fuel_clean)
summary(test_fuel)
## 
## Call:
## lm(formula = Combined_mpg ~ fuel, data = fuel_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.9934  -3.0934  -0.0934   2.7066  12.0819 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      17.8136     0.5831  30.547  < 2e-16 ***
## fuelDiesel        2.5921     0.6057   4.280 1.88e-05 ***
## fuelElectricity  10.1864     4.3639   2.334 0.019589 *  
## fuelMidgrade      0.2154     0.8804   0.245 0.806681    
## fuelPremium       1.7045     0.5850   2.914 0.003576 ** 
## fuelRegular       2.0798     0.5839   3.562 0.000369 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.325 on 31458 degrees of freedom
## Multiple R-squared:  0.002626,   Adjusted R-squared:  0.002468 
## F-statistic: 16.57 on 5 and 31458 DF,  p-value: 2.249e-16
qqnormSim(test_fuel$residuals)
qqline(test_fuel$residuals)

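The p-value (2.249e-16) is essentially 0, so the null hypothesis is rejected: there is statistically significant evidence that mean fuel economy differs by fuel type. Fuel type explains very little of the variation, though (R-squared = 0.0026), and the Electricity group contains a single observation after outlier removal, so its estimate is not reliable.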

Fuel economy by number of cylinders

  1. Hypotheses:
  • \(H_0\): The mean fuel economy is the same for all numbers of cylinders.
  • \(H_A\): At least one number of cylinders has a different mean fuel economy.

To perform the ANOVA test, a linear regression with the number of cylinders as a categorical predictor is fitted; the F-statistic reported by summary() is the one-way ANOVA F-test.

test_cyl <- lm(Combined_mpg ~  as.factor(cyl), data = fuel_cyl)
summary(test_cyl)
## 
## Call:
## lm(formula = Combined_mpg ~ as.factor(cyl), data = fuel_cyl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5121  -2.1085  -0.2085   2.0879  12.4915 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       18.3700     0.4289  42.834  < 2e-16 ***
## as.factor(cyl)3   10.7231     0.6851  15.652  < 2e-16 ***
## as.factor(cyl)4    5.1421     0.4297  11.967  < 2e-16 ***
## as.factor(cyl)5    2.4858     0.4421   5.623 1.90e-08 ***
## as.factor(cyl)6    0.5385     0.4297   1.253     0.21    
## as.factor(cyl)8   -2.8866     0.4301  -6.711 1.97e-11 ***
## as.factor(cyl)10  -3.9008     0.4939  -7.898 2.91e-15 ***
## as.factor(cyl)12  -5.0094     0.4486 -11.167  < 2e-16 ***
## as.factor(cyl)16  -7.4129     1.1689  -6.342 2.30e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.877 on 32604 degrees of freedom
## Multiple R-squared:  0.5573, Adjusted R-squared:  0.5572 
## F-statistic:  5131 on 8 and 32604 DF,  p-value: < 2.2e-16
qqnormSim(test_cyl$residuals)
qqline(test_cyl$residuals)

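The p-value is below 2.2e-16, so the null hypothesis is rejected: there is statistically significant evidence that mean fuel economy differs for engines with different numbers of cylinders. The number of cylinders explains about 56% of the variation in combined mpg (R-squared = 0.557), far more than model year or fuel type.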

Conclusion: