getType <- function(hp, wt) {
  type <- rep("economy/compact", length(hp))
  type[wt > 3.0] <- "midsized/muscle"
  type[wt > 4.5] <- "luxury/superheavy"
  type[hp / wt > 60] <- "sports/muscle (hp)" # applied last, so it overrides the weight classes
  return(as.factor(type))
}
Causal Analysis:
https://en.m.wikipedia.org/wiki/Overdrive_(mechanics)
https://themotorguy.com/what-is-axle-ratio-in-trucks/
https://www.ecklers.com/understanding-rear-gear-ratios-tech-guide.html
https://www.youtube.com/watch?v=NCibZWA46Bo&ab_channel=DustorBust
“Is an automatic or manual transmission better for MPG”
“Quantify the MPG difference between automatic and manual transmissions”
We note that both Mazda RX4s in the data set have engines mis-reported as V-shaped. They are in fact Wankel rotary engines.
Data Sources
Data acquisition
library(data.table)
dt <- as.data.table(mtcars, keep.rownames = "name")
dt[, vs := factor(vs, labels = c("V-shaped", "straight"))]
dt[, am := factor(am, labels = c("automatic", "manual"))]
dt[, gear := as.factor(gear)]
dt[, type := getType(hp, wt)]
Effect of Transmission Type on MPG
Weight
automatic gearboxes are used in bigger, heavier, less economical cars
Weight is a confounder: Cars designed with automatic transmission have poorer mileage because automatic transmission is used on heavier cars while manual tends to be used on lighter more economical cars.
Weight appears to act as an effect modifier, with mileage decreasing more sharply with weight for manuals than for automatics. However, the limited overlap between the two groups, along with the influence of very economical and very heavy luxury cars, does seem to exaggerate this effect.
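As a rough check of the confounding and effect-modification claims above, the sketch below compares the transmission coefficient with and without weight adjustment, then fits an interaction model to get one weight slope per transmission. It uses base R on the raw `mtcars` data; the object names (`cars`, `fit_raw`, `fit_adj`, `fit_int`) are illustrative and not taken from the original analysis.

```r
# Recode am locally so the coefficient names are readable (0 = automatic, 1 = manual).
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

fit_raw <- lm(mpg ~ am, data = cars)       # unadjusted difference in group means
fit_adj <- lm(mpg ~ am + wt, data = cars)  # weight treated as a confounder
fit_int <- lm(mpg ~ am * wt, data = cars)  # weight treated as an effect modifier

am_raw <- unname(coef(fit_raw)["ammanual"])
am_adj <- unname(coef(fit_adj)["ammanual"])

# Weight slope within each transmission group, from the interaction model.
slopes <- c(
  automatic = unname(coef(fit_int)["wt"]),
  manual    = unname(coef(fit_int)["wt"] + coef(fit_int)["ammanual:wt"])
)

print(round(c(unadjusted = am_raw, adjusted = am_adj), 2))
print(round(slopes, 2))
```

If the manual advantage shrinks towards zero once `wt` is in the model, that is consistent with weight acting as a confounder; a steeper slope for manuals is what the effect-modifier reading predicts.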
muscle and mid-size sports cars have both low mpg and manual transmission
Here we are at (or far beyond) the limits of the data. Nevertheless, we can still glimpse something which I do believe exists: namely, within the midsized/muscle category, cars with manual transmission have poorer mileage.
Automatics are mostly practical midsized cars (with a few exceptions), as indicated by a high quarter-mile time: \[\operatorname{qsec}_{auto}\sim N(18.32,3.82^2)\]
Manuals are more heavily populated with muscle and high-performance sports cars, with a quarter-mile time that is on average 92% of that of the automatics: \[\operatorname{qsec}_{stick}\sim N(16.83,3.20^2)\] (This design choice saved weight and made gear changes quicker and acceleration higher.)
This suggests that manuals are concentrated among higher performance cars, which explains why their fuel efficiency appears lower in this subset even when controlling for weight.
no. of carburettors
carburetors themselves are not a statistically significant predictor of fuel efficiency
The relationship between transmission type and mileage was examined while adjusting for the number of carburettors. We observe that the primary effect of transmission on mileage remains largely unchanged after this adjustment. Additionally, there is a negative correlation between mileage and the number of carburettors, indicating that cars with more carburettors tend to be less fuel-efficient. This aligns with the observation that more wasteful, bulkier or sportier vehicles typically use more carburettors. However, there are a few high-influence sports cars that deviate from this general trend.
my preconceived theory regarding carburettors wasting fuel is not evidenced
model <- lm(mpg ~ wt + cyl + carb, data = dt)
anova(model)
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
wt 1 847.73 847.73 133.8014 3.526e-12 ***
cyl 1 87.15 87.15 13.7554 0.000912 ***
carb 1 13.77 13.77 2.1738 0.151536
Residuals 28 177.40 6.34
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
vif(model)
wt cyl carb
2.581453 2.920519 1.385647
I noticed that the number of carburetors appears to affect mileage in the additive model mpg ~ wt + disp + carb. However, the variance inflation factors for weight and displacement were high, indicating collinearity between the two. The number of carburetors does not appear to be overly associated with the bulk (PC1) score but it does contribute positively to the sportiness score. This aligns with our expectation that vehicles with more carburetors tend to be overpowered and less fuel-efficient. When we model mpg with cylinders instead, we see that previously the number of carburettors was simply acting as a proxy for performance-oriented cars.
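The collinearity comparison above can be reproduced without extra packages. The sketch below computes each variance inflation factor from its definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. `vif_by_hand` is a hypothetical helper written for this illustration; it is not the `vif()` call used in the report.

```r
# VIF from first principles: regress each predictor on the remaining ones.
# vif_by_hand is a hypothetical helper name, not part of any package.
vif_by_hand <- function(predictors, data) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vif_disp <- vif_by_hand(c("wt", "disp", "carb"), mtcars)  # model with displacement
vif_cyl  <- vif_by_hand(c("wt", "cyl", "carb"), mtcars)   # model with cylinders instead

print(round(vif_disp, 2))
print(round(vif_cyl, 2))
```

Because a VIF depends only on the predictors, the response `mpg` does not appear in the helper at all.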
Adjustment for drat
Fuel efficiency increases with the final drive ratio. This is likely because larger vehicles generally require bigger more powerful engines as well as lower output-to-input gearing ratios. These lower ratios help provide the torque needed for acceleration and we suppose compensate for the lower engine RPMs resulting from design constraints imposed by the larger centrifugal forces in bigger engines. There is insufficient overlap between the automatic and manual transmission groups to allow a meaningful analysis.
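To make the overlap problem concrete, the sketch below tabulates the range of `drat` for each transmission type and the window in which both types occur. It works on the raw `mtcars` columns (where `am` is 0 for automatic and 1 for manual); the variable names are illustrative only.

```r
# Range of final drive ratio within each transmission group.
drat_range <- sapply(split(mtcars$drat, mtcars$am), range)
colnames(drat_range) <- c("automatic", "manual")  # am: 0 = automatic, 1 = manual
rownames(drat_range) <- c("min", "max")
print(drat_range)

# Window of drat values covered by both groups.
overlap <- c(low = max(drat_range["min", ]), high = min(drat_range["max", ]))
in_window <- mtcars$drat >= overlap[["low"]] & mtcars$drat <= overlap[["high"]]

# Number of cars of each type inside the shared window.
print(overlap)
print(table(transmission = ifelse(mtcars$am[in_window] == 1, "manual", "automatic")))
```

Any comparison of transmissions at a fixed `drat` can only draw on the cars inside this window, which is why a subset of the data is used for the plot.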
dt2 <- dt[drat > 3.5 & drat < 4.0]
#| echo: false
model <- lm(mpg ~ am + drat, data = dt2)
ggplot(dt2, aes(x = drat, y = mpg, color = am)) +
  geom_point(alpha = 0.4) +
  geom_line(aes(y = predict(model, newdata = dt2)), linewidth = 1) + # fitted line per transmission; use linewidth instead of size
  geom_hline(yintercept = mean(dt2[am == "automatic"]$mpg), linetype = "dashed", linewidth = 1, colour = "coral") + # mean mpg, automatics
  geom_hline(yintercept = mean(dt2[am == "manual"]$mpg), linetype = "dashed", linewidth = 1, colour = "cyan") + # mean mpg, manuals
  labs(
    title = "MPG vs Rear Axle Ratio by Transmission",
    x = "Final Drive Ratio",
    y = "Miles Per Gallon (MPG)",
    color = "Transmission"
  ) +
  theme_minimal()
Appendices
A0) Statistical variation in car design
To explore statistical variation in available car designs in 1970s USA we can look at how variance is distributed in the mtcars dataset using PCA. It reveals three orthogonal directions containing most of the variance: 60%, 24% and 6%.
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 2.5707 1.6280 0.79196 0.51923 0.47271 0.46000 0.3678
Proportion of Variance 0.6008 0.2409 0.05702 0.02451 0.02031 0.01924 0.0123
Cumulative Proportion 0.6008 0.8417 0.89873 0.92324 0.94356 0.96279 0.9751
PC8 PC9 PC10 PC11
Standard deviation 0.35057 0.2776 0.22811 0.1485
Proportion of Variance 0.01117 0.0070 0.00473 0.0020
Cumulative Proportion 0.98626 0.9933 0.99800 1.0000
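The chunk that produced `pca_result` is not shown. A minimal sketch that yields a summary of this shape, assuming all eleven numeric `mtcars` columns were centred and scaled to unit variance before the decomposition, is:

```r
# Assumption: PCA on the standardised (centred, unit-variance) mtcars columns.
pca_result <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Share of total variance carried by each component.
var_share <- pca_result$sdev^2 / sum(pca_result$sdev^2)

print(round(var_share[1:3], 4))          # leading three components
print(round(cumsum(var_share)[1:3], 4))  # cumulative share
```

Scaling matters here: without it, variables measured in large units such as `disp` and `hp` would dominate the first component.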
Car bulkiness
The first principal component (the direction of greatest variance in the data) is driven mainly by weight and engine size, with mileage (closely linked to engine size) and to a lesser extent the final drive ratio also contributing. This dimension approximates “bulk”, i.e. large, heavy cars with powerful engines and poor mileage score high while smaller, lighter, more efficient cars score low.
#! echo: false
pc1_loadings <- pca_result$rotation[, 1]
filtered_pc1 <- pc1_loadings[abs(pc1_loadings) > 0.25]
filtered_pc1[order(-abs(filtered_pc1))]
cyl.V1 disp.V1 mpg.V1 wt.V1 hp.V1 vs.V1 drat.V1
0.3739160 0.3681852 -0.3625305 0.3461033 0.3300569 -0.3065113 -0.2941514
bulk_scores <- pca_result$x[, 1]
Car sportiness
The second principal component (the direction of greatest variance orthogonal to bulkiness) lies in the direction of drag ability, transmission design (gears and type), the no. of carburettors and to a lesser extent the final drive ratio.
This second measure aligns with sportiness controlled for bulk. Cars with V-6 and V-8 engines instead of inline engines and slightly bigger power-to-weight ratios score higher in terms of sportiness than cars of similar bulk which do not.
#! echo: false
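pc2_loadings <- pca_result$rotation[, 2]
# Note: the mask below is built from pc1_loadings, so the output lists the PC2
# loadings of the variables that dominate PC1, not those that dominate PC2.
filtered_pc2 <- pc2_loadings[abs(pc1_loadings) > 0.25]
filtered_pc2[order(-abs(filtered_pc2))]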
drat.V1 hp.V1 vs.V1 wt.V1 disp.V1 cyl.V1
0.27469408 0.24878402 -0.23164699 -0.14303825 -0.04932413 0.04374371
mpg.V1
0.01612440
sport_scores <- pca_result$x[, 2]
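The filter in the chunk above indexes with `abs(pc1_loadings)`, so the printed values are the PC2 loadings of the bulk variables. To list the variables that actually drive the second component, the same threshold can be applied to the PC2 loadings themselves. This is a sketch only, and it assumes `pca_result` came from a PCA on the standardised `mtcars` columns.

```r
# Assumption: PCA on the standardised mtcars columns, as sketched for the summary table.
pca_result <- prcomp(mtcars, center = TRUE, scale. = TRUE)

pc2_loadings <- pca_result$rotation[, 2]

# Keep only variables whose PC2 loading exceeds the 0.25 threshold in magnitude.
filtered_pc2 <- pc2_loadings[abs(pc2_loadings) > 0.25]
print(filtered_pc2[order(-abs(filtered_pc2))])
```

The sign of a principal component is arbitrary, so only the magnitudes and the relative signs of these loadings carry meaning.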
A1) U.S. car type classification in 1970s
The dataset (mtcars) spans a wide variety of car types. I applied a basic ruleset to classify cars into distinct types. Classification boundaries are shown below, along with rough efficiency strata estimates (see footnote 1), the bulk direction in the hp ~ wt plane at the data mean (arrow indicating the principal component direction), and the actual sportiness measure for each car (point size).
Here’s the ruleset (in the spirit of the course, we could have identified these points as outliers, high-leverage or influential but I believe this approach is clearer and more transferable)…
getType <- function(hp, wt) {
  type <- rep("economy/compact", length(hp))
  type[wt > 3.0] <- "midsized/muscle"
  type[wt > 4.5] <- "luxury/superheavy"
  type[hp / wt > 60] <- "sports/muscle (hp)" # applied last, so it overrides the weight classes
  return(as.factor(type))
}
- sports/muscle (hp) (any car producing \(60+\) hp per half-ton): Sports cars of the 1970s as today, tended to be light for manoeuvrability and speed while maintaining a high power-to-weight ratio. Even lower-powered roadsters such as the Lotus Europa were stripped down and very basic inside to maximise performance through a high power-to-weight ratio.
- luxury/superheavy: any other car exceeding \(4.5\) half-tons. This class is exemplified by the Cadillac - heavy steel-bodied cars with big-block V8s that prioritised comfort over performance, resulting in low power-to-weight ratios.
- midsized/muscle: any other car exceeding \(3\) half-tons. Cars in this range are midsized saloon or two door cars fitted with big-block engines, creating the classic American muscle car. These vehicles were tuned for straight-line acceleration, emphasising horsepower and torque over handling or efficiency while still being practical enough to serve as everyday cars.
- economy/compact: few small, economical cars were produced domestically so this category mostly consisted of European and Japanese imports. These cars were lightweight, fuel-efficient and designed for practicality with smaller engines and simple interiors.
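To see how the four rules play out on the data, the sketch below applies the ruleset to every car in `mtcars` and cross-tabulates the resulting type against transmission. The function body repeats the ruleset so the block runs on its own; `car_type` and the choice of example cars are illustrative.

```r
# The ruleset, repeated here so the example is self-contained.
getType <- function(hp, wt) {
  type <- rep("economy/compact", length(hp))
  type[wt > 3.0] <- "midsized/muscle"
  type[wt > 4.5] <- "luxury/superheavy"
  type[hp / wt > 60] <- "sports/muscle (hp)"  # last rule wins over the weight classes
  as.factor(type)
}

car_type <- getType(mtcars$hp, mtcars$wt)

# Car type against transmission (am: 0 = automatic, 1 = manual).
print(table(type = car_type,
            transmission = ifelse(mtcars$am == 1, "manual", "automatic")))

# A few individual cars, one per class.
examples <- c("Lotus Europa", "Cadillac Fleetwood", "Hornet 4 Drive", "Toyota Corolla")
print(data.frame(car = examples,
                 type = as.character(car_type)[match(examples, rownames(mtcars))]))
```

Because the power-to-weight rule is applied last, a heavy but very powerful car is classed as sports/muscle (hp) rather than by its weight.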
A1) Engine and drive system effect on mileage
A car’s efficiency curve is convex due to two competing effects:
- fuel consumption generally increases with power (speed)
- power (speed) is useful output, which increases efficiency
The mechanical efficiency of the various drive system components is detailed below:
Engine Efficiency: Fuel consumption increases with engine RPM
A four-stroke (Otto) cycle:
- aspiration
- compression
- ignition/combustion (not a stroke)
- expansion
- exhaust
Ideally, engine RPM would match wheel RPM (avoiding gearbox losses) while still providing torque for acceleration in a lightweight, compact design. Engines use less fuel at lower RPM but face efficiency limits at idle. Lubrication relies on an oil sump where sloshing creates resistance and energy losses.
Note: engine power mtcars$hp is measured directly at the crank prior to any transmission losses (so-called brake horsepower).
Carburettor Efficiency: carburettors reduce mileage
In the days before fuel injection, engines drew air through a carburettor where it was mixed with fuel. The air-fuel ratio was determined by airflow (via the Bernoulli principle) and by prior tuning. For cold starts, a choke allowed the driver to temporarily enrich the mixture.
An enriched mixture made starting easier but reduced fuel economy. Sometimes unburned fuel would ignite in the hot exhaust, causing the familiar backfire of older cars. A lean mixture on the other hand, could make the engine stall or cause sluggish throttle response. Striking the right balance was always a challenge, especially in engines with multiple carburettors which tended to favour performance and enrichment over efficiency.
Transmission Efficiency:
Gearboxes introduce mechanical losses. In top gear, the ratio is typically close to 1:1, so the crankshaft and driveshaft turn in unison, matching engine and wheel RPM. Lower gears provide torque for acceleration, while higher gears enable efficient cruising.
Fuel consumption decreases with number of gears
Progressing sequentially through the gears allows the car to reach cruising speed quickly while keeping the engine near its optimal RPM. Too low a gear wastes fuel by over-revving; too high a gear lacks torque, slows acceleration and may even stall the engine. More gears provide finer steps improving efficiency and fuel economy.
automatics of the 1970s were less efficient than manuals
In the 1970s, manual gear changes were more efficient than automatics, which were heavy, mechanically complex, and relied on inefficient clutch designs. Modern electronic control has largely eliminated this disadvantage.
Differential Efficiency: lower final drive ratio increases efficiency
The rear axle ratio is the gearing between the transmission output and the rear wheels (taking the average of the two wheel speeds), with a higher ratio meaning more engine turns per wheel rotation. In top gear the gearbox ratio is typically close to 1:1, so the rear axle (aka final drive) ratio defines the overall gearing at cruising speed.
\[\boxed{\operatorname{drat}=\frac{\text{no. ring gear (output) teeth}}{\text{no. pinion (input) teeth}}}\]
In the mockup the gearing ratio between the worm gear and the gear on the outside of the differential housing determines the final gearing ratio (see footnote 2).
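A small worked example of the drat formula may help. The tooth counts and wheel speed below are hypothetical values chosen for illustration; they do not describe any car in the dataset.

```r
# Hypothetical tooth counts, for illustration only.
ring_teeth   <- 41  # ring gear (output)
pinion_teeth <- 11  # pinion (input)

drat <- ring_teeth / pinion_teeth
print(round(drat, 2))

# With a 1:1 top gear (an assumption), engine speed is wheel speed times drat.
wheel_rpm  <- 800   # hypothetical cruising wheel speed
engine_rpm <- wheel_rpm * drat * 1.0
print(round(engine_rpm))
```

A higher ratio therefore means more engine revolutions for the same road speed.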
A2) Causal pathway between MPG and transmission type
flowchart LR
  Engine[<b>Confounder</b>:<br>Engine Displacement]
  Gears{<i>Influencer</i>:<br>No. Gears}
  subgraph CausalPathway [<b><i>The Causal Pathway</i></b>]
    direction LR
    Transmission[<b>Exposure</b>:<br>Transmission<br>Manual/Automatic]
    Differential[<b>Mediator</b>:<br>Rear axle ratio]
    Weight{<i>Influencer</i>:<br>Weight}
    MPG[<b>Output</b>:<br>MPG]
  end
  Emissions[<b>Collider</b>:<br>Emissions Rating]
  Engine --> |some other pathway| Emissions
  Transmission --> |some other pathway| Emissions
  Differential --> |some other pathway| Emissions
  Engine --> Gears --> Transmission --> Differential --> Weight --> MPG --> Emissions
carb
  |
  v
sportiness ---> mpg
  ^
  |
cyl ---> mpg
  |
  v
wt ---> mpg

transmission ---> mpg
- Carburetors influence mpg indirectly via sportiness.
- Cylinders and weight influence mpg directly.
- Transmission influences mpg directly, independent of carburetors.
- The non-significant ANOVA result for carb reflects that, once you control for cyl and wt, the direct effect of carburetors is minimal.
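The last point can be checked directly by comparing two nested models, one with and one without `carb`. This sketch uses the raw `mtcars` data; the object names are illustrative.

```r
# Does carb add explanatory power once wt and cyl are already in the model?
fit_base <- lm(mpg ~ wt + cyl, data = mtcars)
fit_carb <- lm(mpg ~ wt + cyl + carb, data = mtcars)

cmp <- anova(fit_base, fit_carb)  # partial F-test for the added carb term
print(cmp)

p_carb <- cmp[["Pr(>F)"]][2]
print(round(p_carb, 4))
```

Since `carb` is the last term added, this partial F-test is equivalent to the `carb` row of the sequential ANOVA table for mpg ~ wt + cyl + carb.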
A3) Exploratory Analysis - Transmission Type (Adjusted)
Initial observations:
- we don’t have a huge volume of data here
- weight and displacement look like great general predictors for mpg, but all three variables are collinear (remember to watch model stability if regressing on these covariates; better to avoid weight and displacement).
- The linearity is great; however, the variance is too low to fit a stable multivariable regression model (i.e. these points sit in a subspace on the plane, so we’ve hit the ceiling of the available data). Since mpg is what we want to study we must avoid wt and disp when it comes to regression models.
- the final column clearly shows that automatics use a lower final drive ratio and manuals use a higher final drive ratio. I suspect this is down to non-negligible gearing in the transmission at top gear.
- propensity scores: lack of data overlap
- VIF (variance inflation factor): collinear data
- F-value if we add interaction terms
- outliers? residuals? Cook's distance?
- mpg ~ hp + am (model)
- mpg ~ carb + am (model)
- mpg ~ drat + am (explain design, propensity score)
A5)
library(data.table)
library(readxl)
url <- "https://www.fueleconomy.gov/feg/EPAGreenGuide/xls/all_alpha_25.xlsx"
destfile <- "./all_alpha_25.xlsx"

if (!file.exists(destfile)) {
  download.file(url, destfile, mode = "wb")
}
dModern <- as.data.table(read_excel(destfile))
dModern[, Fuel := as.factor(Fuel)]
dModern <- dModern[Fuel == "Gasoline" | Fuel == "Diesel"] # avoid LPG and interesting hybrid/elec.
dModern[, Drive := as.factor(Drive)]
dModern[, class := as.factor(`Veh Class`)]; dModern[, `Veh Class` := NULL]
dModern[, Stnd := as.factor(Stnd)] # emission standard
dModern[, Engine := as.factor(`Underhood ID`)]; dModern[, `Underhood ID` := NULL]
dModern[, TransClass := as.factor(Trans)] # keep the detailed transmission code
dModern[, Trans := fifelse(grepl("^Man-", Trans), "Manual",
                   fifelse(grepl("^SemiAuto-", Trans), "Semi-Automatic",
                   fifelse(Trans == "CVT", "Automatic (CVT)",
                   fifelse(grepl("^AutoMan|^AMS-", Trans), "Automated Manual (AMT)",
                   "Automatic"))))]
dModern[, Trans := as.factor(Trans)]
References
Footnotes
Predicted efficiency strata were calculated using a simple multivariable regression model (mpg ~ hp * wt * disp). I acknowledge that both these predicted strata and the car type decision boundaries are highly dataset-specific and somewhat subjective. The coplanar nature of the predictors makes the model fit extremely unstable, and the large number of parameters leaves few residual degrees of freedom. I include this model here mainly as a note: introducing interaction terms means the response surface is no longer a flat plane, and taking a lower-dimensional cross-section (as I have done) produces non-linear contours that illustrate how efficiency varies across combinations of power, weight, and displacement.↩︎
In the Lego mockup the worm gear turns the differential, and there are 4 ring gears (three of them fixed) housed inside the rotating casing.
↩︎