


Synopsis

This document explores the “mtcars” dataset via univariate, bivariate, and multivariate analysis. The dataset is a 1974 Motor Trend magazine survey of 11 characteristics of 32 automobile models. The survey allows exploration of the relationships among automobile configuration, performance, and gas mileage. The Univariate Analysis explores individual mtcars variables to determine whether any trends relate to the car models. The Bivariate Analysis compares pairs of variables to identify possible correlations, for example car weight affecting miles per gallon. Finally, the Multivariate Analysis combines three or more variables to determine whether the combination increases the reliability of predicting a dependent variable, for example miles per gallon.
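As a minimal sketch, the dataset described above can be inspected in R as follows. It assumes the `mtcars` data frame that ships with base R's `datasets` package; the report's own loading code is not shown, so this is only one plausible starting point.

```r
# Load the built-in mtcars data frame (part of the 'datasets' package).
data(mtcars)

# Number of rows (car models) and columns (characteristics).
dim(mtcars)

# Column names, types, and the first few values of each variable.
str(mtcars)

# Five-number summary plus mean for the response variable, miles per gallon.
summary(mtcars$mpg)
```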


Univariate Plots Section

Exploratory Data Analysis

Data Structure:

## 'data.frame':    2649 obs. of  25 variables:
##  $ file                  : chr  NA NA NA NA ...
##  $ year                  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ manufacturer          : chr  NA NA NA NA ...
##  $ model                 : chr  NA NA NA NA ...
##  $ description           : chr  NA NA NA NA ...
##  $ euro_standard         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ tax_band              : chr  NA NA NA NA ...
##  $ transmission          : chr  NA NA NA NA ...
##  $ transmission_type     : chr  NA NA NA NA ...
##  $ engine_capacity       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ fuel_type             : chr  NA NA NA NA ...
##  $ urban_metric          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ extra_urban_metric    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ combined_metric       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ urban_imperial        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ extra_urban_imperial  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ combined_imperial     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ noise_level           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ co2                   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ thc_emissions         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ co_emissions          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nox_emissions         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ particulates_emissions: chr  NA NA NA NA ...
##  $ fuel_cost_6000_miles  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ date_of_change        : chr  NA NA NA NA ...

Automobile manufacturers within the survey:

##  [1] NA              "BMW"           "Fiat"          "Hyundai"      
##  [5] "Citroen"       "Daihatsu"      "Honda"         "Mercedes-Benz"
##  [9] "Mitsubishi"    "Peugeot"       "Ford"          "Ferrari"      
## [13] "Audi"          "Chrysler Jeep"

AutoData’s variables:

##  [1] "file"                   "year"                   "manufacturer"          
##  [4] "model"                  "description"            "euro_standard"         
##  [7] "tax_band"               "transmission"           "transmission_type"     
## [10] "engine_capacity"        "fuel_type"              "urban_metric"          
## [13] "extra_urban_metric"     "combined_metric"        "urban_imperial"        
## [16] "extra_urban_imperial"   "combined_imperial"      "noise_level"           
## [19] "co2"                    "thc_emissions"          "co_emissions"          
## [22] "nox_emissions"          "particulates_emissions" "fuel_cost_6000_miles"  
## [25] "date_of_change"

Univariate Data Exploration Plots


Univariate Analysis

The plots above show that the mtcars variables include both continuous and discrete measurements. The univariate plots order the automobile models according to the variable being examined. The main feature of interest in the dataset is miles per gallon, treated as a response variable that varies with predictor variables such as weight, horsepower, and number of cylinders. Almost all the variables in the mtcars dataset are associated with the miles per gallon of the automobile models. The univariate plots therefore suggest that relating a model’s characteristics to its miles per gallon is feasible.
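The kind of univariate plot described here could be produced along the following lines. This is a sketch using base R graphics and the built-in `mtcars` data frame; the plot types and titles are assumptions, since the report's plotting code is not shown.

```r
data(mtcars)

# Order the car models by miles per gallon so the plot reads low to high.
mpg_sorted <- mtcars[order(mtcars$mpg), ]

# One dot per model; the row names of mtcars hold the model names.
dotchart(mpg_sorted$mpg, labels = rownames(mpg_sorted), cex = 0.7,
         xlab = "Miles per gallon", main = "MPG per automobile model")

# Continuous variable: a histogram of weight (recorded in units of 1000 lbs).
hist(mtcars$wt, xlab = "Weight (1000 lbs)", main = "Distribution of weight")

# Discrete variable: counts of models per number of cylinders.
barplot(table(mtcars$cyl), xlab = "Cylinders", ylab = "Number of models")
```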



Bivariate Plots Section


Bivariate Data Exploration Plots


Bivariate Analysis

The bivariate plots above show that the characteristics of the automobile models are usually correlated with one another, and with mileage.

The first plot, “MPG vs Cylinders Plot”, shows that miles per gallon decreases as the number of cylinders increases. The second plot, “Cylinders vs Horsepower Plot”, shows that, as expected, horsepower increases with the number of cylinders. The third plot, “MPG vs Horsepower Plot”, demonstrates that MPG increases as horsepower decreases, so the two are negatively correlated. The fourth plot, “MPG vs Transmission Type Plot”, examines the discrete variable “am”, where the value “0” indicates an automatic transmission and the value “1” indicates a manual transmission. In the mtcars data, automobiles with manual transmissions generally have better gas mileage than automobiles with automatic transmissions, although the two groups overlap in the range around 21 mpg. The final bivariate plot, “MPG vs Transmission Gears Plot”, shows that MPG does not change steadily with the number of transmission gears: three-gear models have the lowest average mileage and four-gear models the highest.
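The bivariate comparisons can be sketched in base R as below. The plot choices are assumptions rather than the report's own code; the group means at the end offer a quick numerical check on what the transmission and gear plots display.

```r
data(mtcars)

# MPG by number of cylinders (discrete predictor): one box per group.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# MPG against horsepower (continuous predictor) with a fitted straight line.
plot(mtcars$hp, mtcars$mpg, xlab = "Horsepower", ylab = "Miles per gallon")
abline(lm(mpg ~ hp, data = mtcars))

# Pearson correlation between MPG and horsepower.
cor_mpg_hp <- cor(mtcars$mpg, mtcars$hp)

# Mean MPG per transmission type (am: 0 = automatic, 1 = manual)
# and per number of forward gears.
mean_by_am   <- tapply(mtcars$mpg, mtcars$am, mean)
mean_by_gear <- tapply(mtcars$mpg, mtcars$gear, mean)

print(cor_mpg_hp)
print(mean_by_am)
print(mean_by_gear)
```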



Multivariate Plots Section

Multivariate Data Exploration Plots

## 
##  Spearman's rank correlation rho
## 
## data:  AutoData$combined_metric and AutoData$fuel_cost_6000_miles
## S = 4.5055, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.997774

## 
## Call:
## lm(formula = combined_metric ~ fuel_cost_6000_miles + fuel_type, 
##     data = AutoData)
## 
## Coefficients:
##          (Intercept)  fuel_cost_6000_miles       fuel_typePetrol  
##              0.92129               0.01136               0.39101
##               Fuel Cost Fuel Type
## Coefficients 0.01136108 0.3910083
## For an Automobile Model where Fuel Cost = 2000, and Fuel Type = 2, the predicted Mileage = 24.43.
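
The predicted mileage printed above can be reproduced by hand from the listed coefficients. In the sketch below the numbers are copied from the output, while the variable names are illustrative. It follows the report in multiplying the `fuel_typePetrol` coefficient by 2; under R's default treatment coding that term would normally be a 0/1 indicator, so the calculation is reproduced as printed rather than endorsed.

```r
# Coefficients copied from the lm() output above (names are illustrative).
intercept   <- 0.92129
b_fuel_cost <- 0.01136108
b_fuel_type <- 0.3910083

# Inputs used in the report's example sentence.
fuel_cost <- 2000
fuel_type <- 2

# Linear predictor: intercept plus coefficient * value for each term.
predicted_mileage <- intercept + b_fuel_cost * fuel_cost + b_fuel_type * fuel_type
round(predicted_mileage, 2)  # 24.43
```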


Multivariate Analysis

The third part of this analysis visualizes mtcars variables quantitatively, segmenting continuous variables by discrete ones. The first multivariate plot, “MPG vs HP per Automobile Model”, shows the individual MPG-to-HP ratio of each car model. The second plot, “MPG vs HP by Amount of Cylinders”, shows how strongly MPG correlates with HP for each of the three engine cylinder configurations. The third plot, “MPG vs Quarter Mile Time by Transmission Type”, shows that manual transmissions have a higher correlation of MPG to quarter-mile time. The fourth plot, “MPG vs Displacement by Cylinders”, shows that models with 4-cylinder engines have the greatest correlation of MPG to displacement. The fifth plot, “MPG vs Weight by Amount of Cylinders”, shows that the three engine types have similar correlations of MPG to weight.
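Segmenting a continuous relationship by a discrete variable, as these plots do, can be sketched by splitting the data and computing a correlation within each group. The example below uses the built-in `mtcars` data and Pearson correlation; both the statistic and the plot styling are assumptions, not the report's confirmed method.

```r
data(mtcars)

# Correlation of MPG with horsepower inside each cylinder group (4, 6, 8).
cor_by_cyl <- sapply(split(mtcars, mtcars$cyl),
                     function(d) cor(d$mpg, d$hp))
print(cor_by_cyl)

# Same idea for MPG vs quarter-mile time, split by transmission type (am).
cor_by_am <- sapply(split(mtcars, mtcars$am),
                    function(d) cor(d$mpg, d$qsec))
print(cor_by_am)

# Scatter plot of MPG vs weight with one colour per cylinder count.
cyl_factor <- factor(mtcars$cyl)
plot(mtcars$wt, mtcars$mpg, col = as.integer(cyl_factor), pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
legend("topright", legend = levels(cyl_factor),
       col = seq_along(levels(cyl_factor)), pch = 19, title = "Cylinders")
```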

Therefore, weight and horsepower were chosen as the predictors in a model for the response variable, MPG. The bivariate plots also support this choice of independent variables. As a final check on the model’s inputs, a Spearman’s rank correlation is run for MPG against HP and for MPG against weight. Those correlations are visualized in the plots “MPG to Weight Correlation” and “MPG to Horsepower Correlation”. The resulting coefficients are then printed out, along with a 4 x 4 plot of the linear model fit.
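A model of this shape could be fitted as in the following sketch, which uses base R's `cor.test` and `lm` on the built-in `mtcars` data. The 2 x 2 layout of the diagnostic plots and the hypothetical car passed to `predict` are illustrative choices, not taken from the report.

```r
data(mtcars)

# Spearman rank correlations of MPG with each candidate predictor.
# exact = FALSE avoids the warning about ties when computing the p-value.
rho_wt <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)
rho_hp <- cor.test(mtcars$mpg, mtcars$hp, method = "spearman", exact = FALSE)
print(rho_wt$estimate)
print(rho_hp$estimate)

# Linear model with MPG as the response and weight + horsepower as predictors.
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit))

# Standard diagnostic plots for the fit, arranged on one page.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)

# Predicted MPG for a hypothetical car weighing 3000 lbs (wt = 3.0) with 150 hp.
predict(fit, newdata = data.frame(wt = 3.0, hp = 150))
```

Both predictors enter with negative coefficients, which matches the direction of the relationships described in the bivariate section.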


Final Plots and Summary

Plot One

Description One

“Plot One” was chosen to begin the summary of the findings above because this Univariate Plot shows that miles per gallon varies with the specific automobile model.


Plot Two

Description Two

“Plot Two” is a Bivariate Plot that demonstrates that “Weight” has a clear negative correlation with the response variable, “MPG”: heavier models tend to have lower mileage.


Plot Three

Description Three

“Plot Three” is a Multivariate Plot derived from the MPG, HP, and model name data. It shows the relationship of MPG to HP for each individual automobile model, thereby verifying that cars with high horsepower have low mileage and cars with low horsepower have high mileage. It also allows individual car models to be singled out.


Reflection

The author’s conclusions begin with confirming the usefulness of Exploratory Data Analysis via univariate, bivariate, and multivariate analysis of datasets. This method allows a thorough examination of the data, and therefore an easier route to discovering new relationships within it. The univariate stage can be time-consuming; however, the intricate examination of univariate data ultimately creates a more comprehensive basis for multivariate analysis.

Because the mtcars variables are highly interrelated, MPG can be predicted reasonably well from only two variables, weight and horsepower. Including additional variables in the linear model might increase predictive accuracy. However, weight and horsepower seem to have the greatest correlation with miles per gallon.
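
One way to probe whether extra variables help is to compare the two-predictor model with a larger one. The sketch below is an assumption-laden illustration: the additional predictors (`cyl`, `disp`) are picked arbitrarily, and adjusted R-squared plus an F-test are only two of several reasonable comparison criteria.

```r
data(mtcars)

# Two-predictor model discussed in the report.
fit_small <- lm(mpg ~ wt + hp, data = mtcars)

# Larger model; the extra predictors are chosen only for illustration.
fit_large <- lm(mpg ~ wt + hp + cyl + disp, data = mtcars)

# Adjusted R-squared penalises extra predictors, so it is a fairer comparison
# than plain R-squared when the models differ in size.
c(small = summary(fit_small)$adj.r.squared,
  large = summary(fit_large)$adj.r.squared)

# F-test for whether the added predictors improve the fit beyond chance.
anova(fit_small, fit_large)
```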