data(mtcars)
This data set describes 32 cars from the 1973-1974 model years, recording their fuel consumption along with several aspects of their design and performance. I chose it because it offers many interesting variables to analyze, which makes it possible to look for correlations between the characteristics of a car and its fuel consumption.
The data come from the built-in R datasets package. There are 11 variables (all stored as numeric, some continuous and some discrete) and 32 observations. Note that although some of the variables (e.g. transmission type) are stored as numeric (0 or 1), they are really categorical, since each car falls into one of two categories (automatic or manual).
The data were collected from the 1974 Motor Trend US magazine. This is clearly observational data, as there was no interference in the data collection process. My only concern is whether the conditions under which the cars were evaluated were the same for all cars. For example, miles per gallon (mpg) is one of the variables in this data set, and it depends on whether the car is driven in the city or on a freeway.
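The categorical coding described above can be made explicit by converting the 0/1 column to a factor. This is a minimal sketch, assuming the labels given in R's help page for mtcars (am: 0 = automatic, 1 = manual); the copy name `mt` is only an illustrative choice and is not part of the original analysis.

```r
# Work on a copy so the built-in data set stays untouched
# (the name `mt` is an illustrative choice).
mt <- mtcars

# Recode transmission type: per ?mtcars, 0 = automatic, 1 = manual
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Count the cars in each category
am_counts <- table(mt$am)
print(am_counts)
```

Treating `am` as a factor keeps summaries and models from interpreting the 0/1 codes as quantities.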
1) How does an increased number of cylinders affect the fuel consumption of cars? The explanatory variable is the number of cylinders a car has (cyl) and the outcome variable is the car's miles per gallon (mpg).
2) Does increased horsepower decrease fuel efficiency? The explanatory variable is the car's gross horsepower (hp) and the outcome variable is the car's miles per gallon (mpg).
# Load the stargazer package (install once with install.packages("stargazer"))
library(stargazer)
# Create a summary table
stargazer(mtcars, type = "text", title = "Summary of mtcars dataset", summary.stat = c("n", "mean", "sd", "min", "max"))
##
## Summary of mtcars dataset
## ============================================
## Statistic  N    Mean St. Dev.    Min     Max
## --------------------------------------------
## mpg       32  20.091    6.027 10.400  33.900
## cyl       32   6.188    1.786      4       8
## disp      32 230.722  123.939 71.100 472.000
## hp        32 146.688   68.563     52     335
## drat      32   3.597    0.535  2.760   4.930
## wt        32   3.217    0.978  1.513   5.424
## qsec      32  17.849    1.787 14.500  22.900
## vs        32   0.438    0.504      0       1
## am        32   0.406    0.499      0       1
## gear      32   3.688    0.738      3       5
## carb      32   2.812    1.615      1       8
## --------------------------------------------
In the first distribution, the density of mpg peaks at about 17.5 mpg and drops off as mpg moves higher or lower. In the second distribution, the density of horsepower peaks at about 100 hp and likewise drops off on either side. In the third distribution, the most common engine has 8 cylinders (14 of the 32 cars), and there is a dip at 6 cylinders. It is worth noting that there are no 5- or 7-cylinder engines in the data, which could account for that dip.
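The descriptions above can be checked numerically. The following is a minimal sketch, assuming kernel density estimates with R's default bandwidth for mpg and hp, and a plain frequency table for cyl; the object names are illustrative, and the peak locations shift somewhat with the bandwidth chosen.

```r
# Kernel density estimates for the two continuous variables
mpg_density <- density(mtcars$mpg)
hp_density  <- density(mtcars$hp)

# Location of each density peak (depends on the default bandwidth)
mpg_peak <- mpg_density$x[which.max(mpg_density$y)]
hp_peak  <- hp_density$x[which.max(hp_density$y)]
print(c(mpg_peak = mpg_peak, hp_peak = hp_peak))

# Cylinder count is discrete, so a frequency table is more direct
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```

Calling `plot(mpg_density)` or `plot(hp_density)` draws the corresponding curve.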
(a) What do we know about the targeted relationships?
We know that as the number of cylinders in a car increases, so does the engine's capacity to generate power. More cylinders usually mean a larger engine, which also adds weight. Generating more power while moving more weight takes more fuel, so fuel efficiency (measured in mpg) decreases as the number of engine cylinders increases.
As the horsepower of a car increases, we see a trend of decreasing fuel efficiency. This is expected, since higher-horsepower cars produce more power and therefore require more fuel.
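Both expected relationships can be quantified directly from the data. This is a sketch only: Pearson correlation and simple linear regression are assumed here as reasonable tools for the two research questions, not necessarily the methods used elsewhere in this report, and the object names are illustrative.

```r
# Pearson correlations between mpg and each explanatory variable
cor_cyl <- cor(mtcars$mpg, mtcars$cyl)
cor_hp  <- cor(mtcars$mpg, mtcars$hp)
print(c(cyl = cor_cyl, hp = cor_hp))

# Simple linear regressions of mpg on each explanatory variable
fit_cyl <- lm(mpg ~ cyl, data = mtcars)
fit_hp  <- lm(mpg ~ hp, data = mtcars)

# A negative slope means mpg falls as the explanatory variable rises
print(coef(fit_cyl))
print(coef(fit_hp))
```

`summary(fit_cyl)` and `summary(fit_hp)` report standard errors and p-values for each slope.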
(b) If there exists prior work on the targeted relationships, how does the data used in the literature differ from the one you are working with?
Prior work does exist on these two relationships. One specific study uses regression analysis to test the relationship between horsepower and fuel efficiency. The data used in this study originated from the StatLib Library (managed by Carnegie Mellon University) and consist of 398 observations of foreign and domestic automobiles from the model years 1970 to 1982. This is considerably more data than the 32 observations in my data set, and it covers a wider time frame.
(c) If there exists prior work on the targeted relationships, what about it is incomplete/unconvincing and how can your work improve upon it?
The study is quite thorough and uses concepts such as p-values to determine whether the relationships between the variables are statistically significant. However, it contains only one graph, showing the negative relationship between horsepower and miles per gallon. Since this project looks at weight and the number of cylinders, graphs of those relationships should be included as well.
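The additional graphs mentioned above could be produced with base R graphics. This is a hedged sketch of one possible choice of plots (a box plot for the discrete cylinder count, a scatter plot for weight), not a reproduction of the figures in the study; the object name `mean_mpg_by_cyl` is illustrative.

```r
# Mean mpg within each cylinder group, as a numeric companion to the plot
mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mean_mpg_by_cyl)

# Box plot of mpg by number of cylinders (cyl takes only the values 4, 6, 8)
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders", ylab = "Miles per gallon",
        main = "mpg by number of cylinders")

# Scatter plot of mpg against weight (wt is in units of 1000 lbs per ?mtcars)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg by weight")
```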