Lab Homework #4
1. What is the most popular fuel type in this data set?
ggplot(data = mpg) + geom_bar(mapping = aes(x = fl, fill = fl))

As we can see, the most used fuel type is r (regular), which means it is
the most popular fuel type in this data set.
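The bar heights can also be checked numerically. The following is a minimal sketch, assuming the ggplot2 package (which provides the mpg data set) is installed; the names fuel_counts and most_common are only illustrative.

```r
library(ggplot2)  # provides the mpg data set

# Count how many vehicles use each fuel type, largest count first.
fuel_counts <- sort(table(mpg$fl), decreasing = TRUE)
print(fuel_counts)

# The first name in the sorted table is the most common fuel type.
most_common <- names(fuel_counts)[1]
print(most_common)
```

This gives the same answer as reading the tallest bar, but with the exact counts.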
2. Regarding the fuel type variable, the value “d” represents
diesel, “p” represents premium (petrol) and “r” represents regular
(petrol). Do you think there is an effect of fuel type on how many miles
a vehicle can run on average per gallon of fuel?
ggplot(data = mpg) + geom_point(mapping = aes(x = hwy, y = cty, colour = fl), position = 'jitter')
From this graph, we can see that the d type fuel (diesel) appears to be
the most efficient of the fuel types, because diesel vehicles tend to
run more miles per gallon both on the highway and in the city than the
others. However, both engine displacement and number of cylinders could
also affect fuel efficiency.
ggplot(data = mpg) + geom_point(mapping = aes(x = hwy, y = cty, colour = fl, size = displ), position = 'jitter')
Here we find that engine displacement does have some effect, but at the
same engine displacement, the d type fuel is still more efficient.
ggplot(data = mpg) + geom_point(mapping = aes(x = hwy, y = cty, colour = fl, size = cyl), position = 'jitter')
The same holds for the number of cylinders: with the same number of
cylinders, the d type fuel is still more efficient. So the plots suggest
that fuel type does have an effect on fuel economy, with d type fuel
being the most efficient.
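The comparison "at the same number of cylinders" can be made more directly by grouping. This is only a sketch under the assumption that ggplot2 is installed; fuel_means and p are illustrative names, and faceting by cyl is one possible choice rather than the required method. Some fuel types contain only a few vehicles, so the group means should be read with caution.

```r
library(ggplot2)  # provides the mpg data set

# Mean city and highway mileage for each combination of fuel type and cylinders.
fuel_means <- aggregate(cbind(cty, hwy) ~ fl + cyl, data = mpg, FUN = mean)
print(fuel_means)

# One panel per cylinder count, so fuel types are compared within the same group.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = fl), position = 'jitter') +
  facet_wrap(~ cyl)
print(p)
```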
3. Do you think there is a difference in fuel economy for vehicles
made in 1999 and 2008? (When plotting with the “year” variable, use
as.factor(year) to convert it to a categorical variable.
This will be explained in future classes.)
ggplot(data = mpg) + geom_point(mapping = aes(x = hwy, y = cty, colour = as.factor(year)), position = 'jitter')
As we can see, most of the cars, whether made in 1999 or 2008, show
almost no difference in this graph. Both groups are spread widely
between about 15 and 35. Based on the earlier graphs, the reason the
data are so dispersed is probably the engines the cars use.
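A summary by year can back up what the scatter plot shows. The sketch below assumes ggplot2 is installed; year_means and p are illustrative names, and the box plot is just one alternative view of the same comparison.

```r
library(ggplot2)  # provides the mpg data set

# Mean city and highway mileage for each model year.
year_means <- aggregate(cbind(cty, hwy) ~ year, data = mpg, FUN = mean)
print(year_means)

# Box plot of highway mileage for each year, with year treated as categorical.
p <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = as.factor(year), y = hwy))
print(p)
```

If the two means are close and the boxes overlap heavily, that agrees with the conclusion that the year makes little difference.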
4. What happens if you make a scatter plot of class vs
drv? Do you think this plot is useful or not?
ggplot(data = mpg) + geom_point(mapping = aes(x = class, y = drv))
If we make a normal scatter plot of ‘class’ vs ‘drv’, we find that most
of the points land on the same positions. This means many observations
share the same combination of values and are drawn on top of each other.
ggplot(data = mpg) + geom_point(mapping = aes(x = class, y = drv), position = 'jitter')
Separating the points with jitter makes the plot more useful than
before, because we can roughly compare class and drv, but it is still
better to use other plots such as bar plots. In conclusion, this plot is
not really useful.
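As one possible version of the bar plot mentioned above, the sketch below counts each class and drv combination and draws the counts as dodged bars. It assumes ggplot2 is installed; class_drv and p are illustrative names.

```r
library(ggplot2)  # provides the mpg data set

# Cross-tabulate the two categorical variables to get exact counts.
class_drv <- table(mpg$class, mpg$drv)
print(class_drv)

# Dodged bar plot: one bar per drive type within each vehicle class.
p <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = drv), position = 'dodge')
print(p)
```

Because both variables are categorical, counts show the relationship more clearly than point positions do.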