Lab Homework #4

2. Regarding the fuel type variable, the value “d” represents diesel, “p” represents premium (petrol) and “r” represents regular (petrol). Do you think there is an effect of fuel type on how many miles a vehicle can run on average per gallon of fuel?

ggplot(data = mpg) + geom_point(mapping = aes(x = hwy, y = cty, colour = fl), position = 'jitter')

From this graph, we can see that diesel (d) appears to be the most efficient fuel type: diesel vehicles run more miles per gallon, both on the highway and in the city, than vehicles using the other fuels. However, engine displacement and the number of cylinders could also affect fuel efficiency.

ggplot(data = mpg) + geom_point(mapping = aes(x = hwy, y = cty, colour = fl, size = displ), position = 'jitter')

Here we find that engine displacement does have some effect, but among vehicles with the same engine displacement, the d type of fuel is still more efficient.

ggplot(data = mpg) + geom_point(mapping = aes(x = hwy, y = cty, colour = fl, size = cyl), position = 'jitter')

The same holds for the number of cylinders: among vehicles with the same number of cylinders, the d type of fuel is still more efficient. So we can conclude that fuel type does have an effect on fuel efficiency.
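As a rough check on the comparison above, the sketch below averages city and highway mileage per fuel type and draws one boxplot panel per cylinder count, so fuel types are compared among vehicles with the same number of cylinders. The names fl_means and p_fl are made up for this illustration, and facet_wrap() is only one possible way to hold cyl fixed; it is not something the assignment asks for.

```r
library(ggplot2)

# Mean city and highway mileage for each fuel type (base R aggregate).
fl_means <- aggregate(cbind(cty, hwy) ~ fl, data = mpg, FUN = mean)
print(fl_means)

# Highway mileage by fuel type, one panel per number of cylinders,
# so each panel compares fuel types at a fixed cylinder count.
p_fl <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = fl, y = hwy)) +
  facet_wrap(~ cyl)
print(p_fl)
```

Note that some fuel types have only a few vehicles in the data, so the per-panel comparison should be read with caution.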

3. Do you think there is a difference in fuel economy for vehicles made in 1999 and 2008? (When plotting with “year” variable, use as.factor(year) to convert it to categorical variables. This will be explained in future classes.)

ggplot(data = mpg) + geom_point(mapping = aes(x = hwy, y = cty, colour = as.factor(year)), position = 'jitter')

As we can see, most cars, whether made in 1999 or 2008, show almost no difference in this graph. The points for both years are widely dispersed, between 15 and 35. Based on the earlier graphs, the dispersion is probably due to the engines the vehicles use.
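To back up the visual impression, a minimal sketch like the following compares the two model years directly. The names year_means and p_year are hypothetical, and a boxplot is just one alternative to the scatter plot used above.

```r
library(ggplot2)

# Mean city and highway mileage for each model year.
year_means <- aggregate(cbind(cty, hwy) ~ year, data = mpg, FUN = mean)
print(year_means)

# Side-by-side boxplots of highway mileage for 1999 and 2008.
p_year <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = as.factor(year), y = hwy))
print(p_year)
```

If the two rows of means and the two boxes look alike, that supports the reading that model year alone makes little difference.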

4. What happens if you make a scatter plot of class vs drv? Do you think this plot is useful or not?

ggplot(data = mpg) + geom_point(mapping = aes(x = class, y = drv))

If we make a plain scatter plot of ‘class’ vs ‘drv’, we find that most of the points land on the same positions. This means many observations share the same combination of values and are plotted on top of each other.
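The amount of overlap can be seen by counting how many vehicles fall on each class and drv combination. This is a small sketch using base R's table(); combo_counts is a name chosen only for this example.

```r
library(ggplot2)

# Cross-tabulate class against drv: each cell is the number of vehicles
# that would be drawn on top of each other at that point of the scatter plot.
combo_counts <- table(mpg$class, mpg$drv)
print(combo_counts)

# Number of distinct points actually visible in the un-jittered plot.
visible_points <- sum(combo_counts > 0)
print(visible_points)
```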

ggplot(data = mpg) + geom_point(mapping = aes(x = class, y = drv), position = 'jitter')

Separating the points with jitter makes the plot more useful than before, because we can roughly compare class and drv, but it is better to use another kind of plot, such as a bar plot. In conclusion, this plot is not really useful.
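As an illustration of the alternatives mentioned above, here is a sketch of two options: a bar plot of class filled by drv, and geom_count(), which sizes each point by the number of overlapping observations. The object names p_bar and p_count are made up for this example, and either plot is only a suggestion.

```r
library(ggplot2)

# Bar plot: one bar per class, split by drive type.
p_bar <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = drv))
print(p_bar)

# Count plot: same axes as the scatter plot, but point size shows
# how many vehicles share each class/drv combination.
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv))
print(p_count)
```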