Group 1
Arielle King, Augustina Odediran, Ebitu Ukiwe, Tiera Whitehead, Aderonke Adetunji
ggplot2 makes it easy to edit axis labels and add a
title.
For example, let’s look at the midwest data set, which
contains demographic information of midwest counties.
library(dplyr); library(ggplot2)
head(midwest)
dim(midwest)
We should be comfortable making a scatter plot. Here’s one of
percbelowpoverty by percollege.
ggplot(midwest, aes(x = percollege, y = percbelowpoverty)) +
geom_point()
Unfortunately, although the variable names make clear what is being
plotted, they are not pleasant to read as axis labels. We can change the
labels using the xlab() and ylab() functions,
and add a title using ggtitle().
ggplot(midwest, aes(x = percollege, y = percbelowpoverty)) +
geom_point() +
xlab("Percent college educated") +
ylab("Percent below the poverty line") +
ggtitle("Education % versus poverty % among 437 midwest counties")
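The same labels can also be set in a single call with labs(), which some people find tidier than three separate functions. A minimal sketch that should be equivalent to the plot above (the object name p_labs is only for illustration):

```r
library(ggplot2)

# labs() sets both axis labels and the title in one call
p_labs <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty)) +
  geom_point() +
  labs(
    x = "Percent college educated",
    y = "Percent below the poverty line",
    title = "Education % versus poverty % among 437 midwest counties"
  )
p_labs
```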
Describe what the following graph is showing, and add appropriate labels and a title.
Answer: The graph shows the distribution (density) of the percent of college-educated residents across counties, with a separate curve for each midwest state. The data come from the 2000 US census.
?midwest
ggplot(midwest, aes(x = percollege, colour = state)) +
geom_density() +
xlab("% College Educated") +
ylab("Density") +
ggtitle("Distribution of % College Educated Across Counties, by State")
Refer to the PowerPoint presentation examples on the mpg data set. Reproduce the five charts shown in slides 11 to 25. Type out the code you see in each slide and generate the plot. Add a main title and customize the axis labels to improve each plot. Finally, write a short interpretation of each plot and, if applicable, state one weakness (apart from the labels and theme) that hinders how one may interpret the visualization.
SLIDE 11
ggplot(data = mpg, aes(x = cty)) +
geom_histogram(bins = 15) +
ggtitle("Distribution of City MPG") +
xlab("City MPG") +
ylab("Count")
A weakness of this graph is that the values are grouped into bins, so the individual discrete mpg values cannot be seen.
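Because cty is recorded as whole numbers, one possible way around this weakness is a bar chart with one bar per distinct value. A sketch, assuming the goal is to see every individual mpg value (p_bar is just an illustrative name):

```r
library(ggplot2)

# cty is an integer column, so geom_bar() draws one bar per distinct value
p_bar <- ggplot(data = mpg, aes(x = cty)) +
  geom_bar() +
  xlab("City MPG") +
  ylab("Count") +
  ggtitle("Distribution of City MPG (one bar per value)")
p_bar
```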
SLIDE 13 Answer: The boxplot shows the five-number summary of highway mpg for each class of car. A weakness of box plots is that we can't see clusters in the data.
ggplot(data = mpg, aes(x = class, y = hwy)) +
geom_boxplot() +
xlab("Class") +
ylab("MPG") +
ggtitle("Highway MPG by Class")
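One common way to make clusters visible is to draw the raw observations on top of the boxes. The sketch below is one option among several; the jitter width and transparency are arbitrary choices, not values from the slides:

```r
library(ggplot2)

p_box <- ggplot(data = mpg, aes(x = class, y = hwy)) +
  # hide the boxplot's own outlier dots so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # spread points sideways only, so the mpg values stay exact
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  xlab("Class") +
  ylab("MPG") +
  ggtitle("Highway MPG by Class (with raw observations)")
p_box
```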
SLIDE 15 Answer: The violin plot shows the estimated density of highway mpg for each class of car. A weakness of violin plots is that, unlike box plots, they do not mark the median and quartiles by default.
ggplot(data = mpg, aes(x = class, y = hwy)) +
geom_violin() +
xlab("Class") +
ylab("MPG") +
ggtitle("Highway MPG by Class")
SLIDE 19 Answer: This density plot shows the distribution of highway miles per gallon for each class. The graph is rather busy, with six of the classes having similar, overlapping distributions.
ggplot(data = mpg, aes(x = hwy, colour = class)) +
geom_density() +
xlab("Highway MPG") +
ylab("Density") +
ggtitle("Density of Highway MPG by Class")
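If the overlapping curves are hard to read, one option is to give each class its own panel with facet_wrap(). This is only a sketch of that idea; the trade-off is that comparing classes directly becomes a little harder once they sit in separate panels:

```r
library(ggplot2)

# one density curve per panel instead of overlapping coloured curves
p_facet <- ggplot(data = mpg, aes(x = hwy)) +
  geom_density() +
  facet_wrap(~ class) +
  xlab("Highway MPG") +
  ylab("Density") +
  ggtitle("Density of Highway MPG, One Panel per Class")
p_facet
```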
SLIDE 22 Answer: A weakness of this graph is that the groups (car classes) cannot be identified within the points or the fitted curve.
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = "loess") +
xlab("Engine Displacement (litres)") +
ylab("Highway MPG") +
ggtitle("Highway MPG by Engine Displacement")
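One way to keep a single overall trend line while still showing the groups is to map colour to class for the points only. A hedged sketch; the formula argument is added here only to make the default explicit and is not taken from the slides:

```r
library(ggplot2)

p_col <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  # colour is mapped inside geom_point(), so only the points are grouped
  geom_point(aes(colour = class)) +
  # the smoother inherits no colour mapping, so it fits one curve to all cars
  geom_smooth(method = "loess", formula = y ~ x) +
  xlab("Engine Displacement (litres)") +
  ylab("Highway MPG") +
  ggtitle("Highway MPG by Engine Displacement, Points Coloured by Class")
p_col
```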
Answer: A weakness of this graph is that some of the confidence intervals might be heavily influenced by a few specific data points.
ggplot(data = mpg, aes(x = displ, y = hwy, colour = class)) +
geom_smooth(method = "loess") +
xlab("Engine Displacement (litres)") +
ylab("Highway MPG") +
ggtitle("Highway MPG by Engine Displacement and Class")
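The concern about confidence intervals can be checked by counting how many cars fall in each class, since a loess band fitted to only a handful of points is likely to be unstable. A small sketch using dplyr (class_counts is an illustrative name):

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# number of rows (car models) per class, largest group first
class_counts <- mpg %>%
  count(class, sort = TRUE)
class_counts
```

Classes with very few rows are the ones whose smooth curves and bands deserve the most caution.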
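Answer: A weakness of this graph is that jittering shifts each point by a small random amount,
so the plotted positions no longer show the exact values, and some overlapping points may still be hidden.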
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_jitter() +
xlab("Engine Displacement (litres)") +
ylab("Highway MPG") +
ggtitle("Highway MPG by Engine Displacement (jittered)")
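An alternative that avoids moving the points is geom_count(), which keeps every point at its exact position and scales its size by how many observations share it. A sketch under the assumption that exact values matter more than seeing each car separately; the alpha value and legend title are arbitrary:

```r
library(ggplot2)

p_count <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  # point size reflects the number of cars at each (displ, hwy) pair
  geom_count(alpha = 0.6) +
  labs(
    x = "Engine Displacement (litres)",
    y = "Highway MPG",
    size = "Number of cars",
    title = "Highway MPG by Engine Displacement (point size = count)"
  )
p_count
```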