This assignment was written in R Markdown.
tinytex::install_tinytex()
library(tinytex)
ggplot(mpg, aes(displ, cty)) + geom_boxplot()
library(ggplot2)
ggplot(mpg, aes(displ, cty)) +
geom_boxplot()
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
The warning points to the problem: displ is continuous, so geom_boxplot() needs an explicit group aesthetic (here group = displ) to draw one box per displacement value.
library(ggplot2)
ggplot(mpg, aes(displ, cty)) +
geom_boxplot(aes(group = displ))
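Grouping by every distinct displ value gives many narrow boxes. As an alternative sketch, ggplot2's cut_width() can bin displ first so that each box summarizes a range of displacements. The 0.5 bin width and the name p_binned are arbitrary choices for illustration, not part of the assignment.

```r
library(ggplot2)

# Bin the continuous displ variable into intervals of width 0.5 (assumed width)
# and draw one boxplot per bin instead of one per distinct value.
p_binned <- ggplot(mpg, aes(displ, cty)) +
  geom_boxplot(aes(group = cut_width(displ, 0.5)))

p_binned
```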
In ggplot2 we sometimes need to specify the grouping manually. For geom_bar() with y = ..prop.., specifying group = 1 puts all rows into a single group, so each bar's proportion is computed relative to the whole data set.
For example, using group = 1:
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
Without group = 1, each level of cut is treated as its own group, so every proportion is computed within that level and every bar has height 1. This misrepresents the data.
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop..))
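group = 1 assigns every row to the same dummy group, which fixes the denominator problem: proportions are divided by the total number of rows rather than by the count within each cut. Any constant works the same way, so group = 2 gives the same plot as group = 1; the value is only a label for a single group, not a category in the data.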
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop.., group = 2))
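The effect of the grouping on the denominator can be checked by inspecting the values that stat_count() computes. This is a minimal sketch: it uses after_stat(prop), the newer spelling of ..prop.. in recent ggplot2 versions, and the object names are made up for illustration.

```r
library(ggplot2)

# Same bar chart with and without a constant group.
p_grouped <- ggplot(diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop), group = 1))
p_ungrouped <- ggplot(diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop)))

# layer_data() returns the values computed by stat_count() for each bar.
prop_grouped <- layer_data(p_grouped)$prop
prop_ungrouped <- layer_data(p_ungrouped)$prop

sum(prop_grouped)  # proportions of the whole data set, so they add up to 1
prop_ungrouped     # each cut level is its own denominator, so every value is 1
```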
## Q3: How many bars are in each of the following plots? (Hint: try adding an outline around each bar with color = "white")
library(ggplot2)
ggplot(mpg, aes(drv)) +
geom_bar()
library(ggplot2)
ggplot(mpg, aes(drv, fill = hwy, group = hwy)) +
geom_bar()
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mpg2 <- mpg %>% arrange(hwy) %>% mutate(id = seq_along(hwy))
ggplot(mpg2, aes(drv, fill = hwy, group = id)) +
geom_bar()
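All three plots show three stacks, one per drv value, but they do not contain the same number of bars. The first plot has 3 bars. In the second, group = hwy splits each stack into one bar per distinct hwy value within that drv. In the third, group = id gives every row its own bar, so there are 234 bars (one per row of mpg) stacked into three columns. Adding color = "white" to geom_bar() makes the individual outlines visible.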
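The outline hint can also be checked numerically. A minimal sketch, assuming that one row of layer_data() output corresponds to one drawn rectangle; the names p1, p2, p3 and bar_counts are introduced here for illustration only.

```r
library(ggplot2)
library(dplyr)

mpg2 <- mpg %>% arrange(hwy) %>% mutate(id = seq_along(hwy))

p1 <- ggplot(mpg, aes(drv)) + geom_bar()
p2 <- ggplot(mpg, aes(drv, fill = hwy, group = hwy)) + geom_bar()
p3 <- ggplot(mpg2, aes(drv, fill = hwy, group = id)) + geom_bar()

# Count the rectangles that each plot actually draws.
bar_counts <- c(
  plot1 = nrow(layer_data(p1)),
  plot2 = nrow(layer_data(p2)),
  plot3 = nrow(layer_data(p3))
)
bar_counts
```

The three stacks look alike, but the counts differ because the group aesthetic decides how many separate rectangles make up each stack.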
geom_smooth() adds a smoothed trend line to the plot, with a standard-error band by default. geom_point() draws a scatter plot.
Plotting population and year with data from gapminder (pop on the x axis, year on the y axis),
using geom_point() before geom_smooth():
library(ggplot2)
library("gapminder")
ggplot(gapminder, aes(pop,year)) +
geom_point()+
geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Using geom_point() after geom_smooth():
library(ggplot2)
library("gapminder")
ggplot(gapminder, aes(pop,year)) +
geom_smooth()+
geom_point()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
The smoothing line and the positions of the points are the same in both plots. The only difference I can see is the overlay order. In the first case, where geom_point() is added before geom_smooth(), the points are drawn behind the smooth layer; in the second, they are drawn on top of it. This tells us that ggplot2 draws layers in the order they are added, each on top of the previous one.
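The layer order can also be read off the plot objects themselves. This is a small sketch that relies on the layers element of a ggplot object; layer_order() is a hypothetical helper written for this example, not a ggplot2 function.

```r
library(ggplot2)
library(gapminder)

points_first <- ggplot(gapminder, aes(pop, year)) + geom_point() + geom_smooth()
smooth_first <- ggplot(gapminder, aes(pop, year)) + geom_smooth() + geom_point()

# Hypothetical helper: report the geom class of each layer, in stored order.
layer_order <- function(p) {
  vapply(p$layers, function(l) class(l$geom)[1], character(1), USE.NAMES = FALSE)
}

layer_order(points_first)  # points stored first, so drawn underneath
layer_order(smooth_first)  # smooth stored first, so drawn underneath
```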
library(ggplot2)
library("gapminder")
ggplot(gapminder, aes(pop,year)) +
geom_point()+
geom_smooth()+
scale_x_log10()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
library(ggplot2)
library("gapminder")
ggplot(gapminder, aes(pop,year)) +
geom_point()+
geom_smooth()+
scale_x_sqrt()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
library(ggplot2)
library("gapminder")
ggplot(gapminder, aes(pop,year)) +
geom_point()+
geom_smooth()+
scale_x_reverse()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
library(ggplot2)
library("gapminder")
ggplot(gapminder, aes(pop,year)) +
geom_point()+
geom_smooth()+
scale_y_reverse()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Log scales are useful when the data are skewed towards large values, or when we want to show percent change or multiplicative factors. Reversing an axis sometimes makes more sense for certain types of plots and data.
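The point about multiplicative factors can be illustrated with a few numbers. The values below are made up for illustration: each is ten times the previous one, which gives unequal steps on a linear scale but equal steps on a log10 scale.

```r
# Illustrative populations, each 10 times the previous one (assumed values).
pop_values <- c(1e6, 1e7, 1e8)

linear_steps <- diff(pop_values)      # unequal distances on a linear axis
log_steps <- diff(log10(pop_values))  # equal distances on a log10 axis

linear_steps
log_steps
```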
## Q6: Experiment with different ways to facet the data. What happens when you try plotting population and per capita GDP while faceting on year, or even on country? Experiment with the height and width of the figure.
population and per capita GDP while faceting on year:
library(ggplot2)
library(gapminder)
library(dplyr)
ggplot(gapminder) +
geom_point(aes(x = gdpPercap, y = pop, color = continent)) +
scale_x_log10() +
facet_wrap(~ year)
population and per capita GDP while faceting on country:
library(ggplot2)
library(gapminder)
library(dplyr)
ggplot(gapminder) +
geom_point(aes(x = gdpPercap, y = pop, color = continent)) +
scale_x_log10() +
facet_wrap(~ country)
population and per capita GDP while faceting on year, with rotated x-axis labels:
library(ggplot2)
library(gapminder)
library(dplyr)
ggplot(gapminder) +
geom_point(aes(x = gdpPercap, y = pop, color = continent)) +
scale_x_log10() +
facet_wrap(~ year)+
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Faceting on country is not practical, as there are too many countries for one plot. Faceting on year is manageable.
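The number of panels each facet produces can be counted directly from the data. A quick sketch; n_years and n_countries are names chosen for this example.

```r
library(gapminder)

# facet_wrap() draws one panel per distinct value of the faceting variable.
n_years <- length(unique(gapminder$year))
n_countries <- length(unique(gapminder$country))

c(years = n_years, countries = n_countries)
```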
Experiments with height and width of figure:
I am not sure how to fix the width and height, since the assignment has to be submitted as an .Rmd file. However, graphs can be exported from R at a chosen width and height, e.g. png(filename = "DataVizClass.png", width = 600, height = 600)
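For ggplot2 figures, ggsave() is another way to export at a chosen size. The sketch below writes to a temporary file; the 10 by 7 inch size, the dpi value and the object names are arbitrary choices for illustration. Inside an .Rmd file, the knitr chunk options fig.width and fig.height control the size of a figure in the knitted document.

```r
library(ggplot2)
library(gapminder)

p_year <- ggplot(gapminder) +
  geom_point(aes(x = gdpPercap, y = pop, color = continent)) +
  scale_x_log10() +
  facet_wrap(~ year)

# Write to a temporary file; width and height are in inches by default.
out_file <- tempfile(fileext = ".png")
ggsave(out_file, plot = p_year, width = 10, height = 7, dpi = 100)
```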
Or we can change the axis scales, which changes how the data fill each panel:
library(ggplot2)
library(gapminder)
library(dplyr)
ggplot(gapminder) +
geom_point(aes(x = gdpPercap, y = pop, color = continent)) +
scale_y_log10() +
facet_wrap(~ year)
library(ggplot2)
library(gapminder)
library(dplyr)
ggplot(gapminder) +
geom_point(aes(x = gdpPercap, y = pop, color = continent)) +
facet_wrap(~ year)
facet_grid(sex ~ race) facets the relationship by the respondent's sex (rows) and race (columns).
install.packages("socviz")
library(ggplot2)
library(socviz)
p <- ggplot(data = gss_sm,
mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) +
geom_smooth() +
facet_grid(sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
library(ggplot2)
library(socviz)
p <- ggplot(data = gss_sm,
mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) +
geom_smooth() +
facet_grid(~sex + race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
library(ggplot2)