library(ggplot2)
ggplot(data = diamonds, mapping = aes(x = price)) + geom_freqpoly(mapping = aes(color = cut), binwidth = 500)
It's hard to see the difference in distribution because the overall counts differ so much:
ggplot(diamonds) + geom_bar(mapping = aes(x = cut))
To make the comparison easier, we need to swap what is displayed on the y-axis. Instead of displaying count, we’ll display density, which is the count standardized so that the area under each frequency polygon is one.
ggplot(data = diamonds, mapping = aes(x = price, y = after_stat(density))) + geom_freqpoly(mapping = aes(color = cut), binwidth = 500)
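The density scale can be checked by hand. The sketch below is only an illustration: it uses base R's `hist()` rather than ggplot2's own binning, and the bin edges (starting at 0, width 500) are an assumption, so the individual bins will not line up exactly with the plot. It shows that density is the count divided by the total count times the bin width, which makes the area sum to one.

```r
library(ggplot2)  # only needed for the diamonds dataset

binwidth <- 500
# Assumed bin edges: start at 0 and extend past the largest price
breaks <- seq(0, max(diamonds$price) + binwidth, by = binwidth)

# Prices for one group, here the Fair cut
fair <- diamonds$price[diamonds$cut == "Fair"]
h <- hist(fair, breaks = breaks, plot = FALSE)

# density = count / (total count * bin width)
manual_density <- h$counts / (sum(h$counts) * binwidth)

all.equal(manual_density, h$density)  # compare with hist()'s own density
sum(h$density * binwidth)             # total area under the histogram
```

Because every cut is rescaled to the same total area, groups with very different sample sizes can be compared on one plot.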
Surprisingly, it appears that the fair diamonds (the lowest quality cut) have the highest average price. But maybe that's because frequency polygons are a little hard to interpret.
Another alternative is the boxplot. A boxplot is a type of visual shorthand for a distribution of values.
ggplot(data = diamonds, mapping = aes(x = cut, y = price)) + geom_boxplot()
We see much less information about the distribution, but the boxplots are much more compact, so we can more easily compare them. It supports the counterintuitive finding that better quality diamonds are cheaper on average.
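To look at the numbers behind the boxes, the summary values can be computed directly. This is a rough sketch using base R: `boxplot.stats()` computes hinges slightly differently from the quartiles that `geom_boxplot()` uses, so treat the values as approximate counterparts of what the plot draws.

```r
library(ggplot2)  # only needed for the diamonds dataset

# Median price for each cut; the line inside each box sits at this value
medians <- tapply(diamonds$price, diamonds$cut, median)
medians

# The five values a boxplot is built from for one group:
# lower whisker, lower hinge, median, upper hinge, upper whisker
fair_stats <- boxplot.stats(diamonds$price[diamonds$cut == "Fair"])
fair_stats$stats

# Number of points beyond the whiskers, drawn individually as outliers
length(fair_stats$out)
```

Comparing `medians["Fair"]` with `medians["Ideal"]` is a quick way to check whether the pattern seen in the plot holds in the data.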
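# Highway mileage by vehicle class, with classes in their default (alphabetical) order
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + geom_boxplot()
# Reorder class by its median hwy so the trend is easier to see
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = reorder(class, hwy, FUN = median), y = hwy))
# Flip the coordinates so the class names are easier to read
ggplot(data = mpg) + geom_boxplot(mapping = aes(x = reorder(class, hwy, FUN = median), y = hwy)) + coord_flip()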
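What `reorder()` does to the x-axis can be inspected without plotting. The sketch below (the variable name `class_by_hwy` is just illustrative) compares the default level order with the order produced by sorting on the median of `hwy`.

```r
library(ggplot2)  # only needed for the mpg dataset

# Default factor levels are alphabetical
levels(factor(mpg$class))

# reorder() sorts the levels by the median of hwy within each class
class_by_hwy <- reorder(mpg$class, mpg$hwy, FUN = median)
levels(class_by_hwy)

# The per-class medians that drive the ordering, smallest first
class_medians <- sort(tapply(mpg$hwy, mpg$class, median))
class_medians
```

The data itself is unchanged; only the order in which the classes are drawn along the axis differs.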
To visualize the correlation between two continuous variables, use a scatterplot.
ggplot(data = diamonds) + geom_point(mapping = aes(x = carat, y = price))
Scatterplots become less useful as the size of your dataset grows, because points begin to overplot and pile up on top of one another. One way to reduce the problem is to add transparency with the alpha aesthetic.
ggplot(data = diamonds) + geom_point(mapping = aes(x = carat, y = price), alpha = 1/100)
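A small calculation helps explain why a low alpha reveals density. Assuming the graphics device uses standard alpha blending (an assumption, since devices can differ), `n` overlapping points with transparency `alpha` combine to an opacity of `1 - (1 - alpha)^n`, so only heavily overplotted regions appear dark.

```r
alpha <- 1 / 100

# Number of points stacked on the same spot
n <- c(1, 10, 100, 500)

# Combined opacity under standard alpha blending (assumed)
opacity <- 1 - (1 - alpha)^n
data.frame(n = n, opacity = round(opacity, 3))
```

Smaller alpha values suit larger datasets; with few points the plot may look almost empty.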