title: “Mini Class 3”
Student Name: Senet Manandhar
Data received: Built in RStudio - Diamonds Dataset (shipped with ggplot2)
Density Plots in Various Forms:
A density plot visualises the distribution of data over a continuous interval or time period. In our case, the diamonds dataset is used.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
- Density plot using one continuous variable.
- The peaks of a density plot show where values are concentrated over the interval. An advantage density plots have over histograms is that they are better at revealing the shape of the distribution, because they are not affected by the number of bins used. They do depend on the smoothing bandwidth, which the adjust argument scales in the plots below.
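The point about bins can be checked numerically. The sketch below is only an illustration: the two bin widths (2000 and 250) are arbitrary choices made here, not values from the exercise. It uses base R's hist() and density() on the same price column and reports where the tallest bar or peak falls in each case.

```r
library(ggplot2)  # loaded only for the diamonds data

price <- diamonds$price

# Two histograms of the same data; the bin widths are illustrative assumptions
h_wide   <- hist(price, breaks = seq(0, 20000, by = 2000), plot = FALSE)
h_narrow <- hist(price, breaks = seq(0, 20000, by = 250),  plot = FALSE)

# Midpoint of the tallest bar under each binning
mode_wide   <- h_wide$mids[which.max(h_wide$counts)]
mode_narrow <- h_narrow$mids[which.max(h_narrow$counts)]

# A kernel density estimate has no bins; its peak depends on the bandwidth
d <- density(price)
mode_density <- d$x[which.max(d$y)]

c(wide = mode_wide, narrow = mode_narrow, density = mode_density)
```

The apparent location of the peak moves when only the bin width changes, which is the sensitivity the density plot avoids.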
g <- ggplot(diamonds, aes(x=price))
g + geom_density()+
labs(title="Density plot",
subtitle="Density Plot for the Price of Diamonds",
caption="Source: In R studio",
x="Price")
g + geom_density(adjust = .5 )+ggtitle("adjust = .5")
g + geom_density(adjust = .1 )+ggtitle("adjust = .1")
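To see what adjust changes, it may help to look at the bandwidth directly. The sketch below assumes the default settings, where geom_density() relies on the same kernel density estimator as base R's density() with the "nrd0" rule of thumb; under that assumption, adjust simply multiplies the default bandwidth.

```r
library(ggplot2)  # loaded only for the diamonds data

price <- diamonds$price

# Default bandwidth from the rule of thumb that density() uses ("nrd0")
bw_default <- bw.nrd0(price)

# adjust multiplies that bandwidth: smaller values follow the data more closely
d_half  <- density(price, adjust = 0.5)
d_tenth <- density(price, adjust = 0.1)

c(default = bw_default, half = d_half$bw, tenth = d_tenth$bw)
```

A smaller bandwidth gives a wigglier curve that shows more local peaks; a larger one gives a smoother curve that can hide them.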
What if we want to compare the density across the levels of a categorical variable?
g <- ggplot(diamonds, aes(price))
# Map fill to the color column (not diamonds$color) so each color grade gets its own curve
g + geom_density(aes(fill=color), color = NA, alpha=.35) +
labs(title="Density plot",
subtitle="Density Plot Grouped by Color",
caption="Source: In R studio",
x="Price",
fill="Color")
# Individual densities
ggplot(diamonds, aes(x=price, fill= color))+geom_density(col = "red", alpha = .3) +
scale_x_continuous(limits = c(0,20000))+coord_cartesian(ylim = c(0, .0004)) +
facet_wrap(~color, nrow = 3)
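When densities are drawn per group, each curve is normalised on its own, so the plots compare the shape of each distribution and not the size of each group. The sketch below illustrates this in base R, assuming default density() settings: it estimates one density per color grade and approximates the area under each curve.

```r
library(ggplot2)  # loaded only for the diamonds data

# Split prices by color grade and estimate one density per group
by_color <- split(diamonds$price, diamonds$color)
dens <- lapply(by_color, density)

# Approximate the area under each curve on its evaluation grid
area <- sapply(dens, function(d) sum(d$y) * (d$x[2] - d$x[1]))

# Group sizes differ even though every curve has an area close to 1
data.frame(n = lengths(by_color), area = round(area, 3))
```

The n column shows the information that the grouped density plots leave out: how many diamonds fall in each color grade.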
Similarly, if we have a continuous bivariate distribution, we can visualize the density as follows:
m <- ggplot(diamonds, aes(x = price, y = carat)) +
geom_point() +
xlim(0, 19000) +
ylim(0, 6)
m + geom_density_2d()
m <- ggplot(diamonds, aes(x = price, y = carat)) +
geom_point() +
xlim(0, 10000) +
ylim(0, 3)
m + geom_density_2d()
## Warning: Removed 5225 rows containing non-finite values (stat_density2d).
## Warning: Removed 5225 rows containing missing values (geom_point).
m + stat_density_2d(geom = "tile", aes(fill= ..density..), contour = FALSE)
## Warning: Removed 5225 rows containing non-finite values (stat_density2d).
## Warning: Removed 5225 rows containing missing values (geom_point).
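The warnings above appear because xlim() and ylim() treat values outside the limits as missing before the density is estimated. The sketch below counts the rows that fall outside the second set of limits, then shows one possible alternative: coord_cartesian() zooms the view without dropping rows. The name m_zoom is introduced here for illustration only.

```r
library(ggplot2)

# Rows outside the xlim(0, 10000) / ylim(0, 3) ranges (limits are inclusive)
outside <- diamonds$price > 10000 | diamonds$carat > 3
sum(outside)

# coord_cartesian() zooms the view instead of dropping rows,
# so the density is estimated from the full dataset
m_zoom <- ggplot(diamonds, aes(x = price, y = carat)) +
  geom_point() +
  geom_density_2d() +
  coord_cartesian(xlim = c(0, 10000), ylim = c(0, 3))
m_zoom
```

The two approaches can give different contours, because the first estimates the density from the retained rows only, while the second uses every row and then crops the picture.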