In this lab we will use the ‘diamonds’ dataset, which is included in the ggplot2 package. You’ll need to load the library first!
#install.packages("ggplot2")
library("ggplot2")
## Warning: package 'ggplot2' was built under R version 4.0.3
head(diamonds)
diamonds[c(1, 2, 3),]
nrow(diamonds)
## [1] 53940
print("percent fair:")
## [1] "percent fair:"
mean(diamonds$cut == "Fair") * 100
## [1] 2.984798
print("percent ideal:")
## [1] "percent ideal:"
mean(diamonds$cut == "Ideal") * 100
## [1] 39.95365
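The two percentages above can also be obtained for every cut in a single step. This is a minimal sketch, assuming ggplot2 is installed so that `diamonds` is available; the name `cut_pct` is introduced here only for illustration.

```r
library(ggplot2)

# Count the rows in each cut, convert the counts to proportions,
# then scale to percentages of the whole dataset
cut_pct <- prop.table(table(diamonds$cut)) * 100

# Round for display; the "Fair" and "Ideal" entries should match
# the values computed one at a time above
round(cut_pct, 2)
```

Because `cut` is a factor, `table()` keeps the levels in their defined order (Fair through Ideal), which makes the result easy to compare with the bar chart.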
ggplot(diamonds, aes(x = color, fill = cut)) + geom_bar()
2. Now use ggplot2 to create a histogram of the carat variable. Try changing the bin size with the ‘binwidth’ argument, and color by cut. What are your observations?
ggplot(diamonds, aes(x = carat, fill = cut)) + geom_histogram(binwidth = 5)
ggplot(diamonds, aes(x = carat, fill = cut)) + geom_histogram(binwidth = 10)
ggplot(diamonds, aes(x = carat, fill = cut)) + geom_histogram(binwidth = 15)
ggplot(diamonds, aes(x = carat, fill = cut)) + geom_histogram(binwidth = 1)
ggplot(diamonds, aes(x = carat, fill = cut)) + geom_histogram(binwidth = 0.01)
ggplot(diamonds, aes(x = carat, fill = cut)) + geom_histogram(binwidth = 0.1)
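One way to judge which of these binwidths is informative is to compare each one with the spread of carat. The sketch below is only a rough guide: the bin counts are approximate, since ggplot2's bin alignment can add a bin at the edges, and `carat_span`, `binwidths`, and `bin_guide` are names made up for this illustration.

```r
library(ggplot2)

# Distance between the smallest and largest carat values
carat_span <- diff(range(diamonds$carat))

# The binwidths tried above, from coarsest to finest
binwidths <- c(15, 10, 5, 1, 0.1, 0.01)

# Roughly how many bins each binwidth spreads the data across
bin_guide <- data.frame(
  binwidth    = binwidths,
  approx_bins = ceiling(carat_span / binwidths)
)
bin_guide
```

A binwidth wider than the whole span puts nearly every diamond into a single bar, so only binwidths well below the span reveal the shape of the distribution.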
3. Get side-by-side boxplots of the carat variable, grouped by cut. Do the boxplots tell you anything that the histogram does not?
ggplot(diamonds, aes(x = cut, y = carat, fill = cut)) + geom_boxplot()
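The boxplots draw the quartiles of carat within each cut, and the same numbers can be tabulated directly for comparison. A minimal sketch using base R; `carat_quartiles` is a hypothetical name chosen for this example.

```r
library(ggplot2)

# Split carat into one vector per cut, then compute the quartiles of each.
# The result is a matrix with one column per cut and one row per quartile.
carat_quartiles <- sapply(
  split(diamonds$carat, diamonds$cut),
  quantile,
  probs = c(0.25, 0.5, 0.75)
)
carat_quartiles
```

The 50% row holds the medians, which correspond to the center lines of the boxes.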
4. Use an R command to subset the diamonds that weigh over 2 carats. How many are there? Which cuts do they correspond to? Get side-by-side boxplots of this set to get an idea of how the cuts are dispersed.
sum(diamonds$carat > 2)
## [1] 1889
heavyDiamonds <- subset(diamonds, carat > 2, select = c("carat", "cut"))
# fill must be mapped inside aes(); outside it, ggplot() silently ignores it
ggplot(heavyDiamonds, aes(x = cut, y = carat, fill = cut)) + geom_boxplot()
print("There are 1889 of them. They correspond to cuts as shown in the boxplot.")
## [1] "There are 1889 of them. They correspond to cuts as shown in the boxplot."
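To answer "which cut" with numbers rather than from the plot alone, the heavy diamonds can be counted per cut. This is a sketch under the same assumptions as the code above; it rebuilds `heavyDiamonds` so that it runs on its own, and `heavy_by_cut` is a name introduced for illustration.

```r
library(ggplot2)

# Same subset as above: diamonds heavier than 2 carats
heavyDiamonds <- subset(diamonds, carat > 2, select = c("carat", "cut"))

# Number of heavy diamonds in each cut
heavy_by_cut <- table(heavyDiamonds$cut)
heavy_by_cut

# The per-cut counts add up to the size of the subset
sum(heavy_by_cut) == nrow(heavyDiamonds)
```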
ggplot(heavyDiamonds, aes(x = cut, y = carat, fill = cut)) + geom_violin()
print("If a diamond is in this set, it is most likely to be around 2 carats.")
## [1] "If a diamond is in this set, it is most likely to be around 2 carats."