Question

Read in the iris dataset as a dataframe. Using ggplot2,

  1. Create a scatterplot of sepal length against sepal width, faceted by species. Add a regression line to the plot using geom_smooth.
  2. Create a histogram of sepal length, choosing a convenient number of bins. Add a frequency polygon.
  3. Create a boxplot of petal length by species.
  4. Use par(mfrow) to create two histograms, one for petal length and one for petal width, side by side. Comment on the distributions.

Answer

First we load the built-in iris dataset, which is already a data frame.

data("iris")

Now we load the required packages.

library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Now we create the charts.

a) Scatterplot

a<-ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, col=Species))

a1<- a +  geom_point(size=1) + 
  geom_smooth(method="lm",col="firebrick", se=FALSE) +  # added regression line
  facet_wrap(~Species, nrow=2) +  # facet by species
  labs(title="Sepal length vs sepal width", 
       subtitle="From iris dataset", 
       y="Sepal Width", x="Sepal Length") # added chart labels
# a1
ggplotly(a1)
## `geom_smooth()` using formula 'y ~ x'
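
The lines drawn by geom_smooth(method="lm") are least-squares fits of Sepal.Width on Sepal.Length within each facet. As a rough cross-check, the same per-species coefficients can be obtained with lm(). This is only a sketch; the names fit_by_species and coefs are illustrative and not part of the answer above.

```r
data("iris")

# Fit Sepal.Width ~ Sepal.Length separately for each species,
# mirroring what geom_smooth(method = "lm") does inside each facet.
# `fit_by_species` is an illustrative name, not from the answer above.
fit_by_species <- lapply(split(iris, iris$Species), function(d) {
  coef(lm(Sepal.Width ~ Sepal.Length, data = d))
})

# One row per species: intercept and slope of the fitted line
coefs <- do.call(rbind, fit_by_species)
print(round(coefs, 3))
```

A positive slope in the Sepal.Length column means wider sepals tend to go with longer sepals within that species.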

b) Histogram

b<-ggplot(iris, aes(x=Sepal.Length))

b1<- b +  geom_histogram(fill="steelblue", bins=10) + # Selected 10 bins
  geom_freqpoly(col='black', bins=10) +
  labs(title="Histogram and Frequency Polygon of Sepal length", 
       subtitle="From iris dataset", 
       y="Count", x="Sepal Length") + # added chart labels
  scale_x_continuous(breaks=seq(4.2, 8.2, 0.4)) +
  scale_y_continuous(breaks=seq(0, 30, 5)) +
  theme_classic()
# b1
ggplotly(b1)
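
To see which intervals the 10 bins actually cover, one option is to inspect the data that ggplot2 computes for the histogram layer using ggplot_build(). The sketch below rebuilds a stripped-down version of the plot (the names p_hist and bins are assumptions made for illustration) and lists each bin's limits and count.

```r
library(ggplot2)
data("iris")

# Stripped-down version of the histogram above, same number of bins
p_hist <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 10)

# ggplot_build() returns the computed data for every layer;
# layer 1 is the histogram, with one row per bin
bins <- ggplot_build(p_hist)$data[[1]][, c("xmin", "xmax", "count")]
print(bins)

# Sanity check: every observation should fall into exactly one bin
print(sum(bins$count) == nrow(iris))
```

Changing bins (or using binwidth instead) and re-running this shows how the bin edges move, which helps when picking a convenient bin size.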

c) Boxplot

# named bp rather than c to avoid masking the base function c()
bp<-ggplot(data=iris, aes(Species, Petal.Length, col=Species))

bp1<-bp+geom_boxplot()+
    labs(title="Box plot of Petal Length with Species", 
       subtitle="From iris dataset", 
       y="Petal Length", x="Species")  # added chart labels
ggplotly(bp1)
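
The numbers behind each box can be checked directly. This is a minimal sketch, assuming the default geom_boxplot() settings, where the box spans the first to third quartile and the centre line is the median; the name petal_q is illustrative.

```r
data("iris")

# Minimum, quartiles and maximum of Petal.Length for each species.
# The 25%, 50% and 75% rows correspond to the box edges and the median line.
# The 0% and 100% rows are the data extremes, which can lie beyond the
# whiskers when outliers are drawn as separate points.
petal_q <- sapply(split(iris$Petal.Length, iris$Species), quantile)
print(petal_q)
```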

d) Two Histograms Compared

op <- par(mfrow=c(1,2))  # two panels side by side; save the previous settings
hist(iris$Petal.Length, xlab="Petal Length", 
     main="Histogram of Petal Length", col="steelblue")
hist(iris$Petal.Width, xlab="Petal Width", 
     main="Histogram of Petal Width", col="steelblue")
par(op)  # restore the previous plot layout

Comments on the Distributions:

Petal Length and Petal Width have very similar distributions. Both are bimodal, with one concentration of observations at the lower end of the range and another at the higher end. In both cases, the lower cluster appears leptokurtic (sharply peaked) and positively skewed, while the upper cluster appears to have lower kurtosis and moderate positive skewness.
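
These visual impressions can be checked numerically. Base R has no built-in skewness or kurtosis function, so the sketch below defines simple moment-based versions (the names skewness and excess_kurtosis are illustrative). It also assumes that the lower cluster is the setosa species and the upper cluster is the other two species; that split is an assumption suggested by the boxplot of petal length, not something established above.

```r
data("iris")

# Moment-based skewness: third central moment over variance^1.5
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment over variance^2, minus 3
# (positive values indicate a leptokurtic distribution)
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Assumption: lower cluster = setosa, upper cluster = the other two species
lower <- iris$Species == "setosa"

shape <- sapply(iris[, c("Petal.Length", "Petal.Width")], function(x) {
  c(skew_lower = skewness(x[lower]),
    kurt_lower = excess_kurtosis(x[lower]),
    skew_upper = skewness(x[!lower]),
    kurt_upper = excess_kurtosis(x[!lower]))
})
print(round(shape, 2))
```

Positive skew values support the reading of a longer right tail, and a positive excess kurtosis supports the leptokurtic description; values near zero would suggest the histogram impression depends on the binning.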


End of Assignment