Bidhan Subedi

1/14/2022

Load the iris dataset from GitHub and rename its columns.

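library(dplyr)    # provides %>%, mutate, group_by and summarise
library(ggplot2)  # provides ggplot and geom_violin
df <- read.csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/iris.csv')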
cols <- c('SepalLength','SepalWidth','PedalLength','PedalWidth','Class')
colnames(df) <- cols
df %>% head
##   SepalLength SepalWidth PedalLength PedalWidth       Class
## 1         5.1        3.5         1.4        0.2 Iris-setosa
## 2         4.9        3.0         1.4        0.2 Iris-setosa
## 3         4.7        3.2         1.3        0.2 Iris-setosa
## 4         4.6        3.1         1.5        0.2 Iris-setosa
## 5         5.0        3.6         1.4        0.2 Iris-setosa
## 6         5.4        3.9         1.7        0.4 Iris-setosa
df <- df %>% mutate(Class = as.factor(Class))

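Bar chart of the categorical variable Class. A boxplot needs a numeric variable, so the number of observations in each class is plotted instead.

barplot(table(df$Class))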
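
For a categorical variable such as Class, the counts per level are usually the thing worth looking at. Here is a minimal sketch of a frequency table. It assumes R's built-in iris data as a stand-in for the CSV loaded above, so the Species column plays the role of Class.

```r
# Stand-in data (assumption): built-in iris; Species plays the role of df$Class.
class_counts <- table(iris$Species)
class_counts

# Share of observations that falls in each class
class_props <- prop.table(class_counts)
class_props
```

The same two calls should work on df$Class once the CSV has been read and Class has been converted to a factor.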

## Some Other Visualizations

hist(df$PedalLength)

boxplot(df$PedalLength)

ggplot(df, aes(x=Class, y=PedalLength)) + 
  geom_violin(trim=FALSE)

On to the statistics. I will compute summary statistics for PedalLength.

summary(df$PedalLength)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.600   4.350   3.759   5.100   6.900
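
The quartiles that summary() reports can also be requested directly with quantile(). The sketch below assumes the built-in iris data as a stand-in, with Petal.Length playing the role of PedalLength, so the printed values may differ slightly from the output above.

```r
# Stand-in data (assumption): built-in iris instead of the CSV loaded above.
x <- iris$Petal.Length

# Minimum, quartiles and maximum in one call
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
q

# Interquartile range: distance between the 3rd and 1st quartile
iqr <- IQR(x)
iqr
```

The interquartile range is the height of the box in a boxplot, which ties these numbers back to the plots above.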

I should also include the mean and standard deviation.

mean(df$PedalLength)
## [1] 3.758667
sd(df$PedalLength)
## [1] 1.76442
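
To see what sd() computes, the sample standard deviation can be written out by hand: sum the squared deviations from the mean, divide by n - 1, and take the square root. This is a sketch on the built-in iris data, used here as an assumed stand-in for df. The names manual_mean and manual_sd are only illustrative.

```r
# Stand-in data (assumption): built-in iris; Petal.Length plays the role of df$PedalLength.
x <- iris$Petal.Length
n <- length(x)

# Mean: sum of the values divided by how many there are
manual_mean <- sum(x) / n

# Sample standard deviation: note the n - 1 denominator
manual_sd <- sqrt(sum((x - manual_mean)^2) / (n - 1))

# Both comparisons should return TRUE
all.equal(manual_mean, mean(x))
all.equal(manual_sd, sd(x))
```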

The dplyr package makes it easy to compute several summary statistics for each class in a single pipeline.

df %>% 
  group_by(Class) %>%
  summarise(PedalLength = mean(PedalLength), SepalLength = mean(SepalLength))
## # A tibble: 3 × 3
##   Class           PedalLength SepalLength
##   <fct>                 <dbl>       <dbl>
## 1 Iris-setosa            1.46        5.01
## 2 Iris-versicolor        4.26        5.94
## 3 Iris-virginica         5.55        6.59
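
The same pattern extends to more than one statistic per group. The sketch below adds a count and a standard deviation next to the mean. It assumes the built-in iris data as a stand-in (Species for Class, Petal.Length for PedalLength) and that dplyr 1.0 or later is installed. The output column names are made up for illustration.

```r
library(dplyr)

# Stand-in data (assumption): built-in iris instead of the CSV loaded above.
by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    n = n(),                           # observations per class
    mean_petal = mean(Petal.Length),   # group mean
    sd_petal = sd(Petal.Length),       # group standard deviation
    .groups = "drop"                   # return an ungrouped tibble
  )

by_species
```

Reporting the standard deviation beside each mean shows how much the classes overlap, not just where their centers sit.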