Gather the iris dataset and load it.
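library(dplyr)    # provides %>%, mutate(), group_by(), summarise()
library(ggplot2)  # provides ggplot() and geom_violin()
df <- read.csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/iris.csv')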
cols <- c('SepalLength','SepalWidth','PedalLength','PedalWidth','Class')
colnames(df) <- cols
df %>% head
##   SepalLength SepalWidth PedalLength PedalWidth       Class
## 1         5.1        3.5         1.4        0.2 Iris-setosa
## 2         4.9        3.0         1.4        0.2 Iris-setosa
## 3         4.7        3.2         1.3        0.2 Iris-setosa
## 4         4.6        3.1         1.5        0.2 Iris-setosa
## 5         5.0        3.6         1.4        0.2 Iris-setosa
## 6         5.4        3.9         1.7        0.4 Iris-setosa
df <- df %>% mutate(Class = as.factor(Class))
plot(df$Class)  # bar chart of counts per class; a boxplot is not meaningful for a factor
## Some Other Visualizations
hist(df$PedalLength)
boxplot(df$PedalLength)
ggplot(df, aes(x = Class, y = PedalLength)) +
  geom_violin(trim = FALSE)
On to the stats. I will compute summary statistics for PedalLength.
summary(df$PedalLength)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   1.600   4.350   3.759   5.100   6.900
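The quartiles that summary() reports can also be requested directly. The sketch below is only an illustration: it assumes R's built-in `iris` data frame as a stand-in for the CSV loaded above, so the column is named `Petal.Length` rather than `PedalLength`, and the values may differ slightly from the output shown here.

```r
# Assumption: the built-in `iris` data frame stands in for the CSV used above.
x <- iris$Petal.Length

# Minimum, quartiles, and maximum in one call (default quantile type).
five_num <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

# Interquartile range: distance between the 3rd and 1st quartiles.
iqr <- IQR(x)

print(five_num)
print(iqr)
```

Unlike summary(), quantile() accepts any set of probabilities, which may be handy when deciles or other cut points are needed.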
I should also include the mean and standard deviation.
mean(df$PedalLength)
## [1] 3.758667
sd(df$PedalLength)
## [1] 1.76442
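To make clear what mean() and sd() compute, here is a minimal sketch that reproduces both by hand. It assumes R's built-in `iris` data frame in place of the CSV above (column `Petal.Length` instead of `PedalLength`), and the names `mean_by_hand` and `sd_by_hand` are made up for illustration.

```r
# Assumption: the built-in `iris` data frame stands in for the CSV used above.
x <- iris$Petal.Length
n <- length(x)

# Mean: sum of the values divided by their count.
mean_by_hand <- sum(x) / n

# Sample standard deviation: square root of the summed squared deviations
# divided by n - 1, which is the denominator sd() uses.
sd_by_hand <- sqrt(sum((x - mean_by_hand)^2) / (n - 1))

print(c(mean = mean_by_hand, sd = sd_by_hand))
```

The `n - 1` denominator matters mostly for small samples; with 150 rows it changes the result only slightly.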
The dplyr package makes it easy to compute several summaries per group in a single pipeline.
df %>%
  group_by(Class) %>%
  summarise(PedalLength = mean(PedalLength), SepalLength = mean(SepalLength))
## # A tibble: 3 × 3
##   Class           PedalLength SepalLength
##   <fct>                 <dbl>       <dbl>
## 1 Iris-setosa            1.46        5.01
## 2 Iris-versicolor        4.26        5.94
## 3 Iris-virginica         5.55        6.59
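The same pattern extends to more than one statistic per group. The following sketch assumes the built-in `iris` data frame rather than the CSV above, so the grouping column is `Species` instead of `Class`; the output column names `n`, `mean_petal`, and `sd_petal` are chosen only for this example.

```r
library(dplyr)

# Assumption: the built-in `iris` data frame stands in for the CSV used above.
petal_stats <- iris %>%
  group_by(Species) %>%
  summarise(
    n          = n(),                 # rows in each group
    mean_petal = mean(Petal.Length),  # group mean
    sd_petal   = sd(Petal.Length)     # group sample standard deviation
  )

print(petal_stats)
```

Reporting the group size and spread alongside the mean makes it easier to judge whether the differences between classes are meaningful.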