The Iris dataset consists of 150 observations of 5 variables.
- Sepal length
- Sepal width
- Petal length
- Petal width
- Species
2023-12-29
My aims for this project are to visualise how the four measurements relate to species, the variable we want to predict, and to predict the species of a flower from its measurements.
To do this I am using an ensemble of five machine learning models, with a voting system to determine the final prediction. The models are: random forest, k-nearest neighbours, gradient boosting, linear discriminant analysis, and a neural network.
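```r
# Load the saved, already-trained models
models <- readRDS("project/model.rds")

# A single new flower to classify
df <- data.frame(
  Sepal.Length = 7,
  Sepal.Width  = 3,
  Petal.Length = 3.7,
  Petal.Width  = 1.2
)

# One predicted species per model
predict(models, newdata = df)
```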
```
##      rf           knn          gbm          lda          nnet    
## [1,] "versicolor" "versicolor" "versicolor" "versicolor" "setosa"
```
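The output above gives one label per model, and the final prediction comes from a vote. The exact voting rule used in the project isn't shown here, so the following is a minimal sketch assuming simple unweighted majority voting over a character matrix shaped like the printed output (one row per observation, one column per model). `majority_vote` is a hypothetical helper name, and the matrix is typed in by hand so the example runs without the saved model file.

```r
# Hypothetical helper: unweighted majority vote across models.
# `preds` has one row per observation and one column per model.
majority_vote <- function(preds) {
  apply(preds, 1, function(row) {
    counts <- table(row)              # votes per species, names sorted alphabetically
    names(counts)[which.max(counts)]  # first species with the most votes
  })
}

# Per-model predictions copied from the output above
preds <- matrix(
  c("versicolor", "versicolor", "versicolor", "versicolor", "setosa"),
  nrow = 1,
  dimnames = list(NULL, c("rf", "knn", "gbm", "lda", "nnet"))
)

majority_vote(preds)
#> [1] "versicolor"
```

Here four of the five models agree, so the single `nnet` disagreement is outvoted. Because `table()` sorts its names and `which.max()` returns the first maximum, ties in this sketch are broken alphabetically, which is one reason an odd number of models is convenient.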
The ensemble model performed well, reaching an accuracy of 98.33% (59 of 60 correct) on the 40% of the data held out for testing. Since this is a relatively small dataset, it would be interesting to see how this kind of ensemble fares on a more complex dataset with more variables and observations.
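To make the evaluation concrete: 40% of 150 observations is 60, so each misclassified flower costs 1/60, about 1.67 percentage points, of accuracy. The sketch below shows a stratified 60/40 split and the held-out accuracy calculation. It is an illustration only, not the project's code: it assumes 30 training rows per species, uses an arbitrary seed, and fits a single model (`lda()` from the MASS package, which ships with standard R installations) in place of the full ensemble.

```r
library(MASS)  # provides lda()

set.seed(42)  # arbitrary seed, chosen only to make the split reproducible

# Stratified split: sample 30 training rows from each species (60%),
# leaving 20 rows per species (40%) for testing
train_idx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                           function(rows) sample(rows, 30)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit on the training rows, predict the held-out rows
fit  <- lda(Species ~ ., data = train)
pred <- predict(fit, newdata = test)$class

# Accuracy is the share of held-out flowers classified correctly
accuracy <- mean(pred == test$Species)
cat(sprintf("%d of %d correct (%.2f%%)\n",
            sum(pred == test$Species), nrow(test), 100 * accuracy))
```

With only 60 test rows, possible accuracies move in steps of about 1.67 points, so a single extra error would have dropped the reported 98.33% to 96.67%.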
For the visualisation I combined two plotly scatter plots, one of sepal width against sepal length and one of petal width against petal length, each coloured by species. I also added a point showing the user where their selection falls.
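A layout like the one described can be sketched with the plotly R package as below. This is a hedged reconstruction, not the project's code: `user` and `species_scatter` are hypothetical names, the marker styling is an arbitrary choice, and the selected point reuses the measurements from the prediction example.

```r
library(plotly)  # also re-exports the %>% pipe

# Hypothetical user selection: same values as the prediction example
user <- data.frame(Sepal.Length = 7, Sepal.Width = 3,
                   Petal.Length = 3.7, Petal.Width = 1.2)

# One scatter plot of two measurements coloured by species, with the
# user's selection overlaid as a single larger black marker
species_scatter <- function(x, y, show_legend = TRUE) {
  fx <- as.formula(paste0("~", x))
  fy <- as.formula(paste0("~", y))
  plot_ly(iris, x = fx, y = fy, color = ~Species,
          type = "scatter", mode = "markers",
          showlegend = show_legend) %>%
    add_markers(data = user, x = fx, y = fy, inherit = FALSE,
                name = "Your selection", showlegend = show_legend,
                marker = list(color = "black", size = 12, symbol = "x"))
}

# Side-by-side sepal and petal plots; the legend is shown only once
fig <- subplot(
  species_scatter("Sepal.Length", "Sepal.Width"),
  species_scatter("Petal.Length", "Petal.Width", show_legend = FALSE),
  titleX = TRUE, titleY = TRUE
)
# print(fig) in an interactive session to display it
```

Setting `inherit = FALSE` stops the selection marker from picking up the species colour mapping, so it stays visually distinct from the data points in both panels.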