pitch

2023-12-29

Iris dataset

The Iris dataset consists of 150 observations of 5 variables: four flower measurements in centimetres and the species label.

Sepal length
Sepal width
Petal length
Petal width
Species

Aims

My aims for this project are to visualise how the four measurements relate to the variable we want to predict, Species, and to provide a prediction of it for a new flower.

Ensemble

To do this I am using an ensemble of machine learning models, with a voting system to determine the final prediction. The models are: random forest, k-nearest neighbours, gradient boosting, linear discriminant analysis, and a neural network.

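# Load the saved ensemble of fitted models
models <- readRDS("project/model.rds")
# Measurements of a single flower to classify
df <- data.frame(
    Sepal.Length = 7,
    Sepal.Width = 3,
    Petal.Length = 3.7,
    Petal.Width = 1.2
)
# Each column of the result is one model's predicted species
predict(models, newdata = df)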

##      rf           knn          gbm          lda          nnet    
## [1,] "versicolor" "versicolor" "versicolor" "versicolor" "setosa"
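
The voting step can be illustrated with a minimal sketch in base R. It assumes the per-model predictions arrive as a character matrix with one column per model, as the output above suggests, and that the vote is a simple majority. The `majority_vote` helper is hypothetical and not part of the saved ensemble, and the tie-breaking rule (first species alphabetically) is also an assumption.

```r
# Per-model predictions for one flower, copied from the output above
votes <- matrix(
  c("versicolor", "versicolor", "versicolor", "versicolor", "setosa"),
  nrow = 1,
  dimnames = list(NULL, c("rf", "knn", "gbm", "lda", "nnet"))
)

# Hypothetical helper: return the species predicted by the most models.
# On a tie, which.max() picks the first species in alphabetical order.
majority_vote <- function(row) {
  counts <- table(row)
  names(counts)[which.max(counts)]
}

# One final prediction per row (per flower)
final <- apply(votes, 1, majority_vote)
final
## [1] "versicolor"
```

Here four of the five models agree, so the single dissenting neural network vote does not change the result.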

Results

The ensemble performed well, reaching an accuracy of 98.33% on the 40% of the data that was held out for testing. This is a relatively small dataset, though, so it would be interesting to see how this kind of ensemble fares on a more complex dataset with more variables and observations.
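
To make the hold-out figure concrete, here is a rough sketch of a 60/40 split and an accuracy calculation in base R. The stratified sampling, the seed, and the `accuracy` helper are assumptions for illustration and may differ from how the split was actually done. With 60 held-out observations, a single misclassification gives 59/60, which matches the reported 98.33%.

```r
set.seed(1)  # hypothetical seed, only to make the sketch reproducible

# Assumed stratified split: hold out 40% of the rows of each species
test_idx <- unlist(
  lapply(
    split(seq_len(nrow(iris)), iris$Species),
    function(idx) sample(idx, size = round(0.4 * length(idx)))
  ),
  use.names = FALSE
)
train <- iris[-test_idx, ]
test <- iris[test_idx, ]

# Hypothetical helper: share of predictions that match the true labels
accuracy <- function(predicted, actual) mean(predicted == actual)

# Stand-in predictions with exactly one error, to show the arithmetic
actual <- test$Species
predicted <- actual
predicted[1] <- "virginica"  # the first held-out row is a setosa

round(100 * accuracy(predicted, actual), 2)
## [1] 98.33
```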

For the visualisation I combined two plotly plots, one showing sepal width against sepal length and the other petal width against petal length, both coloured by species. I also added a point showing where the user's selection falls.
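
A layout like the one described could be sketched as follows. This is an approximation rather than the project's actual plotting code: it assumes the plotly package is installed, the `selection` data frame and the marker styling are hypothetical, and the selected values simply reuse the example flower from the prediction code.

```r
library(plotly)

# Hypothetical user selection (same values as the prediction example)
selection <- data.frame(
  Sepal.Length = 7, Sepal.Width = 3,
  Petal.Length = 3.7, Petal.Width = 1.2
)

# Sepal scatter plot coloured by species, with the selection overlaid
sepal_plot <- plot_ly(iris, x = ~Sepal.Length, y = ~Sepal.Width,
                      color = ~Species, legendgroup = ~Species,
                      type = "scatter", mode = "markers")
sepal_plot <- add_markers(sepal_plot, data = selection,
                          x = ~Sepal.Length, y = ~Sepal.Width,
                          inherit = FALSE, name = "Selection",
                          marker = list(color = "black", size = 12, symbol = "x"))

# Petal scatter plot; legend hidden so each species is listed only once
petal_plot <- plot_ly(iris, x = ~Petal.Length, y = ~Petal.Width,
                      color = ~Species, legendgroup = ~Species,
                      type = "scatter", mode = "markers", showlegend = FALSE)
petal_plot <- add_markers(petal_plot, data = selection,
                          x = ~Petal.Length, y = ~Petal.Width,
                          inherit = FALSE, name = "Selection", showlegend = FALSE,
                          marker = list(color = "black", size = 12, symbol = "x"))

# Place the two plots side by side, keeping each plot's axis titles
fig <- subplot(sepal_plot, petal_plot, titleX = TRUE, titleY = TRUE)
fig
```

Sharing a `legendgroup` between the two panels means clicking a species in the legend toggles it in both plots at once.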