ak@dived.me
April 2016
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis.
The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres.
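The counts above can be checked directly in R, since the data set ships with base R as `iris`. This is only a quick sanity check of the structure described, not part of the original analysis:

```r
# Load the built-in iris data set (datasets package, part of base R)
data(iris)

# Number of rows (samples) and columns (four measurements plus Species)
dim(iris)

# Number of samples per species
table(iris$Species)

# Column names used later in the model formula
names(iris)
```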
But how many measurements do you need to predict the species of a new iris? The data set contains only 150 samples. Is that enough for machine learning methods?
https://hokumski.shinyapps.io/predicting-iris-species-by-sepals/
gbm is an R package for Generalized Boosted Regression Modeling. Let's use only two features from the Iris data set: sepal length and sepal width.
This is a classification problem: Species ~ Sepal.Length + Sepal.Width
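The model call that follows uses `irisTrain` and `irisTest`, which are not defined in this deck. A minimal sketch of one way to build them, assuming a simple random split into 100 training and 50 test samples (the 50 matches the number of predicted rows shown; the seed and the splitting method are assumptions, not the author's confirmed procedure):

```r
data(iris)

# Hypothetical seed, chosen only to make the sketch reproducible
set.seed(42)

# Draw 100 row indices at random for training; the rest form the test set
trainIdx  <- sample(nrow(iris), 100)
irisTrain <- iris[trainIdx, ]
irisTest  <- iris[-trainIdx, ]

# Sizes of the two subsets
nrow(irisTrain)
nrow(irisTest)
```

A random split does not guarantee equal species counts in each subset; a stratified split would be an alternative if balance matters.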
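library(gbm)
# Fit a multinomial boosted model on the two sepal measurements
gbmmodel <- gbm(Species ~ Sepal.Length + Sepal.Width,
                distribution = "multinomial", data = irisTrain,
                n.trees = 20, shrinkage = 0.1, cv.folds = 5,
                n.minobsinnode = 2, verbose = FALSE, n.cores = 1)
# Predicted class probabilities for the test samples
pred <- predict(gbmmodel, irisTest, type = "response")
pred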
Using 20 trees...
, , 20
setosa versicolor virginica
[1,] 0.67395608 0.23598358 0.09006033
[2,] 0.91568790 0.05751980 0.02679230
[3,] 0.71543686 0.20596069 0.07860245
[4,] 0.41606202 0.22415966 0.35977832
[5,] 0.41606202 0.22415966 0.35977832
[6,] 0.93598431 0.04093702 0.02307867
[7,] 0.95103306 0.02968722 0.01927971
[8,] 0.79983750 0.15130156 0.04886094
[9,] 0.93555089 0.04194339 0.02250571
[10,] 0.85227398 0.10485617 0.04286985
[11,] 0.84371386 0.11391033 0.04237581
[12,] 0.82698086 0.12280927 0.05020987
[13,] 0.69249118 0.20571433 0.10179448
[14,] 0.71543686 0.20596069 0.07860245
[15,] 0.93598431 0.04093702 0.02307867
[16,] 0.26357595 0.67762392 0.05880012
[17,] 0.84371386 0.11391033 0.04237581
[18,] 0.71543686 0.20596069 0.07860245
[19,] 0.84371386 0.11391033 0.04237581
[20,] 0.07057198 0.16971222 0.75971580
[21,] 0.03692297 0.90334957 0.05972746
[22,] 0.02552216 0.24153717 0.73294068
[23,] 0.03193718 0.33323251 0.63483031
[24,] 0.06567396 0.39744336 0.53688267
[25,] 0.13200724 0.37465771 0.49333505
[26,] 0.04007096 0.38092899 0.57900005
[27,] 0.04296960 0.25887829 0.69815211
[28,] 0.04655007 0.48788653 0.46556340
[29,] 0.05458295 0.42601774 0.51939932
[30,] 0.04838338 0.37763037 0.57398625
[31,] 0.14448525 0.62179200 0.23372275
[32,] 0.06567396 0.39744336 0.53688267
[33,] 0.04457614 0.34791498 0.60750889
[34,] 0.07318610 0.57121465 0.35559924
[35,] 0.09385246 0.46632003 0.43982751
[36,] 0.06519569 0.56101408 0.37379023
[37,] 0.43529502 0.44659318 0.11811181
[38,] 0.05851996 0.45674600 0.48473403
[39,] 0.04296960 0.25887829 0.69815211
[40,] 0.08886494 0.25108438 0.66005068
[41,] 0.03932795 0.20166346 0.75900859
[42,] 0.11083070 0.35252586 0.53664344
[43,] 0.01395790 0.75434855 0.23169355
[44,] 0.05201904 0.12509591 0.82288505
[45,] 0.03830939 0.40814411 0.55354650
[46,] 0.03292186 0.35074599 0.61633215
[47,] 0.25999649 0.27461650 0.46538701
[48,] 0.11083070 0.35252586 0.53664344
[49,] 0.08886494 0.25108438 0.66005068
[50,] 0.06567396 0.39744336 0.53688267
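The output above is a three-dimensional array of class probabilities (samples by species by number of trees). A minimal sketch of turning such an array into predicted species labels, using three rows copied from the output (rows 1, 16 and 20) so that it runs without the gbm package; the variable names are illustrative:

```r
# Three rows taken from the prediction output above (rows 1, 16 and 20)
probs <- rbind(
  c(0.67395608, 0.23598358, 0.09006033),
  c(0.26357595, 0.67762392, 0.05880012),
  c(0.07057198, 0.16971222, 0.75971580)
)
colnames(probs) <- c("setosa", "versicolor", "virginica")

# Rebuild the 3-D shape of the output: rows x classes x one tree count
pred <- array(probs, dim = c(3, 3, 1),
              dimnames = list(NULL, colnames(probs), "20"))

# Drop the third dimension, then pick the most probable class per row
probMatrix <- pred[, , 1]
predicted  <- colnames(probMatrix)[apply(probMatrix, 1, which.max)]
predicted
```

With the full prediction array, the same two lines give one label per test sample, and `table(predicted, irisTest$Species)` would then show how often the two sepal measurements are enough.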
https://github.com/hokumski/ddp-predicting-iris-by-sepals