Introduction

In the dynamic landscape of social media, staying at the forefront of innovation is crucial for any company seeking to engage and captivate its users. Our machine learning project delves into the captivating world of image recognition, with a singular focus on its application and business implications for our esteemed client, a leading social media company.

The primary objective of this project is to construct a diverse array of ten machine learning models designed to perform predictive classifications through cutting-edge image recognition techniques. By harnessing the power of artificial intelligence and advanced algorithms, our project aims to elevate our client’s platform by providing intelligent image recognition capabilities.

As a social media company, our client understands the pivotal role that visuals play in capturing the attention of its users. By integrating image recognition technology into their platform, our client seeks to offer its users an enhanced and personalized experience. The ability to accurately recognize and classify images opens up a myriad of exciting business implications that can revolutionize user engagement and content discovery.

One of the most significant impacts of our machine learning project is its potential to transform content discovery on the social media platform. By empowering the platform with image recognition capabilities, users can effortlessly find relevant content and connect with like-minded individuals, leading to increased user retention and satisfaction.

Moreover, the predictive classifications generated by our machine learning models enable our client to tailor content recommendations to individual preferences and interests. This level of personalization enhances user satisfaction and fosters stronger user-brand relationships, ultimately driving user loyalty and platform growth.

In the realm of content moderation, image recognition technology offers invaluable support in identifying and flagging inappropriate or harmful content. By automating the content moderation process, our client can ensure a safe and respectful online community, maintaining a positive user experience and protecting its brand reputation.

Beyond the user experience, image recognition also unlocks innovative marketing opportunities for our client. By analyzing user-generated content and trends, our machine learning models enable targeted and data-driven advertising strategies, optimizing ad placement and increasing ad relevance for users and advertisers alike.

While the potential benefits of image recognition are immense, we are also mindful of the ethical implications associated with AI and machine learning. Our project is committed to addressing privacy concerns and ensuring unbiased image analysis, safeguarding user data and maintaining the trust of the social media community.

In conclusion, this image recognition machine learning project is a transformative endeavor for our social media client. By deploying ten powerful predictive classification models, we aim to revolutionize content discovery, enhance user engagement, optimize content moderation, and unlock new marketing opportunities. Together, we embark on a journey that not only elevates the client’s social media platform but also sets new industry standards for innovation and user-centric experiences in the ever-evolving world of social media.

Model 1: Neural Network

A neural network is a deep learning model that takes inspiration from the biological neuron. It consists of a network of neurons organized into an input layer, one or more hidden layers, and an output layer. With our dataset of 7 x 7 images, which amounts to 49 pixels per image, the neural network model has 49 inputs.

Advantages:
General and adaptive algorithm; works directly on raw data; captures complex non-linear relationships

Disadvantages:
Limited flexibility and model capacity (only one hidden layer); tricky hyper-parameter tuning; the learned weights are difficult to interpret, making the decision-making process hard to understand

The nnet package fits a basic feed-forward neural network that allows only a single hidden layer, the layer between the input layer and the output layer where most of the learning and feature extraction occur. There are several parameters to consider for the model.

The size parameter sets the number of hidden nodes, or neurons, in the neural network. The decay parameter, also known as weight decay or L2 regularization, is a regularization technique used to prevent over-fitting: a higher value of decay results in a stronger regularization effect and smaller weights in the network, which keeps the model from memorizing noise in the training data and encourages it to learn more general patterns. MaxNWts sets an upper limit on the total number of weights allowed in the network; a model whose architecture would require more weights than this limit cannot be fit. Max iteration, or maxit, controls the maximum number of iterations for training. Increasing maxit gives the neural network more opportunities to learn from the data, potentially leading to better convergence and a more accurate model; however, setting maxit too high also risks overfitting the model on the training data.

Here, we set the parameters for a single layer neural network as follows:

size: 10, decay: 0.2, MaxNWts: 10000, maxit: 300
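
Below is a minimal sketch of what the fitting call looks like with these settings. The object names are assumptions (the original code is not shown): train.dat is a data frame holding the 49 pixel columns plus a factor column label with the image classes, and test.dat is structured the same way.

library(nnet)

# Minimal sketch with assumed object names; trace = TRUE (the default)
# prints the iteration log shown below.
fit.nnet <- nnet(label ~ ., data = train.dat,
                 size = 10, decay = 0.2, MaxNWts = 10000, maxit = 300)

# Predicted class labels for the test set.
pred.nnet <- predict(fit.nnet, newdata = test.dat, type = "class")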

# weights:  610
initial  value 2774.484680 
iter  10 value 1329.090586
iter  20 value 1016.719348
iter  30 value 798.867900
iter  40 value 701.260546
iter  50 value 668.863613
iter  60 value 649.390439
iter  70 value 641.597423
iter  80 value 637.368077
iter  90 value 635.687325
iter 100 value 634.912516
iter 110 value 633.777295
iter 120 value 633.051445
iter 130 value 632.338245
iter 140 value 631.650511
iter 150 value 630.947008
iter 160 value 630.127352
iter 170 value 629.578567
iter 180 value 628.976374
iter 190 value 628.168483
iter 200 value 626.696657
iter 210 value 625.164084
iter 220 value 624.043113
iter 230 value 623.499843
iter 240 value 622.949543
iter 250 value 622.596924
iter 260 value 622.107510
iter 270 value 621.635305
iter 280 value 621.451368
iter 290 value 621.334965
iter 300 value 621.241916
final  value 621.241916 
stopped after 300 iterations
# weights:  610
initial  value 2541.578810 
iter  10 value 1204.115043
iter  20 value 909.266756
iter  30 value 761.330009
iter  40 value 685.286542
iter  50 value 650.041716
iter  60 value 637.638572
iter  70 value 631.696833
iter  80 value 626.439889
iter  90 value 618.039066
iter 100 value 609.251913
iter 110 value 606.285055
iter 120 value 604.420625
iter 130 value 603.469319
iter 140 value 602.757696
iter 150 value 602.337337
iter 160 value 601.898210
iter 170 value 601.580838
iter 180 value 601.378824
iter 190 value 601.263140
iter 200 value 601.172908
iter 210 value 601.109213
iter 220 value 601.068639
iter 230 value 600.280899
iter 240 value 599.379587
iter 250 value 599.165985
iter 260 value 599.018728
iter 270 value 598.988761
iter 280 value 598.976992
iter 290 value 598.971842
iter 300 value 598.967052
final  value 598.967052 
stopped after 300 iterations
# weights:  610
initial  value 2469.686353 
iter  10 value 1326.675917
iter  20 value 1088.694244
iter  30 value 822.477444
iter  40 value 720.975405
iter  50 value 694.705253
iter  60 value 669.965182
iter  70 value 660.123526
iter  80 value 653.836705
iter  90 value 649.164322
iter 100 value 646.282243
iter 110 value 644.755048
iter 120 value 643.549542
iter 130 value 642.451061
iter 140 value 641.760442
iter 150 value 641.424934
iter 160 value 641.197800
iter 170 value 641.108973
iter 180 value 641.061569
iter 190 value 641.051512
iter 200 value 641.049701
iter 210 value 641.048996
iter 220 value 641.048827
final  value 641.048814 
converged
# weights:  610
initial  value 11944.075818 
iter  10 value 5238.876264
iter  20 value 3959.623375
iter  30 value 3348.637150
iter  40 value 3103.399513
iter  50 value 2949.568342
iter  60 value 2870.811898
iter  70 value 2795.601551
iter  80 value 2750.655431
iter  90 value 2722.180377
iter 100 value 2698.132381
iter 110 value 2670.254783
iter 120 value 2651.038405
iter 130 value 2623.302225
iter 140 value 2598.587694
iter 150 value 2573.550642
iter 160 value 2551.967908
iter 170 value 2535.063444
iter 180 value 2519.107747
iter 190 value 2511.738653
iter 200 value 2505.325546
iter 210 value 2501.560330
iter 220 value 2498.402393
iter 230 value 2491.430967
iter 240 value 2481.529004
iter 250 value 2477.659258
iter 260 value 2475.784902
iter 270 value 2474.627328
iter 280 value 2473.517060
iter 290 value 2472.585357
iter 300 value 2471.934098
final  value 2471.934098 
stopped after 300 iterations
# weights:  610
initial  value 13106.564617 
iter  10 value 6731.584566
iter  20 value 4444.178759
iter  30 value 3742.758405
iter  40 value 3372.433788
iter  50 value 3143.220850
iter  60 value 2984.345057
iter  70 value 2875.821216
iter  80 value 2812.192743
iter  90 value 2772.456516
iter 100 value 2749.472392
iter 110 value 2705.617200
iter 120 value 2651.143643
iter 130 value 2604.909519
iter 140 value 2566.879847
iter 150 value 2539.529403
iter 160 value 2515.373549
iter 170 value 2498.340371
iter 180 value 2482.199920
iter 190 value 2472.050449
iter 200 value 2465.749218
iter 210 value 2460.832484
iter 220 value 2455.497100
iter 230 value 2450.557398
iter 240 value 2446.194898
iter 250 value 2443.561051
iter 260 value 2439.777076
iter 270 value 2435.435337
iter 280 value 2432.589206
iter 290 value 2427.917390
iter 300 value 2420.869194
final  value 2420.869194 
stopped after 300 iterations
# weights:  610
initial  value 13017.066746 
iter  10 value 6144.303536
iter  20 value 4152.415934
iter  30 value 3469.701318
iter  40 value 3147.357579
iter  50 value 2908.462279
iter  60 value 2760.576852
iter  70 value 2669.946676
iter  80 value 2615.328608
iter  90 value 2592.065020
iter 100 value 2580.864458
iter 110 value 2573.007661
iter 120 value 2560.262108
iter 130 value 2542.869800
iter 140 value 2530.383731
iter 150 value 2523.044035
iter 160 value 2517.479586
iter 170 value 2513.375432
iter 180 value 2505.031053
iter 190 value 2497.753511
iter 200 value 2492.698363
iter 210 value 2486.328661
iter 220 value 2481.217007
iter 230 value 2478.670339
iter 240 value 2476.373062
iter 250 value 2473.390624
iter 260 value 2470.332410
iter 270 value 2465.842106
iter 280 value 2460.882564
iter 290 value 2456.744115
iter 300 value 2452.688000
final  value 2452.688000 
stopped after 300 iterations
# weights:  610
initial  value 23927.905259 
iter  10 value 9716.776783
iter  20 value 7243.170353
iter  30 value 6076.117489
iter  40 value 5639.087305
iter  50 value 5359.385414
iter  60 value 5188.309096
iter  70 value 5083.511536
iter  80 value 5027.346681
iter  90 value 4978.107830
iter 100 value 4923.449119
iter 110 value 4863.969706
iter 120 value 4832.275330
iter 130 value 4809.628392
iter 140 value 4790.207231
iter 150 value 4777.718390
iter 160 value 4764.846090
iter 170 value 4743.893177
iter 180 value 4723.001151
iter 190 value 4711.687296
iter 200 value 4699.027214
iter 210 value 4687.618987
iter 220 value 4679.733212
iter 230 value 4672.234396
iter 240 value 4663.988711
iter 250 value 4652.406962
iter 260 value 4644.981505
iter 270 value 4636.521302
iter 280 value 4631.171033
iter 290 value 4627.597051
iter 300 value 4623.936775
final  value 4623.936775 
stopped after 300 iterations
# weights:  610
initial  value 25293.058922 
iter  10 value 11406.879075
iter  20 value 7706.171200
iter  30 value 6621.099743
iter  40 value 5869.173017
iter  50 value 5581.810415
iter  60 value 5384.811144
iter  70 value 5240.329143
iter  80 value 5151.030926
iter  90 value 5090.621592
iter 100 value 5042.088298
iter 110 value 5002.565401
iter 120 value 4968.229573
iter 130 value 4936.411656
iter 140 value 4918.401043
iter 150 value 4908.728469
iter 160 value 4901.411477
iter 170 value 4893.068574
iter 180 value 4884.180612
iter 190 value 4877.490637
iter 200 value 4868.845397
iter 210 value 4858.569797
iter 220 value 4851.312011
iter 230 value 4843.097700
iter 240 value 4837.552522
iter 250 value 4832.944672
iter 260 value 4828.282441
iter 270 value 4820.775420
iter 280 value 4813.915120
iter 290 value 4809.933231
iter 300 value 4806.795720
final  value 4806.795720 
stopped after 300 iterations
# weights:  610
initial  value 25365.583754 
iter  10 value 11943.466995
iter  20 value 8994.377853
iter  30 value 7875.040983
iter  40 value 7034.759331
iter  50 value 6305.302975
iter  60 value 6081.669126
iter  70 value 5703.922349
iter  80 value 5515.280179
iter  90 value 5315.665641
iter 100 value 5189.547129
iter 110 value 5112.061071
iter 120 value 5045.369499
iter 130 value 4981.216107
iter 140 value 4906.769359
iter 150 value 4825.263094
iter 160 value 4750.646753
iter 170 value 4704.653864
iter 180 value 4671.777300
iter 190 value 4653.218229
iter 200 value 4642.306360
iter 210 value 4628.073029
iter 220 value 4617.075628
iter 230 value 4607.537535
iter 240 value 4599.682218
iter 250 value 4592.495691
iter 260 value 4583.236723
iter 270 value 4573.548930
iter 280 value 4568.383321
iter 290 value 4564.881621
iter 300 value 4561.074490
final  value 4561.074490 
stopped after 300 iterations

Model 2: Neural Network (altered parameters)

For this second neural network model, we still use the nnet package and the function defined above, but with different parameter settings.

For this model, we increase the size of the single hidden layer to 25 neurons. Adding more nodes to the hidden layer increases the model’s capacity to learn complex patterns in the data; however, increasing the size too much can lead to over-fitting on the training data.

size: 25, decay: 0.2, MaxNWts: 10000, maxit: 100
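
Under the same assumptions as the sketch for Model 1, only the parameters of the call change:

# Minimal sketch with assumed object names.
fit.nnet2 <- nnet(label ~ ., data = train.dat,
                  size = 25, decay = 0.2, MaxNWts = 10000, maxit = 100)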

# weights:  1510
initial  value 2811.245073 
iter  10 value 1178.171741
iter  20 value 816.414641
iter  30 value 708.873097
iter  40 value 656.810631
iter  50 value 629.719173
iter  60 value 613.289735
iter  70 value 602.686541
iter  80 value 597.698447
iter  90 value 593.712441
iter 100 value 590.271050
iter 110 value 587.832663
iter 120 value 586.442261
iter 130 value 585.123959
iter 140 value 583.865685
iter 150 value 583.166047
iter 160 value 582.627114
iter 170 value 582.310072
iter 180 value 581.942816
iter 190 value 581.301585
iter 200 value 580.964042
iter 210 value 580.784885
iter 220 value 580.627945
iter 230 value 580.511718
iter 240 value 580.420677
iter 250 value 580.371496
iter 260 value 580.332088
iter 270 value 580.292320
iter 280 value 580.268074
iter 290 value 580.249403
iter 300 value 580.228353
final  value 580.228353 
stopped after 300 iterations
# weights:  1510
initial  value 2986.547943 
iter  10 value 1216.212637
iter  20 value 817.295117
iter  30 value 706.314112
iter  40 value 637.248354
iter  50 value 604.961232
iter  60 value 591.810822
iter  70 value 585.111764
iter  80 value 580.647664
iter  90 value 574.443771
iter 100 value 571.002553
iter 110 value 568.420935
iter 120 value 567.228833
iter 130 value 566.571855
iter 140 value 566.145112
iter 150 value 565.620084
iter 160 value 565.087583
iter 170 value 564.769160
iter 180 value 564.601948
iter 190 value 564.423659
iter 200 value 564.292667
iter 210 value 564.182097
iter 220 value 564.073673
iter 230 value 564.018629
iter 240 value 563.910100
iter 250 value 563.838850
iter 260 value 563.789166
iter 270 value 563.717950
iter 280 value 563.668343
iter 290 value 563.645119
iter 300 value 563.619537
final  value 563.619537 
stopped after 300 iterations
# weights:  1510
initial  value 2802.979759 
iter  10 value 1196.501008
iter  20 value 913.911395
iter  30 value 804.490919
iter  40 value 711.485553
iter  50 value 654.871543
iter  60 value 632.152386
iter  70 value 621.495392
iter  80 value 615.311504
iter  90 value 611.198086
iter 100 value 609.090105
iter 110 value 607.560611
iter 120 value 605.869160
iter 130 value 603.463955
iter 140 value 602.292083
iter 150 value 601.879836
iter 160 value 601.588756
iter 170 value 601.340529
iter 180 value 601.207810
iter 190 value 601.114824
iter 200 value 600.999297
iter 210 value 600.782700
iter 220 value 600.599393
iter 230 value 600.378393
iter 240 value 600.231993
iter 250 value 600.120747
iter 260 value 600.043350
iter 270 value 600.000019
iter 280 value 599.951830
iter 290 value 599.894717
iter 300 value 599.842184
final  value 599.842184 
stopped after 300 iterations
# weights:  1510
initial  value 14856.388037 
iter  10 value 6204.200735
iter  20 value 3756.624764
iter  30 value 3196.988499
iter  40 value 2874.371097
iter  50 value 2695.617188
iter  60 value 2594.250974
iter  70 value 2504.135574
iter  80 value 2431.483354
iter  90 value 2388.413001
iter 100 value 2349.948863
iter 110 value 2322.434104
iter 120 value 2298.645554
iter 130 value 2278.917473
iter 140 value 2263.231760
iter 150 value 2250.019741
iter 160 value 2238.474576
iter 170 value 2228.508246
iter 180 value 2220.822370
iter 190 value 2215.633557
iter 200 value 2210.229227
iter 210 value 2202.448250
iter 220 value 2195.603458
iter 230 value 2190.342129
iter 240 value 2184.843778
iter 250 value 2180.554768
iter 260 value 2176.847913
iter 270 value 2173.618258
iter 280 value 2170.624899
iter 290 value 2167.427004
iter 300 value 2164.676386
final  value 2164.676386 
stopped after 300 iterations
# weights:  1510
initial  value 14165.935593 
iter  10 value 5746.559874
iter  20 value 3713.743844
iter  30 value 2966.391696
iter  40 value 2684.800362
iter  50 value 2560.624910
iter  60 value 2478.281106
iter  70 value 2395.022142
iter  80 value 2321.905133
iter  90 value 2281.815265
iter 100 value 2249.293021
iter 110 value 2219.745780
iter 120 value 2196.691388
iter 130 value 2180.746210
iter 140 value 2168.973238
iter 150 value 2160.549385
iter 160 value 2154.555491
iter 170 value 2149.803358
iter 180 value 2144.097372
iter 190 value 2139.587114
iter 200 value 2134.842284
iter 210 value 2130.334383
iter 220 value 2124.617937
iter 230 value 2120.187880
iter 240 value 2116.813244
iter 250 value 2114.848720
iter 260 value 2113.351543
iter 270 value 2111.829479
iter 280 value 2110.216555
iter 290 value 2108.165635
iter 300 value 2106.440977
final  value 2106.440977 
stopped after 300 iterations
# weights:  1510
initial  value 13161.805032 
iter  10 value 5768.201103
iter  20 value 3915.976675
iter  30 value 3261.216984
iter  40 value 3049.705016
iter  50 value 2730.548153
iter  60 value 2555.835557
iter  70 value 2484.112966
iter  80 value 2392.398612
iter  90 value 2334.611604
iter 100 value 2300.836341
iter 110 value 2272.861335
iter 120 value 2251.022639
iter 130 value 2232.272001
iter 140 value 2216.478870
iter 150 value 2201.476062
iter 160 value 2189.346137
iter 170 value 2179.050405
iter 180 value 2172.350759
iter 190 value 2168.329186
iter 200 value 2165.235664
iter 210 value 2162.928980
iter 220 value 2160.909086
iter 230 value 2158.264811
iter 240 value 2156.242336
iter 250 value 2154.174104
iter 260 value 2151.373707
iter 270 value 2149.720071
iter 280 value 2148.473546
iter 290 value 2147.357747
iter 300 value 2145.821039
final  value 2145.821039 
stopped after 300 iterations
# weights:  1510
initial  value 26503.379979 
iter  10 value 10628.303280
iter  20 value 7081.807392
iter  30 value 5916.309905
iter  40 value 5402.207533
iter  50 value 5058.756371
iter  60 value 4833.125574
iter  70 value 4690.070780
iter  80 value 4607.034552
iter  90 value 4555.558389
iter 100 value 4492.093718
iter 110 value 4447.044112
iter 120 value 4415.927287
iter 130 value 4378.905908
iter 140 value 4343.443519
iter 150 value 4309.850435
iter 160 value 4274.773583
iter 170 value 4238.383409
iter 180 value 4198.713163
iter 190 value 4167.130649
iter 200 value 4144.218970
iter 210 value 4126.370295
iter 220 value 4109.160823
iter 230 value 4087.733164
iter 240 value 4064.297313
iter 250 value 4044.426288
iter 260 value 4028.137060
iter 270 value 4016.476377
iter 280 value 4006.958397
iter 290 value 3998.368268
iter 300 value 3990.123621
final  value 3990.123621 
stopped after 300 iterations
# weights:  1510
initial  value 32308.858163 
iter  10 value 15984.108836
iter  20 value 9434.584349
iter  30 value 7716.123954
iter  40 value 6794.370700
iter  50 value 6105.193403
iter  60 value 5753.743633
iter  70 value 5451.267671
iter  80 value 5243.297364
iter  90 value 5005.466716
iter 100 value 4875.125826
iter 110 value 4783.364933
iter 120 value 4695.407978
iter 130 value 4635.274327
iter 140 value 4582.453403
iter 150 value 4525.760331
iter 160 value 4476.896131
iter 170 value 4437.253655
iter 180 value 4406.174676
iter 190 value 4375.017372
iter 200 value 4349.520394
iter 210 value 4317.538587
iter 220 value 4292.399336
iter 230 value 4268.317887
iter 240 value 4248.257757
iter 250 value 4229.286511
iter 260 value 4213.596588
iter 270 value 4196.365844
iter 280 value 4180.942912
iter 290 value 4165.238823
iter 300 value 4150.685336
final  value 4150.685336 
stopped after 300 iterations
# weights:  1510
initial  value 28011.577632 
iter  10 value 12000.606643
iter  20 value 7331.698935
iter  30 value 6027.776139
iter  40 value 5381.475872
iter  50 value 5021.984316
iter  60 value 4770.699281
iter  70 value 4614.987520
iter  80 value 4486.846474
iter  90 value 4392.306235
iter 100 value 4308.090884
iter 110 value 4231.811193
iter 120 value 4170.213732
iter 130 value 4122.667721
iter 140 value 4083.196671
iter 150 value 4047.690645
iter 160 value 4017.697563
iter 170 value 3994.035507
iter 180 value 3977.567081
iter 190 value 3964.202759
iter 200 value 3952.988618
iter 210 value 3943.508283
iter 220 value 3933.477389
iter 230 value 3925.220875
iter 240 value 3919.870931
iter 250 value 3915.071037
iter 260 value 3910.459970
iter 270 value 3906.714559
iter 280 value 3903.309702
iter 290 value 3899.822186
iter 300 value 3895.184365
final  value 3895.184365 
stopped after 300 iterations

Model 3: Random Forest

Random Forest is a popular ensemble learning method used for both classification and regression tasks in machine learning. It belongs to the family of bagging algorithms, which combine multiple weak learners (usually decision trees) to create a strong and robust predictive model. The fundamental idea behind Random Forest is to aggregate the predictions of multiple individual models to achieve better generalization and reduce overfitting.

Random Forest creates multiple subsets (bootstrapped samples) of the original training data by randomly sampling with replacement. Each subset is used to train a separate decision tree. During the tree-building process, at each split, the algorithm considers only a random subset of features rather than all the features. This randomness helps to introduce diversity among the individual trees and reduces correlation between them. For classification tasks, the final prediction is obtained by taking a majority vote from all the individual trees. For regression tasks, the predictions are averaged.

Overall, Random Forest is a powerful and versatile algorithm that can be effective in a wide range of machine learning tasks. It’s especially useful when dealing with complex data with many features, but it may not be the best choice for applications where interpretability is crucial or when dealing with extremely large datasets.

Advantages:
High Accuracy: Random Forest often yields highly accurate predictions due to the combination of multiple decision trees, which reduces overfitting and improves generalization.
Robustness: Random Forest is robust against noisy data and outliers since it aggregates the predictions from multiple models.
Feature Importance: Random Forest can provide an estimate of feature importance, helping in feature selection and understanding the most influential variables in the data.
No Data Splitting: Random Forest does not require a separate validation set for hyperparameter tuning, as it internally uses the out-of-bag samples (data not used during bootstrapped training) for validation.
Parallelization: The training of individual trees can be done in parallel, making it computationally efficient for large datasets.

Disadvantages:
Model Size: The ensemble of decision trees can lead to a large model size, especially when dealing with numerous trees or deep trees, which can increase memory requirements.
Interpretability: While Random Forest can provide feature importance, the overall model can be challenging to interpret compared to individual decision trees.
Bias in Imbalanced Datasets: Random Forest tends to favor the majority class in imbalanced datasets, which may require additional techniques to handle class imbalances.
Hyperparameter Tuning: Although Random Forests have fewer hyperparameters to tune compared to individual decision trees, finding the optimal values for these hyperparameters can still be time-consuming.
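
The report does not show the fitting code; a minimal sketch with the randomForest package, using the same assumed object names as before and the package default of 500 trees, could look like this:

library(randomForest)

# Minimal sketch with assumed object names; ntree = 500 is the package default.
fit.rf <- randomForest(label ~ ., data = train.dat, ntree = 500, importance = TRUE)

# Predicted class labels for the test set and estimated variable importance.
pred.rf <- predict(fit.rf, newdata = test.dat)
importance(fit.rf)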

Model 4: KNN

K-Nearest Neighbors (KNN) is a simple and widely used supervised machine learning algorithm for both classification and regression tasks. In KNN, the output (class label or value) of an unseen data point is determined based on the majority class or average of its ‘k’ nearest neighbors in the feature space. The algorithm relies on the assumption that similar data points tend to have similar output values.

Advantages:
Simple and Easy to Implement: KNN is straightforward to understand and implement, making it a good starting point for beginners in machine learning.
No Training Phase: KNN is a lazy learning algorithm, meaning there is no explicit training phase. The model quickly adapts to new data as it arrives.
Non-parametric: KNN makes no assumptions about the underlying data distribution, making it suitable for both linear and nonlinear relationships.
No Model Complexity: Since KNN does not learn an explicit model, it can be more interpretable than complex models like neural networks.
Versatile: KNN can handle multi-class classification, regression, and even outlier detection tasks.

Disadvantages:
Computationally Intensive: Predicting the class or value for a new data point can be computationally expensive, especially with large datasets, as it requires calculating distances for all data points.
Curse of Dimensionality: KNN’s performance can degrade when dealing with high-dimensional data, as the concept of distance becomes less meaningful in higher dimensions.
Choosing the Right ‘k’: Selecting an appropriate value for ‘k’ is critical. A small ‘k’ can lead to noisy predictions, while a large ‘k’ may cause loss of local patterns.
Sensitive to Outliers: KNN is sensitive to outliers since the nearest neighbors might be heavily influenced by them.
Imbalanced Data: In classification tasks with imbalanced classes, KNN may be biased towards the majority class due to the voting mechanism.
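
A minimal sketch with the knn() function from the class package follows; the predictor matrices, the label vector, and the choice of k = 5 are assumptions.

library(class)

# Minimal sketch with assumed object names: x.train and x.test hold only the
# 49 pixel columns, and y.train is the factor of training class labels.
pred.knn <- knn(train = x.train, test = x.test, cl = y.train, k = 5)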

Model 5: Elastic Net

Elastic Net is a linear regression model that combines L1 (Lasso) and L2 (Ridge) regularization to address multicollinearity and perform feature selection. It adds both the absolute value of the coefficients (L1 regularization) and the squared value of the coefficients (L2 regularization) to the loss function, allowing it to simultaneously perform variable selection and shrinkage.

Advantages: Elastic Net is particularly useful when dealing with datasets that have a large number of predictor variables and potential multicollinearity issues. It helps prevent overfitting, provides stable and interpretable model coefficients, and automatically selects relevant features by shrinking less important coefficients towards zero.

Disadvantages: While Elastic Net is effective in many cases, it may not perform well if the relationships between the predictors and the response are highly nonlinear. Additionally, finding the optimal hyperparameters for Elastic Net can be challenging.
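
A minimal sketch with the glmnet package follows, assuming the classification task is handled with a multinomial family and a mixing parameter alpha = 0.5; the object names and the alpha value are assumptions.

library(glmnet)

# Minimal sketch with assumed object names and settings.
# alpha mixes the two penalties: alpha = 1 gives the Lasso, alpha = 0 gives
# Ridge, and intermediate values give the Elastic Net.
fit.enet <- cv.glmnet(x = as.matrix(x.train), y = y.train,
                      family = "multinomial", alpha = 0.5)

# Predicted class labels at the cross-validated lambda.
pred.enet <- predict(fit.enet, newx = as.matrix(x.test),
                     s = "lambda.min", type = "class")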

Model 6: Lasso

Lasso, short for “Least Absolute Shrinkage and Selection Operator,” is a linear regression model that uses L1 regularization to perform feature selection and shrinkage. Lasso adds the absolute value of the coefficients to the loss function, penalizing large coefficient values and encouraging some coefficients to be exactly zero.

Advantages: Lasso is particularly useful when dealing with high-dimensional datasets, as it can handle large numbers of predictor variables and identify the most relevant features. It helps prevent overfitting by shrinking less important coefficients towards zero, leading to a simpler and more interpretable model. Additionally, the sparsity introduced by Lasso makes the model suitable for feature selection, as it naturally identifies and excludes irrelevant features from the model.

Disadvantages: Lasso may not perform well if the relationships between the predictors and the response variable are highly nonlinear. Furthermore, choosing the optimal regularization parameter (lambda) for Lasso can be challenging, similar to other regularization techniques, as it requires tuning hyperparameters.
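
Under the same assumptions as the Elastic Net sketch above, the Lasso fit corresponds to setting the mixing parameter to one:

# Minimal sketch: alpha = 1 gives the pure L1 (Lasso) penalty.
fit.lasso <- cv.glmnet(x = as.matrix(x.train), y = y.train,
                       family = "multinomial", alpha = 1)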

Model 7: Ridge Regression

Ridge Regression, also known as L2 regularization, is a linear regression technique that adds the squared values of the coefficients to the loss function. It is used to prevent overfitting and reduce the impact of multicollinearity in the dataset. Ridge Regression introduces a penalty term that encourages smaller coefficients, effectively shrinking them towards zero but not exactly to zero.

Advantages: Ridge Regression is beneficial when dealing with datasets that have multicollinearity issues, where predictor variables are highly correlated. The L2 regularization helps stabilize the model by shrinking correlated coefficients, reducing the impact of collinearity, and providing more reliable predictions. Ridge Regression can also handle high-dimensional datasets with many predictors, making it suitable for scenarios where feature selection is not the primary goal.

Disadvantages: Similar to Lasso and Elastic Net, Ridge Regression has a hyperparameter that needs to be tuned, the regularization parameter (lambda), and selecting its optimal value can be challenging, requiring cross-validation or other techniques. Additionally, Ridge Regression does not perform feature selection the way Lasso does: because it rarely sets coefficients exactly to zero, it can be a limitation when feature reduction is desired.
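
Again under the same assumptions as the Elastic Net sketch, Ridge corresponds to a mixing parameter of zero:

# Minimal sketch: alpha = 0 gives the pure L2 (Ridge) penalty.
fit.ridge <- cv.glmnet(x = as.matrix(x.train), y = y.train,
                       family = "multinomial", alpha = 0)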

Model 8: XGBoost

XGBoost combines multiple decision trees to create an ensemble of models that can predict whether an instance of a dataset belongs to a certain class. At each iteration, a new tree is built that tries to correct the mistakes of the previous ensemble using gradient descent on the loss function. The prediction of the final ensemble is the weighted sum of the predictions of the individual trees.

Advantages: XGBoost is highly optimized and efficient, and it has high predictive power because of its robust handling of different types of predictor variables, its regularization properties, and its tree-building algorithm.

Disadvantages: XGBoost can be slow to train, can overfit, and is more complex to tune because there are many hyperparameters that can be adjusted.

Details of selected parameters: The parameters were tuned using a sample size of 300 and are fitted here as the final model.
nrounds - the number of boosting rounds, i.e., trees to fit.
early_stopping_rounds - training stops if the validation error does not decrease for 10 consecutive rounds.
objective - “multi:softmax” is used for multiclass classification problems.
num_class - for multiclass problems, this is set to the number of classes in the dataset.
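
A minimal sketch of the corresponding xgboost call follows. The object names, the label encoding, and the value of nrounds are assumptions, since the report names the parameters but not all of their values.

library(xgboost)

# Minimal sketch with assumed object names and values.
# "multi:softmax" expects 0-based integer class labels.
y.int  <- as.integer(y.train) - 1
dtrain <- xgb.DMatrix(data = as.matrix(x.train), label = y.int)

fit.xgb <- xgboost(data = dtrain,
                   nrounds = 100,                        # assumed value
                   early_stopping_rounds = 10,
                   objective = "multi:softmax",
                   num_class = length(levels(y.train)),
                   verbose = 0)

# Predictions are returned as 0-based class indices.
pred.xgb <- predict(fit.xgb, as.matrix(x.test))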

Model 9: SVM

SVM separates the classes by hyperplanes. The model tries to find the hyperplane which has the maximum margin, i.e., the maximum distance between data points of the classes.

Advantages: SVM works well with high-dimensional data and is memory efficient, since the decision function depends only on a subset of the training points (the support vectors).

Disadvantages: The decision of choosing a kernel function can be tricky, and the results are sensitive to the choice of kernel parameters. Noise and overlapping classes can affect the performance of SVM.

Details of selected parameters: The parameters were tuned using a sample size of 300 and are fitted here as the final model.
kernel - the radial basis function kernel is one of the most popular choices, as it can handle complex, nonlinear classification problems.
type - ‘C-classification’ refers to the type of SVM used for classification tasks.
gamma - defines how far the influence of a single training example reaches; the lower it is, the farther that influence extends and the more distant points are taken into account.
cost - a larger cost creates a narrower margin so as not to misclassify training data, but it might also overfit the data.
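
A minimal sketch of the corresponding e1071 call follows; the object names and the specific gamma and cost values are assumptions, since the tuned values are not listed in the report.

library(e1071)

# Minimal sketch with assumed object names and tuning values.
fit.svm <- svm(label ~ ., data = train.dat,
               type = "C-classification",
               kernel = "radial",
               gamma = 0.02,   # assumed value
               cost = 1)       # assumed value

pred.svm <- predict(fit.svm, newdata = test.dat)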

Model 10: Rpart

Rpart works by recursively partitioning the data based on certain criteria to create a decision tree.

Advantages: The tree structure and decision rules are easy to understand and provide information about feature importance, i.e., which variables are most influential in making predictions.

Disadvantages: Decision trees can easily overfit the data. It is also very sensitive to changes in the data. A small change can result in a very different tree.

Details of selected parameters:
method - the type of prediction problem to solve; “class” indicates a classification problem.
control - the complexity parameter (cp) controls the size of the decision tree and helps prevent overfitting; the tree-building process stops if splitting a node would improve the model’s fit by less than 0.01.
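
A minimal sketch of the corresponding rpart call follows; the object names are assumptions, and cp = 0.01 matches the 0.01 improvement threshold described above (it is also the rpart default).

library(rpart)

# Minimal sketch with assumed object names.
fit.rpart <- rpart(label ~ ., data = train.dat,
                   method = "class",
                   control = rpart.control(cp = 0.01))

# Predicted class labels for the test set.
pred.rpart <- predict(fit.rpart, newdata = test.dat, type = "class")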

Scoreboard

An overall scoreboard of average results for the 30 combinations of Model and Sample Size

      Model Sample_Size      A      B      C Points
 1:  model1        1000 0.0167 0.0288 0.2079 0.1613
 2:  model1        5000 0.0833 0.1397 0.1752 0.1579
 3:  model1       10000 0.1667 0.3403 0.1667 0.1841
 4:  model2        1000 0.0167 0.2007 0.2040 0.1756
 5:  model2        5000 0.0833 0.4016 0.1601 0.1727
 6:  model2       10000 0.1667 0.8085 0.1508 0.2189
 7:  model3        1000 0.0167 0.0170 0.2145 0.1651
 8:  model3        5000 0.0833 0.1036 0.1735 0.1530
 9:  model3       10000 0.1667 0.2634 0.1593 0.1708
10:  model4        1000 0.0167 0.0146 0.2716 0.2077
11:  model4        5000 0.0833 0.0570 0.2150 0.1794
12:  model4       10000 0.1667 0.1472 0.1921 0.1838
13:  model5        1000 0.0167 0.5259 0.2115 0.2137
14:  model5        5000 0.0833 0.0357 0.1943 0.1618
15:  model5       10000 0.1667 1.0000 0.1870 0.2653
16:  model6        1000 0.0167 0.6984 0.2276 0.2430
17:  model6        5000 0.0833 0.3654 0.1942 0.1947
18:  model6       10000 0.1667 1.0000 0.1857 0.2643
19:  model7        1000 0.0167 0.0933 0.2335 0.1869
20:  model7        5000 0.0833 0.4601 0.2244 0.2268
21:  model7       10000 0.1667 0.0187 0.2227 0.1939
22:  model8        1000 0.0167 0.0566 0.2168 0.1708
23:  model8        5000 0.0833 0.4565 0.1626 0.1801
24:  model8       10000 0.1667 0.9929 0.1456 0.2335
25:  model9        1000 0.0167 0.0228 0.2321 0.1789
26:  model9        5000 0.0833 0.1563 0.1752 0.1595
27:  model9       10000 0.1667 0.3846 0.1564 0.1807
28: model10        1000 0.0167 0.0041 0.3583 0.2716
29: model10        5000 0.0833 0.0174 0.3430 0.2715
30: model10       10000 0.1667 0.0308 0.3441 0.2862
      Model Sample_Size      A      B      C Points

Discussion

Based on the findings from our dataset and the training of 90 models using the different modeling methods, the neural network, random forest, support vector machine (SVM), and XGBoost models achieve higher classification accuracy with less running time. We suggest some possible reasons for this observation:

  1. Neural Networks:
    • Non-linear representation: Neural networks can capture complex patterns and non-linear relationships in the data, making them effective for tasks with intricate decision boundaries.
    • Parallel processing: With the right hardware and implementation, neural networks can be parallelized, which significantly speeds up training and prediction times.
    • GPU acceleration: Training neural networks on GPUs can provide a substantial performance boost, especially for large-scale datasets and complex architectures.
  2. Random Forest:
    • Ensemble method: Random forests are an ensemble learning technique that combines multiple decision trees, providing robust and accurate predictions by reducing overfitting.
    • Parallel processing: Random forests can be easily parallelized since each tree in the forest can be trained independently, leading to faster training times.
    • Feature importance: Random forests can offer insights into feature importance, allowing for feature selection and identifying key predictors in the dataset.
  3. Support Vector Machine (SVM):
    • Effective in high-dimensional spaces: SVM performs well in datasets with a high number of features, which can be challenging for other algorithms.
    • Kernel trick: By using the kernel trick, SVM can implicitly operate in higher-dimensional feature spaces, making it useful for non-linear classification tasks.
    • Efficient decision boundaries: SVM aims to find the optimal decision boundary that maximizes the margin between classes, leading to accurate predictions.
  4. XGBoost:
    • Gradient Boosting: XGBoost is an ensemble method that combines weak learners (decision trees) to create a strong predictive model, leading to high accuracy.
    • Regularization: XGBoost includes regularization techniques to control overfitting, allowing it to generalize well to new data.
    • Hardware optimization: XGBoost is designed to be computationally efficient, making use of parallel processing and other optimizations to reduce training time.

It’s important to note that the superiority of these models is based on our specific dataset and the experimental setup used in training the 90 models. Different datasets may yield different results, and the performance of each model can be influenced by factors like hyperparameter tuning and data preprocessing. Therefore, it’s always a good practice to experiment with multiple models and assess their performance on different datasets to select the most suitable model for a given task.

While we have applied some fine-tuning to models including the neural network and XGBoost, we expect to further adjust parameter settings to achieve better performance in the future.

The five combinations of Model and Sample Size with the lowest Points values:

    Model Sample_Size      A      B      C Points
1: model3        5000 0.0833 0.1036 0.1735 0.1530
2: model1        5000 0.0833 0.1397 0.1752 0.1579
3: model9        5000 0.0833 0.1563 0.1752 0.1595
4: model1        1000 0.0167 0.0288 0.2079 0.1613
5: model5        5000 0.0833 0.0357 0.1943 0.1618