In this exploratory data analysis (EDA), we seek to understand how different Artificial Intelligence (AI) algorithms, frameworks, problem types and datasets behave by analyzing key metrics such as precision, recall, F1_Score and training time. This process will help identify patterns, relationships and possible optimizations in the use of different AI tools, with the goal of improving model performance in different contexts.

The question to be addressed is:

How does the precision (Precision) of the different machine learning algorithms vary by problem type (Problem_Type)?
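Once the data are loaded, this question could be answered with a short base-R sketch like the one below. The helper `precision_by_problem` is hypothetical. Excluding missing values and values outside the documented [0, 1] range is an assumption of this sketch, because the table contains entries such as 9.7320081. The example rows are the first six rows of the dataset plus one SVM/Regression row with that out-of-range Precision. Note also that precision is, strictly speaking, a classification metric, so the values recorded for Regression and Clustering rows should be interpreted with caution.

```r
# Hypothetical helper: mean Precision per Algorithm x Problem_Type,
# keeping only non-missing values inside the documented [0, 1] range.
precision_by_problem <- function(df) {
  valid <- !is.na(df$Precision) & df$Precision >= 0 & df$Precision <= 1
  aggregate(Precision ~ Algorithm + Problem_Type, data = df[valid, ], FUN = mean)
}

# First six rows of the dataset, plus one row with an out-of-range Precision
first_rows <- data.frame(
  Algorithm    = c("SVM", "K-Means", "Neural Network", "SVM", "SVM", "K-Means", "SVM"),
  Problem_Type = c("Regression", "Clustering", "Clustering", "Clustering",
                   "Regression", "Regression", "Regression"),
  Precision    = c(0.6929447, 0.4900292, 0.5948056, 0.8424142,
                   0.6856109, 0.6255330, 9.7320081)
)

result <- precision_by_problem(first_rows)
print(result)
```

With the full data, `precision_by_problem(datos)` returns one row per algorithm/problem-type combination. `boxplot(Precision ~ Problem_Type, data = datos)` complements it by showing the spread of each group and not just its mean.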

The analysis also addresses a second question, stated in the next section. It evaluates the efficiency of the different frameworks when implementing neural networks, focusing on training time, a crucial factor when training models.

Dictionary of the variables in our dataset:

  1. Algorithm (categorical): type of AI algorithm used (‘Neural Network’, ‘Random Forest’, ‘SVM’, ‘K-Means’).

  2. Framework (categorical): framework or library used to implement the AI model (‘TensorFlow’, ‘PyTorch’, ‘Keras’, ‘Scikit-learn’).

  3. Problem_Type (categorical): type of problem addressed by the model (‘Classification’, ‘Regression’, ‘Clustering’).

  4. Dataset_Type (categorical): type of data used to train the model (‘Image’, ‘Text’, ‘Tabular’, ‘Time Series’).

  5. Accuracy (numeric, continuous): accuracy of the model on the test set (between 0 and 1).

  6. Precision (numeric, continuous): precision of the model, i.e. the proportion of predicted positives that are actually positive (between 0 and 1).

  7. Recall (numeric, continuous): sensitivity, i.e. the model's ability to correctly identify the positives (between 0 and 1).

  8. F1_Score (numeric, continuous): harmonic mean of precision and recall (between 0 and 1).

  9. Training_Time (numeric, continuous): model training time, in hours.

  10. Date (date): date on which the model evaluation was carried out, covering the last year.
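Since F1_Score is defined above as the harmonic mean of precision and recall, a small helper can recompute it. The function `f1_score` is a hypothetical illustration. Returning 0 when both inputs are 0 is a common convention assumed here, not something stated in the source.

```r
# Hypothetical helper: F1 as the harmonic mean of precision (p) and recall (r):
# F1 = 2 / (1/p + 1/r) = 2 * p * r / (p + r); assumed to be 0 when p + r == 0.
# Vectorized; missing inputs propagate as NA.
f1_score <- function(p, r) {
  ifelse(p + r > 0, 2 * p * r / (p + r), 0)
}

f1_score(0.8, 0.6)   # 0.6857143
```

Applied to the loaded data, `f1_score(datos$Precision, datos$Recall)` can be compared with the stored `F1_Score` column to check whether the dataset's values are internally consistent.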

Loading the dataset

Which framework has the lowest mean training time (Training_Time) when working with neural networks (Neural Network) on classification problems?
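After the data are loaded (next step), this question could be answered with a sketch along these lines. The helper `nn_classification_time` is hypothetical. The seven example rows are copied from the table further down: five neural-network classification rows, one neural-network clustering row and one K-Means classification row that the filter should discard. Whatever the sample returns says nothing about the full dataset. The `FUN` argument is there because the table contains very large training times for Scikit-learn neural networks on classification (20.9334352 and 28.9729934 hours), which can pull the mean. Comparing with `FUN = median` is therefore advisable.

```r
# Hypothetical helper: summary of Training_Time (hours) per Framework for
# neural networks on classification problems, sorted from lowest to highest.
nn_classification_time <- function(df, FUN = mean) {
  # %in% avoids NA comparisons turning into NA rows
  sel <- df$Algorithm %in% "Neural Network" &
         df$Problem_Type %in% "Classification" &
         !is.na(df$Training_Time)
  subset_df <- df[sel, ]
  by_fw <- tapply(subset_df$Training_Time, subset_df$Framework, FUN)
  sort(setNames(as.vector(by_fw), names(by_fw)))
}

# Seven rows copied from the table below
nn_rows <- data.frame(
  Algorithm     = c(rep("Neural Network", 6), "K-Means"),
  Framework     = c("TensorFlow", "Keras", "PyTorch", "PyTorch", "Keras", "Keras", "PyTorch"),
  Problem_Type  = c(rep("Classification", 5), "Clustering", "Classification"),
  Training_Time = c(3.5282339, 4.9864657, 4.8695748, 1.7428673, 2.5135884, 3.2825938, 0.8654122)
)

print(nn_classification_time(nn_rows))            # mean per framework
print(nn_classification_time(nn_rows, median))    # median per framework
```

On the full data, the first name of `nn_classification_time(datos)` is the framework with the lowest mean training time under this filter.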

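library(readxl)  # provides read_excel() for .xlsx files
# Local path to the Excel file (adjust it to where the file is stored)
rut <- "C:/Users/User/Documents/PROYECTOS RSTUDIO/EDA semana uribe/Dataset_IA_corte_II.xlsx"
datos <- read_excel(rut)
head(datos)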
## # A tibble: 6 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 SVM            Scikit-lea… Regression   Time Series     0.662     0.693 NA    
## 2 K-Means        Keras       Clustering   Time Series     0.744     0.490  0.877
## 3 Neural Network Keras       Clustering   Image           0.885     0.595  0.969
## 4 SVM            Keras       Clustering   Text            0.842     0.842  0.875
## 5 SVM            Scikit-lea… Regression   Tabular         0.723     0.686  0.301
## 6 K-Means        PyTorch     Regression   Image           0.637     0.626  7.45 
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
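The printed rows already show a missing Recall and a Recall of 7.45, outside the [0, 1] range stated in the dictionary. A quick check of every bounded metric can be sketched as follows. The helper `check_metric_ranges` is hypothetical, and the example uses the metric columns of the six rows shown above.

```r
# Hypothetical helper: for each metric that the dictionary defines on [0, 1],
# count missing values and values outside that range.
check_metric_ranges <- function(df,
                                cols = c("Accuracy", "Precision", "Recall", "F1_Score")) {
  data.frame(
    metric       = cols,
    missing      = vapply(cols, function(col) sum(is.na(df[[col]])),
                          integer(1), USE.NAMES = FALSE),
    out_of_range = vapply(cols, function(col) sum(df[[col]] < 0 | df[[col]] > 1, na.rm = TRUE),
                          integer(1), USE.NAMES = FALSE)
  )
}

# Metric columns of the six rows printed by head(datos)
head_metrics <- data.frame(
  Accuracy  = c(0.6618051, 0.7443216, 0.8852037, 0.8416477, 0.7229514, 0.6368133),
  Precision = c(0.6929447, 0.4900292, 0.5948056, 0.8424142, 0.6856109, 0.6255330),
  Recall    = c(NA, 0.8766533, 0.9685424, 0.8748388, 0.3010956, 7.4548096),
  F1_Score  = c(0.4426950, 0.4414046, 0.9644707, 0.7041523, 0.6456472, 0.8865271)
)
print(check_metric_ranges(head_metrics))
```

Running `check_metric_ranges(datos)` on the full data shows how many rows would need cleaning before computing group averages.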

Dataset structure

names(datos)
##  [1] "Algorithm"     "Framework"     "Problem_Type"  "Dataset_Type" 
##  [5] "Accuracy"      "Precision"     "Recall"        "F1_Score"     
##  [9] "Training_Time" "Date"
dim(datos)
## [1] 560  10
str(datos)
## tibble [560 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:560] "SVM" "K-Means" "Neural Network" "SVM" ...
##  $ Framework    : chr [1:560] "Scikit-learn" "Keras" "Keras" "Keras" ...
##  $ Problem_Type : chr [1:560] "Regression" "Clustering" "Clustering" "Clustering" ...
##  $ Dataset_Type : chr [1:560] "Time Series" "Time Series" "Image" "Text" ...
##  $ Accuracy     : num [1:560] 0.662 0.744 0.885 0.842 0.723 ...
##  $ Precision    : num [1:560] 0.693 0.49 0.595 0.842 0.686 ...
##  $ Recall       : num [1:560] NA 0.877 0.969 0.875 0.301 ...
##  $ F1_Score     : num [1:560] 0.443 0.441 0.964 0.704 0.646 ...
##  $ Training_Time: num [1:560] 4.98 NA 3.28 4.04 3.6 ...
##  $ Date         : POSIXct[1:560], format: "2023-03-08 11:26:21" "2023-03-09 11:26:21" ...
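`str()` shows that the four categorical variables are stored as character vectors. One possible next step, sketched here under the assumption that the dictionary lists every valid label, is to convert them to factors with those levels. Any label not in the dictionary then becomes `NA`, and `table()` reports zero counts for absent levels. The names `dict_levels` and `as_dictionary_factors` are hypothetical, and the example uses the categorical columns of the first three rows.

```r
# Levels taken from the variable dictionary
dict_levels <- list(
  Algorithm    = c("Neural Network", "Random Forest", "SVM", "K-Means"),
  Framework    = c("TensorFlow", "PyTorch", "Keras", "Scikit-learn"),
  Problem_Type = c("Classification", "Regression", "Clustering"),
  Dataset_Type = c("Image", "Text", "Tabular", "Time Series")
)

# Hypothetical helper: turn the categorical character columns into factors
# with the documented levels; any value outside those levels becomes NA.
as_dictionary_factors <- function(df, levels_list = dict_levels) {
  for (col in names(levels_list)) {
    df[[col]] <- factor(df[[col]], levels = levels_list[[col]])
  }
  df
}

example <- data.frame(
  Algorithm    = c("SVM", "K-Means", "Neural Network"),
  Framework    = c("Scikit-learn", "Keras", "Keras"),
  Problem_Type = c("Regression", "Clustering", "Clustering"),
  Dataset_Type = c("Time Series", "Time Series", "Image")
)
example <- as_dictionary_factors(example)
str(example)
print(table(example$Problem_Type))
```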

Full dataset

library(knitr)       # kable()
library(kableExtra)  # kable_styling(), column_spec(), scroll_box(); also re-exports %>%
kable(datos, caption = "Dataset, Artificial Intelligence (AI)") %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(2, width = "20em") %>%
  scroll_box(width = "900px", height = "450px")
Dataset, Artificial Intelligence (AI)
Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall F1_Score Training_Time Date
SVM Scikit-learn Regression Time Series 0.6618051 0.6929447 NA 0.4426950 4.9785924 2023-03-08 11:26:21
K-Means Keras Clustering Time Series 0.7443216 0.4900292 0.8766533 0.4414046 NA 2023-03-09 11:26:21
Neural Network Keras Clustering Image 0.8852037 0.5948056 0.9685424 0.9644707 3.2825938 2023-03-10 11:26:21
SVM Keras Clustering Text 0.8416477 0.8424142 0.8748388 0.7041523 4.0416289 2023-03-11 11:26:21
SVM Scikit-learn Regression Tabular 0.7229514 0.6856109 0.3010956 0.6456472 3.6039908 2023-03-12 11:26:21
K-Means PyTorch Regression Image 0.6368133 0.6255330 7.4548096 0.8865271 3.0064753 2023-03-13 11:26:21
Neural Network PyTorch Regression Text 0.9985623 0.6366858 0.3357948 0.9014956 NA 2023-03-14 11:26:21
Neural Network Scikit-learn Regression Image 0.7130907 0.6756681 0.4803251 0.5993146 2.3283453 2023-03-15 11:26:21
SVM Keras Regression Time Series NA 0.8710099 0.3416673 0.8161708 3.4064529 2023-03-16 11:26:21
Random Forest Keras Regression Text 0.5818119 0.9352508 NA 0.8626737 3.4199049 2023-03-17 11:26:21
SVM PyTorch Regression Image 0.8974048 9.7320081 0.7806129 0.7927904 1.9283008 2023-03-18 11:26:21
SVM Keras Clustering Image 0.8468411 0.8721420 0.3801413 0.4909570 4.7142907 2023-03-19 11:26:21
SVM TensorFlow Clustering Tabular 0.6103848 0.5892441 0.5686872 0.9255299 0.9200495 2023-03-20 11:26:21
SVM PyTorch Clustering Image 0.5411905 0.8128808 0.6193656 0.7234567 2.5517613 2023-03-21 11:26:21
K-Means Keras Clustering Text 0.8402497 0.6625619 0.5583371 0.5694835 3.4853315 2023-03-22 11:26:21
Neural Network PyTorch Regression Text NA 0.5528024 0.3847175 0.6551369 3.5159654 2023-03-23 11:26:21
K-Means TensorFlow Classification Tabular 0.6366298 0.9045229 0.5932635 0.4225427 3.2783309 2023-03-24 11:26:21
K-Means PyTorch Regression Text 0.9754318 0.4230558 0.8258246 0.4767201 1.4489122 2023-03-25 11:26:21
K-Means PyTorch Classification Time Series 0.5755289 0.9410572 0.3497054 0.8593281 0.8654122 2023-03-26 11:26:21
SVM PyTorch Clustering Text 0.7161674 0.6768865 0.3561260 0.4000070 3.2161076 2023-03-27 11:26:21
Random Forest PyTorch Regression Text 9.7180796 0.7823209 0.5483399 0.6499395 3.0365804 2023-03-28 11:26:21
Neural Network TensorFlow Clustering Image 0.7098637 0.7956124 0.9592080 0.7135061 0.9788445 2023-03-29 11:26:21
Random Forest Scikit-learn Regression Image 0.8192630 0.9370706 0.7680009 0.4327807 3.5551818 2023-03-30 11:26:21
K-Means PyTorch Clustering Text 0.6987972 0.7820018 0.7750690 0.9838469 2.3303428 2023-03-31 11:26:21
K-Means PyTorch Classification Tabular 0.6371076 0.7683602 0.5533440 0.5356752 3.3720269 2023-04-01 11:26:21
Random Forest Keras Classification Image 0.9919888 0.4399912 0.7155626 0.5825192 4.2030987 2023-04-02 11:26:21
Random Forest PyTorch Classification Tabular 0.7046670 0.7110448 0.3070918 0.5823655 0.9316693 2023-04-03 11:26:21
Random Forest Keras Regression Text 0.9470496 0.4901014 0.7452672 0.5382500 0.1938099 2023-04-04 11:26:21
K-Means Scikit-learn Classification Text 0.6149773 0.8424603 0.9393009 0.4008843 3.9176027 2023-04-05 11:26:21
K-Means TensorFlow Regression Time Series NA 0.7073332 0.7288014 0.8376069 3.0875174 2023-04-06 11:26:21
Neural Network TensorFlow Classification Image 0.5155670 0.8081367 0.9115890 0.9801073 3.5282339 2023-04-07 11:26:21
Neural Network Scikit-learn Classification Text 0.8258334 0.4250037 NA 0.5345761 4.2069594 2023-04-08 11:26:21
K-Means TensorFlow Regression Time Series 0.6842632 0.4508752 0.3843909 0.7978283 4.0337736 2023-04-09 11:26:21
Random Forest Scikit-learn Regression Image NA 0.8297940 0.9317173 8.4513780 4.8085087 2023-04-10 11:26:21
Random Forest Scikit-learn Classification Text 0.7366050 0.4432506 0.3465107 0.9090552 2.7260819 2023-04-11 11:26:21
Neural Network Scikit-learn Regression Text 0.9840967 0.4427540 0.6737772 0.6535775 2.4936286 2023-04-12 11:26:21
K-Means TensorFlow Classification Image 5.9276276 NA 0.3994960 0.5817585 2.0692473 2023-04-13 11:26:21
Neural Network Keras Regression Image 0.9343116 0.9739008 0.3081946 0.5951771 0.8530861 2023-04-14 11:26:21
Neural Network Scikit-learn Clustering Image 0.8882984 0.8425050 0.5954240 0.8275728 NA 2023-04-15 11:26:21
SVM PyTorch Clustering Text 0.8854609 0.6119508 0.5065285 0.8900677 1.4573614 2023-04-16 11:26:21
SVM TensorFlow Clustering Text 0.9223916 0.5779213 0.6402003 0.5089684 4.6144068 2023-04-17 11:26:21
SVM Scikit-learn Clustering Time Series 0.8805120 0.6098219 0.7040399 NA 2.9576349 2023-04-18 11:26:21
Random Forest Scikit-learn Clustering Tabular 0.8131102 0.8647921 NA 0.9411641 3.0049169 2023-04-19 11:26:21
K-Means PyTorch Classification Image 0.5656224 0.7968224 0.3861023 0.8840161 1.8391308 2023-04-20 11:26:21
K-Means Keras Clustering Text NA 0.5111173 0.6910497 0.9909150 0.3547478 2023-04-21 11:26:21
K-Means Keras Clustering Text 0.9604239 0.5044656 0.5402171 0.8525490 0.2555445 2023-04-22 11:26:21
K-Means Keras Clustering Image 0.8083252 0.4590374 0.8104214 0.6359171 2.1743422 2023-04-23 11:26:21
SVM Scikit-learn Clustering Tabular 0.8982686 0.7961816 0.7566042 0.7543827 0.5143704 2023-04-24 11:26:21
Random Forest Scikit-learn Regression Image 0.7407612 0.8586236 0.8919231 0.7966086 NA 2023-04-25 11:26:21
Random Forest Scikit-learn Classification Tabular 0.5586541 0.5590279 0.7847450 0.4470735 4.4167677 2023-04-26 11:26:21
SVM TensorFlow Clustering Text 0.5625929 0.4125670 0.6009518 0.7266982 4.4262826 2023-04-27 11:26:21
Random Forest PyTorch Clustering Tabular 0.8427826 0.4493030 0.7710766 0.8255925 3.3293225 2023-04-28 11:26:21
SVM Scikit-learn Clustering Time Series 0.7151529 0.9807160 0.4927668 0.5003928 1.1350360 2023-04-29 11:26:21
K-Means PyTorch Classification Text 0.6002624 0.5772669 0.5144194 0.8683790 4.3286845 2023-04-30 11:26:21
SVM PyTorch Regression Time Series 0.7457973 0.8615339 0.8522896 NA 4.4418578 2023-05-01 11:26:21
K-Means Keras Classification Image 0.5321045 0.7747981 NA 0.9713331 1.0636026 2023-05-02 11:26:21
K-Means TensorFlow Regression Image 0.7909857 0.6291638 0.8588662 0.4254534 3.7133878 2023-05-03 11:26:21
Neural Network Scikit-learn Clustering Image 0.6344967 0.5234124 0.8756957 0.5591957 1.5039458 2023-05-04 11:26:21
SVM Keras Clustering Tabular 0.8987796 0.4728319 0.9002936 0.7609323 4.0329375 2023-05-05 11:26:21
Neural Network Keras Classification Text 0.6551810 0.7690078 0.9416447 0.5779359 4.9864657 2023-05-06 11:26:21
SVM Scikit-learn Classification Image 0.7276101 0.8647803 0.6016897 0.8286545 0.2471274 2023-05-07 11:26:21
SVM PyTorch Regression Image 0.5058103 0.7863426 0.5232068 0.8554032 4.4970927 2023-05-08 11:26:21
Neural Network Keras Regression Text 0.5362234 0.7181813 0.7075391 0.4615096 3.1508896 2023-05-09 11:26:21
Neural Network PyTorch Classification Image 0.6962468 0.4251707 0.5598207 0.7083127 4.8695748 2023-05-10 11:26:21
SVM TensorFlow Regression Time Series 0.7399694 0.9810933 0.7207519 0.7053343 2.3785466 2023-05-11 11:26:21
Random Forest PyTorch Classification Text 0.8000103 0.8792285 0.7939101 0.6215685 4.2522009 2023-05-12 11:26:21
K-Means TensorFlow Regression Text 0.6458313 0.5756932 0.7818835 0.9597549 0.4057309 2023-05-13 11:26:21
Neural Network PyTorch Classification Text 0.8474909 0.9879822 0.5621870 0.8965038 1.7428673 2023-05-14 11:26:21
K-Means Keras Classification Time Series 0.9300612 0.7611290 0.4168021 0.8183256 0.4283810 2023-05-15 11:26:21
Random Forest Keras Classification Text 0.8899255 0.7494536 0.6013705 0.8285960 4.8791814 2023-05-16 11:26:21
Random Forest TensorFlow Regression Time Series 0.5198094 0.8488439 0.3998159 0.6770297 4.1031721 2023-05-17 11:26:21
Random Forest Scikit-learn Regression Text 0.7402535 0.8870619 0.9230679 0.9525967 4.2774824 2023-05-18 11:26:21
Neural Network TensorFlow Regression Tabular 0.5524651 0.7938872 0.5421142 0.8167573 4.6960296 2023-05-19 11:26:21
Random Forest TensorFlow Classification Image 0.6210225 0.4768574 NA 0.8373886 0.5170071 2023-05-20 11:26:21
Neural Network PyTorch Regression Tabular 0.9933313 0.6029605 0.3178134 0.9170145 NA 2023-05-21 11:26:21
Random Forest PyTorch Regression Image 0.5712478 0.9568502 0.7520757 0.5644430 0.4480710 2023-05-22 11:26:21
K-Means TensorFlow Classification Time Series 0.7494441 0.5347694 0.7458316 0.8842425 1.1328862 2023-05-23 11:26:21
K-Means Keras Clustering Tabular 0.8090779 0.6233002 0.5384229 NA 1.2238923 2023-05-24 11:26:21
SVM PyTorch Classification Text 0.8512325 0.6592461 0.3501983 0.6072052 2.3986253 2023-05-25 11:26:21
K-Means Scikit-learn Regression Image 0.7798243 0.6636430 0.5867402 0.6013663 1.4159546 2023-05-26 11:26:21
SVM PyTorch Classification Image 0.5048854 0.7677637 0.5178522 0.9871153 0.5947927 2023-05-27 11:26:21
K-Means PyTorch Clustering Tabular 0.6632307 0.9658455 0.7739844 0.9139223 0.9207063 2023-05-28 11:26:21
Neural Network Scikit-learn Classification Time Series 0.7588558 0.5444156 0.7240455 0.8207019 0.8216503 2023-05-29 11:26:21
K-Means TensorFlow Regression Tabular 0.5439332 0.4729008 0.5552156 0.8362341 4.8680763 2023-05-30 11:26:21
SVM Scikit-learn Clustering Time Series 0.6753135 0.5184823 0.4525248 NA 3.8205333 2023-05-31 11:26:21
SVM TensorFlow Regression Time Series 0.5166016 0.9321549 0.9916252 0.9682544 4.8420312 2023-06-01 11:26:21
Random Forest TensorFlow Clustering Image 0.5392892 0.7874865 0.6178011 0.6977553 2.2547284 2023-06-02 11:26:21
Neural Network TensorFlow Clustering Tabular 0.6984616 0.5715441 0.7817920 0.6283106 1.4640056 2023-06-03 11:26:21
K-Means Scikit-learn Classification Tabular 0.5663579 0.8895682 0.3983871 0.4978212 4.0116367 2023-06-04 11:26:21
Random Forest TensorFlow Classification Time Series 0.7837704 0.9168220 NA 0.8717234 1.6986442 2023-06-05 11:26:21
K-Means Keras Regression Tabular 0.8447325 0.9079086 0.3192757 0.8406664 1.5669785 2023-06-06 11:26:21
K-Means Scikit-learn Classification Text 0.9002933 0.9513559 0.6538186 0.6306130 1.2392794 2023-06-07 11:26:21
Random Forest Keras Classification Text 0.6000751 0.5513446 0.9748122 0.4151160 0.7354270 2023-06-08 11:26:21
Random Forest TensorFlow Clustering Tabular 0.5837413 0.8530252 0.5689447 0.9033984 1.3566690 2023-06-09 11:26:21
Random Forest PyTorch Classification Tabular 0.5522839 0.6763237 0.3272972 0.4068508 1.8410668 2023-06-10 11:26:21
Random Forest TensorFlow Clustering Time Series 0.8182151 0.9051991 0.3216691 0.8222199 3.4025716 2023-06-11 11:26:21
Random Forest Keras Clustering Text 8.5323786 0.8370944 NA 0.9821543 0.4065677 2023-06-12 11:26:21
K-Means Keras Clustering Text 0.5157931 0.8658685 0.4120175 0.6625968 1.1314840 2023-06-13 11:26:21
Random Forest Scikit-learn Regression Image 0.9681061 0.7936971 0.3163465 0.5409840 4.0642158 2023-06-14 11:26:21
Neural Network PyTorch Regression Time Series 5.2598564 0.5064573 0.8293495 0.8229226 0.8199174 2023-06-15 11:26:21
SVM Scikit-learn Regression Image 0.7706482 0.7270162 0.6209661 0.8902769 1.7789612 2023-06-16 11:26:21
Random Forest Keras Clustering Text 0.8545303 0.9908018 0.5024713 0.7278582 4.3368287 2023-06-17 11:26:21
Random Forest TensorFlow Regression Time Series 0.9354846 0.9624328 NA 0.9802212 NA 2023-06-18 11:26:21
K-Means PyTorch Regression Tabular 0.8570435 0.4259042 0.3812973 0.4310012 0.5054105 2023-06-19 11:26:21
Random Forest TensorFlow Classification Time Series 0.9008640 0.4988889 0.9691432 0.7028774 2.4742362 2023-06-20 11:26:21
Random Forest TensorFlow Regression Tabular 0.6697251 0.4790373 0.5197767 0.8310724 1.5818556 2023-06-21 11:26:21
Random Forest PyTorch Regression Tabular NA 0.8355879 0.9218826 0.9175843 2.8607010 2023-06-22 11:26:21
K-Means Scikit-learn Clustering Time Series 0.5400574 0.8906712 0.7220600 0.5075534 4.0386440 2023-06-23 11:26:21
Random Forest TensorFlow Clustering Tabular 0.9474083 0.5281068 0.8786952 0.8800021 0.7720276 2023-06-24 11:26:21
SVM Keras Clustering Image 0.7737962 0.7035116 0.9888092 0.7316242 2.9454255 2023-06-25 11:26:21
K-Means TensorFlow Clustering Text 0.9086489 0.9044218 0.5018838 0.6379322 2.5769929 2023-06-26 11:26:21
SVM PyTorch Clustering Tabular 0.7261591 0.8396809 0.9727948 0.4790290 0.8059861 2023-06-27 11:26:21
K-Means PyTorch Clustering Image 0.8217888 0.7253423 5.7263733 0.9191775 3.1575926 2023-06-28 11:26:21
Random Forest PyTorch Regression Tabular 0.7632013 0.7542086 0.5698583 0.4943639 1.4408380 2023-06-29 11:26:21
SVM Scikit-learn Clustering Text 0.8657948 0.7050163 0.5382710 5.8587272 NA 2023-06-30 11:26:21
K-Means TensorFlow Clustering Image NA 0.5785291 0.6789853 0.5740273 0.5031344 2023-07-01 11:26:21
Neural Network Scikit-learn Classification Text 0.5301760 0.7390132 0.4079015 NA 2.3519619 2023-07-02 11:26:21
Random Forest Scikit-learn Regression Text 0.6235516 0.8133312 0.6875980 0.8036218 1.6016710 2023-07-03 11:26:21
K-Means PyTorch Regression Image 0.5797723 0.9239937 0.6791934 0.8780088 4.1296679 2023-07-04 11:26:21
Neural Network PyTorch Regression Image 0.9358918 0.7817748 0.8333313 0.5502807 0.3791958 2023-07-05 11:26:21
K-Means Scikit-learn Clustering Image 0.6096070 0.8566729 0.8835550 0.7749245 2.1511756 2023-07-06 11:26:21
Neural Network TensorFlow Clustering Time Series 0.9879326 0.4960430 0.6083089 0.7430476 2.3463140 2023-07-07 11:26:21
Random Forest Keras Clustering Image 0.6684479 0.6769345 0.5116333 0.8996982 3.6525298 2023-07-08 11:26:21
SVM Scikit-learn Classification Image 0.5910590 4.0559897 0.4815344 0.9436522 2.9140091 2023-07-09 11:26:21
Neural Network TensorFlow Clustering Image 0.8948493 0.5480073 0.4362367 0.4072941 3.3693405 2023-07-10 11:26:21
K-Means PyTorch Classification Image 0.8293539 NA NA 0.8044120 3.9075415 2023-07-11 11:26:21
Random Forest PyTorch Regression Tabular 0.7490979 NA 0.5397136 0.4311015 4.3247946 2023-07-12 11:26:21
Neural Network TensorFlow Clustering Time Series 0.7776818 0.4595069 4.8917338 NA 1.6379899 2023-07-13 11:26:21
K-Means Keras Classification Text 0.8596009 0.6408966 0.9764922 0.5725796 2.7363113 2023-07-14 11:26:21
K-Means PyTorch Classification Image NA 0.8800426 0.6903962 0.5840660 4.2165296 2023-07-15 11:26:21
K-Means PyTorch Clustering Tabular 0.9981670 0.5224214 0.5430939 0.6117751 4.9483105 2023-07-16 11:26:21
Neural Network PyTorch Regression Tabular 0.9873966 0.7330510 0.7063273 0.7727755 44.5864462 2023-07-17 11:26:21
Neural Network TensorFlow Regression Tabular 0.8251628 0.8398428 0.3974377 0.6004300 1.9201896 2023-07-18 11:26:21
Neural Network Scikit-learn Regression Text 0.5997712 0.7695913 0.6108306 0.8396194 1.0563159 2023-07-19 11:26:21
SVM PyTorch Classification Image 0.8401141 0.5128148 0.7383640 0.6427164 2.4980234 2023-07-20 11:26:21
Neural Network Scikit-learn Classification Time Series 0.5360992 0.6132307 0.6422285 0.4410119 3.7340431 2023-07-21 11:26:21
Neural Network Keras Classification Time Series 0.5153263 0.8702751 0.5812452 0.8702559 2.5135884 2023-07-22 11:26:21
Neural Network PyTorch Classification Tabular NA 0.7325359 0.9956939 0.5714550 2.4675862 2023-07-23 11:26:21
SVM PyTorch Regression Text 0.7313115 0.4031378 0.9162203 0.6596601 4.2073291 2023-07-24 11:26:21
Neural Network Keras Clustering Text 0.9341363 0.8565945 0.7363842 0.8112663 1.8708785 2023-07-25 11:26:21
K-Means PyTorch Classification Text 0.8635845 0.4211868 0.6985642 0.5994737 4.3129939 2023-07-26 11:26:21
Neural Network Scikit-learn Classification Time Series 0.8713533 0.8474403 0.7344623 0.4339514 20.9334352 2023-07-27 11:26:21
K-Means TensorFlow Regression Tabular 7.1274667 0.5214883 0.4409185 0.6243526 1.7083422 2023-07-28 11:26:21
K-Means Scikit-learn Clustering Image NA 0.9748441 0.5765964 0.9666691 2.3245506 2023-07-29 11:26:21
K-Means TensorFlow Clustering Time Series 0.6855194 6.2076445 0.3276217 0.7850406 3.8359901 2023-07-30 11:26:21
SVM TensorFlow Regression Tabular NA 0.5961590 0.6328822 0.8028875 0.7174099 2023-07-31 11:26:21
SVM Scikit-learn Regression Tabular 5.2005460 0.4893328 0.6801172 0.7793693 1.0624542 2023-08-01 11:26:21
SVM Scikit-learn Clustering Image NA 0.5833625 0.4594248 0.5193953 4.7620796 2023-08-02 11:26:21
Neural Network Scikit-learn Classification Image 0.7893377 0.9259905 0.9748202 0.6510003 0.9599090 2023-08-03 11:26:21
K-Means Scikit-learn Clustering Image 0.7193077 0.9978006 9.3661823 0.8505639 2.8818639 2023-08-04 11:26:21
SVM Keras Regression Text 0.8626288 0.6209857 0.8055003 0.4608237 2.9386409 2023-08-05 11:26:21
SVM PyTorch Classification Image 0.7433345 0.6691664 0.6733706 0.5667117 2.4991599 2023-08-06 11:26:21
Neural Network TensorFlow Classification Tabular 0.9367116 0.8332426 0.9089784 0.5657915 3.2592517 2023-08-07 11:26:21
SVM Scikit-learn Regression Image 0.9503509 0.9317175 0.3914566 0.6592114 1.2261510 2023-08-08 11:26:21
Neural Network Scikit-learn Regression Tabular 0.7108605 0.7558266 0.8533569 0.9882212 2.8080451 2023-08-09 11:26:21
Random Forest Scikit-learn Regression Time Series 0.6384139 0.6349154 0.3873746 0.4405015 1.9236490 2023-08-10 11:26:21
SVM Keras Clustering Text 0.7961752 0.6475731 0.8559475 0.7112206 3.3421696 2023-08-11 11:26:21
Random Forest Scikit-learn Clustering Time Series 0.9561817 0.8173709 0.4930373 0.5076188 0.7919954 2023-08-12 11:26:21
Neural Network Scikit-learn Classification Time Series NA 0.4019310 0.9139634 0.9824059 28.9729934 2023-08-13 11:26:21
K-Means Scikit-learn Regression Tabular 0.8114833 0.7717536 0.9608295 0.4679821 1.0078247 2023-08-14 11:26:21
SVM PyTorch Classification Time Series 0.8157801 0.6132958 0.4041572 0.6421606 NA 2023-08-15 11:26:21
Neural Network Keras Classification Tabular 0.8665565 0.8765184 0.6238729 0.8427310 1.1716780 2023-08-16 11:26:21
K-Means Scikit-learn Classification Image NA 0.4557944 0.9866912 0.8227327 0.9959051 2023-08-17 11:26:21
K-Means TensorFlow Classification Image NA 0.7529214 0.6383852 0.6536372 4.1459837 2023-08-18 11:26:21
Random Forest TensorFlow Regression Tabular 0.9545163 0.6885837 0.9044833 0.6079145 1.4999671 2023-08-19 11:26:21
Neural Network TensorFlow Classification Image 0.5898416 0.7853953 0.7121121 0.6385674 4.6429065 2023-08-20 11:26:21
K-Means Keras Regression Text 0.6187717 0.4389122 0.5627309 0.5585658 4.8526400 2023-08-21 11:26:21
SVM Keras Regression Time Series 0.9856975 NA 0.5000485 0.5231998 2.8991763 2023-08-22 11:26:21
SVM TensorFlow Clustering Text 0.5904885 0.7368908 0.4422562 0.6898238 0.8007055 2023-08-23 11:26:21
Random Forest Scikit-learn Regression Time Series 0.9271925 0.7363961 0.8332587 0.5611203 1.9352864 2023-08-24 11:26:21
K-Means Keras Clustering Image 0.7461389 0.7620926 0.5705784 0.5724770 4.0088881 2023-08-25 11:26:21
Neural Network TensorFlow Regression Text 0.6236155 0.8058808 0.6578928 0.7940536 1.9002146 2023-08-26 11:26:21
SVM TensorFlow Regression Image 0.9353750 0.8829934 0.6446278 0.9811224 0.5263844 2023-08-27 11:26:21
K-Means Scikit-learn Regression Time Series 0.7226526 0.5618924 0.7040953 0.7621823 2.8282461 2023-08-28 11:26:21
K-Means PyTorch Clustering Tabular 0.7574087 0.8950296 0.9059040 0.4461877 4.2410922 2023-08-29 11:26:21
Random Forest Keras Clustering Tabular 0.6796167 0.6989534 0.9865175 0.4453502 NA 2023-08-30 11:26:21
SVM Scikit-learn Classification Text 0.7964754 NA 0.5853089 0.9708539 0.9581034 2023-08-31 11:26:21
SVM TensorFlow Clustering Image 0.5817619 0.4351306 0.8792632 0.5783745 NA 2023-09-01 11:26:21
Neural Network TensorFlow Clustering Image 0.6955408 0.6005430 0.8351695 0.4552402 1.1804709 2023-09-02 11:26:21
SVM Scikit-learn Clustering Text 0.9847062 0.8709382 NA 0.7594268 1.1692169 2023-09-03 11:26:21
Neural Network Keras Clustering Tabular NA 0.8246086 0.9692330 0.7741893 4.3829517 2023-09-04 11:26:21
SVM PyTorch Classification Time Series 0.8283683 0.8731690 0.4403322 0.7891029 1.3233779 2023-09-05 11:26:21
Random Forest PyTorch Clustering Text 0.6625950 0.7103614 0.3764849 0.5604412 1.3899120 2023-09-06 11:26:21
SVM TensorFlow Regression Time Series 0.8867366 0.6641194 0.8977734 0.4090664 0.1032016 2023-09-07 11:26:21
Neural Network Keras Regression Tabular 0.5654368 0.4884715 0.6074049 0.9790092 4.3662783 2023-09-08 11:26:21
Neural Network PyTorch Clustering Image 0.9849105 0.5969157 0.8928782 0.5505358 3.9837146 2023-09-09 11:26:21
Random Forest Keras Clustering Tabular NA 0.6604116 0.9251631 0.8056158 3.1739118 2023-09-10 11:26:21
SVM Keras Regression Text 0.6180252 0.4531603 0.3437203 0.8239779 3.7763022 2023-09-11 11:26:21
SVM TensorFlow Regression Image NA 0.5323672 0.9184253 0.7660045 0.8450380 2023-09-12 11:26:21
Random Forest PyTorch Classification Text 0.5848790 0.7589352 0.6138233 0.5877444 2.3457856 2023-09-13 11:26:21
SVM Scikit-learn Regression Text 0.7598870 0.8413979 NA 0.5626578 1.8209750 2023-09-14 11:26:21
SVM TensorFlow Regression Tabular 0.6685016 0.9990085 0.7386148 0.7586010 0.5595050 2023-09-15 11:26:21
Neural Network PyTorch Clustering Text 0.9144417 0.9598680 0.9484678 NA 2.4820394 2023-09-16 11:26:21
SVM PyTorch Regression Tabular NA 0.7855391 0.3133813 0.9680402 4.6116243 2023-09-17 11:26:21
SVM PyTorch Classification Image 0.6243571 0.6527488 0.6337904 0.4635435 0.2962897 2023-09-18 11:26:21
Random Forest PyTorch Regression Tabular 0.8085725 0.7817064 0.7814054 0.4928972 1.5281585 2023-09-19 11:26:21
Random Forest PyTorch Clustering Image 0.8533886 0.8713910 0.8058949 0.9668418 1.1169506 2023-09-20 11:26:21
K-Means Keras Clustering Tabular 0.5835210 0.4710017 0.7847727 0.8419211 1.2669186 2023-09-21 11:26:21
Neural Network TensorFlow Regression Time Series 0.5838096 0.6459429 0.3941046 0.9297963 4.5513293 2023-09-22 11:26:21
SVM Keras Clustering Time Series 0.5183357 0.9038814 0.5095769 0.5215796 2.3935384 2023-09-23 11:26:21
SVM Scikit-learn Classification Text 0.8682010 0.6302998 0.5511009 0.7525515 2.3848671 2023-09-24 11:26:21
K-Means PyTorch Classification Tabular 0.8319023 0.7431234 0.8631060 0.8206838 3.8268206 2023-09-25 11:26:21
SVM TensorFlow Regression Tabular 0.7373154 0.7526616 0.4951319 0.8080671 0.8570118 2023-09-26 11:26:21
Neural Network PyTorch Regression Image 0.9220852 0.5106858 0.4474935 0.6448910 2.4876002 2023-09-27 11:26:21
K-Means TensorFlow Regression Text 0.9028351 0.6173413 0.9702136 0.4092369 2.2069275 2023-09-28 11:26:21
Neural Network Keras Clustering Tabular 0.7926772 0.6007068 0.3062043 0.7497556 3.0248624 2023-09-29 11:26:21
K-Means TensorFlow Classification Text 0.9341356 0.4157180 0.9984746 0.5518609 4.9978327 2023-09-30 11:26:21
K-Means Scikit-learn Classification Time Series 0.6029206 4.1451506 7.7377491 0.6701525 3.8702996 2023-10-01 11:26:21
Random Forest Scikit-learn Clustering Tabular 0.5559598 0.8990182 NA 0.9745486 2.0495419 2023-10-02 11:26:21
Neural Network TensorFlow Classification Tabular 0.6348748 0.5638425 0.5062336 0.6394212 4.1550100 2023-10-03 11:26:21
SVM TensorFlow Regression Text 0.5285434 0.7108473 0.3100207 0.9038810 0.9364710 2023-10-04 11:26:21
SVM TensorFlow Regression Tabular 0.7655848 0.5792353 0.8165087 5.1312436 0.2492721 2023-10-05 11:26:21
Neural Network TensorFlow Regression Text 0.9683028 0.9644075 0.8839012 0.8034763 1.1017970 2023-10-06 11:26:21
SVM Scikit-learn Classification Tabular 0.5196718 0.5555781 0.8183333 0.9862042 1.7680166 2023-10-07 11:26:21
SVM PyTorch Regression Image 0.5610550 0.6577941 0.3999952 0.4611359 NA 2023-10-08 11:26:21
Neural Network TensorFlow Clustering Text 0.7260995 0.9236382 0.8273995 0.4049920 3.1141328 2023-10-09 11:26:21
K-Means TensorFlow Clustering Tabular 0.9669375 0.9051601 0.8382459 0.6601497 4.5619680 2023-10-10 11:26:21
Neural Network Scikit-learn Classification Time Series 0.6580781 0.5116609 0.7609784 0.4555753 2.5982846 2023-10-11 11:26:21
K-Means Scikit-learn Clustering Tabular 0.7536174 0.8815860 0.8362812 0.8490306 2.5562507 2023-10-12 11:26:21
SVM Keras Clustering Time Series 0.5207864 0.6749121 0.8921450 0.9487292 0.3461907 2023-10-13 11:26:21
SVM TensorFlow Classification Time Series NA 0.6897813 0.7295229 0.6604126 0.2710666 2023-10-14 11:26:21
SVM Keras Regression Image 0.9933151 0.4800880 0.3620233 0.5552270 2.8006838 2023-10-15 11:26:21
Random Forest Scikit-learn Regression Text 0.9825593 0.4483609 0.6413395 0.6606420 2.2470752 2023-10-16 11:26:21
K-Means Scikit-learn Regression Tabular NA 0.8367636 0.3543545 0.8340687 4.2119839 2023-10-17 11:26:21
Random Forest PyTorch Classification Image 0.9759059 0.6978767 0.5852801 0.4054325 0.8873300 2023-10-18 11:26:21
Random Forest TensorFlow Clustering Tabular 0.8195600 0.6621104 NA 0.7536724 0.2223611 2023-10-19 11:26:21
Neural Network Keras Regression Time Series 0.9339591 0.8377049 0.3462069 0.7679750 2.3002900 2023-10-20 11:26:21
Random Forest Scikit-learn Regression Text 0.7273699 0.8593077 0.5441744 0.7826129 1.2636449 2023-10-21 11:26:21
Neural Network Scikit-learn Classification Tabular 0.7577980 0.4953449 0.3776987 0.5452132 0.3431104 2023-10-22 11:26:21
Neural Network Keras Regression Tabular 0.7444233 0.7661351 0.8657646 0.8284316 3.6505316 2023-10-23 11:26:21
Random Forest PyTorch Classification Time Series 0.8334321 0.4812124 0.9633816 NA 0.6468057 2023-10-24 11:26:21
K-Means Scikit-learn Clustering Tabular 0.5698256 0.8508251 0.3506215 0.5195622 3.0807656 2023-10-25 11:26:21
K-Means TensorFlow Classification Tabular 0.5149868 0.7941731 0.9685806 0.9264820 1.4771438 2023-10-26 11:26:21
K-Means Scikit-learn Classification Time Series 0.6539650 0.9739688 0.6658036 0.8432339 0.9501306 2023-10-27 11:26:21
K-Means TensorFlow Classification Image 0.8523404 0.4413748 0.5096960 0.4082473 1.9607825 2023-10-28 11:26:21
K-Means Keras Classification Time Series 0.6009267 NA 0.3538035 0.5490258 4.0270843 2023-10-29 11:26:21
Random Forest TensorFlow Clustering Image 0.8367162 0.5693122 0.6504370 0.5286439 2.0212585 2023-10-30 11:26:21
Random Forest Keras Clustering Text 0.9849560 0.5570234 0.8561609 0.5624838 3.7787713 2023-10-31 11:26:21
SVM TensorFlow Regression Image NA 0.5481873 0.7949605 0.5485359 0.7148829 2023-11-01 11:26:21
K-Means Keras Clustering Image 0.8363011 0.9437527 0.3351582 0.4375158 3.8865800 2023-11-02 11:26:21
Random Forest TensorFlow Clustering Text 0.7218751 0.5497277 0.3510313 0.6753643 1.2609733 2023-11-03 11:26:21
SVM PyTorch Regression Image 0.9340711 0.5631698 0.5820113 0.8396401 3.4192263 2023-11-04 11:26:21
K-Means TensorFlow Clustering Time Series 0.5885749 0.8556390 0.5067033 0.7640390 2.8723296 2023-11-05 11:26:21
Neural Network TensorFlow Classification Image 0.8463130 0.6698439 0.4626690 0.8037230 4.6528091 2023-11-06 11:26:21
SVM PyTorch Classification Text NA 0.8660263 0.4967031 0.4486895 1.9978803 2023-11-07 11:26:21
Random Forest Scikit-learn Regression Time Series 0.9723071 0.4392197 0.8624379 0.9708944 0.4242094 2023-11-08 11:26:21
Neural Network Scikit-learn Regression Text 0.8416240 0.6925427 0.9504596 0.9030951 0.1938619 2023-11-09 11:26:21
Neural Network TensorFlow Regression Text 0.7485874 0.4201682 0.5835719 0.8830542 4.1516069 2023-11-10 11:26:21
Neural Network PyTorch Classification Tabular 0.8089236 0.4375919 0.9342777 0.8937903 2.6712046 2023-11-11 11:26:21
SVM TensorFlow Clustering Tabular 0.9344525 0.9438625 0.5250470 0.9596263 3.8986959 2023-11-12 11:26:21
Random Forest Keras Classification Text 0.7853049 0.4835472 0.6335059 0.7265524 1.2486854 2023-11-13 11:26:21
Neural Network Keras Regression Text 0.5151935 0.7194524 0.4582203 0.5201692 1.7909104 2023-11-14 11:26:21
K-Means TensorFlow Clustering Text 0.9654743 NA 0.7483332 0.7700702 0.2475090 2023-11-15 11:26:21
Neural Network Scikit-learn Clustering Tabular 0.8447634 0.6084060 0.9852868 0.8457289 4.8113124 2023-11-16 11:26:21
Neural Network Scikit-learn Classification Tabular 0.8382567 0.9399000 0.7224452 0.8427504 3.3725527 2023-11-17 11:26:21
SVM TensorFlow Regression Image 0.6078376 0.4130940 0.5504699 0.7128694 4.6727049 2023-11-18 11:26:21
SVM TensorFlow Classification Image 8.2944274 0.7982738 0.7534722 0.4410752 1.4009324 2023-11-19 11:26:21
Random Forest Keras Clustering Text 0.6969322 0.9780367 0.3860445 0.6226678 3.0961717 2023-11-20 11:26:21
K-Means Scikit-learn Classification Time Series 0.8256165 0.7361009 0.9220614 0.9524600 NA 2023-11-21 11:26:21
SVM TensorFlow Classification Time Series 0.5532965 0.9620935 0.6521588 0.7506693 1.6559935 2023-11-22 11:26:21
Neural Network Keras Regression Tabular 0.8289227 0.4313547 0.6145448 0.7229992 4.2557353 2023-11-23 11:26:21
Random Forest Scikit-learn Regression Tabular 0.9997069 0.6512760 0.7101054 0.5613121 4.7410912 2023-11-24 11:26:21
Neural Network PyTorch Clustering Tabular 0.5241060 0.5560947 0.7373487 0.6210694 44.3579008 2023-11-25 11:26:21
Neural Network Keras Classification Text 0.9885871 0.8384926 0.3502431 0.9372077 3.7214282 2023-11-26 11:26:21
SVM Scikit-learn Classification Time Series 0.7034540 0.9887783 0.7778321 0.7998778 1.4595769 2023-11-27 11:26:21
Random Forest Scikit-learn Classification Image 0.9353767 0.5539180 0.4693522 0.8724322 1.4799186 2023-11-28 11:26:21
K-Means Keras Clustering Text 0.8911927 0.7925048 0.7997668 0.6726073 4.8204656 2023-11-29 11:26:21
K-Means PyTorch Clustering Image 0.7835081 0.5188586 0.8757744 0.7781483 0.1503940 2023-11-30 11:26:21
SVM Keras Regression Image 0.8692246 0.7391982 0.8627710 0.5490302 3.6086449 2023-12-01 11:26:21
SVM Scikit-learn Regression Text 0.9392578 0.6783595 NA 0.8232765 3.5606062 2023-12-02 11:26:21
Random Forest TensorFlow Regression Image 0.7020702 0.9832032 0.6641189 0.6565608 3.1511788 2023-12-03 11:26:21
K-Means TensorFlow Classification Time Series 0.6635166 0.7651164 0.4000132 0.6655273 4.9515333 2023-12-04 11:26:21
K-Means Keras Classification Image 0.8337967 0.6097038 0.8427423 0.7895934 1.6282092 2023-12-05 11:26:21
K-Means Scikit-learn Regression Time Series 0.9039230 0.4684575 0.4899866 0.9617684 1.7658621 2023-12-06 11:26:21
Neural Network TensorFlow Classification Time Series 0.8811426 0.4907481 0.6476868 0.4384038 0.4868031 2023-12-07 11:26:21
K-Means Scikit-learn Regression Time Series 0.8989068 0.5351902 0.4989919 0.8948459 2.2697109 2023-12-08 11:26:21
Neural Network Keras Clustering Tabular 0.7177917 0.5505800 0.3936799 0.5754299 13.7913778 2023-12-09 11:26:21
Random Forest TensorFlow Regression Image 0.9089171 0.9103696 0.7406904 0.6663517 1.7825948 2023-12-10 11:26:21
Neural Network Scikit-learn Clustering Text 0.5601045 0.7367337 0.3380324 0.4131480 4.1894018 2023-12-11 11:26:21
Random Forest TensorFlow Classification Text 0.7722445 0.7140345 0.8240517 0.5806276 46.8387412 2023-12-12 11:26:21
K-Means Keras Regression Image NA 0.4688613 0.5223108 0.7015787 1.0100921 2023-12-13 11:26:21
K-Means Scikit-learn Classification Tabular 0.6622929 0.9160838 0.3000943 0.4337056 1.9252938 2023-12-14 11:26:21
Random Forest TensorFlow Clustering Tabular 0.6832308 0.8336886 0.6577904 0.6946574 4.6502898 2023-12-15 11:26:21
SVM Keras Clustering Time Series 0.6980863 0.4406010 NA 0.9562664 0.4028615 2023-12-16 11:26:21
Random Forest TensorFlow Clustering Image NA 0.8247011 0.4933187 0.4632359 0.5525666 2023-12-17 11:26:21
SVM TensorFlow Classification Image 0.6942791 0.7261229 0.7948835 0.8586644 0.8976758 2023-12-18 11:26:21
Neural Network PyTorch Classification Text 0.7243468 0.4490352 3.4388274 0.6458026 3.0165429 2023-12-19 11:26:21
Neural Network PyTorch Classification Image NA 0.6749804 0.8875369 0.7931042 0.8373141 2023-12-20 11:26:21
Neural Network Keras Classification Tabular 0.6866259 NA 0.3026739 0.5561420 4.8435743 2023-12-21 11:26:21
K-Means PyTorch Clustering Image 0.6136348 0.4994647 0.4727767 0.4956954 2.2889012 2023-12-22 11:26:21
Neural Network Keras Clustering Text 0.5365980 9.6741889 0.8186328 0.4962776 2.5574380 2023-12-23 11:26:21
SVM PyTorch Regression Time Series 0.8017243 0.9099852 0.5213891 0.4422952 1.3086272 2023-12-24 11:26:21
Neural Network TensorFlow Regression Text 0.8341064 0.8014134 0.3713247 0.5113880 2.4077052 2023-12-25 11:26:21
Random Forest Keras Classification Text 0.8097452 NA 0.5521637 0.7985316 3.3433768 2023-12-26 11:26:21
Random Forest Keras Clustering Time Series 0.7317470 0.6470593 0.4892753 0.9290144 3.7807680 2023-12-27 11:26:21
K-Means TensorFlow Clustering Text 0.6898929 0.7905841 0.8898983 0.8884754 3.7939536 2023-12-28 11:26:21
Random Forest Keras Clustering Text 0.9316668 0.7272591 0.5193435 NA NA 2023-12-29 11:26:21
SVM Keras Regression Image 0.7595409 0.4373639 0.8522527 0.4662591 4.5313629 2023-12-30 11:26:21
Neural Network PyTorch Regression Image 0.7395909 0.7075016 0.9243105 0.5735125 NA 2023-12-31 11:26:21
K-Means Keras Clustering Time Series 0.5128210 0.8838422 0.6036722 0.5858841 3.8095358 2024-01-01 11:26:21
Neural Network Scikit-learn Clustering Image 0.6706239 0.6755439 0.9369602 5.4997417 0.3724671 2024-01-02 11:26:21
Neural Network TensorFlow Clustering Image NA 0.4311739 0.5641226 0.7090034 0.1337020 2024-01-03 11:26:21
SVM PyTorch Regression Text 0.6994114 0.8717669 0.9748549 0.7213323 1.1409917 2024-01-04 11:26:21
Random Forest PyTorch Clustering Tabular 7.9008618 0.5208183 0.3625027 0.6141316 3.3512656 2024-01-05 11:26:21
Random Forest Scikit-learn Clustering Image 0.7668013 0.5551725 0.7809135 0.6122735 2.1148677 2024-01-06 11:26:21
Neural Network TensorFlow Classification Tabular 0.8039525 0.4988238 0.6456698 0.8971334 2.0717907 2024-01-07 11:26:21
K-Means TensorFlow Classification Image 0.8824416 0.5981290 0.5713542 0.8735757 4.4344758 2024-01-08 11:26:21
Random Forest PyTorch Regression Image 0.9064929 0.8540509 0.7428983 0.5846775 4.4882580 2024-01-09 11:26:21
K-Means Keras Clustering Time Series 0.8590615 0.7116315 0.7927000 0.9482731 4.5549588 2024-01-10 11:26:21
Random Forest PyTorch Clustering Time Series 0.9777618 NA 0.3030543 0.9716890 1.6380026 2024-01-11 11:26:21
K-Means TensorFlow Classification Time Series 0.5091163 0.9266980 0.4168672 0.5960455 3.4861735 2024-01-12 11:26:21
SVM Scikit-learn Regression Tabular 5.9788899 0.9277491 0.7991322 0.6126550 1.4310026 2024-01-13 11:26:21
K-Means PyTorch Classification Text 0.5037814 0.9223471 0.7664698 0.7033805 1.0339882 2024-01-14 11:26:21
SVM TensorFlow Classification Image 0.8237374 5.4327773 0.9762333 0.9646725 1.0046990 2024-01-15 11:26:21
SVM PyTorch Classification Image 9.4901527 0.6707436 0.8327265 0.9257917 NA 2024-01-16 11:26:21
K-Means Keras Classification Image NA 0.9909938 0.9655409 0.4615408 2.2067923 2024-01-17 11:26:21
SVM PyTorch Clustering Tabular 0.9635173 0.8632075 0.7917784 0.6356384 4.1716509 2024-01-18 11:26:21
Neural Network PyTorch Clustering Time Series 0.5301337 0.4163005 0.5086365 0.7320227 0.6874664 2024-01-19 11:26:21
SVM TensorFlow Classification Text 0.9672180 0.4391228 0.3737554 0.7018795 3.7011367 2024-01-20 11:26:21
Random Forest TensorFlow Regression Tabular 0.6758113 0.6783588 0.8472767 0.5163178 2.7041291 2024-01-21 11:26:21
K-Means TensorFlow Regression Time Series 0.5507104 0.9455321 0.7509045 0.9152901 1.5109322 2024-01-22 11:26:21
Neural Network PyTorch Regression Tabular 0.7429359 0.7232211 0.3337302 0.8061645 2.5148160 2024-01-23 11:26:21
K-Means Keras Regression Text 0.6283883 0.6986875 0.5520353 0.9027450 1.5697161 2024-01-24 11:26:21
Random Forest Scikit-learn Classification Time Series 0.6424365 0.4632842 0.9697599 0.9152586 3.0208016 2024-01-25 11:26:21
Random Forest Scikit-learn Clustering Tabular 0.6536450 0.7940681 0.6502808 NA 2.2258352 2024-01-26 11:26:21
Random Forest TensorFlow Regression Text 0.9015129 8.9326190 0.6029104 0.6635264 0.9055871 2024-01-27 11:26:21
SVM TensorFlow Classification Image 0.7695806 0.6282520 0.6203897 0.7663281 0.6712571 2024-01-28 11:26:21
SVM PyTorch Regression Text 0.6556538 0.8653671 0.4462179 0.4962192 2.7788088 2024-01-29 11:26:21
K-Means Scikit-learn Regression Image 0.8051669 0.9786860 0.5580950 0.8041873 4.5218239 2024-01-30 11:26:21
K-Means TensorFlow Classification Tabular 0.8580753 0.5222599 0.5588693 0.5075520 1.7853927 2024-01-31 11:26:21
Neural Network Keras Clustering Time Series 0.6363120 0.7139978 0.3366485 0.8163698 3.6917005 2024-02-01 11:26:21
Neural Network Keras Clustering Time Series 0.7067746 0.5722828 0.8372973 0.5377589 3.3274406 2024-02-02 11:26:21
K-Means TensorFlow Classification Tabular 0.5609430 0.8757127 0.5915531 0.4705308 4.6646191 2024-02-03 11:26:21
SVM TensorFlow Clustering Time Series 0.5905747 0.7465560 0.8755259 0.4991707 4.1228070 2024-02-04 11:26:21
Random Forest Scikit-learn Classification Time Series NA 0.7807495 0.8952436 0.4011953 2.8763658 2024-02-05 11:26:21
K-Means Scikit-learn Classification Image 0.5907192 0.8787485 0.4483958 0.8312438 3.3194494 2024-02-06 11:26:21
Neural Network PyTorch Classification Tabular 0.7625817 0.6375823 0.7601474 0.8394406 4.5020937 2024-02-07 11:26:21
SVM Scikit-learn Clustering Text 0.8545231 0.9490540 0.6305973 0.7089600 2.0576420 2024-02-08 11:26:21
K-Means Keras Classification Image NA 0.7198173 0.9161097 0.4968177 1.7012941 2024-02-09 11:26:21
K-Means Keras Regression Tabular 0.7836561 0.4947729 0.4510185 0.4501326 0.1529743 2024-02-10 11:26:21
SVM TensorFlow Clustering Image 0.6282814 0.8175395 0.7744692 0.4114774 4.1501003 2024-02-11 11:26:21
K-Means Scikit-learn Clustering Text 0.9814634 0.8759568 0.7254265 0.4994489 4.0248054 2024-02-12 11:26:21
SVM PyTorch Clustering Text 0.7417728 NA 0.5067110 0.9345133 0.6118271 2024-02-13 11:26:21
Random Forest Keras Clustering Tabular 0.9029963 0.9143076 0.3956206 0.5449221 2.9266012 2024-02-14 11:26:21
SVM Keras Regression Text 0.7751133 0.9436860 0.7561478 0.6125628 2.3735103 2024-02-15 11:26:21
SVM PyTorch Clustering Image 0.5217063 0.5661427 0.8170182 4.6320729 0.6816746 2024-02-16 11:26:21
SVM PyTorch Clustering Text 0.8165757 0.9901129 0.5209391 0.5334168 4.9047843 2024-02-17 11:26:21
K-Means Keras Classification Image 0.9757017 0.4844269 0.7513828 0.7115347 1.1520098 2024-02-18 11:26:21
K-Means Keras Clustering Image 0.8008059 0.5212094 5.7659164 0.7646516 0.4294475 2024-02-19 11:26:21
SVM PyTorch Regression Text 0.9095944 0.5105349 0.7992448 0.5472114 3.0147522 2024-02-20 11:26:21
K-Means Keras Regression Tabular 0.9421032 0.9363938 0.4394507 0.4346393 3.7204467 2024-02-21 11:26:21
Neural Network TensorFlow Regression Time Series 0.6140399 0.7925755 0.9231505 0.6346197 0.2589729 2024-02-22 11:26:21
Neural Network TensorFlow Regression Text 0.6060224 0.4912626 0.5011867 0.5405220 3.3277164 2024-02-23 11:26:21
K-Means PyTorch Clustering Image 0.8054905 0.6641941 0.5574502 0.5317324 2.7082302 2024-02-24 11:26:21
K-Means Scikit-learn Clustering Image 0.7055142 0.7691788 0.3406644 0.9759177 0.6059985 2024-02-25 11:26:21
SVM PyTorch Regression Text 0.9199307 0.4500785 0.3780585 0.7697803 0.9453670 2024-02-26 11:26:21
Random Forest PyTorch Regression Tabular 0.9500116 0.9294498 0.6611033 0.7341271 2.8830741 2024-02-27 11:26:21
K-Means Scikit-learn Clustering Tabular 0.6767107 0.8821621 0.4873149 NA 1.5521523 2024-02-28 11:26:21
Neural Network Scikit-learn Regression Image 0.6184353 0.7031241 0.8848362 0.6573666 4.6977471 2024-02-29 11:26:21
SVM Scikit-learn Classification Image 0.8902628 0.9802760 0.3102854 0.7245430 4.1118271 2024-03-01 11:26:21
K-Means TensorFlow Regression Tabular 0.6374030 0.6506566 0.5653658 8.1785788 4.9194793 2024-03-02 11:26:21
Neural Network Keras Regression Time Series 0.9113072 0.9904662 0.5361419 0.8212877 1.3723871 2024-03-03 11:26:21
Neural Network TensorFlow Classification Text 0.7118691 0.8007520 0.3135293 0.5030163 4.8516458 2024-03-04 11:26:21
Random Forest Keras Classification Image 0.8337749 0.7808028 0.3870579 0.7000677 2.2130460 2024-03-05 11:26:21
SVM TensorFlow Clustering Image 0.5477677 0.4995729 0.5895499 0.6471749 1.8028421 2024-03-06 11:26:21
SVM Keras Clustering Tabular 0.8119297 0.9291567 0.6450052 0.9223162 0.3467127 2024-03-07 11:26:21
Random Forest TensorFlow Classification Text NA 0.6564938 0.5830028 0.7788514 0.3585498 2024-03-08 11:26:21
Random Forest Scikit-learn Classification Tabular 0.7933042 0.4973400 0.6716564 0.7195024 3.4908552 2024-03-09 11:26:21
SVM PyTorch Clustering Text 0.5840071 4.0756451 0.7165922 0.4692368 2.3438521 2024-03-10 11:26:21
SVM TensorFlow Regression Text 0.8684369 0.7358534 0.3069467 0.7633827 1.2099560 2024-03-11 11:26:21
K-Means Keras Classification Text 0.9313985 0.7164398 0.6248666 0.4703223 3.1084545 2024-03-12 11:26:21
K-Means Scikit-learn Clustering Image 0.6083699 0.8316122 0.9744495 0.6023239 1.3390048 2024-03-13 11:26:21
Random Forest TensorFlow Regression Time Series 0.5478573 0.9341548 0.6633226 0.4857052 2.9303956 2024-03-14 11:26:21
K-Means TensorFlow Classification Tabular 0.5118193 0.4476440 0.7742796 0.8152442 1.8601229 2024-03-15 11:26:21
Neural Network Scikit-learn Regression Time Series 0.8209858 0.8388979 0.5183047 0.5237513 4.1354059 2024-03-16 11:26:21
K-Means TensorFlow Clustering Tabular 0.8035470 0.5124472 0.8417940 0.6351156 4.1214112 2024-03-17 11:26:21
K-Means PyTorch Classification Text 0.7733487 0.9149062 0.8410450 9.3740487 2.4390170 2024-03-18 11:26:21
Neural Network TensorFlow Regression Image 0.6159735 0.8914381 0.6649072 0.5225898 1.8190136 2024-03-19 11:26:21
Random Forest PyTorch Classification Time Series 0.6954530 0.7244763 0.9832097 0.7046830 1.8765430 2024-03-20 11:26:21
Neural Network PyTorch Regression Image 0.7972382 0.8261457 0.3878852 0.6515630 4.0480012 2024-03-21 11:26:21
K-Means Scikit-learn Clustering Tabular 0.7483834 0.5886101 0.3118634 0.4108744 1.7080795 2024-03-22 11:26:21
Random Forest TensorFlow Clustering Image 0.9938928 0.6827007 0.8391107 0.8756549 1.1243909 2024-03-23 11:26:21
K-Means Scikit-learn Classification Image 0.5682199 0.8929821 0.8650089 0.4414275 0.5149729 2024-03-24 11:26:21
Neural Network TensorFlow Regression Time Series NA 0.6755591 0.3841451 0.6845530 2.3867963 2024-03-25 11:26:21
Neural Network TensorFlow Classification Text 0.7021594 0.6146790 4.8590798 0.7364070 2.4624473 2024-03-26 11:26:21
K-Means Keras Classification Image 0.7140998 0.6965275 0.3122867 0.7770559 4.2204527 2024-03-27 11:26:21
SVM TensorFlow Regression Text 0.8587989 0.8969496 0.5053160 0.8131790 1.1840696 2024-03-28 11:26:21
SVM Scikit-learn Regression Tabular 0.8462181 0.6011248 0.8411993 0.5519634 1.9668572 2024-03-29 11:26:21
Neural Network PyTorch Regression Image 0.9956280 0.5042570 0.6625724 0.4059873 4.0611881 2024-03-30 11:26:21
Neural Network TensorFlow Clustering Time Series 0.5641971 0.8272084 5.4366692 0.8340662 4.1357557 2024-03-31 11:26:21
SVM Keras Classification Tabular 0.5520548 0.8955869 0.5602175 0.7213940 1.9845943 2024-04-01 11:26:21
SVM Scikit-learn Classification Tabular 0.8621694 0.4603825 0.3009475 NA 2.3497077 2024-04-02 11:26:21
K-Means TensorFlow Clustering Time Series 0.7891935 0.5439245 0.5098604 0.8928330 1.5861009 2024-04-03 11:26:21
K-Means Scikit-learn Regression Image 0.6370803 0.4851832 0.7525211 NA NA 2024-04-04 11:26:21
SVM Keras Clustering Image 0.5397097 0.6087648 0.9819361 0.6910568 0.6813545 2024-04-05 11:26:21
K-Means Keras Clustering Tabular 0.5428291 0.6702106 0.8929426 0.6001770 4.6753722 2024-04-06 11:26:21
K-Means Scikit-learn Clustering Image 0.9470954 0.8492958 0.3165165 0.8749349 NA 2024-04-07 11:26:21
Random Forest TensorFlow Classification Text 0.5959337 0.7906886 0.9289924 0.6707765 2.7047035 2024-04-08 11:26:21
K-Means PyTorch Clustering Time Series 0.6616858 0.7725571 0.8482389 0.5100653 0.2600294 2024-04-09 11:26:21
Neural Network Keras Classification Text 0.6133282 0.6114250 0.8462633 0.9129844 2.4304213 2024-04-10 11:26:21
K-Means TensorFlow Classification Image 0.6774982 0.9048685 0.6205970 9.2953593 2.0979657 2024-04-11 11:26:21
K-Means TensorFlow Classification Tabular 0.5347119 0.6827723 0.5786037 0.6797859 0.8899722 2024-04-12 11:26:21
SVM PyTorch Regression Time Series 0.7595299 0.9874630 0.5120410 0.4454198 3.3164071 2024-04-13 11:26:21
Neural Network Keras Classification Image 0.5338063 0.7804853 0.3459864 0.6326957 4.8586394 2024-04-14 11:26:21
K-Means Keras Classification Image 0.9001783 0.4757588 0.4597648 NA 2.8547252 2024-04-15 11:26:21
K-Means TensorFlow Regression Tabular 0.6168560 0.8057065 0.4726225 0.9410644 3.6022911 2024-04-16 11:26:21
Random Forest TensorFlow Classification Tabular 0.7700060 0.5950624 0.6388539 0.5220836 4.3540722 2024-04-17 11:26:21
K-Means PyTorch Clustering Text 0.9400395 0.8117963 0.8232111 0.4401842 2.1465683 2024-04-18 11:26:21
K-Means PyTorch Clustering Time Series 0.8254387 0.4417847 0.6316669 0.9264103 0.6701566 2024-04-19 11:26:21
Random Forest Keras Classification Time Series 0.7664789 NA 0.3404911 0.6336430 3.1026996 2024-04-20 11:26:21
SVM Scikit-learn Regression Tabular 0.6621669 0.9134424 0.9704529 0.7250566 46.9856258 2024-04-21 11:26:21
K-Means Keras Clustering Text 0.6665010 0.5363077 0.9599070 0.9808395 3.3427109 2024-04-22 11:26:21
Random Forest TensorFlow Regression Time Series 0.8347435 0.9022247 0.8496327 0.4399388 0.4763349 2024-04-23 11:26:21
Neural Network Keras Regression Tabular 0.9970697 0.5675657 0.9939295 0.7889908 1.8378644 2024-04-24 11:26:21
SVM Scikit-learn Clustering Image NA 0.7857291 0.6811376 0.4444633 2.7982506 2024-04-25 11:26:21
Neural Network PyTorch Regression Time Series 0.7788917 0.8164903 0.9739378 0.6252814 2.0757180 2024-04-26 11:26:21
Neural Network Scikit-learn Clustering Text 0.8653253 0.7075929 0.3529235 0.8822887 4.1848557 2024-04-27 11:26:21
Random Forest Scikit-learn Regression Text 0.7326028 0.5831864 0.5559765 0.6600833 4.0918879 2024-04-28 11:26:21
K-Means Scikit-learn Clustering Text 0.5300712 NA 0.4577669 0.9983094 3.0979720 2024-04-29 11:26:21
Random Forest PyTorch Classification Image 0.7811484 0.4199136 0.4370832 0.7354356 1.9299439 2024-04-30 11:26:21
Random Forest Scikit-learn Clustering Time Series 0.9788126 0.5823678 0.3985621 0.5926973 1.3511482 2024-05-01 11:26:21
Random Forest Scikit-learn Clustering Time Series 0.5876515 0.7918977 0.7356895 5.3206680 0.6187842 2024-05-02 11:26:21
Random Forest Scikit-learn Clustering Image NA 0.9629829 0.8469253 0.6103123 1.8389986 2024-05-03 11:26:21
SVM TensorFlow Clustering Time Series 0.6004668 0.9227227 0.7048089 0.6235200 2.1245453 2024-05-04 11:26:21
K-Means Scikit-learn Regression Image 0.7679138 0.8596389 0.4028739 0.4412282 3.4115323 2024-05-05 11:26:21
Neural Network PyTorch Classification Image 0.5483382 0.8730684 0.8677866 0.6217445 3.3249831 2024-05-06 11:26:21
Neural Network Scikit-learn Clustering Image NA 0.7989909 0.7451683 0.6785431 0.4435299 2024-05-07 11:26:21
K-Means Scikit-learn Regression Text 0.8780817 0.5561721 0.5717160 NA 2.0367706 2024-05-08 11:26:21
Neural Network PyTorch Classification Time Series 0.6737858 0.9443170 0.7718912 0.7940377 0.9907164 2024-05-09 11:26:21
K-Means Scikit-learn Classification Image 0.8324559 0.8024394 0.4819335 0.8252594 0.8683823 2024-05-10 11:26:21
Neural Network PyTorch Classification Time Series 0.8977250 0.7362644 0.5416345 0.4050182 4.1537805 2024-05-11 11:26:21
Random Forest Scikit-learn Classification Text 0.9635889 0.4665937 0.9422401 0.5306004 0.3053846 2024-05-12 11:26:21
Neural Network PyTorch Clustering Image 0.6173210 0.6682333 0.5033532 0.7970106 2.1539961 2024-05-13 11:26:21
K-Means Scikit-learn Clustering Tabular 0.6996580 0.6762150 0.6289918 0.6903936 0.9241979 2024-05-14 11:26:21
K-Means Scikit-learn Regression Text 0.5762080 0.9187382 0.9230988 0.4031881 4.4087181 2024-05-15 11:26:21
K-Means TensorFlow Regression Tabular 0.9962418 0.7279889 0.7954049 0.8826968 28.2949852 2024-05-16 11:26:21
K-Means TensorFlow Regression Time Series 0.9635005 0.6282403 0.3435423 0.8636857 1.2337651 2024-05-17 11:26:21
K-Means Scikit-learn Classification Time Series 0.7699786 0.9860802 0.4031121 0.7290515 2.5618937 2024-05-18 11:26:21
SVM Scikit-learn Classification Text 0.9210166 NA 0.3054891 0.4398780 3.6842832 2024-05-19 11:26:21
SVM Keras Clustering Time Series 0.7604790 0.6535291 0.7416453 0.8599090 4.7947782 2024-05-20 11:26:21
Neural Network PyTorch Classification Tabular NA 0.4252148 0.6133714 0.7502282 1.1801854 2024-05-21 11:26:21
K-Means Keras Classification Image 0.5445622 0.8439425 0.3939924 0.8691760 4.4422907 2024-05-22 11:26:21
Neural Network TensorFlow Classification Image 0.8776352 0.9508459 0.9705539 0.8510482 4.6780902 2024-05-23 11:26:21
K-Means TensorFlow Regression Image 0.5638567 NA 0.6707618 NA 4.5904607 2024-05-24 11:26:21
K-Means TensorFlow Regression Text 0.9130338 0.9150050 0.4693253 0.7108048 3.2135303 2024-05-25 11:26:21
SVM TensorFlow Clustering Time Series 0.8910140 0.5753309 0.6504225 0.4841947 3.1861212 2024-05-26 11:26:21
SVM Scikit-learn Classification Tabular 0.8543723 0.9464621 0.7757342 0.8026945 2.0757321 2024-05-27 11:26:21
Random Forest TensorFlow Classification Time Series 0.5180802 0.8523771 0.3533675 0.7722843 3.7867496 2024-05-28 11:26:21
SVM Scikit-learn Clustering Time Series 0.6515642 0.8829441 0.4922927 0.8453519 2.7033407 2024-05-29 11:26:21
Random Forest Scikit-learn Classification Time Series 0.6315563 0.4108039 0.8648758 0.5019286 3.4194465 2024-05-30 11:26:21
Random Forest TensorFlow Classification Image 0.6800682 0.9776862 0.6217654 0.5169851 2.1994437 2024-05-31 11:26:21
Random Forest TensorFlow Regression Tabular 0.5438214 0.8360026 0.6826040 0.9342457 3.6843327 2024-06-01 11:26:21
Random Forest PyTorch Clustering Tabular 0.9684789 0.5828507 0.6029712 NA 4.1399104 2024-06-02 11:26:21
Random Forest Keras Clustering Time Series 0.7769011 0.8976368 0.3307300 0.9449371 0.8175377 2024-06-03 11:26:21
Neural Network TensorFlow Clustering Image 0.6527622 0.5689127 0.4160248 0.8552292 4.1813556 2024-06-04 11:26:21
SVM PyTorch Clustering Image 0.6984908 0.9236523 0.6118791 0.7582932 2.7474530 2024-06-05 11:26:21
Random Forest PyTorch Regression Tabular 0.7236013 0.4675482 0.4464298 0.7924796 4.2388630 2024-06-06 11:26:21
K-Means Keras Regression Image 0.8002972 0.8222116 0.3349853 0.9334768 2.2133784 2024-06-07 11:26:21
SVM Scikit-learn Classification Image 0.7578397 0.7244191 0.8905548 0.7472841 1.9572984 2024-06-08 11:26:21
SVM TensorFlow Classification Image 0.9596960 0.4579207 0.9868348 0.7794348 4.5837976 2024-06-09 11:26:21
Random Forest Keras Classification Text 0.7484817 0.5451363 0.8552000 0.4938860 1.3307294 2024-06-10 11:26:21
Neural Network TensorFlow Classification Time Series 0.9960790 0.4074424 0.8976494 0.6843527 4.2366404 2024-06-11 11:26:21
Random Forest PyTorch Regression Tabular 0.9257125 0.6812608 0.4693840 0.8298383 2.4717008 2024-06-12 11:26:21
Neural Network PyTorch Clustering Tabular 0.6042553 0.5807592 0.9724389 0.5625656 2.6190677 2024-06-13 11:26:21
K-Means Scikit-learn Regression Text 0.9652976 0.7590145 0.4378480 0.5213525 1.6114946 2024-06-14 11:26:21
SVM PyTorch Classification Time Series 0.5581832 0.5783427 0.9660009 0.5882961 2.9079068 2024-06-15 11:26:21
K-Means PyTorch Regression Tabular 0.9087249 0.5799515 0.9963735 0.5449004 1.6935349 2024-06-16 11:26:21
Neural Network Keras Clustering Image 0.6903116 0.8459159 0.7982060 0.5289476 0.2924201 2024-06-17 11:26:21
Neural Network TensorFlow Clustering Tabular 0.9389872 0.4288857 0.9868006 0.6549105 1.4327932 2024-06-18 11:26:21
K-Means TensorFlow Regression Image 0.9340283 0.9417370 0.6986778 0.9447031 0.1312573 2024-06-19 11:26:21
Neural Network Keras Regression Text NA 0.9113583 0.4816792 0.7042358 4.8939385 2024-06-20 11:26:21
K-Means Scikit-learn Regression Image 0.8950152 0.8006828 0.6058971 0.5127522 4.8321613 2024-06-21 11:26:21
SVM TensorFlow Classification Tabular 0.6523396 0.7559329 0.7154927 0.4461830 2.0362610 2024-06-22 11:26:21
K-Means PyTorch Regression Tabular 0.5404596 0.9353815 0.3511571 0.8176937 3.6690172 2024-06-23 11:26:21
SVM TensorFlow Regression Tabular 0.7014901 0.5111981 0.7356403 0.6296793 1.7944499 2024-06-24 11:26:21
K-Means PyTorch Regression Text 0.5867623 0.4473815 0.9868245 0.8930986 3.3883565 2024-06-25 11:26:21
Neural Network Scikit-learn Clustering Text 0.8474755 0.5437061 0.4330754 0.7957065 4.0466072 2024-06-26 11:26:21
K-Means Keras Clustering Time Series 0.6730499 0.8767470 0.8548166 0.8777449 4.7391056 2024-06-27 11:26:21
Neural Network TensorFlow Classification Tabular 0.9878051 0.4208022 0.9355292 0.5631676 2.0610402 2024-06-28 11:26:21
K-Means Keras Classification Image 0.8204860 0.7496841 0.9605911 0.8154154 3.9381601 2024-06-29 11:26:21
K-Means TensorFlow Clustering Text 0.9112403 0.9972625 0.9720949 0.5584372 1.4030494 2024-06-30 11:26:21
Random Forest PyTorch Clustering Tabular 0.5662623 0.9134177 0.6650217 0.9634411 NA 2024-07-01 11:26:21
Neural Network Keras Clustering Tabular 0.9310072 NA 0.9841155 0.7818246 NA 2024-07-02 11:26:21
Random Forest TensorFlow Regression Text 0.9613786 0.4381845 0.8301171 0.5947071 3.0572912 2024-07-03 11:26:21
Neural Network PyTorch Regression Time Series 0.7435310 0.8988241 0.4131700 0.5617071 3.3315793 2024-07-04 11:26:21
Random Forest Keras Clustering Text 0.8031265 0.7593871 0.6338301 0.5145560 3.4714232 2024-07-05 11:26:21
SVM Keras Regression Text 0.8824049 0.4689598 0.8028318 0.8167846 0.6898991 2024-07-06 11:26:21
K-Means TensorFlow Regression Image 0.5874193 0.4563144 0.4731382 0.5312294 4.6987959 2024-07-07 11:26:21
Neural Network TensorFlow Classification Image 0.7512830 0.9457761 0.7484306 0.7571820 0.9877981 2024-07-08 11:26:21
Neural Network TensorFlow Clustering Time Series 0.6993315 0.8015202 0.7666203 0.5587810 3.1514917 2024-07-09 11:26:21
K-Means PyTorch Clustering Tabular 0.5731870 0.8975721 0.4138898 0.7971814 1.1911128 2024-07-10 11:26:21
Neural Network TensorFlow Regression Text 0.6837672 0.9273873 0.6955597 0.8889638 1.6053604 2024-07-11 11:26:21
Neural Network Keras Clustering Time Series 0.5340862 0.7430634 0.8401388 0.8668151 2.7776469 2024-07-12 11:26:21
Random Forest Keras Classification Image 0.5129060 0.7104678 NA 0.8565108 2.1442478 2024-07-13 11:26:21
Neural Network TensorFlow Classification Text 0.5675831 0.6582564 0.3084722 0.5126334 0.8851054 2024-07-14 11:26:21
SVM TensorFlow Classification Tabular 0.9815576 0.5901680 0.3063269 0.4530310 0.9375884 2024-07-15 11:26:21
SVM Scikit-learn Clustering Image 0.7747648 0.6607576 0.5499205 0.8193693 2.1489095 2024-07-16 11:26:21
K-Means Keras Clustering Image 0.9829111 0.8643278 0.9483358 0.6210083 3.8106866 2024-07-17 11:26:21
Neural Network PyTorch Clustering Image 0.7162489 0.7611541 0.4600737 0.6594077 4.5000268 2024-07-18 11:26:21
K-Means Scikit-learn Clustering Time Series 0.6559081 0.9355140 0.7440546 0.4186895 0.5122255 2024-07-19 11:26:21
SVM TensorFlow Clustering Tabular 0.7530709 0.6660280 0.4554531 0.5557459 2.0262091 2024-07-20 11:26:21
Neural Network TensorFlow Classification Text 0.7197558 0.7642537 0.5251690 0.4202058 0.5912033 2024-07-21 11:26:21
Neural Network Keras Regression Time Series 0.5528323 0.7787845 0.8936295 0.9275115 0.1813032 2024-07-22 11:26:21
K-Means TensorFlow Classification Time Series 0.8204132 0.7550183 0.8102030 0.5460380 3.3424215 2024-07-23 11:26:21
SVM PyTorch Clustering Time Series 0.6080191 0.8215803 0.3667795 0.7344023 3.0520023 2024-07-24 11:26:21
Neural Network Keras Classification Image 0.8097940 0.5424601 0.6000914 0.4233876 0.8987937 2024-07-25 11:26:21
K-Means TensorFlow Clustering Tabular 0.8251005 0.7074183 0.3204188 NA 1.2456829 2024-07-26 11:26:21
SVM Keras Clustering Time Series 0.5760124 0.4625349 0.6366231 0.5938164 0.2161680 2024-07-27 11:26:21
K-Means TensorFlow Classification Text 0.5306748 0.6307068 0.7637038 0.9387515 4.1912413 2024-07-28 11:26:21
Random Forest TensorFlow Clustering Tabular 0.8903808 0.6926002 0.3829518 0.9328709 4.8759839 2024-07-29 11:26:21
Neural Network TensorFlow Regression Text 0.7299002 0.7913346 0.5022195 0.5951744 0.7637517 2024-07-30 11:26:21
Random Forest PyTorch Clustering Tabular 0.5290819 0.9703186 0.5784856 0.9405765 1.2342285 2024-07-31 11:26:21
SVM PyTorch Regression Text 0.9974332 0.7603906 0.9436717 0.9976946 4.3552067 2024-08-01 11:26:21
Random Forest Keras Regression Image 0.5288903 0.8461563 0.9952785 0.8952494 4.6396726 2024-08-02 11:26:21
Random Forest TensorFlow Clustering Time Series 0.8475176 0.7037596 0.3314379 0.9069228 2.1561742 2024-08-03 11:26:21
SVM TensorFlow Classification Text 0.9918395 0.7804624 0.8327055 0.5494052 0.3480501 2024-08-04 11:26:21
K-Means PyTorch Regression Tabular 0.6195901 0.4425593 0.5602068 0.7460214 0.2930200 2024-08-05 11:26:21
Random Forest TensorFlow Classification Time Series 0.5711247 0.5526349 NA 0.4403535 2.9123543 2024-08-06 11:26:21
K-Means PyTorch Regression Time Series 0.5606925 0.6171119 0.8279855 0.4569508 20.2518603 2024-08-07 11:26:21
Random Forest TensorFlow Clustering Tabular 0.6516376 0.6834960 0.9428985 0.9993356 0.2403969 2024-08-08 11:26:21
K-Means Keras Clustering Text 0.5505229 0.4273892 0.9656500 0.5959835 2.9579367 2024-08-09 11:26:21
SVM TensorFlow Regression Image 0.8460807 0.4840145 0.7039858 NA 0.1553824 2024-08-10 11:26:21
Random Forest TensorFlow Clustering Image 0.5311459 0.5660886 5.4998481 0.8839990 3.9581226 2024-08-11 11:26:21
SVM Scikit-learn Regression Text 0.7547111 0.9829196 0.8512844 0.9148140 1.6015208 2024-08-12 11:26:21
Random Forest PyTorch Clustering Time Series 0.9983484 0.5988082 0.4757010 0.9985770 0.2972642 2024-08-13 11:26:21
SVM TensorFlow Classification Time Series 0.9069851 0.6892246 0.6948518 0.5448979 2.9829259 2024-08-14 11:26:21
SVM TensorFlow Clustering Time Series 0.8076097 0.5176586 NA 0.4242105 2.0486536 2024-08-15 11:26:21
Random Forest Scikit-learn Classification Time Series 0.6531268 NA 0.7596418 0.6467149 4.8716220 2024-08-16 11:26:21
Random Forest Scikit-learn Clustering Text 0.8119479 0.5684099 0.4682792 0.4780484 2.7665734 2024-08-17 11:26:21
Random Forest PyTorch Classification Time Series 0.7635207 0.5241955 0.4341151 0.4134555 1.4486720 2024-08-18 11:26:21
Neural Network Scikit-learn Clustering Tabular 0.7130417 0.7099436 0.9427674 0.6162561 3.5762504 2024-08-19 11:26:21
K-Means Keras Classification Time Series 0.5653552 0.4033035 0.3712626 0.8702430 1.4303108 2024-08-20 11:26:21
Neural Network Scikit-learn Regression Image 0.9433021 0.4045984 0.6541705 0.7397084 4.5307921 2024-08-21 11:26:21
Neural Network Keras Regression Time Series NA 0.5314413 0.4545969 0.5876668 1.9361377 2024-08-22 11:26:21
SVM PyTorch Classification Text 0.5973113 0.4220328 0.3272510 0.7926052 2.7943560 2024-08-23 11:26:21
Random Forest PyTorch Clustering Image 0.6838797 0.4648155 0.3252132 0.5392109 0.3478082 2024-08-24 11:26:21
K-Means PyTorch Classification Text 0.7070649 0.6033164 0.4226552 0.4086289 2.1879941 2024-08-25 11:26:21
SVM PyTorch Clustering Text 0.9137689 0.8815514 0.9067392 0.8586120 NA 2024-08-26 11:26:21
Neural Network PyTorch Regression Image 0.8668072 0.7432292 0.4977351 0.7742459 4.0476791 2024-08-27 11:26:21
Random Forest Keras Clustering Text 0.8846524 NA 0.9653214 0.8573816 1.1991609 2024-08-28 11:26:21
SVM Keras Regression Text 0.5055156 5.7609329 0.7071356 0.4233628 1.2077874 2024-08-29 11:26:21
Random Forest PyTorch Clustering Time Series 0.7080770 0.9590522 0.6056299 0.9022718 4.1047960 2024-08-30 11:26:21
SVM Scikit-learn Clustering Tabular 0.7406721 0.6382090 0.7060622 0.7717157 4.6589523 2024-08-31 11:26:21
Random Forest TensorFlow Clustering Text 0.5095961 0.4522557 0.6616889 0.7380372 0.5672685 2024-09-01 11:26:21
Neural Network PyTorch Classification Text 0.6299066 0.7702399 0.8311434 7.7476843 2.3052873 2024-09-02 11:26:21
K-Means Scikit-learn Classification Text 0.8801449 0.4683030 0.4977472 NA 1.7535260 2024-09-03 11:26:21
Neural Network TensorFlow Regression Time Series 0.5685549 0.6071339 0.5471353 0.7521619 4.3663734 2024-09-04 11:26:21
K-Means PyTorch Regression Tabular 0.7676551 7.0444716 0.9258660 0.7485702 0.5092721 2024-09-05 11:26:21
Random Forest PyTorch Classification Image 0.6076009 0.9245335 0.9625196 0.9944075 1.1345171 2024-09-06 11:26:21
SVM Keras Regression Time Series NA 0.6961279 0.9247908 0.8540381 3.7870949 2024-09-07 11:26:21
Neural Network TensorFlow Classification Time Series 0.6206007 0.8213553 0.5936136 0.6653763 0.3513399 2024-09-08 11:26:21
K-Means TensorFlow Clustering Tabular 0.9879369 0.9956901 0.8462559 0.8244382 2.5134234 2024-09-09 11:26:21
Neural Network Scikit-learn Regression Tabular 0.9007686 0.4788935 NA 0.6335383 2.2663245 2024-09-10 11:26:21
K-Means Keras Regression Time Series 0.9797883 0.5648389 0.6482779 0.5373253 1.7385658 2024-09-11 11:26:21
Neural Network Keras Classification Image 0.7439270 0.6367456 0.4432761 0.7581111 2.0334043 2024-09-12 11:26:21
K-Means Keras Regression Time Series 0.5548681 0.6530969 0.7137912 0.9569093 2.6967089 2024-09-13 11:26:21
K-Means Keras Regression Tabular 0.7739797 0.6466126 0.4302951 0.9574837 0.8907001 2024-09-14 11:26:21
Random Forest TensorFlow Clustering Image 0.7271887 NA 0.5320953 0.6051432 2.9027798 2024-09-15 11:26:21
K-Means TensorFlow Regression Tabular 0.9221785 0.8284196 0.8985211 0.7166065 4.0466184 2024-09-16 11:26:21
Random Forest PyTorch Clustering Image 0.5490413 0.7647431 0.4449531 0.5269815 3.8247886 2024-09-17 11:26:21

General structure

Number of rows: 560
Number of columns: 10
Variable names: “Algorithm”, “Framework”, “Problem_Type”, “Dataset_Type”, “Accuracy”, “Precision”, “Recall”, “F1_Score”, “Training_Time” and “Date”
- Variable types:
qualitative nominal: “Algorithm”, “Framework”, “Problem_Type”, “Dataset_Type”
qualitative ordinal: none
quantitative (continuous): “Accuracy”, “Precision”, “Recall”, “F1_Score”, “Training_Time”
date-time: “Date”

No variable needs a type conversion: each one is already stored with a suitable type (character for the categorical variables, numeric for the quantitative ones and POSIXct for Date).

Summary statistics

We analyze each variable through its descriptive measures to get a quick overview of the data and to detect patterns, anomalies or trends. We also use bar charts for the categorical variables and histograms for the numeric ones.

Categorical variables

-Algorithm

The frequency of each category of this variable is:

library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Algorithm))
colnames(tabla_frecuencia)[1] <- "Algorithm"
colnames(tabla_frecuencia)[2] <- "Frecuencia"
pander(tabla_frecuencia, caption = "Frecuencia de Algoritmos")
Frecuencia de Algoritmos
Algorithm Frecuencia
K-Means 163
Neural Network 135
Random Forest 126
SVM 136

All algorithms appear with broadly similar frequencies (between 126 and 163 tests each). The mode of this categorical variable is K-Means.
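Below is a base-R sketch of how such a frequency table, its percentages and the mode can be computed. The vector algoritmos is a hypothetical stand-in for datos$Algorithm. Base table() and prop.table() are used here in place of the pander/dplyr code of this report.

```r
# Hypothetical toy vector standing in for datos$Algorithm
algoritmos <- c("K-Means", "SVM", "K-Means", "Random Forest",
                "Neural Network", "K-Means", "SVM")

# Absolute frequency of each category (sorted alphabetically by table())
frecuencia <- table(algoritmos)

# Relative frequency as a percentage, rounded to one decimal
porcentaje <- round(100 * prop.table(frecuencia), 1)

# The mode is the most frequent category (which.max keeps the first one on ties)
moda <- names(which.max(frecuencia))

resumen <- data.frame(
  Algorithm  = names(frecuencia),
  Frecuencia = as.vector(frecuencia),
  Porcentaje = as.vector(porcentaje)
)
print(resumen)
print(moda)
```

On the real data, replacing algoritmos with datos$Algorithm should reproduce the counts in the table above.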

PLOT:

library(ggplot2)
library(dplyr)

tabla_1 <- datos %>%
  dplyr::group_by(Algorithm) %>%
  dplyr::summarise(Total=n()) %>%
  dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
  dplyr::arrange(Algorithm)

G1<-ggplot(tabla_1, aes(x =Algorithm, y=Total) )+
  geom_bar(width = 0.7, stat="identity",
           position=position_dodge(), fill="cyan4")+
  ylim(c(0,170))+
  labs(x="Algoritmo", y="Frecuencia")+
  geom_text(aes(label=paste0(Total," ", "", "(", Porcentaje, "%", ")")),
            vjust=-0.9,
            color="black",
            hjust=0.5,
            position=position_dodge(0.9),
            angle=0,
            size=4.5)+
  theme_bw(base_size=16)+
  # theme_bw() is a complete theme, so the axis tweak must come after it
  theme(axis.text.x = element_text(angle=0, hjust=0.5))+
  facet_wrap(~"Distribución variable Algorithm")
G1

-Framework

The frequency of each category of this variable is:

library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Framework))
colnames(tabla_frecuencia)[1] <- "Framework"
colnames(tabla_frecuencia)[2] <- "Frecuencia"
pander(tabla_frecuencia, caption = "Frecuencia de Frameworks")
Frecuencia de Frameworks
Framework Frecuencia
Keras 124
PyTorch 135
Scikit-learn 134
TensorFlow 167

All frameworks are used with broadly similar frequencies (between 124 and 167 tests each). The mode of this variable is TensorFlow.

PLOT

library(ggplot2)
library(dplyr)

tabla_2 <- datos %>%
  dplyr::group_by(Framework) %>%
  dplyr::summarise(Total=n()) %>%
  dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
  dplyr::arrange(Framework)

G2<-ggplot(tabla_2, aes(x = Framework, y=Total) )+
  geom_bar(width = 0.7, stat="identity",
           position=position_dodge(), fill="cyan4")+
  ylim(c(0,180))+
  labs(x="Framework", y="Frecuencia")+
  geom_text(aes(label=paste0(Total," ", "", "(", Porcentaje, "%", ")")),
            vjust=-0.9,
            color="black",
            hjust=0.5,
            position=position_dodge(0.9),
            angle=0,
            size=4.5)+
  theme_bw(base_size=16)+
  # theme_bw() is a complete theme, so the axis tweak must come after it
  theme(axis.text.x = element_text(angle=0, hjust=0.5))+
  facet_wrap(~"Distribución variable Framework")
G2

-Problem_Type

The frequency of each category of this variable is:

library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Problem_Type))
colnames(tabla_frecuencia)[1] <- "Problem_Type"
colnames(tabla_frecuencia)[2] <- "Frecuencia"
pander(tabla_frecuencia, caption = "Frecuencia de tipos de problema")
Frecuencia de tipos de problema
Problem_Type Frecuencia
Classification 175
Clustering 196
Regression 189

All problem types addressed by the models appear with similar frequencies (between 175 and 196 tests each). Clustering is the mode of this variable.

PLOT

library(ggplot2)
library(dplyr)

tabla_3 <- datos %>%
  dplyr::group_by(Problem_Type) %>%
  dplyr::summarise(Total=n()) %>%
  dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
  dplyr::arrange(Problem_Type)

G3<-ggplot(tabla_3, aes(x = Problem_Type, y=Total) )+
  geom_bar(width = 0.7, stat="identity",
           position=position_dodge(), fill="cyan4")+
  ylim(c(0,210))+
  labs(x="Problem_Type", y="Frecuencia")+
  geom_text(aes(label=paste0(Total," ", "", "(", Porcentaje, "%", ")")),
            vjust=-0.9,
            color="black",
            hjust=0.5,
            position=position_dodge(0.9),
            angle=0,
            size=4.5)+
  theme_bw(base_size=16)+
  # theme_bw() is a complete theme, so the axis tweak must come after it
  theme(axis.text.x = element_text(angle=0, hjust=0.5))+
  facet_wrap(~"Distribución variable Problem_Type")
G3

-Dataset_Type

The frequency of each category of this variable is:

library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Dataset_Type))
colnames(tabla_frecuencia)[1] <- "Dataset_Type"
colnames(tabla_frecuencia)[2] <- "Frecuencia"
pander(tabla_frecuencia, caption = "Frecuencia de tipos de datos")
Frecuencia de tipos de datos
Dataset_Type Frecuencia
Image 157
Tabular 136
Text 143
Time Series 124

All types of data used to train the models appear with broadly similar frequencies (between 124 and 157 tests each). Image is the most frequent type.

PLOT

library(ggplot2)
library(dplyr)

tabla_4 <- datos %>%
  dplyr::group_by(Dataset_Type) %>%
  dplyr::summarise(Total=n()) %>%
  dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
  dplyr::arrange(Dataset_Type)

G4<-ggplot(tabla_4, aes(x = Dataset_Type, y=Total) )+
  geom_bar(width = 0.7, stat="identity",
           position=position_dodge(), fill="cyan4")+
  ylim(c(0,165))+
  labs(x="Dataset_Type", y="Frecuencia")+
  geom_text(aes(label=paste0(Total," ", "", "(", Porcentaje, "%", ")")),
            vjust=-0.9,
            color="black",
            hjust=0.5,
            position=position_dodge(0.9),
            angle=0,
            size=4.5)+
  theme_bw(base_size=16)+
  # theme_bw() is a complete theme, so the axis tweak must come after it
  theme(axis.text.x = element_text(angle=0, hjust=0.5))+
  facet_wrap(~"Distribución variable Dataset_Type")
G4

Filtering the data set to understand it better

Filter 1
library(dplyr)
df_clustering <- filter(datos, Problem_Type == "Clustering" & Algorithm == "K-Means")
str(df_clustering)
## tibble [58 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:58] "K-Means" "K-Means" "K-Means" "K-Means" ...
##  $ Framework    : chr [1:58] "Keras" "Keras" "PyTorch" "Keras" ...
##  $ Problem_Type : chr [1:58] "Clustering" "Clustering" "Clustering" "Clustering" ...
##  $ Dataset_Type : chr [1:58] "Time Series" "Text" "Text" "Text" ...
##  $ Accuracy     : num [1:58] 0.744 0.84 0.699 NA 0.96 ...
##  $ Precision    : num [1:58] 0.49 0.663 0.782 0.511 0.504 ...
##  $ Recall       : num [1:58] 0.877 0.558 0.775 0.691 0.54 ...
##  $ F1_Score     : num [1:58] 0.441 0.569 0.984 0.991 0.853 ...
##  $ Training_Time: num [1:58] NA 3.485 2.33 0.355 0.256 ...
##  $ Date         : POSIXct[1:58], format: "2023-03-09 11:26:21" "2023-03-22 11:26:21" ...

Description: there are 58 tests in total in which the K-Means algorithm was used for Clustering problems.

Filter 2
library(dplyr)
df_clustering <- filter(datos, Training_Time < 2 & Training_Time>1)
str(df_clustering)
## tibble [112 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:112] "SVM" "K-Means" "SVM" "K-Means" ...
##  $ Framework    : chr [1:112] "PyTorch" "PyTorch" "PyTorch" "PyTorch" ...
##  $ Problem_Type : chr [1:112] "Regression" "Regression" "Clustering" "Classification" ...
##  $ Dataset_Type : chr [1:112] "Image" "Text" "Text" "Image" ...
##  $ Accuracy     : num [1:112] 0.897 0.975 0.885 0.566 0.715 ...
##  $ Precision    : num [1:112] 9.732 0.423 0.612 0.797 0.981 ...
##  $ Recall       : num [1:112] 0.781 0.826 0.507 0.386 0.493 ...
##  $ F1_Score     : num [1:112] 0.793 0.477 0.89 0.884 0.5 ...
##  $ Training_Time: num [1:112] 1.93 1.45 1.46 1.84 1.14 ...
##  $ Date         : POSIXct[1:112], format: "2023-03-18 11:26:21" "2023-03-25 11:26:21" ...

Description: there are 112 tests in total whose Training_Time was between 1 and 2 hours. Rows with a missing Training_Time are excluded by the filter. These 112 tests are 20% of the 560.
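The exclusion of missing values mentioned above can be seen in a small base-R sketch on hypothetical toy data. It uses subset() instead of dplyr's filter(); both keep only the rows where the condition is TRUE, so rows where it evaluates to NA are dropped.

```r
# Hypothetical toy data: two of the six training times are missing
toy <- data.frame(Training_Time = c(1.5, NA, 0.8, 1.2, NA, 3.4))

# A comparison on a missing value yields NA, not FALSE
# (here it evaluates to TRUE, NA, FALSE, TRUE, NA, FALSE)
cond <- toy$Training_Time > 1 & toy$Training_Time < 2
print(cond)

# subset() keeps only the rows where the condition is TRUE
en_rango <- subset(toy, Training_Time > 1 & Training_Time < 2)

# Share of all rows that fall in the interval, as a percentage
porcentaje <- 100 * nrow(en_rango) / nrow(toy)
print(porcentaje)
```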

Filter 3
library(dplyr)
df_clustering <- filter(datos, Precision < 0.5)
str(df_clustering)
## tibble [89 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:89] "K-Means" "K-Means" "Random Forest" "Random Forest" ...
##  $ Framework    : chr [1:89] "Keras" "PyTorch" "Keras" "Keras" ...
##  $ Problem_Type : chr [1:89] "Clustering" "Regression" "Classification" "Regression" ...
##  $ Dataset_Type : chr [1:89] "Time Series" "Text" "Image" "Text" ...
##  $ Accuracy     : num [1:89] 0.744 0.975 0.992 0.947 0.826 ...
##  $ Precision    : num [1:89] 0.49 0.423 0.44 0.49 0.425 ...
##  $ Recall       : num [1:89] 0.877 0.826 0.716 0.745 NA ...
##  $ F1_Score     : num [1:89] 0.441 0.477 0.583 0.538 0.535 ...
##  $ Training_Time: num [1:89] NA 1.449 4.203 0.194 4.207 ...
##  $ Date         : POSIXct[1:89], format: "2023-03-09 11:26:21" "2023-03-25 11:26:21" ...

Description: there are 89 tests in total with a Precision below 0.5. Since Precision is a proportion between 0 and 1, these 89 tests (about 15.9% of the 560) had a precision below 50%.

Filter 4
df_filtered <- filter(datos, Algorithm == "SVM", Dataset_Type == "Image", Accuracy > 0.7)
str(df_filtered)

Description: this filter keeps the tests in which the SVM algorithm was applied to Image data with a test-set accuracy (Accuracy) above 0.7. The number of such tests is given by nrow(df_filtered).

Filter 5
df_filtered <- datos %>%
  filter(Problem_Type == "Clustering", 
         Framework == "Keras", 
         Precision > 0.6, 
         Accuracy > 0.8)
str(df_filtered)
## tibble [15 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:15] "SVM" "SVM" "K-Means" "K-Means" ...
##  $ Framework    : chr [1:15] "Keras" "Keras" "Keras" "Keras" ...
##  $ Problem_Type : chr [1:15] "Clustering" "Clustering" "Clustering" "Clustering" ...
##  $ Dataset_Type : chr [1:15] "Text" "Image" "Text" "Tabular" ...
##  $ Accuracy     : num [1:15] 0.842 0.847 0.84 0.809 8.532 ...
##  $ Precision    : num [1:15] 0.842 0.872 0.663 0.623 0.837 ...
##  $ Recall       : num [1:15] 0.875 0.38 0.558 0.538 NA ...
##  $ F1_Score     : num [1:15] 0.704 0.491 0.569 NA 0.982 ...
##  $ Training_Time: num [1:15] 4.042 4.714 3.485 1.224 0.407 ...
##  $ Date         : POSIXct[1:15], format: "2023-03-11 11:26:21" "2023-03-19 11:26:21" ...

Description: there are 15 tests in total in which the problem type was Clustering, the framework was Keras, the Precision was above 0.6 and the test-set accuracy (Accuracy) was above 0.8. At least one of them has Accuracy = 8.532, a value outside the valid [0, 1] range.
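Accuracy, Precision, Recall and F1_Score are all defined on [0, 1], so values such as the 8.532 above can be flagged systematically. Below is a base-R sketch that assumes a data frame with the column names used in this report. The toy rows and the helper names (metricas, fuera_de_rango, fila_invalida) are hypothetical.

```r
# Hypothetical toy rows; every metric column should lie in [0, 1]
toy <- data.frame(
  Accuracy  = c(0.84, 8.532, 0.81, NA),
  Precision = c(0.84, 0.837, 5.76, 0.62),
  Recall    = c(0.88, NA, 0.56, 0.54),
  F1_Score  = c(0.70, 0.98, 0.57, 0.49)
)
metricas <- c("Accuracy", "Precision", "Recall", "F1_Score")

# Count, per metric, the non-missing values outside [0, 1]
fuera_de_rango <- sapply(toy[metricas],
                         function(x) sum(x < 0 | x > 1, na.rm = TRUE))
print(fuera_de_rango)

# Flag rows with at least one out-of-range metric (missing values are ignored)
fila_invalida <- apply(toy[metricas], 1,
                       function(x) any(x < 0 | x > 1, na.rm = TRUE))
print(which(fila_invalida))
```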

Numeric variables and treatment of NA values

We analyze each numeric variable through its measures of location, dispersion and distribution, together with plots. Once each analysis is done, the missing data can be treated, since that step requires knowing these measures.

-Accuracy

Summary of the variable:

library(pander)
pander(summary(datos$Accuracy))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
0.5038 0.6236 0.7578 0.8779 0.8824 9.718 39
num_NA <- sum(is.na(datos$Accuracy))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 6.964286

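-Measures of location:
The mean of this variable is 0.8779 and its median is 0.7578. The mean lies above the median because it is pulled up by a few values outside the [0, 1] range in which Accuracy is defined (the maximum is 9.718).
-Normality test: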

library(nortest)
ad_test <- ad.test(datos$Accuracy)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos$Accuracy
## A = 131.83, p-value < 2.2e-16

This variable does not follow a normal distribution. The Anderson-Darling test gives an extremely small p-value (< 2.2e-16), which is strong evidence against the null hypothesis of normality. The histogram below shows the same.
-Coefficient of variation:

media<-mean(datos$Accuracy, na.rm = TRUE)
sd_a<-sd(datos$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 107.4343

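A coefficient above 100%, as here, indicates very high dispersion of the data relative to their mean. Half of the values lie between 0.6236 and 0.8824, so this dispersion comes mainly from the few out-of-range values (up to 9.718) rather than from the bulk of the data.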
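The following worked sketch shows how a single out-of-range value can push the coefficient of variation above 100% while barely moving the median. The cv() helper and the accuracy values are hypothetical.

```r
# Coefficient of variation as a percentage: 100 * sd / mean
cv <- function(x) 100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Hypothetical accuracies, all inside [0, 1]
validos <- c(0.55, 0.62, 0.70, 0.76, 0.83, 0.91, 0.98)

# The same values plus a single out-of-range entry
con_atipico <- c(validos, 9.7)

print(round(cv(validos), 1))
print(round(cv(con_atipico), 1))

# The median is almost unaffected by the extreme value
print(c(median(validos), median(con_atipico)))
```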
-Skewness:

media <- mean(datos$Accuracy, na.rm = TRUE)
mediana <- median(datos$Accuracy, na.rm = TRUE)
desviacion <- sd(datos$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.38177

Since asimetria_pearson is positive, the distribution is right-skewed.
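For reference, asimetria_pearson is Pearson's second (median) skewness coefficient. With the Accuracy figures above, it also implies the standard deviation. The notation below is ours: $\bar{x}$ is the mean, $\mathrm{Me}$ the median and $s$ the standard deviation.

```latex
Sk_P = \frac{3\,(\bar{x} - \mathrm{Me})}{s}
     = \frac{3\,(0.8779 - 0.7578)}{s} = 0.3818
\quad\Longrightarrow\quad
s \approx \frac{0.3603}{0.3818} \approx 0.94
```

This agrees with the coefficient of variation computed above: $1.0743 \times 0.8779 \approx 0.94$.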

PLOT

This variable has 39 NA's, 6.96% of the total, so listwise deletion can be used to handle it, provided the values are missing completely at random (MCAR). We plot the histograms before and after the deletion to check that it does not change the distribution. Note that na.omit(datos) removes every row with an NA in any column. The data therefore go from 560 to 448 rows, rather than losing only the 39 rows where Accuracy is missing.

library(dplyr)
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)
dim(datos_omit)
## [1] 448  10
ggp1 <- ggplot(data.frame(value=datos$Accuracy), aes(x=value)) +
  geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Base de datos original") +
  xlab("Accuracy") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))
ggp2 <- ggplot(data.frame(value=datos_omit$Accuracy), aes(x=value)) +
  geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("Después del tratamiento") +
  xlab("Accuracy") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))
grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$Accuracy, datos_omit$Accuracy)
print(ks_test)
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$Accuracy and datos_omit$Accuracy
## D = 0.022159, p-value = 0.9998
## alternative hypothesis: two-sided
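dim(datos_omit) reports 448 rows because na.omit() drops every row with at least one NA, whichever column it is in. The base-R sketch below contrasts this with deleting only the rows where a single variable is missing. The toy data frame is hypothetical.

```r
# Hypothetical toy data: NAs spread across different columns
toy <- data.frame(
  Accuracy      = c(0.74, NA, 0.70, 0.96, 0.85),
  Precision     = c(0.49, 0.66, NA, 0.50, 0.61),
  Training_Time = c(NA, 3.49, 2.33, 0.26, 1.10)
)

# Listwise deletion over the whole data frame: any row with an NA is removed
completas <- na.omit(toy)
print(nrow(completas))

# Deleting only the rows where Accuracy is missing keeps more data
solo_accuracy <- toy[!is.na(toy$Accuracy), ]
print(nrow(solo_accuracy))

# Percentage of rows lost by each approach
print(100 * (1 - nrow(completas) / nrow(toy)))
print(100 * mean(is.na(toy$Accuracy)))
```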

-Precision

Summary of the variable:

library(pander)
pander(summary(datos$Precision))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
0.4019 0.5632 0.7195 0.8129 0.8596 9.732 19
num_NA <- sum(is.na(datos$Precision))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 3.392857

-Measures of location:
The mean of this variable is 0.8129 and its median is 0.7195. As with Accuracy, the mean is pulled up by values outside the [0, 1] range (the maximum is 9.732).
-Normality test:

library(nortest)
ad_test <- ad.test(datos$Precision)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos$Precision
## A = 118.96, p-value < 2.2e-16

This variable does not follow a normal distribution. The Anderson-Darling test gives an extremely small p-value (< 2.2e-16), which is strong evidence against the null hypothesis of normality. The histogram below shows the same.
-Coefficient of variation:

media<-mean(datos$Precision, na.rm = TRUE)
sd_a<-sd(datos$Precision, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 104.7427

A coefficient above 100%, as here, indicates very high dispersion relative to the mean. The interquartile range is only 0.5632 to 0.8596, so this dispersion is driven by the out-of-range values (up to 9.732).
-Skewness:

media <- mean(datos$Precision, na.rm = TRUE)
mediana <- median(datos$Precision, na.rm = TRUE)
desviacion <- sd(datos$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3292674

Since asimetria_pearson is positive, the distribution is right-skewed.

PLOT
This variable has 19 NA's, 3.39% of the total, so listwise deletion can be used to handle it, again assuming the values are MCAR.

library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)

ggp1 <- ggplot(data.frame(value=datos$Precision), aes(x=value)) +
  geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Base de datos original") +
  xlab("Precision") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

ggp2 <- ggplot(data.frame(value=datos_omit$Precision), aes(x=value)) +
  geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("Después del tratamiento") +
  xlab("Precision") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$Precision, datos_omit$Precision)
print(ks_test)
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$Precision and datos_omit$Precision
## D = 0.019503, p-value = 1
## alternative hypothesis: two-sided

-Recall

Summary of the variable:

library(pander)
pander(summary(datos$Recall))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
0.3001 0.4819 0.6493 0.7486 0.8404 9.366 20
num_NA <- sum(is.na(datos$Recall))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 3.571429

-Measures of location:
The mean of this variable is 0.7486 and its median is 0.6493. The mean is pulled up by values outside the [0, 1] range (the maximum is 9.366).
-Normality test:

library(nortest)
ad_test <- ad.test(datos$Recall)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos$Recall
## A = 98.028, p-value < 2.2e-16

This variable does not follow a normal distribution. The Anderson-Darling test gives an extremely small p-value (< 2.2e-16), which is strong evidence against the null hypothesis of normality. The histogram below shows the same.
-Coefficient of variation:

media<-mean(datos$Recall, na.rm = TRUE)
sd_a<-sd(datos$Recall, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 104.7911

A coefficient above 100%, as here, indicates very high dispersion relative to the mean. The interquartile range is only 0.4819 to 0.8404, so this dispersion is driven by the out-of-range values (up to 9.366).
-Skewness:

media <- mean(datos$Recall, na.rm = TRUE)
mediana <- median(datos$Recall, na.rm = TRUE)
desviacion <- sd(datos$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3797481

Since asimetria_pearson is positive, the distribution is right-skewed.

PLOT
This variable has 20 NA's, 3.57% of the total, so listwise deletion can be used to handle it, again assuming the values are MCAR.

library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)

ggp1 <- ggplot(data.frame(value=datos$Recall), aes(x=value)) +
  geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Base de datos original") +
  xlab("Recall") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

ggp2 <- ggplot(data.frame(value=datos_omit$Recall), aes(x=value)) +
  geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("Después del tratamiento") +
  xlab("Recall") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$Recall, datos_omit$Recall)
print(ks_test)
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$Recall and datos_omit$Recall
## D = 0.015873, p-value = 1
## alternative hypothesis: two-sided

-F1_Score

Summary of the variable:

library(pander)
pander(summary(datos$F1_Score))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
0.4 0.5515 0.7086 0.8122 0.8438 9.374 20
num_NA <- sum(is.na(datos$F1_Score))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 3.571429

-Measures of location:
The mean of this variable is 0.8122 and its median is 0.7086. The mean is pulled up by values outside the [0, 1] range (the maximum is 9.374).
-Normality test:

library(nortest)
ad_test <- ad.test(datos$F1_Score)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos$F1_Score
## A = 122.21, p-value < 2.2e-16

This variable does not follow a normal distribution. The Anderson-Darling test gives an extremely small p-value (< 2.2e-16), which is strong evidence against the null hypothesis of normality. The histogram below shows the same.
-Coefficient of variation:

media<-mean(datos$F1_Score, na.rm = TRUE)
sd_a<-sd(datos$F1_Score, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 109.9297

A coefficient above 100%, as here, indicates very high dispersion relative to the mean. The interquartile range is only 0.5515 to 0.8438, so this dispersion is driven by the out-of-range values (up to 9.374).
-Skewness:

media <- mean(datos$F1_Score, na.rm = TRUE)
mediana <- median(datos$F1_Score, na.rm = TRUE)
desviacion <- sd(datos$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3479023

Since asimetria_pearson is positive, the distribution is right-skewed.

PLOT
This variable has 20 NA's, 3.57% of the total, so listwise deletion can be used to handle it, again assuming the values are MCAR.

library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)

ggp1 <- ggplot(data.frame(value=datos$F1_Score), aes(x=value)) +
  geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Base de datos original") +
  xlab("F1_Score") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

ggp2 <- ggplot(data.frame(value=datos_omit$F1_Score), aes(x=value)) +
  geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("Después del tratamiento") +
  xlab("F1_Score") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$F1_Score, datos_omit$F1_Score)
print(ks_test)
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$F1_Score and datos_omit$F1_Score
## D = 0.021098, p-value = 0.9999
## alternative hypothesis: two-sided

-Training_Time

Summary of the variable:

library(pander)
pander(summary(datos$Training_Time))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
0.1032 1.244 2.435 2.991 3.813 46.99 20
num_NA <- sum(is.na(datos$Training_Time))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 3.571429

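-Measures of location:
The mean of this variable is 2.991 hours and its median is 2.435 hours. The mean is pulled up by a few very long training times: the maximum is 46.99 hours, far above the third quartile of 3.813.
-Normality test: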

library(nortest)
ad_test <- ad.test(datos$Training_Time)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos$Training_Time
## A = 80.983, p-value < 2.2e-16

This variable does not follow a normal distribution. The Anderson-Darling test gives an extremely small p-value (< 2.2e-16), which is strong evidence against the null hypothesis of normality. The histogram below shows the same.
-Coefficient of variation:

media<-mean(datos$Training_Time, na.rm = TRUE)
sd_a<-sd(datos$Training_Time, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 147.9032

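A coefficient above 100%, as here, indicates very high dispersion relative to the mean. Half of the values lie between 1.244 and 3.813 hours, so this dispersion is driven by a few extreme values such as the 46.99-hour maximum.
-Skewness: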

media <- mean(datos$Training_Time, na.rm = TRUE)
mediana <- median(datos$Training_Time, na.rm = TRUE)
desviacion <- sd(datos$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3772653

Since asimetria_pearson is positive, the distribution is right-skewed.

PLOT
This variable has 20 NA's, 3.57% of the total. We handle it by listwise deletion and use the Kolmogorov-Smirnov (KS) test to check whether the data before and after the deletion follow the same distribution. If they do, the NA's can be handled by deleting those rows.

library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)

ggp1 <- ggplot(data.frame(value=datos$Training_Time), aes(x=value)) +
  geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Base de datos original") +
  xlab("Training_Time") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

ggp2 <- ggplot(data.frame(value=datos_omit$Training_Time), aes(x=value)) +
  geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("Después del tratamiento") +
  xlab("Training_Time") + ylab("Frecuencia") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$Training_Time, datos_omit$Training_Time)
print(ks_test)
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$Training_Time and datos_omit$Training_Time
## D = 0.013938, p-value = 1
## alternative hypothesis: two-sided

GENERAL SUMMARY:

All numeric variables behave similarly in their measures of location, dispersion and distribution, and their histograms are right-skewed. The Kolmogorov-Smirnov (KS) tests compare each original distribution with the one obtained after removing the NA's. They give p-values well above 0.05 and D statistics close to 0, so we fail to reject the hypothesis that the distributions are equal. This does not prove that they are identical. Moreover, datos_omit is a subset of datos, so the two samples are not independent as the two-sample KS test assumes, and the result is best read as a descriptive check.

TREATMENT OF MISSING DATA (NA’S)

The previous analysis compared the plots before and after the listwise deletion for each variable, and it shows that removing these NA's does not visibly change the distributions. Next, the rows with NA's are inspected individually to check whether the missingness is unrelated to the data, that is, whether the NA's are MCAR (Missing Completely At Random).
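One simple, informal way to probe MCAR for a given variable is to compare another variable between the rows where the first is missing and the rows where it is observed. Below is a base-R sketch on simulated data, which is hypothetical. It uses a Wilcoxon rank-sum comparison and is not Little's formal MCAR test.

```r
# Simulated toy data (hypothetical): Accuracy goes missing completely at random
set.seed(1)
n <- 200
toy <- data.frame(
  Accuracy      = runif(n, 0.5, 1),
  Training_Time = runif(n, 0.1, 5)
)
toy$Accuracy[sample(n, 20)] <- NA

# Indicator of whether Accuracy is missing in each row
falta_accuracy <- is.na(toy$Accuracy)

# Compare Training_Time between rows with and without a missing Accuracy.
# Under MCAR both groups should look alike, so a large p-value is
# consistent with (but does not prove) MCAR.
prueba <- wilcox.test(toy$Training_Time[falta_accuracy],
                      toy$Training_Time[!falta_accuracy])
print(prueba$p.value)
```

On the real data, the same comparison would use is.na(datos$Accuracy) and datos$Training_Time. For a categorical variable, chisq.test(table(is.na(datos$Accuracy), datos$Algorithm)) is the analogous check.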

Removal of the rows with NA's

datos_sin_na <- na.omit(datos)

Visualization of the rows with NA's

filas_na <- datos[!complete.cases(datos), ]
library(knitr)
library(kableExtra)
kable(filas_na, caption="Filas con al menos un NA") %>%
  kable_styling(full_width=FALSE) %>%
  column_spec(2, width="20em") %>%
  scroll_box(width="900px", height="450px")
Filas con al menos un NA
Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall F1_Score Training_Time Date
SVM Scikit-learn Regression Time Series 0.6618051 0.6929447 NA 0.4426950 4.9785924 2023-03-08 11:26:21
K-Means Keras Clustering Time Series 0.7443216 0.4900292 0.8766533 0.4414046 NA 2023-03-09 11:26:21
Neural Network PyTorch Regression Text 0.9985623 0.6366858 0.3357948 0.9014956 NA 2023-03-14 11:26:21
SVM Keras Regression Time Series NA 0.8710099 0.3416673 0.8161708 3.4064529 2023-03-16 11:26:21
Random Forest Keras Regression Text 0.5818119 0.9352508 NA 0.8626737 3.4199049 2023-03-17 11:26:21
Neural Network PyTorch Regression Text NA 0.5528024 0.3847175 0.6551369 3.5159654 2023-03-23 11:26:21
K-Means TensorFlow Regression Time Series NA 0.7073332 0.7288014 0.8376069 3.0875174 2023-04-06 11:26:21
Neural Network Scikit-learn Classification Text 0.8258334 0.4250037 NA 0.5345761 4.2069594 2023-04-08 11:26:21
Random Forest Scikit-learn Regression Image NA 0.8297940 0.9317173 8.4513780 4.8085087 2023-04-10 11:26:21
K-Means TensorFlow Classification Image 5.9276276 NA 0.3994960 0.5817585 2.0692473 2023-04-13 11:26:21
Neural Network Scikit-learn Clustering Image 0.8882984 0.8425050 0.5954240 0.8275728 NA 2023-04-15 11:26:21
SVM Scikit-learn Clustering Time Series 0.8805120 0.6098219 0.7040399 NA 2.9576349 2023-04-18 11:26:21
Random Forest Scikit-learn Clustering Tabular 0.8131102 0.8647921 NA 0.9411641 3.0049169 2023-04-19 11:26:21
K-Means Keras Clustering Text NA 0.5111173 0.6910497 0.9909150 0.3547478 2023-04-21 11:26:21
Random Forest Scikit-learn Regression Image 0.7407612 0.8586236 0.8919231 0.7966086 NA 2023-04-25 11:26:21
SVM PyTorch Regression Time Series 0.7457973 0.8615339 0.8522896 NA 4.4418578 2023-05-01 11:26:21
K-Means Keras Classification Image 0.5321045 0.7747981 NA 0.9713331 1.0636026 2023-05-02 11:26:21
Random Forest TensorFlow Classification Image 0.6210225 0.4768574 NA 0.8373886 0.5170071 2023-05-20 11:26:21
Neural Network PyTorch Regression Tabular 0.9933313 0.6029605 0.3178134 0.9170145 NA 2023-05-21 11:26:21
K-Means Keras Clustering Tabular 0.8090779 0.6233002 0.5384229 NA 1.2238923 2023-05-24 11:26:21
SVM Scikit-learn Clustering Time Series 0.6753135 0.5184823 0.4525248 NA 3.8205333 2023-05-31 11:26:21
Random Forest TensorFlow Classification Time Series 0.7837704 0.9168220 NA 0.8717234 1.6986442 2023-06-05 11:26:21
Random Forest Keras Clustering Text 8.5323786 0.8370944 NA 0.9821543 0.4065677 2023-06-12 11:26:21
Random Forest TensorFlow Regression Time Series 0.9354846 0.9624328 NA 0.9802212 NA 2023-06-18 11:26:21
Random Forest PyTorch Regression Tabular NA 0.8355879 0.9218826 0.9175843 2.8607010 2023-06-22 11:26:21
SVM Scikit-learn Clustering Text 0.8657948 0.7050163 0.5382710 5.8587272 NA 2023-06-30 11:26:21
K-Means TensorFlow Clustering Image NA 0.5785291 0.6789853 0.5740273 0.5031344 2023-07-01 11:26:21
Neural Network Scikit-learn Classification Text 0.5301760 0.7390132 0.4079015 NA 2.3519619 2023-07-02 11:26:21
K-Means PyTorch Classification Image 0.8293539 NA NA 0.8044120 3.9075415 2023-07-11 11:26:21
Random Forest PyTorch Regression Tabular 0.7490979 NA 0.5397136 0.4311015 4.3247946 2023-07-12 11:26:21
Neural Network TensorFlow Clustering Time Series 0.7776818 0.4595069 4.8917338 NA 1.6379899 2023-07-13 11:26:21
K-Means PyTorch Classification Image NA 0.8800426 0.6903962 0.5840660 4.2165296 2023-07-15 11:26:21
Neural Network PyTorch Classification Tabular NA 0.7325359 0.9956939 0.5714550 2.4675862 2023-07-23 11:26:21
K-Means Scikit-learn Clustering Image NA 0.9748441 0.5765964 0.9666691 2.3245506 2023-07-29 11:26:21
SVM TensorFlow Regression Tabular NA 0.5961590 0.6328822 0.8028875 0.7174099 2023-07-31 11:26:21
SVM Scikit-learn Clustering Image NA 0.5833625 0.4594248 0.5193953 4.7620796 2023-08-02 11:26:21
Neural Network Scikit-learn Classification Time Series NA 0.4019310 0.9139634 0.9824059 28.9729934 2023-08-13 11:26:21
SVM PyTorch Classification Time Series 0.8157801 0.6132958 0.4041572 0.6421606 NA 2023-08-15 11:26:21
K-Means Scikit-learn Classification Image NA 0.4557944 0.9866912 0.8227327 0.9959051 2023-08-17 11:26:21
K-Means TensorFlow Classification Image NA 0.7529214 0.6383852 0.6536372 4.1459837 2023-08-18 11:26:21
SVM Keras Regression Time Series 0.9856975 NA 0.5000485 0.5231998 2.8991763 2023-08-22 11:26:21
Random Forest Keras Clustering Tabular 0.6796167 0.6989534 0.9865175 0.4453502 NA 2023-08-30 11:26:21
SVM Scikit-learn Classification Text 0.7964754 NA 0.5853089 0.9708539 0.9581034 2023-08-31 11:26:21
SVM TensorFlow Clustering Image 0.5817619 0.4351306 0.8792632 0.5783745 NA 2023-09-01 11:26:21
SVM Scikit-learn Clustering Text 0.9847062 0.8709382 NA 0.7594268 1.1692169 2023-09-03 11:26:21
Neural Network Keras Clustering Tabular NA 0.8246086 0.9692330 0.7741893 4.3829517 2023-09-04 11:26:21
Random Forest Keras Clustering Tabular NA 0.6604116 0.9251631 0.8056158 3.1739118 2023-09-10 11:26:21
SVM TensorFlow Regression Image NA 0.5323672 0.9184253 0.7660045 0.8450380 2023-09-12 11:26:21
SVM Scikit-learn Regression Text 0.7598870 0.8413979 NA 0.5626578 1.8209750 2023-09-14 11:26:21
Neural Network PyTorch Clustering Text 0.9144417 0.9598680 0.9484678 NA 2.4820394 2023-09-16 11:26:21
SVM PyTorch Regression Tabular NA 0.7855391 0.3133813 0.9680402 4.6116243 2023-09-17 11:26:21
Random Forest Scikit-learn Clustering Tabular 0.5559598 0.8990182 NA 0.9745486 2.0495419 2023-10-02 11:26:21
SVM PyTorch Regression Image 0.5610550 0.6577941 0.3999952 0.4611359 NA 2023-10-08 11:26:21
SVM TensorFlow Classification Time Series NA 0.6897813 0.7295229 0.6604126 0.2710666 2023-10-14 11:26:21
K-Means Scikit-learn Regression Tabular NA 0.8367636 0.3543545 0.8340687 4.2119839 2023-10-17 11:26:21
Random Forest TensorFlow Clustering Tabular 0.8195600 0.6621104 NA 0.7536724 0.2223611 2023-10-19 11:26:21
Random Forest PyTorch Classification Time Series 0.8334321 0.4812124 0.9633816 NA 0.6468057 2023-10-24 11:26:21
K-Means Keras Classification Time Series 0.6009267 NA 0.3538035 0.5490258 4.0270843 2023-10-29 11:26:21
SVM TensorFlow Regression Image NA 0.5481873 0.7949605 0.5485359 0.7148829 2023-11-01 11:26:21
SVM PyTorch Classification Text NA 0.8660263 0.4967031 0.4486895 1.9978803 2023-11-07 11:26:21
K-Means TensorFlow Clustering Text 0.9654743 NA 0.7483332 0.7700702 0.2475090 2023-11-15 11:26:21
K-Means Scikit-learn Classification Time Series 0.8256165 0.7361009 0.9220614 0.9524600 NA 2023-11-21 11:26:21
SVM Scikit-learn Regression Text 0.9392578 0.6783595 NA 0.8232765 3.5606062 2023-12-02 11:26:21
K-Means Keras Regression Image NA 0.4688613 0.5223108 0.7015787 1.0100921 2023-12-13 11:26:21
SVM Keras Clustering Time Series 0.6980863 0.4406010 NA 0.9562664 0.4028615 2023-12-16 11:26:21
Random Forest TensorFlow Clustering Image NA 0.8247011 0.4933187 0.4632359 0.5525666 2023-12-17 11:26:21
Neural Network PyTorch Classification Image NA 0.6749804 0.8875369 0.7931042 0.8373141 2023-12-20 11:26:21
Neural Network Keras Classification Tabular 0.6866259 NA 0.3026739 0.5561420 4.8435743 2023-12-21 11:26:21
Random Forest Keras Classification Text 0.8097452 NA 0.5521637 0.7985316 3.3433768 2023-12-26 11:26:21
Random Forest Keras Clustering Text 0.9316668 0.7272591 0.5193435 NA NA 2023-12-29 11:26:21
Neural Network PyTorch Regression Image 0.7395909 0.7075016 0.9243105 0.5735125 NA 2023-12-31 11:26:21
Neural Network TensorFlow Clustering Image NA 0.4311739 0.5641226 0.7090034 0.1337020 2024-01-03 11:26:21
Random Forest PyTorch Clustering Time Series 0.9777618 NA 0.3030543 0.9716890 1.6380026 2024-01-11 11:26:21
SVM PyTorch Classification Image 9.4901527 0.6707436 0.8327265 0.9257917 NA 2024-01-16 11:26:21
K-Means Keras Classification Image NA 0.9909938 0.9655409 0.4615408 2.2067923 2024-01-17 11:26:21
Random Forest Scikit-learn Clustering Tabular 0.6536450 0.7940681 0.6502808 NA 2.2258352 2024-01-26 11:26:21
Random Forest Scikit-learn Classification Time Series NA 0.7807495 0.8952436 0.4011953 2.8763658 2024-02-05 11:26:21
K-Means Keras Classification Image NA 0.7198173 0.9161097 0.4968177 1.7012941 2024-02-09 11:26:21
SVM PyTorch Clustering Text 0.7417728 NA 0.5067110 0.9345133 0.6118271 2024-02-13 11:26:21
K-Means Scikit-learn Clustering Tabular 0.6767107 0.8821621 0.4873149 NA 1.5521523 2024-02-28 11:26:21
Random Forest TensorFlow Classification Text NA 0.6564938 0.5830028 0.7788514 0.3585498 2024-03-08 11:26:21
Neural Network TensorFlow Regression Time Series NA 0.6755591 0.3841451 0.6845530 2.3867963 2024-03-25 11:26:21
SVM Scikit-learn Classification Tabular 0.8621694 0.4603825 0.3009475 NA 2.3497077 2024-04-02 11:26:21
K-Means Scikit-learn Regression Image 0.6370803 0.4851832 0.7525211 NA NA 2024-04-04 11:26:21
K-Means Scikit-learn Clustering Image 0.9470954 0.8492958 0.3165165 0.8749349 NA 2024-04-07 11:26:21
K-Means Keras Classification Image 0.9001783 0.4757588 0.4597648 NA 2.8547252 2024-04-15 11:26:21
Random Forest Keras Classification Time Series 0.7664789 NA 0.3404911 0.6336430 3.1026996 2024-04-20 11:26:21
SVM Scikit-learn Clustering Image NA 0.7857291 0.6811376 0.4444633 2.7982506 2024-04-25 11:26:21
K-Means Scikit-learn Clustering Text 0.5300712 NA 0.4577669 0.9983094 3.0979720 2024-04-29 11:26:21
Random Forest Scikit-learn Clustering Image NA 0.9629829 0.8469253 0.6103123 1.8389986 2024-05-03 11:26:21
Neural Network Scikit-learn Clustering Image NA 0.7989909 0.7451683 0.6785431 0.4435299 2024-05-07 11:26:21
K-Means Scikit-learn Regression Text 0.8780817 0.5561721 0.5717160 NA 2.0367706 2024-05-08 11:26:21
SVM Scikit-learn Classification Text 0.9210166 NA 0.3054891 0.4398780 3.6842832 2024-05-19 11:26:21
Neural Network PyTorch Classification Tabular NA 0.4252148 0.6133714 0.7502282 1.1801854 2024-05-21 11:26:21
K-Means TensorFlow Regression Image 0.5638567 NA 0.6707618 NA 4.5904607 2024-05-24 11:26:21
Random Forest PyTorch Clustering Tabular 0.9684789 0.5828507 0.6029712 NA 4.1399104 2024-06-02 11:26:21
Neural Network Keras Regression Text NA 0.9113583 0.4816792 0.7042358 4.8939385 2024-06-20 11:26:21
Random Forest PyTorch Clustering Tabular 0.5662623 0.9134177 0.6650217 0.9634411 NA 2024-07-01 11:26:21
Neural Network Keras Clustering Tabular 0.9310072 NA 0.9841155 0.7818246 NA 2024-07-02 11:26:21
Random Forest Keras Classification Image 0.5129060 0.7104678 NA 0.8565108 2.1442478 2024-07-13 11:26:21
K-Means TensorFlow Clustering Tabular 0.8251005 0.7074183 0.3204188 NA 1.2456829 2024-07-26 11:26:21
Random Forest TensorFlow Classification Time Series 0.5711247 0.5526349 NA 0.4403535 2.9123543 2024-08-06 11:26:21
SVM TensorFlow Regression Image 0.8460807 0.4840145 0.7039858 NA 0.1553824 2024-08-10 11:26:21
SVM TensorFlow Clustering Time Series 0.8076097 0.5176586 NA 0.4242105 2.0486536 2024-08-15 11:26:21
Random Forest Scikit-learn Classification Time Series 0.6531268 NA 0.7596418 0.6467149 4.8716220 2024-08-16 11:26:21
Neural Network Keras Regression Time Series NA 0.5314413 0.4545969 0.5876668 1.9361377 2024-08-22 11:26:21
SVM PyTorch Clustering Text 0.9137689 0.8815514 0.9067392 0.8586120 NA 2024-08-26 11:26:21
Random Forest Keras Clustering Text 0.8846524 NA 0.9653214 0.8573816 1.1991609 2024-08-28 11:26:21
K-Means Scikit-learn Classification Text 0.8801449 0.4683030 0.4977472 NA 1.7535260 2024-09-03 11:26:21
SVM Keras Regression Time Series NA 0.6961279 0.9247908 0.8540381 3.7870949 2024-09-07 11:26:21
Neural Network Scikit-learn Regression Tabular 0.9007686 0.4788935 NA 0.6335383 2.2663245 2024-09-10 11:26:21
Random Forest TensorFlow Clustering Image 0.7271887 NA 0.5320953 0.6051432 2.9027798 2024-09-15 11:26:21

The missingness does not appear to be related to the values in the data. It is not known for certain whether these values were deleted at random or whether some forms were lost. In addition, the missing values do not follow a clear pattern. For example, it is not the case that the missing Accuracy values occur only in rows where Algorithm is SVM.
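One way to support the claim that missingness shows no clear pattern is to cross-tabulate whether a value is missing against a categorical column, then run a chi-squared test of independence. The sketch below is a minimal illustration under stated assumptions:

- The helper `missingness_vs_group` is hypothetical and is not part of the original analysis.
- The data are synthetic, with values removed completely at random.
- With the real data it would be called as `missingness_vs_group(datos, "Accuracy", "Algorithm")`.

A large p-value is consistent with missingness that is unrelated to the group, but it does not prove it.

```r
# Hypothetical helper: tabulate missing vs. present values of `value_col`
# by the levels of `group_col`, then test independence with chi-squared.
missingness_vs_group <- function(df, value_col, group_col) {
  tab <- table(missing = is.na(df[[value_col]]), group = df[[group_col]])
  # Small expected counts trigger a warning about the chi-squared
  # approximation; it is silenced here, so read the p-value with care.
  test <- suppressWarnings(chisq.test(tab))
  list(table = tab, p_value = test$p.value)
}

# Synthetic example: 200 rows, 20 Accuracy values removed completely at random
set.seed(42)
toy <- data.frame(
  Algorithm = sample(c("Neural Network", "Random Forest", "SVM", "K-Means"),
                     200, replace = TRUE),
  Accuracy = runif(200, min = 0.5, max = 1)
)
toy$Accuracy[sample(200, 20)] <- NA

res <- missingness_vs_group(toy, "Accuracy", "Algorithm")
print(res$table)
print(res$p_value)
```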

We now analyze how the measures of location, dispersion, and distribution change when the rows with missing values are removed (datos versus datos_sin_na).
#### Accuracy

Location:

summary(datos$Accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.5038  0.6236  0.7578  0.8779  0.8824  9.7181      39
summary(datos_sin_na$Accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5038  0.6183  0.7490  0.8458  0.8698  9.7181

Dispersion (coefficient of variation, in %):

media<-mean(datos$Accuracy, na.rm = TRUE)
sd_a<-sd(datos$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 107.4343
media<-mean(datos_sin_na$Accuracy, na.rm = TRUE)
sd_a<-sd(datos_sin_na$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 97.14993

Distribution (Pearson's skewness coefficient):

media <- mean(datos$Accuracy, na.rm = TRUE)
mediana <- median(datos$Accuracy, na.rm = TRUE)
desviacion <- sd(datos$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.38177
media <- mean(datos_sin_na$Accuracy, na.rm = TRUE)
mediana <- median(datos_sin_na$Accuracy, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3532361
  • Observation: The Accuracy measures change little when the rows are removed, so this poses no obstacle to our research question.
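The location, dispersion, and distribution steps above are repeated for every variable. A minimal sketch that bundles them for one numeric vector uses the same formulas: the coefficient of variation `sd / mean * 100` and Pearson's second skewness coefficient `3 * (mean - median) / sd`. The helper name `describe_measures` is hypothetical.

The toy vector shows how a single value far above the rest inflates the mean, the CV, and the skewness. The Accuracy values above 1 in this dataset have the same effect.

```r
# Hypothetical helper: location, dispersion and skewness for one vector,
# using the same formulas as the analysis above (NAs are dropped first).
describe_measures <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  med <- median(x)
  s <- sd(x)
  c(mean = m,
    median = med,
    sd = s,
    cv = s / m * 100,                  # coefficient of variation, in %
    pearson_skew = 3 * (m - med) / s)  # Pearson's second skewness coefficient
}

# Toy data: four plausible scores, one NA and one value far above 1
x <- c(0.6, 0.7, 0.8, 0.9, NA, 5)
r <- describe_measures(x)
print(round(r, 4))

# With the real data, both versions could be compared side by side:
# rbind(full = describe_measures(datos$Accuracy),
#       complete = describe_measures(datos_sin_na$Accuracy))
```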

Precision

Location:

summary(datos$Precision)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.4019  0.5632  0.7195  0.8129  0.8596  9.7320      19
summary(datos_sin_na$Precision)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4031  0.5661  0.7275  0.8387  0.8670  9.7320

Dispersion (coefficient of variation, in %):

media <- mean(datos$Precision, na.rm = TRUE)
sd_p <- sd(datos$Precision, na.rm = TRUE)
cv_p <- (sd_p / media) * 100
cv_p
## [1] 104.7427
media <- mean(datos_sin_na$Precision, na.rm = TRUE)
sd_p <- sd(datos_sin_na$Precision, na.rm = TRUE)
cv_p <- (sd_p / media) * 100
cv_p
## [1] 110.986

Distribution (Pearson's skewness coefficient):

media <- mean(datos$Precision, na.rm = TRUE)
mediana <- median(datos$Precision, na.rm = TRUE)
desviacion <- sd(datos$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3292674
media <- mean(datos_sin_na$Precision, na.rm = TRUE)
mediana <- median(datos_sin_na$Precision, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.358299
  • Observation: The Precision measures change little when the rows are removed, so this poses no obstacle to our research question.

Recall

Location:

summary(datos$Recall)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.3001  0.4819  0.6493  0.7486  0.8404  9.3662      20
summary(datos_sin_na$Recall)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3001  0.4898  0.6513  0.7611  0.8365  9.3662

Dispersion (coefficient of variation, in %):

media <- mean(datos$Recall, na.rm = TRUE)
sd_r <- sd(datos$Recall, na.rm = TRUE)
cv_r <- (sd_r / media) * 100
cv_r
## [1] 104.7911
media <- mean(datos_sin_na$Recall, na.rm = TRUE)
sd_r <- sd(datos_sin_na$Recall, na.rm = TRUE)
cv_r <- (sd_r / media) * 100
cv_r
## [1] 109.2378

Distribution (Pearson's skewness coefficient):

media <- mean(datos$Recall, na.rm = TRUE)
mediana <- median(datos$Recall, na.rm = TRUE)
desviacion <- sd(datos$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3797481
media <- mean(datos_sin_na$Recall, na.rm = TRUE)
mediana <- median(datos_sin_na$Recall, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3961938
  • Observation: The Recall measures change little when the rows are removed, so this poses no obstacle to our research question.

F1_Score

Location:

summary(datos$F1_Score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.4000  0.5515  0.7086  0.8122  0.8438  9.3740      20
summary(datos_sin_na$F1_Score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4000  0.5486  0.7031  0.8014  0.8396  9.3740

Dispersion (coefficient of variation, in %):

media <- mean(datos$F1_Score, na.rm = TRUE)
sd_f1 <- sd(datos$F1_Score, na.rm = TRUE)
cv_f1 <- (sd_f1 / media) * 100
cv_f1
## [1] 109.9297
media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
sd_f1 <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
cv_f1 <- (sd_f1 / media) * 100
cv_f1
## [1] 109.1754

Distribution (Pearson's skewness coefficient):

media <- mean(datos$F1_Score, na.rm = TRUE)
mediana <- median(datos$F1_Score, na.rm = TRUE)
desviacion <- sd(datos$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3479023
media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
mediana <- median(datos_sin_na$F1_Score, na.rm = TRUE)
desviacion <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3369211
  • Observation: The F1_Score measures change little when the rows are removed, so this poses no obstacle to our research question.

Training_Time

Location:

summary(datos$Training_Time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.1032  1.2441  2.4347  2.9910  3.8131 46.9856      20
summary(datos_sin_na$Training_Time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1032  1.2982  2.4809  3.0598  3.8446 46.9856

Dispersion (coefficient of variation, in %):

media <- mean(datos$Training_Time, na.rm = TRUE)
sd_tt <- sd(datos$Training_Time, na.rm = TRUE)
cv_tt <- (sd_tt / media) * 100
cv_tt
## [1] 147.9032
media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
sd_tt <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
cv_tt <- (sd_tt / media) * 100
cv_tt
## [1] 151.8546

Distribution (Pearson's skewness coefficient):

media <- mean(datos$Training_Time, na.rm = TRUE)
mediana <- median(datos$Training_Time, na.rm = TRUE)
desviacion <- sd(datos$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3772653
media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
mediana <- median(datos_sin_na$Training_Time, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3737476
  • Observation: The Training_Time measures change little when the rows are removed, so this poses no obstacle to our research question.

Conclusion: Removing the rows with NA values does not substantially change the distribution of the data, and the resulting changes in the measures do not affect the analysis.
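To put a number on "the measures change little", one can report the relative change of each statistic between the full column (its own NAs ignored) and the data after listwise deletion. This sketch assumes that datos_sin_na was obtained by dropping every row that has at least one NA, as `complete.cases()` does. The helper `compare_listwise` and the toy data are hypothetical.

```r
# Hypothetical helper: percentage change of mean, median and sd for each
# column when only complete rows (no NA in any column) are kept.
compare_listwise <- function(df, cols) {
  complete <- df[complete.cases(df), , drop = FALSE]   # listwise deletion
  pct_change <- function(before, after) (after - before) / before * 100
  t(sapply(cols, function(v) {
    full <- df[[v]][!is.na(df[[v]])]   # column with only its own NAs removed
    kept <- complete[[v]]
    c(n_full = length(full),
      n_complete = length(kept),
      mean_change_pct = pct_change(mean(full), mean(kept)),
      median_change_pct = pct_change(median(full), median(kept)),
      sd_change_pct = pct_change(sd(full), sd(kept)))
  }))
}

toy <- data.frame(a = c(1, 2, 3, NA, 5), b = c(2, NA, 4, 6, 8))
res <- compare_listwise(toy, c("a", "b"))
print(round(res, 2))

# With the real data (column names taken from the variable dictionary):
# compare_listwise(datos, c("Accuracy", "Precision", "Recall",
#                           "F1_Score", "Training_Time"))
```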

Outlier detection and cleaning

To detect outliers we use the box-and-whisker plot and the Hampel filter. We prefer the Hampel filter over percentile-based rules because it is better suited to variables that are not normally distributed or that have long tails, as is the case here. It also relies on robust measures, namely the median and the median absolute deviation (MAD), rather than the mean.
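The Hampel identifier flags values farther than a chosen number of MADs from the median (3 in this section). The Hampel filter code in this section calls `mad(..., constant = 1)`, which returns the raw median absolute deviation. R's default `constant = 1.4826` rescales the MAD so that it estimates the standard deviation under normality. That scaled version is the usual textbook form, and it gives a wider interval.

The sketch below makes that choice explicit. The helper `hampel_bounds` is hypothetical.

```r
# Hypothetical helper: Hampel bounds median +/- n_mad * MAD.
# constant = 1 reproduces the analysis (raw MAD);
# constant = 1.4826 (R's default for mad()) is the normal-consistent scaling.
hampel_bounds <- function(x, n_mad = 3, constant = 1) {
  med <- median(x, na.rm = TRUE)
  spread <- mad(x, constant = constant, na.rm = TRUE)
  c(lower = med - n_mad * spread, upper = med + n_mad * spread)
}

x <- c(1, 2, 3, 4, 100)
raw <- hampel_bounds(x)                        # raw MAD = 1
scaled <- hampel_bounds(x, constant = 1.4826)  # wider interval
print(raw)
print(scaled)
which(x < raw["lower"] | x > raw["upper"])     # position of the flagged value
```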

Accuracy

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = Accuracy) +
  geom_boxplot(fill = "#0c4c8a") +
  theme_minimal()
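The box plot draws as individual points the values that lie more than 1.5 times the interquartile range (IQR) beyond the box. The two common implementations differ slightly:

- As far as we know, ggplot2's `geom_boxplot()` takes the quartiles from `quantile()` (type 7).
- Base R's `boxplot.stats()` uses Tukey's hinges from `fivenum()`.

The fences can therefore differ a little on small samples. The sketch below computes the fences and checks them against `boxplot.stats()` on toy data. The helper `iqr_fences` is hypothetical.

```r
# Hypothetical helper: 1.5 * IQR fences using type-7 quartiles
iqr_fences <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - coef * iqr
  upper <- q[2] + coef * iqr
  list(lower = lower, upper = upper, outliers = x[x < lower | x > upper])
}

x <- c(1:9, 50)
f <- iqr_fences(x)
print(f)
print(boxplot.stats(x)$out)  # hinge-based rule flags the same point here
```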

Hampel filter

lower_bound <- median(datos_sin_na$Accuracy) - 3 * mad(datos_sin_na$Accuracy, constant = 1)
lower_bound
## [1] 0.3595111
upper_bound <- median(datos_sin_na$Accuracy) + 3 * mad(datos_sin_na$Accuracy, constant = 1)
upper_bound
## [1] 1.13852
outlier_ind <- which(datos_sin_na$Accuracy < lower_bound | datos_sin_na$Accuracy > upper_bound)
outlier_ind
## [1]  15  77 110 112 196 232 239
datos_sin_na[outlier_ind, ]
## # A tibble: 7 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 Random Forest  PyTorch     Regression   Text             9.72     0.782  0.548
## 2 Neural Network PyTorch     Regression   Time Series      5.26     0.506  0.829
## 3 K-Means        TensorFlow  Regression   Tabular          7.13     0.521  0.441
## 4 SVM            Scikit-lea… Regression   Tabular          5.20     0.489  0.680
## 5 SVM            TensorFlow  Classificat… Image            8.29     0.798  0.753
## 6 Random Forest  PyTorch     Clustering   Tabular          7.90     0.521  0.363
## 7 SVM            Scikit-lea… Regression   Tabular          5.98     0.928  0.799
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

To check whether these values really are outliers, we can use Rosner's test, which is suitable for large samples. The test assumes that, apart from the outliers, the data are approximately normally distributed.

library(EnvStats)
## 
## Attaching package: 'EnvStats'
## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm
test <- rosnerTest(datos_sin_na$Accuracy, k = 7)
test$all.stats
##   i    Mean.i      SD.i    Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1 0 0.8457621 0.8216572 9.718080      15 10.79808   3.833870    TRUE
## 2 1 0.8259135 0.7069241 8.294427     196 10.56480   3.833271    TRUE
## 3 2 0.8091679 0.6125670 7.900862     232 11.57701   3.832670    TRUE
## 4 3 0.7932315 0.5124044 7.127467     110 12.36179   3.832068    TRUE
## 5 4 0.7789652 0.4151830 5.978890     239 12.52442   3.831464    TRUE
## 6 5 0.7672273 0.3338475 5.259856      77 13.45713   3.830859    TRUE
## 7 6 0.7570629 0.2565839 5.200546     112 17.31786   3.830252    TRUE

With this test (at the default significance level of 5%) we conclude that all the candidate outliers are indeed outliers.
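Rosner's test is the generalized extreme Studentized deviate (ESD) procedure. It works in four steps:

1. Remove, one at a time, the value farthest from the current mean.
2. Record the statistic R_i = max |x - mean| / sd at each step.
3. Compare each R_i with a critical value lambda_i derived from Student's t distribution, which assumes normality.
4. Take the number of outliers to be the largest i with R_i > lambda_i.

The base-R sketch below follows that description. The helper name `generalized_esd` is hypothetical, and its output has not been checked against `EnvStats::rosnerTest()`. Its `R` and `lambda` columns are meant to play the role of `R.i+1` and `lambda.i+1` in the table above. Here `obs` refers to positions in the NA-free vector.

```r
# Sketch of Rosner's generalized ESD test (base R only).
# x: numeric vector; k: maximum number of suspected outliers; alpha: level.
generalized_esd <- function(x, k, alpha = 0.05) {
  x <- x[!is.na(x)]
  n <- length(x)
  idx <- seq_along(x)                 # positions still in the sample
  R <- lambda <- numeric(k)
  removed <- integer(k)
  for (i in seq_len(k)) {
    z <- abs(x[idx] - mean(x[idx])) / sd(x[idx])
    j <- which.max(z)
    R[i] <- z[j]                      # test statistic R_i
    removed[i] <- idx[j]              # position of the removed value
    idx <- idx[-j]
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda[i] <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
  }
  # number of outliers = largest i with R_i > lambda_i
  n_out <- if (any(R > lambda)) max(which(R > lambda)) else 0
  data.frame(i = seq_len(k), R = R, lambda = lambda,
             obs = removed, outlier = seq_len(k) <= n_out)
}

set.seed(1)
x <- c(rnorm(50), 10, 12)   # two planted outliers at positions 51 and 52
res <- generalized_esd(x, k = 3)
print(res)
```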

Capping

To treat the outliers we chose capping. First, the 5th and 95th percentiles were computed. Then values below the lower bound obtained above were replaced with the 5th percentile, and values above the upper bound were replaced with the 95th percentile.

caps <- quantile(datos_sin_na$Accuracy, probs = c(.05, .95), na.rm = TRUE)
datos_sin_na$Accuracy[datos_sin_na$Accuracy < lower_bound] <- caps[1]
datos_sin_na$Accuracy[datos_sin_na$Accuracy > upper_bound] <- caps[2]
datos_sin_na[outlier_ind, ]
## # A tibble: 7 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 Random Forest  PyTorch     Regression   Text            0.988     0.782  0.548
## 2 Neural Network PyTorch     Regression   Time Series     0.988     0.506  0.829
## 3 K-Means        TensorFlow  Regression   Tabular         0.988     0.521  0.441
## 4 SVM            Scikit-lea… Regression   Tabular         0.988     0.489  0.680
## 5 SVM            TensorFlow  Classificat… Image           0.988     0.798  0.753
## 6 Random Forest  PyTorch     Clustering   Tabular         0.988     0.521  0.363
## 7 SVM            Scikit-lea… Regression   Tabular         0.988     0.928  0.799
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

Box plot of Accuracy after capping the outliers

ggplot(datos_sin_na, aes(y = Accuracy)) +
  geom_boxplot(outlier.colour = "orange", outlier.shape = 16, outlier.size = 2, fill = "skyblue", color = "darkblue") +
  labs(title = "Box plot of Accuracy after capping",
       y = "Accuracy") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
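The Hampel filter and capping steps are repeated with the same parameters for Precision, Recall, F1_Score, and Training_Time below. A hypothetical wrapper, `hampel_cap` (not part of the original code), could apply them to several columns in one pass and report how many values were replaced. It assumes the same choices as above: 3 raw MADs (`constant = 1`) and caps at the 5th and 95th percentiles.

The toy column `b` shows one limitation. Its single extreme value is 5% of the rows, so the 95th percentile (1.245) is itself pulled above 1. Percentile capping works best when outliers are a small fraction of the data, which appears to be the case here (at most ten flagged rows per variable).

```r
# Hypothetical wrapper: Hampel bounds (median +/- n_mad raw MADs) followed by
# capping at the 5th / 95th percentiles, applied column by column.
hampel_cap <- function(df, cols, n_mad = 3, probs = c(0.05, 0.95)) {
  n_replaced <- integer(0)
  for (v in cols) {
    x <- df[[v]]
    med <- median(x, na.rm = TRUE)
    spread <- mad(x, constant = 1, na.rm = TRUE)   # raw MAD, as in the analysis
    lower <- med - n_mad * spread
    upper <- med + n_mad * spread
    # percentiles are taken before any value is replaced
    caps <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
    low <- !is.na(x) & x < lower
    high <- !is.na(x) & x > upper
    x[low] <- caps[1]
    x[high] <- caps[2]
    df[[v]] <- x
    n_replaced[v] <- sum(low | high)
  }
  list(data = df, n_replaced = n_replaced)
}

toy <- data.frame(
  a = c(1:19, 100),
  b = c(rep(c(0.6, 0.7, 0.8), length.out = 19), 9.7)
)
out <- hampel_cap(toy, c("a", "b"))
print(out$n_replaced)
print(tail(out$data, 2))

# With the real data, for example:
# hampel_cap(datos_sin_na, c("Accuracy", "Precision", "Recall", "F1_Score"))
```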

Precision

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = Precision) +
  geom_boxplot(fill = "darkred") +
  theme_minimal()

Hampel filter

lower_bound <- median(datos_sin_na$Precision) - 3 * mad(datos_sin_na$Precision, constant = 1)
lower_bound
## [1] 0.2777772
upper_bound <- median(datos_sin_na$Precision) + 3 * mad(datos_sin_na$Precision, constant = 1)
upper_bound
## [1] 1.177228
outlier_ind <- which(datos_sin_na$Precision < lower_bound | datos_sin_na$Precision > upper_bound)
outlier_ind
##  [1]   6  96 111 157 223 241 250 288 433 439
length(outlier_ind)
## [1] 10
datos_sin_na[outlier_ind, ]
## # A tibble: 10 × 10
##    Algorithm      Framework  Problem_Type Dataset_Type Accuracy Precision Recall
##    <chr>          <chr>      <chr>        <chr>           <dbl>     <dbl>  <dbl>
##  1 SVM            PyTorch    Regression   Image           0.897      9.73  0.781
##  2 SVM            Scikit-le… Classificat… Image           0.591      4.06  0.482
##  3 K-Means        TensorFlow Clustering   Time Series     0.686      6.21  0.328
##  4 K-Means        Scikit-le… Classificat… Time Series     0.603      4.15  7.74 
##  5 Neural Network Keras      Clustering   Text            0.537      9.67  0.819
##  6 SVM            TensorFlow Classificat… Image           0.824      5.43  0.976
##  7 Random Forest  TensorFlow Regression   Text            0.902      8.93  0.603
##  8 SVM            PyTorch    Clustering   Text            0.584      4.08  0.717
##  9 SVM            Keras      Regression   Text            0.506      5.76  0.707
## 10 K-Means        PyTorch    Regression   Tabular         0.768      7.04  0.926
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

To check whether these values really are outliers, we can use Rosner's test, which is suitable for large samples.

library(EnvStats)
test <- rosnerTest(datos_sin_na$Precision, k = 10)
test$all.stats
##    i    Mean.i      SD.i    Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 0.8386718 0.9308086 9.732008       6  9.55442   3.833870    TRUE
## 2  1 0.8187762 0.8310328 9.674189     223 10.65591   3.833271    TRUE
## 3  2 0.7989210 0.7180191 8.932619     250 11.32797   3.832670    TRUE
## 4  3 0.7806431 0.6061150 7.044472     439 10.33439   3.832068    TRUE
## 5  4 0.7665353 0.5286183 6.207645     111 10.29308   3.831464    TRUE
## 6  5 0.7542529 0.4614512 5.760933     433 10.84986   3.830859    TRUE
## 7  6 0.7429256 0.3955383 5.432777     241 11.85688   3.830252    TRUE
## 8  7 0.7322910 0.3266570 4.145151     157 10.44784   3.829643    TRUE
## 9  8 0.7245345 0.2834703 4.075645     288 11.82174   3.829033    TRUE
## 10 9 0.7169010 0.2341822 4.055990      96 14.25851   3.828422    TRUE

With this test we conclude that all the candidate outliers are indeed outliers.

Capping

To treat the outliers we chose capping. First, the 5th and 95th percentiles were computed. Then values below the lower bound obtained above were replaced with the 5th percentile, and values above the upper bound were replaced with the 95th percentile.

caps <- quantile(datos_sin_na$Precision, probs = c(.05, .95), na.rm = TRUE)
datos_sin_na$Precision[datos_sin_na$Precision < lower_bound] <- caps[1]
datos_sin_na$Precision[datos_sin_na$Precision > upper_bound] <- caps[2]
datos_sin_na[outlier_ind, ]
## # A tibble: 10 × 10
##    Algorithm      Framework  Problem_Type Dataset_Type Accuracy Precision Recall
##    <chr>          <chr>      <chr>        <chr>           <dbl>     <dbl>  <dbl>
##  1 SVM            PyTorch    Regression   Image           0.897     0.982  0.781
##  2 SVM            Scikit-le… Classificat… Image           0.591     0.982  0.482
##  3 K-Means        TensorFlow Clustering   Time Series     0.686     0.982  0.328
##  4 K-Means        Scikit-le… Classificat… Time Series     0.603     0.982  7.74 
##  5 Neural Network Keras      Clustering   Text            0.537     0.982  0.819
##  6 SVM            TensorFlow Classificat… Image           0.824     0.982  0.976
##  7 Random Forest  TensorFlow Regression   Text            0.902     0.982  0.603
##  8 SVM            PyTorch    Clustering   Text            0.584     0.982  0.717
##  9 SVM            Keras      Regression   Text            0.506     0.982  0.707
## 10 K-Means        PyTorch    Regression   Tabular         0.768     0.982  0.926
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

Box plot of Precision after capping the outliers

ggplot(datos_sin_na, aes(y = Precision)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 16, outlier.size = 2, fill = "gold", color = "darkorange") +
  labs(title = "Box plot of Precision after capping",
       y = "Precision") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Recall

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = Recall) +
  geom_boxplot(fill = "#0c4") +
  theme_minimal()

Hampel filter

lower_bound <- median(datos_sin_na$Recall) - 3 * mad(datos_sin_na$Recall, constant = 1)
lower_bound
## [1] 0.1155031
upper_bound <- median(datos_sin_na$Recall) + 3 * mad(datos_sin_na$Recall, constant = 1)
upper_bound
## [1] 1.187093
outlier_ind <- which(datos_sin_na$Recall < lower_bound | datos_sin_na$Recall > upper_bound)
outlier_ind
## [1]   4  88 114 157 221 270 303 308 420
length(outlier_ind)
## [1] 9
datos_sin_na[outlier_ind, ]
## # A tibble: 9 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 K-Means        PyTorch     Regression   Image           0.637     0.626   7.45
## 2 K-Means        PyTorch     Clustering   Image           0.822     0.725   5.73
## 3 K-Means        Scikit-lea… Clustering   Image           0.719     0.998   9.37
## 4 K-Means        Scikit-lea… Classificat… Time Series     0.603     0.982   7.74
## 5 Neural Network PyTorch     Classificat… Text            0.724     0.449   3.44
## 6 K-Means        Keras       Clustering   Image           0.801     0.521   5.77
## 7 Neural Network TensorFlow  Classificat… Text            0.702     0.615   4.86
## 8 Neural Network TensorFlow  Clustering   Time Series     0.564     0.827   5.44
## 9 Random Forest  TensorFlow  Clustering   Image           0.531     0.566   5.50
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

To check whether these values really are outliers, we can use Rosner's test, which is suitable for large samples.

library(EnvStats)
test <- rosnerTest(datos_sin_na$Recall, k = 9)
test$all.stats
##   i    Mean.i      SD.i    Value Obs.Num     R.i+1 lambda.i+1 Outlier
## 1 0 0.7610971 0.8314054 9.366182     114 10.350047   3.833870    TRUE
## 2 1 0.7418463 0.7255257 7.737749     157  9.642529   3.833271    TRUE
## 3 2 0.7261605 0.6460189 7.454810       4 10.415561   3.832670    TRUE
## 4 3 0.7110399 0.5622109 5.765916     270  8.991068   3.832068    TRUE
## 5 4 0.6996551 0.5089064 5.726373      88  9.877490   3.831464    TRUE
## 6 5 0.6883081 0.4497505 5.499848     420 10.698244   3.830859    TRUE
## 7 6 0.6774222 0.3874519 5.436669     308 12.283452   3.830252    TRUE
## 8 7 0.6666303 0.3144283 4.859080     303 13.333562   3.829643    TRUE
## 9 8 0.6571020 0.2428199 3.438827     221 11.455922   3.829033    TRUE

With this test we conclude that all the candidate outliers are indeed outliers.

Capping

To treat the outliers we chose capping. First, the 5th and 95th percentiles were computed. Then values below the lower bound obtained above were replaced with the 5th percentile, and values above the upper bound were replaced with the 95th percentile.

caps <- quantile(datos_sin_na$Recall, probs = c(.05, .95), na.rm = TRUE)
datos_sin_na$Recall[datos_sin_na$Recall < lower_bound] <- caps[1]
datos_sin_na$Recall[datos_sin_na$Recall > upper_bound] <- caps[2]
datos_sin_na[outlier_ind, ]
## # A tibble: 9 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 K-Means        PyTorch     Regression   Image           0.637     0.626  0.976
## 2 K-Means        PyTorch     Clustering   Image           0.822     0.725  0.976
## 3 K-Means        Scikit-lea… Clustering   Image           0.719     0.998  0.976
## 4 K-Means        Scikit-lea… Classificat… Time Series     0.603     0.982  0.976
## 5 Neural Network PyTorch     Classificat… Text            0.724     0.449  0.976
## 6 K-Means        Keras       Clustering   Image           0.801     0.521  0.976
## 7 Neural Network TensorFlow  Classificat… Text            0.702     0.615  0.976
## 8 Neural Network TensorFlow  Clustering   Time Series     0.564     0.827  0.976
## 9 Random Forest  TensorFlow  Clustering   Image           0.531     0.566  0.976
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

Box plot of Recall after capping the outliers

ggplot(datos_sin_na, aes(y = Recall)) +
  geom_boxplot(outlier.colour = "blue", outlier.shape = 16, outlier.size = 2, fill = "lightgreen", color = "darkgreen") +
  labs(title = "Box plot of Recall after capping",
       y = "Recall") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

F1_Score

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = F1_Score) +
  geom_boxplot(fill = "#0C945689") +
  theme_minimal()

Hampel filter

lower_bound <- median(datos_sin_na$F1_Score) - 3 * mad(datos_sin_na$F1_Score, constant = 1)
lower_bound
## [1] 0.2672388
upper_bound <- median(datos_sin_na$F1_Score) + 3 * mad(datos_sin_na$F1_Score, constant = 1)
upper_bound
## [1] 1.139019
outlier_ind <- which(datos_sin_na$F1_Score < lower_bound | datos_sin_na$F1_Score > upper_bound)
outlier_ind
## [1] 160 230 267 281 296 316 333 437
length(outlier_ind)
## [1] 8
datos_sin_na[outlier_ind, ]
## # A tibble: 8 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 SVM            TensorFlow  Regression   Tabular         0.766     0.579  0.817
## 2 Neural Network Scikit-lea… Clustering   Image           0.671     0.676  0.937
## 3 SVM            PyTorch     Clustering   Image           0.522     0.566  0.817
## 4 K-Means        TensorFlow  Regression   Tabular         0.637     0.651  0.565
## 5 K-Means        PyTorch     Classificat… Text            0.773     0.915  0.841
## 6 K-Means        TensorFlow  Classificat… Image           0.677     0.905  0.621
## 7 Random Forest  Scikit-lea… Clustering   Time Series     0.588     0.792  0.736
## 8 Neural Network PyTorch     Classificat… Text            0.630     0.770  0.831
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

To check whether these values really are outliers, we can use Rosner's test, which is suitable for large samples.

library(EnvStats)
test <- rosnerTest(datos_sin_na$F1_Score, k = 8)
test$all.stats
##   i    Mean.i      SD.i    Value Obs.Num     R.i+1 lambda.i+1 Outlier
## 1 0 0.8013885 0.8749189 9.374049     296  9.798234   3.833870    TRUE
## 2 1 0.7822103 0.7759213 9.295359     316 10.971665   3.833271    TRUE
## 3 2 0.7631225 0.6634602 8.178579     281 11.176942   3.832670    TRUE
## 4 3 0.7464585 0.5630661 7.747684     437 12.434110   3.832068    TRUE
## 5 4 0.7306900 0.4548205 5.499742     230 10.485569   3.831464    TRUE
## 6 5 0.7199247 0.3946604 5.320668     333 11.657473   3.830859    TRUE
## 7 6 0.7095157 0.3286397 5.131244     160 13.454635   3.830252    TRUE
## 8 7 0.6994892 0.2524146 4.632073     267 15.579855   3.829643    TRUE

With this test we conclude that all the candidate outliers are indeed outliers.

Capping

To treat the outliers we chose capping. First, the 5th and 95th percentiles were computed. Then values below the lower bound obtained above were replaced with the 5th percentile, and values above the upper bound were replaced with the 95th percentile.

caps <- quantile(datos_sin_na$F1_Score, probs = c(.05, .95), na.rm = TRUE)
datos_sin_na$F1_Score[datos_sin_na$F1_Score < lower_bound] <- caps[1]
datos_sin_na$F1_Score[datos_sin_na$F1_Score > upper_bound] <- caps[2]
datos_sin_na[outlier_ind, ]
## # A tibble: 8 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 SVM            TensorFlow  Regression   Tabular         0.766     0.579  0.817
## 2 Neural Network Scikit-lea… Clustering   Image           0.671     0.676  0.937
## 3 SVM            PyTorch     Clustering   Image           0.522     0.566  0.817
## 4 K-Means        TensorFlow  Regression   Tabular         0.637     0.651  0.565
## 5 K-Means        PyTorch     Classificat… Text            0.773     0.915  0.841
## 6 K-Means        TensorFlow  Classificat… Image           0.677     0.905  0.621
## 7 Random Forest  Scikit-lea… Clustering   Time Series     0.588     0.792  0.736
## 8 Neural Network PyTorch     Classificat… Text            0.630     0.770  0.831
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
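The Hampel-plus-capping steps are repeated for each variable in this report. A minimal sketch of a reusable version, assuming the same choices as the code above (3 unscaled MADs around the median, replacement by the 5th and 95th percentiles); `cap_hampel()` is a hypothetical name, not a function from any package:

```r
# Hypothetical helper (not part of the original analysis):
# Hampel bounds followed by percentile capping, as in the steps above.
cap_hampel <- function(x, k = 3, probs = c(0.05, 0.95)) {
  med <- median(x, na.rm = TRUE)
  # Unscaled MAD, matching mad(..., constant = 1) used in this report
  mad_raw <- mad(x, constant = 1, na.rm = TRUE)
  lower_bound <- med - k * mad_raw
  upper_bound <- med + k * mad_raw
  # Replacement values: 5th and 95th percentiles by default
  caps <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Values outside the Hampel bounds are replaced; NAs are left untouched
  x[!is.na(x) & x < lower_bound] <- caps[1]
  x[!is.na(x) & x > upper_bound] <- caps[2]
  x
}
```

On the report's data it would be called as `datos_sin_na$F1_Score <- cap_hampel(datos_sin_na$F1_Score)`. Keeping the bounds and the caps inside one function avoids accidentally reusing a `lower_bound`/`upper_bound` left over from a previous variable.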

Box plot of F1_Score after outlier treatment

ggplot(datos_sin_na, aes(y = F1_Score)) +
  geom_boxplot(outlier.colour = "purple", outlier.shape = 16, outlier.size = 2, fill = "yellow", color = "orange") +
  labs(title = "Box Plot of F1 Score",
       y = "F1 Score") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Training_Time

Box plot

ggplot(datos_sin_na) +
  aes(x = "", y = Training_Time) +
  geom_boxplot(fill = "#F39C12") +
  theme_minimal()

Hampel filter

lower_bound <- median(datos_sin_na$Training_Time) - 3 * mad(datos_sin_na$Training_Time, constant = 1)
lower_bound
## [1] -1.363486
upper_bound <- median(datos_sin_na$Training_Time) + 3 * mad(datos_sin_na$Training_Time, constant = 1)
upper_bound
## [1] 6.325322
outlier_ind <- which(datos_sin_na$Training_Time < lower_bound | datos_sin_na$Training_Time > upper_bound)
outlier_ind
## [1] 100 109 201 214 217 324 344 417
length(outlier_ind)
## [1] 8
datos_sin_na[outlier_ind, ]
## # A tibble: 8 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 Neural Network PyTorch     Regression   Tabular         0.987     0.733  0.706
## 2 Neural Network Scikit-lea… Classificat… Time Series     0.871     0.847  0.734
## 3 Neural Network PyTorch     Clustering   Tabular         0.524     0.556  0.737
## 4 Neural Network Keras       Clustering   Tabular         0.718     0.551  0.394
## 5 Random Forest  TensorFlow  Classificat… Text            0.772     0.714  0.824
## 6 SVM            Scikit-lea… Regression   Tabular         0.662     0.913  0.970
## 7 K-Means        TensorFlow  Regression   Tabular         0.996     0.728  0.795
## 8 K-Means        PyTorch     Regression   Time Series     0.561     0.617  0.828
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
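The bounds above follow the Hampel rule, median ± 3 MAD, computed here with the unscaled MAD (`constant = 1`):

$$
\tilde{x} = \operatorname{median}(x), \qquad
\mathrm{MAD} = \operatorname{median}_i \lvert x_i - \tilde{x} \rvert, \qquad
\bigl[\,\tilde{x} - 3\,\mathrm{MAD},\ \tilde{x} + 3\,\mathrm{MAD}\,\bigr]
$$

Because both bounds are symmetric around the median, the printed Training_Time values can be inverted as a check:

$$
\tilde{x} = \frac{-1.363486 + 6.325322}{2} = 2.480918, \qquad
\mathrm{MAD} = \frac{6.325322 - (-1.363486)}{6} = 1.281468
$$

The recovered median agrees with the 2.481 reported in the post-treatment summary. Note that R's `mad()` multiplies by 1.4826 by default so that it estimates the standard deviation under normality; with `constant = 1`, three MADs correspond to only about 2 standard deviations for normal data, so this rule flags more points than the classic Hampel filter.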

To check whether these values really are outliers, we can apply Rosner's test, which is suited to large samples.

library(EnvStats)
test <- rosnerTest(datos_sin_na$Training_Time, k = 8)
test$all.stats
##   i   Mean.i     SD.i    Value Obs.Num     R.i+1 lambda.i+1 Outlier
## 1 0 3.059781 4.646419 46.98563     324  9.453699   3.833870    TRUE
## 2 1 2.961513 4.159537 46.83874     217 10.548585   3.833271    TRUE
## 3 2 2.863133 3.606191 44.58645     100 11.569913   3.832670    TRUE
## 4 3 2.769373 3.017332 44.35790     201 13.783214   3.832068    TRUE
## 5 4 2.675705 2.282925 28.29499     344 11.222129   3.831464    TRUE
## 6 5 2.617874 1.932676 20.93344     109  9.476788   3.830859    TRUE
## 7 6 2.576436 1.726646 20.25186     417 10.236856   3.830252    TRUE
## 8 7 2.536356 1.508782 13.79138     214  7.459672   3.829643    TRUE
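The columns of `test$all.stats` follow the statistic behind Rosner's generalized ESD procedure: at step $i$, the remaining observation $x^{(i)}$ farthest from the mean is scored against the mean $\bar{x}_i$ and standard deviation $s_i$ of the observations still in the sample, and compared with the critical value $\lambda_{i+1}$ that EnvStats computes (at its default $\alpha = 0.05$):

$$
R_{i+1} = \frac{\lvert x^{(i)} - \bar{x}_i \rvert}{s_i}
$$

Checking the first row: $R_1 = \lvert 46.98563 - 3.059781 \rvert / 4.646419 \approx 9.4537$, which matches `R.i+1` and exceeds $\lambda_1 = 3.833870$. The procedure declares as outliers the $m$ most extreme values, where $m$ is the largest step at which $R$ exceeds $\lambda$; here that is all eight.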

This test confirms that all of the candidate values are in fact outliers.

Capping

To treat the outliers, we apply capping. First, the 5th and 95th percentiles of the variable are computed. Then every value below the lower Hampel bound obtained above is replaced by the 5th percentile, and every value above the upper bound is replaced by the 95th percentile.

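# 5th and 95th percentiles, used as replacement values
caps <- quantile(datos_sin_na$Training_Time, probs = c(.05, .95), na.rm = TRUE)
# lower_bound / upper_bound are the Hampel bounds computed for Training_Time
datos_sin_na$Training_Time[datos_sin_na$Training_Time < lower_bound] <- caps[1]
datos_sin_na$Training_Time[datos_sin_na$Training_Time > upper_bound] <- caps[2]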

Box plot of Training_Time after outlier treatment

datos_sin_na[outlier_ind, ]
## # A tibble: 8 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 Neural Network PyTorch     Regression   Tabular         0.987     0.733  0.706
## 2 Neural Network Scikit-lea… Classificat… Time Series     0.871     0.847  0.734
## 3 Neural Network PyTorch     Clustering   Tabular         0.524     0.556  0.737
## 4 Neural Network Keras       Clustering   Tabular         0.718     0.551  0.394
## 5 Random Forest  TensorFlow  Classificat… Text            0.772     0.714  0.824
## 6 SVM            Scikit-lea… Regression   Tabular         0.662     0.913  0.970
## 7 K-Means        TensorFlow  Regression   Tabular         0.996     0.728  0.795
## 8 K-Means        PyTorch     Regression   Time Series     0.561     0.617  0.828
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
ggplot(datos_sin_na, aes(y = Training_Time)) +
  geom_boxplot(outlier.colour = "brown", outlier.shape = 16, outlier.size = 2, fill = "lightcoral", color = "darkred") +
  labs(title = "Box Plot of Training Time",
       y = "Training Time") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Analysis of the numeric variables after treatment

Accuracy

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$Accuracy))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5038 0.6183 0.749 0.7507 0.8698 0.9997

Coefficient of variation (%)

media <- mean(datos_sin_na$Accuracy, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Accuracy, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a
## [1] 19.56421
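The coefficient of variation expresses the standard deviation as a percentage of the mean, so it is unitless and can be compared across variables measured on different scales:

$$
CV = \frac{s}{\bar{x}} \times 100
$$

For Accuracy, $CV \approx 19.56\%$ together with $\bar{x} = 0.7507$ implies $s \approx 0.7507 \times 0.1956 \approx 0.147$. Across the five treated variables, Accuracy is the least dispersed relative to its mean ($\approx 19.6\%$) and Training_Time the most ($\approx 56.1\%$).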

Skewness

media <- mean(datos_sin_na$Accuracy, na.rm = TRUE)
mediana <- median(datos_sin_na$Accuracy, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.03537286
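The code above computes Pearson's second skewness coefficient, which compares the mean $\bar{x}$ with the median $\tilde{x}$:

$$
Sk_2 = \frac{3\,(\bar{x} - \tilde{x})}{s}
$$

Because $\lvert \bar{x} - \tilde{x} \rvert \le s$, the coefficient always lies in $[-3, 3]$; values near 0 indicate an approximately symmetric distribution. With the rounded Accuracy summary ($\bar{x} = 0.7507$, $\tilde{x} = 0.749$, $s \approx 0.147$), $3 \times 0.0017 / 0.147 \approx 0.035$, in line with the printed $0.0354$.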

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$Accuracy)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$Accuracy
## A = 4.9922, p-value = 2.319e-12
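For reference, the Anderson-Darling statistic for a normal distribution with estimated mean and standard deviation is computed from the sorted observations $x_{(1)} \le \dots \le x_{(n)}$ and the standard normal CDF $\Phi$:

$$
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\left[\ln \Phi\!\left(z_{(i)}\right) + \ln\!\left(1 - \Phi\!\left(z_{(n+1-i)}\right)\right)\right],
\qquad z_{(i)} = \frac{x_{(i)} - \bar{x}}{s}
$$

Large values of $A^2$ mean the empirical distribution departs from the fitted normal; nortest derives the p-value from a small-sample-adjusted version of $A^2$. Here $p = 2.319 \times 10^{-12} < 0.05$, so normality is rejected for Accuracy, and the p-values reported for the other four metrics below lead to the same conclusion.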

Histogram

library(ggplot2)
library(hrbrthemes)
# theme_ipsum() defaults to Arial Narrow, which may be missing on Windows and then
# triggers "font family not found" warnings; the generic "sans" family avoids this
ggp1 <- ggplot(data.frame(value = datos_sin_na$Accuracy), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of Accuracy") +
  xlab("Accuracy") + ylab("Frequency") +
  theme_ipsum(base_family = "sans") +
  theme(plot.title = element_text(size = 15))
ggp1

Precision

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$Precision))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4031 0.5661 0.7275 0.7154 0.867 0.999

Coefficient of variation (%)

media <- mean(datos_sin_na$Precision, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Precision, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a
## [1] 24.35969

Skewness

media <- mean(datos_sin_na$Precision, na.rm = TRUE)
mediana <- median(datos_sin_na$Precision, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] -0.2088443

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$Precision)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$Precision
## A = 5.5097, p-value = 1.331e-13

Histogram

library(ggplot2)
library(hrbrthemes)
# theme_ipsum() defaults to Arial Narrow, which may be missing on Windows and then
# triggers "font family not found" warnings; the generic "sans" family avoids this
ggp1 <- ggplot(data.frame(value = datos_sin_na$Precision), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of Precision") +
  xlab("Precision") + ylab("Frequency") +
  theme_ipsum(base_family = "sans") +
  theme(plot.title = element_text(size = 15))
ggp1

Recall

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$Recall))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3001 0.4898 0.6513 0.6573 0.8365 0.9985

Coefficient of variation (%)

media <- mean(datos_sin_na$Recall, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Recall, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a
## [1] 31.41571

Skewness

media <- mean(datos_sin_na$Recall, na.rm = TRUE)
mediana <- median(datos_sin_na$Recall, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.08711636

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$Recall)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$Recall
## A = 5.1701, p-value = 8.675e-13

Histogram

library(ggplot2)
library(hrbrthemes)
# theme_ipsum() defaults to Arial Narrow, which may be missing on Windows and then
# triggers "font family not found" warnings; the generic "sans" family avoids this
ggp1 <- ggplot(data.frame(value = datos_sin_na$Recall), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of Recall") +
  xlab("Recall") + ylab("Frequency") +
  theme_ipsum(base_family = "sans") +
  theme(plot.title = element_text(size = 15))
ggp1

F1_Score

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$F1_Score))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4 0.5486 0.7031 0.6955 0.8396 0.9993

Coefficient of variation (%)

media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
sd_a <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a
## [1] 24.64901

Skewness

media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
mediana <- median(datos_sin_na$F1_Score, na.rm = TRUE)
desviacion <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] -0.1334735

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$F1_Score)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$F1_Score
## A = 5.019, p-value = 2e-12

Histogram

library(ggplot2)
library(hrbrthemes)
# theme_ipsum() defaults to Arial Narrow, which may be missing on Windows and then
# triggers "font family not found" warnings; the generic "sans" family avoids this
ggp1 <- ggplot(data.frame(value = datos_sin_na$F1_Score), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of F1_Score") +
  xlab("F1_Score") + ylab("Frequency") +
  theme_ipsum(base_family = "sans") +
  theme(plot.title = element_text(size = 15))
ggp1

Training_Time

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$Training_Time))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1032 1.298 2.481 2.552 3.845 4.998

Coefficient of variation (%)

media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a
## [1] 56.11634

Skewness

media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
mediana <- median(datos_sin_na$Training_Time, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.1492231

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$Training_Time)
print(ad_test)
## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$Training_Time
## A = 6.1063, p-value = 4.996e-15

Histogram

library(ggplot2)
library(hrbrthemes)
# theme_ipsum() defaults to Arial Narrow, which may be missing on Windows and then
# triggers "font family not found" warnings; the generic "sans" family avoids this
ggp1 <- ggplot(data.frame(value = datos_sin_na$Training_Time), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of Training_Time") +
  xlab("Training_Time") + ylab("Frequency") +
  theme_ipsum(base_family = "sans") +
  theme(plot.title = element_text(size = 15))
ggp1
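The descriptive statistics in this section were computed one variable at a time. As a compact alternative, here is a minimal sketch in base R that produces one row per variable with the same mean, median, standard deviation, coefficient of variation and Pearson skewness used above; `describe_metrics()` is a hypothetical helper name, not part of the original analysis:

```r
# Hypothetical helper (not part of the original analysis):
# one row of descriptive statistics per numeric variable.
describe_metrics <- function(df, vars) {
  rows <- lapply(vars, function(v) {
    x <- df[[v]]
    m <- mean(x, na.rm = TRUE)
    med <- median(x, na.rm = TRUE)
    s <- sd(x, na.rm = TRUE)
    data.frame(
      variable = v,
      mean = m,
      median = med,
      sd = s,
      cv_pct = s / m * 100,              # coefficient of variation (%)
      pearson_skew = 3 * (m - med) / s,  # Pearson's second skewness coefficient
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

On the report's data it would be called as `describe_metrics(datos_sin_na, c("Accuracy", "Precision", "Recall", "F1_Score", "Training_Time"))`, which should reproduce the coefficients printed above in a single table.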

Bivariate analysis

Recall that our research question is: How does the precision (Precision) of the different machine learning algorithms vary by problem type (Problem_Type)?

To answer this question, we build two types of plots:

  1. Dot plot (strip plot)
ggplot(datos_sin_na, aes(x = Problem_Type, y = Precision)) +
  geom_jitter(aes(color = Problem_Type), size = 3, width = 0.2) +
  labs(title = "Precision by Problem Type",
       x = "Problem Type",
       y = "Precision") +
  theme_minimal()

This plot shows the distribution of the individual precision values for each problem type: classification, clustering, and regression.

The vertical axis shows precision, which ranges from 0.4 to 1.0, and the horizontal axis separates the three problem types. Each point is the precision of one model in its category. Classification and clustering models tend to concentrate at the upper end of the precision range, while regression models show greater dispersion, with some models reaching lower precision values.

  2. Bar chart of mean precision
ggplot(datos_sin_na, aes(x = Problem_Type, y = Precision, fill = Problem_Type)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge") +
  labs(title = "Mean Precision by Problem Type",
       x = "Problem Type",
       y = "Mean Precision") +
  theme_minimal()

This plot shows the mean precision for each problem type. Regression is slightly higher than classification and clustering, but the difference is minimal, which suggests that models in each problem type reach comparable precision levels on average.

Combined analysis

  1. Variability: The first plot (strip plot) shows the spread of each model's precision across the three problem types. Classification and clustering models tend to have precision concentrated at high values, while regression models show greater variability, with some models reaching low precision.

  2. Mean: The second plot (bar chart) shows the mean precision for each problem type. On average, the three problem types have similar mean precision, around 0.7 (the overall mean precision is 0.715), with regression showing a slight advantage.
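The visual comparison can be backed by numbers. A minimal sketch, assuming base R only: `compare_by_group()` is a hypothetical helper that reports n, mean, median and standard deviation per group, and adds a Kruskal-Wallis test, swapped in here as a rank-based comparison because the Anderson-Darling tests rejected normality. It is not part of the original analysis:

```r
# Hypothetical helper (not part of the original analysis): per-group summary of a
# numeric variable plus a Kruskal-Wallis test across the groups.
compare_by_group <- function(df, value, group) {
  x <- df[[value]]
  g <- as.character(df[[group]])
  parts <- split(x, g)  # one numeric vector per group, groups in sorted order
  summary_tbl <- data.frame(
    group  = names(parts),
    n      = unname(vapply(parts, length, integer(1))),
    mean   = unname(vapply(parts, mean, numeric(1))),
    median = unname(vapply(parts, median, numeric(1))),
    sd     = unname(vapply(parts, sd, numeric(1))),
    stringsAsFactors = FALSE
  )
  # Rank-based test of whether the groups share the same distribution;
  # it does not assume normality
  kw <- kruskal.test(x, factor(g))
  list(summary = summary_tbl, kruskal = kw)
}
```

On the report's data the call would be `compare_by_group(datos_sin_na, "Precision", "Problem_Type")`. The `summary` table shows whether the per-group means and medians actually differ, and a Kruskal-Wallis p-value above 0.05 would be consistent with reading the differences in the bar chart as minimal.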

Conclusion

The analysis of the precision of the different machine learning algorithms suggests that regression problems reach a slightly higher mean precision than classification and clustering, together with a wider range of values, while classification and clustering perform similarly to each other. Since the difference in means is small, this should be read as a tendency rather than as evidence that regression models are clearly more effective.