This exploratory data analysis (EDA) examines how different Artificial Intelligence (AI) algorithms, frameworks, problem types, and dataset types behave, using key metrics such as precision, recall, F1_Score, and training time. The analysis aims to identify patterns, relationships, and possible optimizations in the use of different AI tools, with the goal of improving model performance across contexts.
The question to be addressed is:
How does the precision (Precision) of the different machine learning algorithms vary with the type of problem (Problem_Type)?
This question will let us compare how each algorithm performs, in terms of precision, across classification, regression, and clustering problems, a key factor when choosing a model for a given task (a summary sketch follows the data inspection below).
Dictionary of the variables in our database:

- Algorithm (categorical): type of AI algorithm used ('Neural Network', 'Random Forest', 'SVM', 'K-Means').
- Framework (categorical): framework or library used to implement the AI model ('TensorFlow', 'PyTorch', 'Keras', 'Scikit-learn').
- Problem_Type (categorical): type of problem addressed by the model ('Classification', 'Regression', 'Clustering').
- Dataset_Type (categorical): type of data used to train the model ('Image', 'Text', 'Tabular', 'Time Series').
- Accuracy (numeric, continuous): accuracy of the model on the test set (between 0 and 1).
- Precision (numeric, continuous): precision of the model (value between 0 and 1).
- Recall (numeric, continuous): sensitivity, i.e. the model's ability to correctly identify positives (between 0 and 1).
- F1_Score (numeric, continuous): harmonic mean of precision and recall (between 0 and 1).
- Training_Time (numeric, continuous): training time of the model, in hours.
- Date (date): date on which the model was evaluated, covering the last year.
Which framework shows the lowest average training time (Training_Time) when working with neural networks (Neural Network) on classification problems? This question will let us evaluate the efficiency of the different frameworks when implementing neural networks, focusing on how they perform in terms of training time, a crucial factor when fitting models (a summary sketch follows the full table at the end of this section).
# Load the dataset from the Excel file
library(readxl)
rut <- "C:/Users/User/Documents/PROYECTOS RSTUDIO/EDA semana uribe/Dataset_IA_corte_II.xlsx"
datos <- read_excel(rut)
head(datos)
## # A tibble: 6 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 SVM Scikit-lea… Regression Time Series 0.662 0.693 NA
## 2 K-Means Keras Clustering Time Series 0.744 0.490 0.877
## 3 Neural Network Keras Clustering Image 0.885 0.595 0.969
## 4 SVM Keras Clustering Text 0.842 0.842 0.875
## 5 SVM Scikit-lea… Regression Tabular 0.723 0.686 0.301
## 6 K-Means PyTorch Regression Image 0.637 0.626 7.45
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
names(datos)
## [1] "Algorithm" "Framework" "Problem_Type" "Dataset_Type"
## [5] "Accuracy" "Precision" "Recall" "F1_Score"
## [9] "Training_Time" "Date"
dim(datos)
## [1] 560 10
str(datos)
## tibble [560 × 10] (S3: tbl_df/tbl/data.frame)
## $ Algorithm : chr [1:560] "SVM" "K-Means" "Neural Network" "SVM" ...
## $ Framework : chr [1:560] "Scikit-learn" "Keras" "Keras" "Keras" ...
## $ Problem_Type : chr [1:560] "Regression" "Clustering" "Clustering" "Clustering" ...
## $ Dataset_Type : chr [1:560] "Time Series" "Time Series" "Image" "Text" ...
## $ Accuracy : num [1:560] 0.662 0.744 0.885 0.842 0.723 ...
## $ Precision : num [1:560] 0.693 0.49 0.595 0.842 0.686 ...
## $ Recall : num [1:560] NA 0.877 0.969 0.875 0.301 ...
## $ F1_Score : num [1:560] 0.443 0.441 0.964 0.704 0.646 ...
## $ Training_Time: num [1:560] 4.98 NA 3.28 4.04 3.6 ...
## $ Date : POSIXct[1:560], format: "2023-03-08 11:26:21" "2023-03-09 11:26:21" ...
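With the data loaded, the first question can be summarized directly. The sketch below is a minimal example, assuming the dplyr and ggplot2 packages are available; it computes the mean Precision per Problem_Type and Algorithm and plots the corresponding distributions. Note that some metric values in the table exceed 1, so depending on how those records are treated, a filter such as filter(Precision <= 1) might be applied first.

library(dplyr)
library(ggplot2)
# Mean Precision per problem type and algorithm (missing metrics dropped)
datos %>%
  group_by(Problem_Type, Algorithm) %>%
  summarise(mean_precision = mean(Precision, na.rm = TRUE), .groups = "drop") %>%
  arrange(Problem_Type, desc(mean_precision))
# Distribution of Precision by problem type, one box per algorithm
ggplot(filter(datos, !is.na(Precision)), aes(x = Problem_Type, y = Precision, fill = Algorithm)) +
  geom_boxplot()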
# Display the full dataset in a scrollable table
library(knitr)
library(kableExtra)
kable(datos, caption = "Artificial Intelligence (AI) database") %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(2, width = "20em") %>%
  scroll_box(width = "900px", height = "450px")
| Algorithm | Framework | Problem_Type | Dataset_Type | Accuracy | Precision | Recall | F1_Score | Training_Time | Date |
|---|---|---|---|---|---|---|---|---|---|
| SVM | Scikit-learn | Regression | Time Series | 0.6618051 | 0.6929447 | NA | 0.4426950 | 4.9785924 | 2023-03-08 11:26:21 |
| K-Means | Keras | Clustering | Time Series | 0.7443216 | 0.4900292 | 0.8766533 | 0.4414046 | NA | 2023-03-09 11:26:21 |
| Neural Network | Keras | Clustering | Image | 0.8852037 | 0.5948056 | 0.9685424 | 0.9644707 | 3.2825938 | 2023-03-10 11:26:21 |
| SVM | Keras | Clustering | Text | 0.8416477 | 0.8424142 | 0.8748388 | 0.7041523 | 4.0416289 | 2023-03-11 11:26:21 |
| SVM | Scikit-learn | Regression | Tabular | 0.7229514 | 0.6856109 | 0.3010956 | 0.6456472 | 3.6039908 | 2023-03-12 11:26:21 |
| K-Means | PyTorch | Regression | Image | 0.6368133 | 0.6255330 | 7.4548096 | 0.8865271 | 3.0064753 | 2023-03-13 11:26:21 |
| Neural Network | PyTorch | Regression | Text | 0.9985623 | 0.6366858 | 0.3357948 | 0.9014956 | NA | 2023-03-14 11:26:21 |
| Neural Network | Scikit-learn | Regression | Image | 0.7130907 | 0.6756681 | 0.4803251 | 0.5993146 | 2.3283453 | 2023-03-15 11:26:21 |
| SVM | Keras | Regression | Time Series | NA | 0.8710099 | 0.3416673 | 0.8161708 | 3.4064529 | 2023-03-16 11:26:21 |
| Random Forest | Keras | Regression | Text | 0.5818119 | 0.9352508 | NA | 0.8626737 | 3.4199049 | 2023-03-17 11:26:21 |
| SVM | PyTorch | Regression | Image | 0.8974048 | 9.7320081 | 0.7806129 | 0.7927904 | 1.9283008 | 2023-03-18 11:26:21 |
| SVM | Keras | Clustering | Image | 0.8468411 | 0.8721420 | 0.3801413 | 0.4909570 | 4.7142907 | 2023-03-19 11:26:21 |
| SVM | TensorFlow | Clustering | Tabular | 0.6103848 | 0.5892441 | 0.5686872 | 0.9255299 | 0.9200495 | 2023-03-20 11:26:21 |
| SVM | PyTorch | Clustering | Image | 0.5411905 | 0.8128808 | 0.6193656 | 0.7234567 | 2.5517613 | 2023-03-21 11:26:21 |
| K-Means | Keras | Clustering | Text | 0.8402497 | 0.6625619 | 0.5583371 | 0.5694835 | 3.4853315 | 2023-03-22 11:26:21 |
| Neural Network | PyTorch | Regression | Text | NA | 0.5528024 | 0.3847175 | 0.6551369 | 3.5159654 | 2023-03-23 11:26:21 |
| K-Means | TensorFlow | Classification | Tabular | 0.6366298 | 0.9045229 | 0.5932635 | 0.4225427 | 3.2783309 | 2023-03-24 11:26:21 |
| K-Means | PyTorch | Regression | Text | 0.9754318 | 0.4230558 | 0.8258246 | 0.4767201 | 1.4489122 | 2023-03-25 11:26:21 |
| K-Means | PyTorch | Classification | Time Series | 0.5755289 | 0.9410572 | 0.3497054 | 0.8593281 | 0.8654122 | 2023-03-26 11:26:21 |
| SVM | PyTorch | Clustering | Text | 0.7161674 | 0.6768865 | 0.3561260 | 0.4000070 | 3.2161076 | 2023-03-27 11:26:21 |
| Random Forest | PyTorch | Regression | Text | 9.7180796 | 0.7823209 | 0.5483399 | 0.6499395 | 3.0365804 | 2023-03-28 11:26:21 |
| Neural Network | TensorFlow | Clustering | Image | 0.7098637 | 0.7956124 | 0.9592080 | 0.7135061 | 0.9788445 | 2023-03-29 11:26:21 |
| Random Forest | Scikit-learn | Regression | Image | 0.8192630 | 0.9370706 | 0.7680009 | 0.4327807 | 3.5551818 | 2023-03-30 11:26:21 |
| K-Means | PyTorch | Clustering | Text | 0.6987972 | 0.7820018 | 0.7750690 | 0.9838469 | 2.3303428 | 2023-03-31 11:26:21 |
| K-Means | PyTorch | Classification | Tabular | 0.6371076 | 0.7683602 | 0.5533440 | 0.5356752 | 3.3720269 | 2023-04-01 11:26:21 |
| Random Forest | Keras | Classification | Image | 0.9919888 | 0.4399912 | 0.7155626 | 0.5825192 | 4.2030987 | 2023-04-02 11:26:21 |
| Random Forest | PyTorch | Classification | Tabular | 0.7046670 | 0.7110448 | 0.3070918 | 0.5823655 | 0.9316693 | 2023-04-03 11:26:21 |
| Random Forest | Keras | Regression | Text | 0.9470496 | 0.4901014 | 0.7452672 | 0.5382500 | 0.1938099 | 2023-04-04 11:26:21 |
| K-Means | Scikit-learn | Classification | Text | 0.6149773 | 0.8424603 | 0.9393009 | 0.4008843 | 3.9176027 | 2023-04-05 11:26:21 |
| K-Means | TensorFlow | Regression | Time Series | NA | 0.7073332 | 0.7288014 | 0.8376069 | 3.0875174 | 2023-04-06 11:26:21 |
| Neural Network | TensorFlow | Classification | Image | 0.5155670 | 0.8081367 | 0.9115890 | 0.9801073 | 3.5282339 | 2023-04-07 11:26:21 |
| Neural Network | Scikit-learn | Classification | Text | 0.8258334 | 0.4250037 | NA | 0.5345761 | 4.2069594 | 2023-04-08 11:26:21 |
| K-Means | TensorFlow | Regression | Time Series | 0.6842632 | 0.4508752 | 0.3843909 | 0.7978283 | 4.0337736 | 2023-04-09 11:26:21 |
| Random Forest | Scikit-learn | Regression | Image | NA | 0.8297940 | 0.9317173 | 8.4513780 | 4.8085087 | 2023-04-10 11:26:21 |
| Random Forest | Scikit-learn | Classification | Text | 0.7366050 | 0.4432506 | 0.3465107 | 0.9090552 | 2.7260819 | 2023-04-11 11:26:21 |
| Neural Network | Scikit-learn | Regression | Text | 0.9840967 | 0.4427540 | 0.6737772 | 0.6535775 | 2.4936286 | 2023-04-12 11:26:21 |
| K-Means | TensorFlow | Classification | Image | 5.9276276 | NA | 0.3994960 | 0.5817585 | 2.0692473 | 2023-04-13 11:26:21 |
| Neural Network | Keras | Regression | Image | 0.9343116 | 0.9739008 | 0.3081946 | 0.5951771 | 0.8530861 | 2023-04-14 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Image | 0.8882984 | 0.8425050 | 0.5954240 | 0.8275728 | NA | 2023-04-15 11:26:21 |
| SVM | PyTorch | Clustering | Text | 0.8854609 | 0.6119508 | 0.5065285 | 0.8900677 | 1.4573614 | 2023-04-16 11:26:21 |
| SVM | TensorFlow | Clustering | Text | 0.9223916 | 0.5779213 | 0.6402003 | 0.5089684 | 4.6144068 | 2023-04-17 11:26:21 |
| SVM | Scikit-learn | Clustering | Time Series | 0.8805120 | 0.6098219 | 0.7040399 | NA | 2.9576349 | 2023-04-18 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Tabular | 0.8131102 | 0.8647921 | NA | 0.9411641 | 3.0049169 | 2023-04-19 11:26:21 |
| K-Means | PyTorch | Classification | Image | 0.5656224 | 0.7968224 | 0.3861023 | 0.8840161 | 1.8391308 | 2023-04-20 11:26:21 |
| K-Means | Keras | Clustering | Text | NA | 0.5111173 | 0.6910497 | 0.9909150 | 0.3547478 | 2023-04-21 11:26:21 |
| K-Means | Keras | Clustering | Text | 0.9604239 | 0.5044656 | 0.5402171 | 0.8525490 | 0.2555445 | 2023-04-22 11:26:21 |
| K-Means | Keras | Clustering | Image | 0.8083252 | 0.4590374 | 0.8104214 | 0.6359171 | 2.1743422 | 2023-04-23 11:26:21 |
| SVM | Scikit-learn | Clustering | Tabular | 0.8982686 | 0.7961816 | 0.7566042 | 0.7543827 | 0.5143704 | 2023-04-24 11:26:21 |
| Random Forest | Scikit-learn | Regression | Image | 0.7407612 | 0.8586236 | 0.8919231 | 0.7966086 | NA | 2023-04-25 11:26:21 |
| Random Forest | Scikit-learn | Classification | Tabular | 0.5586541 | 0.5590279 | 0.7847450 | 0.4470735 | 4.4167677 | 2023-04-26 11:26:21 |
| SVM | TensorFlow | Clustering | Text | 0.5625929 | 0.4125670 | 0.6009518 | 0.7266982 | 4.4262826 | 2023-04-27 11:26:21 |
| Random Forest | PyTorch | Clustering | Tabular | 0.8427826 | 0.4493030 | 0.7710766 | 0.8255925 | 3.3293225 | 2023-04-28 11:26:21 |
| SVM | Scikit-learn | Clustering | Time Series | 0.7151529 | 0.9807160 | 0.4927668 | 0.5003928 | 1.1350360 | 2023-04-29 11:26:21 |
| K-Means | PyTorch | Classification | Text | 0.6002624 | 0.5772669 | 0.5144194 | 0.8683790 | 4.3286845 | 2023-04-30 11:26:21 |
| SVM | PyTorch | Regression | Time Series | 0.7457973 | 0.8615339 | 0.8522896 | NA | 4.4418578 | 2023-05-01 11:26:21 |
| K-Means | Keras | Classification | Image | 0.5321045 | 0.7747981 | NA | 0.9713331 | 1.0636026 | 2023-05-02 11:26:21 |
| K-Means | TensorFlow | Regression | Image | 0.7909857 | 0.6291638 | 0.8588662 | 0.4254534 | 3.7133878 | 2023-05-03 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Image | 0.6344967 | 0.5234124 | 0.8756957 | 0.5591957 | 1.5039458 | 2023-05-04 11:26:21 |
| SVM | Keras | Clustering | Tabular | 0.8987796 | 0.4728319 | 0.9002936 | 0.7609323 | 4.0329375 | 2023-05-05 11:26:21 |
| Neural Network | Keras | Classification | Text | 0.6551810 | 0.7690078 | 0.9416447 | 0.5779359 | 4.9864657 | 2023-05-06 11:26:21 |
| SVM | Scikit-learn | Classification | Image | 0.7276101 | 0.8647803 | 0.6016897 | 0.8286545 | 0.2471274 | 2023-05-07 11:26:21 |
| SVM | PyTorch | Regression | Image | 0.5058103 | 0.7863426 | 0.5232068 | 0.8554032 | 4.4970927 | 2023-05-08 11:26:21 |
| Neural Network | Keras | Regression | Text | 0.5362234 | 0.7181813 | 0.7075391 | 0.4615096 | 3.1508896 | 2023-05-09 11:26:21 |
| Neural Network | PyTorch | Classification | Image | 0.6962468 | 0.4251707 | 0.5598207 | 0.7083127 | 4.8695748 | 2023-05-10 11:26:21 |
| SVM | TensorFlow | Regression | Time Series | 0.7399694 | 0.9810933 | 0.7207519 | 0.7053343 | 2.3785466 | 2023-05-11 11:26:21 |
| Random Forest | PyTorch | Classification | Text | 0.8000103 | 0.8792285 | 0.7939101 | 0.6215685 | 4.2522009 | 2023-05-12 11:26:21 |
| K-Means | TensorFlow | Regression | Text | 0.6458313 | 0.5756932 | 0.7818835 | 0.9597549 | 0.4057309 | 2023-05-13 11:26:21 |
| Neural Network | PyTorch | Classification | Text | 0.8474909 | 0.9879822 | 0.5621870 | 0.8965038 | 1.7428673 | 2023-05-14 11:26:21 |
| K-Means | Keras | Classification | Time Series | 0.9300612 | 0.7611290 | 0.4168021 | 0.8183256 | 0.4283810 | 2023-05-15 11:26:21 |
| Random Forest | Keras | Classification | Text | 0.8899255 | 0.7494536 | 0.6013705 | 0.8285960 | 4.8791814 | 2023-05-16 11:26:21 |
| Random Forest | TensorFlow | Regression | Time Series | 0.5198094 | 0.8488439 | 0.3998159 | 0.6770297 | 4.1031721 | 2023-05-17 11:26:21 |
| Random Forest | Scikit-learn | Regression | Text | 0.7402535 | 0.8870619 | 0.9230679 | 0.9525967 | 4.2774824 | 2023-05-18 11:26:21 |
| Neural Network | TensorFlow | Regression | Tabular | 0.5524651 | 0.7938872 | 0.5421142 | 0.8167573 | 4.6960296 | 2023-05-19 11:26:21 |
| Random Forest | TensorFlow | Classification | Image | 0.6210225 | 0.4768574 | NA | 0.8373886 | 0.5170071 | 2023-05-20 11:26:21 |
| Neural Network | PyTorch | Regression | Tabular | 0.9933313 | 0.6029605 | 0.3178134 | 0.9170145 | NA | 2023-05-21 11:26:21 |
| Random Forest | PyTorch | Regression | Image | 0.5712478 | 0.9568502 | 0.7520757 | 0.5644430 | 0.4480710 | 2023-05-22 11:26:21 |
| K-Means | TensorFlow | Classification | Time Series | 0.7494441 | 0.5347694 | 0.7458316 | 0.8842425 | 1.1328862 | 2023-05-23 11:26:21 |
| K-Means | Keras | Clustering | Tabular | 0.8090779 | 0.6233002 | 0.5384229 | NA | 1.2238923 | 2023-05-24 11:26:21 |
| SVM | PyTorch | Classification | Text | 0.8512325 | 0.6592461 | 0.3501983 | 0.6072052 | 2.3986253 | 2023-05-25 11:26:21 |
| K-Means | Scikit-learn | Regression | Image | 0.7798243 | 0.6636430 | 0.5867402 | 0.6013663 | 1.4159546 | 2023-05-26 11:26:21 |
| SVM | PyTorch | Classification | Image | 0.5048854 | 0.7677637 | 0.5178522 | 0.9871153 | 0.5947927 | 2023-05-27 11:26:21 |
| K-Means | PyTorch | Clustering | Tabular | 0.6632307 | 0.9658455 | 0.7739844 | 0.9139223 | 0.9207063 | 2023-05-28 11:26:21 |
| Neural Network | Scikit-learn | Classification | Time Series | 0.7588558 | 0.5444156 | 0.7240455 | 0.8207019 | 0.8216503 | 2023-05-29 11:26:21 |
| K-Means | TensorFlow | Regression | Tabular | 0.5439332 | 0.4729008 | 0.5552156 | 0.8362341 | 4.8680763 | 2023-05-30 11:26:21 |
| SVM | Scikit-learn | Clustering | Time Series | 0.6753135 | 0.5184823 | 0.4525248 | NA | 3.8205333 | 2023-05-31 11:26:21 |
| SVM | TensorFlow | Regression | Time Series | 0.5166016 | 0.9321549 | 0.9916252 | 0.9682544 | 4.8420312 | 2023-06-01 11:26:21 |
| Random Forest | TensorFlow | Clustering | Image | 0.5392892 | 0.7874865 | 0.6178011 | 0.6977553 | 2.2547284 | 2023-06-02 11:26:21 |
| Neural Network | TensorFlow | Clustering | Tabular | 0.6984616 | 0.5715441 | 0.7817920 | 0.6283106 | 1.4640056 | 2023-06-03 11:26:21 |
| K-Means | Scikit-learn | Classification | Tabular | 0.5663579 | 0.8895682 | 0.3983871 | 0.4978212 | 4.0116367 | 2023-06-04 11:26:21 |
| Random Forest | TensorFlow | Classification | Time Series | 0.7837704 | 0.9168220 | NA | 0.8717234 | 1.6986442 | 2023-06-05 11:26:21 |
| K-Means | Keras | Regression | Tabular | 0.8447325 | 0.9079086 | 0.3192757 | 0.8406664 | 1.5669785 | 2023-06-06 11:26:21 |
| K-Means | Scikit-learn | Classification | Text | 0.9002933 | 0.9513559 | 0.6538186 | 0.6306130 | 1.2392794 | 2023-06-07 11:26:21 |
| Random Forest | Keras | Classification | Text | 0.6000751 | 0.5513446 | 0.9748122 | 0.4151160 | 0.7354270 | 2023-06-08 11:26:21 |
| Random Forest | TensorFlow | Clustering | Tabular | 0.5837413 | 0.8530252 | 0.5689447 | 0.9033984 | 1.3566690 | 2023-06-09 11:26:21 |
| Random Forest | PyTorch | Classification | Tabular | 0.5522839 | 0.6763237 | 0.3272972 | 0.4068508 | 1.8410668 | 2023-06-10 11:26:21 |
| Random Forest | TensorFlow | Clustering | Time Series | 0.8182151 | 0.9051991 | 0.3216691 | 0.8222199 | 3.4025716 | 2023-06-11 11:26:21 |
| Random Forest | Keras | Clustering | Text | 8.5323786 | 0.8370944 | NA | 0.9821543 | 0.4065677 | 2023-06-12 11:26:21 |
| K-Means | Keras | Clustering | Text | 0.5157931 | 0.8658685 | 0.4120175 | 0.6625968 | 1.1314840 | 2023-06-13 11:26:21 |
| Random Forest | Scikit-learn | Regression | Image | 0.9681061 | 0.7936971 | 0.3163465 | 0.5409840 | 4.0642158 | 2023-06-14 11:26:21 |
| Neural Network | PyTorch | Regression | Time Series | 5.2598564 | 0.5064573 | 0.8293495 | 0.8229226 | 0.8199174 | 2023-06-15 11:26:21 |
| SVM | Scikit-learn | Regression | Image | 0.7706482 | 0.7270162 | 0.6209661 | 0.8902769 | 1.7789612 | 2023-06-16 11:26:21 |
| Random Forest | Keras | Clustering | Text | 0.8545303 | 0.9908018 | 0.5024713 | 0.7278582 | 4.3368287 | 2023-06-17 11:26:21 |
| Random Forest | TensorFlow | Regression | Time Series | 0.9354846 | 0.9624328 | NA | 0.9802212 | NA | 2023-06-18 11:26:21 |
| K-Means | PyTorch | Regression | Tabular | 0.8570435 | 0.4259042 | 0.3812973 | 0.4310012 | 0.5054105 | 2023-06-19 11:26:21 |
| Random Forest | TensorFlow | Classification | Time Series | 0.9008640 | 0.4988889 | 0.9691432 | 0.7028774 | 2.4742362 | 2023-06-20 11:26:21 |
| Random Forest | TensorFlow | Regression | Tabular | 0.6697251 | 0.4790373 | 0.5197767 | 0.8310724 | 1.5818556 | 2023-06-21 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | NA | 0.8355879 | 0.9218826 | 0.9175843 | 2.8607010 | 2023-06-22 11:26:21 |
| K-Means | Scikit-learn | Clustering | Time Series | 0.5400574 | 0.8906712 | 0.7220600 | 0.5075534 | 4.0386440 | 2023-06-23 11:26:21 |
| Random Forest | TensorFlow | Clustering | Tabular | 0.9474083 | 0.5281068 | 0.8786952 | 0.8800021 | 0.7720276 | 2023-06-24 11:26:21 |
| SVM | Keras | Clustering | Image | 0.7737962 | 0.7035116 | 0.9888092 | 0.7316242 | 2.9454255 | 2023-06-25 11:26:21 |
| K-Means | TensorFlow | Clustering | Text | 0.9086489 | 0.9044218 | 0.5018838 | 0.6379322 | 2.5769929 | 2023-06-26 11:26:21 |
| SVM | PyTorch | Clustering | Tabular | 0.7261591 | 0.8396809 | 0.9727948 | 0.4790290 | 0.8059861 | 2023-06-27 11:26:21 |
| K-Means | PyTorch | Clustering | Image | 0.8217888 | 0.7253423 | 5.7263733 | 0.9191775 | 3.1575926 | 2023-06-28 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | 0.7632013 | 0.7542086 | 0.5698583 | 0.4943639 | 1.4408380 | 2023-06-29 11:26:21 |
| SVM | Scikit-learn | Clustering | Text | 0.8657948 | 0.7050163 | 0.5382710 | 5.8587272 | NA | 2023-06-30 11:26:21 |
| K-Means | TensorFlow | Clustering | Image | NA | 0.5785291 | 0.6789853 | 0.5740273 | 0.5031344 | 2023-07-01 11:26:21 |
| Neural Network | Scikit-learn | Classification | Text | 0.5301760 | 0.7390132 | 0.4079015 | NA | 2.3519619 | 2023-07-02 11:26:21 |
| Random Forest | Scikit-learn | Regression | Text | 0.6235516 | 0.8133312 | 0.6875980 | 0.8036218 | 1.6016710 | 2023-07-03 11:26:21 |
| K-Means | PyTorch | Regression | Image | 0.5797723 | 0.9239937 | 0.6791934 | 0.8780088 | 4.1296679 | 2023-07-04 11:26:21 |
| Neural Network | PyTorch | Regression | Image | 0.9358918 | 0.7817748 | 0.8333313 | 0.5502807 | 0.3791958 | 2023-07-05 11:26:21 |
| K-Means | Scikit-learn | Clustering | Image | 0.6096070 | 0.8566729 | 0.8835550 | 0.7749245 | 2.1511756 | 2023-07-06 11:26:21 |
| Neural Network | TensorFlow | Clustering | Time Series | 0.9879326 | 0.4960430 | 0.6083089 | 0.7430476 | 2.3463140 | 2023-07-07 11:26:21 |
| Random Forest | Keras | Clustering | Image | 0.6684479 | 0.6769345 | 0.5116333 | 0.8996982 | 3.6525298 | 2023-07-08 11:26:21 |
| SVM | Scikit-learn | Classification | Image | 0.5910590 | 4.0559897 | 0.4815344 | 0.9436522 | 2.9140091 | 2023-07-09 11:26:21 |
| Neural Network | TensorFlow | Clustering | Image | 0.8948493 | 0.5480073 | 0.4362367 | 0.4072941 | 3.3693405 | 2023-07-10 11:26:21 |
| K-Means | PyTorch | Classification | Image | 0.8293539 | NA | NA | 0.8044120 | 3.9075415 | 2023-07-11 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | 0.7490979 | NA | 0.5397136 | 0.4311015 | 4.3247946 | 2023-07-12 11:26:21 |
| Neural Network | TensorFlow | Clustering | Time Series | 0.7776818 | 0.4595069 | 4.8917338 | NA | 1.6379899 | 2023-07-13 11:26:21 |
| K-Means | Keras | Classification | Text | 0.8596009 | 0.6408966 | 0.9764922 | 0.5725796 | 2.7363113 | 2023-07-14 11:26:21 |
| K-Means | PyTorch | Classification | Image | NA | 0.8800426 | 0.6903962 | 0.5840660 | 4.2165296 | 2023-07-15 11:26:21 |
| K-Means | PyTorch | Clustering | Tabular | 0.9981670 | 0.5224214 | 0.5430939 | 0.6117751 | 4.9483105 | 2023-07-16 11:26:21 |
| Neural Network | PyTorch | Regression | Tabular | 0.9873966 | 0.7330510 | 0.7063273 | 0.7727755 | 44.5864462 | 2023-07-17 11:26:21 |
| Neural Network | TensorFlow | Regression | Tabular | 0.8251628 | 0.8398428 | 0.3974377 | 0.6004300 | 1.9201896 | 2023-07-18 11:26:21 |
| Neural Network | Scikit-learn | Regression | Text | 0.5997712 | 0.7695913 | 0.6108306 | 0.8396194 | 1.0563159 | 2023-07-19 11:26:21 |
| SVM | PyTorch | Classification | Image | 0.8401141 | 0.5128148 | 0.7383640 | 0.6427164 | 2.4980234 | 2023-07-20 11:26:21 |
| Neural Network | Scikit-learn | Classification | Time Series | 0.5360992 | 0.6132307 | 0.6422285 | 0.4410119 | 3.7340431 | 2023-07-21 11:26:21 |
| Neural Network | Keras | Classification | Time Series | 0.5153263 | 0.8702751 | 0.5812452 | 0.8702559 | 2.5135884 | 2023-07-22 11:26:21 |
| Neural Network | PyTorch | Classification | Tabular | NA | 0.7325359 | 0.9956939 | 0.5714550 | 2.4675862 | 2023-07-23 11:26:21 |
| SVM | PyTorch | Regression | Text | 0.7313115 | 0.4031378 | 0.9162203 | 0.6596601 | 4.2073291 | 2023-07-24 11:26:21 |
| Neural Network | Keras | Clustering | Text | 0.9341363 | 0.8565945 | 0.7363842 | 0.8112663 | 1.8708785 | 2023-07-25 11:26:21 |
| K-Means | PyTorch | Classification | Text | 0.8635845 | 0.4211868 | 0.6985642 | 0.5994737 | 4.3129939 | 2023-07-26 11:26:21 |
| Neural Network | Scikit-learn | Classification | Time Series | 0.8713533 | 0.8474403 | 0.7344623 | 0.4339514 | 20.9334352 | 2023-07-27 11:26:21 |
| K-Means | TensorFlow | Regression | Tabular | 7.1274667 | 0.5214883 | 0.4409185 | 0.6243526 | 1.7083422 | 2023-07-28 11:26:21 |
| K-Means | Scikit-learn | Clustering | Image | NA | 0.9748441 | 0.5765964 | 0.9666691 | 2.3245506 | 2023-07-29 11:26:21 |
| K-Means | TensorFlow | Clustering | Time Series | 0.6855194 | 6.2076445 | 0.3276217 | 0.7850406 | 3.8359901 | 2023-07-30 11:26:21 |
| SVM | TensorFlow | Regression | Tabular | NA | 0.5961590 | 0.6328822 | 0.8028875 | 0.7174099 | 2023-07-31 11:26:21 |
| SVM | Scikit-learn | Regression | Tabular | 5.2005460 | 0.4893328 | 0.6801172 | 0.7793693 | 1.0624542 | 2023-08-01 11:26:21 |
| SVM | Scikit-learn | Clustering | Image | NA | 0.5833625 | 0.4594248 | 0.5193953 | 4.7620796 | 2023-08-02 11:26:21 |
| Neural Network | Scikit-learn | Classification | Image | 0.7893377 | 0.9259905 | 0.9748202 | 0.6510003 | 0.9599090 | 2023-08-03 11:26:21 |
| K-Means | Scikit-learn | Clustering | Image | 0.7193077 | 0.9978006 | 9.3661823 | 0.8505639 | 2.8818639 | 2023-08-04 11:26:21 |
| SVM | Keras | Regression | Text | 0.8626288 | 0.6209857 | 0.8055003 | 0.4608237 | 2.9386409 | 2023-08-05 11:26:21 |
| SVM | PyTorch | Classification | Image | 0.7433345 | 0.6691664 | 0.6733706 | 0.5667117 | 2.4991599 | 2023-08-06 11:26:21 |
| Neural Network | TensorFlow | Classification | Tabular | 0.9367116 | 0.8332426 | 0.9089784 | 0.5657915 | 3.2592517 | 2023-08-07 11:26:21 |
| SVM | Scikit-learn | Regression | Image | 0.9503509 | 0.9317175 | 0.3914566 | 0.6592114 | 1.2261510 | 2023-08-08 11:26:21 |
| Neural Network | Scikit-learn | Regression | Tabular | 0.7108605 | 0.7558266 | 0.8533569 | 0.9882212 | 2.8080451 | 2023-08-09 11:26:21 |
| Random Forest | Scikit-learn | Regression | Time Series | 0.6384139 | 0.6349154 | 0.3873746 | 0.4405015 | 1.9236490 | 2023-08-10 11:26:21 |
| SVM | Keras | Clustering | Text | 0.7961752 | 0.6475731 | 0.8559475 | 0.7112206 | 3.3421696 | 2023-08-11 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Time Series | 0.9561817 | 0.8173709 | 0.4930373 | 0.5076188 | 0.7919954 | 2023-08-12 11:26:21 |
| Neural Network | Scikit-learn | Classification | Time Series | NA | 0.4019310 | 0.9139634 | 0.9824059 | 28.9729934 | 2023-08-13 11:26:21 |
| K-Means | Scikit-learn | Regression | Tabular | 0.8114833 | 0.7717536 | 0.9608295 | 0.4679821 | 1.0078247 | 2023-08-14 11:26:21 |
| SVM | PyTorch | Classification | Time Series | 0.8157801 | 0.6132958 | 0.4041572 | 0.6421606 | NA | 2023-08-15 11:26:21 |
| Neural Network | Keras | Classification | Tabular | 0.8665565 | 0.8765184 | 0.6238729 | 0.8427310 | 1.1716780 | 2023-08-16 11:26:21 |
| K-Means | Scikit-learn | Classification | Image | NA | 0.4557944 | 0.9866912 | 0.8227327 | 0.9959051 | 2023-08-17 11:26:21 |
| K-Means | TensorFlow | Classification | Image | NA | 0.7529214 | 0.6383852 | 0.6536372 | 4.1459837 | 2023-08-18 11:26:21 |
| Random Forest | TensorFlow | Regression | Tabular | 0.9545163 | 0.6885837 | 0.9044833 | 0.6079145 | 1.4999671 | 2023-08-19 11:26:21 |
| Neural Network | TensorFlow | Classification | Image | 0.5898416 | 0.7853953 | 0.7121121 | 0.6385674 | 4.6429065 | 2023-08-20 11:26:21 |
| K-Means | Keras | Regression | Text | 0.6187717 | 0.4389122 | 0.5627309 | 0.5585658 | 4.8526400 | 2023-08-21 11:26:21 |
| SVM | Keras | Regression | Time Series | 0.9856975 | NA | 0.5000485 | 0.5231998 | 2.8991763 | 2023-08-22 11:26:21 |
| SVM | TensorFlow | Clustering | Text | 0.5904885 | 0.7368908 | 0.4422562 | 0.6898238 | 0.8007055 | 2023-08-23 11:26:21 |
| Random Forest | Scikit-learn | Regression | Time Series | 0.9271925 | 0.7363961 | 0.8332587 | 0.5611203 | 1.9352864 | 2023-08-24 11:26:21 |
| K-Means | Keras | Clustering | Image | 0.7461389 | 0.7620926 | 0.5705784 | 0.5724770 | 4.0088881 | 2023-08-25 11:26:21 |
| Neural Network | TensorFlow | Regression | Text | 0.6236155 | 0.8058808 | 0.6578928 | 0.7940536 | 1.9002146 | 2023-08-26 11:26:21 |
| SVM | TensorFlow | Regression | Image | 0.9353750 | 0.8829934 | 0.6446278 | 0.9811224 | 0.5263844 | 2023-08-27 11:26:21 |
| K-Means | Scikit-learn | Regression | Time Series | 0.7226526 | 0.5618924 | 0.7040953 | 0.7621823 | 2.8282461 | 2023-08-28 11:26:21 |
| K-Means | PyTorch | Clustering | Tabular | 0.7574087 | 0.8950296 | 0.9059040 | 0.4461877 | 4.2410922 | 2023-08-29 11:26:21 |
| Random Forest | Keras | Clustering | Tabular | 0.6796167 | 0.6989534 | 0.9865175 | 0.4453502 | NA | 2023-08-30 11:26:21 |
| SVM | Scikit-learn | Classification | Text | 0.7964754 | NA | 0.5853089 | 0.9708539 | 0.9581034 | 2023-08-31 11:26:21 |
| SVM | TensorFlow | Clustering | Image | 0.5817619 | 0.4351306 | 0.8792632 | 0.5783745 | NA | 2023-09-01 11:26:21 |
| Neural Network | TensorFlow | Clustering | Image | 0.6955408 | 0.6005430 | 0.8351695 | 0.4552402 | 1.1804709 | 2023-09-02 11:26:21 |
| SVM | Scikit-learn | Clustering | Text | 0.9847062 | 0.8709382 | NA | 0.7594268 | 1.1692169 | 2023-09-03 11:26:21 |
| Neural Network | Keras | Clustering | Tabular | NA | 0.8246086 | 0.9692330 | 0.7741893 | 4.3829517 | 2023-09-04 11:26:21 |
| SVM | PyTorch | Classification | Time Series | 0.8283683 | 0.8731690 | 0.4403322 | 0.7891029 | 1.3233779 | 2023-09-05 11:26:21 |
| Random Forest | PyTorch | Clustering | Text | 0.6625950 | 0.7103614 | 0.3764849 | 0.5604412 | 1.3899120 | 2023-09-06 11:26:21 |
| SVM | TensorFlow | Regression | Time Series | 0.8867366 | 0.6641194 | 0.8977734 | 0.4090664 | 0.1032016 | 2023-09-07 11:26:21 |
| Neural Network | Keras | Regression | Tabular | 0.5654368 | 0.4884715 | 0.6074049 | 0.9790092 | 4.3662783 | 2023-09-08 11:26:21 |
| Neural Network | PyTorch | Clustering | Image | 0.9849105 | 0.5969157 | 0.8928782 | 0.5505358 | 3.9837146 | 2023-09-09 11:26:21 |
| Random Forest | Keras | Clustering | Tabular | NA | 0.6604116 | 0.9251631 | 0.8056158 | 3.1739118 | 2023-09-10 11:26:21 |
| SVM | Keras | Regression | Text | 0.6180252 | 0.4531603 | 0.3437203 | 0.8239779 | 3.7763022 | 2023-09-11 11:26:21 |
| SVM | TensorFlow | Regression | Image | NA | 0.5323672 | 0.9184253 | 0.7660045 | 0.8450380 | 2023-09-12 11:26:21 |
| Random Forest | PyTorch | Classification | Text | 0.5848790 | 0.7589352 | 0.6138233 | 0.5877444 | 2.3457856 | 2023-09-13 11:26:21 |
| SVM | Scikit-learn | Regression | Text | 0.7598870 | 0.8413979 | NA | 0.5626578 | 1.8209750 | 2023-09-14 11:26:21 |
| SVM | TensorFlow | Regression | Tabular | 0.6685016 | 0.9990085 | 0.7386148 | 0.7586010 | 0.5595050 | 2023-09-15 11:26:21 |
| Neural Network | PyTorch | Clustering | Text | 0.9144417 | 0.9598680 | 0.9484678 | NA | 2.4820394 | 2023-09-16 11:26:21 |
| SVM | PyTorch | Regression | Tabular | NA | 0.7855391 | 0.3133813 | 0.9680402 | 4.6116243 | 2023-09-17 11:26:21 |
| SVM | PyTorch | Classification | Image | 0.6243571 | 0.6527488 | 0.6337904 | 0.4635435 | 0.2962897 | 2023-09-18 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | 0.8085725 | 0.7817064 | 0.7814054 | 0.4928972 | 1.5281585 | 2023-09-19 11:26:21 |
| Random Forest | PyTorch | Clustering | Image | 0.8533886 | 0.8713910 | 0.8058949 | 0.9668418 | 1.1169506 | 2023-09-20 11:26:21 |
| K-Means | Keras | Clustering | Tabular | 0.5835210 | 0.4710017 | 0.7847727 | 0.8419211 | 1.2669186 | 2023-09-21 11:26:21 |
| Neural Network | TensorFlow | Regression | Time Series | 0.5838096 | 0.6459429 | 0.3941046 | 0.9297963 | 4.5513293 | 2023-09-22 11:26:21 |
| SVM | Keras | Clustering | Time Series | 0.5183357 | 0.9038814 | 0.5095769 | 0.5215796 | 2.3935384 | 2023-09-23 11:26:21 |
| SVM | Scikit-learn | Classification | Text | 0.8682010 | 0.6302998 | 0.5511009 | 0.7525515 | 2.3848671 | 2023-09-24 11:26:21 |
| K-Means | PyTorch | Classification | Tabular | 0.8319023 | 0.7431234 | 0.8631060 | 0.8206838 | 3.8268206 | 2023-09-25 11:26:21 |
| SVM | TensorFlow | Regression | Tabular | 0.7373154 | 0.7526616 | 0.4951319 | 0.8080671 | 0.8570118 | 2023-09-26 11:26:21 |
| Neural Network | PyTorch | Regression | Image | 0.9220852 | 0.5106858 | 0.4474935 | 0.6448910 | 2.4876002 | 2023-09-27 11:26:21 |
| K-Means | TensorFlow | Regression | Text | 0.9028351 | 0.6173413 | 0.9702136 | 0.4092369 | 2.2069275 | 2023-09-28 11:26:21 |
| Neural Network | Keras | Clustering | Tabular | 0.7926772 | 0.6007068 | 0.3062043 | 0.7497556 | 3.0248624 | 2023-09-29 11:26:21 |
| K-Means | TensorFlow | Classification | Text | 0.9341356 | 0.4157180 | 0.9984746 | 0.5518609 | 4.9978327 | 2023-09-30 11:26:21 |
| K-Means | Scikit-learn | Classification | Time Series | 0.6029206 | 4.1451506 | 7.7377491 | 0.6701525 | 3.8702996 | 2023-10-01 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Tabular | 0.5559598 | 0.8990182 | NA | 0.9745486 | 2.0495419 | 2023-10-02 11:26:21 |
| Neural Network | TensorFlow | Classification | Tabular | 0.6348748 | 0.5638425 | 0.5062336 | 0.6394212 | 4.1550100 | 2023-10-03 11:26:21 |
| SVM | TensorFlow | Regression | Text | 0.5285434 | 0.7108473 | 0.3100207 | 0.9038810 | 0.9364710 | 2023-10-04 11:26:21 |
| SVM | TensorFlow | Regression | Tabular | 0.7655848 | 0.5792353 | 0.8165087 | 5.1312436 | 0.2492721 | 2023-10-05 11:26:21 |
| Neural Network | TensorFlow | Regression | Text | 0.9683028 | 0.9644075 | 0.8839012 | 0.8034763 | 1.1017970 | 2023-10-06 11:26:21 |
| SVM | Scikit-learn | Classification | Tabular | 0.5196718 | 0.5555781 | 0.8183333 | 0.9862042 | 1.7680166 | 2023-10-07 11:26:21 |
| SVM | PyTorch | Regression | Image | 0.5610550 | 0.6577941 | 0.3999952 | 0.4611359 | NA | 2023-10-08 11:26:21 |
| Neural Network | TensorFlow | Clustering | Text | 0.7260995 | 0.9236382 | 0.8273995 | 0.4049920 | 3.1141328 | 2023-10-09 11:26:21 |
| K-Means | TensorFlow | Clustering | Tabular | 0.9669375 | 0.9051601 | 0.8382459 | 0.6601497 | 4.5619680 | 2023-10-10 11:26:21 |
| Neural Network | Scikit-learn | Classification | Time Series | 0.6580781 | 0.5116609 | 0.7609784 | 0.4555753 | 2.5982846 | 2023-10-11 11:26:21 |
| K-Means | Scikit-learn | Clustering | Tabular | 0.7536174 | 0.8815860 | 0.8362812 | 0.8490306 | 2.5562507 | 2023-10-12 11:26:21 |
| SVM | Keras | Clustering | Time Series | 0.5207864 | 0.6749121 | 0.8921450 | 0.9487292 | 0.3461907 | 2023-10-13 11:26:21 |
| SVM | TensorFlow | Classification | Time Series | NA | 0.6897813 | 0.7295229 | 0.6604126 | 0.2710666 | 2023-10-14 11:26:21 |
| SVM | Keras | Regression | Image | 0.9933151 | 0.4800880 | 0.3620233 | 0.5552270 | 2.8006838 | 2023-10-15 11:26:21 |
| Random Forest | Scikit-learn | Regression | Text | 0.9825593 | 0.4483609 | 0.6413395 | 0.6606420 | 2.2470752 | 2023-10-16 11:26:21 |
| K-Means | Scikit-learn | Regression | Tabular | NA | 0.8367636 | 0.3543545 | 0.8340687 | 4.2119839 | 2023-10-17 11:26:21 |
| Random Forest | PyTorch | Classification | Image | 0.9759059 | 0.6978767 | 0.5852801 | 0.4054325 | 0.8873300 | 2023-10-18 11:26:21 |
| Random Forest | TensorFlow | Clustering | Tabular | 0.8195600 | 0.6621104 | NA | 0.7536724 | 0.2223611 | 2023-10-19 11:26:21 |
| Neural Network | Keras | Regression | Time Series | 0.9339591 | 0.8377049 | 0.3462069 | 0.7679750 | 2.3002900 | 2023-10-20 11:26:21 |
| Random Forest | Scikit-learn | Regression | Text | 0.7273699 | 0.8593077 | 0.5441744 | 0.7826129 | 1.2636449 | 2023-10-21 11:26:21 |
| Neural Network | Scikit-learn | Classification | Tabular | 0.7577980 | 0.4953449 | 0.3776987 | 0.5452132 | 0.3431104 | 2023-10-22 11:26:21 |
| Neural Network | Keras | Regression | Tabular | 0.7444233 | 0.7661351 | 0.8657646 | 0.8284316 | 3.6505316 | 2023-10-23 11:26:21 |
| Random Forest | PyTorch | Classification | Time Series | 0.8334321 | 0.4812124 | 0.9633816 | NA | 0.6468057 | 2023-10-24 11:26:21 |
| K-Means | Scikit-learn | Clustering | Tabular | 0.5698256 | 0.8508251 | 0.3506215 | 0.5195622 | 3.0807656 | 2023-10-25 11:26:21 |
| K-Means | TensorFlow | Classification | Tabular | 0.5149868 | 0.7941731 | 0.9685806 | 0.9264820 | 1.4771438 | 2023-10-26 11:26:21 |
| K-Means | Scikit-learn | Classification | Time Series | 0.6539650 | 0.9739688 | 0.6658036 | 0.8432339 | 0.9501306 | 2023-10-27 11:26:21 |
| K-Means | TensorFlow | Classification | Image | 0.8523404 | 0.4413748 | 0.5096960 | 0.4082473 | 1.9607825 | 2023-10-28 11:26:21 |
| K-Means | Keras | Classification | Time Series | 0.6009267 | NA | 0.3538035 | 0.5490258 | 4.0270843 | 2023-10-29 11:26:21 |
| Random Forest | TensorFlow | Clustering | Image | 0.8367162 | 0.5693122 | 0.6504370 | 0.5286439 | 2.0212585 | 2023-10-30 11:26:21 |
| Random Forest | Keras | Clustering | Text | 0.9849560 | 0.5570234 | 0.8561609 | 0.5624838 | 3.7787713 | 2023-10-31 11:26:21 |
| SVM | TensorFlow | Regression | Image | NA | 0.5481873 | 0.7949605 | 0.5485359 | 0.7148829 | 2023-11-01 11:26:21 |
| K-Means | Keras | Clustering | Image | 0.8363011 | 0.9437527 | 0.3351582 | 0.4375158 | 3.8865800 | 2023-11-02 11:26:21 |
| Random Forest | TensorFlow | Clustering | Text | 0.7218751 | 0.5497277 | 0.3510313 | 0.6753643 | 1.2609733 | 2023-11-03 11:26:21 |
| SVM | PyTorch | Regression | Image | 0.9340711 | 0.5631698 | 0.5820113 | 0.8396401 | 3.4192263 | 2023-11-04 11:26:21 |
| K-Means | TensorFlow | Clustering | Time Series | 0.5885749 | 0.8556390 | 0.5067033 | 0.7640390 | 2.8723296 | 2023-11-05 11:26:21 |
| Neural Network | TensorFlow | Classification | Image | 0.8463130 | 0.6698439 | 0.4626690 | 0.8037230 | 4.6528091 | 2023-11-06 11:26:21 |
| SVM | PyTorch | Classification | Text | NA | 0.8660263 | 0.4967031 | 0.4486895 | 1.9978803 | 2023-11-07 11:26:21 |
| Random Forest | Scikit-learn | Regression | Time Series | 0.9723071 | 0.4392197 | 0.8624379 | 0.9708944 | 0.4242094 | 2023-11-08 11:26:21 |
| Neural Network | Scikit-learn | Regression | Text | 0.8416240 | 0.6925427 | 0.9504596 | 0.9030951 | 0.1938619 | 2023-11-09 11:26:21 |
| Neural Network | TensorFlow | Regression | Text | 0.7485874 | 0.4201682 | 0.5835719 | 0.8830542 | 4.1516069 | 2023-11-10 11:26:21 |
| Neural Network | PyTorch | Classification | Tabular | 0.8089236 | 0.4375919 | 0.9342777 | 0.8937903 | 2.6712046 | 2023-11-11 11:26:21 |
| SVM | TensorFlow | Clustering | Tabular | 0.9344525 | 0.9438625 | 0.5250470 | 0.9596263 | 3.8986959 | 2023-11-12 11:26:21 |
| Random Forest | Keras | Classification | Text | 0.7853049 | 0.4835472 | 0.6335059 | 0.7265524 | 1.2486854 | 2023-11-13 11:26:21 |
| Neural Network | Keras | Regression | Text | 0.5151935 | 0.7194524 | 0.4582203 | 0.5201692 | 1.7909104 | 2023-11-14 11:26:21 |
| K-Means | TensorFlow | Clustering | Text | 0.9654743 | NA | 0.7483332 | 0.7700702 | 0.2475090 | 2023-11-15 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Tabular | 0.8447634 | 0.6084060 | 0.9852868 | 0.8457289 | 4.8113124 | 2023-11-16 11:26:21 |
| Neural Network | Scikit-learn | Classification | Tabular | 0.8382567 | 0.9399000 | 0.7224452 | 0.8427504 | 3.3725527 | 2023-11-17 11:26:21 |
| SVM | TensorFlow | Regression | Image | 0.6078376 | 0.4130940 | 0.5504699 | 0.7128694 | 4.6727049 | 2023-11-18 11:26:21 |
| SVM | TensorFlow | Classification | Image | 8.2944274 | 0.7982738 | 0.7534722 | 0.4410752 | 1.4009324 | 2023-11-19 11:26:21 |
| Random Forest | Keras | Clustering | Text | 0.6969322 | 0.9780367 | 0.3860445 | 0.6226678 | 3.0961717 | 2023-11-20 11:26:21 |
| K-Means | Scikit-learn | Classification | Time Series | 0.8256165 | 0.7361009 | 0.9220614 | 0.9524600 | NA | 2023-11-21 11:26:21 |
| SVM | TensorFlow | Classification | Time Series | 0.5532965 | 0.9620935 | 0.6521588 | 0.7506693 | 1.6559935 | 2023-11-22 11:26:21 |
| Neural Network | Keras | Regression | Tabular | 0.8289227 | 0.4313547 | 0.6145448 | 0.7229992 | 4.2557353 | 2023-11-23 11:26:21 |
| Random Forest | Scikit-learn | Regression | Tabular | 0.9997069 | 0.6512760 | 0.7101054 | 0.5613121 | 4.7410912 | 2023-11-24 11:26:21 |
| Neural Network | PyTorch | Clustering | Tabular | 0.5241060 | 0.5560947 | 0.7373487 | 0.6210694 | 44.3579008 | 2023-11-25 11:26:21 |
| Neural Network | Keras | Classification | Text | 0.9885871 | 0.8384926 | 0.3502431 | 0.9372077 | 3.7214282 | 2023-11-26 11:26:21 |
| SVM | Scikit-learn | Classification | Time Series | 0.7034540 | 0.9887783 | 0.7778321 | 0.7998778 | 1.4595769 | 2023-11-27 11:26:21 |
| Random Forest | Scikit-learn | Classification | Image | 0.9353767 | 0.5539180 | 0.4693522 | 0.8724322 | 1.4799186 | 2023-11-28 11:26:21 |
| K-Means | Keras | Clustering | Text | 0.8911927 | 0.7925048 | 0.7997668 | 0.6726073 | 4.8204656 | 2023-11-29 11:26:21 |
| K-Means | PyTorch | Clustering | Image | 0.7835081 | 0.5188586 | 0.8757744 | 0.7781483 | 0.1503940 | 2023-11-30 11:26:21 |
| SVM | Keras | Regression | Image | 0.8692246 | 0.7391982 | 0.8627710 | 0.5490302 | 3.6086449 | 2023-12-01 11:26:21 |
| SVM | Scikit-learn | Regression | Text | 0.9392578 | 0.6783595 | NA | 0.8232765 | 3.5606062 | 2023-12-02 11:26:21 |
| Random Forest | TensorFlow | Regression | Image | 0.7020702 | 0.9832032 | 0.6641189 | 0.6565608 | 3.1511788 | 2023-12-03 11:26:21 |
| K-Means | TensorFlow | Classification | Time Series | 0.6635166 | 0.7651164 | 0.4000132 | 0.6655273 | 4.9515333 | 2023-12-04 11:26:21 |
| K-Means | Keras | Classification | Image | 0.8337967 | 0.6097038 | 0.8427423 | 0.7895934 | 1.6282092 | 2023-12-05 11:26:21 |
| K-Means | Scikit-learn | Regression | Time Series | 0.9039230 | 0.4684575 | 0.4899866 | 0.9617684 | 1.7658621 | 2023-12-06 11:26:21 |
| Neural Network | TensorFlow | Classification | Time Series | 0.8811426 | 0.4907481 | 0.6476868 | 0.4384038 | 0.4868031 | 2023-12-07 11:26:21 |
| K-Means | Scikit-learn | Regression | Time Series | 0.8989068 | 0.5351902 | 0.4989919 | 0.8948459 | 2.2697109 | 2023-12-08 11:26:21 |
| Neural Network | Keras | Clustering | Tabular | 0.7177917 | 0.5505800 | 0.3936799 | 0.5754299 | 13.7913778 | 2023-12-09 11:26:21 |
| Random Forest | TensorFlow | Regression | Image | 0.9089171 | 0.9103696 | 0.7406904 | 0.6663517 | 1.7825948 | 2023-12-10 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Text | 0.5601045 | 0.7367337 | 0.3380324 | 0.4131480 | 4.1894018 | 2023-12-11 11:26:21 |
| Random Forest | TensorFlow | Classification | Text | 0.7722445 | 0.7140345 | 0.8240517 | 0.5806276 | 46.8387412 | 2023-12-12 11:26:21 |
| K-Means | Keras | Regression | Image | NA | 0.4688613 | 0.5223108 | 0.7015787 | 1.0100921 | 2023-12-13 11:26:21 |
| K-Means | Scikit-learn | Classification | Tabular | 0.6622929 | 0.9160838 | 0.3000943 | 0.4337056 | 1.9252938 | 2023-12-14 11:26:21 |
| Random Forest | TensorFlow | Clustering | Tabular | 0.6832308 | 0.8336886 | 0.6577904 | 0.6946574 | 4.6502898 | 2023-12-15 11:26:21 |
| SVM | Keras | Clustering | Time Series | 0.6980863 | 0.4406010 | NA | 0.9562664 | 0.4028615 | 2023-12-16 11:26:21 |
| Random Forest | TensorFlow | Clustering | Image | NA | 0.8247011 | 0.4933187 | 0.4632359 | 0.5525666 | 2023-12-17 11:26:21 |
| SVM | TensorFlow | Classification | Image | 0.6942791 | 0.7261229 | 0.7948835 | 0.8586644 | 0.8976758 | 2023-12-18 11:26:21 |
| Neural Network | PyTorch | Classification | Text | 0.7243468 | 0.4490352 | 3.4388274 | 0.6458026 | 3.0165429 | 2023-12-19 11:26:21 |
| Neural Network | PyTorch | Classification | Image | NA | 0.6749804 | 0.8875369 | 0.7931042 | 0.8373141 | 2023-12-20 11:26:21 |
| Neural Network | Keras | Classification | Tabular | 0.6866259 | NA | 0.3026739 | 0.5561420 | 4.8435743 | 2023-12-21 11:26:21 |
| K-Means | PyTorch | Clustering | Image | 0.6136348 | 0.4994647 | 0.4727767 | 0.4956954 | 2.2889012 | 2023-12-22 11:26:21 |
| Neural Network | Keras | Clustering | Text | 0.5365980 | 9.6741889 | 0.8186328 | 0.4962776 | 2.5574380 | 2023-12-23 11:26:21 |
| SVM | PyTorch | Regression | Time Series | 0.8017243 | 0.9099852 | 0.5213891 | 0.4422952 | 1.3086272 | 2023-12-24 11:26:21 |
| Neural Network | TensorFlow | Regression | Text | 0.8341064 | 0.8014134 | 0.3713247 | 0.5113880 | 2.4077052 | 2023-12-25 11:26:21 |
| Random Forest | Keras | Classification | Text | 0.8097452 | NA | 0.5521637 | 0.7985316 | 3.3433768 | 2023-12-26 11:26:21 |
| Random Forest | Keras | Clustering | Time Series | 0.7317470 | 0.6470593 | 0.4892753 | 0.9290144 | 3.7807680 | 2023-12-27 11:26:21 |
| K-Means | TensorFlow | Clustering | Text | 0.6898929 | 0.7905841 | 0.8898983 | 0.8884754 | 3.7939536 | 2023-12-28 11:26:21 |
| Random Forest | Keras | Clustering | Text | 0.9316668 | 0.7272591 | 0.5193435 | NA | NA | 2023-12-29 11:26:21 |
| SVM | Keras | Regression | Image | 0.7595409 | 0.4373639 | 0.8522527 | 0.4662591 | 4.5313629 | 2023-12-30 11:26:21 |
| Neural Network | PyTorch | Regression | Image | 0.7395909 | 0.7075016 | 0.9243105 | 0.5735125 | NA | 2023-12-31 11:26:21 |
| K-Means | Keras | Clustering | Time Series | 0.5128210 | 0.8838422 | 0.6036722 | 0.5858841 | 3.8095358 | 2024-01-01 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Image | 0.6706239 | 0.6755439 | 0.9369602 | 5.4997417 | 0.3724671 | 2024-01-02 11:26:21 |
| Neural Network | TensorFlow | Clustering | Image | NA | 0.4311739 | 0.5641226 | 0.7090034 | 0.1337020 | 2024-01-03 11:26:21 |
| SVM | PyTorch | Regression | Text | 0.6994114 | 0.8717669 | 0.9748549 | 0.7213323 | 1.1409917 | 2024-01-04 11:26:21 |
| Random Forest | PyTorch | Clustering | Tabular | 7.9008618 | 0.5208183 | 0.3625027 | 0.6141316 | 3.3512656 | 2024-01-05 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Image | 0.7668013 | 0.5551725 | 0.7809135 | 0.6122735 | 2.1148677 | 2024-01-06 11:26:21 |
| Neural Network | TensorFlow | Classification | Tabular | 0.8039525 | 0.4988238 | 0.6456698 | 0.8971334 | 2.0717907 | 2024-01-07 11:26:21 |
| K-Means | TensorFlow | Classification | Image | 0.8824416 | 0.5981290 | 0.5713542 | 0.8735757 | 4.4344758 | 2024-01-08 11:26:21 |
| Random Forest | PyTorch | Regression | Image | 0.9064929 | 0.8540509 | 0.7428983 | 0.5846775 | 4.4882580 | 2024-01-09 11:26:21 |
| K-Means | Keras | Clustering | Time Series | 0.8590615 | 0.7116315 | 0.7927000 | 0.9482731 | 4.5549588 | 2024-01-10 11:26:21 |
| Random Forest | PyTorch | Clustering | Time Series | 0.9777618 | NA | 0.3030543 | 0.9716890 | 1.6380026 | 2024-01-11 11:26:21 |
| K-Means | TensorFlow | Classification | Time Series | 0.5091163 | 0.9266980 | 0.4168672 | 0.5960455 | 3.4861735 | 2024-01-12 11:26:21 |
| SVM | Scikit-learn | Regression | Tabular | 5.9788899 | 0.9277491 | 0.7991322 | 0.6126550 | 1.4310026 | 2024-01-13 11:26:21 |
| K-Means | PyTorch | Classification | Text | 0.5037814 | 0.9223471 | 0.7664698 | 0.7033805 | 1.0339882 | 2024-01-14 11:26:21 |
| SVM | TensorFlow | Classification | Image | 0.8237374 | 5.4327773 | 0.9762333 | 0.9646725 | 1.0046990 | 2024-01-15 11:26:21 |
| SVM | PyTorch | Classification | Image | 9.4901527 | 0.6707436 | 0.8327265 | 0.9257917 | NA | 2024-01-16 11:26:21 |
| K-Means | Keras | Classification | Image | NA | 0.9909938 | 0.9655409 | 0.4615408 | 2.2067923 | 2024-01-17 11:26:21 |
| SVM | PyTorch | Clustering | Tabular | 0.9635173 | 0.8632075 | 0.7917784 | 0.6356384 | 4.1716509 | 2024-01-18 11:26:21 |
| Neural Network | PyTorch | Clustering | Time Series | 0.5301337 | 0.4163005 | 0.5086365 | 0.7320227 | 0.6874664 | 2024-01-19 11:26:21 |
| SVM | TensorFlow | Classification | Text | 0.9672180 | 0.4391228 | 0.3737554 | 0.7018795 | 3.7011367 | 2024-01-20 11:26:21 |
| Random Forest | TensorFlow | Regression | Tabular | 0.6758113 | 0.6783588 | 0.8472767 | 0.5163178 | 2.7041291 | 2024-01-21 11:26:21 |
| K-Means | TensorFlow | Regression | Time Series | 0.5507104 | 0.9455321 | 0.7509045 | 0.9152901 | 1.5109322 | 2024-01-22 11:26:21 |
| Neural Network | PyTorch | Regression | Tabular | 0.7429359 | 0.7232211 | 0.3337302 | 0.8061645 | 2.5148160 | 2024-01-23 11:26:21 |
| K-Means | Keras | Regression | Text | 0.6283883 | 0.6986875 | 0.5520353 | 0.9027450 | 1.5697161 | 2024-01-24 11:26:21 |
| Random Forest | Scikit-learn | Classification | Time Series | 0.6424365 | 0.4632842 | 0.9697599 | 0.9152586 | 3.0208016 | 2024-01-25 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Tabular | 0.6536450 | 0.7940681 | 0.6502808 | NA | 2.2258352 | 2024-01-26 11:26:21 |
| Random Forest | TensorFlow | Regression | Text | 0.9015129 | 8.9326190 | 0.6029104 | 0.6635264 | 0.9055871 | 2024-01-27 11:26:21 |
| SVM | TensorFlow | Classification | Image | 0.7695806 | 0.6282520 | 0.6203897 | 0.7663281 | 0.6712571 | 2024-01-28 11:26:21 |
| SVM | PyTorch | Regression | Text | 0.6556538 | 0.8653671 | 0.4462179 | 0.4962192 | 2.7788088 | 2024-01-29 11:26:21 |
| K-Means | Scikit-learn | Regression | Image | 0.8051669 | 0.9786860 | 0.5580950 | 0.8041873 | 4.5218239 | 2024-01-30 11:26:21 |
| K-Means | TensorFlow | Classification | Tabular | 0.8580753 | 0.5222599 | 0.5588693 | 0.5075520 | 1.7853927 | 2024-01-31 11:26:21 |
| Neural Network | Keras | Clustering | Time Series | 0.6363120 | 0.7139978 | 0.3366485 | 0.8163698 | 3.6917005 | 2024-02-01 11:26:21 |
| Neural Network | Keras | Clustering | Time Series | 0.7067746 | 0.5722828 | 0.8372973 | 0.5377589 | 3.3274406 | 2024-02-02 11:26:21 |
| K-Means | TensorFlow | Classification | Tabular | 0.5609430 | 0.8757127 | 0.5915531 | 0.4705308 | 4.6646191 | 2024-02-03 11:26:21 |
| SVM | TensorFlow | Clustering | Time Series | 0.5905747 | 0.7465560 | 0.8755259 | 0.4991707 | 4.1228070 | 2024-02-04 11:26:21 |
| Random Forest | Scikit-learn | Classification | Time Series | NA | 0.7807495 | 0.8952436 | 0.4011953 | 2.8763658 | 2024-02-05 11:26:21 |
| K-Means | Scikit-learn | Classification | Image | 0.5907192 | 0.8787485 | 0.4483958 | 0.8312438 | 3.3194494 | 2024-02-06 11:26:21 |
| Neural Network | PyTorch | Classification | Tabular | 0.7625817 | 0.6375823 | 0.7601474 | 0.8394406 | 4.5020937 | 2024-02-07 11:26:21 |
| SVM | Scikit-learn | Clustering | Text | 0.8545231 | 0.9490540 | 0.6305973 | 0.7089600 | 2.0576420 | 2024-02-08 11:26:21 |
| K-Means | Keras | Classification | Image | NA | 0.7198173 | 0.9161097 | 0.4968177 | 1.7012941 | 2024-02-09 11:26:21 |
| K-Means | Keras | Regression | Tabular | 0.7836561 | 0.4947729 | 0.4510185 | 0.4501326 | 0.1529743 | 2024-02-10 11:26:21 |
| SVM | TensorFlow | Clustering | Image | 0.6282814 | 0.8175395 | 0.7744692 | 0.4114774 | 4.1501003 | 2024-02-11 11:26:21 |
| K-Means | Scikit-learn | Clustering | Text | 0.9814634 | 0.8759568 | 0.7254265 | 0.4994489 | 4.0248054 | 2024-02-12 11:26:21 |
| SVM | PyTorch | Clustering | Text | 0.7417728 | NA | 0.5067110 | 0.9345133 | 0.6118271 | 2024-02-13 11:26:21 |
| Random Forest | Keras | Clustering | Tabular | 0.9029963 | 0.9143076 | 0.3956206 | 0.5449221 | 2.9266012 | 2024-02-14 11:26:21 |
| SVM | Keras | Regression | Text | 0.7751133 | 0.9436860 | 0.7561478 | 0.6125628 | 2.3735103 | 2024-02-15 11:26:21 |
| SVM | PyTorch | Clustering | Image | 0.5217063 | 0.5661427 | 0.8170182 | 4.6320729 | 0.6816746 | 2024-02-16 11:26:21 |
| SVM | PyTorch | Clustering | Text | 0.8165757 | 0.9901129 | 0.5209391 | 0.5334168 | 4.9047843 | 2024-02-17 11:26:21 |
| K-Means | Keras | Classification | Image | 0.9757017 | 0.4844269 | 0.7513828 | 0.7115347 | 1.1520098 | 2024-02-18 11:26:21 |
| K-Means | Keras | Clustering | Image | 0.8008059 | 0.5212094 | 5.7659164 | 0.7646516 | 0.4294475 | 2024-02-19 11:26:21 |
| SVM | PyTorch | Regression | Text | 0.9095944 | 0.5105349 | 0.7992448 | 0.5472114 | 3.0147522 | 2024-02-20 11:26:21 |
| K-Means | Keras | Regression | Tabular | 0.9421032 | 0.9363938 | 0.4394507 | 0.4346393 | 3.7204467 | 2024-02-21 11:26:21 |
| Neural Network | TensorFlow | Regression | Time Series | 0.6140399 | 0.7925755 | 0.9231505 | 0.6346197 | 0.2589729 | 2024-02-22 11:26:21 |
| Neural Network | TensorFlow | Regression | Text | 0.6060224 | 0.4912626 | 0.5011867 | 0.5405220 | 3.3277164 | 2024-02-23 11:26:21 |
| K-Means | PyTorch | Clustering | Image | 0.8054905 | 0.6641941 | 0.5574502 | 0.5317324 | 2.7082302 | 2024-02-24 11:26:21 |
| K-Means | Scikit-learn | Clustering | Image | 0.7055142 | 0.7691788 | 0.3406644 | 0.9759177 | 0.6059985 | 2024-02-25 11:26:21 |
| SVM | PyTorch | Regression | Text | 0.9199307 | 0.4500785 | 0.3780585 | 0.7697803 | 0.9453670 | 2024-02-26 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | 0.9500116 | 0.9294498 | 0.6611033 | 0.7341271 | 2.8830741 | 2024-02-27 11:26:21 |
| K-Means | Scikit-learn | Clustering | Tabular | 0.6767107 | 0.8821621 | 0.4873149 | NA | 1.5521523 | 2024-02-28 11:26:21 |
| Neural Network | Scikit-learn | Regression | Image | 0.6184353 | 0.7031241 | 0.8848362 | 0.6573666 | 4.6977471 | 2024-02-29 11:26:21 |
| SVM | Scikit-learn | Classification | Image | 0.8902628 | 0.9802760 | 0.3102854 | 0.7245430 | 4.1118271 | 2024-03-01 11:26:21 |
| K-Means | TensorFlow | Regression | Tabular | 0.6374030 | 0.6506566 | 0.5653658 | 8.1785788 | 4.9194793 | 2024-03-02 11:26:21 |
| Neural Network | Keras | Regression | Time Series | 0.9113072 | 0.9904662 | 0.5361419 | 0.8212877 | 1.3723871 | 2024-03-03 11:26:21 |
| Neural Network | TensorFlow | Classification | Text | 0.7118691 | 0.8007520 | 0.3135293 | 0.5030163 | 4.8516458 | 2024-03-04 11:26:21 |
| Random Forest | Keras | Classification | Image | 0.8337749 | 0.7808028 | 0.3870579 | 0.7000677 | 2.2130460 | 2024-03-05 11:26:21 |
| SVM | TensorFlow | Clustering | Image | 0.5477677 | 0.4995729 | 0.5895499 | 0.6471749 | 1.8028421 | 2024-03-06 11:26:21 |
| SVM | Keras | Clustering | Tabular | 0.8119297 | 0.9291567 | 0.6450052 | 0.9223162 | 0.3467127 | 2024-03-07 11:26:21 |
| Random Forest | TensorFlow | Classification | Text | NA | 0.6564938 | 0.5830028 | 0.7788514 | 0.3585498 | 2024-03-08 11:26:21 |
| Random Forest | Scikit-learn | Classification | Tabular | 0.7933042 | 0.4973400 | 0.6716564 | 0.7195024 | 3.4908552 | 2024-03-09 11:26:21 |
| SVM | PyTorch | Clustering | Text | 0.5840071 | 4.0756451 | 0.7165922 | 0.4692368 | 2.3438521 | 2024-03-10 11:26:21 |
| SVM | TensorFlow | Regression | Text | 0.8684369 | 0.7358534 | 0.3069467 | 0.7633827 | 1.2099560 | 2024-03-11 11:26:21 |
| K-Means | Keras | Classification | Text | 0.9313985 | 0.7164398 | 0.6248666 | 0.4703223 | 3.1084545 | 2024-03-12 11:26:21 |
| K-Means | Scikit-learn | Clustering | Image | 0.6083699 | 0.8316122 | 0.9744495 | 0.6023239 | 1.3390048 | 2024-03-13 11:26:21 |
| Random Forest | TensorFlow | Regression | Time Series | 0.5478573 | 0.9341548 | 0.6633226 | 0.4857052 | 2.9303956 | 2024-03-14 11:26:21 |
| K-Means | TensorFlow | Classification | Tabular | 0.5118193 | 0.4476440 | 0.7742796 | 0.8152442 | 1.8601229 | 2024-03-15 11:26:21 |
| Neural Network | Scikit-learn | Regression | Time Series | 0.8209858 | 0.8388979 | 0.5183047 | 0.5237513 | 4.1354059 | 2024-03-16 11:26:21 |
| K-Means | TensorFlow | Clustering | Tabular | 0.8035470 | 0.5124472 | 0.8417940 | 0.6351156 | 4.1214112 | 2024-03-17 11:26:21 |
| K-Means | PyTorch | Classification | Text | 0.7733487 | 0.9149062 | 0.8410450 | 9.3740487 | 2.4390170 | 2024-03-18 11:26:21 |
| Neural Network | TensorFlow | Regression | Image | 0.6159735 | 0.8914381 | 0.6649072 | 0.5225898 | 1.8190136 | 2024-03-19 11:26:21 |
| Random Forest | PyTorch | Classification | Time Series | 0.6954530 | 0.7244763 | 0.9832097 | 0.7046830 | 1.8765430 | 2024-03-20 11:26:21 |
| Neural Network | PyTorch | Regression | Image | 0.7972382 | 0.8261457 | 0.3878852 | 0.6515630 | 4.0480012 | 2024-03-21 11:26:21 |
| K-Means | Scikit-learn | Clustering | Tabular | 0.7483834 | 0.5886101 | 0.3118634 | 0.4108744 | 1.7080795 | 2024-03-22 11:26:21 |
| Random Forest | TensorFlow | Clustering | Image | 0.9938928 | 0.6827007 | 0.8391107 | 0.8756549 | 1.1243909 | 2024-03-23 11:26:21 |
| K-Means | Scikit-learn | Classification | Image | 0.5682199 | 0.8929821 | 0.8650089 | 0.4414275 | 0.5149729 | 2024-03-24 11:26:21 |
| Neural Network | TensorFlow | Regression | Time Series | NA | 0.6755591 | 0.3841451 | 0.6845530 | 2.3867963 | 2024-03-25 11:26:21 |
| Neural Network | TensorFlow | Classification | Text | 0.7021594 | 0.6146790 | 4.8590798 | 0.7364070 | 2.4624473 | 2024-03-26 11:26:21 |
| K-Means | Keras | Classification | Image | 0.7140998 | 0.6965275 | 0.3122867 | 0.7770559 | 4.2204527 | 2024-03-27 11:26:21 |
| SVM | TensorFlow | Regression | Text | 0.8587989 | 0.8969496 | 0.5053160 | 0.8131790 | 1.1840696 | 2024-03-28 11:26:21 |
| SVM | Scikit-learn | Regression | Tabular | 0.8462181 | 0.6011248 | 0.8411993 | 0.5519634 | 1.9668572 | 2024-03-29 11:26:21 |
| Neural Network | PyTorch | Regression | Image | 0.9956280 | 0.5042570 | 0.6625724 | 0.4059873 | 4.0611881 | 2024-03-30 11:26:21 |
| Neural Network | TensorFlow | Clustering | Time Series | 0.5641971 | 0.8272084 | 5.4366692 | 0.8340662 | 4.1357557 | 2024-03-31 11:26:21 |
| SVM | Keras | Classification | Tabular | 0.5520548 | 0.8955869 | 0.5602175 | 0.7213940 | 1.9845943 | 2024-04-01 11:26:21 |
| SVM | Scikit-learn | Classification | Tabular | 0.8621694 | 0.4603825 | 0.3009475 | NA | 2.3497077 | 2024-04-02 11:26:21 |
| K-Means | TensorFlow | Clustering | Time Series | 0.7891935 | 0.5439245 | 0.5098604 | 0.8928330 | 1.5861009 | 2024-04-03 11:26:21 |
| K-Means | Scikit-learn | Regression | Image | 0.6370803 | 0.4851832 | 0.7525211 | NA | NA | 2024-04-04 11:26:21 |
| SVM | Keras | Clustering | Image | 0.5397097 | 0.6087648 | 0.9819361 | 0.6910568 | 0.6813545 | 2024-04-05 11:26:21 |
| K-Means | Keras | Clustering | Tabular | 0.5428291 | 0.6702106 | 0.8929426 | 0.6001770 | 4.6753722 | 2024-04-06 11:26:21 |
| K-Means | Scikit-learn | Clustering | Image | 0.9470954 | 0.8492958 | 0.3165165 | 0.8749349 | NA | 2024-04-07 11:26:21 |
| Random Forest | TensorFlow | Classification | Text | 0.5959337 | 0.7906886 | 0.9289924 | 0.6707765 | 2.7047035 | 2024-04-08 11:26:21 |
| K-Means | PyTorch | Clustering | Time Series | 0.6616858 | 0.7725571 | 0.8482389 | 0.5100653 | 0.2600294 | 2024-04-09 11:26:21 |
| Neural Network | Keras | Classification | Text | 0.6133282 | 0.6114250 | 0.8462633 | 0.9129844 | 2.4304213 | 2024-04-10 11:26:21 |
| K-Means | TensorFlow | Classification | Image | 0.6774982 | 0.9048685 | 0.6205970 | 9.2953593 | 2.0979657 | 2024-04-11 11:26:21 |
| K-Means | TensorFlow | Classification | Tabular | 0.5347119 | 0.6827723 | 0.5786037 | 0.6797859 | 0.8899722 | 2024-04-12 11:26:21 |
| SVM | PyTorch | Regression | Time Series | 0.7595299 | 0.9874630 | 0.5120410 | 0.4454198 | 3.3164071 | 2024-04-13 11:26:21 |
| Neural Network | Keras | Classification | Image | 0.5338063 | 0.7804853 | 0.3459864 | 0.6326957 | 4.8586394 | 2024-04-14 11:26:21 |
| K-Means | Keras | Classification | Image | 0.9001783 | 0.4757588 | 0.4597648 | NA | 2.8547252 | 2024-04-15 11:26:21 |
| K-Means | TensorFlow | Regression | Tabular | 0.6168560 | 0.8057065 | 0.4726225 | 0.9410644 | 3.6022911 | 2024-04-16 11:26:21 |
| Random Forest | TensorFlow | Classification | Tabular | 0.7700060 | 0.5950624 | 0.6388539 | 0.5220836 | 4.3540722 | 2024-04-17 11:26:21 |
| K-Means | PyTorch | Clustering | Text | 0.9400395 | 0.8117963 | 0.8232111 | 0.4401842 | 2.1465683 | 2024-04-18 11:26:21 |
| K-Means | PyTorch | Clustering | Time Series | 0.8254387 | 0.4417847 | 0.6316669 | 0.9264103 | 0.6701566 | 2024-04-19 11:26:21 |
| Random Forest | Keras | Classification | Time Series | 0.7664789 | NA | 0.3404911 | 0.6336430 | 3.1026996 | 2024-04-20 11:26:21 |
| SVM | Scikit-learn | Regression | Tabular | 0.6621669 | 0.9134424 | 0.9704529 | 0.7250566 | 46.9856258 | 2024-04-21 11:26:21 |
| K-Means | Keras | Clustering | Text | 0.6665010 | 0.5363077 | 0.9599070 | 0.9808395 | 3.3427109 | 2024-04-22 11:26:21 |
| Random Forest | TensorFlow | Regression | Time Series | 0.8347435 | 0.9022247 | 0.8496327 | 0.4399388 | 0.4763349 | 2024-04-23 11:26:21 |
| Neural Network | Keras | Regression | Tabular | 0.9970697 | 0.5675657 | 0.9939295 | 0.7889908 | 1.8378644 | 2024-04-24 11:26:21 |
| SVM | Scikit-learn | Clustering | Image | NA | 0.7857291 | 0.6811376 | 0.4444633 | 2.7982506 | 2024-04-25 11:26:21 |
| Neural Network | PyTorch | Regression | Time Series | 0.7788917 | 0.8164903 | 0.9739378 | 0.6252814 | 2.0757180 | 2024-04-26 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Text | 0.8653253 | 0.7075929 | 0.3529235 | 0.8822887 | 4.1848557 | 2024-04-27 11:26:21 |
| Random Forest | Scikit-learn | Regression | Text | 0.7326028 | 0.5831864 | 0.5559765 | 0.6600833 | 4.0918879 | 2024-04-28 11:26:21 |
| K-Means | Scikit-learn | Clustering | Text | 0.5300712 | NA | 0.4577669 | 0.9983094 | 3.0979720 | 2024-04-29 11:26:21 |
| Random Forest | PyTorch | Classification | Image | 0.7811484 | 0.4199136 | 0.4370832 | 0.7354356 | 1.9299439 | 2024-04-30 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Time Series | 0.9788126 | 0.5823678 | 0.3985621 | 0.5926973 | 1.3511482 | 2024-05-01 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Time Series | 0.5876515 | 0.7918977 | 0.7356895 | 5.3206680 | 0.6187842 | 2024-05-02 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Image | NA | 0.9629829 | 0.8469253 | 0.6103123 | 1.8389986 | 2024-05-03 11:26:21 |
| SVM | TensorFlow | Clustering | Time Series | 0.6004668 | 0.9227227 | 0.7048089 | 0.6235200 | 2.1245453 | 2024-05-04 11:26:21 |
| K-Means | Scikit-learn | Regression | Image | 0.7679138 | 0.8596389 | 0.4028739 | 0.4412282 | 3.4115323 | 2024-05-05 11:26:21 |
| Neural Network | PyTorch | Classification | Image | 0.5483382 | 0.8730684 | 0.8677866 | 0.6217445 | 3.3249831 | 2024-05-06 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Image | NA | 0.7989909 | 0.7451683 | 0.6785431 | 0.4435299 | 2024-05-07 11:26:21 |
| K-Means | Scikit-learn | Regression | Text | 0.8780817 | 0.5561721 | 0.5717160 | NA | 2.0367706 | 2024-05-08 11:26:21 |
| Neural Network | PyTorch | Classification | Time Series | 0.6737858 | 0.9443170 | 0.7718912 | 0.7940377 | 0.9907164 | 2024-05-09 11:26:21 |
| K-Means | Scikit-learn | Classification | Image | 0.8324559 | 0.8024394 | 0.4819335 | 0.8252594 | 0.8683823 | 2024-05-10 11:26:21 |
| Neural Network | PyTorch | Classification | Time Series | 0.8977250 | 0.7362644 | 0.5416345 | 0.4050182 | 4.1537805 | 2024-05-11 11:26:21 |
| Random Forest | Scikit-learn | Classification | Text | 0.9635889 | 0.4665937 | 0.9422401 | 0.5306004 | 0.3053846 | 2024-05-12 11:26:21 |
| Neural Network | PyTorch | Clustering | Image | 0.6173210 | 0.6682333 | 0.5033532 | 0.7970106 | 2.1539961 | 2024-05-13 11:26:21 |
| K-Means | Scikit-learn | Clustering | Tabular | 0.6996580 | 0.6762150 | 0.6289918 | 0.6903936 | 0.9241979 | 2024-05-14 11:26:21 |
| K-Means | Scikit-learn | Regression | Text | 0.5762080 | 0.9187382 | 0.9230988 | 0.4031881 | 4.4087181 | 2024-05-15 11:26:21 |
| K-Means | TensorFlow | Regression | Tabular | 0.9962418 | 0.7279889 | 0.7954049 | 0.8826968 | 28.2949852 | 2024-05-16 11:26:21 |
| K-Means | TensorFlow | Regression | Time Series | 0.9635005 | 0.6282403 | 0.3435423 | 0.8636857 | 1.2337651 | 2024-05-17 11:26:21 |
| K-Means | Scikit-learn | Classification | Time Series | 0.7699786 | 0.9860802 | 0.4031121 | 0.7290515 | 2.5618937 | 2024-05-18 11:26:21 |
| SVM | Scikit-learn | Classification | Text | 0.9210166 | NA | 0.3054891 | 0.4398780 | 3.6842832 | 2024-05-19 11:26:21 |
| SVM | Keras | Clustering | Time Series | 0.7604790 | 0.6535291 | 0.7416453 | 0.8599090 | 4.7947782 | 2024-05-20 11:26:21 |
| Neural Network | PyTorch | Classification | Tabular | NA | 0.4252148 | 0.6133714 | 0.7502282 | 1.1801854 | 2024-05-21 11:26:21 |
| K-Means | Keras | Classification | Image | 0.5445622 | 0.8439425 | 0.3939924 | 0.8691760 | 4.4422907 | 2024-05-22 11:26:21 |
| Neural Network | TensorFlow | Classification | Image | 0.8776352 | 0.9508459 | 0.9705539 | 0.8510482 | 4.6780902 | 2024-05-23 11:26:21 |
| K-Means | TensorFlow | Regression | Image | 0.5638567 | NA | 0.6707618 | NA | 4.5904607 | 2024-05-24 11:26:21 |
| K-Means | TensorFlow | Regression | Text | 0.9130338 | 0.9150050 | 0.4693253 | 0.7108048 | 3.2135303 | 2024-05-25 11:26:21 |
| SVM | TensorFlow | Clustering | Time Series | 0.8910140 | 0.5753309 | 0.6504225 | 0.4841947 | 3.1861212 | 2024-05-26 11:26:21 |
| SVM | Scikit-learn | Classification | Tabular | 0.8543723 | 0.9464621 | 0.7757342 | 0.8026945 | 2.0757321 | 2024-05-27 11:26:21 |
| Random Forest | TensorFlow | Classification | Time Series | 0.5180802 | 0.8523771 | 0.3533675 | 0.7722843 | 3.7867496 | 2024-05-28 11:26:21 |
| SVM | Scikit-learn | Clustering | Time Series | 0.6515642 | 0.8829441 | 0.4922927 | 0.8453519 | 2.7033407 | 2024-05-29 11:26:21 |
| Random Forest | Scikit-learn | Classification | Time Series | 0.6315563 | 0.4108039 | 0.8648758 | 0.5019286 | 3.4194465 | 2024-05-30 11:26:21 |
| Random Forest | TensorFlow | Classification | Image | 0.6800682 | 0.9776862 | 0.6217654 | 0.5169851 | 2.1994437 | 2024-05-31 11:26:21 |
| Random Forest | TensorFlow | Regression | Tabular | 0.5438214 | 0.8360026 | 0.6826040 | 0.9342457 | 3.6843327 | 2024-06-01 11:26:21 |
| Random Forest | PyTorch | Clustering | Tabular | 0.9684789 | 0.5828507 | 0.6029712 | NA | 4.1399104 | 2024-06-02 11:26:21 |
| Random Forest | Keras | Clustering | Time Series | 0.7769011 | 0.8976368 | 0.3307300 | 0.9449371 | 0.8175377 | 2024-06-03 11:26:21 |
| Neural Network | TensorFlow | Clustering | Image | 0.6527622 | 0.5689127 | 0.4160248 | 0.8552292 | 4.1813556 | 2024-06-04 11:26:21 |
| SVM | PyTorch | Clustering | Image | 0.6984908 | 0.9236523 | 0.6118791 | 0.7582932 | 2.7474530 | 2024-06-05 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | 0.7236013 | 0.4675482 | 0.4464298 | 0.7924796 | 4.2388630 | 2024-06-06 11:26:21 |
| K-Means | Keras | Regression | Image | 0.8002972 | 0.8222116 | 0.3349853 | 0.9334768 | 2.2133784 | 2024-06-07 11:26:21 |
| SVM | Scikit-learn | Classification | Image | 0.7578397 | 0.7244191 | 0.8905548 | 0.7472841 | 1.9572984 | 2024-06-08 11:26:21 |
| SVM | TensorFlow | Classification | Image | 0.9596960 | 0.4579207 | 0.9868348 | 0.7794348 | 4.5837976 | 2024-06-09 11:26:21 |
| Random Forest | Keras | Classification | Text | 0.7484817 | 0.5451363 | 0.8552000 | 0.4938860 | 1.3307294 | 2024-06-10 11:26:21 |
| Neural Network | TensorFlow | Classification | Time Series | 0.9960790 | 0.4074424 | 0.8976494 | 0.6843527 | 4.2366404 | 2024-06-11 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | 0.9257125 | 0.6812608 | 0.4693840 | 0.8298383 | 2.4717008 | 2024-06-12 11:26:21 |
| Neural Network | PyTorch | Clustering | Tabular | 0.6042553 | 0.5807592 | 0.9724389 | 0.5625656 | 2.6190677 | 2024-06-13 11:26:21 |
| K-Means | Scikit-learn | Regression | Text | 0.9652976 | 0.7590145 | 0.4378480 | 0.5213525 | 1.6114946 | 2024-06-14 11:26:21 |
| SVM | PyTorch | Classification | Time Series | 0.5581832 | 0.5783427 | 0.9660009 | 0.5882961 | 2.9079068 | 2024-06-15 11:26:21 |
| K-Means | PyTorch | Regression | Tabular | 0.9087249 | 0.5799515 | 0.9963735 | 0.5449004 | 1.6935349 | 2024-06-16 11:26:21 |
| Neural Network | Keras | Clustering | Image | 0.6903116 | 0.8459159 | 0.7982060 | 0.5289476 | 0.2924201 | 2024-06-17 11:26:21 |
| Neural Network | TensorFlow | Clustering | Tabular | 0.9389872 | 0.4288857 | 0.9868006 | 0.6549105 | 1.4327932 | 2024-06-18 11:26:21 |
| K-Means | TensorFlow | Regression | Image | 0.9340283 | 0.9417370 | 0.6986778 | 0.9447031 | 0.1312573 | 2024-06-19 11:26:21 |
| Neural Network | Keras | Regression | Text | NA | 0.9113583 | 0.4816792 | 0.7042358 | 4.8939385 | 2024-06-20 11:26:21 |
| K-Means | Scikit-learn | Regression | Image | 0.8950152 | 0.8006828 | 0.6058971 | 0.5127522 | 4.8321613 | 2024-06-21 11:26:21 |
| SVM | TensorFlow | Classification | Tabular | 0.6523396 | 0.7559329 | 0.7154927 | 0.4461830 | 2.0362610 | 2024-06-22 11:26:21 |
| K-Means | PyTorch | Regression | Tabular | 0.5404596 | 0.9353815 | 0.3511571 | 0.8176937 | 3.6690172 | 2024-06-23 11:26:21 |
| SVM | TensorFlow | Regression | Tabular | 0.7014901 | 0.5111981 | 0.7356403 | 0.6296793 | 1.7944499 | 2024-06-24 11:26:21 |
| K-Means | PyTorch | Regression | Text | 0.5867623 | 0.4473815 | 0.9868245 | 0.8930986 | 3.3883565 | 2024-06-25 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Text | 0.8474755 | 0.5437061 | 0.4330754 | 0.7957065 | 4.0466072 | 2024-06-26 11:26:21 |
| K-Means | Keras | Clustering | Time Series | 0.6730499 | 0.8767470 | 0.8548166 | 0.8777449 | 4.7391056 | 2024-06-27 11:26:21 |
| Neural Network | TensorFlow | Classification | Tabular | 0.9878051 | 0.4208022 | 0.9355292 | 0.5631676 | 2.0610402 | 2024-06-28 11:26:21 |
| K-Means | Keras | Classification | Image | 0.8204860 | 0.7496841 | 0.9605911 | 0.8154154 | 3.9381601 | 2024-06-29 11:26:21 |
| K-Means | TensorFlow | Clustering | Text | 0.9112403 | 0.9972625 | 0.9720949 | 0.5584372 | 1.4030494 | 2024-06-30 11:26:21 |
| Random Forest | PyTorch | Clustering | Tabular | 0.5662623 | 0.9134177 | 0.6650217 | 0.9634411 | NA | 2024-07-01 11:26:21 |
| Neural Network | Keras | Clustering | Tabular | 0.9310072 | NA | 0.9841155 | 0.7818246 | NA | 2024-07-02 11:26:21 |
| Random Forest | TensorFlow | Regression | Text | 0.9613786 | 0.4381845 | 0.8301171 | 0.5947071 | 3.0572912 | 2024-07-03 11:26:21 |
| Neural Network | PyTorch | Regression | Time Series | 0.7435310 | 0.8988241 | 0.4131700 | 0.5617071 | 3.3315793 | 2024-07-04 11:26:21 |
| Random Forest | Keras | Clustering | Text | 0.8031265 | 0.7593871 | 0.6338301 | 0.5145560 | 3.4714232 | 2024-07-05 11:26:21 |
| SVM | Keras | Regression | Text | 0.8824049 | 0.4689598 | 0.8028318 | 0.8167846 | 0.6898991 | 2024-07-06 11:26:21 |
| K-Means | TensorFlow | Regression | Image | 0.5874193 | 0.4563144 | 0.4731382 | 0.5312294 | 4.6987959 | 2024-07-07 11:26:21 |
| Neural Network | TensorFlow | Classification | Image | 0.7512830 | 0.9457761 | 0.7484306 | 0.7571820 | 0.9877981 | 2024-07-08 11:26:21 |
| Neural Network | TensorFlow | Clustering | Time Series | 0.6993315 | 0.8015202 | 0.7666203 | 0.5587810 | 3.1514917 | 2024-07-09 11:26:21 |
| K-Means | PyTorch | Clustering | Tabular | 0.5731870 | 0.8975721 | 0.4138898 | 0.7971814 | 1.1911128 | 2024-07-10 11:26:21 |
| Neural Network | TensorFlow | Regression | Text | 0.6837672 | 0.9273873 | 0.6955597 | 0.8889638 | 1.6053604 | 2024-07-11 11:26:21 |
| Neural Network | Keras | Clustering | Time Series | 0.5340862 | 0.7430634 | 0.8401388 | 0.8668151 | 2.7776469 | 2024-07-12 11:26:21 |
| Random Forest | Keras | Classification | Image | 0.5129060 | 0.7104678 | NA | 0.8565108 | 2.1442478 | 2024-07-13 11:26:21 |
| Neural Network | TensorFlow | Classification | Text | 0.5675831 | 0.6582564 | 0.3084722 | 0.5126334 | 0.8851054 | 2024-07-14 11:26:21 |
| SVM | TensorFlow | Classification | Tabular | 0.9815576 | 0.5901680 | 0.3063269 | 0.4530310 | 0.9375884 | 2024-07-15 11:26:21 |
| SVM | Scikit-learn | Clustering | Image | 0.7747648 | 0.6607576 | 0.5499205 | 0.8193693 | 2.1489095 | 2024-07-16 11:26:21 |
| K-Means | Keras | Clustering | Image | 0.9829111 | 0.8643278 | 0.9483358 | 0.6210083 | 3.8106866 | 2024-07-17 11:26:21 |
| Neural Network | PyTorch | Clustering | Image | 0.7162489 | 0.7611541 | 0.4600737 | 0.6594077 | 4.5000268 | 2024-07-18 11:26:21 |
| K-Means | Scikit-learn | Clustering | Time Series | 0.6559081 | 0.9355140 | 0.7440546 | 0.4186895 | 0.5122255 | 2024-07-19 11:26:21 |
| SVM | TensorFlow | Clustering | Tabular | 0.7530709 | 0.6660280 | 0.4554531 | 0.5557459 | 2.0262091 | 2024-07-20 11:26:21 |
| Neural Network | TensorFlow | Classification | Text | 0.7197558 | 0.7642537 | 0.5251690 | 0.4202058 | 0.5912033 | 2024-07-21 11:26:21 |
| Neural Network | Keras | Regression | Time Series | 0.5528323 | 0.7787845 | 0.8936295 | 0.9275115 | 0.1813032 | 2024-07-22 11:26:21 |
| K-Means | TensorFlow | Classification | Time Series | 0.8204132 | 0.7550183 | 0.8102030 | 0.5460380 | 3.3424215 | 2024-07-23 11:26:21 |
| SVM | PyTorch | Clustering | Time Series | 0.6080191 | 0.8215803 | 0.3667795 | 0.7344023 | 3.0520023 | 2024-07-24 11:26:21 |
| Neural Network | Keras | Classification | Image | 0.8097940 | 0.5424601 | 0.6000914 | 0.4233876 | 0.8987937 | 2024-07-25 11:26:21 |
| K-Means | TensorFlow | Clustering | Tabular | 0.8251005 | 0.7074183 | 0.3204188 | NA | 1.2456829 | 2024-07-26 11:26:21 |
| SVM | Keras | Clustering | Time Series | 0.5760124 | 0.4625349 | 0.6366231 | 0.5938164 | 0.2161680 | 2024-07-27 11:26:21 |
| K-Means | TensorFlow | Classification | Text | 0.5306748 | 0.6307068 | 0.7637038 | 0.9387515 | 4.1912413 | 2024-07-28 11:26:21 |
| Random Forest | TensorFlow | Clustering | Tabular | 0.8903808 | 0.6926002 | 0.3829518 | 0.9328709 | 4.8759839 | 2024-07-29 11:26:21 |
| Neural Network | TensorFlow | Regression | Text | 0.7299002 | 0.7913346 | 0.5022195 | 0.5951744 | 0.7637517 | 2024-07-30 11:26:21 |
| Random Forest | PyTorch | Clustering | Tabular | 0.5290819 | 0.9703186 | 0.5784856 | 0.9405765 | 1.2342285 | 2024-07-31 11:26:21 |
| SVM | PyTorch | Regression | Text | 0.9974332 | 0.7603906 | 0.9436717 | 0.9976946 | 4.3552067 | 2024-08-01 11:26:21 |
| Random Forest | Keras | Regression | Image | 0.5288903 | 0.8461563 | 0.9952785 | 0.8952494 | 4.6396726 | 2024-08-02 11:26:21 |
| Random Forest | TensorFlow | Clustering | Time Series | 0.8475176 | 0.7037596 | 0.3314379 | 0.9069228 | 2.1561742 | 2024-08-03 11:26:21 |
| SVM | TensorFlow | Classification | Text | 0.9918395 | 0.7804624 | 0.8327055 | 0.5494052 | 0.3480501 | 2024-08-04 11:26:21 |
| K-Means | PyTorch | Regression | Tabular | 0.6195901 | 0.4425593 | 0.5602068 | 0.7460214 | 0.2930200 | 2024-08-05 11:26:21 |
| Random Forest | TensorFlow | Classification | Time Series | 0.5711247 | 0.5526349 | NA | 0.4403535 | 2.9123543 | 2024-08-06 11:26:21 |
| K-Means | PyTorch | Regression | Time Series | 0.5606925 | 0.6171119 | 0.8279855 | 0.4569508 | 20.2518603 | 2024-08-07 11:26:21 |
| Random Forest | TensorFlow | Clustering | Tabular | 0.6516376 | 0.6834960 | 0.9428985 | 0.9993356 | 0.2403969 | 2024-08-08 11:26:21 |
| K-Means | Keras | Clustering | Text | 0.5505229 | 0.4273892 | 0.9656500 | 0.5959835 | 2.9579367 | 2024-08-09 11:26:21 |
| SVM | TensorFlow | Regression | Image | 0.8460807 | 0.4840145 | 0.7039858 | NA | 0.1553824 | 2024-08-10 11:26:21 |
| Random Forest | TensorFlow | Clustering | Image | 0.5311459 | 0.5660886 | 5.4998481 | 0.8839990 | 3.9581226 | 2024-08-11 11:26:21 |
| SVM | Scikit-learn | Regression | Text | 0.7547111 | 0.9829196 | 0.8512844 | 0.9148140 | 1.6015208 | 2024-08-12 11:26:21 |
| Random Forest | PyTorch | Clustering | Time Series | 0.9983484 | 0.5988082 | 0.4757010 | 0.9985770 | 0.2972642 | 2024-08-13 11:26:21 |
| SVM | TensorFlow | Classification | Time Series | 0.9069851 | 0.6892246 | 0.6948518 | 0.5448979 | 2.9829259 | 2024-08-14 11:26:21 |
| SVM | TensorFlow | Clustering | Time Series | 0.8076097 | 0.5176586 | NA | 0.4242105 | 2.0486536 | 2024-08-15 11:26:21 |
| Random Forest | Scikit-learn | Classification | Time Series | 0.6531268 | NA | 0.7596418 | 0.6467149 | 4.8716220 | 2024-08-16 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Text | 0.8119479 | 0.5684099 | 0.4682792 | 0.4780484 | 2.7665734 | 2024-08-17 11:26:21 |
| Random Forest | PyTorch | Classification | Time Series | 0.7635207 | 0.5241955 | 0.4341151 | 0.4134555 | 1.4486720 | 2024-08-18 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Tabular | 0.7130417 | 0.7099436 | 0.9427674 | 0.6162561 | 3.5762504 | 2024-08-19 11:26:21 |
| K-Means | Keras | Classification | Time Series | 0.5653552 | 0.4033035 | 0.3712626 | 0.8702430 | 1.4303108 | 2024-08-20 11:26:21 |
| Neural Network | Scikit-learn | Regression | Image | 0.9433021 | 0.4045984 | 0.6541705 | 0.7397084 | 4.5307921 | 2024-08-21 11:26:21 |
| Neural Network | Keras | Regression | Time Series | NA | 0.5314413 | 0.4545969 | 0.5876668 | 1.9361377 | 2024-08-22 11:26:21 |
| SVM | PyTorch | Classification | Text | 0.5973113 | 0.4220328 | 0.3272510 | 0.7926052 | 2.7943560 | 2024-08-23 11:26:21 |
| Random Forest | PyTorch | Clustering | Image | 0.6838797 | 0.4648155 | 0.3252132 | 0.5392109 | 0.3478082 | 2024-08-24 11:26:21 |
| K-Means | PyTorch | Classification | Text | 0.7070649 | 0.6033164 | 0.4226552 | 0.4086289 | 2.1879941 | 2024-08-25 11:26:21 |
| SVM | PyTorch | Clustering | Text | 0.9137689 | 0.8815514 | 0.9067392 | 0.8586120 | NA | 2024-08-26 11:26:21 |
| Neural Network | PyTorch | Regression | Image | 0.8668072 | 0.7432292 | 0.4977351 | 0.7742459 | 4.0476791 | 2024-08-27 11:26:21 |
| Random Forest | Keras | Clustering | Text | 0.8846524 | NA | 0.9653214 | 0.8573816 | 1.1991609 | 2024-08-28 11:26:21 |
| SVM | Keras | Regression | Text | 0.5055156 | 5.7609329 | 0.7071356 | 0.4233628 | 1.2077874 | 2024-08-29 11:26:21 |
| Random Forest | PyTorch | Clustering | Time Series | 0.7080770 | 0.9590522 | 0.6056299 | 0.9022718 | 4.1047960 | 2024-08-30 11:26:21 |
| SVM | Scikit-learn | Clustering | Tabular | 0.7406721 | 0.6382090 | 0.7060622 | 0.7717157 | 4.6589523 | 2024-08-31 11:26:21 |
| Random Forest | TensorFlow | Clustering | Text | 0.5095961 | 0.4522557 | 0.6616889 | 0.7380372 | 0.5672685 | 2024-09-01 11:26:21 |
| Neural Network | PyTorch | Classification | Text | 0.6299066 | 0.7702399 | 0.8311434 | 7.7476843 | 2.3052873 | 2024-09-02 11:26:21 |
| K-Means | Scikit-learn | Classification | Text | 0.8801449 | 0.4683030 | 0.4977472 | NA | 1.7535260 | 2024-09-03 11:26:21 |
| Neural Network | TensorFlow | Regression | Time Series | 0.5685549 | 0.6071339 | 0.5471353 | 0.7521619 | 4.3663734 | 2024-09-04 11:26:21 |
| K-Means | PyTorch | Regression | Tabular | 0.7676551 | 7.0444716 | 0.9258660 | 0.7485702 | 0.5092721 | 2024-09-05 11:26:21 |
| Random Forest | PyTorch | Classification | Image | 0.6076009 | 0.9245335 | 0.9625196 | 0.9944075 | 1.1345171 | 2024-09-06 11:26:21 |
| SVM | Keras | Regression | Time Series | NA | 0.6961279 | 0.9247908 | 0.8540381 | 3.7870949 | 2024-09-07 11:26:21 |
| Neural Network | TensorFlow | Classification | Time Series | 0.6206007 | 0.8213553 | 0.5936136 | 0.6653763 | 0.3513399 | 2024-09-08 11:26:21 |
| K-Means | TensorFlow | Clustering | Tabular | 0.9879369 | 0.9956901 | 0.8462559 | 0.8244382 | 2.5134234 | 2024-09-09 11:26:21 |
| Neural Network | Scikit-learn | Regression | Tabular | 0.9007686 | 0.4788935 | NA | 0.6335383 | 2.2663245 | 2024-09-10 11:26:21 |
| K-Means | Keras | Regression | Time Series | 0.9797883 | 0.5648389 | 0.6482779 | 0.5373253 | 1.7385658 | 2024-09-11 11:26:21 |
| Neural Network | Keras | Classification | Image | 0.7439270 | 0.6367456 | 0.4432761 | 0.7581111 | 2.0334043 | 2024-09-12 11:26:21 |
| K-Means | Keras | Regression | Time Series | 0.5548681 | 0.6530969 | 0.7137912 | 0.9569093 | 2.6967089 | 2024-09-13 11:26:21 |
| K-Means | Keras | Regression | Tabular | 0.7739797 | 0.6466126 | 0.4302951 | 0.9574837 | 0.8907001 | 2024-09-14 11:26:21 |
| Random Forest | TensorFlow | Clustering | Image | 0.7271887 | NA | 0.5320953 | 0.6051432 | 2.9027798 | 2024-09-15 11:26:21 |
| K-Means | TensorFlow | Regression | Tabular | 0.9221785 | 0.8284196 | 0.8985211 | 0.7166065 | 4.0466184 | 2024-09-16 11:26:21 |
| Random Forest | PyTorch | Clustering | Image | 0.5490413 | 0.7647431 | 0.4449531 | 0.5269815 | 3.8247886 | 2024-09-17 11:26:21 |
Number of rows: 560
Number of columns: 10
Variable names: "Algorithm", "Framework", "Problem_Type", "Dataset_Type", "Accuracy", "Precision", "Recall", "F1_Score", "Training_Time" and "Date"
- Variable types:
nominal qualitative: "Algorithm", "Framework", "Problem_Type", "Dataset_Type"
ordinal qualitative: none
quantitative: "Accuracy", "Precision", "Recall", "F1_Score", "Training_Time"
date: "Date"
No variable needs a type conversion, since each one is already stored with its corresponding data type.
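As a quick sanity check (a minimal sketch, assuming the tibble datos loaded above), the storage type of each column can be listed directly:
# First class of every column: character for the categorical
# variables, numeric for the metrics, POSIXct for Date
sapply(datos, function(x) class(x)[1])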
We will analyze each variable through its descriptive measures, in order to get a quick overview of the data and detect patterns, anomalies, or trends. In addition, we will use bar charts for the categorical variables and histograms for the numeric ones.
The frequency of each category of the Algorithm variable is as follows:
library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Algorithm))
colnames(tabla_frecuencia)[1] <- "Algorithm"
colnames(tabla_frecuencia)[2] <- "Frecuencia"
pander(tabla_frecuencia, caption = "Frecuencia de Algoritmos")
| Algorithm | Frecuencia |
|---|---|
| K-Means | 163 |
| Neural Network | 135 |
| Random Forest | 126 |
| SVM | 136 |
Note that all the algorithms occur with roughly the same frequency. The mode of this categorical variable is K-Means.
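Base R has no built-in mode function for categorical data, but the mode can be read off the frequency table programmatically (a short sketch using the table above):
# Category with the highest count, i.e. the mode
names(which.max(table(datos$Algorithm)))
## [1] "K-Means"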
PLOT:
library(ggplot2)
library(dplyr)
tabla_1 <- datos %>%
dplyr::group_by(Algorithm) %>%
dplyr::summarise(Total=n()) %>%
dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
dplyr::arrange(Algorithm)
G1<-ggplot(tabla_1, aes(x =Algorithm, y=Total) )+
geom_bar(width = 0.7, stat="identity",
position=position_dodge(), fill="cyan4")+
ylim(c(0,170))+
labs(x="Algoritmo", y="Frecuencia")+
geom_text(aes(label=paste0(Total, " (", Porcentaje, "%)")),
vjust=-0.9,
color="black",
hjust=0.5,
position=position_dodge(0.9),
angle=0,
size=4.5)+
theme_bw(base_size=16)+
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1))+
facet_wrap(~"Distribución variable Algorithm")
G1
The frequency of each category of the Framework variable is as follows:
library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Framework))
colnames(tabla_frecuencia)[1] <- "Framework"
colnames(tabla_frecuencia)[2] <- "Frecuencia"
pander(tabla_frecuencia, caption = "Frecuencia de Frameworks")
| Framework | Frecuencia |
|---|---|
| Keras | 124 |
| PyTorch | 135 |
| Scikit-learn | 134 |
| TensorFlow | 167 |
Note that all the frameworks are used with roughly the same frequency; the mode of this variable is TensorFlow.
PLOT
library(ggplot2)
library(dplyr)
tabla_2 <- datos %>%
dplyr::group_by(Framework) %>%
dplyr::summarise(Total=n()) %>%
dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
dplyr::arrange(Framework)
G2<-ggplot(tabla_2, aes(x = Framework, y=Total) )+
geom_bar(width = 0.7, stat="identity",
position=position_dodge(), fill="cyan4")+
ylim(c(0,180))+
labs(x="Framework", y="Frecuencia")+
geom_text(aes(label=paste0(Total, " (", Porcentaje, "%)")),
vjust=-0.9,
color="black",
hjust=0.5,
position=position_dodge(0.9),
angle=0,
size=4.5)+
theme_bw(base_size=16)+
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1))+
facet_wrap(~"Distribución variable Framework")
G2
The frequency of each category of the Problem_Type variable is as follows:
library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Problem_Type))
colnames(tabla_frecuencia)[1] <- "Problem_Type"
colnames(tabla_frecuencia)[2] <- "Frecuencia"
pander(tabla_frecuencia, caption = "Frecuencia de Tipos de Problema")
| Problem_Type | Frecuencia |
|---|---|
| Classification | 175 |
| Clustering | 196 |
| Regression | 189 |
We can see that all the problem types addressed by the models appear with a similar frequency, Clustering being the mode of this variable.
PLOT
library(ggplot2)
library(dplyr)
tabla_3 <- datos %>%
dplyr::group_by(Problem_Type) %>%
dplyr::summarise(Total=n()) %>%
dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
dplyr::arrange(Problem_Type)
G3<-ggplot(tabla_3, aes(x = Problem_Type, y=Total) )+
geom_bar(width = 0.7, stat="identity",
position=position_dodge(), fill="cyan4")+
ylim(c(0,210))+
labs(x="Problem_Type", y="Frecuencia")+
geom_text(aes(label=paste0(Total, " (", Porcentaje, "%)")),
vjust=-0.9,
color="black",
hjust=0.5,
position=position_dodge(0.9),
angle=0,
size=4.5)+
theme_bw(base_size=16)+
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1))+
facet_wrap(~"Distribución variable Problem_Type")
G3
The frequency of each category of the Dataset_Type variable is as follows:
library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Dataset_Type))
colnames(tabla_frecuencia)[1] <- "Dataset_Type"
colnames(tabla_frecuencia)[2] <- "Frecuencia"
pander(tabla_frecuencia, caption = "Frecuencia de Tipos de Datos")
| Dataset_Type | Frecuencia |
|---|---|
| Image | 157 |
| Tabular | 136 |
| Text | 143 |
| Time Series | 124 |
We can see that all the data types used to train the models have roughly the same frequency, Image being the most common one.
PLOT
library(ggplot2)
library(dplyr)
tabla_4 <- datos %>%
dplyr::group_by(Dataset_Type) %>%
dplyr::summarise(Total=n()) %>%
dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
dplyr::arrange(Dataset_Type)
G4<-ggplot(tabla_4, aes(x = Dataset_Type, y=Total) )+
geom_bar(width = 0.7, stat="identity",
position=position_dodge(), fill="cyan4")+
ylim(c(0,165))+
labs(x="Dataset_Type", y="Frecuencia")+
geom_text(aes(label=paste0(Total, " (", Porcentaje, "%)")),
vjust=-0.9,
color="black",
hjust=0.5,
position=position_dodge(0.9),
angle=0,
size=4.5)+
theme_bw(base_size=16)+
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1))+
facet_wrap(~"Distribución variable Dataset_Type")
G4
**Filtering the dataset to understand it better**
library(dplyr)
df_clustering <- filter(datos, Problem_Type == "Clustering" & Algorithm == "K-Means")
str(df_clustering)
## tibble [58 × 10] (S3: tbl_df/tbl/data.frame)
## $ Algorithm : chr [1:58] "K-Means" "K-Means" "K-Means" "K-Means" ...
## $ Framework : chr [1:58] "Keras" "Keras" "PyTorch" "Keras" ...
## $ Problem_Type : chr [1:58] "Clustering" "Clustering" "Clustering" "Clustering" ...
## $ Dataset_Type : chr [1:58] "Time Series" "Text" "Text" "Text" ...
## $ Accuracy : num [1:58] 0.744 0.84 0.699 NA 0.96 ...
## $ Precision : num [1:58] 0.49 0.663 0.782 0.511 0.504 ...
## $ Recall : num [1:58] 0.877 0.558 0.775 0.691 0.54 ...
## $ F1_Score : num [1:58] 0.441 0.569 0.984 0.991 0.853 ...
## $ Training_Time: num [1:58] NA 3.485 2.33 0.355 0.256 ...
## $ Date : POSIXct[1:58], format: "2023-03-09 11:26:21" "2023-03-22 11:26:21" ...
Description: in total there are 58 tests where the K-Means algorithm was used on Clustering problems.
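The same count can be obtained without building an intermediate object, for example with dplyr::count (a sketch):
library(dplyr)
# Number of tests per Algorithm / Problem_Type combination,
# restricted to the pair of interest
datos %>%
  count(Algorithm, Problem_Type) %>%
  filter(Algorithm == "K-Means", Problem_Type == "Clustering")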
library(dplyr)
df_tiempo <- filter(datos, Training_Time < 2 & Training_Time > 1)
str(df_tiempo)
## tibble [112 × 10] (S3: tbl_df/tbl/data.frame)
## $ Algorithm : chr [1:112] "SVM" "K-Means" "SVM" "K-Means" ...
## $ Framework : chr [1:112] "PyTorch" "PyTorch" "PyTorch" "PyTorch" ...
## $ Problem_Type : chr [1:112] "Regression" "Regression" "Clustering" "Classification" ...
## $ Dataset_Type : chr [1:112] "Image" "Text" "Text" "Image" ...
## $ Accuracy : num [1:112] 0.897 0.975 0.885 0.566 0.715 ...
## $ Precision : num [1:112] 9.732 0.423 0.612 0.797 0.981 ...
## $ Recall : num [1:112] 0.781 0.826 0.507 0.386 0.493 ...
## $ F1_Score : num [1:112] 0.793 0.477 0.89 0.884 0.5 ...
## $ Training_Time: num [1:112] 1.93 1.45 1.46 1.84 1.14 ...
## $ Date : POSIXct[1:112], format: "2023-03-18 11:26:21" "2023-03-25 11:26:21" ...
Description: in total there are 112 tests where Training_Time lies between 1 and 2 hours, i.e. 20% of the 560 tests.
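The exact share is straightforward to compute from the subset defined above:
# 112 of the 560 tests fall in this range
100 * nrow(df_tiempo) / nrow(datos)
## [1] 20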
library(dplyr)
df_precision <- filter(datos, Precision < 0.5)
str(df_precision)
## tibble [89 × 10] (S3: tbl_df/tbl/data.frame)
## $ Algorithm : chr [1:89] "K-Means" "K-Means" "Random Forest" "Random Forest" ...
## $ Framework : chr [1:89] "Keras" "PyTorch" "Keras" "Keras" ...
## $ Problem_Type : chr [1:89] "Clustering" "Regression" "Classification" "Regression" ...
## $ Dataset_Type : chr [1:89] "Time Series" "Text" "Image" "Text" ...
## $ Accuracy : num [1:89] 0.744 0.975 0.992 0.947 0.826 ...
## $ Precision : num [1:89] 0.49 0.423 0.44 0.49 0.425 ...
## $ Recall : num [1:89] 0.877 0.826 0.716 0.745 NA ...
## $ F1_Score : num [1:89] 0.441 0.477 0.583 0.538 0.535 ...
## $ Training_Time: num [1:89] NA 1.449 4.203 0.194 4.207 ...
## $ Date : POSIXct[1:89], format: "2023-03-09 11:26:21" "2023-03-25 11:26:21" ...
Description: in total there are 89 tests where Precision was below 0.5. Read as a percentage, these 89 tests had a precision below 50%.
df_filtered <- filter(datos, Algorithm == "SVM", Dataset_Type == "Image", Accuracy > 0.7)
str(df_filtered)
Description: this filter keeps the tests where the SVM algorithm was used on Image data with an Accuracy above 0.7 on the test set.
df_filtered <- datos %>%
filter(Problem_Type == "Clustering",
Framework == "Keras",
Precision > 0.6,
Accuracy > 0.8)
str(df_filtered)
## tibble [15 × 10] (S3: tbl_df/tbl/data.frame)
## $ Algorithm : chr [1:15] "SVM" "SVM" "K-Means" "K-Means" ...
## $ Framework : chr [1:15] "Keras" "Keras" "Keras" "Keras" ...
## $ Problem_Type : chr [1:15] "Clustering" "Clustering" "Clustering" "Clustering" ...
## $ Dataset_Type : chr [1:15] "Text" "Image" "Text" "Tabular" ...
## $ Accuracy : num [1:15] 0.842 0.847 0.84 0.809 8.532 ...
## $ Precision : num [1:15] 0.842 0.872 0.663 0.623 0.837 ...
## $ Recall : num [1:15] 0.875 0.38 0.558 0.538 NA ...
## $ F1_Score : num [1:15] 0.704 0.491 0.569 NA 0.982 ...
## $ Training_Time: num [1:15] 4.042 4.714 3.485 1.224 0.407 ...
## $ Date : POSIXct[1:15], format: "2023-03-11 11:26:21" "2023-03-19 11:26:21" ...
Description: in total there are 15 tests where the problem type was Clustering, the framework was Keras, Precision was above 0.6, and Accuracy was above 0.8.
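The same dplyr verbs can be chained to summarize a metric by group rather than only filtering rows. As a minimal sketch (the grouping choices here are illustrative), the mean Training_Time per Framework for one subset would be computed like this:
library(dplyr)
# Mean training time per framework for one algorithm/problem pair;
# na.rm = TRUE because Training_Time contains NA's
datos %>%
  filter(Algorithm == "Neural Network", Problem_Type == "Classification") %>%
  group_by(Framework) %>%
  summarise(mean_time = mean(Training_Time, na.rm = TRUE), n = n()) %>%
  arrange(mean_time)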
We will analyze each numeric variable, looking at its location, dispersion, and distribution measures together with its plots. Once each analysis is done, the missing data can be imputed, since that step requires the measures computed here.
This is the summary of the Accuracy variable:
library(pander)
pander(summary(datos$Accuracy))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
|---|---|---|---|---|---|---|
| 0.5038 | 0.6236 | 0.7578 | 0.8779 | 0.8824 | 9.718 | 39 |
num_NA <- sum(is.na(datos$Accuracy))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 6.964286
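The same check can be run for every column at once, which avoids repeating this computation for each variable (a sketch):
# NA count and percentage per column
colSums(is.na(datos))
round(100 * colSums(is.na(datos)) / nrow(datos), 2)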
-Location measures:
This variable has a mean of 0.8779 and a median of 0.7578.
-Normality test:
library(nortest)
ad_test <- ad.test(datos$Accuracy)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos$Accuracy
## A = 131.83, p-value < 2.2e-16
This numeric variable does not follow a normal distribution: the Anderson-Darling test yields an extremely small p-value, which is strong evidence against the null hypothesis of normality. The plot drawn below will also show that the variable is not normally distributed.
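A normal Q-Q plot gives a quick visual confirmation of the same conclusion, using only base R:
# Points that bend away from the reference line indicate
# departure from normality; NA's are dropped beforehand
qqnorm(na.omit(datos$Accuracy), main = "Q-Q plot: Accuracy")
qqline(na.omit(datos$Accuracy), col = "red")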
-Coefficient of variation:
media<-mean(datos$Accuracy, na.rm = TRUE)
sd_a<-sd(datos$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 107.4343
A coefficient above 100%, as in this case, indicates high dispersion of the data relative to their mean.
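Since this computation is repeated for every numeric variable below, a small helper can produce all the coefficients at once (a sketch; cv is a name introduced here, not used elsewhere in the document):
# Coefficient of variation (%), ignoring NA's
cv <- function(x) 100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
# Apply it to every numeric column
sapply(Filter(is.numeric, datos), cv)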
-Skewness:
media <- mean(datos$Accuracy, na.rm = TRUE)
mediana <- median(datos$Accuracy, na.rm = TRUE)
desviacion <- sd(datos$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.38177
Since the value of asimetria_pearson is positive, the distribution is skewed to the right.
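Pearson's coefficient is only one of several skewness measures; if the e1071 package is available (an assumption, it is not loaded elsewhere in this document), the moment-based version can be compared against it:
library(e1071)  # assumed to be installed
# Moment-based sample skewness; a positive value also indicates
# a right-skewed distribution
skewness(datos$Accuracy, na.rm = TRUE)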
PLOT
We can see that this variable has 39 NA's, 6.96% of the total, so we can use listwise deletion to treat it. We will therefore plot the histograms before and after the treatment to check that dropping those rows changes nothing (note that na.omit removes every row containing an NA in any column, which is why 448 rows remain below).
library(dplyr)
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)
dim(datos_omit)
## [1] 448 10
ggp1 <- ggplot(data.frame(value=datos$Accuracy), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Base de datos original") +
xlab("Accuracy") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp2 <- ggplot(data.frame(value=datos_omit$Accuracy), aes(x=value)) +
geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
ggtitle("Después del tratamiento") +
xlab("Accuracy") + ylab("Frecuencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
grid.arrange(ggp1, ggp2, ncol = 2)
ks_test <- ks.test(datos$Accuracy, datos_omit$Accuracy)
print(ks_test)
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: datos$Accuracy and datos_omit$Accuracy
## D = 0.022159, p-value = 0.9998
## alternative hypothesis: two-sided
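The histogram-plus-KS comparison is repeated below for each metric, so a small wrapper avoids duplicating the code (a sketch; compare_na_effect is a name introduced here):
library(ggplot2)
library(gridExtra)
# Compare a variable before and after listwise deletion:
# side-by-side histograms plus a two-sample KS test
compare_na_effect <- function(df, var) {
  df_omit <- na.omit(df)
  p1 <- ggplot(df, aes(x = .data[[var]])) +
    geom_histogram(fill = "#FD0000", alpha = 0.9) +
    ggtitle("Original data") + xlab(var)
  p2 <- ggplot(df_omit, aes(x = .data[[var]])) +
    geom_histogram(fill = "#43B047", alpha = 0.9) +
    ggtitle("After treatment") + xlab(var)
  grid.arrange(p1, p2, ncol = 2)
  ks.test(df[[var]], df_omit[[var]])  # NA's are silently omitted
}
# Usage: compare_na_effect(datos, "Precision")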
This is the summary of the Precision variable:
library(pander)
pander(summary(datos$Precision))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
|---|---|---|---|---|---|---|
| 0.4019 | 0.5632 | 0.7195 | 0.8129 | 0.8596 | 9.732 | 19 |
num_NA <- sum(is.na(datos$Precision))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 3.392857
-Location measures:
This variable has a mean of 0.8129 and a median of 0.7195.
-Normality test:
library(nortest)
ad_test <- ad.test(datos$Precision)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos$Precision
## A = 118.96, p-value < 2.2e-16
This numeric variable does not follow a normal distribution: the Anderson-Darling test yields an extremely small p-value, which is strong evidence against the null hypothesis of normality. The plot drawn below will also show that the variable is not normally distributed.
-Coefficient of variation:
media<-mean(datos$Precision, na.rm = TRUE)
sd_a<-sd(datos$Precision, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 104.7427
A coefficient above 100%, as in this case, indicates high dispersion of the data relative to their mean.
-Skewness:
media <- mean(datos$Precision, na.rm = TRUE)
mediana <- median(datos$Precision, na.rm = TRUE)
desviacion <- sd(datos$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3292674
Since the value of asimetria_pearson is positive, the distribution is skewed to the right.
PLOT
This variable has 19 NA's, 3.39% of the total, so we can use listwise deletion to treat it.
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)
ggp1 <- ggplot(data.frame(value=datos$Precision), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Base de datos original") +
xlab("Precision") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp2 <- ggplot(data.frame(value=datos_omit$Precision), aes(x=value)) +
geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
ggtitle("Después del tratamiento") +
xlab("Precisión") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
grid.arrange(ggp1, ggp2, ncol = 2)
ks_test <- ks.test(datos$Precision, datos_omit$Precision)
print(ks_test)
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: datos$Precision and datos_omit$Precision
## D = 0.019503, p-value = 1
## alternative hypothesis: two-sided
This is the summary of the Recall variable:
library(pander)
pander(summary(datos$Recall))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
|---|---|---|---|---|---|---|
| 0.3001 | 0.4819 | 0.6493 | 0.7486 | 0.8404 | 9.366 | 20 |
num_NA <- sum(is.na(datos$Recall))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 3.571429
-Location measures:
This variable has a mean of 0.7486 and a median of 0.6493.
-Normality test:
library(nortest)
ad_test <- ad.test(datos$Recall)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos$Recall
## A = 98.028, p-value < 2.2e-16
This numeric variable does not follow a normal distribution: the Anderson-Darling test yields an extremely small p-value, which is strong evidence against the null hypothesis of normality. The plot drawn below will also show that the variable is not normally distributed.
-Coefficient of variation:
media<-mean(datos$Recall, na.rm = TRUE)
sd_a<-sd(datos$Recall, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 104.7911
A coefficient above 100%, as in this case, indicates high dispersion of the data relative to their mean.
-Skewness:
media <- mean(datos$Recall, na.rm = TRUE)
mediana <- median(datos$Recall, na.rm = TRUE)
desviacion <- sd(datos$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3797481
Since the value of asimetria_pearson is positive, the distribution is skewed to the right.
PLOT
This variable has 20 NA's, 3.57% of the total, so we can use listwise deletion to treat it.
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)
ggp1 <- ggplot(data.frame(value=datos$Recall), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Base de datos original") +
xlab("Recall") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp2 <- ggplot(data.frame(value=datos_omit$Recall), aes(x=value)) +
geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
ggtitle("Después del tratamiento") +
xlab("Recall") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
grid.arrange(ggp1, ggp2, ncol = 2)
ks_test <- ks.test(datos$Recall, datos_omit$Recall)
print(ks_test)
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: datos$Recall and datos_omit$Recall
## D = 0.015873, p-value = 1
## alternative hypothesis: two-sided
This is the summary of the F1_Score variable:
library(pander)
pander(summary(datos$F1_Score))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
|---|---|---|---|---|---|---|
| 0.4 | 0.5515 | 0.7086 | 0.8122 | 0.8438 | 9.374 | 20 |
num_NA <- sum(is.na(datos$F1_Score))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 3.571429
-Location measures:
This variable has a mean of 0.8122 and a median of 0.7086.
-Normality test:
library(nortest)
ad_test <- ad.test(datos$F1_Score)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos$F1_Score
## A = 122.21, p-value < 2.2e-16
This numeric variable does not follow a normal distribution: the Anderson-Darling test yields an extremely small p-value, which is strong evidence against the null hypothesis of normality. The plot drawn below will also show that the variable is not normally distributed.
-Coefficient of variation:
media<-mean(datos$F1_Score, na.rm = TRUE)
sd_a<-sd(datos$F1_Score, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 109.9297
A coefficient above 100%, as in this case, indicates high dispersion of the data relative to their mean.
-Skewness:
media <- mean(datos$F1_Score, na.rm = TRUE)
mediana <- median(datos$F1_Score, na.rm = TRUE)
desviacion <- sd(datos$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3479023
Since the value of asimetria_pearson is positive, the distribution is skewed to the right.
PLOT
This variable has 20 NA's, 3.57% of the total, so we can use listwise deletion to treat it.
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)
ggp1 <- ggplot(data.frame(value=datos$F1_Score), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Base de datos original") +
xlab("F1_Score") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp2 <- ggplot(data.frame(value=datos_omit$F1_Score), aes(x=value)) +
geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
ggtitle("Después del tratamiento") +
xlab("F1_Score") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
grid.arrange(ggp1, ggp2, ncol = 2)
ks_test <- ks.test(datos$F1_Score, datos_omit$F1_Score)
print(ks_test)
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: datos$F1_Score and datos_omit$F1_Score
## D = 0.021098, p-value = 0.9999
## alternative hypothesis: two-sided
This is the summary of the Training_Time variable:
library(pander)
pander(summary(datos$Training_Time))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
|---|---|---|---|---|---|---|
| 0.1032 | 1.244 | 2.435 | 2.991 | 3.813 | 46.99 | 20 |
num_NA <- sum(is.na(datos$Training_Time))
porcentaje_na<-(100*num_NA)/560
porcentaje_na
## [1] 3.571429
-Location measures:
This variable has a mean of 2.991 and a median of 2.435.
-Normality test:
library(nortest)
ad_test <- ad.test(datos$Training_Time)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos$Training_Time
## A = 80.983, p-value < 2.2e-16
This numeric variable does not follow a normal distribution: the Anderson-Darling test yields an extremely small p-value, which is strong evidence against the null hypothesis of normality. The plot drawn below will also show that the variable is not normally distributed.
-Coefficient of variation:
media<-mean(datos$Training_Time, na.rm = TRUE)
sd_a<-sd(datos$Training_Time, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 147.9032
A coefficient above 100%, as in this case, indicates high dispersion of the data relative to their mean.
-Skewness:
media <- mean(datos$Training_Time, na.rm = TRUE)
mediana <- median(datos$Training_Time, na.rm = TRUE)
desviacion <- sd(datos$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3772653
Since the value of asimetria_pearson is positive, the distribution is skewed to the right.
PLOT
We will use listwise deletion to treat this variable, checking with the Kolmogorov-Smirnov (KS) test whether the distributions before and after the treatment match. If they do, the NA's can be handled by dropping those rows.
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)
ggp1 <- ggplot(data.frame(value=datos$Training_Time), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Base de datos original") +
xlab("Training_Time") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp2 <- ggplot(data.frame(value=datos_omit$Training_Time), aes(x=value)) +
geom_histogram(fill="#43B047", color="#049DCB", alpha=0.9) +
ggtitle("Después del tratamiento") +
xlab("Training_Time") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
grid.arrange(ggp1, ggp2, ncol = 2)
ks_test <- ks.test(datos$Training_Time, datos_omit$Training_Time)
print(ks_test)
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: datos$Training_Time and datos_omit$Training_Time
## D = 0.013938, p-value = 1
## alternative hypothesis: two-sided
All the numeric variables behave similarly in their location, dispersion, and distribution measures, and all their plots are skewed to the right. Moreover, when the Kolmogorov-Smirnov (KS) test is used to compare each original distribution with its NA-free counterpart, the p-values are well above 0.05 and the D statistics are close to 0, so we conclude that the distributions are statistically indistinguishable.
Thanks to the previous analysis, which compared the plots before and after listwise deletion for each variable, we can conclude that removing these NA's does not affect the distributions. Nevertheless, we now inspect each incomplete case individually to verify that the missingness is unrelated to the data themselves, i.e. that the NA's are of the MCAR (Missing Completely At Random) type.
Removal of the NA's of each variable
datos_sin_na <- na.omit(datos)
Visualization
filas_na <- datos[!complete.cases(datos), ]
library(knitr)
library(kableExtra)
kable(filas_na, caption="Filas con NA's") %>%
kable_styling(full_width=F) %>%
column_spec(2, width="20em") %>%
scroll_box(width="900px", height="450px")
| Algorithm | Framework | Problem_Type | Dataset_Type | Accuracy | Precision | Recall | F1_Score | Training_Time | Date |
|---|---|---|---|---|---|---|---|---|---|
| SVM | Scikit-learn | Regression | Time Series | 0.6618051 | 0.6929447 | NA | 0.4426950 | 4.9785924 | 2023-03-08 11:26:21 |
| K-Means | Keras | Clustering | Time Series | 0.7443216 | 0.4900292 | 0.8766533 | 0.4414046 | NA | 2023-03-09 11:26:21 |
| Neural Network | PyTorch | Regression | Text | 0.9985623 | 0.6366858 | 0.3357948 | 0.9014956 | NA | 2023-03-14 11:26:21 |
| SVM | Keras | Regression | Time Series | NA | 0.8710099 | 0.3416673 | 0.8161708 | 3.4064529 | 2023-03-16 11:26:21 |
| Random Forest | Keras | Regression | Text | 0.5818119 | 0.9352508 | NA | 0.8626737 | 3.4199049 | 2023-03-17 11:26:21 |
| Neural Network | PyTorch | Regression | Text | NA | 0.5528024 | 0.3847175 | 0.6551369 | 3.5159654 | 2023-03-23 11:26:21 |
| K-Means | TensorFlow | Regression | Time Series | NA | 0.7073332 | 0.7288014 | 0.8376069 | 3.0875174 | 2023-04-06 11:26:21 |
| Neural Network | Scikit-learn | Classification | Text | 0.8258334 | 0.4250037 | NA | 0.5345761 | 4.2069594 | 2023-04-08 11:26:21 |
| Random Forest | Scikit-learn | Regression | Image | NA | 0.8297940 | 0.9317173 | 8.4513780 | 4.8085087 | 2023-04-10 11:26:21 |
| K-Means | TensorFlow | Classification | Image | 5.9276276 | NA | 0.3994960 | 0.5817585 | 2.0692473 | 2023-04-13 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Image | 0.8882984 | 0.8425050 | 0.5954240 | 0.8275728 | NA | 2023-04-15 11:26:21 |
| SVM | Scikit-learn | Clustering | Time Series | 0.8805120 | 0.6098219 | 0.7040399 | NA | 2.9576349 | 2023-04-18 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Tabular | 0.8131102 | 0.8647921 | NA | 0.9411641 | 3.0049169 | 2023-04-19 11:26:21 |
| K-Means | Keras | Clustering | Text | NA | 0.5111173 | 0.6910497 | 0.9909150 | 0.3547478 | 2023-04-21 11:26:21 |
| Random Forest | Scikit-learn | Regression | Image | 0.7407612 | 0.8586236 | 0.8919231 | 0.7966086 | NA | 2023-04-25 11:26:21 |
| SVM | PyTorch | Regression | Time Series | 0.7457973 | 0.8615339 | 0.8522896 | NA | 4.4418578 | 2023-05-01 11:26:21 |
| K-Means | Keras | Classification | Image | 0.5321045 | 0.7747981 | NA | 0.9713331 | 1.0636026 | 2023-05-02 11:26:21 |
| Random Forest | TensorFlow | Classification | Image | 0.6210225 | 0.4768574 | NA | 0.8373886 | 0.5170071 | 2023-05-20 11:26:21 |
| Neural Network | PyTorch | Regression | Tabular | 0.9933313 | 0.6029605 | 0.3178134 | 0.9170145 | NA | 2023-05-21 11:26:21 |
| K-Means | Keras | Clustering | Tabular | 0.8090779 | 0.6233002 | 0.5384229 | NA | 1.2238923 | 2023-05-24 11:26:21 |
| SVM | Scikit-learn | Clustering | Time Series | 0.6753135 | 0.5184823 | 0.4525248 | NA | 3.8205333 | 2023-05-31 11:26:21 |
| Random Forest | TensorFlow | Classification | Time Series | 0.7837704 | 0.9168220 | NA | 0.8717234 | 1.6986442 | 2023-06-05 11:26:21 |
| Random Forest | Keras | Clustering | Text | 8.5323786 | 0.8370944 | NA | 0.9821543 | 0.4065677 | 2023-06-12 11:26:21 |
| Random Forest | TensorFlow | Regression | Time Series | 0.9354846 | 0.9624328 | NA | 0.9802212 | NA | 2023-06-18 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | NA | 0.8355879 | 0.9218826 | 0.9175843 | 2.8607010 | 2023-06-22 11:26:21 |
| SVM | Scikit-learn | Clustering | Text | 0.8657948 | 0.7050163 | 0.5382710 | 5.8587272 | NA | 2023-06-30 11:26:21 |
| K-Means | TensorFlow | Clustering | Image | NA | 0.5785291 | 0.6789853 | 0.5740273 | 0.5031344 | 2023-07-01 11:26:21 |
| Neural Network | Scikit-learn | Classification | Text | 0.5301760 | 0.7390132 | 0.4079015 | NA | 2.3519619 | 2023-07-02 11:26:21 |
| K-Means | PyTorch | Classification | Image | 0.8293539 | NA | NA | 0.8044120 | 3.9075415 | 2023-07-11 11:26:21 |
| Random Forest | PyTorch | Regression | Tabular | 0.7490979 | NA | 0.5397136 | 0.4311015 | 4.3247946 | 2023-07-12 11:26:21 |
| Neural Network | TensorFlow | Clustering | Time Series | 0.7776818 | 0.4595069 | 4.8917338 | NA | 1.6379899 | 2023-07-13 11:26:21 |
| K-Means | PyTorch | Classification | Image | NA | 0.8800426 | 0.6903962 | 0.5840660 | 4.2165296 | 2023-07-15 11:26:21 |
| Neural Network | PyTorch | Classification | Tabular | NA | 0.7325359 | 0.9956939 | 0.5714550 | 2.4675862 | 2023-07-23 11:26:21 |
| K-Means | Scikit-learn | Clustering | Image | NA | 0.9748441 | 0.5765964 | 0.9666691 | 2.3245506 | 2023-07-29 11:26:21 |
| SVM | TensorFlow | Regression | Tabular | NA | 0.5961590 | 0.6328822 | 0.8028875 | 0.7174099 | 2023-07-31 11:26:21 |
| SVM | Scikit-learn | Clustering | Image | NA | 0.5833625 | 0.4594248 | 0.5193953 | 4.7620796 | 2023-08-02 11:26:21 |
| Neural Network | Scikit-learn | Classification | Time Series | NA | 0.4019310 | 0.9139634 | 0.9824059 | 28.9729934 | 2023-08-13 11:26:21 |
| SVM | PyTorch | Classification | Time Series | 0.8157801 | 0.6132958 | 0.4041572 | 0.6421606 | NA | 2023-08-15 11:26:21 |
| K-Means | Scikit-learn | Classification | Image | NA | 0.4557944 | 0.9866912 | 0.8227327 | 0.9959051 | 2023-08-17 11:26:21 |
| K-Means | TensorFlow | Classification | Image | NA | 0.7529214 | 0.6383852 | 0.6536372 | 4.1459837 | 2023-08-18 11:26:21 |
| SVM | Keras | Regression | Time Series | 0.9856975 | NA | 0.5000485 | 0.5231998 | 2.8991763 | 2023-08-22 11:26:21 |
| Random Forest | Keras | Clustering | Tabular | 0.6796167 | 0.6989534 | 0.9865175 | 0.4453502 | NA | 2023-08-30 11:26:21 |
| SVM | Scikit-learn | Classification | Text | 0.7964754 | NA | 0.5853089 | 0.9708539 | 0.9581034 | 2023-08-31 11:26:21 |
| SVM | TensorFlow | Clustering | Image | 0.5817619 | 0.4351306 | 0.8792632 | 0.5783745 | NA | 2023-09-01 11:26:21 |
| SVM | Scikit-learn | Clustering | Text | 0.9847062 | 0.8709382 | NA | 0.7594268 | 1.1692169 | 2023-09-03 11:26:21 |
| Neural Network | Keras | Clustering | Tabular | NA | 0.8246086 | 0.9692330 | 0.7741893 | 4.3829517 | 2023-09-04 11:26:21 |
| Random Forest | Keras | Clustering | Tabular | NA | 0.6604116 | 0.9251631 | 0.8056158 | 3.1739118 | 2023-09-10 11:26:21 |
| SVM | TensorFlow | Regression | Image | NA | 0.5323672 | 0.9184253 | 0.7660045 | 0.8450380 | 2023-09-12 11:26:21 |
| SVM | Scikit-learn | Regression | Text | 0.7598870 | 0.8413979 | NA | 0.5626578 | 1.8209750 | 2023-09-14 11:26:21 |
| Neural Network | PyTorch | Clustering | Text | 0.9144417 | 0.9598680 | 0.9484678 | NA | 2.4820394 | 2023-09-16 11:26:21 |
| SVM | PyTorch | Regression | Tabular | NA | 0.7855391 | 0.3133813 | 0.9680402 | 4.6116243 | 2023-09-17 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Tabular | 0.5559598 | 0.8990182 | NA | 0.9745486 | 2.0495419 | 2023-10-02 11:26:21 |
| SVM | PyTorch | Regression | Image | 0.5610550 | 0.6577941 | 0.3999952 | 0.4611359 | NA | 2023-10-08 11:26:21 |
| SVM | TensorFlow | Classification | Time Series | NA | 0.6897813 | 0.7295229 | 0.6604126 | 0.2710666 | 2023-10-14 11:26:21 |
| K-Means | Scikit-learn | Regression | Tabular | NA | 0.8367636 | 0.3543545 | 0.8340687 | 4.2119839 | 2023-10-17 11:26:21 |
| Random Forest | TensorFlow | Clustering | Tabular | 0.8195600 | 0.6621104 | NA | 0.7536724 | 0.2223611 | 2023-10-19 11:26:21 |
| Random Forest | PyTorch | Classification | Time Series | 0.8334321 | 0.4812124 | 0.9633816 | NA | 0.6468057 | 2023-10-24 11:26:21 |
| K-Means | Keras | Classification | Time Series | 0.6009267 | NA | 0.3538035 | 0.5490258 | 4.0270843 | 2023-10-29 11:26:21 |
| SVM | TensorFlow | Regression | Image | NA | 0.5481873 | 0.7949605 | 0.5485359 | 0.7148829 | 2023-11-01 11:26:21 |
| SVM | PyTorch | Classification | Text | NA | 0.8660263 | 0.4967031 | 0.4486895 | 1.9978803 | 2023-11-07 11:26:21 |
| K-Means | TensorFlow | Clustering | Text | 0.9654743 | NA | 0.7483332 | 0.7700702 | 0.2475090 | 2023-11-15 11:26:21 |
| K-Means | Scikit-learn | Classification | Time Series | 0.8256165 | 0.7361009 | 0.9220614 | 0.9524600 | NA | 2023-11-21 11:26:21 |
| SVM | Scikit-learn | Regression | Text | 0.9392578 | 0.6783595 | NA | 0.8232765 | 3.5606062 | 2023-12-02 11:26:21 |
| K-Means | Keras | Regression | Image | NA | 0.4688613 | 0.5223108 | 0.7015787 | 1.0100921 | 2023-12-13 11:26:21 |
| SVM | Keras | Clustering | Time Series | 0.6980863 | 0.4406010 | NA | 0.9562664 | 0.4028615 | 2023-12-16 11:26:21 |
| Random Forest | TensorFlow | Clustering | Image | NA | 0.8247011 | 0.4933187 | 0.4632359 | 0.5525666 | 2023-12-17 11:26:21 |
| Neural Network | PyTorch | Classification | Image | NA | 0.6749804 | 0.8875369 | 0.7931042 | 0.8373141 | 2023-12-20 11:26:21 |
| Neural Network | Keras | Classification | Tabular | 0.6866259 | NA | 0.3026739 | 0.5561420 | 4.8435743 | 2023-12-21 11:26:21 |
| Random Forest | Keras | Classification | Text | 0.8097452 | NA | 0.5521637 | 0.7985316 | 3.3433768 | 2023-12-26 11:26:21 |
| Random Forest | Keras | Clustering | Text | 0.9316668 | 0.7272591 | 0.5193435 | NA | NA | 2023-12-29 11:26:21 |
| Neural Network | PyTorch | Regression | Image | 0.7395909 | 0.7075016 | 0.9243105 | 0.5735125 | NA | 2023-12-31 11:26:21 |
| Neural Network | TensorFlow | Clustering | Image | NA | 0.4311739 | 0.5641226 | 0.7090034 | 0.1337020 | 2024-01-03 11:26:21 |
| Random Forest | PyTorch | Clustering | Time Series | 0.9777618 | NA | 0.3030543 | 0.9716890 | 1.6380026 | 2024-01-11 11:26:21 |
| SVM | PyTorch | Classification | Image | 9.4901527 | 0.6707436 | 0.8327265 | 0.9257917 | NA | 2024-01-16 11:26:21 |
| K-Means | Keras | Classification | Image | NA | 0.9909938 | 0.9655409 | 0.4615408 | 2.2067923 | 2024-01-17 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Tabular | 0.6536450 | 0.7940681 | 0.6502808 | NA | 2.2258352 | 2024-01-26 11:26:21 |
| Random Forest | Scikit-learn | Classification | Time Series | NA | 0.7807495 | 0.8952436 | 0.4011953 | 2.8763658 | 2024-02-05 11:26:21 |
| K-Means | Keras | Classification | Image | NA | 0.7198173 | 0.9161097 | 0.4968177 | 1.7012941 | 2024-02-09 11:26:21 |
| SVM | PyTorch | Clustering | Text | 0.7417728 | NA | 0.5067110 | 0.9345133 | 0.6118271 | 2024-02-13 11:26:21 |
| K-Means | Scikit-learn | Clustering | Tabular | 0.6767107 | 0.8821621 | 0.4873149 | NA | 1.5521523 | 2024-02-28 11:26:21 |
| Random Forest | TensorFlow | Classification | Text | NA | 0.6564938 | 0.5830028 | 0.7788514 | 0.3585498 | 2024-03-08 11:26:21 |
| Neural Network | TensorFlow | Regression | Time Series | NA | 0.6755591 | 0.3841451 | 0.6845530 | 2.3867963 | 2024-03-25 11:26:21 |
| SVM | Scikit-learn | Classification | Tabular | 0.8621694 | 0.4603825 | 0.3009475 | NA | 2.3497077 | 2024-04-02 11:26:21 |
| K-Means | Scikit-learn | Regression | Image | 0.6370803 | 0.4851832 | 0.7525211 | NA | NA | 2024-04-04 11:26:21 |
| K-Means | Scikit-learn | Clustering | Image | 0.9470954 | 0.8492958 | 0.3165165 | 0.8749349 | NA | 2024-04-07 11:26:21 |
| K-Means | Keras | Classification | Image | 0.9001783 | 0.4757588 | 0.4597648 | NA | 2.8547252 | 2024-04-15 11:26:21 |
| Random Forest | Keras | Classification | Time Series | 0.7664789 | NA | 0.3404911 | 0.6336430 | 3.1026996 | 2024-04-20 11:26:21 |
| SVM | Scikit-learn | Clustering | Image | NA | 0.7857291 | 0.6811376 | 0.4444633 | 2.7982506 | 2024-04-25 11:26:21 |
| K-Means | Scikit-learn | Clustering | Text | 0.5300712 | NA | 0.4577669 | 0.9983094 | 3.0979720 | 2024-04-29 11:26:21 |
| Random Forest | Scikit-learn | Clustering | Image | NA | 0.9629829 | 0.8469253 | 0.6103123 | 1.8389986 | 2024-05-03 11:26:21 |
| Neural Network | Scikit-learn | Clustering | Image | NA | 0.7989909 | 0.7451683 | 0.6785431 | 0.4435299 | 2024-05-07 11:26:21 |
| K-Means | Scikit-learn | Regression | Text | 0.8780817 | 0.5561721 | 0.5717160 | NA | 2.0367706 | 2024-05-08 11:26:21 |
| SVM | Scikit-learn | Classification | Text | 0.9210166 | NA | 0.3054891 | 0.4398780 | 3.6842832 | 2024-05-19 11:26:21 |
| Neural Network | PyTorch | Classification | Tabular | NA | 0.4252148 | 0.6133714 | 0.7502282 | 1.1801854 | 2024-05-21 11:26:21 |
| K-Means | TensorFlow | Regression | Image | 0.5638567 | NA | 0.6707618 | NA | 4.5904607 | 2024-05-24 11:26:21 |
| Random Forest | PyTorch | Clustering | Tabular | 0.9684789 | 0.5828507 | 0.6029712 | NA | 4.1399104 | 2024-06-02 11:26:21 |
| Neural Network | Keras | Regression | Text | NA | 0.9113583 | 0.4816792 | 0.7042358 | 4.8939385 | 2024-06-20 11:26:21 |
| Random Forest | PyTorch | Clustering | Tabular | 0.5662623 | 0.9134177 | 0.6650217 | 0.9634411 | NA | 2024-07-01 11:26:21 |
| Neural Network | Keras | Clustering | Tabular | 0.9310072 | NA | 0.9841155 | 0.7818246 | NA | 2024-07-02 11:26:21 |
| Random Forest | Keras | Classification | Image | 0.5129060 | 0.7104678 | NA | 0.8565108 | 2.1442478 | 2024-07-13 11:26:21 |
| K-Means | TensorFlow | Clustering | Tabular | 0.8251005 | 0.7074183 | 0.3204188 | NA | 1.2456829 | 2024-07-26 11:26:21 |
| Random Forest | TensorFlow | Classification | Time Series | 0.5711247 | 0.5526349 | NA | 0.4403535 | 2.9123543 | 2024-08-06 11:26:21 |
| SVM | TensorFlow | Regression | Image | 0.8460807 | 0.4840145 | 0.7039858 | NA | 0.1553824 | 2024-08-10 11:26:21 |
| SVM | TensorFlow | Clustering | Time Series | 0.8076097 | 0.5176586 | NA | 0.4242105 | 2.0486536 | 2024-08-15 11:26:21 |
| Random Forest | Scikit-learn | Classification | Time Series | 0.6531268 | NA | 0.7596418 | 0.6467149 | 4.8716220 | 2024-08-16 11:26:21 |
| Neural Network | Keras | Regression | Time Series | NA | 0.5314413 | 0.4545969 | 0.5876668 | 1.9361377 | 2024-08-22 11:26:21 |
| SVM | PyTorch | Clustering | Text | 0.9137689 | 0.8815514 | 0.9067392 | 0.8586120 | NA | 2024-08-26 11:26:21 |
| Random Forest | Keras | Clustering | Text | 0.8846524 | NA | 0.9653214 | 0.8573816 | 1.1991609 | 2024-08-28 11:26:21 |
| K-Means | Scikit-learn | Classification | Text | 0.8801449 | 0.4683030 | 0.4977472 | NA | 1.7535260 | 2024-09-03 11:26:21 |
| SVM | Keras | Regression | Time Series | NA | 0.6961279 | 0.9247908 | 0.8540381 | 3.7870949 | 2024-09-07 11:26:21 |
| Neural Network | Scikit-learn | Regression | Tabular | 0.9007686 | 0.4788935 | NA | 0.6335383 | 2.2663245 | 2024-09-10 11:26:21 |
| Random Forest | TensorFlow | Clustering | Image | 0.7271887 | NA | 0.5320953 | 0.6051432 | 2.9027798 | 2024-09-15 11:26:21 |
Looking at the rows above, the missingness does not appear to be related to the data itself. We cannot know for certain whether these values were deleted at random or whether some records were simply lost; in any case, the gaps show no clear pattern — for example, it is not the case that the missing Accuracy values occur only in rows where Algorithm is SVM.
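As a quick check of this claim, the missing values can be tabulated per category (a minimal sketch, assuming dplyr is loaded; the column names are those from the data dictionary):
library(dplyr)
# Count NAs per numeric column, broken down by Algorithm.
# If missingness were tied to one algorithm, it would stand out here.
datos %>%
  group_by(Algorithm) %>%
  summarise(across(c(Accuracy, Precision, Recall, F1_Score, Training_Time),
                   ~ sum(is.na(.x))))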
We now analyze how the measures of location, dispersion, and distribution change for each variable.
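Throughout this section, dispersion is summarized with the coefficient of variation (CV) and shape with Pearson's skewness coefficient; written out, these restate exactly what the code below computes:

$$CV = \frac{s}{\bar{x}} \times 100, \qquad A_P = \frac{3\,(\bar{x} - \tilde{x})}{s},$$

where $\bar{x}$ is the mean, $\tilde{x}$ the median, and $s$ the standard deviation.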
#### Accuracy
Location:
summary(datos$Accuracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.5038 0.6236 0.7578 0.8779 0.8824 9.7181 39
summary(datos_sin_na$Accuracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5038 0.6183 0.7490 0.8458 0.8698 9.7181
Dispersion:
media<-mean(datos$Accuracy, na.rm = TRUE)
sd_a<-sd(datos$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 107.4343
media<-mean(datos_sin_na$Accuracy, na.rm = TRUE)
sd_a<-sd(datos_sin_na$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 97.14993
Distribution:
media <- mean(datos$Accuracy, na.rm = TRUE)
mediana <- median(datos$Accuracy, na.rm = TRUE)
desviacion <- sd(datos$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.38177
media <- mean(datos_sin_na$Accuracy, na.rm = TRUE)
mediana <- median(datos_sin_na$Accuracy, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3532361
#### Precision
Location:
summary(datos$Precision)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.4019 0.5632 0.7195 0.8129 0.8596 9.7320 19
summary(datos_sin_na$Precision)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4031 0.5661 0.7275 0.8387 0.8670 9.7320
Dispersion:
media <- mean(datos$Precision, na.rm = TRUE)
sd_p <- sd(datos$Precision, na.rm = TRUE)
cv_p <- (sd_p / media) * 100
cv_p
## [1] 104.7427
media <- mean(datos_sin_na$Precision, na.rm = TRUE)
sd_p <- sd(datos_sin_na$Precision, na.rm = TRUE)
cv_p <- (sd_p / media) * 100
cv_p
## [1] 110.986
Distribution:
media <- mean(datos$Precision, na.rm = TRUE)
mediana <- median(datos$Precision, na.rm = TRUE)
desviacion <- sd(datos$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3292674
media <- mean(datos_sin_na$Precision, na.rm = TRUE)
mediana <- median(datos_sin_na$Precision, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.358299
#### Recall
Location:
summary(datos$Recall)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.3001 0.4819 0.6493 0.7486 0.8404 9.3662 20
summary(datos_sin_na$Recall)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3001 0.4898 0.6513 0.7611 0.8365 9.3662
Dispersion:
media <- mean(datos$Recall, na.rm = TRUE)
sd_r <- sd(datos$Recall, na.rm = TRUE)
cv_r <- (sd_r / media) * 100
cv_r
## [1] 104.7911
media <- mean(datos_sin_na$Recall, na.rm = TRUE)
sd_r <- sd(datos_sin_na$Recall, na.rm = TRUE)
cv_r <- (sd_r / media) * 100
cv_r
## [1] 109.2378
Distribution:
media <- mean(datos$Recall, na.rm = TRUE)
mediana <- median(datos$Recall, na.rm = TRUE)
desviacion <- sd(datos$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3797481
media <- mean(datos_sin_na$Recall, na.rm = TRUE)
mediana <- median(datos_sin_na$Recall, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3961938
#### F1_Score
Location:
summary(datos$F1_Score)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.4000 0.5515 0.7086 0.8122 0.8438 9.3740 20
summary(datos_sin_na$F1_Score)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4000 0.5486 0.7031 0.8014 0.8396 9.3740
Dispersion:
media <- mean(datos$F1_Score, na.rm = TRUE)
sd_f1 <- sd(datos$F1_Score, na.rm = TRUE)
cv_f1 <- (sd_f1 / media) * 100
cv_f1
## [1] 109.9297
media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
sd_f1 <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
cv_f1 <- (sd_f1 / media) * 100
cv_f1
## [1] 109.1754
Distribution:
media <- mean(datos$F1_Score, na.rm = TRUE)
mediana <- median(datos$F1_Score, na.rm = TRUE)
desviacion <- sd(datos$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3479023
media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
mediana <- median(datos_sin_na$F1_Score, na.rm = TRUE)
desviacion <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3369211
#### Training_Time
Location:
summary(datos$Training_Time)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.1032 1.2441 2.4347 2.9910 3.8131 46.9856 20
summary(datos_sin_na$Training_Time)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1032 1.2982 2.4809 3.0598 3.8446 46.9856
Dispersion:
media <- mean(datos$Training_Time, na.rm = TRUE)
sd_tt <- sd(datos$Training_Time, na.rm = TRUE)
cv_tt <- (sd_tt / media) * 100
cv_tt
## [1] 147.9032
media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
sd_tt <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
cv_tt <- (sd_tt / media) * 100
cv_tt
## [1] 151.8546
Distribution:
media <- mean(datos$Training_Time, na.rm = TRUE)
mediana <- median(datos$Training_Time, na.rm = TRUE)
desviacion <- sd(datos$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3772653
media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
mediana <- median(datos_sin_na$Training_Time, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.3737476
Conclusion: handling the NA values does not alter the shape of the distributions, and the small shifts in the summary measures do not meaningfully affect the analysis.
For outlier detection we will use box-and-whisker plots together with the Hampel filter. The Hampel filter is preferred over simple percentile rules because it suits variables that do not follow a normal distribution or that have heavy tails, as is the case here, and because it relies on robust statistics: the median and the MAD (median absolute deviation).
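As a minimal sketch of the rule applied repeatedly below (the helper name hampel_bounds is ours, not part of this report's code):
# Hampel rule: values outside median ± 3·MAD are flagged as outliers.
# constant = 1 matches the mad() calls used throughout this report.
hampel_bounds <- function(x) {
  med <- median(x, na.rm = TRUE)
  desv <- 3 * mad(x, na.rm = TRUE, constant = 1)
  c(lower = med - desv, upper = med + desv)
}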
Box-and-whisker plot (Accuracy)
ggplot(datos_sin_na) +
aes(x = "", y = Accuracy) +
geom_boxplot(fill = "#0c4c8a") +
theme_minimal()
Hampel filter
lower_bound <- median(datos_sin_na$Accuracy) - 3 * mad(datos_sin_na$Accuracy, constant = 1)
lower_bound
## [1] 0.3595111
upper_bound <- median(datos_sin_na$Accuracy) + 3 * mad(datos_sin_na$Accuracy, constant = 1)
upper_bound
## [1] 1.13852
outlier_ind <- which(datos_sin_na$Accuracy < lower_bound | datos_sin_na$Accuracy > upper_bound)
outlier_ind
## [1] 15 77 110 112 196 232 239
datos_sin_na[outlier_ind, ]
## # A tibble: 7 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Random Forest PyTorch Regression Text 9.72 0.782 0.548
## 2 Neural Network PyTorch Regression Time Series 5.26 0.506 0.829
## 3 K-Means TensorFlow Regression Tabular 7.13 0.521 0.441
## 4 SVM Scikit-lea… Regression Tabular 5.20 0.489 0.680
## 5 SVM TensorFlow Classificat… Image 8.29 0.798 0.753
## 6 Random Forest PyTorch Clustering Tabular 7.90 0.521 0.363
## 7 SVM Scikit-lea… Regression Tabular 5.98 0.928 0.799
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
To check whether these points really are outliers, we can apply Rosner's test, which is designed for large samples and can test several suspected outliers at once.
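In outline, Rosner's (generalized ESD) test repeatedly removes the most extreme observation and, at step $i$, compares

$$R_{i+1} = \frac{\max_j \left| x_j - \bar{x}^{(i)} \right|}{s^{(i)}}$$

against a critical value $\lambda_{i+1}$ derived from the t distribution, where $\bar{x}^{(i)}$ and $s^{(i)}$ are the mean and standard deviation after removing the $i$ most extreme points. These quantities are exactly the Mean.i, SD.i, R.i+1 and lambda.i+1 columns in the output below.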
library(EnvStats)
##
## Attaching package: 'EnvStats'
## The following objects are masked from 'package:stats':
##
## predict, predict.lm
test <- rosnerTest(datos_sin_na$Accuracy, k = 7)
test$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 0.8457621 0.8216572 9.718080 15 10.79808 3.833870 TRUE
## 2 1 0.8259135 0.7069241 8.294427 196 10.56480 3.833271 TRUE
## 3 2 0.8091679 0.6125670 7.900862 232 11.57701 3.832670 TRUE
## 4 3 0.7932315 0.5124044 7.127467 110 12.36179 3.832068 TRUE
## 5 4 0.7789652 0.4151830 5.978890 239 12.52442 3.831464 TRUE
## 6 5 0.7672273 0.3338475 5.259856 77 13.45713 3.830859 TRUE
## 7 6 0.7570629 0.2565839 5.200546 112 17.31786 3.830252 TRUE
The test confirms that every suspected value is indeed an outlier.
Capping
To treat the outliers we use capping: first we compute the 5th and 95th percentiles; values below the Hampel lower bound obtained above are then replaced with the 5th percentile, and values above the upper bound with the 95th percentile.
caps <- quantile(datos_sin_na$Accuracy, probs=c(.05, .95), na.rm = T)
datos_sin_na$Accuracy[datos_sin_na$Accuracy < lower_bound] <- caps[1]
datos_sin_na$Accuracy[datos_sin_na$Accuracy > upper_bound] <- caps[2]
datos_sin_na[outlier_ind, ]
## # A tibble: 7 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Random Forest PyTorch Regression Text 0.988 0.782 0.548
## 2 Neural Network PyTorch Regression Time Series 0.988 0.506 0.829
## 3 K-Means TensorFlow Regression Tabular 0.988 0.521 0.441
## 4 SVM Scikit-lea… Regression Tabular 0.988 0.489 0.680
## 5 SVM TensorFlow Classificat… Image 0.988 0.798 0.753
## 6 Random Forest PyTorch Clustering Tabular 0.988 0.521 0.363
## 7 SVM Scikit-lea… Regression Tabular 0.988 0.928 0.799
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
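The same identify-and-cap steps are repeated below for Precision, Recall, F1_Score and Training_Time; a reusable helper could condense them (a sketch under the same assumptions, using the hypothetical hampel_bounds defined earlier):
# Cap values outside the Hampel bounds at the 5th/95th percentiles,
# mirroring the per-variable code in this report.
cap_outliers <- function(x) {
  b <- hampel_bounds(x)
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  x[x < b["lower"]] <- caps[1]
  x[x > b["upper"]] <- caps[2]
  x
}
# Usage: datos_sin_na$Accuracy <- cap_outliers(datos_sin_na$Accuracy)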
Boxplot of Accuracy without outliers
ggplot(datos_sin_na, aes(y = Accuracy)) +
geom_boxplot(outlier.colour = "orange", outlier.shape = 16, outlier.size = 2, fill = "skyblue", color = "darkblue") +
labs(title = "Caja de Bigote de Accuracy sin outeliers",
y = "Accuracy") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
Box-and-whisker plot (Precision)
ggplot(datos_sin_na) +
aes(x = "", y = Precision) +
geom_boxplot(fill = "darkred") +
theme_minimal()
Hampel filter
lower_bound <- median(datos_sin_na$Precision) - 3 * mad(datos_sin_na$Precision, constant = 1)
lower_bound
## [1] 0.2777772
upper_bound <- median(datos_sin_na$Precision) + 3 * mad(datos_sin_na$Precision, constant = 1)
upper_bound
## [1] 1.177228
outlier_ind <- which(datos_sin_na$Precision < lower_bound | datos_sin_na$Precision > upper_bound)
outlier_ind
## [1] 6 96 111 157 223 241 250 288 433 439
length(outlier_ind)
## [1] 10
datos_sin_na[outlier_ind, ]
## # A tibble: 10 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 SVM PyTorch Regression Image 0.897 9.73 0.781
## 2 SVM Scikit-le… Classificat… Image 0.591 4.06 0.482
## 3 K-Means TensorFlow Clustering Time Series 0.686 6.21 0.328
## 4 K-Means Scikit-le… Classificat… Time Series 0.603 4.15 7.74
## 5 Neural Network Keras Clustering Text 0.537 9.67 0.819
## 6 SVM TensorFlow Classificat… Image 0.824 5.43 0.976
## 7 Random Forest TensorFlow Regression Text 0.902 8.93 0.603
## 8 SVM PyTorch Clustering Text 0.584 4.08 0.717
## 9 SVM Keras Regression Text 0.506 5.76 0.707
## 10 K-Means PyTorch Regression Tabular 0.768 7.04 0.926
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
To check whether these points really are outliers, we again apply Rosner's test.
library(EnvStats)
test <- rosnerTest(datos_sin_na$Precision, k = 10)
test$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 0.8386718 0.9308086 9.732008 6 9.55442 3.833870 TRUE
## 2 1 0.8187762 0.8310328 9.674189 223 10.65591 3.833271 TRUE
## 3 2 0.7989210 0.7180191 8.932619 250 11.32797 3.832670 TRUE
## 4 3 0.7806431 0.6061150 7.044472 439 10.33439 3.832068 TRUE
## 5 4 0.7665353 0.5286183 6.207645 111 10.29308 3.831464 TRUE
## 6 5 0.7542529 0.4614512 5.760933 433 10.84986 3.830859 TRUE
## 7 6 0.7429256 0.3955383 5.432777 241 11.85688 3.830252 TRUE
## 8 7 0.7322910 0.3266570 4.145151 157 10.44784 3.829643 TRUE
## 9 8 0.7245345 0.2834703 4.075645 288 11.82174 3.829033 TRUE
## 10 9 0.7169010 0.2341822 4.055990 96 14.25851 3.828422 TRUE
The test again confirms that every suspected value is indeed an outlier.
Capping
As before, we cap: values below the Hampel lower bound are replaced with the 5th percentile and values above the upper bound with the 95th percentile.
caps <- quantile(datos_sin_na$Precision, probs=c(.05, .95), na.rm = T)
datos_sin_na$Precision[datos_sin_na$Precision < lower_bound] <- caps[1]
datos_sin_na$Precision[datos_sin_na$Precision > upper_bound] <- caps[2]
datos_sin_na[outlier_ind, ]
## # A tibble: 10 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 SVM PyTorch Regression Image 0.897 0.982 0.781
## 2 SVM Scikit-le… Classificat… Image 0.591 0.982 0.482
## 3 K-Means TensorFlow Clustering Time Series 0.686 0.982 0.328
## 4 K-Means Scikit-le… Classificat… Time Series 0.603 0.982 7.74
## 5 Neural Network Keras Clustering Text 0.537 0.982 0.819
## 6 SVM TensorFlow Classificat… Image 0.824 0.982 0.976
## 7 Random Forest TensorFlow Regression Text 0.902 0.982 0.603
## 8 SVM PyTorch Clustering Text 0.584 0.982 0.717
## 9 SVM Keras Regression Text 0.506 0.982 0.707
## 10 K-Means PyTorch Regression Tabular 0.768 0.982 0.926
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
Boxplot of Precision without outliers
ggplot(datos_sin_na, aes(y = Precision)) +
geom_boxplot(outlier.colour = "red", outlier.shape = 16, outlier.size = 2, fill = "gold", color = "darkorange") +
labs(title = "Caja de Bigote de Precision sin Outeliers",
y = "Precision") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
Box-and-whisker plot (Recall)
ggplot(datos_sin_na) +
aes(x = "", y = Recall) +
geom_boxplot(fill = "#0c4") +
theme_minimal()
Hampel filter
lower_bound <- median(datos_sin_na$Recall) - 3 * mad(datos_sin_na$Recall, constant = 1)
lower_bound
## [1] 0.1155031
upper_bound <- median(datos_sin_na$Recall) + 3 * mad(datos_sin_na$Recall, constant = 1)
upper_bound
## [1] 1.187093
outlier_ind <- which(datos_sin_na$Recall < lower_bound | datos_sin_na$Recall > upper_bound)
outlier_ind
## [1] 4 88 114 157 221 270 303 308 420
length(outlier_ind)
## [1] 9
datos_sin_na[outlier_ind, ]
## # A tibble: 9 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 K-Means PyTorch Regression Image 0.637 0.626 7.45
## 2 K-Means PyTorch Clustering Image 0.822 0.725 5.73
## 3 K-Means Scikit-lea… Clustering Image 0.719 0.998 9.37
## 4 K-Means Scikit-lea… Classificat… Time Series 0.603 0.982 7.74
## 5 Neural Network PyTorch Classificat… Text 0.724 0.449 3.44
## 6 K-Means Keras Clustering Image 0.801 0.521 5.77
## 7 Neural Network TensorFlow Classificat… Text 0.702 0.615 4.86
## 8 Neural Network TensorFlow Clustering Time Series 0.564 0.827 5.44
## 9 Random Forest TensorFlow Clustering Image 0.531 0.566 5.50
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
Once more, Rosner's test is used to verify the suspected outliers.
library(EnvStats)
test <- rosnerTest(datos_sin_na$Recall, k = 9)
test$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 0.7610971 0.8314054 9.366182 114 10.350047 3.833870 TRUE
## 2 1 0.7418463 0.7255257 7.737749 157 9.642529 3.833271 TRUE
## 3 2 0.7261605 0.6460189 7.454810 4 10.415561 3.832670 TRUE
## 4 3 0.7110399 0.5622109 5.765916 270 8.991068 3.832068 TRUE
## 5 4 0.6996551 0.5089064 5.726373 88 9.877490 3.831464 TRUE
## 6 5 0.6883081 0.4497505 5.499848 420 10.698244 3.830859 TRUE
## 7 6 0.6774222 0.3874519 5.436669 308 12.283452 3.830252 TRUE
## 8 7 0.6666303 0.3144283 4.859080 303 13.333562 3.829643 TRUE
## 9 8 0.6571020 0.2428199 3.438827 221 11.455922 3.829033 TRUE
The test confirms that every suspected value is indeed an outlier.
Capping
As before, values outside the Hampel bounds are capped at the 5th and 95th percentiles.
caps <- quantile(datos_sin_na$Recall, probs=c(.05, .95), na.rm = T)
datos_sin_na$Recall[datos_sin_na$Recall < lower_bound] <- caps[1]
datos_sin_na$Recall[datos_sin_na$Recall > upper_bound] <- caps[2]
datos_sin_na[outlier_ind, ]
## # A tibble: 9 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 K-Means PyTorch Regression Image 0.637 0.626 0.976
## 2 K-Means PyTorch Clustering Image 0.822 0.725 0.976
## 3 K-Means Scikit-lea… Clustering Image 0.719 0.998 0.976
## 4 K-Means Scikit-lea… Classificat… Time Series 0.603 0.982 0.976
## 5 Neural Network PyTorch Classificat… Text 0.724 0.449 0.976
## 6 K-Means Keras Clustering Image 0.801 0.521 0.976
## 7 Neural Network TensorFlow Classificat… Text 0.702 0.615 0.976
## 8 Neural Network TensorFlow Clustering Time Series 0.564 0.827 0.976
## 9 Random Forest TensorFlow Clustering Image 0.531 0.566 0.976
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
Boxplot of Recall without outliers
ggplot(datos_sin_na, aes(y = Recall)) +
geom_boxplot(outlier.colour = "blue", outlier.shape = 16, outlier.size = 2, fill = "lightgreen", color = "darkgreen") +
labs(title = "Caja de Bigote de Recall",
y = "Recall") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
Box-and-whisker plot (F1_Score)
ggplot(datos_sin_na) +
aes(x = "", y = F1_Score) +
geom_boxplot(fill = "#0C945689") +
theme_minimal()
Hampel filter
lower_bound <- median(datos_sin_na$F1_Score) - 3 * mad(datos_sin_na$F1_Score, constant = 1)
lower_bound
## [1] 0.2672388
upper_bound <- median(datos_sin_na$F1_Score) + 3 * mad(datos_sin_na$F1_Score, constant = 1)
upper_bound
## [1] 1.139019
outlier_ind <- which(datos_sin_na$F1_Score < lower_bound | datos_sin_na$F1_Score > upper_bound)
outlier_ind
## [1] 160 230 267 281 296 316 333 437
length(outlier_ind)
## [1] 8
datos_sin_na[outlier_ind, ]
## # A tibble: 8 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 SVM TensorFlow Regression Tabular 0.766 0.579 0.817
## 2 Neural Network Scikit-lea… Clustering Image 0.671 0.676 0.937
## 3 SVM PyTorch Clustering Image 0.522 0.566 0.817
## 4 K-Means TensorFlow Regression Tabular 0.637 0.651 0.565
## 5 K-Means PyTorch Classificat… Text 0.773 0.915 0.841
## 6 K-Means TensorFlow Classificat… Image 0.677 0.905 0.621
## 7 Random Forest Scikit-lea… Clustering Time Series 0.588 0.792 0.736
## 8 Neural Network PyTorch Classificat… Text 0.630 0.770 0.831
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
Rosner's test is applied again to verify the suspected outliers.
library(EnvStats)
test <- rosnerTest(datos_sin_na$F1_Score, k = 8)
test$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 0.8013885 0.8749189 9.374049 296 9.798234 3.833870 TRUE
## 2 1 0.7822103 0.7759213 9.295359 316 10.971665 3.833271 TRUE
## 3 2 0.7631225 0.6634602 8.178579 281 11.176942 3.832670 TRUE
## 4 3 0.7464585 0.5630661 7.747684 437 12.434110 3.832068 TRUE
## 5 4 0.7306900 0.4548205 5.499742 230 10.485569 3.831464 TRUE
## 6 5 0.7199247 0.3946604 5.320668 333 11.657473 3.830859 TRUE
## 7 6 0.7095157 0.3286397 5.131244 160 13.454635 3.830252 TRUE
## 8 7 0.6994892 0.2524146 4.632073 267 15.579855 3.829643 TRUE
As before, the test confirms that all suspected values are genuine outliers.
Capping
As before, values outside the Hampel bounds are capped at the 5th and 95th percentiles.
caps <- quantile(datos_sin_na$F1_Score, probs=c(.05, .95), na.rm = T)
datos_sin_na$F1_Score[datos_sin_na$F1_Score < lower_bound] <- caps[1]
datos_sin_na$F1_Score[datos_sin_na$F1_Score > upper_bound] <- caps[2]
datos_sin_na[outlier_ind, ]
## # A tibble: 8 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 SVM TensorFlow Regression Tabular 0.766 0.579 0.817
## 2 Neural Network Scikit-lea… Clustering Image 0.671 0.676 0.937
## 3 SVM PyTorch Clustering Image 0.522 0.566 0.817
## 4 K-Means TensorFlow Regression Tabular 0.637 0.651 0.565
## 5 K-Means PyTorch Classificat… Text 0.773 0.915 0.841
## 6 K-Means TensorFlow Classificat… Image 0.677 0.905 0.621
## 7 Random Forest Scikit-lea… Clustering Time Series 0.588 0.792 0.736
## 8 Neural Network PyTorch Classificat… Text 0.630 0.770 0.831
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
Boxplot of F1_Score without outliers
ggplot(datos_sin_na, aes(y = F1_Score)) +
geom_boxplot(outlier.colour = "purple", outlier.shape = 16, outlier.size = 2, fill = "yellow", color = "orange") +
labs(title = "Caja de Bigote de F1 Score",
y = "F1 Score") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
Box-and-whisker plot (Training_Time)
ggplot(datos_sin_na) +
aes(x = "", y = Training_Time) +
geom_boxplot(fill = "#F39C12") +
theme_minimal()
Hampel filter
lower_bound <- median(datos_sin_na$Training_Time) - 3 * mad(datos_sin_na$Training_Time, constant = 1)
lower_bound
## [1] -1.363486
upper_bound <- median(datos_sin_na$Training_Time) + 3 * mad(datos_sin_na$Training_Time, constant = 1)
upper_bound
## [1] 6.325322
outlier_ind <- which(datos_sin_na$Training_Time < lower_bound | datos_sin_na$Training_Time > upper_bound)
outlier_ind
## [1] 100 109 201 214 217 324 344 417
length(outlier_ind)
## [1] 8
datos_sin_na[outlier_ind, ]
## # A tibble: 8 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Neural Network PyTorch Regression Tabular 0.987 0.733 0.706
## 2 Neural Network Scikit-lea… Classificat… Time Series 0.871 0.847 0.734
## 3 Neural Network PyTorch Clustering Tabular 0.524 0.556 0.737
## 4 Neural Network Keras Clustering Tabular 0.718 0.551 0.394
## 5 Random Forest TensorFlow Classificat… Text 0.772 0.714 0.824
## 6 SVM Scikit-lea… Regression Tabular 0.662 0.913 0.970
## 7 K-Means TensorFlow Regression Tabular 0.996 0.728 0.795
## 8 K-Means PyTorch Regression Time Series 0.561 0.617 0.828
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
Finally, Rosner's test is applied to the suspected Training_Time outliers.
library(EnvStats)
test <- rosnerTest(datos_sin_na$Training_Time, k = 8)
test$all.stats
## i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
## 1 0 3.059781 4.646419 46.98563 324 9.453699 3.833870 TRUE
## 2 1 2.961513 4.159537 46.83874 217 10.548585 3.833271 TRUE
## 3 2 2.863133 3.606191 44.58645 100 11.569913 3.832670 TRUE
## 4 3 2.769373 3.017332 44.35790 201 13.783214 3.832068 TRUE
## 5 4 2.675705 2.282925 28.29499 344 11.222129 3.831464 TRUE
## 6 5 2.617874 1.932676 20.93344 109 9.476788 3.830859 TRUE
## 7 6 2.576436 1.726646 20.25186 417 10.236856 3.830252 TRUE
## 8 7 2.536356 1.508782 13.79138 214 7.459672 3.829643 TRUE
The test confirms that all suspected Training_Time values are genuine outliers.
Capping
As before, values outside the Hampel bounds are capped at the 5th and 95th percentiles.
caps <- quantile(datos_sin_na$Training_Time, probs=c(.05, .95), na.rm = T)
datos_sin_na$Training_Time[datos_sin_na$Training_Time < lower_bound] <- caps[1]
datos_sin_na$Training_Time[datos_sin_na$Training_Time > upper_bound] <- caps[2]
After capping, we re-inspect the previously flagged rows (Training_Time is among the hidden columns):
datos_sin_na[outlier_ind, ]
## # A tibble: 8 × 10
## Algorithm Framework Problem_Type Dataset_Type Accuracy Precision Recall
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Neural Network PyTorch Regression Tabular 0.987 0.733 0.706
## 2 Neural Network Scikit-lea… Classificat… Time Series 0.871 0.847 0.734
## 3 Neural Network PyTorch Clustering Tabular 0.524 0.556 0.737
## 4 Neural Network Keras Clustering Tabular 0.718 0.551 0.394
## 5 Random Forest TensorFlow Classificat… Text 0.772 0.714 0.824
## 6 SVM Scikit-lea… Regression Tabular 0.662 0.913 0.970
## 7 K-Means TensorFlow Regression Tabular 0.996 0.728 0.795
## 8 K-Means PyTorch Regression Time Series 0.561 0.617 0.828
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
Boxplot of Training_Time without outliers
ggplot(datos_sin_na, aes(y = Training_Time)) +
geom_boxplot(outlier.colour = "brown", outlier.shape = 16, outlier.size = 2, fill = "lightcoral", color = "darkred") +
labs(title = "Boxplot of Training Time without outliers",
y = "Training Time") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
Measures of central tendency: Accuracy
library(pander)
pander(summary(datos_sin_na$Accuracy))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.5038 | 0.6183 | 0.749 | 0.7507 | 0.8698 | 0.9997 |
Coefficient of variation
media<-mean(datos_sin_na$Accuracy, na.rm = TRUE)
sd_a<-sd(datos_sin_na$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 19.56421
Skewness (Pearson)
media <- mean(datos_sin_na$Accuracy, na.rm = TRUE)
mediana <- median(datos_sin_na$Accuracy, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.03537286
Normality test
library(nortest)
ad_test <- ad.test(datos_sin_na$Accuracy)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos_sin_na$Accuracy
## A = 4.9922, p-value = 2.319e-12
The p-value is far below 0.05, so normality is rejected for Accuracy, which is consistent with the choice of robust (Hampel) methods above.
Plot
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
ggp1 <- ggplot(data.frame(value=datos_sin_na$Accuracy), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Histograma de Accuracy") +
xlab("Accuracy") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Measures of central tendency: Precision
library(pander)
pander(summary(datos_sin_na$Precision))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.4031 | 0.5661 | 0.7275 | 0.7154 | 0.867 | 0.999 |
Coefficient of variation
media<-mean(datos_sin_na$Precision, na.rm = TRUE)
sd_a<-sd(datos_sin_na$Precision, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a
## [1] 24.35969
Skewness (Pearson)
media <- mean(datos_sin_na$Precision, na.rm = TRUE)
mediana <- median(datos_sin_na$Precision, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] -0.2088443
Normality test
library(nortest)
ad_test <- ad.test(datos_sin_na$Precision)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos_sin_na$Precision
## A = 5.5097, p-value = 1.331e-13
Plot
ggp1 <- ggplot(data.frame(value=datos_sin_na$Precision), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Histograma de Precision") +
xlab("Precision") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Measures of central tendency: Recall
library(pander)
pander(summary(datos_sin_na$Recall))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.3001 | 0.4898 | 0.6513 | 0.6573 | 0.8365 | 0.9985 |
Coefficient of variation
media <- mean(datos_sin_na$Recall, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Recall, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a
## [1] 31.41571
Skewness (Pearson)
media <- mean(datos_sin_na$Recall, na.rm = TRUE)
mediana <- median(datos_sin_na$Recall, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.08711636
Normality test
library(nortest)
ad_test <- ad.test(datos_sin_na$Recall)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos_sin_na$Recall
## A = 5.1701, p-value = 8.675e-13
Plot
ggp1 <- ggplot(data.frame(value=datos_sin_na$Recall), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Histograma de Recall") +
xlab("Recall") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Measures of central tendency: F1_Score
library(pander)
pander(summary(datos_sin_na$F1_Score))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.4 | 0.5486 | 0.7031 | 0.6955 | 0.8396 | 0.9993 |
Coefficient of variation
media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
sd_a <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a
## [1] 24.64901
Skewness (Pearson)
media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
mediana <- median(datos_sin_na$F1_Score, na.rm = TRUE)
desviacion <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] -0.1334735
Normality test
library(nortest)
ad_test <- ad.test(datos_sin_na$F1_Score)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos_sin_na$F1_Score
## A = 5.019, p-value = 2e-12
Plot
ggp1 <- ggplot(data.frame(value=datos_sin_na$F1_Score), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Histograma de F1_Score") +
xlab("F1_Score") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Measures of central tendency: Training_Time
library(pander)
pander(summary(datos_sin_na$Training_Time))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.1032 | 1.298 | 2.481 | 2.552 | 3.845 | 4.998 |
Coefficient of variation
media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a
## [1] 56.11634
Skewness (Pearson)
media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
mediana <- median(datos_sin_na$Training_Time, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson
## [1] 0.1492231
Normality test
library(nortest)
ad_test <- ad.test(datos_sin_na$Training_Time)
print(ad_test)
##
## Anderson-Darling normality test
##
## data: datos_sin_na$Training_Time
## A = 6.1063, p-value = 4.996e-15
Plot
ggp1 <- ggplot(data.frame(value=datos_sin_na$Training_Time), aes(x=value)) +
geom_histogram(fill="#FD0000", color="#E52521", alpha=0.9) +
ggtitle("Histograma de Training_Time") +
xlab("Training_Time") + ylab("Frequencia") +
theme_ipsum() +
theme(plot.title = element_text(size=15))
ggp1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Recall that our research question is: how does the precision (Precision) of the different machine-learning algorithms vary with the problem type (Problem_Type)?
To answer it, we build two kinds of plots:
ggplot(datos_sin_na, aes(x = Problem_Type, y = Precision)) +
geom_jitter(aes(color = Problem_Type), size = 3, width = 0.2) +
labs(title = "Precisión por Tipo de Problema",
x = "Tipo de Problema",
y = "Precisión") +
theme_minimal()
This plot shows the distribution of individual precision values for each problem type: classification, clustering, and regression.
The vertical axis shows precision, ranging from roughly 0.4 to 1.0, while the horizontal axis separates the three problem types; each point is the precision of one model in its category. Classification and clustering values appear somewhat more concentrated toward the upper end of the range, while regression shows greater dispersion, including some clearly lower values.
ggplot(datos_sin_na, aes(x = Problem_Type, y = Precision, fill = Problem_Type)) +
stat_summary(fun = mean, geom = "bar", position = "dodge") +
labs(title = "Media de Precisión por Tipo de Problema",
x = "Tipo de Problema",
y = "Media de Precisión") +
theme_minimal()
This plot shows the average precision for each problem type: regression is slightly above classification and clustering, although the difference is minimal. This suggests that, on average, models achieve comparable precision across problem types.
Combined analysis
Variability: the first plot (strip plot) shows the spread of each model's precision across the three problem types. Classification and clustering precision tends to be fairly concentrated at high levels, while regression shows greater variability, with some models reaching low precision.
The second plot (bar chart) shows the mean precision per problem type. On average, the three problem types have similar mean precision, around 0.65, with regression showing a slight edge.
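As a numeric cross-check of these visual readings, the group means and spreads can be computed directly (a sketch, assuming dplyr is loaded — it comes with the tidyverse loaded earlier; output not shown here):
# Mean, median and spread of Precision per problem type,
# to back the plot-based comparison with numbers.
datos_sin_na %>%
  group_by(Problem_Type) %>%
  summarise(mean_precision   = mean(Precision),
            median_precision = median(Precision),
            sd_precision     = sd(Precision),
            n                = n())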
Conclusion
The precision analysis shows no clear winner among problem types: average precision is broadly comparable across classification, clustering, and regression. Regression has a slightly higher mean but also the widest spread of values, while classification and clustering perform similarly, with precision more concentrated at the high end.