AI EDA

This exploratory data analysis (EDA) seeks to understand the behavior of different Artificial Intelligence (AI) algorithms, frameworks, problem types, and dataset types by analyzing key metrics such as precision, recall, F1_Score, and training time. The process should help identify patterns, relationships, and possible optimizations in the use of different AI tools, with the aim of improving model performance across contexts.

The question to be addressed is:

How does the precision (Precision) of the different machine learning algorithms vary with the problem type (Problem_Type)?

The analysis also evaluates the efficiency of the different frameworks when implementing neural networks, focusing on training time, a crucial factor in model training.
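As a rough illustration of how the precision question could be approached, the sketch below averages Precision for each combination of Algorithm and Problem_Type in base R. The data frame `precision_sample` is a hypothetical name holding only six rows copied from the dataset listing, and dropping values outside [0, 1] is an assumption about how entry errors should be handled, not a step taken from the original analysis.

```r
# Illustrative rows copied from the dataset listing; this is not the full data.
precision_sample <- data.frame(
  Algorithm    = c("SVM", "SVM", "K-Means", "K-Means", "SVM", "Neural Network"),
  Problem_Type = c("Regression", "Regression", "Clustering", "Clustering",
                   "Regression", "Clustering"),
  Precision    = c(0.6929447, 0.6856109, 0.4900292, 0.6625619,
                   9.7320081, 0.5948056),
  stringsAsFactors = FALSE
)

# Assumption: values outside [0, 1] are entry errors and are set aside here.
valid <- subset(precision_sample,
                !is.na(Precision) & Precision >= 0 & Precision <= 1)

# Mean precision for each Algorithm x Problem_Type combination.
precision_summary <- aggregate(Precision ~ Algorithm + Problem_Type,
                               data = valid, FUN = mean)
print(precision_summary)
```

With the full data, the same call would be run on `datos` after deciding how to treat missing and out-of-range values; the numbers produced by these six rows say nothing about the complete dataset.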

Dictionary of the variables in our dataset:

Algorithm (categorical): Type of AI algorithm used ('Neural Network', 'Random Forest', 'SVM', 'K-Means').
Framework (categorical): Framework or library used to implement the AI model ('TensorFlow', 'PyTorch', 'Keras', 'Scikit-learn').
Problem_Type (categorical): Type of problem addressed by the model ('Classification', 'Regression', 'Clustering').
Dataset_Type (categorical): Type of data used to train the model ('Image', 'Text', 'Tabular', 'Time Series').
Accuracy (numeric, continuous): Accuracy of the model on the test set (between 0 and 1).
Precision (numeric, continuous): Precision of the model (between 0 and 1).
Recall (numeric, continuous): Sensitivity, i.e. the model's ability to correctly identify positives (between 0 and 1).
F1_Score (numeric, continuous): Harmonic mean of precision and recall (between 0 and 1).
Training_Time (numeric, continuous): Model training time in hours.
Date (date): Date on which the model was evaluated, covering the last year.
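The dictionary states that the four metrics lie between 0 and 1, which can be checked directly. The following sketch counts out-of-range values per metric in base R; `metric_sample` is a hypothetical data frame with three rows copied from the dataset listing, used only so that the example runs on its own.

```r
# Three rows copied from the dataset listing (metric columns only).
metric_sample <- data.frame(
  Accuracy  = c(0.6618051, 0.6368133, 0.8974048),
  Precision = c(0.6929447, 0.6255330, 9.7320081),
  Recall    = c(NA,        7.4548096, 0.7806129),
  F1_Score  = c(0.4426950, 0.8865271, 0.7927904)
)

metric_cols <- c("Accuracy", "Precision", "Recall", "F1_Score")

# Count, per metric, the non-missing values that fall outside [0, 1].
out_of_range <- sapply(metric_sample[metric_cols], function(x) {
  sum(x < 0 | x > 1, na.rm = TRUE)
})
print(out_of_range)
```

Applied to the full data, the same `sapply()` call would take `datos[metric_cols]` instead of the sample.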

Load the dataset

Which framework has the lowest average training time (Training_Time) when working with neural networks (Neural Network) on classification problems?

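library(readxl)  # provides read_excel()
# Local path to the Excel workbook; adjust it to your own machine.
rut <- "C:/Users/User/Documents/PROYECTOS RSTUDIO/EDA semana uribe/Dataset_IA_corte_II.xlsx"
datos <- read_excel(rut)
head(datos)  # preview the first six rows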

## # A tibble: 6 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 SVM            Scikit-lea… Regression   Time Series     0.662     0.693 NA    
## 2 K-Means        Keras       Clustering   Time Series     0.744     0.490  0.877
## 3 Neural Network Keras       Clustering   Image           0.885     0.595  0.969
## 4 SVM            Keras       Clustering   Text            0.842     0.842  0.875
## 5 SVM            Scikit-lea… Regression   Tabular         0.723     0.686  0.301
## 6 K-Means        PyTorch     Regression   Image           0.637     0.626  7.45 
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
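Once the data are loaded, the training-time question could be approached by filtering to neural networks on classification problems and averaging Training_Time per framework. The sketch below does this in base R on `time_sample`, a hypothetical data frame of eight rows copied from the dataset listing; its output reflects only those rows and should not be read as the answer for the full dataset.

```r
# Eight rows copied from the dataset listing; two do not match the filter.
time_sample <- data.frame(
  Algorithm     = c("Neural Network", "Neural Network", "Neural Network",
                    "Neural Network", "Neural Network", "Neural Network",
                    "Neural Network", "K-Means"),
  Framework     = c("TensorFlow", "Scikit-learn", "Keras", "PyTorch",
                    "PyTorch", "Scikit-learn", "Keras", "TensorFlow"),
  Problem_Type  = c("Classification", "Classification", "Classification",
                    "Classification", "Classification", "Classification",
                    "Clustering", "Classification"),
  Training_Time = c(3.5282339, 4.2069594, 4.9864657, 4.8695748,
                    1.7428673, 0.8216503, 3.2825938, 3.2783309),
  stringsAsFactors = FALSE
)

# Keep only neural networks applied to classification problems.
nn_class <- subset(time_sample,
                   Algorithm == "Neural Network" &
                     Problem_Type == "Classification")

# Mean training time (hours) per framework; the formula interface
# drops rows with a missing Training_Time by default.
mean_time <- aggregate(Training_Time ~ Framework, data = nn_class, FUN = mean)

# Sort so that the fastest framework comes first.
mean_time <- mean_time[order(mean_time$Training_Time), ]
print(mean_time)
```

Because the full dataset contains some very large Training_Time values, a median or a trimmed mean may be worth comparing against the plain mean before drawing conclusions.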

Dataset structure

names(datos)

##  [1] "Algorithm"     "Framework"     "Problem_Type"  "Dataset_Type" 
##  [5] "Accuracy"      "Precision"     "Recall"        "F1_Score"     
##  [9] "Training_Time" "Date"

dim(datos)

## [1] 560  10

str(datos)

## tibble [560 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:560] "SVM" "K-Means" "Neural Network" "SVM" ...
##  $ Framework    : chr [1:560] "Scikit-learn" "Keras" "Keras" "Keras" ...
##  $ Problem_Type : chr [1:560] "Regression" "Clustering" "Clustering" "Clustering" ...
##  $ Dataset_Type : chr [1:560] "Time Series" "Time Series" "Image" "Text" ...
##  $ Accuracy     : num [1:560] 0.662 0.744 0.885 0.842 0.723 ...
##  $ Precision    : num [1:560] 0.693 0.49 0.595 0.842 0.686 ...
##  $ Recall       : num [1:560] NA 0.877 0.969 0.875 0.301 ...
##  $ F1_Score     : num [1:560] 0.443 0.441 0.964 0.704 0.646 ...
##  $ Training_Time: num [1:560] 4.98 NA 3.28 4.04 3.6 ...
##  $ Date         : POSIXct[1:560], format: "2023-03-08 11:26:21" "2023-03-09 11:26:21" ...
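The `str()` output shows missing values (NA) in several numeric columns and stores the categorical variables as character vectors. A minimal sketch of two follow-up checks, counting NAs per column and converting a categorical column to a factor, is shown below; `str_sample` is a hypothetical three-row data frame copied from the first rows of the listing.

```r
# First three rows of the dataset, restricted to three columns.
str_sample <- data.frame(
  Algorithm     = c("SVM", "K-Means", "Neural Network"),
  Recall        = c(NA, 0.8766533, 0.9685424),
  Training_Time = c(4.9785924, NA, 3.2825938),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_counts <- colSums(is.na(str_sample))
print(na_counts)

# Treat the categorical variable as a factor, as the data dictionary describes it.
str_sample$Algorithm <- factor(str_sample$Algorithm)
print(levels(str_sample$Algorithm))
```

On the full data, `colSums(is.na(datos))` gives the same per-column count.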

Full dataset

library(knitr)
library(kableExtra)
# Render the whole dataset as a scrollable HTML table.
kable(datos, caption = "Dataset, Artificial Intelligence (AI)") %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(2, width = "20em") %>%
  scroll_box(width = "900px", height = "450px")

Dataset, Artificial Intelligence (AI)
Algorithm	Framework	Problem_Type	Dataset_Type	Accuracy	Precision	Recall	F1_Score	Training_Time	Date
SVM	Scikit-learn	Regression	Time Series	0.6618051	0.6929447	NA	0.4426950	4.9785924	2023-03-08 11:26:21
K-Means	Keras	Clustering	Time Series	0.7443216	0.4900292	0.8766533	0.4414046	NA	2023-03-09 11:26:21
Neural Network	Keras	Clustering	Image	0.8852037	0.5948056	0.9685424	0.9644707	3.2825938	2023-03-10 11:26:21
SVM	Keras	Clustering	Text	0.8416477	0.8424142	0.8748388	0.7041523	4.0416289	2023-03-11 11:26:21
SVM	Scikit-learn	Regression	Tabular	0.7229514	0.6856109	0.3010956	0.6456472	3.6039908	2023-03-12 11:26:21
K-Means	PyTorch	Regression	Image	0.6368133	0.6255330	7.4548096	0.8865271	3.0064753	2023-03-13 11:26:21
Neural Network	PyTorch	Regression	Text	0.9985623	0.6366858	0.3357948	0.9014956	NA	2023-03-14 11:26:21
Neural Network	Scikit-learn	Regression	Image	0.7130907	0.6756681	0.4803251	0.5993146	2.3283453	2023-03-15 11:26:21
SVM	Keras	Regression	Time Series	NA	0.8710099	0.3416673	0.8161708	3.4064529	2023-03-16 11:26:21
Random Forest	Keras	Regression	Text	0.5818119	0.9352508	NA	0.8626737	3.4199049	2023-03-17 11:26:21
SVM	PyTorch	Regression	Image	0.8974048	9.7320081	0.7806129	0.7927904	1.9283008	2023-03-18 11:26:21
SVM	Keras	Clustering	Image	0.8468411	0.8721420	0.3801413	0.4909570	4.7142907	2023-03-19 11:26:21
SVM	TensorFlow	Clustering	Tabular	0.6103848	0.5892441	0.5686872	0.9255299	0.9200495	2023-03-20 11:26:21
SVM	PyTorch	Clustering	Image	0.5411905	0.8128808	0.6193656	0.7234567	2.5517613	2023-03-21 11:26:21
K-Means	Keras	Clustering	Text	0.8402497	0.6625619	0.5583371	0.5694835	3.4853315	2023-03-22 11:26:21
Neural Network	PyTorch	Regression	Text	NA	0.5528024	0.3847175	0.6551369	3.5159654	2023-03-23 11:26:21
K-Means	TensorFlow	Classification	Tabular	0.6366298	0.9045229	0.5932635	0.4225427	3.2783309	2023-03-24 11:26:21
K-Means	PyTorch	Regression	Text	0.9754318	0.4230558	0.8258246	0.4767201	1.4489122	2023-03-25 11:26:21
K-Means	PyTorch	Classification	Time Series	0.5755289	0.9410572	0.3497054	0.8593281	0.8654122	2023-03-26 11:26:21
SVM	PyTorch	Clustering	Text	0.7161674	0.6768865	0.3561260	0.4000070	3.2161076	2023-03-27 11:26:21
Random Forest	PyTorch	Regression	Text	9.7180796	0.7823209	0.5483399	0.6499395	3.0365804	2023-03-28 11:26:21
Neural Network	TensorFlow	Clustering	Image	0.7098637	0.7956124	0.9592080	0.7135061	0.9788445	2023-03-29 11:26:21
Random Forest	Scikit-learn	Regression	Image	0.8192630	0.9370706	0.7680009	0.4327807	3.5551818	2023-03-30 11:26:21
K-Means	PyTorch	Clustering	Text	0.6987972	0.7820018	0.7750690	0.9838469	2.3303428	2023-03-31 11:26:21
K-Means	PyTorch	Classification	Tabular	0.6371076	0.7683602	0.5533440	0.5356752	3.3720269	2023-04-01 11:26:21
Random Forest	Keras	Classification	Image	0.9919888	0.4399912	0.7155626	0.5825192	4.2030987	2023-04-02 11:26:21
Random Forest	PyTorch	Classification	Tabular	0.7046670	0.7110448	0.3070918	0.5823655	0.9316693	2023-04-03 11:26:21
Random Forest	Keras	Regression	Text	0.9470496	0.4901014	0.7452672	0.5382500	0.1938099	2023-04-04 11:26:21
K-Means	Scikit-learn	Classification	Text	0.6149773	0.8424603	0.9393009	0.4008843	3.9176027	2023-04-05 11:26:21
K-Means	TensorFlow	Regression	Time Series	NA	0.7073332	0.7288014	0.8376069	3.0875174	2023-04-06 11:26:21
Neural Network	TensorFlow	Classification	Image	0.5155670	0.8081367	0.9115890	0.9801073	3.5282339	2023-04-07 11:26:21
Neural Network	Scikit-learn	Classification	Text	0.8258334	0.4250037	NA	0.5345761	4.2069594	2023-04-08 11:26:21
K-Means	TensorFlow	Regression	Time Series	0.6842632	0.4508752	0.3843909	0.7978283	4.0337736	2023-04-09 11:26:21
Random Forest	Scikit-learn	Regression	Image	NA	0.8297940	0.9317173	8.4513780	4.8085087	2023-04-10 11:26:21
Random Forest	Scikit-learn	Classification	Text	0.7366050	0.4432506	0.3465107	0.9090552	2.7260819	2023-04-11 11:26:21
Neural Network	Scikit-learn	Regression	Text	0.9840967	0.4427540	0.6737772	0.6535775	2.4936286	2023-04-12 11:26:21
K-Means	TensorFlow	Classification	Image	5.9276276	NA	0.3994960	0.5817585	2.0692473	2023-04-13 11:26:21
Neural Network	Keras	Regression	Image	0.9343116	0.9739008	0.3081946	0.5951771	0.8530861	2023-04-14 11:26:21
Neural Network	Scikit-learn	Clustering	Image	0.8882984	0.8425050	0.5954240	0.8275728	NA	2023-04-15 11:26:21
SVM	PyTorch	Clustering	Text	0.8854609	0.6119508	0.5065285	0.8900677	1.4573614	2023-04-16 11:26:21
SVM	TensorFlow	Clustering	Text	0.9223916	0.5779213	0.6402003	0.5089684	4.6144068	2023-04-17 11:26:21
SVM	Scikit-learn	Clustering	Time Series	0.8805120	0.6098219	0.7040399	NA	2.9576349	2023-04-18 11:26:21
Random Forest	Scikit-learn	Clustering	Tabular	0.8131102	0.8647921	NA	0.9411641	3.0049169	2023-04-19 11:26:21
K-Means	PyTorch	Classification	Image	0.5656224	0.7968224	0.3861023	0.8840161	1.8391308	2023-04-20 11:26:21
K-Means	Keras	Clustering	Text	NA	0.5111173	0.6910497	0.9909150	0.3547478	2023-04-21 11:26:21
K-Means	Keras	Clustering	Text	0.9604239	0.5044656	0.5402171	0.8525490	0.2555445	2023-04-22 11:26:21
K-Means	Keras	Clustering	Image	0.8083252	0.4590374	0.8104214	0.6359171	2.1743422	2023-04-23 11:26:21
SVM	Scikit-learn	Clustering	Tabular	0.8982686	0.7961816	0.7566042	0.7543827	0.5143704	2023-04-24 11:26:21
Random Forest	Scikit-learn	Regression	Image	0.7407612	0.8586236	0.8919231	0.7966086	NA	2023-04-25 11:26:21
Random Forest	Scikit-learn	Classification	Tabular	0.5586541	0.5590279	0.7847450	0.4470735	4.4167677	2023-04-26 11:26:21
SVM	TensorFlow	Clustering	Text	0.5625929	0.4125670	0.6009518	0.7266982	4.4262826	2023-04-27 11:26:21
Random Forest	PyTorch	Clustering	Tabular	0.8427826	0.4493030	0.7710766	0.8255925	3.3293225	2023-04-28 11:26:21
SVM	Scikit-learn	Clustering	Time Series	0.7151529	0.9807160	0.4927668	0.5003928	1.1350360	2023-04-29 11:26:21
K-Means	PyTorch	Classification	Text	0.6002624	0.5772669	0.5144194	0.8683790	4.3286845	2023-04-30 11:26:21
SVM	PyTorch	Regression	Time Series	0.7457973	0.8615339	0.8522896	NA	4.4418578	2023-05-01 11:26:21
K-Means	Keras	Classification	Image	0.5321045	0.7747981	NA	0.9713331	1.0636026	2023-05-02 11:26:21
K-Means	TensorFlow	Regression	Image	0.7909857	0.6291638	0.8588662	0.4254534	3.7133878	2023-05-03 11:26:21
Neural Network	Scikit-learn	Clustering	Image	0.6344967	0.5234124	0.8756957	0.5591957	1.5039458	2023-05-04 11:26:21
SVM	Keras	Clustering	Tabular	0.8987796	0.4728319	0.9002936	0.7609323	4.0329375	2023-05-05 11:26:21
Neural Network	Keras	Classification	Text	0.6551810	0.7690078	0.9416447	0.5779359	4.9864657	2023-05-06 11:26:21
SVM	Scikit-learn	Classification	Image	0.7276101	0.8647803	0.6016897	0.8286545	0.2471274	2023-05-07 11:26:21
SVM	PyTorch	Regression	Image	0.5058103	0.7863426	0.5232068	0.8554032	4.4970927	2023-05-08 11:26:21
Neural Network	Keras	Regression	Text	0.5362234	0.7181813	0.7075391	0.4615096	3.1508896	2023-05-09 11:26:21
Neural Network	PyTorch	Classification	Image	0.6962468	0.4251707	0.5598207	0.7083127	4.8695748	2023-05-10 11:26:21
SVM	TensorFlow	Regression	Time Series	0.7399694	0.9810933	0.7207519	0.7053343	2.3785466	2023-05-11 11:26:21
Random Forest	PyTorch	Classification	Text	0.8000103	0.8792285	0.7939101	0.6215685	4.2522009	2023-05-12 11:26:21
K-Means	TensorFlow	Regression	Text	0.6458313	0.5756932	0.7818835	0.9597549	0.4057309	2023-05-13 11:26:21
Neural Network	PyTorch	Classification	Text	0.8474909	0.9879822	0.5621870	0.8965038	1.7428673	2023-05-14 11:26:21
K-Means	Keras	Classification	Time Series	0.9300612	0.7611290	0.4168021	0.8183256	0.4283810	2023-05-15 11:26:21
Random Forest	Keras	Classification	Text	0.8899255	0.7494536	0.6013705	0.8285960	4.8791814	2023-05-16 11:26:21
Random Forest	TensorFlow	Regression	Time Series	0.5198094	0.8488439	0.3998159	0.6770297	4.1031721	2023-05-17 11:26:21
Random Forest	Scikit-learn	Regression	Text	0.7402535	0.8870619	0.9230679	0.9525967	4.2774824	2023-05-18 11:26:21
Neural Network	TensorFlow	Regression	Tabular	0.5524651	0.7938872	0.5421142	0.8167573	4.6960296	2023-05-19 11:26:21
Random Forest	TensorFlow	Classification	Image	0.6210225	0.4768574	NA	0.8373886	0.5170071	2023-05-20 11:26:21
Neural Network	PyTorch	Regression	Tabular	0.9933313	0.6029605	0.3178134	0.9170145	NA	2023-05-21 11:26:21
Random Forest	PyTorch	Regression	Image	0.5712478	0.9568502	0.7520757	0.5644430	0.4480710	2023-05-22 11:26:21
K-Means	TensorFlow	Classification	Time Series	0.7494441	0.5347694	0.7458316	0.8842425	1.1328862	2023-05-23 11:26:21
K-Means	Keras	Clustering	Tabular	0.8090779	0.6233002	0.5384229	NA	1.2238923	2023-05-24 11:26:21
SVM	PyTorch	Classification	Text	0.8512325	0.6592461	0.3501983	0.6072052	2.3986253	2023-05-25 11:26:21
K-Means	Scikit-learn	Regression	Image	0.7798243	0.6636430	0.5867402	0.6013663	1.4159546	2023-05-26 11:26:21
SVM	PyTorch	Classification	Image	0.5048854	0.7677637	0.5178522	0.9871153	0.5947927	2023-05-27 11:26:21
K-Means	PyTorch	Clustering	Tabular	0.6632307	0.9658455	0.7739844	0.9139223	0.9207063	2023-05-28 11:26:21
Neural Network	Scikit-learn	Classification	Time Series	0.7588558	0.5444156	0.7240455	0.8207019	0.8216503	2023-05-29 11:26:21
K-Means	TensorFlow	Regression	Tabular	0.5439332	0.4729008	0.5552156	0.8362341	4.8680763	2023-05-30 11:26:21
SVM	Scikit-learn	Clustering	Time Series	0.6753135	0.5184823	0.4525248	NA	3.8205333	2023-05-31 11:26:21
SVM	TensorFlow	Regression	Time Series	0.5166016	0.9321549	0.9916252	0.9682544	4.8420312	2023-06-01 11:26:21
Random Forest	TensorFlow	Clustering	Image	0.5392892	0.7874865	0.6178011	0.6977553	2.2547284	2023-06-02 11:26:21
Neural Network	TensorFlow	Clustering	Tabular	0.6984616	0.5715441	0.7817920	0.6283106	1.4640056	2023-06-03 11:26:21
K-Means	Scikit-learn	Classification	Tabular	0.5663579	0.8895682	0.3983871	0.4978212	4.0116367	2023-06-04 11:26:21
Random Forest	TensorFlow	Classification	Time Series	0.7837704	0.9168220	NA	0.8717234	1.6986442	2023-06-05 11:26:21
K-Means	Keras	Regression	Tabular	0.8447325	0.9079086	0.3192757	0.8406664	1.5669785	2023-06-06 11:26:21
K-Means	Scikit-learn	Classification	Text	0.9002933	0.9513559	0.6538186	0.6306130	1.2392794	2023-06-07 11:26:21
Random Forest	Keras	Classification	Text	0.6000751	0.5513446	0.9748122	0.4151160	0.7354270	2023-06-08 11:26:21
Random Forest	TensorFlow	Clustering	Tabular	0.5837413	0.8530252	0.5689447	0.9033984	1.3566690	2023-06-09 11:26:21
Random Forest	PyTorch	Classification	Tabular	0.5522839	0.6763237	0.3272972	0.4068508	1.8410668	2023-06-10 11:26:21
Random Forest	TensorFlow	Clustering	Time Series	0.8182151	0.9051991	0.3216691	0.8222199	3.4025716	2023-06-11 11:26:21
Random Forest	Keras	Clustering	Text	8.5323786	0.8370944	NA	0.9821543	0.4065677	2023-06-12 11:26:21
K-Means	Keras	Clustering	Text	0.5157931	0.8658685	0.4120175	0.6625968	1.1314840	2023-06-13 11:26:21
Random Forest	Scikit-learn	Regression	Image	0.9681061	0.7936971	0.3163465	0.5409840	4.0642158	2023-06-14 11:26:21
Neural Network	PyTorch	Regression	Time Series	5.2598564	0.5064573	0.8293495	0.8229226	0.8199174	2023-06-15 11:26:21
SVM	Scikit-learn	Regression	Image	0.7706482	0.7270162	0.6209661	0.8902769	1.7789612	2023-06-16 11:26:21
Random Forest	Keras	Clustering	Text	0.8545303	0.9908018	0.5024713	0.7278582	4.3368287	2023-06-17 11:26:21
Random Forest	TensorFlow	Regression	Time Series	0.9354846	0.9624328	NA	0.9802212	NA	2023-06-18 11:26:21
K-Means	PyTorch	Regression	Tabular	0.8570435	0.4259042	0.3812973	0.4310012	0.5054105	2023-06-19 11:26:21
Random Forest	TensorFlow	Classification	Time Series	0.9008640	0.4988889	0.9691432	0.7028774	2.4742362	2023-06-20 11:26:21
Random Forest	TensorFlow	Regression	Tabular	0.6697251	0.4790373	0.5197767	0.8310724	1.5818556	2023-06-21 11:26:21
Random Forest	PyTorch	Regression	Tabular	NA	0.8355879	0.9218826	0.9175843	2.8607010	2023-06-22 11:26:21
K-Means	Scikit-learn	Clustering	Time Series	0.5400574	0.8906712	0.7220600	0.5075534	4.0386440	2023-06-23 11:26:21
Random Forest	TensorFlow	Clustering	Tabular	0.9474083	0.5281068	0.8786952	0.8800021	0.7720276	2023-06-24 11:26:21
SVM	Keras	Clustering	Image	0.7737962	0.7035116	0.9888092	0.7316242	2.9454255	2023-06-25 11:26:21
K-Means	TensorFlow	Clustering	Text	0.9086489	0.9044218	0.5018838	0.6379322	2.5769929	2023-06-26 11:26:21
SVM	PyTorch	Clustering	Tabular	0.7261591	0.8396809	0.9727948	0.4790290	0.8059861	2023-06-27 11:26:21
K-Means	PyTorch	Clustering	Image	0.8217888	0.7253423	5.7263733	0.9191775	3.1575926	2023-06-28 11:26:21
Random Forest	PyTorch	Regression	Tabular	0.7632013	0.7542086	0.5698583	0.4943639	1.4408380	2023-06-29 11:26:21
SVM	Scikit-learn	Clustering	Text	0.8657948	0.7050163	0.5382710	5.8587272	NA	2023-06-30 11:26:21
K-Means	TensorFlow	Clustering	Image	NA	0.5785291	0.6789853	0.5740273	0.5031344	2023-07-01 11:26:21
Neural Network	Scikit-learn	Classification	Text	0.5301760	0.7390132	0.4079015	NA	2.3519619	2023-07-02 11:26:21
Random Forest	Scikit-learn	Regression	Text	0.6235516	0.8133312	0.6875980	0.8036218	1.6016710	2023-07-03 11:26:21
K-Means	PyTorch	Regression	Image	0.5797723	0.9239937	0.6791934	0.8780088	4.1296679	2023-07-04 11:26:21
Neural Network	PyTorch	Regression	Image	0.9358918	0.7817748	0.8333313	0.5502807	0.3791958	2023-07-05 11:26:21
K-Means	Scikit-learn	Clustering	Image	0.6096070	0.8566729	0.8835550	0.7749245	2.1511756	2023-07-06 11:26:21
Neural Network	TensorFlow	Clustering	Time Series	0.9879326	0.4960430	0.6083089	0.7430476	2.3463140	2023-07-07 11:26:21
Random Forest	Keras	Clustering	Image	0.6684479	0.6769345	0.5116333	0.8996982	3.6525298	2023-07-08 11:26:21
SVM	Scikit-learn	Classification	Image	0.5910590	4.0559897	0.4815344	0.9436522	2.9140091	2023-07-09 11:26:21
Neural Network	TensorFlow	Clustering	Image	0.8948493	0.5480073	0.4362367	0.4072941	3.3693405	2023-07-10 11:26:21
K-Means	PyTorch	Classification	Image	0.8293539	NA	NA	0.8044120	3.9075415	2023-07-11 11:26:21
Random Forest	PyTorch	Regression	Tabular	0.7490979	NA	0.5397136	0.4311015	4.3247946	2023-07-12 11:26:21
Neural Network	TensorFlow	Clustering	Time Series	0.7776818	0.4595069	4.8917338	NA	1.6379899	2023-07-13 11:26:21
K-Means	Keras	Classification	Text	0.8596009	0.6408966	0.9764922	0.5725796	2.7363113	2023-07-14 11:26:21
K-Means	PyTorch	Classification	Image	NA	0.8800426	0.6903962	0.5840660	4.2165296	2023-07-15 11:26:21
K-Means	PyTorch	Clustering	Tabular	0.9981670	0.5224214	0.5430939	0.6117751	4.9483105	2023-07-16 11:26:21
Neural Network	PyTorch	Regression	Tabular	0.9873966	0.7330510	0.7063273	0.7727755	44.5864462	2023-07-17 11:26:21
Neural Network	TensorFlow	Regression	Tabular	0.8251628	0.8398428	0.3974377	0.6004300	1.9201896	2023-07-18 11:26:21
Neural Network	Scikit-learn	Regression	Text	0.5997712	0.7695913	0.6108306	0.8396194	1.0563159	2023-07-19 11:26:21
SVM	PyTorch	Classification	Image	0.8401141	0.5128148	0.7383640	0.6427164	2.4980234	2023-07-20 11:26:21
Neural Network	Scikit-learn	Classification	Time Series	0.5360992	0.6132307	0.6422285	0.4410119	3.7340431	2023-07-21 11:26:21
Neural Network	Keras	Classification	Time Series	0.5153263	0.8702751	0.5812452	0.8702559	2.5135884	2023-07-22 11:26:21
Neural Network	PyTorch	Classification	Tabular	NA	0.7325359	0.9956939	0.5714550	2.4675862	2023-07-23 11:26:21
SVM	PyTorch	Regression	Text	0.7313115	0.4031378	0.9162203	0.6596601	4.2073291	2023-07-24 11:26:21
Neural Network	Keras	Clustering	Text	0.9341363	0.8565945	0.7363842	0.8112663	1.8708785	2023-07-25 11:26:21
K-Means	PyTorch	Classification	Text	0.8635845	0.4211868	0.6985642	0.5994737	4.3129939	2023-07-26 11:26:21
Neural Network	Scikit-learn	Classification	Time Series	0.8713533	0.8474403	0.7344623	0.4339514	20.9334352	2023-07-27 11:26:21
K-Means	TensorFlow	Regression	Tabular	7.1274667	0.5214883	0.4409185	0.6243526	1.7083422	2023-07-28 11:26:21
K-Means	Scikit-learn	Clustering	Image	NA	0.9748441	0.5765964	0.9666691	2.3245506	2023-07-29 11:26:21
K-Means	TensorFlow	Clustering	Time Series	0.6855194	6.2076445	0.3276217	0.7850406	3.8359901	2023-07-30 11:26:21
SVM	TensorFlow	Regression	Tabular	NA	0.5961590	0.6328822	0.8028875	0.7174099	2023-07-31 11:26:21
SVM	Scikit-learn	Regression	Tabular	5.2005460	0.4893328	0.6801172	0.7793693	1.0624542	2023-08-01 11:26:21
SVM	Scikit-learn	Clustering	Image	NA	0.5833625	0.4594248	0.5193953	4.7620796	2023-08-02 11:26:21
Neural Network	Scikit-learn	Classification	Image	0.7893377	0.9259905	0.9748202	0.6510003	0.9599090	2023-08-03 11:26:21
K-Means	Scikit-learn	Clustering	Image	0.7193077	0.9978006	9.3661823	0.8505639	2.8818639	2023-08-04 11:26:21
SVM	Keras	Regression	Text	0.8626288	0.6209857	0.8055003	0.4608237	2.9386409	2023-08-05 11:26:21
SVM	PyTorch	Classification	Image	0.7433345	0.6691664	0.6733706	0.5667117	2.4991599	2023-08-06 11:26:21
Neural Network	TensorFlow	Classification	Tabular	0.9367116	0.8332426	0.9089784	0.5657915	3.2592517	2023-08-07 11:26:21
SVM	Scikit-learn	Regression	Image	0.9503509	0.9317175	0.3914566	0.6592114	1.2261510	2023-08-08 11:26:21
Neural Network	Scikit-learn	Regression	Tabular	0.7108605	0.7558266	0.8533569	0.9882212	2.8080451	2023-08-09 11:26:21
Random Forest	Scikit-learn	Regression	Time Series	0.6384139	0.6349154	0.3873746	0.4405015	1.9236490	2023-08-10 11:26:21
SVM	Keras	Clustering	Text	0.7961752	0.6475731	0.8559475	0.7112206	3.3421696	2023-08-11 11:26:21
Random Forest	Scikit-learn	Clustering	Time Series	0.9561817	0.8173709	0.4930373	0.5076188	0.7919954	2023-08-12 11:26:21
Neural Network	Scikit-learn	Classification	Time Series	NA	0.4019310	0.9139634	0.9824059	28.9729934	2023-08-13 11:26:21
K-Means	Scikit-learn	Regression	Tabular	0.8114833	0.7717536	0.9608295	0.4679821	1.0078247	2023-08-14 11:26:21
SVM	PyTorch	Classification	Time Series	0.8157801	0.6132958	0.4041572	0.6421606	NA	2023-08-15 11:26:21
Neural Network	Keras	Classification	Tabular	0.8665565	0.8765184	0.6238729	0.8427310	1.1716780	2023-08-16 11:26:21
K-Means	Scikit-learn	Classification	Image	NA	0.4557944	0.9866912	0.8227327	0.9959051	2023-08-17 11:26:21
K-Means	TensorFlow	Classification	Image	NA	0.7529214	0.6383852	0.6536372	4.1459837	2023-08-18 11:26:21
Random Forest	TensorFlow	Regression	Tabular	0.9545163	0.6885837	0.9044833	0.6079145	1.4999671	2023-08-19 11:26:21
Neural Network	TensorFlow	Classification	Image	0.5898416	0.7853953	0.7121121	0.6385674	4.6429065	2023-08-20 11:26:21
K-Means	Keras	Regression	Text	0.6187717	0.4389122	0.5627309	0.5585658	4.8526400	2023-08-21 11:26:21
SVM	Keras	Regression	Time Series	0.9856975	NA	0.5000485	0.5231998	2.8991763	2023-08-22 11:26:21
SVM	TensorFlow	Clustering	Text	0.5904885	0.7368908	0.4422562	0.6898238	0.8007055	2023-08-23 11:26:21
Random Forest	Scikit-learn	Regression	Time Series	0.9271925	0.7363961	0.8332587	0.5611203	1.9352864	2023-08-24 11:26:21
K-Means	Keras	Clustering	Image	0.7461389	0.7620926	0.5705784	0.5724770	4.0088881	2023-08-25 11:26:21
Neural Network	TensorFlow	Regression	Text	0.6236155	0.8058808	0.6578928	0.7940536	1.9002146	2023-08-26 11:26:21
SVM	TensorFlow	Regression	Image	0.9353750	0.8829934	0.6446278	0.9811224	0.5263844	2023-08-27 11:26:21
K-Means	Scikit-learn	Regression	Time Series	0.7226526	0.5618924	0.7040953	0.7621823	2.8282461	2023-08-28 11:26:21
K-Means	PyTorch	Clustering	Tabular	0.7574087	0.8950296	0.9059040	0.4461877	4.2410922	2023-08-29 11:26:21
Random Forest	Keras	Clustering	Tabular	0.6796167	0.6989534	0.9865175	0.4453502	NA	2023-08-30 11:26:21
SVM	Scikit-learn	Classification	Text	0.7964754	NA	0.5853089	0.9708539	0.9581034	2023-08-31 11:26:21
SVM	TensorFlow	Clustering	Image	0.5817619	0.4351306	0.8792632	0.5783745	NA	2023-09-01 11:26:21
Neural Network	TensorFlow	Clustering	Image	0.6955408	0.6005430	0.8351695	0.4552402	1.1804709	2023-09-02 11:26:21
SVM	Scikit-learn	Clustering	Text	0.9847062	0.8709382	NA	0.7594268	1.1692169	2023-09-03 11:26:21
Neural Network	Keras	Clustering	Tabular	NA	0.8246086	0.9692330	0.7741893	4.3829517	2023-09-04 11:26:21
SVM	PyTorch	Classification	Time Series	0.8283683	0.8731690	0.4403322	0.7891029	1.3233779	2023-09-05 11:26:21
Random Forest	PyTorch	Clustering	Text	0.6625950	0.7103614	0.3764849	0.5604412	1.3899120	2023-09-06 11:26:21
SVM	TensorFlow	Regression	Time Series	0.8867366	0.6641194	0.8977734	0.4090664	0.1032016	2023-09-07 11:26:21
Neural Network	Keras	Regression	Tabular	0.5654368	0.4884715	0.6074049	0.9790092	4.3662783	2023-09-08 11:26:21
Neural Network	PyTorch	Clustering	Image	0.9849105	0.5969157	0.8928782	0.5505358	3.9837146	2023-09-09 11:26:21
Random Forest	Keras	Clustering	Tabular	NA	0.6604116	0.9251631	0.8056158	3.1739118	2023-09-10 11:26:21
SVM	Keras	Regression	Text	0.6180252	0.4531603	0.3437203	0.8239779	3.7763022	2023-09-11 11:26:21
SVM	TensorFlow	Regression	Image	NA	0.5323672	0.9184253	0.7660045	0.8450380	2023-09-12 11:26:21
Random Forest	PyTorch	Classification	Text	0.5848790	0.7589352	0.6138233	0.5877444	2.3457856	2023-09-13 11:26:21
SVM	Scikit-learn	Regression	Text	0.7598870	0.8413979	NA	0.5626578	1.8209750	2023-09-14 11:26:21
SVM	TensorFlow	Regression	Tabular	0.6685016	0.9990085	0.7386148	0.7586010	0.5595050	2023-09-15 11:26:21
Neural Network	PyTorch	Clustering	Text	0.9144417	0.9598680	0.9484678	NA	2.4820394	2023-09-16 11:26:21
SVM	PyTorch	Regression	Tabular	NA	0.7855391	0.3133813	0.9680402	4.6116243	2023-09-17 11:26:21
SVM	PyTorch	Classification	Image	0.6243571	0.6527488	0.6337904	0.4635435	0.2962897	2023-09-18 11:26:21
Random Forest	PyTorch	Regression	Tabular	0.8085725	0.7817064	0.7814054	0.4928972	1.5281585	2023-09-19 11:26:21
Random Forest	PyTorch	Clustering	Image	0.8533886	0.8713910	0.8058949	0.9668418	1.1169506	2023-09-20 11:26:21
K-Means	Keras	Clustering	Tabular	0.5835210	0.4710017	0.7847727	0.8419211	1.2669186	2023-09-21 11:26:21
Neural Network	TensorFlow	Regression	Time Series	0.5838096	0.6459429	0.3941046	0.9297963	4.5513293	2023-09-22 11:26:21
SVM	Keras	Clustering	Time Series	0.5183357	0.9038814	0.5095769	0.5215796	2.3935384	2023-09-23 11:26:21
SVM	Scikit-learn	Classification	Text	0.8682010	0.6302998	0.5511009	0.7525515	2.3848671	2023-09-24 11:26:21
K-Means	PyTorch	Classification	Tabular	0.8319023	0.7431234	0.8631060	0.8206838	3.8268206	2023-09-25 11:26:21
SVM	TensorFlow	Regression	Tabular	0.7373154	0.7526616	0.4951319	0.8080671	0.8570118	2023-09-26 11:26:21
Neural Network	PyTorch	Regression	Image	0.9220852	0.5106858	0.4474935	0.6448910	2.4876002	2023-09-27 11:26:21
K-Means	TensorFlow	Regression	Text	0.9028351	0.6173413	0.9702136	0.4092369	2.2069275	2023-09-28 11:26:21
Neural Network	Keras	Clustering	Tabular	0.7926772	0.6007068	0.3062043	0.7497556	3.0248624	2023-09-29 11:26:21
K-Means	TensorFlow	Classification	Text	0.9341356	0.4157180	0.9984746	0.5518609	4.9978327	2023-09-30 11:26:21
K-Means	Scikit-learn	Classification	Time Series	0.6029206	4.1451506	7.7377491	0.6701525	3.8702996	2023-10-01 11:26:21
Random Forest	Scikit-learn	Clustering	Tabular	0.5559598	0.8990182	NA	0.9745486	2.0495419	2023-10-02 11:26:21
Neural Network	TensorFlow	Classification	Tabular	0.6348748	0.5638425	0.5062336	0.6394212	4.1550100	2023-10-03 11:26:21
SVM	TensorFlow	Regression	Text	0.5285434	0.7108473	0.3100207	0.9038810	0.9364710	2023-10-04 11:26:21
SVM	TensorFlow	Regression	Tabular	0.7655848	0.5792353	0.8165087	5.1312436	0.2492721	2023-10-05 11:26:21
Neural Network	TensorFlow	Regression	Text	0.9683028	0.9644075	0.8839012	0.8034763	1.1017970	2023-10-06 11:26:21
SVM	Scikit-learn	Classification	Tabular	0.5196718	0.5555781	0.8183333	0.9862042	1.7680166	2023-10-07 11:26:21
SVM	PyTorch	Regression	Image	0.5610550	0.6577941	0.3999952	0.4611359	NA	2023-10-08 11:26:21
Neural Network	TensorFlow	Clustering	Text	0.7260995	0.9236382	0.8273995	0.4049920	3.1141328	2023-10-09 11:26:21
K-Means	TensorFlow	Clustering	Tabular	0.9669375	0.9051601	0.8382459	0.6601497	4.5619680	2023-10-10 11:26:21
Neural Network	Scikit-learn	Classification	Time Series	0.6580781	0.5116609	0.7609784	0.4555753	2.5982846	2023-10-11 11:26:21
K-Means	Scikit-learn	Clustering	Tabular	0.7536174	0.8815860	0.8362812	0.8490306	2.5562507	2023-10-12 11:26:21
SVM	Keras	Clustering	Time Series	0.5207864	0.6749121	0.8921450	0.9487292	0.3461907	2023-10-13 11:26:21
SVM	TensorFlow	Classification	Time Series	NA	0.6897813	0.7295229	0.6604126	0.2710666	2023-10-14 11:26:21
SVM	Keras	Regression	Image	0.9933151	0.4800880	0.3620233	0.5552270	2.8006838	2023-10-15 11:26:21
Random Forest	Scikit-learn	Regression	Text	0.9825593	0.4483609	0.6413395	0.6606420	2.2470752	2023-10-16 11:26:21
K-Means	Scikit-learn	Regression	Tabular	NA	0.8367636	0.3543545	0.8340687	4.2119839	2023-10-17 11:26:21
Random Forest	PyTorch	Classification	Image	0.9759059	0.6978767	0.5852801	0.4054325	0.8873300	2023-10-18 11:26:21
Random Forest	TensorFlow	Clustering	Tabular	0.8195600	0.6621104	NA	0.7536724	0.2223611	2023-10-19 11:26:21
Neural Network	Keras	Regression	Time Series	0.9339591	0.8377049	0.3462069	0.7679750	2.3002900	2023-10-20 11:26:21
Random Forest	Scikit-learn	Regression	Text	0.7273699	0.8593077	0.5441744	0.7826129	1.2636449	2023-10-21 11:26:21
Neural Network	Scikit-learn	Classification	Tabular	0.7577980	0.4953449	0.3776987	0.5452132	0.3431104	2023-10-22 11:26:21
Neural Network	Keras	Regression	Tabular	0.7444233	0.7661351	0.8657646	0.8284316	3.6505316	2023-10-23 11:26:21
Random Forest	PyTorch	Classification	Time Series	0.8334321	0.4812124	0.9633816	NA	0.6468057	2023-10-24 11:26:21
K-Means	Scikit-learn	Clustering	Tabular	0.5698256	0.8508251	0.3506215	0.5195622	3.0807656	2023-10-25 11:26:21
K-Means	TensorFlow	Classification	Tabular	0.5149868	0.7941731	0.9685806	0.9264820	1.4771438	2023-10-26 11:26:21
K-Means	Scikit-learn	Classification	Time Series	0.6539650	0.9739688	0.6658036	0.8432339	0.9501306	2023-10-27 11:26:21
K-Means	TensorFlow	Classification	Image	0.8523404	0.4413748	0.5096960	0.4082473	1.9607825	2023-10-28 11:26:21
K-Means	Keras	Classification	Time Series	0.6009267	NA	0.3538035	0.5490258	4.0270843	2023-10-29 11:26:21
Random Forest	TensorFlow	Clustering	Image	0.8367162	0.5693122	0.6504370	0.5286439	2.0212585	2023-10-30 11:26:21
Random Forest	Keras	Clustering	Text	0.9849560	0.5570234	0.8561609	0.5624838	3.7787713	2023-10-31 11:26:21
SVM	TensorFlow	Regression	Image	NA	0.5481873	0.7949605	0.5485359	0.7148829	2023-11-01 11:26:21
K-Means	Keras	Clustering	Image	0.8363011	0.9437527	0.3351582	0.4375158	3.8865800	2023-11-02 11:26:21
Random Forest	TensorFlow	Clustering	Text	0.7218751	0.5497277	0.3510313	0.6753643	1.2609733	2023-11-03 11:26:21
SVM	PyTorch	Regression	Image	0.9340711	0.5631698	0.5820113	0.8396401	3.4192263	2023-11-04 11:26:21
K-Means	TensorFlow	Clustering	Time Series	0.5885749	0.8556390	0.5067033	0.7640390	2.8723296	2023-11-05 11:26:21
Neural Network	TensorFlow	Classification	Image	0.8463130	0.6698439	0.4626690	0.8037230	4.6528091	2023-11-06 11:26:21
SVM	PyTorch	Classification	Text	NA	0.8660263	0.4967031	0.4486895	1.9978803	2023-11-07 11:26:21
Random Forest	Scikit-learn	Regression	Time Series	0.9723071	0.4392197	0.8624379	0.9708944	0.4242094	2023-11-08 11:26:21
Neural Network	Scikit-learn	Regression	Text	0.8416240	0.6925427	0.9504596	0.9030951	0.1938619	2023-11-09 11:26:21
Neural Network	TensorFlow	Regression	Text	0.7485874	0.4201682	0.5835719	0.8830542	4.1516069	2023-11-10 11:26:21
Neural Network	PyTorch	Classification	Tabular	0.8089236	0.4375919	0.9342777	0.8937903	2.6712046	2023-11-11 11:26:21
SVM	TensorFlow	Clustering	Tabular	0.9344525	0.9438625	0.5250470	0.9596263	3.8986959	2023-11-12 11:26:21
Random Forest	Keras	Classification	Text	0.7853049	0.4835472	0.6335059	0.7265524	1.2486854	2023-11-13 11:26:21
Neural Network	Keras	Regression	Text	0.5151935	0.7194524	0.4582203	0.5201692	1.7909104	2023-11-14 11:26:21
K-Means	TensorFlow	Clustering	Text	0.9654743	NA	0.7483332	0.7700702	0.2475090	2023-11-15 11:26:21
Neural Network	Scikit-learn	Clustering	Tabular	0.8447634	0.6084060	0.9852868	0.8457289	4.8113124	2023-11-16 11:26:21
Neural Network	Scikit-learn	Classification	Tabular	0.8382567	0.9399000	0.7224452	0.8427504	3.3725527	2023-11-17 11:26:21
SVM	TensorFlow	Regression	Image	0.6078376	0.4130940	0.5504699	0.7128694	4.6727049	2023-11-18 11:26:21
SVM	TensorFlow	Classification	Image	8.2944274	0.7982738	0.7534722	0.4410752	1.4009324	2023-11-19 11:26:21
Random Forest	Keras	Clustering	Text	0.6969322	0.9780367	0.3860445	0.6226678	3.0961717	2023-11-20 11:26:21
K-Means	Scikit-learn	Classification	Time Series	0.8256165	0.7361009	0.9220614	0.9524600	NA	2023-11-21 11:26:21
SVM	TensorFlow	Classification	Time Series	0.5532965	0.9620935	0.6521588	0.7506693	1.6559935	2023-11-22 11:26:21
Neural Network	Keras	Regression	Tabular	0.8289227	0.4313547	0.6145448	0.7229992	4.2557353	2023-11-23 11:26:21
Random Forest	Scikit-learn	Regression	Tabular	0.9997069	0.6512760	0.7101054	0.5613121	4.7410912	2023-11-24 11:26:21
Neural Network	PyTorch	Clustering	Tabular	0.5241060	0.5560947	0.7373487	0.6210694	44.3579008	2023-11-25 11:26:21
Neural Network	Keras	Classification	Text	0.9885871	0.8384926	0.3502431	0.9372077	3.7214282	2023-11-26 11:26:21
SVM	Scikit-learn	Classification	Time Series	0.7034540	0.9887783	0.7778321	0.7998778	1.4595769	2023-11-27 11:26:21
Random Forest	Scikit-learn	Classification	Image	0.9353767	0.5539180	0.4693522	0.8724322	1.4799186	2023-11-28 11:26:21
K-Means	Keras	Clustering	Text	0.8911927	0.7925048	0.7997668	0.6726073	4.8204656	2023-11-29 11:26:21
K-Means	PyTorch	Clustering	Image	0.7835081	0.5188586	0.8757744	0.7781483	0.1503940	2023-11-30 11:26:21
SVM	Keras	Regression	Image	0.8692246	0.7391982	0.8627710	0.5490302	3.6086449	2023-12-01 11:26:21
SVM	Scikit-learn	Regression	Text	0.9392578	0.6783595	NA	0.8232765	3.5606062	2023-12-02 11:26:21
Random Forest	TensorFlow	Regression	Image	0.7020702	0.9832032	0.6641189	0.6565608	3.1511788	2023-12-03 11:26:21
K-Means	TensorFlow	Classification	Time Series	0.6635166	0.7651164	0.4000132	0.6655273	4.9515333	2023-12-04 11:26:21
K-Means	Keras	Classification	Image	0.8337967	0.6097038	0.8427423	0.7895934	1.6282092	2023-12-05 11:26:21
K-Means	Scikit-learn	Regression	Time Series	0.9039230	0.4684575	0.4899866	0.9617684	1.7658621	2023-12-06 11:26:21
Neural Network	TensorFlow	Classification	Time Series	0.8811426	0.4907481	0.6476868	0.4384038	0.4868031	2023-12-07 11:26:21
K-Means	Scikit-learn	Regression	Time Series	0.8989068	0.5351902	0.4989919	0.8948459	2.2697109	2023-12-08 11:26:21
Neural Network	Keras	Clustering	Tabular	0.7177917	0.5505800	0.3936799	0.5754299	13.7913778	2023-12-09 11:26:21
Random Forest	TensorFlow	Regression	Image	0.9089171	0.9103696	0.7406904	0.6663517	1.7825948	2023-12-10 11:26:21
Neural Network	Scikit-learn	Clustering	Text	0.5601045	0.7367337	0.3380324	0.4131480	4.1894018	2023-12-11 11:26:21
Random Forest	TensorFlow	Classification	Text	0.7722445	0.7140345	0.8240517	0.5806276	46.8387412	2023-12-12 11:26:21
K-Means	Keras	Regression	Image	NA	0.4688613	0.5223108	0.7015787	1.0100921	2023-12-13 11:26:21
K-Means	Scikit-learn	Classification	Tabular	0.6622929	0.9160838	0.3000943	0.4337056	1.9252938	2023-12-14 11:26:21
Random Forest	TensorFlow	Clustering	Tabular	0.6832308	0.8336886	0.6577904	0.6946574	4.6502898	2023-12-15 11:26:21
SVM	Keras	Clustering	Time Series	0.6980863	0.4406010	NA	0.9562664	0.4028615	2023-12-16 11:26:21
Random Forest	TensorFlow	Clustering	Image	NA	0.8247011	0.4933187	0.4632359	0.5525666	2023-12-17 11:26:21
SVM	TensorFlow	Classification	Image	0.6942791	0.7261229	0.7948835	0.8586644	0.8976758	2023-12-18 11:26:21
Neural Network	PyTorch	Classification	Text	0.7243468	0.4490352	3.4388274	0.6458026	3.0165429	2023-12-19 11:26:21
Neural Network	PyTorch	Classification	Image	NA	0.6749804	0.8875369	0.7931042	0.8373141	2023-12-20 11:26:21
Neural Network	Keras	Classification	Tabular	0.6866259	NA	0.3026739	0.5561420	4.8435743	2023-12-21 11:26:21
K-Means	PyTorch	Clustering	Image	0.6136348	0.4994647	0.4727767	0.4956954	2.2889012	2023-12-22 11:26:21
Neural Network	Keras	Clustering	Text	0.5365980	9.6741889	0.8186328	0.4962776	2.5574380	2023-12-23 11:26:21
SVM	PyTorch	Regression	Time Series	0.8017243	0.9099852	0.5213891	0.4422952	1.3086272	2023-12-24 11:26:21
Neural Network	TensorFlow	Regression	Text	0.8341064	0.8014134	0.3713247	0.5113880	2.4077052	2023-12-25 11:26:21
Random Forest	Keras	Classification	Text	0.8097452	NA	0.5521637	0.7985316	3.3433768	2023-12-26 11:26:21
Random Forest	Keras	Clustering	Time Series	0.7317470	0.6470593	0.4892753	0.9290144	3.7807680	2023-12-27 11:26:21
K-Means	TensorFlow	Clustering	Text	0.6898929	0.7905841	0.8898983	0.8884754	3.7939536	2023-12-28 11:26:21
Random Forest	Keras	Clustering	Text	0.9316668	0.7272591	0.5193435	NA	NA	2023-12-29 11:26:21
SVM	Keras	Regression	Image	0.7595409	0.4373639	0.8522527	0.4662591	4.5313629	2023-12-30 11:26:21
Neural Network	PyTorch	Regression	Image	0.7395909	0.7075016	0.9243105	0.5735125	NA	2023-12-31 11:26:21
K-Means	Keras	Clustering	Time Series	0.5128210	0.8838422	0.6036722	0.5858841	3.8095358	2024-01-01 11:26:21
Neural Network	Scikit-learn	Clustering	Image	0.6706239	0.6755439	0.9369602	5.4997417	0.3724671	2024-01-02 11:26:21
Neural Network	TensorFlow	Clustering	Image	NA	0.4311739	0.5641226	0.7090034	0.1337020	2024-01-03 11:26:21
SVM	PyTorch	Regression	Text	0.6994114	0.8717669	0.9748549	0.7213323	1.1409917	2024-01-04 11:26:21
Random Forest	PyTorch	Clustering	Tabular	7.9008618	0.5208183	0.3625027	0.6141316	3.3512656	2024-01-05 11:26:21
Random Forest	Scikit-learn	Clustering	Image	0.7668013	0.5551725	0.7809135	0.6122735	2.1148677	2024-01-06 11:26:21
Neural Network	TensorFlow	Classification	Tabular	0.8039525	0.4988238	0.6456698	0.8971334	2.0717907	2024-01-07 11:26:21
K-Means	TensorFlow	Classification	Image	0.8824416	0.5981290	0.5713542	0.8735757	4.4344758	2024-01-08 11:26:21
Random Forest	PyTorch	Regression	Image	0.9064929	0.8540509	0.7428983	0.5846775	4.4882580	2024-01-09 11:26:21
K-Means	Keras	Clustering	Time Series	0.8590615	0.7116315	0.7927000	0.9482731	4.5549588	2024-01-10 11:26:21
Random Forest	PyTorch	Clustering	Time Series	0.9777618	NA	0.3030543	0.9716890	1.6380026	2024-01-11 11:26:21
K-Means	TensorFlow	Classification	Time Series	0.5091163	0.9266980	0.4168672	0.5960455	3.4861735	2024-01-12 11:26:21
SVM	Scikit-learn	Regression	Tabular	5.9788899	0.9277491	0.7991322	0.6126550	1.4310026	2024-01-13 11:26:21
K-Means	PyTorch	Classification	Text	0.5037814	0.9223471	0.7664698	0.7033805	1.0339882	2024-01-14 11:26:21
SVM	TensorFlow	Classification	Image	0.8237374	5.4327773	0.9762333	0.9646725	1.0046990	2024-01-15 11:26:21
SVM	PyTorch	Classification	Image	9.4901527	0.6707436	0.8327265	0.9257917	NA	2024-01-16 11:26:21
K-Means	Keras	Classification	Image	NA	0.9909938	0.9655409	0.4615408	2.2067923	2024-01-17 11:26:21
SVM	PyTorch	Clustering	Tabular	0.9635173	0.8632075	0.7917784	0.6356384	4.1716509	2024-01-18 11:26:21
Neural Network	PyTorch	Clustering	Time Series	0.5301337	0.4163005	0.5086365	0.7320227	0.6874664	2024-01-19 11:26:21
SVM	TensorFlow	Classification	Text	0.9672180	0.4391228	0.3737554	0.7018795	3.7011367	2024-01-20 11:26:21
Random Forest	TensorFlow	Regression	Tabular	0.6758113	0.6783588	0.8472767	0.5163178	2.7041291	2024-01-21 11:26:21
K-Means	TensorFlow	Regression	Time Series	0.5507104	0.9455321	0.7509045	0.9152901	1.5109322	2024-01-22 11:26:21
Neural Network	PyTorch	Regression	Tabular	0.7429359	0.7232211	0.3337302	0.8061645	2.5148160	2024-01-23 11:26:21
K-Means	Keras	Regression	Text	0.6283883	0.6986875	0.5520353	0.9027450	1.5697161	2024-01-24 11:26:21
Random Forest	Scikit-learn	Classification	Time Series	0.6424365	0.4632842	0.9697599	0.9152586	3.0208016	2024-01-25 11:26:21
Random Forest	Scikit-learn	Clustering	Tabular	0.6536450	0.7940681	0.6502808	NA	2.2258352	2024-01-26 11:26:21
Random Forest	TensorFlow	Regression	Text	0.9015129	8.9326190	0.6029104	0.6635264	0.9055871	2024-01-27 11:26:21
SVM	TensorFlow	Classification	Image	0.7695806	0.6282520	0.6203897	0.7663281	0.6712571	2024-01-28 11:26:21
SVM	PyTorch	Regression	Text	0.6556538	0.8653671	0.4462179	0.4962192	2.7788088	2024-01-29 11:26:21
K-Means	Scikit-learn	Regression	Image	0.8051669	0.9786860	0.5580950	0.8041873	4.5218239	2024-01-30 11:26:21
K-Means	TensorFlow	Classification	Tabular	0.8580753	0.5222599	0.5588693	0.5075520	1.7853927	2024-01-31 11:26:21
Neural Network	Keras	Clustering	Time Series	0.6363120	0.7139978	0.3366485	0.8163698	3.6917005	2024-02-01 11:26:21
Neural Network	Keras	Clustering	Time Series	0.7067746	0.5722828	0.8372973	0.5377589	3.3274406	2024-02-02 11:26:21
K-Means	TensorFlow	Classification	Tabular	0.5609430	0.8757127	0.5915531	0.4705308	4.6646191	2024-02-03 11:26:21
SVM	TensorFlow	Clustering	Time Series	0.5905747	0.7465560	0.8755259	0.4991707	4.1228070	2024-02-04 11:26:21
Random Forest	Scikit-learn	Classification	Time Series	NA	0.7807495	0.8952436	0.4011953	2.8763658	2024-02-05 11:26:21
K-Means	Scikit-learn	Classification	Image	0.5907192	0.8787485	0.4483958	0.8312438	3.3194494	2024-02-06 11:26:21
Neural Network	PyTorch	Classification	Tabular	0.7625817	0.6375823	0.7601474	0.8394406	4.5020937	2024-02-07 11:26:21
SVM	Scikit-learn	Clustering	Text	0.8545231	0.9490540	0.6305973	0.7089600	2.0576420	2024-02-08 11:26:21
K-Means	Keras	Classification	Image	NA	0.7198173	0.9161097	0.4968177	1.7012941	2024-02-09 11:26:21
K-Means	Keras	Regression	Tabular	0.7836561	0.4947729	0.4510185	0.4501326	0.1529743	2024-02-10 11:26:21
SVM	TensorFlow	Clustering	Image	0.6282814	0.8175395	0.7744692	0.4114774	4.1501003	2024-02-11 11:26:21
K-Means	Scikit-learn	Clustering	Text	0.9814634	0.8759568	0.7254265	0.4994489	4.0248054	2024-02-12 11:26:21
SVM	PyTorch	Clustering	Text	0.7417728	NA	0.5067110	0.9345133	0.6118271	2024-02-13 11:26:21
Random Forest	Keras	Clustering	Tabular	0.9029963	0.9143076	0.3956206	0.5449221	2.9266012	2024-02-14 11:26:21
SVM	Keras	Regression	Text	0.7751133	0.9436860	0.7561478	0.6125628	2.3735103	2024-02-15 11:26:21
SVM	PyTorch	Clustering	Image	0.5217063	0.5661427	0.8170182	4.6320729	0.6816746	2024-02-16 11:26:21
SVM	PyTorch	Clustering	Text	0.8165757	0.9901129	0.5209391	0.5334168	4.9047843	2024-02-17 11:26:21
K-Means	Keras	Classification	Image	0.9757017	0.4844269	0.7513828	0.7115347	1.1520098	2024-02-18 11:26:21
K-Means	Keras	Clustering	Image	0.8008059	0.5212094	5.7659164	0.7646516	0.4294475	2024-02-19 11:26:21
SVM	PyTorch	Regression	Text	0.9095944	0.5105349	0.7992448	0.5472114	3.0147522	2024-02-20 11:26:21
K-Means	Keras	Regression	Tabular	0.9421032	0.9363938	0.4394507	0.4346393	3.7204467	2024-02-21 11:26:21
Neural Network	TensorFlow	Regression	Time Series	0.6140399	0.7925755	0.9231505	0.6346197	0.2589729	2024-02-22 11:26:21
Neural Network	TensorFlow	Regression	Text	0.6060224	0.4912626	0.5011867	0.5405220	3.3277164	2024-02-23 11:26:21
K-Means	PyTorch	Clustering	Image	0.8054905	0.6641941	0.5574502	0.5317324	2.7082302	2024-02-24 11:26:21
K-Means	Scikit-learn	Clustering	Image	0.7055142	0.7691788	0.3406644	0.9759177	0.6059985	2024-02-25 11:26:21
SVM	PyTorch	Regression	Text	0.9199307	0.4500785	0.3780585	0.7697803	0.9453670	2024-02-26 11:26:21
Random Forest	PyTorch	Regression	Tabular	0.9500116	0.9294498	0.6611033	0.7341271	2.8830741	2024-02-27 11:26:21
K-Means	Scikit-learn	Clustering	Tabular	0.6767107	0.8821621	0.4873149	NA	1.5521523	2024-02-28 11:26:21
Neural Network	Scikit-learn	Regression	Image	0.6184353	0.7031241	0.8848362	0.6573666	4.6977471	2024-02-29 11:26:21
SVM	Scikit-learn	Classification	Image	0.8902628	0.9802760	0.3102854	0.7245430	4.1118271	2024-03-01 11:26:21
K-Means	TensorFlow	Regression	Tabular	0.6374030	0.6506566	0.5653658	8.1785788	4.9194793	2024-03-02 11:26:21
Neural Network	Keras	Regression	Time Series	0.9113072	0.9904662	0.5361419	0.8212877	1.3723871	2024-03-03 11:26:21
Neural Network	TensorFlow	Classification	Text	0.7118691	0.8007520	0.3135293	0.5030163	4.8516458	2024-03-04 11:26:21
Random Forest	Keras	Classification	Image	0.8337749	0.7808028	0.3870579	0.7000677	2.2130460	2024-03-05 11:26:21
SVM	TensorFlow	Clustering	Image	0.5477677	0.4995729	0.5895499	0.6471749	1.8028421	2024-03-06 11:26:21
SVM	Keras	Clustering	Tabular	0.8119297	0.9291567	0.6450052	0.9223162	0.3467127	2024-03-07 11:26:21
Random Forest	TensorFlow	Classification	Text	NA	0.6564938	0.5830028	0.7788514	0.3585498	2024-03-08 11:26:21
Random Forest	Scikit-learn	Classification	Tabular	0.7933042	0.4973400	0.6716564	0.7195024	3.4908552	2024-03-09 11:26:21
SVM	PyTorch	Clustering	Text	0.5840071	4.0756451	0.7165922	0.4692368	2.3438521	2024-03-10 11:26:21
SVM	TensorFlow	Regression	Text	0.8684369	0.7358534	0.3069467	0.7633827	1.2099560	2024-03-11 11:26:21
K-Means	Keras	Classification	Text	0.9313985	0.7164398	0.6248666	0.4703223	3.1084545	2024-03-12 11:26:21
K-Means	Scikit-learn	Clustering	Image	0.6083699	0.8316122	0.9744495	0.6023239	1.3390048	2024-03-13 11:26:21
Random Forest	TensorFlow	Regression	Time Series	0.5478573	0.9341548	0.6633226	0.4857052	2.9303956	2024-03-14 11:26:21
K-Means	TensorFlow	Classification	Tabular	0.5118193	0.4476440	0.7742796	0.8152442	1.8601229	2024-03-15 11:26:21
Neural Network	Scikit-learn	Regression	Time Series	0.8209858	0.8388979	0.5183047	0.5237513	4.1354059	2024-03-16 11:26:21
K-Means	TensorFlow	Clustering	Tabular	0.8035470	0.5124472	0.8417940	0.6351156	4.1214112	2024-03-17 11:26:21
K-Means	PyTorch	Classification	Text	0.7733487	0.9149062	0.8410450	9.3740487	2.4390170	2024-03-18 11:26:21
Neural Network	TensorFlow	Regression	Image	0.6159735	0.8914381	0.6649072	0.5225898	1.8190136	2024-03-19 11:26:21
Random Forest	PyTorch	Classification	Time Series	0.6954530	0.7244763	0.9832097	0.7046830	1.8765430	2024-03-20 11:26:21
Neural Network	PyTorch	Regression	Image	0.7972382	0.8261457	0.3878852	0.6515630	4.0480012	2024-03-21 11:26:21
K-Means	Scikit-learn	Clustering	Tabular	0.7483834	0.5886101	0.3118634	0.4108744	1.7080795	2024-03-22 11:26:21
Random Forest	TensorFlow	Clustering	Image	0.9938928	0.6827007	0.8391107	0.8756549	1.1243909	2024-03-23 11:26:21
K-Means	Scikit-learn	Classification	Image	0.5682199	0.8929821	0.8650089	0.4414275	0.5149729	2024-03-24 11:26:21
Neural Network	TensorFlow	Regression	Time Series	NA	0.6755591	0.3841451	0.6845530	2.3867963	2024-03-25 11:26:21
Neural Network	TensorFlow	Classification	Text	0.7021594	0.6146790	4.8590798	0.7364070	2.4624473	2024-03-26 11:26:21
K-Means	Keras	Classification	Image	0.7140998	0.6965275	0.3122867	0.7770559	4.2204527	2024-03-27 11:26:21
SVM	TensorFlow	Regression	Text	0.8587989	0.8969496	0.5053160	0.8131790	1.1840696	2024-03-28 11:26:21
SVM	Scikit-learn	Regression	Tabular	0.8462181	0.6011248	0.8411993	0.5519634	1.9668572	2024-03-29 11:26:21
Neural Network	PyTorch	Regression	Image	0.9956280	0.5042570	0.6625724	0.4059873	4.0611881	2024-03-30 11:26:21
Neural Network	TensorFlow	Clustering	Time Series	0.5641971	0.8272084	5.4366692	0.8340662	4.1357557	2024-03-31 11:26:21
SVM	Keras	Classification	Tabular	0.5520548	0.8955869	0.5602175	0.7213940	1.9845943	2024-04-01 11:26:21
SVM	Scikit-learn	Classification	Tabular	0.8621694	0.4603825	0.3009475	NA	2.3497077	2024-04-02 11:26:21
K-Means	TensorFlow	Clustering	Time Series	0.7891935	0.5439245	0.5098604	0.8928330	1.5861009	2024-04-03 11:26:21
K-Means	Scikit-learn	Regression	Image	0.6370803	0.4851832	0.7525211	NA	NA	2024-04-04 11:26:21
SVM	Keras	Clustering	Image	0.5397097	0.6087648	0.9819361	0.6910568	0.6813545	2024-04-05 11:26:21
K-Means	Keras	Clustering	Tabular	0.5428291	0.6702106	0.8929426	0.6001770	4.6753722	2024-04-06 11:26:21
K-Means	Scikit-learn	Clustering	Image	0.9470954	0.8492958	0.3165165	0.8749349	NA	2024-04-07 11:26:21
Random Forest	TensorFlow	Classification	Text	0.5959337	0.7906886	0.9289924	0.6707765	2.7047035	2024-04-08 11:26:21
K-Means	PyTorch	Clustering	Time Series	0.6616858	0.7725571	0.8482389	0.5100653	0.2600294	2024-04-09 11:26:21
Neural Network	Keras	Classification	Text	0.6133282	0.6114250	0.8462633	0.9129844	2.4304213	2024-04-10 11:26:21
K-Means	TensorFlow	Classification	Image	0.6774982	0.9048685	0.6205970	9.2953593	2.0979657	2024-04-11 11:26:21
K-Means	TensorFlow	Classification	Tabular	0.5347119	0.6827723	0.5786037	0.6797859	0.8899722	2024-04-12 11:26:21
SVM	PyTorch	Regression	Time Series	0.7595299	0.9874630	0.5120410	0.4454198	3.3164071	2024-04-13 11:26:21
Neural Network	Keras	Classification	Image	0.5338063	0.7804853	0.3459864	0.6326957	4.8586394	2024-04-14 11:26:21
K-Means	Keras	Classification	Image	0.9001783	0.4757588	0.4597648	NA	2.8547252	2024-04-15 11:26:21
K-Means	TensorFlow	Regression	Tabular	0.6168560	0.8057065	0.4726225	0.9410644	3.6022911	2024-04-16 11:26:21
Random Forest	TensorFlow	Classification	Tabular	0.7700060	0.5950624	0.6388539	0.5220836	4.3540722	2024-04-17 11:26:21
K-Means	PyTorch	Clustering	Text	0.9400395	0.8117963	0.8232111	0.4401842	2.1465683	2024-04-18 11:26:21
K-Means	PyTorch	Clustering	Time Series	0.8254387	0.4417847	0.6316669	0.9264103	0.6701566	2024-04-19 11:26:21
Random Forest	Keras	Classification	Time Series	0.7664789	NA	0.3404911	0.6336430	3.1026996	2024-04-20 11:26:21
SVM	Scikit-learn	Regression	Tabular	0.6621669	0.9134424	0.9704529	0.7250566	46.9856258	2024-04-21 11:26:21
K-Means	Keras	Clustering	Text	0.6665010	0.5363077	0.9599070	0.9808395	3.3427109	2024-04-22 11:26:21
Random Forest	TensorFlow	Regression	Time Series	0.8347435	0.9022247	0.8496327	0.4399388	0.4763349	2024-04-23 11:26:21
Neural Network	Keras	Regression	Tabular	0.9970697	0.5675657	0.9939295	0.7889908	1.8378644	2024-04-24 11:26:21
SVM	Scikit-learn	Clustering	Image	NA	0.7857291	0.6811376	0.4444633	2.7982506	2024-04-25 11:26:21
Neural Network	PyTorch	Regression	Time Series	0.7788917	0.8164903	0.9739378	0.6252814	2.0757180	2024-04-26 11:26:21
Neural Network	Scikit-learn	Clustering	Text	0.8653253	0.7075929	0.3529235	0.8822887	4.1848557	2024-04-27 11:26:21
Random Forest	Scikit-learn	Regression	Text	0.7326028	0.5831864	0.5559765	0.6600833	4.0918879	2024-04-28 11:26:21
K-Means	Scikit-learn	Clustering	Text	0.5300712	NA	0.4577669	0.9983094	3.0979720	2024-04-29 11:26:21
Random Forest	PyTorch	Classification	Image	0.7811484	0.4199136	0.4370832	0.7354356	1.9299439	2024-04-30 11:26:21
Random Forest	Scikit-learn	Clustering	Time Series	0.9788126	0.5823678	0.3985621	0.5926973	1.3511482	2024-05-01 11:26:21
Random Forest	Scikit-learn	Clustering	Time Series	0.5876515	0.7918977	0.7356895	5.3206680	0.6187842	2024-05-02 11:26:21
Random Forest	Scikit-learn	Clustering	Image	NA	0.9629829	0.8469253	0.6103123	1.8389986	2024-05-03 11:26:21
SVM	TensorFlow	Clustering	Time Series	0.6004668	0.9227227	0.7048089	0.6235200	2.1245453	2024-05-04 11:26:21
K-Means	Scikit-learn	Regression	Image	0.7679138	0.8596389	0.4028739	0.4412282	3.4115323	2024-05-05 11:26:21
Neural Network	PyTorch	Classification	Image	0.5483382	0.8730684	0.8677866	0.6217445	3.3249831	2024-05-06 11:26:21
Neural Network	Scikit-learn	Clustering	Image	NA	0.7989909	0.7451683	0.6785431	0.4435299	2024-05-07 11:26:21
K-Means	Scikit-learn	Regression	Text	0.8780817	0.5561721	0.5717160	NA	2.0367706	2024-05-08 11:26:21
Neural Network	PyTorch	Classification	Time Series	0.6737858	0.9443170	0.7718912	0.7940377	0.9907164	2024-05-09 11:26:21
K-Means	Scikit-learn	Classification	Image	0.8324559	0.8024394	0.4819335	0.8252594	0.8683823	2024-05-10 11:26:21
Neural Network	PyTorch	Classification	Time Series	0.8977250	0.7362644	0.5416345	0.4050182	4.1537805	2024-05-11 11:26:21
Random Forest	Scikit-learn	Classification	Text	0.9635889	0.4665937	0.9422401	0.5306004	0.3053846	2024-05-12 11:26:21
Neural Network	PyTorch	Clustering	Image	0.6173210	0.6682333	0.5033532	0.7970106	2.1539961	2024-05-13 11:26:21
K-Means	Scikit-learn	Clustering	Tabular	0.6996580	0.6762150	0.6289918	0.6903936	0.9241979	2024-05-14 11:26:21
K-Means	Scikit-learn	Regression	Text	0.5762080	0.9187382	0.9230988	0.4031881	4.4087181	2024-05-15 11:26:21
K-Means	TensorFlow	Regression	Tabular	0.9962418	0.7279889	0.7954049	0.8826968	28.2949852	2024-05-16 11:26:21
K-Means	TensorFlow	Regression	Time Series	0.9635005	0.6282403	0.3435423	0.8636857	1.2337651	2024-05-17 11:26:21
K-Means	Scikit-learn	Classification	Time Series	0.7699786	0.9860802	0.4031121	0.7290515	2.5618937	2024-05-18 11:26:21
SVM	Scikit-learn	Classification	Text	0.9210166	NA	0.3054891	0.4398780	3.6842832	2024-05-19 11:26:21
SVM	Keras	Clustering	Time Series	0.7604790	0.6535291	0.7416453	0.8599090	4.7947782	2024-05-20 11:26:21
Neural Network	PyTorch	Classification	Tabular	NA	0.4252148	0.6133714	0.7502282	1.1801854	2024-05-21 11:26:21
K-Means	Keras	Classification	Image	0.5445622	0.8439425	0.3939924	0.8691760	4.4422907	2024-05-22 11:26:21
Neural Network	TensorFlow	Classification	Image	0.8776352	0.9508459	0.9705539	0.8510482	4.6780902	2024-05-23 11:26:21
K-Means	TensorFlow	Regression	Image	0.5638567	NA	0.6707618	NA	4.5904607	2024-05-24 11:26:21
K-Means	TensorFlow	Regression	Text	0.9130338	0.9150050	0.4693253	0.7108048	3.2135303	2024-05-25 11:26:21
SVM	TensorFlow	Clustering	Time Series	0.8910140	0.5753309	0.6504225	0.4841947	3.1861212	2024-05-26 11:26:21
SVM	Scikit-learn	Classification	Tabular	0.8543723	0.9464621	0.7757342	0.8026945	2.0757321	2024-05-27 11:26:21
Random Forest	TensorFlow	Classification	Time Series	0.5180802	0.8523771	0.3533675	0.7722843	3.7867496	2024-05-28 11:26:21
SVM	Scikit-learn	Clustering	Time Series	0.6515642	0.8829441	0.4922927	0.8453519	2.7033407	2024-05-29 11:26:21
Random Forest	Scikit-learn	Classification	Time Series	0.6315563	0.4108039	0.8648758	0.5019286	3.4194465	2024-05-30 11:26:21
Random Forest	TensorFlow	Classification	Image	0.6800682	0.9776862	0.6217654	0.5169851	2.1994437	2024-05-31 11:26:21
Random Forest	TensorFlow	Regression	Tabular	0.5438214	0.8360026	0.6826040	0.9342457	3.6843327	2024-06-01 11:26:21
Random Forest	PyTorch	Clustering	Tabular	0.9684789	0.5828507	0.6029712	NA	4.1399104	2024-06-02 11:26:21
Random Forest	Keras	Clustering	Time Series	0.7769011	0.8976368	0.3307300	0.9449371	0.8175377	2024-06-03 11:26:21
Neural Network	TensorFlow	Clustering	Image	0.6527622	0.5689127	0.4160248	0.8552292	4.1813556	2024-06-04 11:26:21
SVM	PyTorch	Clustering	Image	0.6984908	0.9236523	0.6118791	0.7582932	2.7474530	2024-06-05 11:26:21
Random Forest	PyTorch	Regression	Tabular	0.7236013	0.4675482	0.4464298	0.7924796	4.2388630	2024-06-06 11:26:21
K-Means	Keras	Regression	Image	0.8002972	0.8222116	0.3349853	0.9334768	2.2133784	2024-06-07 11:26:21
SVM	Scikit-learn	Classification	Image	0.7578397	0.7244191	0.8905548	0.7472841	1.9572984	2024-06-08 11:26:21
SVM	TensorFlow	Classification	Image	0.9596960	0.4579207	0.9868348	0.7794348	4.5837976	2024-06-09 11:26:21
Random Forest	Keras	Classification	Text	0.7484817	0.5451363	0.8552000	0.4938860	1.3307294	2024-06-10 11:26:21
Neural Network	TensorFlow	Classification	Time Series	0.9960790	0.4074424	0.8976494	0.6843527	4.2366404	2024-06-11 11:26:21
Random Forest	PyTorch	Regression	Tabular	0.9257125	0.6812608	0.4693840	0.8298383	2.4717008	2024-06-12 11:26:21
Neural Network	PyTorch	Clustering	Tabular	0.6042553	0.5807592	0.9724389	0.5625656	2.6190677	2024-06-13 11:26:21
K-Means	Scikit-learn	Regression	Text	0.9652976	0.7590145	0.4378480	0.5213525	1.6114946	2024-06-14 11:26:21
SVM	PyTorch	Classification	Time Series	0.5581832	0.5783427	0.9660009	0.5882961	2.9079068	2024-06-15 11:26:21
K-Means	PyTorch	Regression	Tabular	0.9087249	0.5799515	0.9963735	0.5449004	1.6935349	2024-06-16 11:26:21
Neural Network	Keras	Clustering	Image	0.6903116	0.8459159	0.7982060	0.5289476	0.2924201	2024-06-17 11:26:21
Neural Network	TensorFlow	Clustering	Tabular	0.9389872	0.4288857	0.9868006	0.6549105	1.4327932	2024-06-18 11:26:21
K-Means	TensorFlow	Regression	Image	0.9340283	0.9417370	0.6986778	0.9447031	0.1312573	2024-06-19 11:26:21
Neural Network	Keras	Regression	Text	NA	0.9113583	0.4816792	0.7042358	4.8939385	2024-06-20 11:26:21
K-Means	Scikit-learn	Regression	Image	0.8950152	0.8006828	0.6058971	0.5127522	4.8321613	2024-06-21 11:26:21
SVM	TensorFlow	Classification	Tabular	0.6523396	0.7559329	0.7154927	0.4461830	2.0362610	2024-06-22 11:26:21
K-Means	PyTorch	Regression	Tabular	0.5404596	0.9353815	0.3511571	0.8176937	3.6690172	2024-06-23 11:26:21
SVM	TensorFlow	Regression	Tabular	0.7014901	0.5111981	0.7356403	0.6296793	1.7944499	2024-06-24 11:26:21
K-Means	PyTorch	Regression	Text	0.5867623	0.4473815	0.9868245	0.8930986	3.3883565	2024-06-25 11:26:21
Neural Network	Scikit-learn	Clustering	Text	0.8474755	0.5437061	0.4330754	0.7957065	4.0466072	2024-06-26 11:26:21
K-Means	Keras	Clustering	Time Series	0.6730499	0.8767470	0.8548166	0.8777449	4.7391056	2024-06-27 11:26:21
Neural Network	TensorFlow	Classification	Tabular	0.9878051	0.4208022	0.9355292	0.5631676	2.0610402	2024-06-28 11:26:21
K-Means	Keras	Classification	Image	0.8204860	0.7496841	0.9605911	0.8154154	3.9381601	2024-06-29 11:26:21
K-Means	TensorFlow	Clustering	Text	0.9112403	0.9972625	0.9720949	0.5584372	1.4030494	2024-06-30 11:26:21
Random Forest	PyTorch	Clustering	Tabular	0.5662623	0.9134177	0.6650217	0.9634411	NA	2024-07-01 11:26:21
Neural Network	Keras	Clustering	Tabular	0.9310072	NA	0.9841155	0.7818246	NA	2024-07-02 11:26:21
Random Forest	TensorFlow	Regression	Text	0.9613786	0.4381845	0.8301171	0.5947071	3.0572912	2024-07-03 11:26:21
Neural Network	PyTorch	Regression	Time Series	0.7435310	0.8988241	0.4131700	0.5617071	3.3315793	2024-07-04 11:26:21
Random Forest	Keras	Clustering	Text	0.8031265	0.7593871	0.6338301	0.5145560	3.4714232	2024-07-05 11:26:21
SVM	Keras	Regression	Text	0.8824049	0.4689598	0.8028318	0.8167846	0.6898991	2024-07-06 11:26:21
K-Means	TensorFlow	Regression	Image	0.5874193	0.4563144	0.4731382	0.5312294	4.6987959	2024-07-07 11:26:21
Neural Network	TensorFlow	Classification	Image	0.7512830	0.9457761	0.7484306	0.7571820	0.9877981	2024-07-08 11:26:21
Neural Network	TensorFlow	Clustering	Time Series	0.6993315	0.8015202	0.7666203	0.5587810	3.1514917	2024-07-09 11:26:21
K-Means	PyTorch	Clustering	Tabular	0.5731870	0.8975721	0.4138898	0.7971814	1.1911128	2024-07-10 11:26:21
Neural Network	TensorFlow	Regression	Text	0.6837672	0.9273873	0.6955597	0.8889638	1.6053604	2024-07-11 11:26:21
Neural Network	Keras	Clustering	Time Series	0.5340862	0.7430634	0.8401388	0.8668151	2.7776469	2024-07-12 11:26:21
Random Forest	Keras	Classification	Image	0.5129060	0.7104678	NA	0.8565108	2.1442478	2024-07-13 11:26:21
Neural Network	TensorFlow	Classification	Text	0.5675831	0.6582564	0.3084722	0.5126334	0.8851054	2024-07-14 11:26:21
SVM	TensorFlow	Classification	Tabular	0.9815576	0.5901680	0.3063269	0.4530310	0.9375884	2024-07-15 11:26:21
SVM	Scikit-learn	Clustering	Image	0.7747648	0.6607576	0.5499205	0.8193693	2.1489095	2024-07-16 11:26:21
K-Means	Keras	Clustering	Image	0.9829111	0.8643278	0.9483358	0.6210083	3.8106866	2024-07-17 11:26:21
Neural Network	PyTorch	Clustering	Image	0.7162489	0.7611541	0.4600737	0.6594077	4.5000268	2024-07-18 11:26:21
K-Means	Scikit-learn	Clustering	Time Series	0.6559081	0.9355140	0.7440546	0.4186895	0.5122255	2024-07-19 11:26:21
SVM	TensorFlow	Clustering	Tabular	0.7530709	0.6660280	0.4554531	0.5557459	2.0262091	2024-07-20 11:26:21
Neural Network	TensorFlow	Classification	Text	0.7197558	0.7642537	0.5251690	0.4202058	0.5912033	2024-07-21 11:26:21
Neural Network	Keras	Regression	Time Series	0.5528323	0.7787845	0.8936295	0.9275115	0.1813032	2024-07-22 11:26:21
K-Means	TensorFlow	Classification	Time Series	0.8204132	0.7550183	0.8102030	0.5460380	3.3424215	2024-07-23 11:26:21
SVM	PyTorch	Clustering	Time Series	0.6080191	0.8215803	0.3667795	0.7344023	3.0520023	2024-07-24 11:26:21
Neural Network	Keras	Classification	Image	0.8097940	0.5424601	0.6000914	0.4233876	0.8987937	2024-07-25 11:26:21
K-Means	TensorFlow	Clustering	Tabular	0.8251005	0.7074183	0.3204188	NA	1.2456829	2024-07-26 11:26:21
SVM	Keras	Clustering	Time Series	0.5760124	0.4625349	0.6366231	0.5938164	0.2161680	2024-07-27 11:26:21
K-Means	TensorFlow	Classification	Text	0.5306748	0.6307068	0.7637038	0.9387515	4.1912413	2024-07-28 11:26:21
Random Forest	TensorFlow	Clustering	Tabular	0.8903808	0.6926002	0.3829518	0.9328709	4.8759839	2024-07-29 11:26:21
Neural Network	TensorFlow	Regression	Text	0.7299002	0.7913346	0.5022195	0.5951744	0.7637517	2024-07-30 11:26:21
Random Forest	PyTorch	Clustering	Tabular	0.5290819	0.9703186	0.5784856	0.9405765	1.2342285	2024-07-31 11:26:21
SVM	PyTorch	Regression	Text	0.9974332	0.7603906	0.9436717	0.9976946	4.3552067	2024-08-01 11:26:21
Random Forest	Keras	Regression	Image	0.5288903	0.8461563	0.9952785	0.8952494	4.6396726	2024-08-02 11:26:21
Random Forest	TensorFlow	Clustering	Time Series	0.8475176	0.7037596	0.3314379	0.9069228	2.1561742	2024-08-03 11:26:21
SVM	TensorFlow	Classification	Text	0.9918395	0.7804624	0.8327055	0.5494052	0.3480501	2024-08-04 11:26:21
K-Means	PyTorch	Regression	Tabular	0.6195901	0.4425593	0.5602068	0.7460214	0.2930200	2024-08-05 11:26:21
Random Forest	TensorFlow	Classification	Time Series	0.5711247	0.5526349	NA	0.4403535	2.9123543	2024-08-06 11:26:21
K-Means	PyTorch	Regression	Time Series	0.5606925	0.6171119	0.8279855	0.4569508	20.2518603	2024-08-07 11:26:21
Random Forest	TensorFlow	Clustering	Tabular	0.6516376	0.6834960	0.9428985	0.9993356	0.2403969	2024-08-08 11:26:21
K-Means	Keras	Clustering	Text	0.5505229	0.4273892	0.9656500	0.5959835	2.9579367	2024-08-09 11:26:21
SVM	TensorFlow	Regression	Image	0.8460807	0.4840145	0.7039858	NA	0.1553824	2024-08-10 11:26:21
Random Forest	TensorFlow	Clustering	Image	0.5311459	0.5660886	5.4998481	0.8839990	3.9581226	2024-08-11 11:26:21
SVM	Scikit-learn	Regression	Text	0.7547111	0.9829196	0.8512844	0.9148140	1.6015208	2024-08-12 11:26:21
Random Forest	PyTorch	Clustering	Time Series	0.9983484	0.5988082	0.4757010	0.9985770	0.2972642	2024-08-13 11:26:21
SVM	TensorFlow	Classification	Time Series	0.9069851	0.6892246	0.6948518	0.5448979	2.9829259	2024-08-14 11:26:21
SVM	TensorFlow	Clustering	Time Series	0.8076097	0.5176586	NA	0.4242105	2.0486536	2024-08-15 11:26:21
Random Forest	Scikit-learn	Classification	Time Series	0.6531268	NA	0.7596418	0.6467149	4.8716220	2024-08-16 11:26:21
Random Forest	Scikit-learn	Clustering	Text	0.8119479	0.5684099	0.4682792	0.4780484	2.7665734	2024-08-17 11:26:21
Random Forest	PyTorch	Classification	Time Series	0.7635207	0.5241955	0.4341151	0.4134555	1.4486720	2024-08-18 11:26:21
Neural Network	Scikit-learn	Clustering	Tabular	0.7130417	0.7099436	0.9427674	0.6162561	3.5762504	2024-08-19 11:26:21
K-Means	Keras	Classification	Time Series	0.5653552	0.4033035	0.3712626	0.8702430	1.4303108	2024-08-20 11:26:21
Neural Network	Scikit-learn	Regression	Image	0.9433021	0.4045984	0.6541705	0.7397084	4.5307921	2024-08-21 11:26:21
Neural Network	Keras	Regression	Time Series	NA	0.5314413	0.4545969	0.5876668	1.9361377	2024-08-22 11:26:21
SVM	PyTorch	Classification	Text	0.5973113	0.4220328	0.3272510	0.7926052	2.7943560	2024-08-23 11:26:21
Random Forest	PyTorch	Clustering	Image	0.6838797	0.4648155	0.3252132	0.5392109	0.3478082	2024-08-24 11:26:21
K-Means	PyTorch	Classification	Text	0.7070649	0.6033164	0.4226552	0.4086289	2.1879941	2024-08-25 11:26:21
SVM	PyTorch	Clustering	Text	0.9137689	0.8815514	0.9067392	0.8586120	NA	2024-08-26 11:26:21
Neural Network	PyTorch	Regression	Image	0.8668072	0.7432292	0.4977351	0.7742459	4.0476791	2024-08-27 11:26:21
Random Forest	Keras	Clustering	Text	0.8846524	NA	0.9653214	0.8573816	1.1991609	2024-08-28 11:26:21
SVM	Keras	Regression	Text	0.5055156	5.7609329	0.7071356	0.4233628	1.2077874	2024-08-29 11:26:21
Random Forest	PyTorch	Clustering	Time Series	0.7080770	0.9590522	0.6056299	0.9022718	4.1047960	2024-08-30 11:26:21
SVM	Scikit-learn	Clustering	Tabular	0.7406721	0.6382090	0.7060622	0.7717157	4.6589523	2024-08-31 11:26:21
Random Forest	TensorFlow	Clustering	Text	0.5095961	0.4522557	0.6616889	0.7380372	0.5672685	2024-09-01 11:26:21
Neural Network	PyTorch	Classification	Text	0.6299066	0.7702399	0.8311434	7.7476843	2.3052873	2024-09-02 11:26:21
K-Means	Scikit-learn	Classification	Text	0.8801449	0.4683030	0.4977472	NA	1.7535260	2024-09-03 11:26:21
Neural Network	TensorFlow	Regression	Time Series	0.5685549	0.6071339	0.5471353	0.7521619	4.3663734	2024-09-04 11:26:21
K-Means	PyTorch	Regression	Tabular	0.7676551	7.0444716	0.9258660	0.7485702	0.5092721	2024-09-05 11:26:21
Random Forest	PyTorch	Classification	Image	0.6076009	0.9245335	0.9625196	0.9944075	1.1345171	2024-09-06 11:26:21
SVM	Keras	Regression	Time Series	NA	0.6961279	0.9247908	0.8540381	3.7870949	2024-09-07 11:26:21
Neural Network	TensorFlow	Classification	Time Series	0.6206007	0.8213553	0.5936136	0.6653763	0.3513399	2024-09-08 11:26:21
K-Means	TensorFlow	Clustering	Tabular	0.9879369	0.9956901	0.8462559	0.8244382	2.5134234	2024-09-09 11:26:21
Neural Network	Scikit-learn	Regression	Tabular	0.9007686	0.4788935	NA	0.6335383	2.2663245	2024-09-10 11:26:21
K-Means	Keras	Regression	Time Series	0.9797883	0.5648389	0.6482779	0.5373253	1.7385658	2024-09-11 11:26:21
Neural Network	Keras	Classification	Image	0.7439270	0.6367456	0.4432761	0.7581111	2.0334043	2024-09-12 11:26:21
K-Means	Keras	Regression	Time Series	0.5548681	0.6530969	0.7137912	0.9569093	2.6967089	2024-09-13 11:26:21
K-Means	Keras	Regression	Tabular	0.7739797	0.6466126	0.4302951	0.9574837	0.8907001	2024-09-14 11:26:21
Random Forest	TensorFlow	Clustering	Image	0.7271887	NA	0.5320953	0.6051432	2.9027798	2024-09-15 11:26:21
K-Means	TensorFlow	Regression	Tabular	0.9221785	0.8284196	0.8985211	0.7166065	4.0466184	2024-09-16 11:26:21
Random Forest	PyTorch	Clustering	Image	0.5490413	0.7647431	0.4449531	0.5269815	3.8247886	2024-09-17 11:26:21

General structure

Number of rows: 560
Number of columns: 10
Variable names: “Algorithm”, “Framework”, “Problem_Type”, “Dataset_Type”, “Accuracy”, “Precision”, “Recall”, “F1_Score”, “Training_Time” and “Date”
- Variable types:
nominal qualitative: “Algorithm”, “Framework”, “Problem_Type”, “Dataset_Type”
ordinal qualitative: none
quantitative: “Accuracy”, “Precision”, “Recall”, “F1_Score”, “Training_Time”
date-time: “Date”

No variable needs a type conversion, because each one is already stored with the appropriate data type.
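
One way to check the stored type of every column is sketched below. It uses a hypothetical three-row sample called `muestra` (an assumption for illustration, not the real `datos`) with a character, a numeric and a date-time column. The optional conversion to a factor at the end is shown only as a possibility.

```r
# Hypothetical three-row sample that mimics the structure of `datos`.
muestra <- data.frame(
  Algorithm = c("SVM", "K-Means", "Random Forest"),
  Precision = c(0.42, 0.60, 0.46),
  Date = as.POSIXct(c("2024-08-23 11:26:21", "2024-08-25 11:26:21",
                      "2024-08-24 11:26:21"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# First class of each column.
tipos <- vapply(muestra, function(col) class(col)[1], character(1))
print(tipos)

# Optional: store a categorical column as a factor to expose its levels.
muestra$Algorithm <- factor(muestra$Algorithm)
print(levels(muestra$Algorithm))
```

On the real data the same `vapply()` call would be applied to `datos`.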

Statistical summary

We will examine each variable through its descriptive measures to get a quick overview of the data and to detect patterns, anomalies or trends. Bar charts are used for the categorical variables and histograms for the numerical ones.

Categorical variables

-Algorithm

The frequency of each category of this variable is as follows:

library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Algorithm))
colnames(tabla_frecuencia)[1] <- "Algorithm"
colnames(tabla_frecuencia)[2] <- "Frequency"
pander(tabla_frecuencia, caption = "Frequency of algorithms")

Frequency of algorithms
Algorithm	Frequency
K-Means	163
Neural Network	135
Random Forest	126
SVM	136

The four algorithms occur with similar frequency (between 126 and 163 of the 560 tests). The mode of this categorical variable is K-Means.
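
As a small check, the mode and the percentage shares can be derived directly from the counts in the frequency table. The vector `frecuencias` below is typed in by hand from that table; on the real data it would come from `table(datos$Algorithm)`.

```r
# Counts copied from the frequency table of Algorithm.
frecuencias <- c("K-Means" = 163, "Neural Network" = 135,
                 "Random Forest" = 126, "SVM" = 136)

# The mode is the category with the highest count.
moda <- names(frecuencias)[which.max(frecuencias)]

# Share of each category, in percent of all tests.
porcentajes <- round(100 * frecuencias / sum(frecuencias), 1)

print(moda)
print(porcentajes)
```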

PLOT

library(ggplot2)
library(dplyr)

tabla_1 <- datos %>%
  dplyr::group_by(Algorithm) %>%
  dplyr::summarise(Total=n()) %>%
  dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
  dplyr::arrange(Algorithm)

G1 <- ggplot(tabla_1, aes(x = Algorithm, y = Total)) +
  geom_col(width = 0.7, fill = "cyan4") +
  ylim(c(0, 170)) +
  labs(x = "Algorithm", y = "Frequency") +
  # Label each bar with its count and percentage.
  geom_text(aes(label = paste0(Total, " (", Porcentaje, "%)")),
            vjust = -0.9,
            color = "black",
            size = 4.5) +
  # A complete theme resets earlier theme() calls, so it is applied first.
  theme_bw(base_size = 16) +
  theme(axis.text.x = element_text(angle = 0, vjust = 1, hjust = 0.5)) +
  facet_wrap(~"Distribution of the Algorithm variable")
G1

-Framework

The frequency of each category of this variable is as follows:

library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Framework))
colnames(tabla_frecuencia)[1] <- "Framework"
colnames(tabla_frecuencia)[2] <- "Frequency"
pander(tabla_frecuencia, caption = "Frequency of frameworks")

Frequency of frameworks
Framework	Frequency
Keras	124
PyTorch	135
Scikit-learn	134
TensorFlow	167

The four frameworks are used with similar frequency (between 124 and 167 tests), and the mode of this variable is TensorFlow.

PLOT

library(ggplot2)
library(dplyr)

tabla_2 <- datos %>%
  dplyr::group_by(Framework) %>%
  dplyr::summarise(Total=n()) %>%
  dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
  dplyr::arrange(Framework)

G2 <- ggplot(tabla_2, aes(x = Framework, y = Total)) +
  geom_col(width = 0.7, fill = "cyan4") +
  ylim(c(0, 180)) +
  labs(x = "Framework", y = "Frequency") +
  # Label each bar with its count and percentage.
  geom_text(aes(label = paste0(Total, " (", Porcentaje, "%)")),
            vjust = -0.9,
            color = "black",
            size = 4.5) +
  # A complete theme resets earlier theme() calls, so it is applied first.
  theme_bw(base_size = 16) +
  theme(axis.text.x = element_text(angle = 0, vjust = 1, hjust = 0.5)) +
  facet_wrap(~"Distribution of the Framework variable")
G2

-Problem_Type

The frequency of each category of this variable is as follows:

library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Problem_Type))
colnames(tabla_frecuencia)[1] <- "Problem_Type"
colnames(tabla_frecuencia)[2] <- "Frequency"
pander(tabla_frecuencia, caption = "Frequency of problem types")

Frequency of problem types
Problem_Type	Frequency
Classification	175
Clustering	196
Regression	189

The three problem types addressed by the models occur with similar frequency (between 175 and 196 tests), and Clustering is the mode of this variable.

PLOT

library(ggplot2)
library(dplyr)

tabla_3 <- datos %>%
  dplyr::group_by(Problem_Type) %>%
  dplyr::summarise(Total=n()) %>%
  dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
  dplyr::arrange(Problem_Type)

G3 <- ggplot(tabla_3, aes(x = Problem_Type, y = Total)) +
  geom_col(width = 0.7, fill = "cyan4") +
  ylim(c(0, 210)) +
  labs(x = "Problem_Type", y = "Frequency") +
  # Label each bar with its count and percentage.
  geom_text(aes(label = paste0(Total, " (", Porcentaje, "%)")),
            vjust = -0.9,
            color = "black",
            size = 4.5) +
  # A complete theme resets earlier theme() calls, so it is applied first.
  theme_bw(base_size = 16) +
  theme(axis.text.x = element_text(angle = 0, vjust = 1, hjust = 0.5)) +
  facet_wrap(~"Distribution of the Problem_Type variable")
G3

-Dataset_Type

The frequency of each category of this variable is as follows:

library(pander)
tabla_frecuencia <- as.data.frame(table(datos$Dataset_Type))
colnames(tabla_frecuencia)[1] <- "Dataset_Type"
colnames(tabla_frecuencia)[2] <- "Frequency"
pander(tabla_frecuencia, caption = "Frequency of dataset types")

Frequency of dataset types
Dataset_Type	Frequency
Image	157
Tabular	136
Text	143
Time Series	124

The four data types used to train the models occur with similar frequency (between 124 and 157 tests), and Image is the most common one.

PLOT

library(ggplot2)
library(dplyr)

tabla_4 <- datos %>%
  dplyr::group_by(Dataset_Type) %>%
  dplyr::summarise(Total=n()) %>%
  dplyr::mutate(Porcentaje=round(Total/sum(Total)*100, 1)) %>%
  dplyr::arrange(Dataset_Type)

G4 <- ggplot(tabla_4, aes(x = Dataset_Type, y = Total)) +
  geom_col(width = 0.7, fill = "cyan4") +
  ylim(c(0, 165)) +
  labs(x = "Dataset_Type", y = "Frequency") +
  # Label each bar with its count and percentage.
  geom_text(aes(label = paste0(Total, " (", Porcentaje, "%)")),
            vjust = -0.9,
            color = "black",
            size = 4.5) +
  # A complete theme resets earlier theme() calls, so it is applied first.
  theme_bw(base_size = 16) +
  theme(axis.text.x = element_text(angle = 0, vjust = 1, hjust = 0.5)) +
  facet_wrap(~"Distribution of the Dataset_Type variable")
G4

Filtering the data set to understand it better

Filter 1

library(dplyr)
df_clustering <- filter(datos, Problem_Type == "Clustering" & Algorithm == "K-Means")
str(df_clustering)

## tibble [58 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:58] "K-Means" "K-Means" "K-Means" "K-Means" ...
##  $ Framework    : chr [1:58] "Keras" "Keras" "PyTorch" "Keras" ...
##  $ Problem_Type : chr [1:58] "Clustering" "Clustering" "Clustering" "Clustering" ...
##  $ Dataset_Type : chr [1:58] "Time Series" "Text" "Text" "Text" ...
##  $ Accuracy     : num [1:58] 0.744 0.84 0.699 NA 0.96 ...
##  $ Precision    : num [1:58] 0.49 0.663 0.782 0.511 0.504 ...
##  $ Recall       : num [1:58] 0.877 0.558 0.775 0.691 0.54 ...
##  $ F1_Score     : num [1:58] 0.441 0.569 0.984 0.991 0.853 ...
##  $ Training_Time: num [1:58] NA 3.485 2.33 0.355 0.256 ...
##  $ Date         : POSIXct[1:58], format: "2023-03-09 11:26:21" "2023-03-22 11:26:21" ...

Description: there are 58 tests in which the K-Means algorithm was used on Clustering problems.

Filter 2

library(dplyr)
df_clustering <- filter(datos, Training_Time < 2 & Training_Time>1)
str(df_clustering)

## tibble [112 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:112] "SVM" "K-Means" "SVM" "K-Means" ...
##  $ Framework    : chr [1:112] "PyTorch" "PyTorch" "PyTorch" "PyTorch" ...
##  $ Problem_Type : chr [1:112] "Regression" "Regression" "Clustering" "Classification" ...
##  $ Dataset_Type : chr [1:112] "Image" "Text" "Text" "Image" ...
##  $ Accuracy     : num [1:112] 0.897 0.975 0.885 0.566 0.715 ...
##  $ Precision    : num [1:112] 9.732 0.423 0.612 0.797 0.981 ...
##  $ Recall       : num [1:112] 0.781 0.826 0.507 0.386 0.493 ...
##  $ F1_Score     : num [1:112] 0.793 0.477 0.89 0.884 0.5 ...
##  $ Training_Time: num [1:112] 1.93 1.45 1.46 1.84 1.14 ...
##  $ Date         : POSIXct[1:112], format: "2023-03-18 11:26:21" "2023-03-25 11:26:21" ...

Description: there are 112 tests whose Training_Time was strictly between 1 and 2 hours. That is 20% of the 560 tests.
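
The share quoted for this filter follows from simple arithmetic, sketched below. The second half repeats the idea on a hypothetical vector of training times (`tiempos`, an assumption for illustration) to show how a missing value is handled when counting.

```r
# Row counts reported in the text.
n_filtro <- 112
n_total <- 560
porcentaje <- 100 * n_filtro / n_total
print(porcentaje)

# Same idea from a logical condition, on a hypothetical vector of hours.
tiempos <- c(0.35, 1.20, 1.94, 2.79, NA, 4.53)
en_rango <- tiempos > 1 & tiempos < 2

# Comparisons with NA yield NA; they are counted as "not in range" here.
porcentaje_muestra <- 100 * sum(en_rango, na.rm = TRUE) / length(tiempos)
print(porcentaje_muestra)
```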

Filter 3

library(dplyr)
df_clustering <- filter(datos, Precision < 0.5)
str(df_clustering)

## tibble [89 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:89] "K-Means" "K-Means" "Random Forest" "Random Forest" ...
##  $ Framework    : chr [1:89] "Keras" "PyTorch" "Keras" "Keras" ...
##  $ Problem_Type : chr [1:89] "Clustering" "Regression" "Classification" "Regression" ...
##  $ Dataset_Type : chr [1:89] "Time Series" "Text" "Image" "Text" ...
##  $ Accuracy     : num [1:89] 0.744 0.975 0.992 0.947 0.826 ...
##  $ Precision    : num [1:89] 0.49 0.423 0.44 0.49 0.425 ...
##  $ Recall       : num [1:89] 0.877 0.826 0.716 0.745 NA ...
##  $ F1_Score     : num [1:89] 0.441 0.477 0.583 0.538 0.535 ...
##  $ Training_Time: num [1:89] NA 1.449 4.203 0.194 4.207 ...
##  $ Date         : POSIXct[1:89], format: "2023-03-09 11:26:21" "2023-03-25 11:26:21" ...

Description: there are 89 tests with a Precision below 0.5, about 15.9% of the 560 tests. Since Precision is a proportion between 0 and 1, these tests had a precision below 50%.

Filter 4

df_filtered <- filter(datos, Algorithm == "SVM", Dataset_Type == "Image", Accuracy > 0.7)
str(df_filtered)

Description: this filter keeps the tests that used the SVM algorithm on Image data and reached a test-set accuracy above 0.7. The number of such tests is given by nrow(df_filtered).
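
A minimal sketch of a combined filter is given below, on a hypothetical five-row sample (`muestra` is an assumption, not the real `datos`). It uses base R's `subset()`, which, like `dplyr::filter()`, keeps only the rows where every condition is TRUE and drops rows where a condition evaluates to NA. The result is inspected through the same object that the filter created.

```r
# Hypothetical sample with the same column names as `datos`.
muestra <- data.frame(
  Algorithm = c("SVM", "SVM", "SVM", "K-Means", "SVM"),
  Dataset_Type = c("Image", "Image", "Text", "Image", "Image"),
  Accuracy = c(0.91, 0.56, 0.88, 0.83, NA)
)

# All three conditions must hold; rows where a condition is NA are dropped.
df_filtered <- subset(
  muestra,
  Algorithm == "SVM" & Dataset_Type == "Image" & Accuracy > 0.7
)

# Inspect the object that was just created, not an earlier one.
str(df_filtered)
print(nrow(df_filtered))
```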

Filter 5

df_filtered <- datos %>%
  filter(Problem_Type == "Clustering", 
         Framework == "Keras", 
         Precision > 0.6, 
         Accuracy > 0.8)
str(df_filtered)

## tibble [15 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Algorithm    : chr [1:15] "SVM" "SVM" "K-Means" "K-Means" ...
##  $ Framework    : chr [1:15] "Keras" "Keras" "Keras" "Keras" ...
##  $ Problem_Type : chr [1:15] "Clustering" "Clustering" "Clustering" "Clustering" ...
##  $ Dataset_Type : chr [1:15] "Text" "Image" "Text" "Tabular" ...
##  $ Accuracy     : num [1:15] 0.842 0.847 0.84 0.809 8.532 ...
##  $ Precision    : num [1:15] 0.842 0.872 0.663 0.623 0.837 ...
##  $ Recall       : num [1:15] 0.875 0.38 0.558 0.538 NA ...
##  $ F1_Score     : num [1:15] 0.704 0.491 0.569 NA 0.982 ...
##  $ Training_Time: num [1:15] 4.042 4.714 3.485 1.224 0.407 ...
##  $ Date         : POSIXct[1:15], format: "2023-03-11 11:26:21" "2023-03-19 11:26:21" ...

Description: there are 15 tests in which the problem type was Clustering, the framework was Keras, the Precision was above 0.6 and the test-set Accuracy was above 0.8. Out-of-range values, such as the Accuracy of 8.532 in the output above, also pass this filter.

Numerical variables and treatment of missing data (NA)

Each numerical variable is analysed through its measures of location, dispersion and distribution shape, together with plots. Once this analysis is done, the missing values can be treated, since that step depends on the measures above.

-Accuracy

This is the summary of the variable:

library(pander)
pander(summary(datos$Accuracy))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.	NA’s
0.5038	0.6236	0.7578	0.8779	0.8824	9.718	39

num_NA <- sum(is.na(datos$Accuracy))
porcentaje_na<-(100*num_NA)/560
porcentaje_na

## [1] 6.964286
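
The percentage of missing values can also be obtained for several columns at once. The sketch below first reproduces the figure above from the reported counts and then applies `colMeans(is.na(...))` to a hypothetical four-row sample (`muestra`, an assumption for illustration).

```r
# Reproduce the reported percentage: 39 NAs out of 560 rows.
porcentaje_accuracy <- 100 * 39 / 560
print(porcentaje_accuracy)

# Hypothetical sample with a few missing values.
muestra <- data.frame(
  Accuracy = c(0.94, NA, 0.60, 0.68),
  Precision = c(0.40, 0.53, 0.42, NA),
  Recall = c(0.65, 0.45, 0.33, 0.33)
)

# is.na() gives a logical matrix; the column means are the NA proportions.
porcentaje_na <- 100 * colMeans(is.na(muestra))
print(porcentaje_na)
```

With the real data, `colMeans(is.na(datos))` would avoid hard-coding the number of rows.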

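-Measures of location:
The mean is 0.8779 and the median is 0.7578. The maximum of 9.718 lies outside the documented range of 0 to 1, so the data contain out-of-range values that pull the mean above the median.
-Normality test: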

library(nortest)
ad_test <- ad.test(datos$Accuracy)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos$Accuracy
## A = 131.83, p-value < 2.2e-16

This numerical variable does not follow a normal distribution: the Anderson-Darling test gives an extremely small p-value, which is strong evidence against the null hypothesis of normality. The histogram shown later in this section reflects the same departure from normality.
-Coefficient of variation:

media<-mean(datos$Accuracy, na.rm = TRUE)
sd_a<-sd(datos$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a

## [1] 107.4343

A coefficient above 100%, as in this case, indicates high dispersion of the data relative to the mean.
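
To illustrate how a single out-of-range value can push the coefficient of variation above 100%, the sketch below compares the coefficient with and without such a value. The vector `valores` is hypothetical; its last element only mimics the magnitude of the reported maximum of Accuracy.

```r
# Coefficient of variation in percent, ignoring missing values.
cv <- function(x) 100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Hypothetical metric values; the last one is far outside the 0 to 1 range.
valores <- c(0.55, 0.62, 0.70, 0.76, 0.81, 0.88, 0.93, 9.72)

cv_todos <- cv(valores)                    # all values
cv_en_rango <- cv(valores[valores <= 1])   # only values within 0 to 1

print(cv_todos)
print(cv_en_rango)
```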
-Skewness:

media <- mean(datos$Accuracy, na.rm = TRUE)
mediana <- median(datos$Accuracy, na.rm = TRUE)
desviacion <- sd(datos$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.38177

Since the value of “asimetria_pearson” is positive, the distribution is skewed to the right.
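
The coefficient used here is Pearson's second skewness coefficient, 3 * (mean - median) / sd. As a rough consistency check, it can be recomputed from the rounded figures reported for Accuracy, recovering the standard deviation from the mean and the coefficient of variation. The helper name `pearson_skew` is introduced only for this sketch.

```r
# Pearson's second skewness coefficient.
pearson_skew <- function(media, mediana, desviacion) {
  3 * (media - mediana) / desviacion
}

# Rounded values reported above for Accuracy.
media <- 0.8779
mediana <- 0.7578
cv <- 107.4343               # coefficient of variation, in percent

# The standard deviation follows from cv = 100 * sd / mean.
desviacion <- media * cv / 100

asimetria <- pearson_skew(media, mediana, desviacion)
print(asimetria)
```

Because the inputs are rounded, the result agrees with the reported 0.38177 only to about three decimal places.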

PLOT

This variable has 39 NAs, which correspond to 6.96 percent of the total, so listwise deletion can be used to treat it. Note that na.omit() removes every row with an NA in any column, which leaves 448 of the 560 rows.
We draw the histograms before and after the treatment to check whether deleting those rows changes the distribution.
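
A minimal sketch of how listwise deletion behaves, on a hypothetical five-row sample (`muestra` and its values are illustrative, not taken from `datos`). It contrasts `na.omit()`, which drops a row if any column is missing, with dropping only the rows where one chosen column is missing.

```r
# Hypothetical sample with one NA in each of three different rows.
muestra <- data.frame(
  Accuracy = c(0.94, NA, 0.60, 0.68, 0.71),
  Precision = c(0.40, 0.53, 0.42, NA, 0.60),
  Training_Time = c(4.5, 1.9, 2.8, 0.3, NA)
)

# Listwise deletion: a row is removed if ANY column is NA.
completas <- na.omit(muestra)

# Deletion based on a single column only.
solo_accuracy <- muestra[!is.na(muestra$Accuracy), ]

print(nrow(completas))
print(nrow(solo_accuracy))
```

The two counts differ, which is why the number of rows removed by `na.omit()` is larger than the number of NAs in any single variable.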

library(dplyr)
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)
dim(datos_omit)

## [1] 448  10

ggp1 <- ggplot(data.frame(value=datos$Accuracy), aes(x=value)) +
  geom_histogram(bins=30, fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Original data set") +
  xlab("Accuracy") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))
ggp2 <- ggplot(data.frame(value=datos_omit$Accuracy), aes(x=value)) +
  geom_histogram(bins=30, fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("After treatment") +
  xlab("Accuracy") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))
grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$Accuracy, datos_omit$Accuracy)
print(ks_test)

## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$Accuracy and datos_omit$Accuracy
## D = 0.022159, p-value = 0.9998
## alternative hypothesis: two-sided

-Precision

This is the summary of the variable:

library(pander)
pander(summary(datos$Precision))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.	NA’s
0.4019	0.5632	0.7195	0.8129	0.8596	9.732	19

num_NA <- sum(is.na(datos$Precision))
porcentaje_na<-(100*num_NA)/560
porcentaje_na

## [1] 3.392857

-Measures of location:
The mean is 0.8129 and the median is 0.7195. The maximum of 9.732 lies outside the documented range of 0 to 1.
-Normality test:

library(nortest)
ad_test <- ad.test(datos$Precision)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos$Precision
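## A = 118.96, p-value < 2.2e-16

The p-value is extremely small, so the hypothesis of normality is rejected for this variable as well.
-Coefficient of variation:
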
media<-mean(datos$Precision, na.rm = TRUE)
sd_a<-sd(datos$Precision, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a

## [1] 104.7427

A coefficient above 100%, as in this case, indicates high dispersion of the data relative to the mean.
-Skewness:

media <- mean(datos$Precision, na.rm = TRUE)
mediana <- median(datos$Precision, na.rm = TRUE)
desviacion <- sd(datos$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3292674

Since the value of “asimetria_pearson” is positive, the distribution is skewed to the right.

PLOT
This variable has 19 NAs, which correspond to 3.39 percent of the total, so listwise deletion can be used to treat it.

library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)

ggp1 <- ggplot(data.frame(value=datos$Precision), aes(x=value)) +
  geom_histogram(bins=30, fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Original data set") +
  xlab("Precision") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

ggp2 <- ggplot(data.frame(value=datos_omit$Precision), aes(x=value)) +
  geom_histogram(bins=30, fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("After treatment") +
  xlab("Precision") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$Precision, datos_omit$Precision)
print(ks_test)

## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$Precision and datos_omit$Precision
## D = 0.019503, p-value = 1
## alternative hypothesis: two-sided

-Recall

This is the summary of the variable:

library(pander)
pander(summary(datos$Recall))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.	NA’s
0.3001	0.4819	0.6493	0.7486	0.8404	9.366	20

num_NA <- sum(is.na(datos$Recall))
porcentaje_na<-(100*num_NA)/560
porcentaje_na

## [1] 3.571429

-Measures of location:
The mean is 0.7486 and the median is 0.6493. The maximum of 9.366 lies outside the documented range of 0 to 1.
-Normality test:

library(nortest)
ad_test <- ad.test(datos$Recall)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos$Recall
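## A = 98.028, p-value < 2.2e-16

The p-value is extremely small, so the hypothesis of normality is rejected for this variable as well.
-Coefficient of variation:
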
media<-mean(datos$Recall, na.rm = TRUE)
sd_a<-sd(datos$Recall, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a

## [1] 104.7911

A coefficient above 100%, as in this case, indicates high dispersion of the data relative to the mean.
-Skewness:

media <- mean(datos$Recall, na.rm = TRUE)
mediana <- median(datos$Recall, na.rm = TRUE)
desviacion <- sd(datos$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3797481

Since the value of “asimetria_pearson” is positive, the distribution is skewed to the right.

PLOT
This variable has 20 NAs, which correspond to 3.57 percent of the total, so listwise deletion can be used to treat it.

library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)

ggp1 <- ggplot(data.frame(value=datos$Recall), aes(x=value)) +
  geom_histogram(bins=30, fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Original data set") +
  xlab("Recall") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

ggp2 <- ggplot(data.frame(value=datos_omit$Recall), aes(x=value)) +
  geom_histogram(bins=30, fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("After treatment") +
  xlab("Recall") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$Recall, datos_omit$Recall)
print(ks_test)

## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$Recall and datos_omit$Recall
## D = 0.015873, p-value = 1
## alternative hypothesis: two-sided

-F1_Score

This is the summary of the variable:

library(pander)
pander(summary(datos$F1_Score))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.	NA’s
0.4	0.5515	0.7086	0.8122	0.8438	9.374	20

num_NA <- sum(is.na(datos$F1_Score))
porcentaje_na<-(100*num_NA)/560
porcentaje_na

## [1] 3.571429

-Measures of location:
The mean is 0.8122 and the median is 0.7086. The maximum of 9.374 lies outside the documented range of 0 to 1.
-Normality test:

library(nortest)
ad_test <- ad.test(datos$F1_Score)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos$F1_Score
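## A = 122.21, p-value < 2.2e-16

The p-value is extremely small, so the hypothesis of normality is rejected for this variable as well.
-Coefficient of variation:
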
media<-mean(datos$F1_Score, na.rm = TRUE)
sd_a<-sd(datos$F1_Score, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a

## [1] 109.9297

A coefficient above 100%, as in this case, indicates high dispersion of the data relative to the mean.
-Skewness:

media <- mean(datos$F1_Score, na.rm = TRUE)
mediana <- median(datos$F1_Score, na.rm = TRUE)
desviacion <- sd(datos$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3479023

Since the value of “asimetria_pearson” is positive, the distribution is skewed to the right.

PLOT
This variable has 20 NAs, which correspond to 3.57 percent of the total, so listwise deletion can be used to treat it.

library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)

ggp1 <- ggplot(data.frame(value=datos$F1_Score), aes(x=value)) +
  geom_histogram(bins=30, fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Original data set") +
  xlab("F1_Score") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

ggp2 <- ggplot(data.frame(value=datos_omit$F1_Score), aes(x=value)) +
  geom_histogram(bins=30, fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("After treatment") +
  xlab("F1_Score") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$F1_Score, datos_omit$F1_Score)
print(ks_test)

## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$F1_Score and datos_omit$F1_Score
## D = 0.021098, p-value = 0.9999
## alternative hypothesis: two-sided

-Training_Time

This is the summary of the variable:

library(pander)
pander(summary(datos$Training_Time))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.	NA’s
0.1032	1.244	2.435	2.991	3.813	46.99	20

num_NA <- sum(is.na(datos$Training_Time))
porcentaje_na<-(100*num_NA)/560
porcentaje_na

## [1] 3.571429

-Measures of location:
The mean is 2.991 hours and the median is 2.435 hours. The maximum of 46.99 hours is far above the third quartile (3.813), which points to extreme values.
-Normality test:

library(nortest)
ad_test <- ad.test(datos$Training_Time)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos$Training_Time
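## A = 80.983, p-value < 2.2e-16

The p-value is extremely small, so the hypothesis of normality is rejected for this variable as well.
-Coefficient of variation:
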
media<-mean(datos$Training_Time, na.rm = TRUE)
sd_a<-sd(datos$Training_Time, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a

## [1] 147.9032

A coefficient above 100%, as in this case, indicates high dispersion of the data relative to the mean.
-Skewness:

media <- mean(datos$Training_Time, na.rm = TRUE)
mediana <- median(datos$Training_Time, na.rm = TRUE)
desviacion <- sd(datos$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3772653

Since the value of “asimetria_pearson” is positive, the distribution is skewed to the right.

PLOT
We will use listwise deletion to treat this variable and check with the Kolmogorov-Smirnov test (KS test) whether the data follow the same distribution before and after. If they do, the NAs can be treated by deleting rows.

library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(gridExtra)
library(missForest)
datos_omit <- na.omit(datos)

ggp1 <- ggplot(data.frame(value=datos$Training_Time), aes(x=value)) +
  geom_histogram(bins=30, fill="#FD0000", color="#E52521", alpha=0.9) +
  ggtitle("Original data set") +
  xlab("Training_Time") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

ggp2 <- ggplot(data.frame(value=datos_omit$Training_Time), aes(x=value)) +
  geom_histogram(bins=30, fill="#43B047", color="#049DCB", alpha=0.9) +
  ggtitle("After treatment") +
  xlab("Training_Time") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size=15))

grid.arrange(ggp1, ggp2, ncol = 2)

ks_test <- ks.test(datos$Training_Time, datos_omit$Training_Time)
print(ks_test)

## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  datos$Training_Time and datos_omit$Training_Time
## D = 0.013938, p-value = 1
## alternative hypothesis: two-sided

GENERAL SUMMARY:

All the numerical variables behave similarly in their measures of location, dispersion and distribution shape, and their histograms are skewed to the right. In addition, the Kolmogorov-Smirnov test (KS test) was used to compare the distribution of each variable in the original data with its distribution after removing the rows with NAs. All p-values are above 0.05 and the D statistics are close to 0, so there is no evidence that the distributions differ.
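
The comparison can be sketched on simulated data as follows. The vector `original` and the random deletion of 112 values are assumptions made for illustration; they only mimic the sizes in this report (560 rows before, 448 after). One caveat: the two-sample KS test assumes independent samples, whereas here one sample is a subset of the other, so the p-value is best read as a rough guide rather than a formal result.

```r
set.seed(1)

# Hypothetical metric values standing in for one column of `datos`.
original <- runif(560, min = 0.5, max = 1)

# Drop 112 values at random to mimic the rows removed by na.omit().
sin_na <- original[-sample(560, 112)]

# The samples share values, so ks.test() warns about ties; silence it here.
prueba <- suppressWarnings(ks.test(original, sin_na))

print(prueba$statistic)   # D: largest gap between the two empirical CDFs
print(prueba$p.value)
```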

TREATMENT OF MISSING DATA (NAs)

The analysis above compared the histograms before and after listwise deletion for each variable and found no visible change in the distributions when the NAs are removed. Next, each case is examined individually to check whether the missingness is unrelated to the data, that is, whether the NAs are of the MCAR (Missing Completely At Random) type.
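
One informal way to look for a relationship between missingness and the data is to cross-tabulate a missingness indicator against a categorical variable and run a chi-squared test of independence. The sketch below does this on simulated data (`muestra` is hypothetical, with NAs placed at random). A large p-value is consistent with MCAR but does not prove it, and this is not necessarily the check used in this report.

```r
set.seed(42)

# Hypothetical data: 40 tests per problem type with a simulated metric.
muestra <- data.frame(
  Problem_Type = rep(c("Classification", "Clustering", "Regression"), each = 40),
  Accuracy = runif(120, min = 0.5, max = 1)
)

# Blank out 12 values completely at random.
muestra$Accuracy[sample(120, 12)] <- NA

# Rows: problem type. Columns: FALSE = observed, TRUE = missing.
tabla_na <- table(muestra$Problem_Type, is.na(muestra$Accuracy))
print(tabla_na)

# Small expected counts trigger an approximation warning; silence it here.
prueba <- suppressWarnings(chisq.test(tabla_na))
print(prueba$p.value)
```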

Removing the rows with NAs

datos_sin_na <- na.omit(datos)

Visualization

filas_na <- datos[!complete.cases(datos), ]
library(knitr)
library(kableExtra)
kable(filas_na, caption="Rows with missing values") %>%
  kable_styling(full_width=F) %>%
  column_spec(2, width="20em") %>%
  scroll_box(width="900px", height="450px")

Rows with missing values
Algorithm	Framework	Problem_Type	Dataset_Type	Accuracy	Precision	Recall	F1_Score	Training_Time	Date
SVM	Scikit-learn	Regression	Time Series	0.6618051	0.6929447	NA	0.4426950	4.9785924	2023-03-08 11:26:21
K-Means	Keras	Clustering	Time Series	0.7443216	0.4900292	0.8766533	0.4414046	NA	2023-03-09 11:26:21
Neural Network	PyTorch	Regression	Text	0.9985623	0.6366858	0.3357948	0.9014956	NA	2023-03-14 11:26:21
SVM	Keras	Regression	Time Series	NA	0.8710099	0.3416673	0.8161708	3.4064529	2023-03-16 11:26:21
Random Forest	Keras	Regression	Text	0.5818119	0.9352508	NA	0.8626737	3.4199049	2023-03-17 11:26:21
Neural Network	PyTorch	Regression	Text	NA	0.5528024	0.3847175	0.6551369	3.5159654	2023-03-23 11:26:21
K-Means	TensorFlow	Regression	Time Series	NA	0.7073332	0.7288014	0.8376069	3.0875174	2023-04-06 11:26:21
Neural Network	Scikit-learn	Classification	Text	0.8258334	0.4250037	NA	0.5345761	4.2069594	2023-04-08 11:26:21
Random Forest	Scikit-learn	Regression	Image	NA	0.8297940	0.9317173	8.4513780	4.8085087	2023-04-10 11:26:21
K-Means	TensorFlow	Classification	Image	5.9276276	NA	0.3994960	0.5817585	2.0692473	2023-04-13 11:26:21
Neural Network	Scikit-learn	Clustering	Image	0.8882984	0.8425050	0.5954240	0.8275728	NA	2023-04-15 11:26:21
SVM	Scikit-learn	Clustering	Time Series	0.8805120	0.6098219	0.7040399	NA	2.9576349	2023-04-18 11:26:21
Random Forest	Scikit-learn	Clustering	Tabular	0.8131102	0.8647921	NA	0.9411641	3.0049169	2023-04-19 11:26:21
K-Means	Keras	Clustering	Text	NA	0.5111173	0.6910497	0.9909150	0.3547478	2023-04-21 11:26:21
Random Forest	Scikit-learn	Regression	Image	0.7407612	0.8586236	0.8919231	0.7966086	NA	2023-04-25 11:26:21
SVM	PyTorch	Regression	Time Series	0.7457973	0.8615339	0.8522896	NA	4.4418578	2023-05-01 11:26:21
K-Means	Keras	Classification	Image	0.5321045	0.7747981	NA	0.9713331	1.0636026	2023-05-02 11:26:21
Random Forest	TensorFlow	Classification	Image	0.6210225	0.4768574	NA	0.8373886	0.5170071	2023-05-20 11:26:21
Neural Network	PyTorch	Regression	Tabular	0.9933313	0.6029605	0.3178134	0.9170145	NA	2023-05-21 11:26:21
K-Means	Keras	Clustering	Tabular	0.8090779	0.6233002	0.5384229	NA	1.2238923	2023-05-24 11:26:21
SVM	Scikit-learn	Clustering	Time Series	0.6753135	0.5184823	0.4525248	NA	3.8205333	2023-05-31 11:26:21
Random Forest	TensorFlow	Classification	Time Series	0.7837704	0.9168220	NA	0.8717234	1.6986442	2023-06-05 11:26:21
Random Forest	Keras	Clustering	Text	8.5323786	0.8370944	NA	0.9821543	0.4065677	2023-06-12 11:26:21
Random Forest	TensorFlow	Regression	Time Series	0.9354846	0.9624328	NA	0.9802212	NA	2023-06-18 11:26:21
Random Forest	PyTorch	Regression	Tabular	NA	0.8355879	0.9218826	0.9175843	2.8607010	2023-06-22 11:26:21
SVM	Scikit-learn	Clustering	Text	0.8657948	0.7050163	0.5382710	5.8587272	NA	2023-06-30 11:26:21
K-Means	TensorFlow	Clustering	Image	NA	0.5785291	0.6789853	0.5740273	0.5031344	2023-07-01 11:26:21
Neural Network	Scikit-learn	Classification	Text	0.5301760	0.7390132	0.4079015	NA	2.3519619	2023-07-02 11:26:21
K-Means	PyTorch	Classification	Image	0.8293539	NA	NA	0.8044120	3.9075415	2023-07-11 11:26:21
Random Forest	PyTorch	Regression	Tabular	0.7490979	NA	0.5397136	0.4311015	4.3247946	2023-07-12 11:26:21
Neural Network	TensorFlow	Clustering	Time Series	0.7776818	0.4595069	4.8917338	NA	1.6379899	2023-07-13 11:26:21
K-Means	PyTorch	Classification	Image	NA	0.8800426	0.6903962	0.5840660	4.2165296	2023-07-15 11:26:21
Neural Network	PyTorch	Classification	Tabular	NA	0.7325359	0.9956939	0.5714550	2.4675862	2023-07-23 11:26:21
K-Means	Scikit-learn	Clustering	Image	NA	0.9748441	0.5765964	0.9666691	2.3245506	2023-07-29 11:26:21
SVM	TensorFlow	Regression	Tabular	NA	0.5961590	0.6328822	0.8028875	0.7174099	2023-07-31 11:26:21
SVM	Scikit-learn	Clustering	Image	NA	0.5833625	0.4594248	0.5193953	4.7620796	2023-08-02 11:26:21
Neural Network	Scikit-learn	Classification	Time Series	NA	0.4019310	0.9139634	0.9824059	28.9729934	2023-08-13 11:26:21
SVM	PyTorch	Classification	Time Series	0.8157801	0.6132958	0.4041572	0.6421606	NA	2023-08-15 11:26:21
K-Means	Scikit-learn	Classification	Image	NA	0.4557944	0.9866912	0.8227327	0.9959051	2023-08-17 11:26:21
K-Means	TensorFlow	Classification	Image	NA	0.7529214	0.6383852	0.6536372	4.1459837	2023-08-18 11:26:21
SVM	Keras	Regression	Time Series	0.9856975	NA	0.5000485	0.5231998	2.8991763	2023-08-22 11:26:21
Random Forest	Keras	Clustering	Tabular	0.6796167	0.6989534	0.9865175	0.4453502	NA	2023-08-30 11:26:21
SVM	Scikit-learn	Classification	Text	0.7964754	NA	0.5853089	0.9708539	0.9581034	2023-08-31 11:26:21
SVM	TensorFlow	Clustering	Image	0.5817619	0.4351306	0.8792632	0.5783745	NA	2023-09-01 11:26:21
SVM	Scikit-learn	Clustering	Text	0.9847062	0.8709382	NA	0.7594268	1.1692169	2023-09-03 11:26:21
Neural Network	Keras	Clustering	Tabular	NA	0.8246086	0.9692330	0.7741893	4.3829517	2023-09-04 11:26:21
Random Forest	Keras	Clustering	Tabular	NA	0.6604116	0.9251631	0.8056158	3.1739118	2023-09-10 11:26:21
SVM	TensorFlow	Regression	Image	NA	0.5323672	0.9184253	0.7660045	0.8450380	2023-09-12 11:26:21
SVM	Scikit-learn	Regression	Text	0.7598870	0.8413979	NA	0.5626578	1.8209750	2023-09-14 11:26:21
Neural Network	PyTorch	Clustering	Text	0.9144417	0.9598680	0.9484678	NA	2.4820394	2023-09-16 11:26:21
SVM	PyTorch	Regression	Tabular	NA	0.7855391	0.3133813	0.9680402	4.6116243	2023-09-17 11:26:21
Random Forest	Scikit-learn	Clustering	Tabular	0.5559598	0.8990182	NA	0.9745486	2.0495419	2023-10-02 11:26:21
SVM	PyTorch	Regression	Image	0.5610550	0.6577941	0.3999952	0.4611359	NA	2023-10-08 11:26:21
SVM	TensorFlow	Classification	Time Series	NA	0.6897813	0.7295229	0.6604126	0.2710666	2023-10-14 11:26:21
K-Means	Scikit-learn	Regression	Tabular	NA	0.8367636	0.3543545	0.8340687	4.2119839	2023-10-17 11:26:21
Random Forest	TensorFlow	Clustering	Tabular	0.8195600	0.6621104	NA	0.7536724	0.2223611	2023-10-19 11:26:21
Random Forest	PyTorch	Classification	Time Series	0.8334321	0.4812124	0.9633816	NA	0.6468057	2023-10-24 11:26:21
K-Means	Keras	Classification	Time Series	0.6009267	NA	0.3538035	0.5490258	4.0270843	2023-10-29 11:26:21
SVM	TensorFlow	Regression	Image	NA	0.5481873	0.7949605	0.5485359	0.7148829	2023-11-01 11:26:21
SVM	PyTorch	Classification	Text	NA	0.8660263	0.4967031	0.4486895	1.9978803	2023-11-07 11:26:21
K-Means	TensorFlow	Clustering	Text	0.9654743	NA	0.7483332	0.7700702	0.2475090	2023-11-15 11:26:21
K-Means	Scikit-learn	Classification	Time Series	0.8256165	0.7361009	0.9220614	0.9524600	NA	2023-11-21 11:26:21
SVM	Scikit-learn	Regression	Text	0.9392578	0.6783595	NA	0.8232765	3.5606062	2023-12-02 11:26:21
K-Means	Keras	Regression	Image	NA	0.4688613	0.5223108	0.7015787	1.0100921	2023-12-13 11:26:21
SVM	Keras	Clustering	Time Series	0.6980863	0.4406010	NA	0.9562664	0.4028615	2023-12-16 11:26:21
Random Forest	TensorFlow	Clustering	Image	NA	0.8247011	0.4933187	0.4632359	0.5525666	2023-12-17 11:26:21
Neural Network	PyTorch	Classification	Image	NA	0.6749804	0.8875369	0.7931042	0.8373141	2023-12-20 11:26:21
Neural Network	Keras	Classification	Tabular	0.6866259	NA	0.3026739	0.5561420	4.8435743	2023-12-21 11:26:21
Random Forest	Keras	Classification	Text	0.8097452	NA	0.5521637	0.7985316	3.3433768	2023-12-26 11:26:21
Random Forest	Keras	Clustering	Text	0.9316668	0.7272591	0.5193435	NA	NA	2023-12-29 11:26:21
Neural Network	PyTorch	Regression	Image	0.7395909	0.7075016	0.9243105	0.5735125	NA	2023-12-31 11:26:21
Neural Network	TensorFlow	Clustering	Image	NA	0.4311739	0.5641226	0.7090034	0.1337020	2024-01-03 11:26:21
Random Forest	PyTorch	Clustering	Time Series	0.9777618	NA	0.3030543	0.9716890	1.6380026	2024-01-11 11:26:21
SVM	PyTorch	Classification	Image	9.4901527	0.6707436	0.8327265	0.9257917	NA	2024-01-16 11:26:21
K-Means	Keras	Classification	Image	NA	0.9909938	0.9655409	0.4615408	2.2067923	2024-01-17 11:26:21
Random Forest	Scikit-learn	Clustering	Tabular	0.6536450	0.7940681	0.6502808	NA	2.2258352	2024-01-26 11:26:21
Random Forest	Scikit-learn	Classification	Time Series	NA	0.7807495	0.8952436	0.4011953	2.8763658	2024-02-05 11:26:21
K-Means	Keras	Classification	Image	NA	0.7198173	0.9161097	0.4968177	1.7012941	2024-02-09 11:26:21
SVM	PyTorch	Clustering	Text	0.7417728	NA	0.5067110	0.9345133	0.6118271	2024-02-13 11:26:21
K-Means	Scikit-learn	Clustering	Tabular	0.6767107	0.8821621	0.4873149	NA	1.5521523	2024-02-28 11:26:21
Random Forest	TensorFlow	Classification	Text	NA	0.6564938	0.5830028	0.7788514	0.3585498	2024-03-08 11:26:21
Neural Network	TensorFlow	Regression	Time Series	NA	0.6755591	0.3841451	0.6845530	2.3867963	2024-03-25 11:26:21
SVM	Scikit-learn	Classification	Tabular	0.8621694	0.4603825	0.3009475	NA	2.3497077	2024-04-02 11:26:21
K-Means	Scikit-learn	Regression	Image	0.6370803	0.4851832	0.7525211	NA	NA	2024-04-04 11:26:21
K-Means	Scikit-learn	Clustering	Image	0.9470954	0.8492958	0.3165165	0.8749349	NA	2024-04-07 11:26:21
K-Means	Keras	Classification	Image	0.9001783	0.4757588	0.4597648	NA	2.8547252	2024-04-15 11:26:21
Random Forest	Keras	Classification	Time Series	0.7664789	NA	0.3404911	0.6336430	3.1026996	2024-04-20 11:26:21
SVM	Scikit-learn	Clustering	Image	NA	0.7857291	0.6811376	0.4444633	2.7982506	2024-04-25 11:26:21
K-Means	Scikit-learn	Clustering	Text	0.5300712	NA	0.4577669	0.9983094	3.0979720	2024-04-29 11:26:21
Random Forest	Scikit-learn	Clustering	Image	NA	0.9629829	0.8469253	0.6103123	1.8389986	2024-05-03 11:26:21
Neural Network	Scikit-learn	Clustering	Image	NA	0.7989909	0.7451683	0.6785431	0.4435299	2024-05-07 11:26:21
K-Means	Scikit-learn	Regression	Text	0.8780817	0.5561721	0.5717160	NA	2.0367706	2024-05-08 11:26:21
SVM	Scikit-learn	Classification	Text	0.9210166	NA	0.3054891	0.4398780	3.6842832	2024-05-19 11:26:21
Neural Network	PyTorch	Classification	Tabular	NA	0.4252148	0.6133714	0.7502282	1.1801854	2024-05-21 11:26:21
K-Means	TensorFlow	Regression	Image	0.5638567	NA	0.6707618	NA	4.5904607	2024-05-24 11:26:21
Random Forest	PyTorch	Clustering	Tabular	0.9684789	0.5828507	0.6029712	NA	4.1399104	2024-06-02 11:26:21
Neural Network	Keras	Regression	Text	NA	0.9113583	0.4816792	0.7042358	4.8939385	2024-06-20 11:26:21
Random Forest	PyTorch	Clustering	Tabular	0.5662623	0.9134177	0.6650217	0.9634411	NA	2024-07-01 11:26:21
Neural Network	Keras	Clustering	Tabular	0.9310072	NA	0.9841155	0.7818246	NA	2024-07-02 11:26:21
Random Forest	Keras	Classification	Image	0.5129060	0.7104678	NA	0.8565108	2.1442478	2024-07-13 11:26:21
K-Means	TensorFlow	Clustering	Tabular	0.8251005	0.7074183	0.3204188	NA	1.2456829	2024-07-26 11:26:21
Random Forest	TensorFlow	Classification	Time Series	0.5711247	0.5526349	NA	0.4403535	2.9123543	2024-08-06 11:26:21
SVM	TensorFlow	Regression	Image	0.8460807	0.4840145	0.7039858	NA	0.1553824	2024-08-10 11:26:21
SVM	TensorFlow	Clustering	Time Series	0.8076097	0.5176586	NA	0.4242105	2.0486536	2024-08-15 11:26:21
Random Forest	Scikit-learn	Classification	Time Series	0.6531268	NA	0.7596418	0.6467149	4.8716220	2024-08-16 11:26:21
Neural Network	Keras	Regression	Time Series	NA	0.5314413	0.4545969	0.5876668	1.9361377	2024-08-22 11:26:21
SVM	PyTorch	Clustering	Text	0.9137689	0.8815514	0.9067392	0.8586120	NA	2024-08-26 11:26:21
Random Forest	Keras	Clustering	Text	0.8846524	NA	0.9653214	0.8573816	1.1991609	2024-08-28 11:26:21
K-Means	Scikit-learn	Classification	Text	0.8801449	0.4683030	0.4977472	NA	1.7535260	2024-09-03 11:26:21
SVM	Keras	Regression	Time Series	NA	0.6961279	0.9247908	0.8540381	3.7870949	2024-09-07 11:26:21
Neural Network	Scikit-learn	Regression	Tabular	0.9007686	0.4788935	NA	0.6335383	2.2663245	2024-09-10 11:26:21
Random Forest	TensorFlow	Clustering	Image	0.7271887	NA	0.5320953	0.6051432	2.9027798	2024-09-15 11:26:21

The rows above suggest that missingness is not related to the values of the other variables. We cannot say for certain whether these values were deleted at random or whether some records were lost, but the missing entries show no clear pattern; for example, missing Accuracy values are not confined to rows where Algorithm is SVM.
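The visual impression above can be backed by a quick tabulation. The following is a minimal sketch on a toy data frame (`toy` and its values are illustrative assumptions, not rows from the real dataset) that computes the share of missing Accuracy values within each algorithm.

```r
# Toy data frame standing in for `datos` (illustrative values only).
toy <- data.frame(
  Algorithm = c("SVM", "SVM", "K-Means", "K-Means",
                "Random Forest", "Random Forest"),
  Accuracy  = c(NA, 0.81, 0.60, NA, 0.68, NA)
)

# Proportion of missing Accuracy values within each algorithm.
na_rate <- tapply(is.na(toy$Accuracy), toy$Algorithm, mean)
na_rate

# Cross-tabulation of algorithm against the missingness indicator.
na_table <- table(toy$Algorithm, is.na(toy$Accuracy))
na_table
```

Similar rates across groups would be consistent with missingness that does not depend on Algorithm; the same check can be repeated for Framework, Problem_Type, and Dataset_Type.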

We now examine how the measures of location, dispersion, and distribution change when the rows with missing values are removed.

Accuracy

Location:

summary(datos$Accuracy)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.5038  0.6236  0.7578  0.8779  0.8824  9.7181      39

summary(datos_sin_na$Accuracy)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5038  0.6183  0.7490  0.8458  0.8698  9.7181

Dispersion:

media<-mean(datos$Accuracy, na.rm = TRUE)
sd_a<-sd(datos$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a

## [1] 107.4343

media<-mean(datos_sin_na$Accuracy, na.rm = TRUE)
sd_a<-sd(datos_sin_na$Accuracy, na.rm=TRUE)
cv_a<-(sd_a/media)*100
cv_a

## [1] 97.14993

Distribution:

media <- mean(datos$Accuracy, na.rm = TRUE)
mediana <- median(datos$Accuracy, na.rm = TRUE)
desviacion <- sd(datos$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.38177

media <- mean(datos_sin_na$Accuracy, na.rm = TRUE)
mediana <- median(datos_sin_na$Accuracy, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3532361

Observation: The Accuracy measures change only slightly when the rows with missing values are removed (mean 0.8779 to 0.8458, coefficient of variation 107.4% to 97.1%, Pearson skewness 0.382 to 0.353), which poses no obstacle to our research question.
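The location, dispersion, and distribution blocks above repeat the same arithmetic for each variable. As a sketch, they could be wrapped in one helper; `describe_metric` is a hypothetical name introduced here, and the sample vector is made up for illustration.

```r
# Hypothetical helper: mean, median, coefficient of variation (in %)
# and Pearson's second skewness coefficient, ignoring NA values.
describe_metric <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  med <- median(x)
  s <- sd(x)
  c(mean = m,
    median = med,
    cv_pct = 100 * s / m,
    pearson_skew = 3 * (m - med) / s)
}

# Illustrative vector with one missing value and one extreme value.
x <- c(0.50, 0.62, 0.76, 0.88, NA, 9.72)
round(describe_metric(x), 3)
```

Assuming `datos` and `datos_sin_na` are loaded as in this report, `rbind(describe_metric(datos$Accuracy), describe_metric(datos_sin_na$Accuracy))` would place both versions side by side.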

Precision

Location:

summary(datos$Precision)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.4019  0.5632  0.7195  0.8129  0.8596  9.7320      19

summary(datos_sin_na$Precision)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4031  0.5661  0.7275  0.8387  0.8670  9.7320

Dispersion:

media <- mean(datos$Precision, na.rm = TRUE)
sd_p <- sd(datos$Precision, na.rm = TRUE)
cv_p <- (sd_p / media) * 100
cv_p

## [1] 104.7427

media <- mean(datos_sin_na$Precision, na.rm = TRUE)
sd_p <- sd(datos_sin_na$Precision, na.rm = TRUE)
cv_p <- (sd_p / media) * 100
cv_p

## [1] 110.986

Distribution:

media <- mean(datos$Precision, na.rm = TRUE)
mediana <- median(datos$Precision, na.rm = TRUE)
desviacion <- sd(datos$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3292674

media <- mean(datos_sin_na$Precision, na.rm = TRUE)
mediana <- median(datos_sin_na$Precision, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.358299

Observation: The Precision measures change only slightly when the rows with missing values are removed (mean 0.8129 to 0.8387, coefficient of variation 104.7% to 111.0%, Pearson skewness 0.329 to 0.358), which poses no obstacle to our research question.

Recall

Location:

summary(datos$Recall)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.3001  0.4819  0.6493  0.7486  0.8404  9.3662      20

summary(datos_sin_na$Recall)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3001  0.4898  0.6513  0.7611  0.8365  9.3662

Dispersion:

media <- mean(datos$Recall, na.rm = TRUE)
sd_r <- sd(datos$Recall, na.rm = TRUE)
cv_r <- (sd_r / media) * 100
cv_r

## [1] 104.7911

media <- mean(datos_sin_na$Recall, na.rm = TRUE)
sd_r <- sd(datos_sin_na$Recall, na.rm = TRUE)
cv_r <- (sd_r / media) * 100
cv_r

## [1] 109.2378

Distribution:

media <- mean(datos$Recall, na.rm = TRUE)
mediana <- median(datos$Recall, na.rm = TRUE)
desviacion <- sd(datos$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3797481

media <- mean(datos_sin_na$Recall, na.rm = TRUE)
mediana <- median(datos_sin_na$Recall, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3961938

Observation: The Recall measures change only slightly when the rows with missing values are removed (mean 0.7486 to 0.7611, coefficient of variation 104.8% to 109.2%, Pearson skewness 0.380 to 0.396), which poses no obstacle to our research question.

F1_Score

Location:

summary(datos$F1_Score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.4000  0.5515  0.7086  0.8122  0.8438  9.3740      20

summary(datos_sin_na$F1_Score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4000  0.5486  0.7031  0.8014  0.8396  9.3740

Dispersion:

media <- mean(datos$F1_Score, na.rm = TRUE)
sd_f1 <- sd(datos$F1_Score, na.rm = TRUE)
cv_f1 <- (sd_f1 / media) * 100
cv_f1

## [1] 109.9297

media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
sd_f1 <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
cv_f1 <- (sd_f1 / media) * 100
cv_f1

## [1] 109.1754

Distribution:

media <- mean(datos$F1_Score, na.rm = TRUE)
mediana <- median(datos$F1_Score, na.rm = TRUE)
desviacion <- sd(datos$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3479023

media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
mediana <- median(datos_sin_na$F1_Score, na.rm = TRUE)
desviacion <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3369211

Observation: The F1_Score measures change only slightly when the rows with missing values are removed (mean 0.8122 to 0.8014, coefficient of variation 109.9% to 109.2%, Pearson skewness 0.348 to 0.337), which poses no obstacle to our research question.

Training_Time

Location:

summary(datos$Training_Time)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.1032  1.2441  2.4347  2.9910  3.8131 46.9856      20

summary(datos_sin_na$Training_Time)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1032  1.2982  2.4809  3.0598  3.8446 46.9856

Dispersion:

media <- mean(datos$Training_Time, na.rm = TRUE)
sd_tt <- sd(datos$Training_Time, na.rm = TRUE)
cv_tt <- (sd_tt / media) * 100
cv_tt

## [1] 147.9032

media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
sd_tt <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
cv_tt <- (sd_tt / media) * 100
cv_tt

## [1] 151.8546

Distribution:

media <- mean(datos$Training_Time, na.rm = TRUE)
mediana <- median(datos$Training_Time, na.rm = TRUE)
desviacion <- sd(datos$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3772653

media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
mediana <- median(datos_sin_na$Training_Time, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.3737476

Observation: The Training_Time measures change only slightly when the rows with missing values are removed (mean 2.9910 to 3.0598, coefficient of variation 147.9% to 151.9%, Pearson skewness 0.377 to 0.374), which poses no obstacle to our research question.

Conclusion: Removing the rows with NA values does not materially alter the distribution of the data, and the changes in the summary measures are small enough that they do not affect the analysis.

Outlier detection and cleaning

To detect outliers we use box-and-whisker plots together with the Hampel filter. The Hampel filter is preferred over percentile-based rules because it is better suited to variables that do not follow a normal distribution or that have very long tails, as is the case here. It also relies on robust measures: the median and the median absolute deviation (MAD).
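As a minimal sketch of the rule described above, the Hampel interval is the median plus or minus three MADs. The function name `hampel_bounds` and the sample vector are assumptions made for illustration; `constant = 1` mirrors the raw MAD used in this analysis, whereas R's default for `mad()` is 1.4826.

```r
# Hypothetical helper: Hampel interval = median +/- k * MAD.
hampel_bounds <- function(x, k = 3) {
  x <- x[!is.na(x)]
  center <- median(x)
  spread <- mad(x, constant = 1)  # raw MAD, without the 1.4826 scaling
  c(lower = center - k * spread, upper = center + k * spread)
}

# Illustrative vector with one implausible value for a 0-1 metric.
x <- c(0.52, 0.61, 0.70, 0.75, 0.83, 0.91, 9.72)
b <- hampel_bounds(x)
b

# Values outside the interval are flagged as candidate outliers.
flagged <- x[x < b["lower"] | x > b["upper"]]
flagged
```

With the default scaling constant the interval would be about 1.48 times wider, so fewer points would be flagged.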

Accuracy

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = Accuracy) +
  geom_boxplot(fill = "#0c4c8a") +
  theme_minimal()
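The points a box plot draws beyond its whiskers follow the 1.5 × IQR rule. As a sketch, base R's `boxplot.stats()` lists them numerically without drawing anything; the vector below is illustrative, not taken from the dataset.

```r
# Illustrative vector with one extreme value.
x <- c(0.52, 0.61, 0.70, 0.75, 0.83, 0.91, 9.72)

# boxplot.stats() uses coef = 1.5 by default, like geom_boxplot().
stats <- boxplot.stats(x)

stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out    # observations beyond the whiskers
```

Comparing `stats$out` with the Hampel result shows whether the two rules agree on which observations are suspicious.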

Hampel filter

lower_bound <- median(datos_sin_na$Accuracy) - 3 * mad(datos_sin_na$Accuracy, constant = 1)
lower_bound

## [1] 0.3595111

upper_bound <- median(datos_sin_na$Accuracy) + 3 * mad(datos_sin_na$Accuracy, constant = 1)
upper_bound

## [1] 1.13852

outlier_ind <- which(datos_sin_na$Accuracy < lower_bound | datos_sin_na$Accuracy > upper_bound)
outlier_ind

## [1]  15  77 110 112 196 232 239

datos_sin_na[outlier_ind, ]

## # A tibble: 7 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 Random Forest  PyTorch     Regression   Text             9.72     0.782  0.548
## 2 Neural Network PyTorch     Regression   Time Series      5.26     0.506  0.829
## 3 K-Means        TensorFlow  Regression   Tabular          7.13     0.521  0.441
## 4 SVM            Scikit-lea… Regression   Tabular          5.20     0.489  0.680
## 5 SVM            TensorFlow  Classificat… Image            8.29     0.798  0.753
## 6 Random Forest  PyTorch     Clustering   Tabular          7.90     0.521  0.363
## 7 SVM            Scikit-lea… Regression   Tabular          5.98     0.928  0.799
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

To check whether these values are genuine outliers, we can use Rosner's test, which can detect several outliers at once and is suited to large samples.

library(EnvStats)

## 
## Adjuntando el paquete: 'EnvStats'

## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm

test <- rosnerTest(datos_sin_na$Accuracy, k = 7)
test$all.stats

##   i    Mean.i      SD.i    Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1 0 0.8457621 0.8216572 9.718080      15 10.79808   3.833870    TRUE
## 2 1 0.8259135 0.7069241 8.294427     196 10.56480   3.833271    TRUE
## 3 2 0.8091679 0.6125670 7.900862     232 11.57701   3.832670    TRUE
## 4 3 0.7932315 0.5124044 7.127467     110 12.36179   3.832068    TRUE
## 5 4 0.7789652 0.4151830 5.978890     239 12.52442   3.831464    TRUE
## 6 5 0.7672273 0.3338475 5.259856      77 13.45713   3.830859    TRUE
## 7 6 0.7570629 0.2565839 5.200546     112 17.31786   3.830252    TRUE

This test confirms that every candidate value is indeed an outlier.
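To make the columns of the Rosner output easier to read, here is a base-R sketch of the generalized ESD procedure behind the test. At each step the most extreme observation is removed and its statistic R = |x - mean| / sd is compared with a critical value lambda. The function `rosner_sketch` is a hypothetical illustration that assumes the standard generalized ESD critical-value formula; `EnvStats::rosnerTest` remains the reference implementation. As a check against the table above, the first row gives (9.718080 - 0.8457621) / 0.8216572 ≈ 10.798, matching the reported R.i+1.

```r
# Hypothetical base-R sketch of the generalized ESD (Rosner) procedure.
rosner_sketch <- function(x, k, alpha = 0.05) {
  n <- length(x)
  value <- R <- lambda <- numeric(k)
  for (i in seq_len(k)) {
    dev <- abs(x - mean(x))
    j <- which.max(dev)            # most extreme remaining observation
    value[i] <- x[j]
    R[i] <- dev[j] / sd(x)         # test statistic for this step
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda[i] <- (n - i) * t_crit /
      sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
    x <- x[-j]                     # remove it before the next step
  }
  # Number of outliers: the largest step whose statistic exceeds lambda.
  exceed <- which(R > lambda)
  n_out <- if (length(exceed) > 0) max(exceed) else 0
  data.frame(step = seq_len(k), value = value, R = R, lambda = lambda,
             outlier = seq_len(k) <= n_out)
}

# Illustrative data: ten plausible scores and one implausible value.
x <- c(0.50, 0.60, 0.70, 0.80, 0.90, 0.55, 0.65, 0.75, 0.85, 0.95, 9.70)
res <- rosner_sketch(x, k = 2)
res
```

Because each step recomputes the mean and standard deviation after removing the previous extreme, one large value cannot mask the next.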

Capping

To treat the outliers we use capping. First, the 5th and 95th percentiles are computed; then values below the Hampel lower bound obtained above are replaced with the 5th percentile, and values above the upper bound are replaced with the 95th percentile.

caps <- quantile(datos_sin_na$Accuracy, probs = c(0.05, 0.95), na.rm = TRUE)

datos_sin_na$Accuracy[datos_sin_na$Accuracy < lower_bound] <- caps[1]
datos_sin_na$Accuracy[datos_sin_na$Accuracy > upper_bound] <- caps[2]

datos_sin_na[outlier_ind, ]

## # A tibble: 7 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 Random Forest  PyTorch     Regression   Text            0.988     0.782  0.548
## 2 Neural Network PyTorch     Regression   Time Series     0.988     0.506  0.829
## 3 K-Means        TensorFlow  Regression   Tabular         0.988     0.521  0.441
## 4 SVM            Scikit-lea… Regression   Tabular         0.988     0.489  0.680
## 5 SVM            TensorFlow  Classificat… Image           0.988     0.798  0.753
## 6 Random Forest  PyTorch     Clustering   Tabular         0.988     0.521  0.363
## 7 SVM            Scikit-lea… Regression   Tabular         0.988     0.928  0.799
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
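The detection and capping steps can be combined into one reusable function. This is a sketch under the same choices as above (Hampel bounds with raw MAD, caps at the 5th and 95th percentiles); `cap_hampel` is a hypothetical name and the vector is illustrative.

```r
# Hypothetical helper: cap values outside the Hampel interval at the
# 5th / 95th percentiles of the original (uncapped) data.
cap_hampel <- function(x, k = 3, probs = c(0.05, 0.95)) {
  center <- median(x, na.rm = TRUE)
  spread <- mad(x, constant = 1, na.rm = TRUE)
  caps <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  low  <- !is.na(x) & x < center - k * spread
  high <- !is.na(x) & x > center + k * spread
  x[low]  <- caps[1]
  x[high] <- caps[2]
  x
}

# 25 plausible scores between 0.50 and 0.98 plus one implausible value.
x <- c(seq(0.50, 0.98, by = 0.02), 9.72)
capped <- cap_hampel(x)
range(capped)
```

The percentiles are computed before any replacement, so the cap itself is still influenced by the outliers when they make up more than 5% of a tail; in small samples it is worth checking that the cap falls inside the Hampel interval.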

Box plot of Accuracy without outliers

ggplot(datos_sin_na, aes(y = Accuracy)) +
  geom_boxplot(outlier.colour = "orange", outlier.shape = 16, outlier.size = 2, fill = "skyblue", color = "darkblue") +
  labs(title = "Box plot of Accuracy without outliers",
       y = "Accuracy") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Precision

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = Precision) +
  geom_boxplot(fill = "darkred") +
  theme_minimal()

Hampel filter

lower_bound <- median(datos_sin_na$Precision) - 3 * mad(datos_sin_na$Precision, constant = 1)
lower_bound

## [1] 0.2777772

upper_bound <- median(datos_sin_na$Precision) + 3 * mad(datos_sin_na$Precision, constant = 1)
upper_bound

## [1] 1.177228

outlier_ind <- which(datos_sin_na$Precision < lower_bound | datos_sin_na$Precision > upper_bound)
outlier_ind

##  [1]   6  96 111 157 223 241 250 288 433 439

length(outlier_ind)

## [1] 10

datos_sin_na[outlier_ind, ]

## # A tibble: 10 × 10
##    Algorithm      Framework  Problem_Type Dataset_Type Accuracy Precision Recall
##    <chr>          <chr>      <chr>        <chr>           <dbl>     <dbl>  <dbl>
##  1 SVM            PyTorch    Regression   Image           0.897      9.73  0.781
##  2 SVM            Scikit-le… Classificat… Image           0.591      4.06  0.482
##  3 K-Means        TensorFlow Clustering   Time Series     0.686      6.21  0.328
##  4 K-Means        Scikit-le… Classificat… Time Series     0.603      4.15  7.74 
##  5 Neural Network Keras      Clustering   Text            0.537      9.67  0.819
##  6 SVM            TensorFlow Classificat… Image           0.824      5.43  0.976
##  7 Random Forest  TensorFlow Regression   Text            0.902      8.93  0.603
##  8 SVM            PyTorch    Clustering   Text            0.584      4.08  0.717
##  9 SVM            Keras      Regression   Text            0.506      5.76  0.707
## 10 K-Means        PyTorch    Regression   Tabular         0.768      7.04  0.926
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

To check whether these values are genuine outliers, we can use Rosner's test, which can detect several outliers at once and is suited to large samples.

library(EnvStats)
test <- rosnerTest(datos_sin_na$Precision, k = 10)
test$all.stats

##    i    Mean.i      SD.i    Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 0.8386718 0.9308086 9.732008       6  9.55442   3.833870    TRUE
## 2  1 0.8187762 0.8310328 9.674189     223 10.65591   3.833271    TRUE
## 3  2 0.7989210 0.7180191 8.932619     250 11.32797   3.832670    TRUE
## 4  3 0.7806431 0.6061150 7.044472     439 10.33439   3.832068    TRUE
## 5  4 0.7665353 0.5286183 6.207645     111 10.29308   3.831464    TRUE
## 6  5 0.7542529 0.4614512 5.760933     433 10.84986   3.830859    TRUE
## 7  6 0.7429256 0.3955383 5.432777     241 11.85688   3.830252    TRUE
## 8  7 0.7322910 0.3266570 4.145151     157 10.44784   3.829643    TRUE
## 9  8 0.7245345 0.2834703 4.075645     288 11.82174   3.829033    TRUE
## 10 9 0.7169010 0.2341822 4.055990      96 14.25851   3.828422    TRUE

This test confirms that every candidate value is indeed an outlier.

Capping

As with Accuracy, the outliers are treated with capping: the 5th and 95th percentiles are computed, values below the Hampel lower bound are replaced with the 5th percentile, and values above the upper bound are replaced with the 95th percentile.

caps <- quantile(datos_sin_na$Precision, probs = c(0.05, 0.95), na.rm = TRUE)

datos_sin_na$Precision[datos_sin_na$Precision < lower_bound] <- caps[1]
datos_sin_na$Precision[datos_sin_na$Precision > upper_bound] <- caps[2]

datos_sin_na[outlier_ind, ]

## # A tibble: 10 × 10
##    Algorithm      Framework  Problem_Type Dataset_Type Accuracy Precision Recall
##    <chr>          <chr>      <chr>        <chr>           <dbl>     <dbl>  <dbl>
##  1 SVM            PyTorch    Regression   Image           0.897     0.982  0.781
##  2 SVM            Scikit-le… Classificat… Image           0.591     0.982  0.482
##  3 K-Means        TensorFlow Clustering   Time Series     0.686     0.982  0.328
##  4 K-Means        Scikit-le… Classificat… Time Series     0.603     0.982  7.74 
##  5 Neural Network Keras      Clustering   Text            0.537     0.982  0.819
##  6 SVM            TensorFlow Classificat… Image           0.824     0.982  0.976
##  7 Random Forest  TensorFlow Regression   Text            0.902     0.982  0.603
##  8 SVM            PyTorch    Clustering   Text            0.584     0.982  0.717
##  9 SVM            Keras      Regression   Text            0.506     0.982  0.707
## 10 K-Means        PyTorch    Regression   Tabular         0.768     0.982  0.926
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

Box plot of Precision without outliers

ggplot(datos_sin_na, aes(y = Precision)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 16, outlier.size = 2, fill = "gold", color = "darkorange") +
  labs(title = "Box plot of Precision without outliers",
       y = "Precision") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Recall

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = Recall) +
  geom_boxplot(fill = "#0c4") +
  theme_minimal()

Hampel filter

lower_bound <- median(datos_sin_na$Recall) - 3 * mad(datos_sin_na$Recall, constant = 1)
lower_bound

## [1] 0.1155031

upper_bound <- median(datos_sin_na$Recall) + 3 * mad(datos_sin_na$Recall, constant = 1)
upper_bound

## [1] 1.187093

outlier_ind <- which(datos_sin_na$Recall < lower_bound | datos_sin_na$Recall > upper_bound)
outlier_ind

## [1]   4  88 114 157 221 270 303 308 420

length(outlier_ind)

## [1] 9

datos_sin_na[outlier_ind, ]

## # A tibble: 9 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 K-Means        PyTorch     Regression   Image           0.637     0.626   7.45
## 2 K-Means        PyTorch     Clustering   Image           0.822     0.725   5.73
## 3 K-Means        Scikit-lea… Clustering   Image           0.719     0.998   9.37
## 4 K-Means        Scikit-lea… Classificat… Time Series     0.603     0.982   7.74
## 5 Neural Network PyTorch     Classificat… Text            0.724     0.449   3.44
## 6 K-Means        Keras       Clustering   Image           0.801     0.521   5.77
## 7 Neural Network TensorFlow  Classificat… Text            0.702     0.615   4.86
## 8 Neural Network TensorFlow  Clustering   Time Series     0.564     0.827   5.44
## 9 Random Forest  TensorFlow  Clustering   Image           0.531     0.566   5.50
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

To check whether these values are genuine outliers, we can use Rosner's test, which can detect several outliers at once and is suited to large samples.

library(EnvStats)
test <- rosnerTest(datos_sin_na$Recall, k = 9)
test$all.stats

##   i    Mean.i      SD.i    Value Obs.Num     R.i+1 lambda.i+1 Outlier
## 1 0 0.7610971 0.8314054 9.366182     114 10.350047   3.833870    TRUE
## 2 1 0.7418463 0.7255257 7.737749     157  9.642529   3.833271    TRUE
## 3 2 0.7261605 0.6460189 7.454810       4 10.415561   3.832670    TRUE
## 4 3 0.7110399 0.5622109 5.765916     270  8.991068   3.832068    TRUE
## 5 4 0.6996551 0.5089064 5.726373      88  9.877490   3.831464    TRUE
## 6 5 0.6883081 0.4497505 5.499848     420 10.698244   3.830859    TRUE
## 7 6 0.6774222 0.3874519 5.436669     308 12.283452   3.830252    TRUE
## 8 7 0.6666303 0.3144283 4.859080     303 13.333562   3.829643    TRUE
## 9 8 0.6571020 0.2428199 3.438827     221 11.455922   3.829033    TRUE

This test confirms that every candidate value is indeed an outlier.

Capping

caps <- quantile(datos_sin_na$Recall, probs = c(0.05, 0.95), na.rm = TRUE)

datos_sin_na$Recall[datos_sin_na$Recall < lower_bound] <- caps[1]
datos_sin_na$Recall[datos_sin_na$Recall > upper_bound] <- caps[2]

datos_sin_na[outlier_ind, ]

## # A tibble: 9 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 K-Means        PyTorch     Regression   Image           0.637     0.626  0.976
## 2 K-Means        PyTorch     Clustering   Image           0.822     0.725  0.976
## 3 K-Means        Scikit-lea… Clustering   Image           0.719     0.998  0.976
## 4 K-Means        Scikit-lea… Classificat… Time Series     0.603     0.982  0.976
## 5 Neural Network PyTorch     Classificat… Text            0.724     0.449  0.976
## 6 K-Means        Keras       Clustering   Image           0.801     0.521  0.976
## 7 Neural Network TensorFlow  Classificat… Text            0.702     0.615  0.976
## 8 Neural Network TensorFlow  Clustering   Time Series     0.564     0.827  0.976
## 9 Random Forest  TensorFlow  Clustering   Image           0.531     0.566  0.976
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

Box plot of Recall without outliers

ggplot(datos_sin_na, aes(y = Recall)) +
  geom_boxplot(outlier.colour = "blue", outlier.shape = 16, outlier.size = 2, fill = "lightgreen", color = "darkgreen") +
  labs(title = "Box plot of Recall",
       y = "Recall") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

F1_Score

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = F1_Score) +
  geom_boxplot(fill = "#0C945689") +
  theme_minimal()

Hampel filter

lower_bound <- median(datos_sin_na$F1_Score) - 3 * mad(datos_sin_na$F1_Score, constant = 1)
lower_bound

## [1] 0.2672388

upper_bound <- median(datos_sin_na$F1_Score) + 3 * mad(datos_sin_na$F1_Score, constant = 1)
upper_bound

## [1] 1.139019

outlier_ind <- which(datos_sin_na$F1_Score < lower_bound | datos_sin_na$F1_Score > upper_bound)
outlier_ind

## [1] 160 230 267 281 296 316 333 437

length(outlier_ind)

## [1] 8

datos_sin_na[outlier_ind, ]

## # A tibble: 8 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 SVM            TensorFlow  Regression   Tabular         0.766     0.579  0.817
## 2 Neural Network Scikit-lea… Clustering   Image           0.671     0.676  0.937
## 3 SVM            PyTorch     Clustering   Image           0.522     0.566  0.817
## 4 K-Means        TensorFlow  Regression   Tabular         0.637     0.651  0.565
## 5 K-Means        PyTorch     Classificat… Text            0.773     0.915  0.841
## 6 K-Means        TensorFlow  Classificat… Image           0.677     0.905  0.621
## 7 Random Forest  Scikit-lea… Clustering   Time Series     0.588     0.792  0.736
## 8 Neural Network PyTorch     Classificat… Text            0.630     0.770  0.831
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

To check whether these values are genuine outliers, we can use Rosner's test, which can detect several outliers at once and is suited to large samples.

library(EnvStats)
test <- rosnerTest(datos_sin_na$F1_Score, k = 8)
test$all.stats

##   i    Mean.i      SD.i    Value Obs.Num     R.i+1 lambda.i+1 Outlier
## 1 0 0.8013885 0.8749189 9.374049     296  9.798234   3.833870    TRUE
## 2 1 0.7822103 0.7759213 9.295359     316 10.971665   3.833271    TRUE
## 3 2 0.7631225 0.6634602 8.178579     281 11.176942   3.832670    TRUE
## 4 3 0.7464585 0.5630661 7.747684     437 12.434110   3.832068    TRUE
## 5 4 0.7306900 0.4548205 5.499742     230 10.485569   3.831464    TRUE
## 6 5 0.7199247 0.3946604 5.320668     333 11.657473   3.830859    TRUE
## 7 6 0.7095157 0.3286397 5.131244     160 13.454635   3.830252    TRUE
## 8 7 0.6994892 0.2524146 4.632073     267 15.579855   3.829643    TRUE

This test confirms that every candidate value is indeed an outlier.

Capping

As with the previous variables, the outliers are treated with capping: the 5th and 95th percentiles are computed, values below the Hampel lower bound are replaced with the 5th percentile, and values above the upper bound are replaced with the 95th percentile.

caps <- quantile(datos_sin_na$F1_Score, probs = c(0.05, 0.95), na.rm = TRUE)

datos_sin_na$F1_Score[datos_sin_na$F1_Score < lower_bound] <- caps[1]
datos_sin_na$F1_Score[datos_sin_na$F1_Score > upper_bound] <- caps[2]

datos_sin_na[outlier_ind, ]

## # A tibble: 8 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 SVM            TensorFlow  Regression   Tabular         0.766     0.579  0.817
## 2 Neural Network Scikit-lea… Clustering   Image           0.671     0.676  0.937
## 3 SVM            PyTorch     Clustering   Image           0.522     0.566  0.817
## 4 K-Means        TensorFlow  Regression   Tabular         0.637     0.651  0.565
## 5 K-Means        PyTorch     Classificat… Text            0.773     0.915  0.841
## 6 K-Means        TensorFlow  Classificat… Image           0.677     0.905  0.621
## 7 Random Forest  Scikit-lea… Clustering   Time Series     0.588     0.792  0.736
## 8 Neural Network PyTorch     Classificat… Text            0.630     0.770  0.831
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

Box plot of F1_Score without outliers

ggplot(datos_sin_na, aes(y = F1_Score)) +
  geom_boxplot(outlier.colour = "purple", outlier.shape = 16, outlier.size = 2, fill = "yellow", color = "orange") +
  labs(title = "Box plot of F1 Score",
       y = "F1 Score") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Training_Time

Box-and-whisker plot

ggplot(datos_sin_na) +
  aes(x = "", y = Training_Time) +
  geom_boxplot(fill = "#F39C12") +
  theme_minimal()

Hampel filter

lower_bound <- median(datos_sin_na$Training_Time) - 3 * mad(datos_sin_na$Training_Time, constant = 1)
lower_bound

## [1] -1.363486

upper_bound <- median(datos_sin_na$Training_Time) + 3 * mad(datos_sin_na$Training_Time, constant = 1)
upper_bound

## [1] 6.325322

outlier_ind <- which(datos_sin_na$Training_Time < lower_bound | datos_sin_na$Training_Time > upper_bound)
outlier_ind

## [1] 100 109 201 214 217 324 344 417

length(outlier_ind)

## [1] 8

datos_sin_na[outlier_ind, ]

## # A tibble: 8 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 Neural Network PyTorch     Regression   Tabular         0.987     0.733  0.706
## 2 Neural Network Scikit-lea… Classificat… Time Series     0.871     0.847  0.734
## 3 Neural Network PyTorch     Clustering   Tabular         0.524     0.556  0.737
## 4 Neural Network Keras       Clustering   Tabular         0.718     0.551  0.394
## 5 Random Forest  TensorFlow  Classificat… Text            0.772     0.714  0.824
## 6 SVM            Scikit-lea… Regression   Tabular         0.662     0.913  0.970
## 7 K-Means        TensorFlow  Regression   Tabular         0.996     0.728  0.795
## 8 K-Means        PyTorch     Regression   Time Series     0.561     0.617  0.828
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>
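The Hampel filter above flags a value when it lies more than three median absolute deviations (MAD) from the median. As a minimal sketch, the same rule can be wrapped in a helper and checked on a small vector; `hampel_bounds` and the toy data are hypothetical names introduced here for illustration, not part of the original analysis.

```r
# Hypothetical helper: Hampel bounds as median +/- k * MAD.
# constant = 1 keeps the raw (unscaled) MAD, as in the analysis above.
hampel_bounds <- function(x, k = 3) {
  centre <- median(x, na.rm = TRUE)
  spread <- mad(x, constant = 1, na.rm = TRUE)
  c(lower = centre - k * spread, upper = centre + k * spread)
}

# Toy data (made up for illustration): nine ordinary values and one extreme.
toy_time <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
bounds <- hampel_bounds(toy_time)
outlier_idx <- which(toy_time < bounds["lower"] | toy_time > bounds["upper"])

print(bounds)       # median 5.5 and raw MAD 2.5 give bounds of -2 and 13
print(outlier_idx)  # position of the single flagged value
```

With R's default `constant = 1.4826`, the MAD is rescaled to be comparable to a standard deviation under normality, so the bounds would be wider than the ones used here.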

To check whether these observations are truly outliers, we can apply Rosner's test (generalized ESD), which is suited to large samples.

library(EnvStats)
test <- rosnerTest(datos_sin_na$Training_Time, k = 8)
test$all.stats

##   i   Mean.i     SD.i    Value Obs.Num     R.i+1 lambda.i+1 Outlier
## 1 0 3.059781 4.646419 46.98563     324  9.453699   3.833870    TRUE
## 2 1 2.961513 4.159537 46.83874     217 10.548585   3.833271    TRUE
## 3 2 2.863133 3.606191 44.58645     100 11.569913   3.832670    TRUE
## 4 3 2.769373 3.017332 44.35790     201 13.783214   3.832068    TRUE
## 5 4 2.675705 2.282925 28.29499     344 11.222129   3.831464    TRUE
## 6 5 2.617874 1.932676 20.93344     109  9.476788   3.830859    TRUE
## 7 6 2.576436 1.726646 20.25186     417 10.236856   3.830252    TRUE
## 8 7 2.536356 1.508782 13.79138     214  7.459672   3.829643    TRUE

The test confirms that all eight candidates are outliers: in every row the statistic R.i+1 exceeds the critical value lambda.i+1.
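Each row of the Rosner table is decided by comparing the statistic R.i+1, the largest absolute deviation from the current mean divided by the current standard deviation, with the critical value lambda.i+1. As a worked check, the first row can be reproduced from the rounded numbers printed above; the variable names are introduced here only for illustration.

```r
# Numbers copied from the first row of test$all.stats (rounded as printed).
mean_0   <- 3.059781   # Mean.i
sd_0     <- 4.646419   # SD.i
value    <- 46.98563   # most extreme Training_Time (observation 324)
lambda_1 <- 3.833870   # critical value lambda.i+1

# Test statistic: distance from the mean in standard-deviation units.
r_1 <- abs(value - mean_0) / sd_0
print(r_1)             # about 9.4537, matching R.i+1 in the table
print(r_1 > lambda_1)  # TRUE: the observation is declared an outlier
```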

Capping

caps <- quantile(datos_sin_na$Training_Time, probs = c(.05, .95), na.rm = TRUE)

datos_sin_na$Training_Time[datos_sin_na$Training_Time < lower_bound] <- caps[1]
datos_sin_na$Training_Time[datos_sin_na$Training_Time > upper_bound] <- caps[2]
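The capping step replaces only the values that fall outside the Hampel bounds, using the 5th and 95th percentiles as replacement values. The sketch below repeats that logic on a made-up vector so the effect is easy to see; `toy_time` and the other names are hypothetical and the numbers are not from the dataset.

```r
# Toy data (made up): the integers 1..99 plus one extreme value.
toy_time <- c(1:99, 1000)

# Hampel bounds, computed as in the analysis (raw MAD, k = 3).
centre <- median(toy_time)
spread <- mad(toy_time, constant = 1)
lower  <- centre - 3 * spread   # 50.5 - 75 = -24.5
upper  <- centre + 3 * spread   # 50.5 + 75 = 125.5

# Replacement values: 5th and 95th percentiles of the uncapped data.
caps <- quantile(toy_time, probs = c(0.05, 0.95), names = FALSE)

capped <- toy_time
capped[capped < lower] <- caps[1]
capped[capped > upper] <- caps[2]

print(caps)         # 5.95 and 95.05 with R's default quantile type
print(capped[100])  # the extreme value 1000 is replaced by 95.05
```

A capped value lands at the percentile, not at the bound, so in this toy example it ends up below some unflagged observations (96 to 99).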

Plot of Training_Time without outliers

datos_sin_na[outlier_ind, ]

## # A tibble: 8 × 10
##   Algorithm      Framework   Problem_Type Dataset_Type Accuracy Precision Recall
##   <chr>          <chr>       <chr>        <chr>           <dbl>     <dbl>  <dbl>
## 1 Neural Network PyTorch     Regression   Tabular         0.987     0.733  0.706
## 2 Neural Network Scikit-lea… Classificat… Time Series     0.871     0.847  0.734
## 3 Neural Network PyTorch     Clustering   Tabular         0.524     0.556  0.737
## 4 Neural Network Keras       Clustering   Tabular         0.718     0.551  0.394
## 5 Random Forest  TensorFlow  Classificat… Text            0.772     0.714  0.824
## 6 SVM            Scikit-lea… Regression   Tabular         0.662     0.913  0.970
## 7 K-Means        TensorFlow  Regression   Tabular         0.996     0.728  0.795
## 8 K-Means        PyTorch     Regression   Time Series     0.561     0.617  0.828
## # ℹ 3 more variables: F1_Score <dbl>, Training_Time <dbl>, Date <dttm>

ggplot(datos_sin_na, aes(y = Training_Time)) +
  geom_boxplot(outlier.colour = "brown", outlier.shape = 16, outlier.size = 2, fill = "lightcoral", color = "darkred") +
  labs(title = "Box Plot of Training Time",
       y = "Training Time") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Analysis of the numerical variables after treatment

Accuracy

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$Accuracy))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
0.5038	0.6183	0.749	0.7507	0.8698	0.9997

Coefficient of variation

media <- mean(datos_sin_na$Accuracy, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Accuracy, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a

## [1] 19.56421

Skewness

media <- mean(datos_sin_na$Accuracy, na.rm = TRUE)
mediana <- median(datos_sin_na$Accuracy, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Accuracy, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.03537286
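The coefficient of variation and Pearson's second skewness coefficient are computed the same way for every numeric variable in this section, so the two formulas can be collected in one helper. This is a minimal sketch; `spread_and_skew` and the toy vector are hypothetical, introduced only to illustrate the formulas.

```r
# Hypothetical helper: CV (in percent) and Pearson's second skewness coefficient.
spread_and_skew <- function(x) {
  m  <- mean(x, na.rm = TRUE)
  md <- median(x, na.rm = TRUE)
  s  <- sd(x, na.rm = TRUE)
  c(cv_percent = 100 * s / m, pearson_skew = 3 * (m - md) / s)
}

# Toy data (made up): one large value pulls the mean above the median.
toy <- c(1, 2, 3, 4, 10)
result <- spread_and_skew(toy)
print(result)  # the skewness is positive because the mean (4) exceeds the median (3)
```

On the real data the helper could be applied column by column, for example `sapply(datos_sin_na[c("Accuracy", "Precision")], spread_and_skew)`, assuming those columns are numeric.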

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$Accuracy)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$Accuracy
## A = 4.9922, p-value = 2.319e-12

Plot

library(ggplot2)
library(hrbrthemes)  # provides theme_ipsum()
# bins = 30 makes the default bin count explicit
ggp1 <- ggplot(data.frame(value = datos_sin_na$Accuracy), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of Accuracy") +
  xlab("Accuracy") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size = 15))
ggp1

Precision

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$Precision))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
0.4031	0.5661	0.7275	0.7154	0.867	0.999

Coefficient of variation

media <- mean(datos_sin_na$Precision, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Precision, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a

## [1] 24.35969

Skewness

media <- mean(datos_sin_na$Precision, na.rm = TRUE)
mediana <- median(datos_sin_na$Precision, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Precision, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] -0.2088443

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$Precision)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$Precision
## A = 5.5097, p-value = 1.331e-13

Plot

library(ggplot2)
library(hrbrthemes)  # provides theme_ipsum()
# bins = 30 makes the default bin count explicit
ggp1 <- ggplot(data.frame(value = datos_sin_na$Precision), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of Precision") +
  xlab("Precision") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size = 15))
ggp1

Recall

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$Recall))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
0.3001	0.4898	0.6513	0.6573	0.8365	0.9985

Coefficient of variation

media <- mean(datos_sin_na$Recall, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Recall, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a

## [1] 31.41571

Skewness

media <- mean(datos_sin_na$Recall, na.rm = TRUE)
mediana <- median(datos_sin_na$Recall, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Recall, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.08711636

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$Recall)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$Recall
## A = 5.1701, p-value = 8.675e-13

Plot

library(ggplot2)
library(hrbrthemes)  # provides theme_ipsum()
# bins = 30 makes the default bin count explicit
ggp1 <- ggplot(data.frame(value = datos_sin_na$Recall), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of Recall") +
  xlab("Recall") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size = 15))
ggp1

F1_Score

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$F1_Score))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
0.4	0.5486	0.7031	0.6955	0.8396	0.9993

Coefficient of variation

media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
sd_a <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a

## [1] 24.64901

Skewness

media <- mean(datos_sin_na$F1_Score, na.rm = TRUE)
mediana <- median(datos_sin_na$F1_Score, na.rm = TRUE)
desviacion <- sd(datos_sin_na$F1_Score, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] -0.1334735

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$F1_Score)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$F1_Score
## A = 5.019, p-value = 2e-12

Plot

library(ggplot2)
library(hrbrthemes)  # provides theme_ipsum()
# bins = 30 makes the default bin count explicit
ggp1 <- ggplot(data.frame(value = datos_sin_na$F1_Score), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of F1_Score") +
  xlab("F1_Score") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size = 15))
ggp1

Training_Time

Measures of central tendency

library(pander)
pander(summary(datos_sin_na$Training_Time))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
0.1032	1.298	2.481	2.552	3.845	4.998

Coefficient of variation

media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
sd_a <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
cv_a <- (sd_a / media) * 100
cv_a

## [1] 56.11634

Skewness

media <- mean(datos_sin_na$Training_Time, na.rm = TRUE)
mediana <- median(datos_sin_na$Training_Time, na.rm = TRUE)
desviacion <- sd(datos_sin_na$Training_Time, na.rm = TRUE)
asimetria_pearson <- 3 * (media - mediana) / desviacion
asimetria_pearson

## [1] 0.1492231

Normality test

library(nortest)
ad_test <- ad.test(datos_sin_na$Training_Time)
print(ad_test)

## 
##  Anderson-Darling normality test
## 
## data:  datos_sin_na$Training_Time
## A = 6.1063, p-value = 4.996e-15
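The Anderson-Darling outputs in this section are easier to compare side by side. The sketch below collects the five reported p-values and applies a 0.05 significance level; that threshold is an assumption made here, since the analysis does not state one.

```r
# p-values copied from the Anderson-Darling outputs in this section.
ad_p_values <- c(
  Accuracy      = 2.319e-12,
  Precision     = 1.331e-13,
  Recall        = 8.675e-13,
  F1_Score      = 2e-12,
  Training_Time = 4.996e-15
)

alpha <- 0.05  # assumed significance level, not stated in the analysis
reject_normality <- ad_p_values < alpha
print(reject_normality)  # TRUE for all five variables
```

Under that threshold, normality is rejected for every variable even though the skewness coefficients are close to zero, which suggests the departure from normality is not mainly a matter of asymmetry.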

Plot

library(ggplot2)
library(hrbrthemes)  # provides theme_ipsum()
# bins = 30 makes the default bin count explicit
ggp1 <- ggplot(data.frame(value = datos_sin_na$Training_Time), aes(x = value)) +
  geom_histogram(bins = 30, fill = "#FD0000", color = "#E52521", alpha = 0.9) +
  ggtitle("Histogram of Training_Time") +
  xlab("Training_Time") + ylab("Frequency") +
  theme_ipsum() +
  theme(plot.title = element_text(size = 15))
ggp1

Bivariate analysis

Recall that our research question is: How does the precision (Precision) of the different machine learning algorithms vary by problem type (Problem_Type)?

To answer it, we will use two types of plots:

Strip plot (jittered points)

ggplot(datos_sin_na, aes(x = Problem_Type, y = Precision)) +
  geom_jitter(aes(color = Problem_Type), size = 3, width = 0.2) +
  labs(title = "Precision by Problem Type",
       x = "Problem Type",
       y = "Precision") +
  theme_minimal()

This plot shows the distribution of individual precision values for each problem type: classification, clustering, and regression.

The vertical axis shows precision, which ranges from 0.4 to 1.0, and the horizontal axis separates the three problem types. Each point is the precision of one model in its category. Classification and clustering models generally show higher precision, concentrated near 1.0, whereas regression models are more dispersed and generally lower.

Bar chart of mean precision

ggplot(datos_sin_na, aes(x = Problem_Type, y = Precision, fill = Problem_Type)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge") +
  labs(title = "Mean Precision by Problem Type",
       x = "Problem Type",
       y = "Mean Precision") +
  theme_minimal()

This chart shows the mean precision for each problem type. Regression is slightly above classification and clustering, but the difference is minimal, which suggests that models reach comparable precision on average regardless of problem type.

Combined analysis

Variability: The first plot (strip plot) shows the spread of each model's precision across the three problem types. Classification and clustering models tend to have precision concentrated at high levels, while regression models show greater variability, with some reaching low precision.
Central tendency: The second plot (bar chart) shows the mean precision for each problem type. On average the three types are similar, at roughly 0.7 (the overall mean of Precision reported above is 0.7154), with regression showing a slight advantage.
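The two plots compare problem types visually; a per-group table of mean, median, and standard deviation makes the same comparison numerically. The sketch below uses a made-up data frame (`toy`, with hypothetical values) in place of `datos_sin_na`, which is assumed to have the same two columns.

```r
# Toy data (made up for illustration): four precision values per problem type.
toy <- data.frame(
  Problem_Type = rep(c("Classification", "Clustering", "Regression"), each = 4),
  Precision = c(0.60, 0.70, 0.80, 0.90,
                0.50, 0.70, 0.75, 0.85,
                0.40, 0.80, 0.90, 1.00)
)

# Split the precision values by problem type, then summarise each group.
groups <- split(toy$Precision, toy$Problem_Type)
group_stats <- data.frame(
  mean   = sapply(groups, mean),
  median = sapply(groups, median),
  sd     = sapply(groups, sd)
)

print(group_stats)  # one row per problem type
```

Reporting the median and standard deviation next to the mean lets statements about typical values and spread be checked directly, which a bar chart of means alone cannot do.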

Conclusion

The analysis of precision across machine learning algorithms indicates that regression problems reach a slightly higher mean precision than classification and clustering, together with a wider range of values, while classification and clustering perform similarly and slightly lower. Because the differences between the group means are minimal, no problem type stands out clearly.

EDA IA

Natalia Alvarado, Sergio Rada

2024-10-05
