                 date     pm25    pm10
1 2024-06-21 00:00:00       NA      NA
2 2024-06-21 01:00:00       NA      NA
3 2024-06-21 02:00:00 16.74790 22.2621
4 2024-06-21 03:00:00 11.69750 23.9225
5 2024-06-21 04:00:00       NA 10.4109
6 2024-06-21 05:00:00  5.31512 14.8207
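The excerpt above already shows gaps in both pollutants. As a minimal sketch (not part of the original analysis), the six printed rows can be rebuilt by hand in base R to count how many values are missing and how many rows would survive na.omit(). The data frame name aq is hypothetical, and the date column is left out for brevity.

```r
# Rebuild the six rows printed above (values copied from that output;
# the date column is omitted, and the name `aq` is hypothetical)
aq <- data.frame(
  pm25 = c(NA, NA, 16.74790, 11.69750, NA, 5.31512),
  pm10 = c(NA, NA, 22.2621, 23.9225, 10.4109, 14.8207)
)

# Number and share of missing values per pollutant
na_count <- colSums(is.na(aq))
na_share <- colMeans(is.na(aq))
print(na_count)
print(round(na_share, 2))

# Rows with no NA at all, i.e. the rows na.omit() would keep
n_complete <- sum(complete.cases(aq))
print(n_complete)
```

Only half of these rows are complete, which is why the tuning step below works on the complete cases and the remaining rows are left for imputation.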
First, we determine the optimal k, which will be the main parameter used to impute the missing data:
library(caret)

# Keep only complete rows (no NA) so that train() can fit and evaluate kNN
datos_sin_na <- na.omit(df$df)

# Cross-validation control settings (10 folds)
control <- trainControl(method = "cv", number = 10)

# Search for the optimal k over 20 candidate values
set.seed(123)
modelo <- train(pm25 ~ ., data = datos_sin_na, method = "knn",
                trControl = control, tuneLength = 20)

# Results
print(modelo)
k-Nearest Neighbors
219 samples
16 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 197, 197, 198, 197, 197, 198, ...
Resampling results across tuning parameters:
 k  RMSE      Rsquared    MAE
 5  8.358520  0.21826283  6.377855
 7  8.858393  0.14489171  6.855358
 9  9.063432  0.12434717  7.012211
11  9.035783  0.12666836  6.954638
13  9.020266  0.10261145  6.966740
15  8.934417  0.10409365  6.898997
17  9.003863  0.09059925  7.007185
19  9.043460  0.08682066  7.034710
21  9.066077  0.07656560  7.029111
23  9.084355  0.07070887  6.993344
25  9.064549  0.07238900  6.984655
27  9.101419  0.06036228  7.048669
29  9.123851  0.05822914  7.054266
31  9.178495  0.05024674  7.119413
33  9.212477  0.04410112  7.160160
35  9.230600  0.04782410  7.203478
37  9.237239  0.04794015  7.219297
39  9.208216  0.05837104  7.214041
41  9.163661  0.07119001  7.183212
43  9.157573  0.06870827  7.183684
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 5.
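For reference, the quantities in the table above can be written out as follows. This is a sketch assuming caret's default regression summary: kNN predicts the response as the mean of the k nearest training responses, each metric is computed on the n held-out observations of a fold, and the table reports the average over the 10 folds. To our understanding, caret's default R² is the squared Pearson correlation between observed and predicted values, not 1 - SSE/SST.

```latex
\hat{y}(x) = \frac{1}{k}\sum_{j \in N_k(x)} y_j, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
R^2 = \operatorname{cor}\left(y, \hat{y}\right)^2
```

Here $N_k(x)$ denotes the indices of the k training rows closest to $x$.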
plot(modelo)  # cross-validated RMSE as a function of k
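According to the 10-fold cross-validation, the optimal value is k = 5, which has the lowest mean RMSE (8.36) and the highest R² (0.22) of the grid. Since k = 5 is also the smallest value tested, smaller values of k were not evaluated.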
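The selection rule printed by caret (smallest mean RMSE across folds) can be reproduced by hand. The following is a minimal base-R sketch on simulated data: the toy relationship between pm10 and pm25 is invented for illustration and is not taken from this study, and knn_predict and cv_knn are hypothetical helper names, not caret functions. It is not caret's implementation, only the same idea: one fixed set of 10 folds, the grid k = 5, 7, ..., 43 used above, and the k with the lowest average RMSE.

```r
set.seed(123)

# Hypothetical toy data: pm25 depends linearly on pm10 plus noise
n <- 200
toy <- data.frame(pm10 = runif(n, 5, 40))
toy$pm25 <- 0.6 * toy$pm10 + rnorm(n, sd = 3)

# kNN regression: predict each test row as the mean response of its
# k nearest training rows (Euclidean distance on the predictors)
knn_predict <- function(x_train, y_train, x_test, k) {
  x_train <- as.matrix(x_train)
  x_test <- as.matrix(x_test)
  apply(x_test, 1, function(row) {
    d <- sqrt(colSums((t(x_train) - row)^2))
    mean(y_train[order(d)[seq_len(k)]])
  })
}

# k-fold cross-validation: the folds are drawn once and reused for every k
cv_knn <- function(x, y, ks, n_folds = 10) {
  folds <- sample(rep(seq_len(n_folds), length.out = length(y)))
  sapply(ks, function(k) {
    fold_rmse <- sapply(seq_len(n_folds), function(f) {
      test <- folds == f
      pred <- knn_predict(x[!test, , drop = FALSE], y[!test],
                          x[test, , drop = FALSE], k)
      sqrt(mean((y[test] - pred)^2))
    })
    mean(fold_rmse)  # average RMSE across the folds
  })
}

ks <- seq(5, 43, by = 2)
rmse <- cv_knn(toy[, "pm10", drop = FALSE], toy$pm25, ks)
best_k <- ks[which.min(rmse)]
print(data.frame(k = ks, RMSE = round(rmse, 3)))
print(best_k)
```

Reusing the same folds for every k means all candidates are compared on identical splits. No scaling is applied here (as in the "No pre-processing" line of the caret output), so with several predictors the ones with larger ranges dominate the distance; caret's train() accepts preProcess = c("center", "scale") for that case.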
Applying kNN to impute the missing \(PM_{10}\) and \(PM_{2.5}\) data
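The core idea of kNN imputation is that each missing value is replaced by the mean of that variable over the k most similar complete rows. A minimal base-R sketch of this idea follows; it is not the author's implementation. The function name knn_impute_col is hypothetical, distance is measured only on pm10, and k = 2 is used because the six-row excerpt at the top of the section has only three complete rows (on the full series the tuned k = 5 would apply).

```r
# The six rows printed at the top of the section (date column omitted)
aq <- data.frame(
  pm25 = c(NA, NA, 16.74790, 11.69750, NA, 5.31512),
  pm10 = c(NA, NA, 22.2621, 23.9225, 10.4109, 14.8207)
)

# Hypothetical helper: fill NAs in `target` with the mean of the k nearest
# donor rows, using only the predictor columns observed in the incomplete row
knn_impute_col <- function(data, target, predictors, k) {
  out <- data[[target]]
  donors <- complete.cases(data[, c(target, predictors)])
  for (i in which(is.na(out))) {
    obs <- predictors[!is.na(unlist(data[i, predictors]))]
    if (length(obs) == 0) next  # no observed predictor: leave the NA
    x_d <- as.matrix(data[donors, obs, drop = FALSE])
    x_i <- unlist(data[i, obs])
    d <- sqrt(colSums((t(x_d) - x_i)^2))  # Euclidean distance to each donor
    nn <- order(d)[seq_len(min(k, length(d)))]
    out[i] <- mean(data[[target]][donors][nn])
  }
  out
}

aq$pm25_imp <- knn_impute_col(aq, "pm25", "pm10", k = 2)
print(aq)
```

Row 5 is filled with the mean pm25 of the two rows whose pm10 is closest to 10.4109 (rows 6 and 3), while rows 1 and 2 stay NA because neither pollutant was observed there. With more than one predictor, the columns should be standardized first so that no single variable dominates the distance.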