This dataset contains 200 observations of 11 predictor variables, together with the concentration levels of 7 harmful algae. The values were measured in several European rivers. The 11 predictors include 3 contextual variables (season, size, and speed) that describe the water sample, plus 8 chemical concentration measurements.
library("DMwR")
## Warning: package 'DMwR' was built under R version 3.6.3
data(algae)
head(algae, 10) ; dim(algae)
## season size speed mxPH mnO2 Cl NO3 NH4 oPO4 PO4 Chla
## 1 winter small medium 8.00 9.8 60.800 6.238 578.000 105.000 170.000 50.000
## 2 spring small medium 8.35 8.0 57.750 1.288 370.000 428.750 558.750 1.300
## 3 autumn small medium 8.10 11.4 40.020 5.330 346.667 125.667 187.057 15.600
## 4 spring small medium 8.07 4.8 77.364 2.302 98.182 61.182 138.700 1.400
## 5 autumn small medium 8.06 9.0 55.350 10.416 233.700 58.222 97.580 10.500
## 6 winter small high 8.25 13.1 65.750 9.248 430.000 18.250 56.667 28.400
## 7 summer small high 8.15 10.3 73.250 1.535 110.000 61.250 111.750 3.200
## 8 autumn small high 8.05 10.6 59.067 4.990 205.667 44.667 77.434 6.900
## 9 winter small medium 8.70 3.4 21.950 0.886 102.750 36.300 71.000 5.544
## 10 winter small high 7.93 9.9 8.000 1.390 5.800 27.250 46.600 0.800
## a1 a2 a3 a4 a5 a6 a7
## 1 0.0 0.0 0.0 0.0 34.2 8.3 0.0
## 2 1.4 7.6 4.8 1.9 6.7 0.0 2.1
## 3 3.3 53.6 1.9 0.0 0.0 0.0 9.7
## 4 3.1 41.0 18.9 0.0 1.4 0.0 1.4
## 5 9.2 2.9 7.5 0.0 7.5 4.1 1.0
## 6 15.1 14.6 1.4 0.0 22.5 12.6 2.9
## 7 2.4 1.2 3.2 3.9 5.8 6.8 0.0
## 8 18.2 1.6 0.0 0.0 5.5 8.7 0.0
## 9 25.4 5.4 2.5 0.0 0.0 0.0 0.0
## 10 17.0 0.0 0.0 2.9 0.0 0.0 1.7
## [1] 200 18
str(algae)
## 'data.frame': 200 obs. of 18 variables:
## $ season: Factor w/ 4 levels "autumn","spring",..: 4 2 1 2 1 4 3 1 4 4 ...
## $ size : Factor w/ 3 levels "large","medium",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ speed : Factor w/ 3 levels "high","low","medium": 3 3 3 3 3 1 1 1 3 1 ...
## $ mxPH : num 8 8.35 8.1 8.07 8.06 8.25 8.15 8.05 8.7 7.93 ...
## $ mnO2 : num 9.8 8 11.4 4.8 9 13.1 10.3 10.6 3.4 9.9 ...
## $ Cl : num 60.8 57.8 40 77.4 55.4 ...
## $ NO3 : num 6.24 1.29 5.33 2.3 10.42 ...
## $ NH4 : num 578 370 346.7 98.2 233.7 ...
## $ oPO4 : num 105 428.8 125.7 61.2 58.2 ...
## $ PO4 : num 170 558.8 187.1 138.7 97.6 ...
## $ Chla : num 50 1.3 15.6 1.4 10.5 ...
## $ a1 : num 0 1.4 3.3 3.1 9.2 15.1 2.4 18.2 25.4 17 ...
## $ a2 : num 0 7.6 53.6 41 2.9 14.6 1.2 1.6 5.4 0 ...
## $ a3 : num 0 4.8 1.9 18.9 7.5 1.4 3.2 0 2.5 0 ...
## $ a4 : num 0 1.9 0 0 0 0 3.9 0 0 2.9 ...
## $ a5 : num 34.2 6.7 0 1.4 7.5 22.5 5.8 5.5 0 0 ...
## $ a6 : num 8.3 0 0 0 4.1 12.6 6.8 8.7 0 0 ...
## $ a7 : num 0 2.1 9.7 1.4 1 2.9 0 0 0 1.7 ...
Show the total number of missing values (NAs) per column
colSums(sapply(algae, is.na))
## season size speed mxPH mnO2 Cl NO3 NH4 oPO4 PO4 Chla
## 0 0 0 1 2 10 2 2 2 2 12
## a1 a2 a3 a4 a5 a6 a7
## 0 0 0 0 0 0 0
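The same per-column count can also be obtained by calling `is.na()` directly on the data frame, which returns a logical matrix. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical and are not taken from `algae`); it also counts how many rows contain at least one NA.

```r
# Hypothetical toy data, used only to illustrate the idiom
toy <- data.frame(
  mxPH = c(8.0, NA, 8.1, 7.9),
  Cl   = c(60.8, 57.8, NA, NA),
  a1   = c(0.0, 1.4, 3.3, 3.1)
)

# is.na() on a data frame returns a logical matrix,
# so colSums() counts the NAs in each column
na_counts <- colSums(is.na(toy))

# Number of rows that contain at least one NA
rows_with_na <- sum(!complete.cases(toy))

print(na_counts)
print(rows_with_na)
```

Counting incomplete rows as well as incomplete columns helps when deciding how many observations a row-wise deletion would cost.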
Plot 1: missing-value counts and their combinations in a single combined plot
library(VIM)
aggr(algae, prop = FALSE, numbers = TRUE, border = NA, combined = TRUE)
Plot 2: missing-value counts and their combinations as two separate plots
library(VIM)
aggr(algae, prop = FALSE, numbers = TRUE, border = NA, combined = FALSE)
Plot 3: proportions of missing values, with variables and combinations sorted
library(VIM)
aggr(algae, sortVars = TRUE, prop = TRUE, sortCombs = TRUE, cex.lab = 1.5,
     cex.axis = .6, cex.numbers = 5, combined = FALSE, gap = -.2)
##
## Variables sorted by number of missings:
## Variable Count
## Chla 0.060
## Cl 0.050
## mnO2 0.010
## NO3 0.010
## NH4 0.010
## oPO4 0.010
## PO4 0.010
## mxPH 0.005
## season 0.000
## size 0.000
## speed 0.000
## a1 0.000
## a2 0.000
## a3 0.000
## a4 0.000
## a5 0.000
## a6 0.000
## a7 0.000
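Although the column is headed `Count`, the values listed above are proportions: each one is the NA count from the earlier `colSums()` output divided by the 200 rows of the dataset. A quick check of that arithmetic, with the counts copied by hand from that output:

```r
# NA counts per variable, copied from the colSums() output shown earlier
na_counts <- c(Chla = 12, Cl = 10, mnO2 = 2, NO3 = 2, NH4 = 2,
               oPO4 = 2, PO4 = 2, mxPH = 1)

# Number of rows, from dim(algae)
n_rows <- 200

# Proportion of missing values per variable
na_prop <- na_counts / n_rows
print(na_prop)
```

For example, `Chla` has 12 missing values out of 200, which gives the 0.060 shown in the first row.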
Remove the rows that have a missing value (NA) in one variable
# For example, for the variable Chla
algae <- algae[!is.na(algae$Chla), ]
Show the total number of missing values (NAs) per column
colSums(sapply(algae, is.na))
## season size speed mxPH mnO2 Cl NO3 NH4 oPO4 PO4 Chla
## 0 0 0 1 1 1 0 0 0 1 0
## a1 a2 a3 a4 a5 a6 a7
## 0 0 0 0 0 0 0
dim(algae)
## [1] 188 18
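Filtering on a single column only guarantees that this column is complete. Here 200 - 12 = 188 rows remain, which matches the 12 NAs in `Chla`, but other columns still contain missing values. A minimal sketch of the same behavior on made-up data (`toy` is hypothetical):

```r
# Hypothetical toy data with NAs in two different columns
toy <- data.frame(
  Chla = c(50.0, NA, 15.6, 1.4),
  Cl   = c(60.8, 57.8, NA, 77.4)
)

# Keep only the rows where Chla is observed
toy_chla <- toy[!is.na(toy$Chla), ]

# One row is dropped, but Cl still has a missing value
rows_left <- nrow(toy_chla)
na_left <- colSums(is.na(toy_chla))

print(rows_left)
print(na_left)
```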
Remove every row that contains at least one missing value (NA)
datosLimpios <- na.omit(algae)
Show the total number of missing values (NAs) per column
colSums(sapply(datosLimpios, is.na))
## season size speed mxPH mnO2 Cl NO3 NH4 oPO4 PO4 Chla
## 0 0 0 0 0 0 0 0 0 0 0
## a1 a2 a3 a4 a5 a6 a7
## 0 0 0 0 0 0 0
dim(datosLimpios)
## [1] 184 18
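`na.omit()` drops every row with at least one NA, so here it removes 188 - 184 = 4 further rows. The sketch below, on made-up data (`toy` is hypothetical), suggests that the result matches subsetting with `complete.cases()` and shows how the dropped row indices can be recovered from the `na.action` attribute.

```r
# Hypothetical toy data: rows 2 and 3 each contain one NA
toy <- data.frame(
  mxPH = c(8.0, NA, 8.1, 7.9, 8.3),
  Cl   = c(60.8, 57.8, NA, 77.4, 55.4)
)

# Drop every row that has at least one NA
clean <- na.omit(toy)

# Equivalent subsetting with complete.cases()
same <- toy[complete.cases(toy), ]

# na.omit() records which rows were removed
dropped <- as.integer(attr(clean, "na.action"))

print(nrow(clean))
print(dropped)
```

Keeping the dropped indices is useful when the removed observations need to be inspected or imputed later instead of being discarded.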