Universidad Católica de Oriente: A la Verdad por la Fe y la Ciencia
In this chapter we take the first steps of our data science and Big Data project: loading the dataset, analyzing outliers, and imputing missing values.
#### Mean Imputation
#install.packages("mice", dependencies = TRUE)
library(mice)
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
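# names(data)  # uncomment to list the available columns
# Numeric variables to impute
columns <- c("Peso", "Estatura", "IMC", "Promedio.Sem", "Prom.Acum")
# Single imputation (m = 1, maxit = 1): each NA is replaced by the mean of its column
imputed_data <- mice(data[, names(data) %in% columns], m = 1,
                     maxit = 1, method = "mean", seed = 2018, printFlag = FALSE)
# Extract the completed data frame
complete.data <- mice::complete(imputed_data)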
# 3 x 2 grid of panels: observed values in red (col = 2), completed data in green (col = 3)
par(mfrow = c(3, 2))
plot(density(data$Peso, na.rm = TRUE), col = 2, main = "Peso")
lines(density(complete.data$Peso), col = 3)
plot(density(data$Estatura, na.rm = TRUE), col = 2, main = "Estatura")
lines(density(complete.data$Estatura), col = 3)
plot(density(data$IMC, na.rm = TRUE), col = 2, main = "IMC")
lines(density(complete.data$IMC), col = 3)
plot(density(data$Promedio.Sem, na.rm = TRUE), col = 2, main = "Promedio Semestral")
lines(density(complete.data$Promedio.Sem), col = 3)
plot(density(data$Prom.Acum, na.rm = TRUE), col = 2, main = "Prom.Acumulado")
lines(density(complete.data$Prom.Acum), col = 3)
dev.off()
## null device
## 1
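For intuition, `method = "mean"` fills every missing entry of a column with the mean of that column's observed values. The following base R sketch reproduces that idea by hand on a made-up vector (the numbers and the names `peso`, `peso_imputed` are illustrative assumptions, not values from the course dataset). It also shows the usual side effect: the mean is preserved, but the spread shrinks, which tends to make the density of the completed data more peaked around the mean.

```r
# Toy vector with missing values (made-up numbers, not the course data)
peso <- c(60, 72, NA, 55, 80, NA, 68)

# Replace each NA with the mean of the observed values
peso_mean <- mean(peso, na.rm = TRUE)
peso_imputed <- ifelse(is.na(peso), peso_mean, peso)

# Compare summary statistics before and after imputation
mean_before <- mean(peso, na.rm = TRUE)
mean_after  <- mean(peso_imputed)
sd_before   <- sd(peso, na.rm = TRUE)
sd_after    <- sd(peso_imputed)

print(c(mean_before = mean_before, mean_after = mean_after))
print(c(sd_before = sd_before, sd_after = sd_after))
```

Because the imputed points add no deviation from the mean while increasing the sample size, the standard deviation after imputation is smaller than before.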
#### Random Forest Imputation
# Same variables, now imputed with random forests (method = "rf")
imputed_data <- mice(data[, names(data) %in% columns], m = 1,
                     maxit = 1, method = "rf", seed = 2018, printFlag = FALSE)
complete.data <- mice::complete(imputed_data)
# 3 x 2 grid of panels: observed values in red (col = 2), completed data in green (col = 3)
par(mfrow = c(3, 2))
plot(density(data$Peso, na.rm = TRUE), col = 2, main = "Peso")
lines(density(complete.data$Peso), col = 3)
plot(density(data$Estatura, na.rm = TRUE), col = 2, main = "Estatura")
lines(density(complete.data$Estatura), col = 3)
plot(density(data$IMC, na.rm = TRUE), col = 2, main = "IMC")
lines(density(complete.data$IMC), col = 3)
plot(density(data$Promedio.Sem, na.rm = TRUE), col = 2, main = "Promedio Semestral")
lines(density(complete.data$Promedio.Sem), col = 3)
plot(density(data$Prom.Acum, na.rm = TRUE), col = 2, main = "Prom.Acumulado")
lines(density(complete.data$Prom.Acum), col = 3)
dev.off()
## null device
## 1
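The two comparison blocks above repeat the same pair of `plot()` and `lines()` calls for every variable. One possible way to avoid the repetition is a small helper function. The sketch below is only an illustration: `plot_density_comparison` is a hypothetical name, and the toy data frames `toy` and `toy_complete` are made up to stand in for `data` and `complete.data`.

```r
# Hypothetical helper: overlay observed (red) and completed (green) densities
plot_density_comparison <- function(original, completed, vars, titles = vars) {
  for (i in seq_along(vars)) {
    v <- vars[i]
    plot(density(original[[v]], na.rm = TRUE), col = 2, main = titles[i])
    lines(density(completed[[v]]), col = 3)
  }
  invisible(NULL)
}

# Made-up data standing in for `data` and `complete.data`
set.seed(2018)
toy <- data.frame(Peso = rnorm(50, 65, 10), Estatura = rnorm(50, 1.68, 0.08))
toy$Peso[c(3, 17, 40)] <- NA

# Complete the toy data with a simple mean fill, just to have something to plot
toy_complete <- toy
toy_complete$Peso[is.na(toy_complete$Peso)] <- mean(toy$Peso, na.rm = TRUE)

pdf(NULL)  # null device so the sketch runs without a display; remove to see the plots
par(mfrow = c(1, 2))
plot_density_comparison(toy, toy_complete, c("Peso", "Estatura"))
dev.off()
```

With the course data, the call would presumably take the form `plot_density_comparison(data, complete.data, columns)` after `par(mfrow = c(3, 2))`, assuming all names in `columns` exist in both data frames.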
help(mice)
## starting httpd help server ...
## done