Intro to the code.

Missing values need to be handled before a dataset is analysed.
Commonly used approaches are:
1. Replacing the missing value with a reasonable value, such as the mean, median, mode, or a constant
2. Replacing the missing value with the previous value in the dataset
3. Deleting the rows with missing values (not a preferred option, since it discards data)

We will use the built-in dataset “airquality” in this code.

library(tidyverse)
## -- Attaching packages ------------------
## v ggplot2 3.1.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts -- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
dat = airquality

Use the summary function to check whether there are missing values

summary(dat$Solar.R)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     7.0   115.8   205.0   185.9   258.8   334.0       7

To see which rows have missing values

which(is.na(dat$Solar.R))
## [1]  5  6 11 27 96 97 98
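
The check above looks at a single column. As a small sketch in base R (not part of the original code), the same idea can be extended to every column of the data frame at once. The name `na_counts` is only illustrative.

```r
# Count the missing values in every column of the built-in airquality data
dat <- airquality
na_counts <- colSums(is.na(dat))
print(na_counts)

# Keep only the names of the columns that actually contain missing values
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```

For airquality this should report missing values only in Ozone and Solar.R, so the steps shown for Solar.R would need to be repeated for Ozone before a full analysis.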

To replace missing values with the median

dat$Solar.R = impute(dat$Solar.R, # the variable with missing values
                     median       # function (e.g. mean, median) or constant used to fill in the NAs
                     )
summary(dat$Solar.R)
## 
##  7 values imputed to 205
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     7.0   120.0   205.0   186.8   256.0   334.0

To replace missing values with the mean

dat = airquality
dat$Solar.R = impute(dat$Solar.R, 
                     mean)
summary(dat$Solar.R)
## 
##  7 values imputed to 185.9315
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     7.0   120.0   194.0   185.9   256.0   334.0
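
The list of approaches also mentions the mode and a constant. Base R has no built-in function for the statistical mode (the built-in `mode()` returns the storage type of an object, not the most frequent value), so the sketch below defines a hypothetical helper, `stat_mode`. It uses plain base R indexing instead of `impute`, and the constant 0 is an arbitrary placeholder rather than a recommendation.

```r
# Hypothetical helper: most frequent non-missing value
# (if several values tie, the one that appears first is returned)
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

dat <- airquality
missing_rows <- is.na(dat$Solar.R)

# Mode imputation: fill the missing positions with the most frequent value
solar_mode <- dat$Solar.R
solar_mode[missing_rows] <- stat_mode(dat$Solar.R)

# Constant imputation: fill the missing positions with a fixed value
solar_const <- dat$Solar.R
solar_const[missing_rows] <- 0

# Neither vector should contain missing values any more
sum(is.na(solar_mode))
sum(is.na(solar_const))
```

For a continuous measurement such as Solar.R the mode is often not very meaningful, because most values occur only once or twice; it tends to be more useful for categorical variables.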

To replace missing values with the previous value in the dataset

dat = airquality
dat = dat %>% fill(Solar.R)
summary(dat$Solar.R)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       7     112     203     186     259     334
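
To make explicit what filling with the previous value does, here is a minimal base R sketch of the same idea (often called last observation carried forward). `locf` is a hypothetical helper written for illustration, not a tidyr function. One consequence it makes visible: a missing value at the very start of a column has no previous value and therefore stays missing.

```r
# Hypothetical helper: carry the last non-missing value forward
locf <- function(x) {
  observed <- !is.na(x)
  # For each position, how many non-missing values have been seen so far
  idx <- cumsum(observed)
  # Prepend NA so that positions before the first observed value stay NA
  c(NA, x[observed])[idx + 1]
}

dat <- airquality
filled <- locf(dat$Solar.R)

# Rows 5 and 6 were missing and now repeat the value from row 4
filled[4:6]

# A leading NA cannot be filled from above
locf(c(NA, 1, NA, 2))
```

This approach suits ordered data such as the daily readings in airquality, where the previous day's value is a plausible stand-in; on unordered data the "previous" row has no special meaning.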

To delete rows with missing values

dat = airquality
dim(dat)
## [1] 153   6
dat = dat %>% 
  filter(!is.na(Solar.R)) # keep only the rows where Solar.R is not missing
dim(dat)
## [1] 146   6
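
The deletion above only considers Solar.R. As a hedged base R sketch (the object names `solar_complete` and `all_complete` are illustrative), the following compares dropping rows with a missing Solar.R against dropping rows with a missing value in any column.

```r
dat <- airquality

# Drop rows where Solar.R is missing (same result as the filter step above)
solar_complete <- dat[!is.na(dat$Solar.R), ]

# Drop rows with a missing value in any column
all_complete <- na.omit(dat)

nrow(solar_complete)
nrow(all_complete)
```

Because Ozone also contains missing values, `na.omit` removes noticeably more rows, which illustrates why deletion is usually the least preferred option.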

Thank You