1 tidyr

The goal of tidyr is to help you create tidy data. Tidy data is data where:

  • Every column is a variable.
  • Every row is an observation.
  • Every cell is a single value.

Reference: https://www.r-bloggers.com/handling-missing-values-in-r-using-tidyr/


1.1 Loading packages
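
The code for this step is not shown in the text. As a minimal sketch, assuming tidyr is already installed and is the only package this section needs, loading it could look like this:

```r
# Attach tidyr so that drop_na(), fill(), and replace_na() are available
library(tidyr)
```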


1.2 tidyr functions

The following three tidyr functions are handy for handling missing values:

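  • drop_na() : drops rows that contain missing values
  • fill() : fills missing values with the nearest non-missing value (direction up or down)
  • replace_na() : replaces missing values with a value you specify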
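
To make the three functions concrete, here is a small sketch on a toy data frame. The data frame `df` and its columns `id`, `x`, and `y` are hypothetical and exist only for illustration; they are not part of the original text.

```r
library(tidyr)

# Hypothetical toy data with one missing value in each of x and y
df <- data.frame(
  id = 1:4,
  x  = c(1, NA, 3, 4),
  y  = c("a", "b", NA, "d"),
  stringsAsFactors = FALSE
)

# drop_na(): remove every row that has an NA in any column
dropped <- drop_na(df)

# fill(): carry the previous value of x downward into the gap
filled_down <- fill(df, x, .direction = "down")

# fill(): pull the next value of y upward into the gap
filled_up <- fill(df, y, .direction = "up")

# replace_na(): substitute a chosen value per column
replaced <- replace_na(df, list(x = 0, y = "unknown"))
```

Here `dropped` keeps only the rows with `id` 1 and 4, while the other three results keep all four rows and differ only in how the gaps are filled.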

1.3 Dataset with Missing Values

To get a dataset with missing values, let’s take mtcars and make some missing values in it.
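
The code behind the output that follows is not shown in the text. A minimal sketch of one way to do it is given here; the name `mtcars_na` and the row positions are assumptions, chosen only so that `mpg`, `cyl`, `hp`, and `gear` each end up with one missing value, and the resulting summary statistics will not necessarily match the printed ones.

```r
# Inspect the original data: it has no missing values
summary(mtcars)
dim(mtcars)

# Work on a copy so the built-in dataset stays untouched
# (mtcars_na is a hypothetical name, not taken from the source)
mtcars_na <- mtcars

# Assumed positions: the source does not say which cells were set to NA
mtcars_na[2, "mpg"]   <- NA
mtcars_na[5, "cyl"]   <- NA
mtcars_na[9, "hp"]    <- NA
mtcars_na[12, "gear"] <- NA

# The summary now reports an NA's row for the affected columns
summary(mtcars_na)
```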

##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

## [1] 32 11

##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.35   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.0  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.12   Mean   :6.129   Mean   :230.7   Mean   :147.9  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##  NA's   :1       NA's   :1                       NA's   :1      
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##                                                                  
##        am              gear           carb      
##  Min.   :0.0000   Min.   :3.00   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.00   1st Qu.:2.000  
##  Median :0.0000   Median :4.00   Median :2.000  
##  Mean   :0.4062   Mean   :3.71   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.00   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.00   Max.   :8.000  
##                   NA's   :1