tell us

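na.omit() is part of the stats package, which ships with base R and is loaded automatically; it is not part of the tidyverse. na.omit() removes every row that contains at least one missing (NA) value. Some packages add their own versions with extra options: for example, na.omit() in the timeSeries package has a method argument for choosing how NAs are handled. It is related to is.na(), which does not change the data but returns TRUE for every value that is missing.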

This is useful when there are NAs in the data set that we want to exclude from analyses. As part of the data tidying process, we can use na.omit() to clean the data before running any subsequent analyses.
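To make the difference between the two functions concrete, here is a minimal sketch on a made-up vector (the values in x are hypothetical and not taken from the penguins data). It only uses base R, so no packages need to be loaded.

```r
# Toy vector with two missing values (hypothetical data)
x <- c(4, NA, 7, NA, 1)

# is.na() only tests each element: TRUE where the value is missing
missing_flags <- is.na(x)
missing_flags

# na.omit() returns the object with the missing entries dropped;
# as.numeric() strips the bookkeeping attribute that na.omit() attaches
x_clean <- as.numeric(na.omit(x))
x_clean
```

is.na() leaves x untouched and just reports where the gaps are, while na.omit() gives back a shorter object without them.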

show us

First, we load the tidyverse and palmerpenguins packages.

library(tidyverse) 
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(palmerpenguins) #for data set

get some data

Next, we use sum() together with is.na() to count how many NAs are in our data set. is.na() marks each missing cell as TRUE, and sum() adds up those TRUE values. It shows we have 336 NAs in the penguins_raw data.

sum(is.na(penguins_raw))
## [1] 336
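A single total does not say where the NAs are. As a small sketch, assuming a made-up data frame df (hypothetical columns, not the real penguins_raw), colSums() on the same is.na() result gives the count per column:

```r
# Hypothetical toy data frame with missing values in every column
df <- data.frame(
  species = c("Adelie", "Gentoo", NA, "Chinstrap"),
  mass_g  = c(3750, NA, 4200, NA),
  comment = c(NA, NA, NA, "nest never observed")
)

# Total number of missing cells, as in the example above
na_total <- sum(is.na(df))
na_total

# Number of missing cells in each column
na_by_col <- colSums(is.na(df))
na_by_col
```

The same two calls should work on penguins_raw, and the per-column view helps show whether one column is responsible for most of the missing values.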

use the function

Using na.omit(), we remove every row that contains at least one NA from our original data set and save the result as a new data set.

newdata <- penguins_raw %>% 
  na.omit()

sum(is.na(newdata))
## [1] 0

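We can see that no NAs remain in the new data set. Keep in mind that na.omit() drops whole rows, so a row with a single NA is removed together with its non-missing values.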
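Because whole rows are dropped, it is worth checking how many rows survive. The sketch below uses a hypothetical toy data frame (not penguins_raw) and shows one base R alternative, complete.cases(), for keeping rows that are complete only in the columns you plan to analyse. Treat it as an illustration under those assumptions, not as the only way to do this.

```r
# Hypothetical toy data: the comment column is mostly empty
df <- data.frame(
  id      = 1:4,
  mass_g  = c(3750, NA, 4200, 3900),
  comment = c(NA, NA, NA, "nest never observed")
)

# na.omit() keeps only rows that are complete in every column
all_complete <- na.omit(df)
nrow(df)            # 4
nrow(all_complete)  # 1

# Keep rows that are complete in the columns of interest only
keep <- complete.cases(df[, c("id", "mass_g")])
mass_complete <- df[keep, ]
nrow(mass_complete) # 3
```

Here a sparsely filled comment column would cost three of four rows with na.omit(), while restricting the check to id and mass_g loses only one.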

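# method = "s" is an argument of na.omit() in the timeSeries package; the data
# frame method in stats accepts it but ignores it, so this matches newdata
newdata2 <- penguins_raw %>% 
  na.omit(method = "s")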

sum(is.na(newdata2))
## [1] 0
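A note on the method argument: it is documented for the timeSeries version of na.omit() (the first link under more resources). For a data frame or tibble, the stats method takes only the object, and extra arguments passed through `...` are silently ignored. The following sketch, on a hypothetical toy data frame, checks that the two calls give the same result:

```r
# Hypothetical toy data frame with one NA in each column
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# Plain call, and the same call with a method argument
plain       <- na.omit(df)
with_method <- na.omit(df, method = "s")

# TRUE: for data frames the extra argument has no effect
same_result <- identical(plain, with_method)
same_result
```

If you want the substitution or interpolation behaviour described on the timeSeries help page, the data would need to be a timeSeries object rather than a data frame.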

more resources

We used the following posts to write this blog:

https://rdrr.io/cran/timeSeries/man/stats-na.omit.html

https://www.r-bloggers.com/2021/04/handling-missing-values-in-r/

https://www.r-bloggers.com/2021/06/remove-rows-that-contain-all-na-or-certain-columns-in-r/