suppressPackageStartupMessages(library("tidyverse"))
package 'tidyverse' was built under R version 3.6.3
1. What happens to missing values in a histogram? What happens to missing values in a bar chart? Why is there a difference?
Missing values are removed when the number of observations in each bin is calculated. See the warning message: Removed 9 rows containing non-finite values (stat_bin)
diamonds2 <- diamonds %>%
  mutate(y = ifelse(y < 3 | y > 20, NA, y))

ggplot(diamonds2, aes(x = y)) +
  geom_histogram()
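If the warning is not wanted, ggplot2 geoms accept na.rm = TRUE, which drops the missing values silently instead of reporting them. The following is a minimal sketch under that assumption; the object name p and the choice of bins = 30 are illustrative and not part of the original answer.

```r
library(ggplot2)
library(dplyr)

# Same recoding as in the answer: implausible y values become NA
diamonds2 <- diamonds %>%
  mutate(y = ifelse(y < 3 | y > 20, NA, y))

# na.rm = TRUE drops the NA rows without emitting the warning;
# bins = 30 is an arbitrary, illustrative choice
p <- ggplot(diamonds2, aes(x = y)) +
  geom_histogram(na.rm = TRUE, bins = 30)
p
```

The plot itself is unchanged. Only the warning is suppressed, so this is best used once the missing values have already been inspected.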

In geom_bar(), NA is treated as another category. The x aesthetic in geom_bar() is typically a discrete (categorical) variable, so missing values are counted and drawn as one more category.
diamonds %>%
  mutate(cut = if_else(runif(n()) < 0.1, NA_character_, as.character(cut))) %>%
  ggplot() +
  geom_bar(mapping = aes(x = cut))

In a histogram, the x aesthetic variable needs to be numeric, and stat_bin() groups the observations by ranges into bins. Since the numeric value of the NA observations is unknown, they cannot be placed in a particular bin, and are dropped.
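The same distinction can be seen without plotting. The sketch below is a base R analogy, not what ggplot2 does internally: cut() stands in for binning and table() for counting categories, and the vectors x and cats are made-up illustrative data.

```r
# Numeric case: an NA cannot be assigned to any interval
x <- c(1.5, 2.5, NA, 7.0)
bins <- cut(x, breaks = c(0, 5, 10))
is.na(bins)          # the third element has no bin
bin_counts <- table(bins)
bin_counts           # counts cover only the three non-missing values

# Categorical case: NA can be counted as its own level
cats <- c("Good", NA, "Ideal", "Good")
cat_counts <- table(cats, useNA = "ifany")
cat_counts           # includes a separate count for NA
```

In the numeric case there is no interval an unknown value could belong to, while in the categorical case "missing" is just one more label to count.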
2. What does na.rm = TRUE do in mean() and sum()?
This option removes NA values from the vector prior to calculating the mean and sum.
mean(c(0, 1, 2, NA), na.rm = TRUE)
[1] 1
sum(c(0, 1, 2, NA), na.rm = TRUE)
[1] 3
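For contrast, here is a short sketch of what happens with the default na.rm = FALSE, along with a manual equivalent of na.rm = TRUE. The variable name x is illustrative.

```r
x <- c(0, 1, 2, NA)

# Default (na.rm = FALSE): a single NA makes the result NA
mean(x)
sum(x)

# na.rm = TRUE gives the same result as dropping the NA values by hand
mean(x, na.rm = TRUE)
mean(x[!is.na(x)])
sum(x, na.rm = TRUE)
sum(x[!is.na(x)])
```

Note that with na.rm = TRUE the mean is taken over the three remaining values, not the original four.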