library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
question_1 = 1:10
map_dbl(question_1, ~ .x + 5)
## [1] 6 7 8 9 10 11 12 13 14 15
The output above is the vector of numbers 1 to 10 after 5 has been added to each element.
mat_x = as_tibble(matrix(1:120,
nrow = 20,
ncol = 6))
map_dbl(mat_x, sum)
## V1 V2 V3 V4 V5 V6
## 210 610 1010 1410 1810 2210
The sum of each column was calculated above with map_dbl() from the purrr package (the family of “map” functions). Because each column sum is a single numeric value, the dbl variant is used, which returns a named numeric vector instead of a list.
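To make the difference between the map variants concrete, here is a minimal sketch on a small, hypothetical tibble (`toy` is an illustration, not part of the exercise data). It assumes purrr and tibble are installed.

```r
library(purrr)
library(tibble)

# Hypothetical toy data, used only for illustration
toy <- tibble(a = 1:3, b = 4:6)

# map() always returns a list, one element per column
sums_list <- map(toy, sum)

# map_dbl() returns a named double vector, one value per column
sums_dbl <- map_dbl(toy, sum)

is.list(sums_list)
is.double(sums_dbl)
sums_dbl
```

The typed variants are useful when every result is expected to be a single value of a known type, since they raise an error if that expectation is not met.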
mat_x %>%
summarize(across(c(V1:V6), sum))
## # A tibble: 1 × 6
## V1 V2 V3 V4 V5 V6
## <int> <int> <int> <int> <int> <int>
## 1 210 610 1010 1410 1810 2210
As with the code above, the sum of each of the 6 columns was calculated, this time with the column-wise operation approach (summarize() combined with across()).
column_sum = numeric(6)
for (i in 1:6) {
column_sum[i] = sum(mat_x[[i]],
na.rm = TRUE)
}
print(column_sum)
## [1] 210 610 1010 1410 1810 2210
Finally, the sum of each of the 6 columns was calculated with the “loop” approach, which stores each result in a pre-allocated numeric vector.
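As a cross-check, the loop and map results can be compared directly. The sketch below rebuilds the tibble with explicit column names (an assumption made here so the block runs on its own) and uses seq_along() instead of a hard-coded 1:6, so the loop adapts to the number of columns.

```r
library(purrr)
library(tibble)

# Rebuild the 20 x 6 tibble with explicit column names V1..V6
m <- matrix(1:120, nrow = 20, ncol = 6)
colnames(m) <- paste0("V", 1:6)
mat_x <- as_tibble(m)

# Loop approach: pre-allocate, then fill one column sum at a time
column_sum <- numeric(ncol(mat_x))
for (i in seq_along(mat_x)) {
  column_sum[i] <- sum(mat_x[[i]], na.rm = TRUE)
}

# map approach, with names dropped so the two vectors are comparable
map_sum <- unname(map_dbl(mat_x, sum))

all.equal(column_sum, map_sum)
```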
numbers = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
goals_scored = c(1, 3, 2, 6, 3, 4, 5, 5, 1, 2)
players = c("Son",
"Messi",
"Ronaldo",
"Kane",
"Maddison",
"Nunez",
"Bellingham",
"Haaland",
"Neymar",
"Mbappe")
tibble_data = tibble(numbers, goals_scored, players)
To create the tibble, three vectors were created separately and then combined with tibble().
map(tibble_data, class)
## $numbers
## [1] "numeric"
##
## $goals_scored
## [1] "numeric"
##
## $players
## [1] "character"
Applying class() to each column with map() shows the type of each variable: numbers and goals_scored are numeric, and players is character.
tibble_data %>%
summarize(across(where(is.numeric), mean),
across(where(is.character), n_distinct))
## # A tibble: 1 × 3
## numbers goals_scored players
## <dbl> <dbl> <int>
## 1 5.5 3.2 10
The mean of each numeric variable and the number of distinct values of each character variable were obtained with the column-wise operation approach.
map(c(-10, 0, 10, 100), ~rnorm(n = 10, mean = .))
## [[1]]
## [1] -9.175788 -9.733734 -8.694723 -10.810948 -9.029988 -12.894930
## [7] -9.456814 -9.180146 -10.489775 -9.931699
##
## [[2]]
## [1] -0.6134124 0.1782929 -1.3561550 3.0580791 -0.5693579 -0.6805487
## [7] -0.5661826 -1.7844832 -1.2922168 0.1834073
##
## [[3]]
## [1] 9.120641 11.476309 9.739450 8.083528 11.553327 10.652730 10.965861
## [8] 9.724512 10.860142 10.847866
##
## [[4]]
## [1] 99.47861 101.00573 101.79119 99.61217 99.59213 98.56763 101.41890
## [8] 99.14275 100.41110 100.57954
Four samples of 10 values each were drawn with map() from normal distributions with means of -10, 0, 10, and 100 (rnorm() uses its default standard deviation of 1).
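Because rnorm() draws random values, the numbers above will differ on every run. A minimal sketch of making the draws reproducible follows; the seed value 123 is an arbitrary choice for illustration and is not from the original exercise.

```r
library(purrr)

# Fix the random number generator state (arbitrary seed, assumption)
set.seed(123)

# Same call as in the text, with the explicit .x pronoun
draws <- map(c(-10, 0, 10, 100), ~ rnorm(n = 10, mean = .x))

# Sample means should land near the requested means
map_dbl(draws, mean)
```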
library(stevedata)
data = pwt_sample
data %>%
map_dbl(~ sum(is.na(.)))
## country isocode year pop hc rgdpna rgdpo rgdpe labsh avh
## 0 0 0 2 2 2 2 2 2 17
## emp rnna
## 2 2
Counting NA values in each column with map_dbl() shows that the following variables have missing values: pop (2), hc (2), rgdpna (2), rgdpo (2), rgdpe (2), labsh (2), avh (17), emp (2), rnna (2).
data = pwt_sample
data %>%
summarize(across(everything(), ~sum(is.na(.x))))
## # A tibble: 1 × 12
## country isocode year pop hc rgdpna rgdpo rgdpe labsh avh emp rnna
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 0 0 0 2 2 2 2 2 2 17 2 2
The same result is shown here; this time the column-wise operation approach was used, which returns a one-row tibble instead of a named vector.
data = pwt_sample
data %>%
map_dbl(~ round(mean(., na.rm = TRUE), 2))
## country isocode year pop hc rgdpna rgdpo
## NA NA 1984.50 35.53 2.81 1139533.26 1070786.67
## rgdpe labsh avh emp rnna
## 1066653.52 0.61 1857.05 16.14 5439110.67
By using map_dbl(), the mean of each variable was obtained and rounded to two decimal places.
Note that the non-numeric variables (country and isocode) are reported as NA, because mean() cannot average character values. For that reason, the number of unique values of those variables is calculated below with the column-wise operation approach.
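One way to avoid the NA results is to keep only the numeric columns before mapping. The sketch below uses a small hypothetical tibble (`toy`) rather than pwt_sample so that it runs without stevedata.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data with one character and one numeric column
toy <- tibble(country = c("A", "B", "B"),
              pop = c(1, NA, 3))

# Keep numeric columns only, then average each one
numeric_means <- toy %>%
  select(where(is.numeric)) %>%
  map_dbl(~ mean(.x, na.rm = TRUE))

numeric_means
```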
data = pwt_sample
data %>%
summarize(across(where(is.character), ~ n_distinct(.),
.names = "n_distinct_{.col}"))
## # A tibble: 1 × 2
## n_distinct_country n_distinct_isocode
## <int> <int>
## 1 22 22
The column-wise operation approach shows that the data contain 22 unique country values and 22 unique isocode values.
data = pwt_sample
data %>%
select(-c(country, isocode, year)) %>%
map_dfr(~ round(mean(., na.rm = TRUE), 2)) %>%
pivot_longer(cols = everything(),
names_to = "variables",
values_to = "average")
## # A tibble: 9 × 2
## variables average
## <chr> <dbl>
## 1 pop 35.5
## 2 hc 2.81
## 3 rgdpna 1139533.
## 4 rgdpo 1070787.
## 5 rgdpe 1066654.
## 6 labsh 0.61
## 7 avh 1857.
## 8 emp 16.1
## 9 rnna 5439111.
By chaining several functions with pipes, an output table is obtained in which each row represents one of the 9 numeric variables measured for each country, alongside its average across all countries and years.
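The same long table can also be built without map_dfr(), which, to my understanding, is marked as superseded in purrr 1.0.0. The sketch below uses summarize() with across() on a hypothetical toy tibble; the column names mirror pwt_sample but the values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (values are illustrative, not from pwt_sample)
toy <- tibble(country = c("A", "A", "B"),
              pop = c(1, 2, NA),
              hc = c(2.5, 3.5, 3))

avg_tbl <- toy %>%
  # Average every numeric column, ignoring missing values
  summarize(across(where(is.numeric),
                   ~ round(mean(.x, na.rm = TRUE), 2))) %>%
  # Reshape the one-row result into one row per variable
  pivot_longer(cols = everything(),
               names_to = "variables",
               values_to = "average")

avg_tbl
```

Selecting with where(is.numeric) also removes the need to drop the character columns by name.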