Packages and Data

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Question 1

question_1 = 1:10
map_dbl(question_1, ~ .x + 5)

##  [1]  6  7  8  9 10 11 12 13 14 15

The output above is the vector of numbers 1 through 10 after 5 has been added to each element.
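As a quick cross-check, the same result can be reproduced with plain vectorized arithmetic, since addition in R already works element by element. This is only a sketch for comparison; the names `mapped` and `vectorized` are introduced here for illustration.

```r
library(purrr)

question_1 <- 1:10

# map_dbl() applies the function to each element and returns a double vector
mapped <- map_dbl(question_1, ~ .x + 5)

# Vectorized arithmetic performs the same addition without an explicit map
vectorized <- question_1 + 5

# Both approaches should agree element by element
isTRUE(all.equal(mapped, vectorized))
```

For a simple operation like this the vectorized form is shorter; map_dbl() becomes more useful when the function applied to each element is not itself vectorized.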

Question 2

purrr (map function)

mat_x = as_tibble(matrix(1:120,
                         nrow = 20,
                         ncol = 6))

map_dbl(mat_x, sum)

##   V1   V2   V3   V4   V5   V6 
##  210  610 1010 1410 1810 2210

The sum of each column was calculated above using map_dbl() from the purrr package. Because each column sum is a single numeric value, the dbl variant is used, which returns a numeric (double) vector rather than a list.

Column-Wise Operations

mat_x %>% 
  summarize(across(c(V1:V6), sum))

## # A tibble: 1 × 6
##      V1    V2    V3    V4    V5    V6
##   <int> <int> <int> <int> <int> <int>
## 1   210   610  1010  1410  1810  2210

Similar to the code above, the sum of each of the 6 columns has been calculated; however, this time the column-wise operation approach (summarize with across) was used.

Loops

# Pre-allocate one slot per column, then fill each slot inside the loop
column_sum = numeric(6)
for (i in 1:6) {
  column_sum[i] = sum(mat_x[[i]],
                      na.rm = TRUE)
}

print(column_sum)

## [1]  210  610 1010 1410 1810 2210

Finally, the sum of each of the 6 columns has been calculated using the “loop” approach. All three approaches return the same six totals.
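The agreement between the approaches can also be checked in code. The sketch below rebuilds the same 20 x 6 tibble (with the column names V1 to V6 set explicitly), compares the map result against base R's colSums(), and writes the loop with seq_along() so that it does not depend on a hard-coded column count. The object names are illustrative only.

```r
library(tibble)
library(purrr)

# Same values as mat_x above, with column names V1..V6 given explicitly
m <- matrix(1:120, nrow = 20, ncol = 6,
            dimnames = list(NULL, paste0("V", 1:6)))
mat_x <- as_tibble(m)

# purrr approach
via_map <- map_dbl(mat_x, sum)

# Base R shortcut for column totals
via_colsums <- colSums(mat_x)

# Loop that adapts to however many columns the tibble has
via_loop <- numeric(ncol(mat_x))
for (i in seq_along(mat_x)) {
  via_loop[i] <- sum(mat_x[[i]], na.rm = TRUE)
}

# All three should hold the same six totals
isTRUE(all.equal(via_map, via_colsums))
isTRUE(all.equal(unname(via_map), via_loop))
```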

Question 3

Creating the tibble

numbers = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
goals_scored = c(1, 3, 2, 6, 3, 4, 5, 5, 1, 2)
players = c("Son",
             "Messi",
             "Ronaldo",
             "Kane",
             "Maddison",
             "Nunez",
             "Bellingham",
             "Haaland",
             "Neymar",
             "Mbappe")

tibble_data = tibble(numbers, goals_scored, players)

To create the tibble, three vectors were created separately and then combined with tibble().

Identifying the type of each variable

map(tibble_data, class)

## $numbers
## [1] "numeric"
## 
## $goals_scored
## [1] "numeric"
## 
## $players
## [1] "character"

Applying class() to every column with the map function shows the type of each variable: numbers and goals_scored are numeric, and players is character.
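If a compact result is preferred over a list, map_chr() can return the classes as a named character vector. The sketch below uses a shortened, hypothetical version of the tibble (`toy`). It assumes every column has exactly one class; a column with several classes, such as a date-time, would make map_chr() fail.

```r
library(tibble)
library(purrr)

# Shortened, hypothetical version of the tibble built above
toy <- tibble(
  numbers = c(1, 2, 3),
  goals_scored = c(1, 3, 2),
  players = c("Son", "Messi", "Ronaldo")
)

# map() returns a list; map_chr() returns a named character vector instead
types <- map_chr(toy, class)
types
```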

Calculating the mean of numeric variables and the number of distinct character values

tibble_data %>% 
  summarize(across(where(is.numeric), mean),
            across(where(is.character), n_distinct))

## # A tibble: 1 × 3
##   numbers goals_scored players
##     <dbl>        <dbl>   <int>
## 1     5.5          3.2      10

The mean of each numeric variable and the number of distinct values of each character variable were obtained using the column-wise operation approach. Because every player name is unique, the distinct count (10) equals the number of observations here.
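The distinction between counting observations and counting distinct values only matters when a value repeats. The following sketch uses hypothetical data in which one player appears twice, so n() and n_distinct() give different answers.

```r
library(tibble)
library(dplyr)

# Hypothetical data: "Kane" appears twice
toy <- tibble(
  goals_scored = c(1, 3, 2, 6),
  players = c("Son", "Kane", "Messi", "Kane")
)

counts <- toy %>%
  summarize(
    n_obs = n(),  # number of rows (observations)
    across(where(is.character), n_distinct,
           .names = "n_distinct_{.col}")  # number of unique values
  )

counts
```

With these four rows, n_obs is 4 while n_distinct_players is 3.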

Question 4

map(c(-10, 0, 10, 100), ~rnorm(n = 10, mean = .))

## [[1]]
##  [1]  -9.175788  -9.733734  -8.694723 -10.810948  -9.029988 -12.894930
##  [7]  -9.456814  -9.180146 -10.489775  -9.931699
## 
## [[2]]
##  [1] -0.6134124  0.1782929 -1.3561550  3.0580791 -0.5693579 -0.6805487
##  [7] -0.5661826 -1.7844832 -1.2922168  0.1834073
## 
## [[3]]
##  [1]  9.120641 11.476309  9.739450  8.083528 11.553327 10.652730 10.965861
##  [8]  9.724512 10.860142 10.847866
## 
## [[4]]
##  [1]  99.47861 101.00573 101.79119  99.61217  99.59213  98.56763 101.41890
##  [8]  99.14275 100.41110 100.57954

Four samples of 10 normally distributed values each were generated with the map function, with means of -10, 0, 10, and 100. The standard deviation was left at the rnorm default of 1.
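Because rnorm() draws random values, the numbers change on every run. A minimal sketch of how the draws could be made reproducible and checked against their target means is shown below; the seed value 123 is arbitrary and chosen only for illustration.

```r
library(purrr)

set.seed(123)  # arbitrary seed so the draws can be reproduced

means <- c(-10, 0, 10, 100)

# One sample of 10 draws per target mean
samples <- map(means, ~ rnorm(n = 10, mean = .x))

# The sample mean of each draw should land near its target mean
sample_means <- map_dbl(samples, mean)
round(sample_means - means, 2)
```

With only 10 draws per sample, the sample means will be close to, but not exactly equal to, the target means.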

Question 5

Approach 1 (map function)

library(stevedata)

data = pwt_sample

data %>% 
  map_dbl(~ sum(is.na(.)))

## country isocode    year     pop      hc  rgdpna   rgdpo   rgdpe   labsh     avh 
##       0       0       0       2       2       2       2       2       2      17 
##     emp    rnna 
##       2       2

The map function shows that the following variables have missing values: pop (2), hc (2), rgdpna (2), rgdpo (2), rgdpe (2), labsh (2), avh (17), emp (2), rnna (2). The variables country, isocode, and year have none.

Approach 2 (Column-Wise Operation)

data = pwt_sample

data %>% 
  summarize(across(everything(), ~sum(is.na(.x))))

## # A tibble: 1 × 12
##   country isocode  year   pop    hc rgdpna rgdpo rgdpe labsh   avh   emp  rnna
##     <int>   <int> <int> <int> <int>  <int> <int> <int> <int> <int> <int> <int>
## 1       0       0     0     2     2      2     2     2     2    17     2     2

The same counts are obtained here; however, this time the column-wise operation approach was used, which returns a one-row tibble instead of a named vector.
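When a data set has many columns, it can help to show only the variables that actually contain missing values. The sketch below uses a small hypothetical tibble (`toy`) rather than pwt_sample so that it runs without stevedata installed.

```r
library(tibble)
library(purrr)

# Hypothetical data with missing values in two of three columns
toy <- tibble(
  country = c("A", "B", "C"),
  pop = c(1.2, NA, 3.4),
  avh = c(NA, NA, 1800)
)

# Count missing values per column as a named integer vector
na_counts <- map_int(toy, ~ sum(is.na(.x)))

# Keep only the columns that have at least one missing value
na_counts[na_counts > 0]
```

Here map_int() is used instead of map_dbl() because a count of missing values is always a whole number.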

Question 6

Averages across all columns

data = pwt_sample

data %>% 
  map_dbl(~ round(mean(., na.rm = TRUE), 2))

##    country    isocode       year        pop         hc     rgdpna      rgdpo 
##         NA         NA    1984.50      35.53       2.81 1139533.26 1070786.67 
##      rgdpe      labsh        avh        emp       rnna 
## 1066653.52       0.61    1857.05      16.14 5439110.67

By using the map function, the average of every variable was obtained (rounded to two decimal places).

Note that the non-numeric variables (country and isocode) are reported as NA, because mean() cannot average character values. For that reason, the number of unique values for these variables is calculated below using the column-wise operation approach.
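One way to avoid the NA entries altogether is to keep only the numeric columns before averaging. This is a sketch on a hypothetical tibble (`toy`), not the pwt_sample data itself.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical data with one character column and two numeric columns
toy <- tibble(
  country = c("A", "B", "C"),
  pop = c(1, NA, 3),
  hc = c(2, 2.5, 3)
)

# Drop non-numeric columns first, then average what remains
numeric_means <- toy %>%
  select(where(is.numeric)) %>%
  map_dbl(~ round(mean(.x, na.rm = TRUE), 2))

numeric_means
```

The result contains an entry for pop and hc only, so no NA appears for the character column.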

Non-Numeric Distinct Values

data = pwt_sample

data %>% 
  summarize(across(where(is.character), ~ n_distinct(.),
                   .names = "n_distinct_{.col}"))

## # A tibble: 1 × 2
##   n_distinct_country n_distinct_isocode
##                <int>              <int>
## 1                 22                 22

Using the column-wise operation approach, 22 unique values were found for both country and isocode.

Tidied Output Table

data = pwt_sample 

data %>%
  select(-c(country, isocode, year)) %>% 
  map_dfr(~ round(mean(., na.rm = TRUE), 2)) %>% 
  pivot_longer(cols = everything(),
               names_to = "variables",
               values_to = "average")

## # A tibble: 9 × 2
##   variables    average
##   <chr>          <dbl>
## 1 pop            35.5 
## 2 hc              2.81
## 3 rgdpna    1139533.  
## 4 rgdpo     1070787.  
## 5 rgdpe     1066654.  
## 6 labsh           0.61
## 7 avh          1857.  
## 8 emp            16.1 
## 9 rnna      5439111.

By chaining several functions with pipes, an output table is obtained in which each row represents one of the 9 numeric variables measured for each country, along with its average across all countries and years.
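A similar tidy table can be built with summarize() and across() in place of map_dfr(), which purrr marks as superseded from version 1.0.0 onward. The sketch below uses a hypothetical tibble (`toy`) with two identifier columns and two numeric variables; the column names are illustrative.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical panel-style data
toy <- tibble(
  country = c("A", "A", "B"),
  year = c(2000, 2001, 2000),
  pop = c(1, NA, 3),
  hc = c(2, 2.5, 3)
)

tidy_means <- toy %>%
  select(-c(country, year)) %>%  # drop identifier columns
  summarize(across(everything(),
                   ~ round(mean(.x, na.rm = TRUE), 2))) %>%
  pivot_longer(cols = everything(),
               names_to = "variables",
               values_to = "average")

tidy_means
```

The output has one row per numeric variable, matching the layout of the table above.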

Assignment 11

ChanKim

2023-11-08