1 An iteration function

Write an iteration function that iterates over the numbers 1 to 10 and adds 5 to each of them. Store the results in a new vector called “output”. Use only map functions to answer this question.

numbers = 1:10
# 5L keeps each result an integer, which map_int() expects
output = map_int(numbers, ~ .x + 5L)
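
As a side note, the typed map variants differ mainly in what they return. The sketch below assumes purrr is installed; the names `output_list`, `output_dbl`, and `output_int` are illustrative and not part of the assignment.

```r
library(purrr)

numbers <- 1:10

# map() always returns a list with one element per input
output_list <- map(numbers, ~ .x + 5)

# map_dbl() returns a double vector; adding the double 5 fits naturally
output_dbl <- map_dbl(numbers, ~ .x + 5)

# map_int() returns an integer vector, so the integer literal 5L is used
output_int <- map_int(numbers, ~ .x + 5L)

output_int
```

Choosing the typed variant that matches the expected result makes a type mismatch fail loudly instead of passing silently.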

2 A tibble and the sum of each column

Create the tibble mat_x below and calculate the sum of each column (use all approaches).
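
The definition of mat_x is not shown in this write-up. One construction that would be consistent with the printed output (six integer columns named V1 to V6, each summing to 210) is sketched here; the choice of 1:20 in every column is an assumption, not the original definition.

```r
library(tibble)

# Assumption: every column holds the integers 1:20, which sum to 210
m <- matrix(rep(1:20, times = 6), nrow = 20, ncol = 6)
colnames(m) <- paste0("V", 1:6)

mat_x <- as_tibble(m)
```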

map(mat_x, sum)
## $V1
## [1] 210
## 
## $V2
## [1] 210
## 
## $V3
## [1] 210
## 
## $V4
## [1] 210
## 
## $V5
## [1] 210
## 
## $V6
## [1] 210
map_dfr(mat_x, sum)
## # A tibble: 1 × 6
##      V1    V2    V3    V4    V5    V6
##   <int> <int> <int> <int> <int> <int>
## 1   210   210   210   210   210   210
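
For comparison, here is a rough sketch of three other ways to get the same column sums: a for loop, a typed map, and across(). It uses a small stand-in tibble called `toy` (hypothetical, not mat_x itself) so that it runs on its own.

```r
library(tibble)
library(purrr)
library(dplyr)

# Hypothetical stand-in for mat_x with two integer columns
toy <- tibble(V1 = 1:20, V2 = 1:20)

# for loop: pre-allocate a named integer vector, then fill it in
sums_loop <- vector("integer", ncol(toy))
names(sums_loop) <- names(toy)
for (i in seq_along(toy)) {
  sums_loop[[i]] <- sum(toy[[i]])
}

# typed map: returns a named integer vector instead of a list
sums_map <- map_int(toy, sum)

# dplyr: returns a one-row tibble
sums_across <- summarize(toy, across(everything(), sum))
```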

3 A 10x3 tibble, two numeric and one character

Create a tibble of dimensions 10 x 3, with two numeric variables and one character variable. Calculate the mean of the column if numeric and the number of observations if character (use all approaches).

more_numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
powerball <- c(1,28,30,34,52,6,11,6,23,7)
friends <- c("Patrick",
             "Mark",
             "Kendra",
             "Sam",
             "Kira",
             "Tate",
             "Melanie",
             "Kevin",
             "Ryan",
             "Red")
data= tibble(more_numbers, powerball, friends)

map(data,class)
## $more_numbers
## [1] "numeric"
## 
## $powerball
## [1] "numeric"
## 
## $friends
## [1] "character"

Mean and Count

Calculate the mean of the numeric columns and the number of observations for the character column.

data %>%
  summarize(across(where(is.numeric), mean),
            # length() counts observations; n_distinct() would count unique values
            across(where(is.character), length))
## # A tibble: 1 × 3
##   more_numbers powerball friends
##          <dbl>     <dbl>   <int>
## 1          5.5      19.8      10
data %>%
  summarise(across(where(is.numeric), 
                   ~if_else(!is.character(.), mean(., na.rm = TRUE), NA)),
            character_count = sum(if_else(!is.na(friends), 1, 0))) 
## # A tibble: 1 × 3
##   more_numbers powerball character_count
##          <dbl>     <dbl>           <dbl>
## 1          5.5      19.8              10
            #ChatGPT

Assistance from Kyle and ChatGPT
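
A map-based version of the same idea could look like the sketch below. The tibble `toy` and the helper `mean_or_count()` are hypothetical names introduced for illustration. The character column deliberately contains a repeated value to show that the number of observations is being counted, not the number of unique values.

```r
library(tibble)
library(purrr)

# Hypothetical stand-in for `data`
toy <- tibble(
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40),
  z = c("a", "b", "b", "c")
)

# Hypothetical helper: mean for numeric columns, number of observations otherwise
mean_or_count <- function(col) {
  if (is.numeric(col)) mean(col, na.rm = TRUE) else length(col)
}

# map() returns a list, so the mixed double and integer results are kept as they are
result <- map(toy, mean_or_count)
```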

4 An object containing four normally distributed variables

Create an object containing 4 normally distributed variables with means of -10, 0, 10, and 100, respectively. Each variable should contain 10 observations (i.e., the dimension of your object is 10 x 4). Use map functions only.

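means = c(-10, 0, 10, 100)

# Naming each mean lets map_dfc() bind one named column per distribution (10 x 4)
randoms =
  means %>%
  set_names(c("mean_neg10", "mean_0", "mean_10", "mean_100")) %>%
  map_dfc(~ rnorm(10, mean = .x))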

Assistance from Kyle

5 Stevedata

Use the data from the stevedata package called pwt_sample (this is the same data used in class on Thursday). Calculate how many missing values there are in each column of the dataset (use both approaches). If there are any missing values, examine them and decide what to do next.

data2=pwt_sample

data2%>%
  summarize(sum(is.na(country)),
            sum(is.na(isocode)),
            sum(is.na(year)),
            sum(is.na(pop)),
            sum(is.na(hc)),
            sum(is.na(rgdpna)),
            sum(is.na(rgdpo)),
            sum(is.na(rgdpe)),
            sum(is.na(labsh)),
            sum(is.na(avh)),
            sum(is.na(emp)),
            sum(is.na(rnna)))
## # A tibble: 1 × 12
##   `sum(is.na(country))` `sum(is.na(isocode))` `sum(is.na(year))`
##                   <int>                 <int>              <int>
## 1                     0                     0                  0
## # ℹ 9 more variables: `sum(is.na(pop))` <int>, `sum(is.na(hc))` <int>,
## #   `sum(is.na(rgdpna))` <int>, `sum(is.na(rgdpo))` <int>,
## #   `sum(is.na(rgdpe))` <int>, `sum(is.na(labsh))` <int>,
## #   `sum(is.na(avh))` <int>, `sum(is.na(emp))` <int>, `sum(is.na(rnna))` <int>
map_dfr(data2, ~ sum(is.na(.)))
## # A tibble: 1 × 12
##   country isocode  year   pop    hc rgdpna rgdpo rgdpe labsh   avh   emp  rnna
##     <int>   <int> <int> <int> <int>  <int> <int> <int> <int> <int> <int> <int>
## 1       0       0     0     2     2      2     2     2     2    17     2     2
data2 %>%
  map_dbl(~ sum(is.na(.)))
## country isocode    year     pop      hc  rgdpna   rgdpo   rgdpe   labsh     avh 
##       0       0       0       2       2       2       2       2       2      17 
##     emp    rnna 
##       2       2
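
The column-by-column summarize() call can also be written more compactly with across(). The sketch below uses a small hypothetical tibble `toy` rather than pwt_sample so that it is self-contained; the column names only echo the real ones and the values are made up.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical toy data with a few missing values
toy <- tibble(
  country = c("A", "B", "C"),
  pop = c(1.2, NA, 3.4),
  avh = c(NA, NA, 1800)
)

# dplyr approach: one-row tibble that keeps the original column names
na_across <- summarize(toy, across(everything(), ~ sum(is.na(.x))))

# purrr approach: named integer vector
na_map <- map_int(toy, ~ sum(is.na(.x)))
```

Unlike the long form, across() keeps the original column names, so the result is easier to read and does not need to be updated when columns change.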

The variable “avh” (average annual hours worked) has 17 NAs. The variables “pop” (population), “hc” (human capital), “rgdpna” (real GDP at constant 2011 prices), “rgdpo”, “rgdpe”, “labsh”, “emp” (number of persons engaged), and “rnna” each have 2 NAs.

data2 %>%
  filter(is.na(pop))
## # A tibble: 2 × 12
##   country isocode  year   pop    hc rgdpna rgdpo rgdpe labsh   avh   emp  rnna
##   <chr>   <chr>   <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Chile   CHL      1950    NA    NA     NA    NA    NA    NA    NA    NA    NA
## 2 Greece  GRC      1950    NA    NA     NA    NA    NA    NA    NA    NA    NA
data2%>%
  filter(is.na(avh))
## # A tibble: 17 × 12
##    country     isocode  year    pop    hc  rgdpna   rgdpo   rgdpe  labsh   avh
##    <chr>       <chr>   <int>  <dbl> <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl>
##  1 Chile       CHL      1950 NA     NA        NA      NA      NA  NA        NA
##  2 Greece      GRC      1950 NA     NA        NA      NA      NA  NA        NA
##  3 Iceland     ISL      1950  0.143  1.97   1255.   1363.   1294.  0.635    NA
##  4 Iceland     ISL      1951  0.146  1.98   1237.   1344.   1239.  0.635    NA
##  5 Iceland     ISL      1952  0.148  1.99   1213.   1329.   1234.  0.635    NA
##  6 Iceland     ISL      1953  0.151  2.00   1393.   1534.   1443.  0.635    NA
##  7 Iceland     ISL      1954  0.155  2.01   1526.   1695.   1604.  0.635    NA
##  8 Iceland     ISL      1955  0.158  2.02   1691.   1918.   1852.  0.635    NA
##  9 Iceland     ISL      1956  0.162  2.02   1742.   1918.   1843.  0.635    NA
## 10 Iceland     ISL      1957  0.165  2.03   1745.   1926.   1859.  0.635    NA
## 11 Iceland     ISL      1958  0.169  2.04   1904.   2127.   2051.  0.635    NA
## 12 Iceland     ISL      1959  0.173  2.05   1958.   2183.   2159.  0.635    NA
## 13 Iceland     ISL      1960  0.176  2.05   2010.   2246.   2113.  0.635    NA
## 14 Iceland     ISL      1961  0.179  2.07   2008.   2349.   2256.  0.635    NA
## 15 Iceland     ISL      1962  0.182  2.08   2175.   2442.   2338.  0.635    NA
## 16 Iceland     ISL      1963  0.186  2.09   2398.   2735.   2622.  0.635    NA
## 17 Netherlands NLD      1969 12.8    2.64 295567. 234000. 238506.  0.729    NA
## # ℹ 2 more variables: emp <dbl>, rnna <dbl>
data2 %>%
  filter(is.na(avh)) %>%
  arrange(country)
## # A tibble: 17 × 12
##    country     isocode  year    pop    hc  rgdpna   rgdpo   rgdpe  labsh   avh
##    <chr>       <chr>   <int>  <dbl> <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl>
##  1 Chile       CHL      1950 NA     NA        NA      NA      NA  NA        NA
##  2 Greece      GRC      1950 NA     NA        NA      NA      NA  NA        NA
##  3 Iceland     ISL      1950  0.143  1.97   1255.   1363.   1294.  0.635    NA
##  4 Iceland     ISL      1951  0.146  1.98   1237.   1344.   1239.  0.635    NA
##  5 Iceland     ISL      1952  0.148  1.99   1213.   1329.   1234.  0.635    NA
##  6 Iceland     ISL      1953  0.151  2.00   1393.   1534.   1443.  0.635    NA
##  7 Iceland     ISL      1954  0.155  2.01   1526.   1695.   1604.  0.635    NA
##  8 Iceland     ISL      1955  0.158  2.02   1691.   1918.   1852.  0.635    NA
##  9 Iceland     ISL      1956  0.162  2.02   1742.   1918.   1843.  0.635    NA
## 10 Iceland     ISL      1957  0.165  2.03   1745.   1926.   1859.  0.635    NA
## 11 Iceland     ISL      1958  0.169  2.04   1904.   2127.   2051.  0.635    NA
## 12 Iceland     ISL      1959  0.173  2.05   1958.   2183.   2159.  0.635    NA
## 13 Iceland     ISL      1960  0.176  2.05   2010.   2246.   2113.  0.635    NA
## 14 Iceland     ISL      1961  0.179  2.07   2008.   2349.   2256.  0.635    NA
## 15 Iceland     ISL      1962  0.182  2.08   2175.   2442.   2338.  0.635    NA
## 16 Iceland     ISL      1963  0.186  2.09   2398.   2735.   2622.  0.635    NA
## 17 Netherlands NLD      1969 12.8    2.64 295567. 234000. 238506.  0.729    NA
## # ℹ 2 more variables: emp <dbl>, rnna <dbl>

Chile and Greece have NAs for most variables in 1950. Iceland is missing “avh” for the years 1950 through 1963, and the Netherlands is missing it for 1969. For the purpose of this assignment, where we are just exploring the different functions, we can simply leave out the NAs. However, if our analysis were specifically focused on changes in work practices over time, or on comparing several national economies in 1950, we might need to find another way to fill in the NAs. Given the age of the dataset, we could probably find at least some of these numbers through other sources.
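
Leaving out the NAs can mean a few different things in practice. The sketch below shows three options on a hypothetical tibble `toy` with made-up values; it assumes dplyr and tidyr are installed.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data: row 1 is missing everything, row 2 is missing only avh
toy <- tibble(
  country = c("A", "B", "B", "C"),
  pop = c(NA, 2, 3, 4),
  avh = c(NA, NA, 1900, 1800)
)

# Option 1: keep every row and skip NAs inside the summary itself
mean_avh <- summarize(toy, avh = mean(avh, na.rm = TRUE))

# Option 2: drop only the rows that are missing one specific variable
complete_pop <- drop_na(toy, pop)

# Option 3: drop every row that has any missing value
complete_all <- drop_na(toy)
```

Option 1 keeps the most data, while option 3 would also discard the Iceland rows that are complete except for avh.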

6 Average value for all columns

Use the data from the stevedata package called pwt_sample (this is the same data used in class on Thursday). Calculate the average value for all columns. Make sure that the results are rounded to two decimal places (use the round() function).

data2%>%
  map_dbl(~round(mean(.,na.rm=T),2))
## Warning in mean.default(., na.rm = T): argument is not numeric or logical:
## returning NA

## Warning in mean.default(., na.rm = T): argument is not numeric or logical:
## returning NA
##    country    isocode       year        pop         hc     rgdpna      rgdpo 
##         NA         NA    1984.50      35.53       2.81 1139533.26 1070786.67 
##      rgdpe      labsh        avh        emp       rnna 
## 1066653.52       0.61    1857.05      16.14 5439110.67

The non-numeric columns (country and isocode) return NA, as the warning message indicates.

data2 %>%
  mutate(year = as.character(year)) %>%
  # a lambda replaces passing na.rm through across(), which newer dplyr deprecates
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            across(where(is.character), n_distinct)) %>%
  pivot_longer(cols = everything(),
               names_to = "variables",
               values_to = "average")
## # A tibble: 12 × 2
##    variables     average
##    <chr>           <dbl>
##  1 pop            35.5  
##  2 hc              2.81 
##  3 rgdpna    1139533.   
##  4 rgdpo     1070787.   
##  5 rgdpe     1066654.   
##  6 labsh           0.609
##  7 avh          1857.   
##  8 emp            16.1  
##  9 rnna      5439111.   
## 10 country        22    
## 11 isocode        22    
## 12 year           70
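
To avoid the warnings and still get two-decimal averages, one option is to select only the numeric columns before mapping. This is a sketch on a hypothetical tibble `toy` with made-up values, not on pwt_sample itself.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical toy data with one character column and one missing value
toy <- tibble(
  country = c("A", "B", "C"),
  pop = c(1.2, NA, 3.4),
  labsh = c(0.6091, 0.6102, 0.6113)
)

# Keep only the numeric columns, then take the rounded mean of each one
avg <- toy %>%
  select(where(is.numeric)) %>%
  map_dbl(~ round(mean(.x, na.rm = TRUE), 2))

avg
```

Because the character column is dropped first, mean() is never called on text and no NA appears in the result.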