Write an iteration function that iterates over the numbers 1 to 10 and adds 5 to each of them. Store the results in a new vector called “output”. Use only map functions to answer this question.
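library(tidyverse)
numbers = 1:10
# 5L keeps each result an integer, which map_int() requires in older purrr versions
output = map_int(numbers, ~ .x + 5L)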
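As a cross-check, the same result can be produced with other members of the map family. This is a minimal sketch that assumes purrr is installed; `output_dbl` and `output_list` are illustrative names that do not appear in the original answer.

```r
library(purrr)

numbers <- 1:10

# map_dbl() returns a double vector, so adding the double 5 needs no coercion
output_dbl <- map_dbl(numbers, ~ .x + 5)

# map_int() returns an integer vector; 5L keeps every result an integer
output <- map_int(numbers, ~ .x + 5L)

# map() always returns a list, one element per input value
output_list <- map(numbers, ~ .x + 5L)

output
```

Choosing the typed variant (`map_int()` or `map_dbl()`) makes the function fail loudly if a result has an unexpected type, which a plain `map()` would not do.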
Create the tibble mat_x below and calculate the sum of each column (use all approaches).
map(mat_x, sum)
## $V1
## [1] 210
##
## $V2
## [1] 210
##
## $V3
## [1] 210
##
## $V4
## [1] 210
##
## $V5
## [1] 210
##
## $V6
## [1] 210
map_dfr(mat_x, sum)
## # A tibble: 1 × 6
## V1 V2 V3 V4 V5 V6
## <int> <int> <int> <int> <int> <int>
## 1 210 210 210 210 210 210
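The definition of `mat_x` is not shown above, so the following sketch rebuilds a stand-in and compares three ways of summing its columns. The stand-in is an assumption: six integer columns that each hold 1 to 20 reproduce the printed sums of 210, but the real tibble may contain different values. The names `sums_loop`, `sums_map`, and `sums_across` are illustrative.

```r
library(purrr)
library(dplyr)
library(tibble)

# ASSUMPTION: mat_x is not defined in the original; six columns of 1:20
# match the printed column sums (210) but may not match the real data
m <- matrix(rep(1:20, times = 6), ncol = 6,
            dimnames = list(NULL, paste0("V", 1:6)))
mat_x <- as_tibble(m)

# Approach 1: a for loop that fills a pre-allocated integer vector
sums_loop <- vector("integer", ncol(mat_x))
for (i in seq_along(mat_x)) {
  sums_loop[i] <- sum(mat_x[[i]])
}

# Approach 2: map_int() returns a named integer vector
sums_map <- map_int(mat_x, sum)

# Approach 3: summarize() with across() returns a one-row tibble
sums_across <- summarize(mat_x, across(everything(), sum))
```

The loop and `map_int()` give a vector, while `summarize()` keeps the result as a tibble, which is usually more convenient when further dplyr steps follow.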
Create a tibble of dimensions 10 x 3, with two numeric and one character variables. Calculate the mean of the column if numeric and the number of observations if character (use all approaches).
more_numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
powerball <- c(1,28,30,34,52,6,11,6,23,7)
friends <- c("Patrick",
"Mark",
"Kendra",
"Sam",
"Kira",
"Tate",
"Melanie",
"Kevin",
"Ryan",
"Red")
data= tibble(more_numbers, powerball, friends)
map(data,class)
## $more_numbers
## [1] "numeric"
##
## $powerball
## [1] "numeric"
##
## $friends
## [1] "character"
Mean and Count
Calculate the mean of the numeric columns and the number of observations for the character column.
data %>%
summarize(across(where(is.numeric), mean),
# count non-missing observations; n_distinct() would undercount repeated names
across(where(is.character), ~ sum(!is.na(.x))))
## # A tibble: 1 × 3
## more_numbers powerball friends
## <dbl> <dbl> <int>
## 1 5.5 19.8 10
data %>%
summarise(across(where(is.numeric),
~if_else(!is.character(.), mean(., na.rm = TRUE), NA)),
character_count = sum(if_else(!is.na(friends), 1, 0)))
## # A tibble: 1 × 3
## more_numbers powerball character_count
## <dbl> <dbl> <dbl>
## 1 5.5 19.8 10
#ChatGPT
Assistance from Kyle and ChatGPT
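Both answers above rely on `across()`. Since the question asks for all approaches, the same summary can also be sketched with `map()` and a small helper. The helper name `summarise_column` is hypothetical and not part of the original answer; the data are the same three columns defined earlier.

```r
library(purrr)
library(tibble)

# Same three columns as in the answer above
data <- tibble(
  more_numbers = as.numeric(1:10),
  powerball    = c(1, 28, 30, 34, 52, 6, 11, 6, 23, 7),
  friends      = c("Patrick", "Mark", "Kendra", "Sam", "Kira",
                   "Tate", "Melanie", "Kevin", "Ryan", "Red")
)

# Hypothetical helper: mean for numeric columns, otherwise the
# number of non-missing observations
summarise_column <- function(col) {
  if (is.numeric(col)) mean(col, na.rm = TRUE) else sum(!is.na(col))
}

# map() returns a list because the results mix doubles and integers
results <- map(data, summarise_column)
results
```

Returning a list avoids forcing the character count and the numeric means into a single type.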
Create an object containing 4 normally distributed variables with means of -10, 0, 10, and 100, respectively. Each variable should contain 10 observations (i.e., the dimension of your object is 10 x 4). Use map functions only.
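means = c(-10, 0, 10, 100)
# map_dfc() binds the draws column-wise, giving 10 rows x 4 columns
# (map_dfr() on named vectors would give 4 rows x 10 columns instead)
randoms =
map_dfc(set_names(means, paste0("var", 1:4)), ~ rnorm(10, mean = .x))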
Assistance from Kyle
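One way to confirm that the object has the requested 10 x 4 shape is to build it reproducibly and inspect its dimensions and column means. This is a sketch under stated assumptions: the seed value and the column names are hypothetical, chosen only for illustration, and `map_dfc()` needs dplyr to be installed.

```r
library(purrr)

set.seed(123)  # hypothetical seed, used only so the draws are reproducible
means <- c(-10, 0, 10, 100)

# One column of 10 draws per mean; the column names are illustrative
randoms <- means %>%
  set_names(c("neg_ten", "zero", "ten", "hundred")) %>%
  map_dfc(~ rnorm(10, mean = .x))

dim(randoms)              # number of rows, then number of columns
map_dbl(randoms, mean)    # each column mean should sit near its target
```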
Use the data from the stevedata package called pwt_sample (this is the same data used in class on Thursday). Calculate the number of missing values in each column of the dataset (use both approaches). If there are any missing values, examine them and decide what to do next.
data2=pwt_sample
data2%>%
summarize(sum(is.na(country)),
sum(is.na(isocode)),
sum(is.na(year)),
sum(is.na(pop)),
sum(is.na(hc)),
sum(is.na(rgdpna)),
sum(is.na(rgdpo)),
sum(is.na(rgdpe)),
sum(is.na(labsh)),
sum(is.na(avh)),
sum(is.na(emp)),
sum(is.na(rnna)))
## # A tibble: 1 × 12
## `sum(is.na(country))` `sum(is.na(isocode))` `sum(is.na(year))`
## <int> <int> <int>
## 1 0 0 0
## # ℹ 9 more variables: `sum(is.na(pop))` <int>, `sum(is.na(hc))` <int>,
## # `sum(is.na(rgdpna))` <int>, `sum(is.na(rgdpo))` <int>,
## # `sum(is.na(rgdpe))` <int>, `sum(is.na(labsh))` <int>,
## # `sum(is.na(avh))` <int>, `sum(is.na(emp))` <int>, `sum(is.na(rnna))` <int>
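The column-by-column `summarize()` call above can be collapsed with `across(everything(), ...)`, which also keeps the original column names in the result. The sketch below runs on a small stand-in tibble rather than on `pwt_sample` itself: `toy` holds three rows copied from the printed output in this document, restricted to five columns, and both `toy` and `na_counts` are illustrative names.

```r
library(dplyr)
library(tibble)

# Three rows copied from the printed output, restricted to five columns
toy <- tibble(
  country = c("Chile", "Iceland", "Netherlands"),
  year    = c(1950L, 1950L, 1969L),
  pop     = c(NA, 0.143, 12.8),
  hc      = c(NA, 1.97, 2.64),
  avh     = c(NA_real_, NA_real_, NA_real_)
)

# One lambda applied to every column replaces one sum(is.na()) call per column
na_counts <- toy %>%
  summarize(across(everything(), ~ sum(is.na(.x))))
na_counts
```

The same call should work unchanged on the full dataset, since it never names an individual column.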
map_dfr(data2, ~ sum(is.na(.)))
## # A tibble: 1 × 12
## country isocode year pop hc rgdpna rgdpo rgdpe labsh avh emp rnna
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 0 0 0 2 2 2 2 2 2 17 2 2
data2 %>%
map_dbl(~ sum(is.na(.)))
## country isocode year pop hc rgdpna rgdpo rgdpe labsh avh
## 0 0 0 2 2 2 2 2 2 17
## emp rnna
## 2 2
The variable “avh” (average annual hours worked) has 17 NAs. The variables “pop” (population), “hc” (human capital), “rgdpna” (real GDP at constant 2011 prices), “rgdpo”, “rgdpe”, “labsh”, “emp” (number of persons engaged), and “rnna” each have 2 NAs.
data2 %>%
filter(is.na(pop))
## # A tibble: 2 × 12
## country isocode year pop hc rgdpna rgdpo rgdpe labsh avh emp rnna
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Chile CHL 1950 NA NA NA NA NA NA NA NA NA
## 2 Greece GRC 1950 NA NA NA NA NA NA NA NA NA
data2%>%
filter(is.na(avh))
## # A tibble: 17 × 12
## country isocode year pop hc rgdpna rgdpo rgdpe labsh avh
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Chile CHL 1950 NA NA NA NA NA NA NA
## 2 Greece GRC 1950 NA NA NA NA NA NA NA
## 3 Iceland ISL 1950 0.143 1.97 1255. 1363. 1294. 0.635 NA
## 4 Iceland ISL 1951 0.146 1.98 1237. 1344. 1239. 0.635 NA
## 5 Iceland ISL 1952 0.148 1.99 1213. 1329. 1234. 0.635 NA
## 6 Iceland ISL 1953 0.151 2.00 1393. 1534. 1443. 0.635 NA
## 7 Iceland ISL 1954 0.155 2.01 1526. 1695. 1604. 0.635 NA
## 8 Iceland ISL 1955 0.158 2.02 1691. 1918. 1852. 0.635 NA
## 9 Iceland ISL 1956 0.162 2.02 1742. 1918. 1843. 0.635 NA
## 10 Iceland ISL 1957 0.165 2.03 1745. 1926. 1859. 0.635 NA
## 11 Iceland ISL 1958 0.169 2.04 1904. 2127. 2051. 0.635 NA
## 12 Iceland ISL 1959 0.173 2.05 1958. 2183. 2159. 0.635 NA
## 13 Iceland ISL 1960 0.176 2.05 2010. 2246. 2113. 0.635 NA
## 14 Iceland ISL 1961 0.179 2.07 2008. 2349. 2256. 0.635 NA
## 15 Iceland ISL 1962 0.182 2.08 2175. 2442. 2338. 0.635 NA
## 16 Iceland ISL 1963 0.186 2.09 2398. 2735. 2622. 0.635 NA
## 17 Netherlands NLD 1969 12.8 2.64 295567. 234000. 238506. 0.729 NA
## # ℹ 2 more variables: emp <dbl>, rnna <dbl>
data2 %>%
filter(is.na(avh)) %>%
arrange(country)
## # A tibble: 17 × 12
## country isocode year pop hc rgdpna rgdpo rgdpe labsh avh
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Chile CHL 1950 NA NA NA NA NA NA NA
## 2 Greece GRC 1950 NA NA NA NA NA NA NA
## 3 Iceland ISL 1950 0.143 1.97 1255. 1363. 1294. 0.635 NA
## 4 Iceland ISL 1951 0.146 1.98 1237. 1344. 1239. 0.635 NA
## 5 Iceland ISL 1952 0.148 1.99 1213. 1329. 1234. 0.635 NA
## 6 Iceland ISL 1953 0.151 2.00 1393. 1534. 1443. 0.635 NA
## 7 Iceland ISL 1954 0.155 2.01 1526. 1695. 1604. 0.635 NA
## 8 Iceland ISL 1955 0.158 2.02 1691. 1918. 1852. 0.635 NA
## 9 Iceland ISL 1956 0.162 2.02 1742. 1918. 1843. 0.635 NA
## 10 Iceland ISL 1957 0.165 2.03 1745. 1926. 1859. 0.635 NA
## 11 Iceland ISL 1958 0.169 2.04 1904. 2127. 2051. 0.635 NA
## 12 Iceland ISL 1959 0.173 2.05 1958. 2183. 2159. 0.635 NA
## 13 Iceland ISL 1960 0.176 2.05 2010. 2246. 2113. 0.635 NA
## 14 Iceland ISL 1961 0.179 2.07 2008. 2349. 2256. 0.635 NA
## 15 Iceland ISL 1962 0.182 2.08 2175. 2442. 2338. 0.635 NA
## 16 Iceland ISL 1963 0.186 2.09 2398. 2735. 2622. 0.635 NA
## 17 Netherlands NLD 1969 12.8 2.64 295567. 234000. 238506. 0.729 NA
## # ℹ 2 more variables: emp <dbl>, rnna <dbl>
Chile and Greece have NAs for every variable other than the identifiers in 1950. Iceland is missing “avh” for the years 1950 through 1963, and the Netherlands is missing it for 1969. For the purpose of this assignment, where we are only exploring the different functions, we can simply leave out the NAs. However, if our analysis were specifically focused on changes in work practices over time, or on comparing several national economies in 1950, we might need to find a way to replace the NAs. Given the age of the data, we could probably find at least some of these numbers through other sources.
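Leaving out the NAs can be done selectively. The sketch below uses tidyr's `drop_na()` on a stand-in tibble built from three rows of the printed output, restricted to four columns; `toy`, `kept`, and `complete_rows` are illustrative names, and the choice of which columns to require is an assumption that depends on the analysis.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Three rows copied from the printed output, restricted to four columns
toy <- tibble(
  country = c("Chile", "Iceland", "Netherlands"),
  year    = c(1950L, 1950L, 1969L),
  pop     = c(NA, 0.143, 12.8),
  avh     = c(NA_real_, NA_real_, NA_real_)
)

# Drop only the rows where population is missing, which removes Chile
kept <- toy %>% drop_na(pop)

# Requiring avh as well discards every row of this small sample
complete_rows <- toy %>% drop_na(pop, avh)
```

Dropping on `pop` alone keeps the Iceland and Netherlands rows, whose only gap is `avh`, so they remain usable for any analysis that does not involve hours worked.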
Use the data from the stevedata package called pwt_sample (this is the same data used in class on Thursday). Calculate the average value for all columns. Make sure that the results are rounded to 2 decimal places (use the round() function).
data2%>%
map_dbl(~round(mean(.,na.rm=T),2))
## Warning in mean.default(., na.rm = T): argument is not numeric or logical:
## returning NA
## Warning in mean.default(., na.rm = T): argument is not numeric or logical:
## returning NA
## country isocode year pop hc rgdpna rgdpo
## NA NA 1984.50 35.53 2.81 1139533.26 1070786.67
## rgdpe labsh avh emp rnna
## 1066653.52 0.61 1857.05 16.14 5439110.67
Non-numeric columns (country and isocode) return NA, as the warning messages indicate.
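The warnings can be avoided by selecting the numeric columns before averaging. The sketch below uses purrr's `keep()` for that step and runs on a stand-in tibble built from three rows of the printed output; `toy` and `averages` are illustrative names, not objects from the original answer.

```r
library(purrr)
library(tibble)

# Three rows copied from the printed output, restricted to four columns
toy <- tibble(
  country = c("Chile", "Iceland", "Netherlands"),
  isocode = c("CHL", "ISL", "NLD"),
  year    = c(1950L, 1950L, 1969L),
  pop     = c(NA, 0.143, 12.8)
)

# keep() retains only the numeric columns, so mean() never sees a
# character vector and no warning is raised
averages <- toy %>%
  keep(is.numeric) %>%
  map_dbl(~ round(mean(.x, na.rm = TRUE), 2))
averages
```

Note that `year` is an integer column and is therefore averaged too; converting it to character first, or excluding it by name, would leave it out.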
data2 %>%
mutate(year = as.character(year)) %>%
summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
across(where(is.character), n_distinct)) %>%
pivot_longer(cols = everything(),
names_to = "variables",
values_to = "average")
## # A tibble: 12 × 2
## variables average
## <chr> <dbl>
## 1 pop 35.5
## 2 hc 2.81
## 3 rgdpna 1139533.
## 4 rgdpo 1070787.
## 5 rgdpe 1066654.
## 6 labsh 0.609
## 7 avh 1857.
## 8 emp 16.1
## 9 rnna 5439111.
## 10 country 22
## 11 isocode 22
## 12 year 70