Avoid for loop

for loops can be repetitive and require quiet a few lines of codes everytime. Also need to define some empty vector to store values from for loop and merge them with data later.

With purrr package, all of these can be avoided. And a function can be applied to each piece of a data frame very easily.

Data

Data is collected from Niel Weiss’s book. Reference given in bibliography section.

We have two groups of women. One group is given hormone replacement therapy (HRT), and another group is placebo.

We aim to see whether this HRT makes any diffrence in women’s Low serum levels of high-density-lipoprotein (HDL) cholesterol andhigh serum levels of low-density-lipoprotein (LDL) cholesterol. Abnormal levels of these two measures can result in coronary heart disease (CHD) in women.

We have mean and sd fro these two groups. But the sample size is small. We will simulate 1000 samples from both groups, and for both variables. This is lke taking 1000 random samples from the population from each group.

### Avoid for loops with purrr
### Simulate With Purrr map & map2

require(tidyverse)

### Mean and SD from two groups ==> HRT and Placebo
### Within each group we have two measurements, HDL and LDL
d <- tibble(group = c("HRT", "Placebo"),
       MeanHDL  = c(8.1, 2.4),
       sdHDL = c(10.5, 4.3),
       MeanLDL  = c(-18.2, -2.2),
       sdLDL = c(26.5, 12.2))

d
## # A tibble: 2 x 5
##   group   MeanHDL sdHDL MeanLDL sdLDL
##   <chr>     <dbl> <dbl>   <dbl> <dbl>
## 1 HRT         8.1  10.5   -18.2  26.5
## 2 Placebo     2.4   4.3    -2.2  12.2
### For vertical lines in ggplot
vline.data <- data.frame(z = c(8.1, 2.4, -18.2, -2.2), 
                         Type = c("SimHDL", "SimHDL", "SimLDL", "SimLDL"))
vline.data
##       z   Type
## 1   8.1 SimHDL
## 2   2.4 SimHDL
## 3 -18.2 SimLDL
## 4  -2.2 SimLDL

purrr and Simulate

We have 2 parameters in rnorm function, so we will use map2 function from purrr, which takse in two parameters for a function.

dsim <- d %>% 
  mutate(SimHDL = map2(MeanHDL, sdHDL, ~ rnorm(1000, ..1, ..2)),
         SimLDL = map2(MeanLDL, sdLDL, ~ rnorm(1000, ..1, ..2))
  ) 
### purrr just enabled me to avoid writing TWO for loops
dsim
## # A tibble: 2 x 7
##   group   MeanHDL sdHDL MeanLDL sdLDL SimHDL        SimLDL       
##   <chr>     <dbl> <dbl>   <dbl> <dbl> <list>        <list>       
## 1 HRT         8.1  10.5   -18.2  26.5 <dbl [1,000]> <dbl [1,000]>
## 2 Placebo     2.4   4.3    -2.2  12.2 <dbl [1,000]> <dbl [1,000]>

unnest

We can unnest the SimHDL and SimLDL columns to expand their values.

head(dsim %>% 
       unnest(cols = SimHDL:SimLDL)
     )
## # A tibble: 6 x 7
##   group MeanHDL sdHDL MeanLDL sdLDL SimHDL  SimLDL
##   <chr>   <dbl> <dbl>   <dbl> <dbl>  <dbl>   <dbl>
## 1 HRT       8.1  10.5   -18.2  26.5  29.9  -45.2  
## 2 HRT       8.1  10.5   -18.2  26.5  26.5  -84.2  
## 3 HRT       8.1  10.5   -18.2  26.5   3.55 -57.1  
## 4 HRT       8.1  10.5   -18.2  26.5  16.3  -12.3  
## 5 HRT       8.1  10.5   -18.2  26.5   9.42  27.0  
## 6 HRT       8.1  10.5   -18.2  26.5   3.88  -0.825

Make Plot

plotly::ggplotly( 
  dsim %>% 
  pivot_longer(cols = SimHDL:SimLDL,names_to = "Type", values_to = "Value") %>% 
  unnest() %>% 
  ggplot(aes(Value, fill = group, color = group)) + 
  geom_density(alpha = 0.3) +
  geom_vline(data = vline.data, aes(xintercept = z), alpha = 0.5, linetype = "longdash")+
  facet_wrap(~Type, scales = "free") +
  theme_light()
)

Comment on Plot

From the plot, we may suggest there is difference in variation between HRT and placebo groups. But the difference in mean level of HDL and LDL among two groups may not be difference due to overlap.


Source

References

Weiss, NA. 2005. “Introductory Statistics.” Pearson Education, Inc.