W1 self test

1. customise your Rmd document by adding your name as the author, a table of contents and choosing a theme that you like.
2. load the packages you will need
3. read the birthweight data
4. calculate the mean birthweight separately for twins and singletons
5. identify the earliest (i.e. the minimum value) gestational age for each ethnicity group
6. write some notes about how group_by and summarise work with the pipe below, including a link to documentation or a blog post that you think is useful
7. download a picture of a baby from the internet and insert it into your document below
8. write the summary of mean birthweight by twins/singletons that you made in step 4 above to a new csv file
9. Knit your document and publish the output to RPubs

Welcome to the PSYC3361 coding W1 self test. The test assesses your ability to use the coding skills covered in the Week 1 online coding modules.

In particular, it assesses your ability to…

choose packages/functions
read in data
group_by and summarise
make notes using RMarkdown
insert pictures in an Rmd document
write data to csv

It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.

Your notes should also document the troubleshooting process you went through to arrive at the code that worked.

For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.

Good luck!!

Jenny

1. customise your Rmd document by adding your name as the author, a table of contents and choosing a theme that you like.

I edited the YAML header to add my name as the author, set ‘toc: true’ to generate a table of contents, and chose cerulean as the theme.
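For reference, a header along these lines would produce that setup. This is only a sketch: the title and author values are placeholders, and it assumes the document is knitted to html_document.

```yaml
---
title: "W1 self test"      # placeholder title
author: "Your Name"        # placeholder, replace with your own name
output:
  html_document:
    toc: true              # adds the table of contents
    theme: cerulean        # one of the built-in html_document themes
---
```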

2. load the packages you will need

library(tidyverse)

I used library() to load the tidyverse package, which includes dplyr for data manipulation and readr for reading csv files. The chunk options message=FALSE and warning=FALSE are set to suppress the loading messages and warnings in the knitted output.

3. read the birthweight data

birthweight <- read_csv("data/birthweight_data.csv")

I used read_csv() from the readr package to read the birthweight data from the data folder into an object called birthweight. The chunk option message=FALSE suppresses the column type message that read_csv() prints.

4. calculate the mean birthweight separately for twins and singletons

birthweight %>%
  group_by(plurality) %>%
  summarise(mean_birthweight = mean(birthweight, na.rm = TRUE))

## # A tibble: 2 × 2
##   plurality mean_birthweight
##   <chr>                <dbl>
## 1 singleton            3248.
## 2 twin                 2311.

I used group_by() to split the data by the plurality column (which indicates twins vs singletons), then summarise() to calculate the mean birthweight for each group. na.rm = TRUE excludes any missing values from the calculation.

5. identify the earliest (i.e. the minimum value) gestational age for each ethnicity group

birthweight %>%
  group_by(child_ethn) %>%
  summarise(min_gest_age = min(gestation_age_w, na.rm = TRUE))

## # A tibble: 10 × 2
##    child_ethn                        min_gest_age
##    <chr>                             <chr>       
##  1 Aboriginal/Torres Strait Islander 33          
##  2 African/African-American          26          
##  3 Caucasian                         26          
##  4 East Asian                        33          
##  5 Hispanic/Latino                   37          
##  6 Middle-Eastern                    28          
##  7 Missing                           36          
##  8 Polynesian/Melanesian             28          
##  9 South Asian                       28          
## 10 South-East Asian                  29
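
The `<chr>` type in the output above suggests that gestation_age_w was read as text rather than as a number. In that case min() compares the values as strings (for example, "40" sorts before "9"), which only matches the numeric minimum when every value has the same number of digits. A minimal sketch of converting to numeric first, using made-up toy data rather than the real birthweight file (the values and the gest_age_num name are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: gestational age stored as text, with one non-numeric entry
toy <- data.frame(
  child_ethn = c("A", "A", "B", "B"),
  gestation_age_w = c("9", "40", "33", "Missing")
)

result <- toy %>%
  # as.numeric() turns non-numeric text into NA (with a warning, silenced here)
  mutate(gest_age_num = suppressWarnings(as.numeric(gestation_age_w))) %>%
  group_by(child_ethn) %>%
  # na.rm = TRUE drops the NA values created by the conversion
  summarise(min_gest_age = min(gest_age_num, na.rm = TRUE))

result
```

With this approach the min_gest_age column comes back as a number, so the minimum is numeric rather than alphabetical.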

6. write some notes about how group_by and summarise work with the pipe below, including a link to documentation or a blog post that you think is useful

group_by() tells R which variable to use to split the data into groups. summarise() then calculates a summary statistic (like mean or min) for each group separately. The pipe %>% connects the steps together, passing the output of one function in as the first argument of the next.

Useful resource: https://dplyr.tidyverse.org/reference/group_by.html
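
To illustrate the notes above, here is a small sketch showing that a piped pipeline and the equivalent nested function call should give the same result. The toy data frame and its column names are made up for the example.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  group = c("a", "a", "b"),
  value = c(1, 3, 10)
)

# Piped version: each step receives the previous result as its first argument
piped <- toy %>%
  group_by(group) %>%
  summarise(mean_value = mean(value))

# Nested version: the same calls written inside-out
nested <- summarise(group_by(toy, group), mean_value = mean(value))

# Compare the two results
all.equal(piped, nested)
```

The piped version reads top to bottom in the order the steps happen, which is the main reason to prefer it.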

7. download a picture of a baby from the internet and insert it into your document below

![baby](images/babycay.jpg)

I used markdown syntax for embedding an image. The text in square brackets, [baby], is the alt text (a description of the image for accessibility), while (images/babycay.jpg) is the file path pointing to the specific image in the ‘images’ folder.

8. write the summary of mean birthweight by twins/singletons that you made in step 4 above to a new csv file

birthweight %>%
  group_by(plurality) %>%
  summarise(mean_birthweight = mean(birthweight, na.rm = TRUE)) %>%
  write_csv("data/mean_birthweight_summary.csv")

I used write_csv() to save the mean birthweight summary to a new csv file in the data folder. This reuses the same group_by() and summarise() pipeline from challenge 4.
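
One way to check that the file was written as expected is to read it straight back in. The sketch below uses a made-up summary table and a temporary file (both hypothetical) so that it does not touch the real data folder.

```r
library(readr)

# Hypothetical summary table standing in for the real output
toy_summary <- data.frame(
  plurality = c("singleton", "twin"),
  mean_birthweight = c(3000, 2000)
)

# Write to a temporary file instead of data/
out_path <- tempfile(fileext = ".csv")
write_csv(toy_summary, out_path)

# Read the file back to confirm the rows and columns survived the round trip
check <- read_csv(out_path, show_col_types = FALSE)
check
```

Another option would be to save the summary to an object in challenge 4 and pass that object to write_csv(), which avoids repeating the pipeline.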