Welcome to the PSYC3361 coding W1 self test. The test assesses your ability to use the coding skills covered in the Week 1 online coding modules.

In particular, it assesses your ability to…

  • choose packages/functions
  • read in data
  • group_by and summarise
  • make notes using RMarkdown
  • insert pictures in an Rmd document
  • write data to csv

It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.

Your notes should also document the troubleshooting process you went through to arrive at the code that worked.

For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.

Good luck!!

Jenny

1. Customise your Rmd document by adding your name as the author, a table of contents and choosing a theme that you like.

A title, name, date, HTML output, theme (cerulean) and table of contents have been included in the YAML section above.

2. Load the packages you will need

Load the tidyverse package, which includes several essential packages for data analysis. By loading tidyverse, we are ensuring that we have access to a suite of tools for data manipulation (dplyr), data visualization (ggplot2), and other data science tasks.

For this exercise, I am loading the tidyverse package and the here package. Tidyverse contains functions to read in the data (read_csv) and to create grouped summaries (group_by and summarise). The here package makes it easy to tell R where the data is when you are reading it in.

Step 1: Install the packages (this only needs to be done once)

install.packages("here")
install.packages("tidyverse")

Step 2: Load the packages

# Load the packages
library(tidyverse)
library(here)

3. Read the birthweight data

Read the birthweight data from a CSV file into an R data frame. The CSV file needs to be somewhere R can find it, in this case the data folder inside the project.

Troubleshooting process: ERROR: the file was not found in the working directory. I found that it was inside a folder. In future, when a file is stored in a folder, give the path as “foldername/thedocument”.

The data is in .csv format, so I am going to use the read_csv() function. This call tells R to find the data “here” within the data folder and to make a new object called babies.

babies <- read_csv(here("data", "birthweight_data.csv"))
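
The "file not found" problem described above can be checked before reading the data. This is a minimal sketch, assuming the file really is called birthweight_data.csv and sits in a data folder at the project root (as in the read_csv() call); data_path is a name made up for this illustration.

```r
library(here)

# Build the path relative to the project root, using the same folder and
# file name as the read_csv() call
data_path <- here("data", "birthweight_data.csv")

# Check that the file is where we expect before trying to read it
if (file.exists(data_path)) {
  message("Found: ", data_path)
} else {
  # List what is actually in data/ to help spot a typo or a wrong folder
  message("Not found: ", data_path,
          "\nFiles in data/: ",
          paste(list.files(here("data")), collapse = ", "))
}
```

If the file is not found, the message lists whatever is in the data folder, which makes it easier to see whether the file name or the folder name is wrong.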

4. Calculate the mean birthweight separately for twins and singletons

This code groups the data by the plurality variable, which categorises the entries into twins and singletons, and then calculates the mean birthweight for each category. Because the result is not assigned to an object, the means are printed directly below the code. The output shows that singletons have a higher mean birthweight (about 3248) than twins (about 2311).

babies %>%
  group_by(plurality) %>%
  summarise(mean_bw = mean(birthweight)) %>%
  ungroup()
## # A tibble: 2 × 2
##   plurality mean_bw
##   <chr>       <dbl>
## 1 singleton   3248.
## 2 twin        2311.
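
To see the same pattern on something small enough to check by hand, here is a sketch using a made-up tibble (toy_babies and its values are hypothetical, not the real birthweight data). It also adds n() to count the rows in each group and na.rm = TRUE, which would only matter if birthweight contained missing values.

```r
library(dplyr)

# Hypothetical toy data, not the real birthweight file
toy_babies <- tibble(
  plurality   = c("singleton", "singleton", "twin", "twin"),
  birthweight = c(3000, 3400, 2200, NA)
)

toy_summary <- toy_babies %>%
  group_by(plurality) %>%
  summarise(
    n       = n(),                             # number of rows in each group
    mean_bw = mean(birthweight, na.rm = TRUE)  # mean, ignoring missing values
  ) %>%
  ungroup()

toy_summary
```

Without na.rm = TRUE, the mean for the twin group in this toy example would come back as NA, because one of its values is missing.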

5. Identify the earliest (i.e. the minimum value) gestational age for each ethnicity group

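The pipe groups the data by ethnicity (using the child_ethn variable) and calculates the minimum gestational age for each group. Because the result is not assigned to an object, the minimum values are printed directly below the code. Note that min_ga is shown as <chr> rather than <dbl>, which means gestation_age_w was read in as text, so the minimum here is found by comparing text rather than numbers.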

babies %>%
  group_by(child_ethn) %>%
  summarise(min_ga = min(gestation_age_w)) %>%
  ungroup()
## # A tibble: 10 × 2
##    child_ethn                        min_ga
##    <chr>                             <chr> 
##  1 Aboriginal/Torres Strait Islander 33    
##  2 African/African-American          26    
##  3 Caucasian                         26    
##  4 East Asian                        33    
##  5 Hispanic/Latino                   37    
##  6 Middle-Eastern                    28    
##  7 Missing                           36    
##  8 Polynesian/Melanesian             28    
##  9 South Asian                       28    
## 10 South-East Asian                  29
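
The <chr> label under min_ga in the output suggests that gestation_age_w is stored as text. A possible fix, sketched here on made-up data (toy_ga and its values are hypothetical), is to convert the column to numeric before taking the minimum. The first two lines use deliberately artificial values to show why this can matter: text is compared character by character, so "100" counts as smaller than "26".

```r
library(dplyr)

# Artificial values: as text, "100" sorts before "26"; as numbers, 26 is smaller
min(c("26", "100"))
min(as.numeric(c("26", "100")))

# Hypothetical toy data with gestational age stored as text
toy_ga <- tibble(
  child_ethn      = c("A", "A", "B", "B"),
  gestation_age_w = c("38", "26", "40", "33")
)

toy_min <- toy_ga %>%
  mutate(gestation_age_w = as.numeric(gestation_age_w)) %>%  # text -> number
  group_by(child_ethn) %>%
  summarise(min_ga = min(gestation_age_w)) %>%
  ungroup()

toy_min
```

If the real column contains entries that are not numbers, as.numeric() would turn them into NA with a warning, so it is worth checking for those before trusting the minimums.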

6. Write some notes about how group_by and summarise work with the pipe below, including a link to documentation or a blog post that you think is useful

The pipe allows you to string together a number of code operations into a sequence of actions that you can do with your data. For example, it is useful to produce descriptive summaries separately for each group in your data set. By taking the dataframe, piping it to group_by, then piping it again to summarise, we can easily calculate means separately for each group. The Tidyverse documentation has useful examples.

When using a pipe, the group_by function is used to group/categorise the data according to the selected grouping variable/s, and the summarise function is then used to return one row of the specified summary statistics for each group.

This can be seen in the pipe from question 4:

mean_birthweight_summary <- babies %>%
  group_by(plurality) %>%
  summarise(mean_birth = mean(birthweight)) %>%
  ungroup()

print(mean_birthweight_summary)

Here the dataset is grouped into singletons and twins via the specified grouping variable, plurality, and then the summarise function calculates the mean birthweight for each of these groups.
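
One way to understand what the pipe is doing is to compare it with the same steps written without it. This is a small sketch on made-up data (toy_babies is hypothetical): the piped version reads top to bottom, while the nested version has to be read from the inside out, but both give the same result.

```r
library(dplyr)

# Hypothetical toy data, not the real birthweight file
toy_babies <- tibble(
  plurality   = c("singleton", "singleton", "twin", "twin"),
  birthweight = c(3000, 3400, 2200, 2400)
)

# With the pipe: take the data, then group it, then summarise it
piped <- toy_babies %>%
  group_by(plurality) %>%
  summarise(mean_bw = mean(birthweight))

# Without the pipe: the same two functions, nested inside each other
nested <- summarise(group_by(toy_babies, plurality),
                    mean_bw = mean(birthweight))

# Check that the two versions agree
all.equal(piped, nested)
```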

7. Download a picture of a baby from the internet and insert it into your document below

Putting images in your Rmd file is going to be useful when you want to insert screenshots into your verification report. You can create a folder within your project called “images” and put the image files in that folder. Then you can call the location of the image file with the notation images/baby.jpeg. Put the path within round brackets, i.e. (images/baby.jpeg), and put an exclamation point and some square brackets on the front, i.e. ![](images/baby.jpeg)
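
An alternative that may be worth knowing about is inserting the image from inside a code chunk with knitr::include_graphics(). This is only a sketch, assuming the picture is saved as images/baby.jpeg as described above; img_path is a name made up for this illustration.

```r
# Path to the image, relative to the Rmd file (assumed location)
img_path <- file.path("images", "baby.jpeg")

# Only try to insert the image if the file is actually there
if (file.exists(img_path)) {
  knitr::include_graphics(img_path)
} else {
  message("Image not found at: ", img_path)
}
```

Using a chunk means chunk options such as out.width can be used to control the size of the picture.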

8. Write the summary of mean birthweight by twins/singletons that you made in step 4 above to a new csv file

Here I am adding another pipe operation onto the bottom of the mean birthweight calculation to write the data to a new csv file that I could open in a different program.

babies %>%
  group_by(plurality) %>%
  summarise(mean_bw = mean(birthweight)) %>%
  ungroup() %>%
  write_csv("bw_by_plurality.csv")
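
A quick way to confirm that the file was written properly is to read it straight back in. This is a sketch on a made-up summary (toy_summary and its values are hypothetical), written to a temporary folder so that it does not overwrite the real bw_by_plurality.csv.

```r
library(dplyr)
library(readr)

# Hypothetical summary table standing in for the real one
toy_summary <- tibble(
  plurality = c("singleton", "twin"),
  mean_bw   = c(3200, 2300)
)

# Write to a temporary location rather than the project folder
out_path <- file.path(tempdir(), "bw_by_plurality.csv")
write_csv(toy_summary, out_path)

# Read the file back in to check that it contains what we expect
check <- read_csv(out_path, show_col_types = FALSE)
check
```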

9. Knit your document and publish the output to RPubs