Assignment 2: Summarizing and Visualizing Data

Objectives

By the end of this assignment, you should:

understand how to derive information from data (summarize, mutate, group_by)
understand how to make and modify a basic plot using ggplot

This assignment is due Thursday, February 6 at noon. Please submit both your .html AND .Rmd files on Canvas. Your .Rmd file should knit without errors before you turn in the assignment.

The first few exercises again concern the babynames data. The dataset is included in the “babynames” package.

[a] Trim babynames to just the rows that contain your favorite name (if the name appears for both sexes, choose one). [b] Trim the result to just the columns that will appear in your graph (not strictly necessary, but useful practice). [c] Plot the results as a line graph with year on the x-axis and prop on the y-axis.

Use summarise() to compute three statistics about the data: [a] The first (minimum) year in the dataset. [b] The last (maximum) year in the dataset. [c] The total number of children represented in the data.

Extract the rows where name == "Washington". Then, use summarise() and a summary function to find: [a] The total number of children named Washington. [b] The year with the most babies named Washington.

Use group_by(), summarise(), and arrange() to display the ten most popular names. Compute popularity as the total number of children of a single sex given a name.

Use group_by() to calculate the number of children born each year, and then plot that number over time.
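As a reminder of how group_by(), summarise(), and arrange() fit together, here is a minimal sketch on a made-up data frame. The object `toy` and its columns `group` and `n` are hypothetical and are not part of babynames; the same pattern should carry over once you swap in the real grouping and count columns.

```r
library(dplyr)

# Hypothetical toy data (NOT the babynames dataset)
toy <- tibble(
  group = c("a", "a", "b", "b", "c"),
  n     = c(5, 10, 3, 4, 20)
)

toy_totals <- toy %>%
  group_by(group) %>%            # one group per distinct value of `group`
  summarise(total = sum(n)) %>%  # collapse each group to a single row
  arrange(desc(total))           # sort from largest to smallest total

toy_totals
```

Each call to summarise() returns one row per group, so the result has as many rows as there are distinct values in the grouping column.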

The final exercises concern the object complexity norms for the Lewis and Frank (2016) dataset that we have talked about in class. In the experiment, we asked participants to rate a random sample of 10 (out of 60) objects in terms of their conceptual complexity. They gave their responses using a slider scale where 0 = simple, and 1 = complex (and all values in between were possible). We collected data from 2 samples of participants. The data are in the data frame located at data/lewis_object_complexity_norms_by_participant.csv. Each variable is described below.

sample - indicates whether the participants were from sample one or sample two.
participant_id - unique identifier for each participant.
object_id - unique identifier for each object (if you’re curious what the objects look like, you can find them here).
complexity_rating - a participant’s rating for a particular object.

library(tidyverse)  # provides read_csv(), dplyr verbs, and ggplot2
lewis2016_data <- read_csv("data/lewis_object_complexity_norms_by_participant.csv")

For each object, calculate the mean, min, and max complexity rating. Use geom_bar(stat = "identity") to plot the mean complexity for each object. Then, use geom_point() to plot the minimum complexity in red and the maximum complexity in blue. Sort the bars by the mean complexity rating from lowest to highest (see class slides for an example of this). Change the labels on the plot to make it maximally readable (xlab, ylab, ggtitle, and any other changes you think might help).
Create a new variable called range that is the difference between the maximum and minimum complexity rating for each object.
For each object, for each sample, calculate the mean complexity rating and save it to a data frame called participant_sample_mean. Print the first 5 rows of your data frame. (Note: you can print a prettier table by wrapping the data frame you want to output in the kable() function.)
Create a scatter plot with sample one on the x-axis and sample two on the y-axis. Add a line to fit the data.

This is a little tricky. The current data frame is tidy (each observation is its own row), but to make a plot that puts sample one and sample two on different axes, we need sample one and sample two to be their own columns. In other words, we need a data frame that has three columns: object_id, mean_complexity_sample_1, mean_complexity_sample_2. To transform the current data frame into the one we need, we can use the function pivot_wider(). The code below does this for you.

participant_sample_mean_wide <- participant_sample_mean %>%
  pivot_wider(names_from = "sample", values_from = "mean_complexity")
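To see what this reshaping does, here is a small sketch on made-up data. The object `toy_long` and the sample labels "s1" and "s2" are hypothetical; the labels in the real data may differ. Note that pivot_wider() names the new columns after the values found in the names_from column.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per object per sample
toy_long <- tibble(
  object_id       = c(1, 1, 2, 2),
  sample          = c("s1", "s2", "s1", "s2"),
  mean_complexity = c(0.2, 0.3, 0.7, 0.6)
)

# Spread the values of `sample` into their own columns,
# filling each new column with the matching mean_complexity
toy_wide <- toy_long %>%
  pivot_wider(names_from = sample, values_from = mean_complexity)

names(toy_wide)  # column names come from the values of `sample`
```

The wide version has one row per object, which is the shape a scatter plot needs when each sample goes on its own axis.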

Print the first 5 rows of participant_sample_mean_wide. Then plot the data with a scatter plot.

Choose your own geom! Look at the geoms on the cheat sheet and choose one to use with this data (other than geom_line, geom_point, geom_smooth, and geom_bar). Make a plot.

Modern Research Methods