dplyr Basics

Worksheet: Understanding dplyr Commands and Data Frames

Instructions
For each line of code below, answer the questions that follow. Be specific about what the input data frame is, what the output object is, and what changes as a result of the command.

Part 1: select()

charm_daily <- select(charm_data, DATE, PRCP)
prism_daily <- select(prism_data, DATE, PRCP)

Questions:
1. What is the input data frame in the first line of code?
2. What is the name of the output data frame created in the first line?
3. Which columns are kept in charm_daily?
4. What happens to all other columns from charm_data?
5. Does the number of rows change? Explain why or why not.
6. How is the second line of code similar to the first?
7. How is the input data frame different in the second line?
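To check your answers, you can run a small sketch like the one below. It assumes the dplyr package is installed. It also uses a made-up three-row charm_data. The STATION and TMAX columns are hypothetical stand-ins for the other columns the real file may contain.

```r
library(dplyr)

# Hypothetical toy data standing in for charm_data.
# STATION and TMAX are made-up extra columns, not from the real file.
charm_data <- data.frame(
  STATION = c("A", "A", "A"),
  DATE    = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  PRCP    = c(0.0, 1.2, NA),
  TMAX    = c(5, 7, 6)
)

# Keep only the DATE and PRCP columns. Every row is kept.
charm_daily <- select(charm_data, DATE, PRCP)

names(charm_daily)   # "DATE" "PRCP"
nrow(charm_data)     # 3
nrow(charm_daily)    # 3
```

Notice that select() leaves the NA in PRCP alone. Choosing columns does not filter rows.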

Part 2: na.omit()

charm_daily <- na.omit(charm_daily)
prism_daily <- na.omit(prism_daily)

Questions:
1. What is the input data frame in the first line?
2. Is a new data frame created, or is an existing one overwritten?
3. What type of data is removed by na.omit()?
4. Does na.omit() remove rows, columns, or both?
5. How might the number of rows in charm_daily change after this command?
6. How is na.omit() fundamentally different from select()?
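na.omit() is a base R function from the stats package, not a dplyr verb. When applied to a data frame, it drops every row that has an NA in any column. The sketch below uses a hypothetical two-column charm_daily with one missing PRCP value. It compares the dimensions before and after the command.

```r
# Hypothetical toy version of charm_daily after select(): one PRCP is missing.
charm_daily <- data.frame(
  DATE = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  PRCP = c(0.0, 1.2, NA)
)

before <- dim(charm_daily)           # 3 rows, 2 columns
charm_daily <- na.omit(charm_daily)  # overwrites charm_daily
after <- dim(charm_daily)            # 2 rows, 2 columns
```

The row count drops and the column count stays the same. Because the result is assigned back to the same name, the original three-row version is no longer available.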

Part 3: mean()

charm_annual_mean <- mean(annual_charm$Annual_Precip, na.rm = TRUE)

Questions:
1. What is the input data structure used in this command?
2. Is the input a data frame or a single column?
3. Why is na.rm = TRUE included?
4. What is the output object created by this command?
5. Is charm_annual_mean a data frame? Explain.
6. Does this command modify the data frame annual_charm? Why or why not?
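The sketch below shows why na.rm = TRUE matters and what kind of object mean() returns. The annual_charm values are made up for illustration. The sketch uses only base R.

```r
# Hypothetical annual totals standing in for annual_charm.
annual_charm <- data.frame(
  Year          = c(1979, 1980, 1981, 1982),
  Annual_Precip = c(40.1, NA, 38.5, 42.3)
)

# Without na.rm = TRUE, a single NA makes the whole mean NA.
mean(annual_charm$Annual_Precip)   # NA

# With na.rm = TRUE, NA values are dropped before averaging.
charm_annual_mean <- mean(annual_charm$Annual_Precip, na.rm = TRUE)

is.data.frame(charm_annual_mean)   # FALSE
length(charm_annual_mean)          # 1
nrow(annual_charm)                 # still 4: the data frame is unchanged
```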

Part 4: mutate()

bec_annual <- mutate(bec_annual, standard_departure = (Annual_Precip - bec_mean) / bec_sd)

Questions:
1. What is the input data frame?
2. Is the output data frame new, or does it overwrite an existing object?
3. What is the name of the new column created?
4. Which existing column is used in the calculation?
5. Are bec_mean and bec_sd data frames or numeric values?
6. Does the number of rows change after this command? Explain.
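The worksheet does not show how bec_mean and bec_sd were created. The sketch below assumes they were computed with mean() and sd() on the Annual_Precip column. The bec_annual values are made up, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical annual totals standing in for bec_annual.
bec_annual <- data.frame(
  Year          = c(2001, 2002, 2003, 2004),
  Annual_Precip = c(30, 34, 38, 42)
)

# Assumption: bec_mean and bec_sd are single numbers (length-1 numeric
# vectors), computed from Annual_Precip like this.
bec_mean <- mean(bec_annual$Annual_Precip, na.rm = TRUE)
bec_sd   <- sd(bec_annual$Annual_Precip, na.rm = TRUE)

# mutate() adds one new column. The calculation uses Annual_Precip from
# each row, and the same two scalars are reused for every row.
bec_annual <- mutate(bec_annual,
                     standard_departure = (Annual_Precip - bec_mean) / bec_sd)

nrow(bec_annual)    # 4: no rows added or removed
names(bec_annual)   # "Year" "Annual_Precip" "standard_departure"
```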

Part 5: filter()

charm_post_1980 <- filter(annual_charm, Year > 1980)

Questions:
1. What is the input data frame?
2. What condition is being applied to the rows?
3. What happens to rows that do not meet this condition?
4. Are any columns removed by this command?
5. What is the name of the output data frame?
6. How is filter() different from select() in terms of what it changes?
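The sketch below applies the same filter() call to a made-up annual_charm and assumes the dplyr package is installed. filter() keeps only the rows where the condition is TRUE. Rows where the condition is FALSE or NA are dropped. All columns are kept.

```r
library(dplyr)

# Hypothetical annual totals standing in for annual_charm.
annual_charm <- data.frame(
  Year          = c(1979, 1980, 1981, 1982),
  Annual_Precip = c(40.1, 39.0, 38.5, 42.3)
)

# Keep rows where Year is strictly greater than 1980.
# 1980 itself fails the test (> rather than >=), so it is dropped too.
charm_post_1980 <- filter(annual_charm, Year > 1980)

charm_post_1980$Year    # 1981 1982
ncol(charm_post_1980)   # 2: filter() never removes columns
```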

Part 6: Concept Check

1. Which commands in this worksheet change the number of rows?
2. Which commands in this worksheet change the number of columns?
3. Which command returns a single numeric value instead of a data frame?
4. Why is it important to know whether a command overwrites an object or creates a new one?
5. If a student accidentally runs na.omit() before select(), how might that affect their results?
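For the last question, you can compare both orders on a small example. The data are made up, and SNOW is a hypothetical column that the analysis does not need. In this example PRCP is complete, but SNOW has missing values. The sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical daily data: PRCP is complete, SNOW (unneeded) has NAs.
charm_data <- data.frame(
  DATE = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04")),
  PRCP = c(0.0, 1.2, 0.4, 2.1),
  SNOW = c(NA, 0.0, NA, 0.0)
)

# Order used in the worksheet: select() first, then na.omit().
select_first <- na.omit(select(charm_data, DATE, PRCP))

# Accidental order: na.omit() also sees SNOW, so it drops every row
# with an NA in SNOW, even though PRCP is fine in those rows.
omit_first <- select(na.omit(charm_data), DATE, PRCP)

nrow(select_first)   # 4
nrow(omit_first)     # 2
```

Both results have the same two columns. However, running na.omit() first throws away precipitation data because of missing values in a column that is never used.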