Questions and Answers Part 1: select()
charm_daily <- select(charm_data, DATE, PRCP)
prism_daily <- select(prism_data, DATE, PRCP)
What is the input data frame in the first line of code?
The input data frame in the first line is charm_data.
What is the name of the output data frame created in the first
line?
The output data frame created is charm_daily.
Which columns are kept in charm_daily?
The columns DATE and PRCP are kept.
What happens to all other columns from charm_data?
All other columns are removed and are not included in
charm_daily.
Does the number of rows change? Explain why or why not.
No. select() only chooses which columns to keep, so every row of
charm_data is still present in charm_daily.
How is the second line of code similar to the first?
Both lines use select() to keep only the DATE and PRCP columns.
How is the input data frame different in the second line?
The input data frame in the second line is prism_data instead of
charm_data.
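The answers above can be checked on a small example. This is a minimal sketch that assumes the dplyr package is installed; toy_data and its values are invented for illustration and only stand in for charm_data.

```r
library(dplyr)

# Hypothetical stand-in for charm_data (values invented for illustration)
toy_data <- data.frame(
  STATION = c("A", "A", "A"),
  DATE    = c("2020-01-01", "2020-01-02", "2020-01-03"),
  PRCP    = c(0.0, NA, 1.2),
  TMAX    = c(10, 12, NA)
)

# Keep only DATE and PRCP; STATION and TMAX are dropped
toy_daily <- select(toy_data, DATE, PRCP)

names(toy_daily)                    # "DATE" "PRCP"
nrow(toy_daily) == nrow(toy_data)   # TRUE: select() does not remove rows
```

Note that the row with a missing PRCP value is still present, because select() does not look at the values inside the columns.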
charm_daily <- na.omit(charm_daily)
prism_daily <- na.omit(prism_daily)
What is the input data frame in the first line?
The input data frame is charm_daily.
Is a new data frame created, or is an existing one
overwritten?
The existing data frame charm_daily is overwritten.
What type of data is removed by na.omit()?
Any row with a missing value (NA) in at least one column is removed.
Does na.omit() remove rows, columns, or both?
na.omit() removes rows only.
How might the number of rows in charm_daily change after this
command?
The number of rows decreases by the number of rows that contain at
least one NA. If no values are missing, the row count stays the same.
How is na.omit() fundamentally different from select()?
na.omit() removes rows based on missing values, while select() removes
columns.
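A short sketch of this behavior, using base R only. The data frame toy_daily and its values are invented for illustration and stand in for charm_daily.

```r
# Hypothetical stand-in for charm_daily (values invented for illustration)
toy_daily <- data.frame(
  DATE = c("2020-01-01", "2020-01-02", "2020-01-03"),
  PRCP = c(0.0, NA, 1.2)
)

rows_before <- nrow(toy_daily)

# Overwrite toy_daily with a copy that drops every row containing an NA
toy_daily <- na.omit(toy_daily)

rows_before       # 3
nrow(toy_daily)   # 2: the row with the missing PRCP value is gone
ncol(toy_daily)   # 2: the columns are untouched
```

Because the result is assigned back to the same name, the three-row version no longer exists in the session after the second command.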
charm_annual_mean <- mean(annual_charm$Annual_Precip, na.rm = TRUE)
What is the input data structure used in this command?
The input data structure is the column Annual_Precip from the data frame
annual_charm.
Is the input a data frame or a single column?
The input is a single numeric column, extracted from the data frame as a
vector by the $ operator. It is not a data frame.
Why is na.rm = TRUE included?
It tells R to drop missing values (NA) before calculating the mean.
Without it, mean() returns NA if any value is missing.
What is the output object created by this command?
The output object is charm_annual_mean.
Is charm_annual_mean a data frame? Explain.
No, it is a single numeric value representing the mean.
Does this command modify the data frame annual_charm? Why or why
not?
No, it only calculates a value and does not change the data
frame.
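The points above can be illustrated with a small base R sketch. The data frame toy_annual and its values are invented for illustration and stand in for annual_charm.

```r
# Hypothetical stand-in for annual_charm (values invented for illustration)
toy_annual <- data.frame(
  Year          = c(2018, 2019, 2020, 2021),
  Annual_Precip = c(30, NA, 40, 50)
)

# Without na.rm = TRUE, a single NA makes the result NA
mean_with_na <- mean(toy_annual$Annual_Precip)

# With na.rm = TRUE, the NA is dropped and the mean of 30, 40, 50 is returned
toy_mean <- mean(toy_annual$Annual_Precip, na.rm = TRUE)

mean_with_na              # NA
toy_mean                  # 40
is.data.frame(toy_mean)   # FALSE: it is a single number
nrow(toy_annual)          # 4: the data frame itself is unchanged
```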
bec_annual <- mutate(bec_annual, standard_departure = (Annual_Precip - bec_mean) / bec_sd)
What is the input data frame?
The input data frame is bec_annual.
Is the output data frame new, or does it overwrite an existing
object?
It overwrites the existing object bec_annual.
What is the name of the new column created?
The new column is named standard_departure.
Which existing column is used in the calculation?
The column Annual_Precip is used in the calculation.
Are bec_mean and bec_sd data frames or numeric values?
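They are single numeric values, not data frames. Their names suggest
they hold the mean and standard deviation of annual precipitation.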
Does the number of rows change after this command? Explain.
No. mutate() adds a column and computes one value for each existing row,
so the number of rows stays the same.
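A minimal sketch of the same calculation, assuming the dplyr package is installed. The names toy_annual, toy_mean and toy_sd and all values are invented for illustration. How bec_mean and bec_sd were actually computed is not shown in the original code, so computing them from the same column here is an assumption.

```r
library(dplyr)

# Hypothetical stand-in for bec_annual (values invented for illustration)
toy_annual <- data.frame(
  Year          = c(2018, 2019, 2020),
  Annual_Precip = c(30, 40, 50)
)

# Assumption: the mean and standard deviation come from the same column
toy_mean <- mean(toy_annual$Annual_Precip)   # 40
toy_sd   <- sd(toy_annual$Annual_Precip)     # 10

# Add one new column; each row gets its own standardized value
toy_annual <- mutate(toy_annual,
                     standard_departure = (Annual_Precip - toy_mean) / toy_sd)

toy_annual$standard_departure   # -1  0  1
nrow(toy_annual)                # 3: same number of rows as before
ncol(toy_annual)                # 3: one more column than before
```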
charm_post_1980 <- filter(annual_charm, Year > 1980)
What is the input data frame?
The input data frame is annual_charm.
What condition is being applied to the rows?
Only rows where Year is greater than 1980 are kept. Rows for 1980 itself
are not included.
What happens to rows that do not meet this condition?
They are removed from the output data frame.
Are any columns removed by this command?
No, all columns are retained.
What is the name of the output data frame?
The output data frame is charm_post_1980.
How is filter() different from select() in terms of what it
changes?
filter() changes rows, while select() changes columns.
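A short sketch of filter() on a small example, assuming the dplyr package is installed. The data frame toy_annual and its values are invented for illustration and stand in for annual_charm.

```r
library(dplyr)

# Hypothetical stand-in for annual_charm (values invented for illustration)
toy_annual <- data.frame(
  Year          = c(1979, 1980, 1981, 1982),
  Annual_Precip = c(30, 35, 40, 45)
)

# Keep only the rows where Year is strictly greater than 1980
toy_post_1980 <- filter(toy_annual, Year > 1980)

toy_post_1980$Year                        # 1981 1982
ncol(toy_post_1980) == ncol(toy_annual)   # TRUE: no columns are removed
nrow(toy_annual)                          # 4: the input data frame is unchanged
```

The result is stored under a new name, so the original four-row data frame remains available.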
Which commands change the number of rows?
na.omit() and filter() can change the number of rows.
Which commands change the number of columns?
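select() and mutate() change the number of columns: select() drops
columns, and mutate() adds one when it creates a new column.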
Which command returns a single numeric value instead of a data
frame?
mean().
Why is it important to know whether a command overwrites an
object or creates a new one?
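Because overwriting replaces the earlier version of the object, any rows
or columns that were dropped can only be recovered by re-running the
earlier code. Creating a new object preserves the original data frame.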
If a student accidentally runs na.omit() before select(), how
might that affect their results?
Rows could be removed because of missing values in columns that were not
needed for the analysis, leaving fewer rows than if select() had been
run first.
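The effect of the order can be seen in a minimal sketch, assuming the dplyr package is installed. The data frame toy_data and its values are invented for illustration; the only missing value is in TMAX, a column that is not needed.

```r
library(dplyr)

# Hypothetical data (values invented for illustration)
toy_data <- data.frame(
  DATE = c("2020-01-01", "2020-01-02", "2020-01-03"),
  PRCP = c(0.0, 0.5, 1.2),
  TMAX = c(10, NA, 15)
)

# select() first: the NA leaves with the TMAX column, so no rows are lost
select_first <- na.omit(select(toy_data, DATE, PRCP))

# na.omit() first: the row with a missing TMAX is dropped,
# even though TMAX is not kept afterwards
omit_first <- select(na.omit(toy_data), DATE, PRCP)

nrow(select_first)   # 3
nrow(omit_first)     # 2
```

Both results have the same two columns, but running na.omit() first discards a row whose DATE and PRCP values were complete.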