Housekeeping

Reminders from Week 3

HW 3 is Due 9/17/25

  • Practice Questions are posted.

    • I have just updated these questions.

    • I am currently recording new demo videos.

    • I will have all demo videos edited and posted by this Saturday or sooner.

Quiz 1 on Thursday 9/25/25

  • Weeks 1 - 4 (Lectures 1 - 8)

  • HW Assignments 1 - 3

Side Trip about piping

%>% vs. |>

  • What’s the difference?

  • For your purposes they are interchangeable, but |> is newer

  • %>% requires magrittr package but |> doesn’t

    • I load this package anyway as a precaution in case I need other pipe functions
  • |> may give you an error if you are working on a machine with an old version of R (before 4.1.0) or RStudio

  • |> is slightly more efficient because it is built into the R language, while %>% is a function supplied by a package

  • More information for those who are interested (not required)
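
To make the comparison concrete, here is a minimal sketch showing that the two pipes give the same result on a simple task. The vector `x` and the object names are made up for illustration, and the native pipe line assumes R 4.1.0 or later.

```r
library(magrittr)                         # provides %>%

x <- c(4, 9, 16)                          # hypothetical example values

via_magrittr <- x %>% sqrt() %>% sum()    # magrittr pipe
via_native   <- x |> sqrt() |> sum()      # native pipe, no package needed

identical(via_magrittr, via_native)       # both pipes return the same value
```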

Review: In-class Exercise from Week 3

#|label: import and prep bom2023

mojo_23_fall_wknd <- read_csv("data/Box_Office_Mojo_Week3_HW3.csv", show_col_types=FALSE) |>   # import data
  mutate(Month = factor(month,                                                        # create factors
                         levels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")),
         Day = factor(day,     
                      levels=c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
                      labels= c("M", "T", "W", "Th", "F", "Sa", "Su"))) |>
  select(Month, Day, top10gross) |>                                                  # select variables
  group_by(Month, Day) |>                                                            # group by category
  summarize(max_top10g = max(top10gross, na.rm=TRUE)) |>                             # summarize
  ungroup() |>                                                                       # ungroup
  filter(Day %in% c("F", "Sa", "Su") & Month %in% c("Sep", "Oct", "Nov", "Dec"))     # filter fall wknds

mojo_23_fall_wknd |> kable()

Month  Day  max_top10g
Sep    F      27292508
Sep    Sa     32699795
Sep    Su     27575389
Oct    F      53212742
Oct    Sa     47653696
Oct    Su     32681165
Nov    F      43491965
Nov    Sa     41719271
Nov    Su     29342997
Dec    F      39891139
Dec    Sa     40050370
Dec    Su     23078184

Completed code from Week 3 Exercise

  • An alternative to the code below is to round data as desired in mutate statement before reshaping data with pivot_wider.
#|label:  completed code wk 3 exercise

mojo_23_fall_wknd_wide <- mojo_23_fall_wknd |>
  mutate(max_top10g = (max_top10g/1000000) |> round(4)) |>                # convert to millions
  pivot_wider(id_cols=Month, names_from = Day, values_from = max_top10g)  # reshape data

mojo_23_fall_wknd_wide[,2:4] <- round(mojo_23_fall_wknd_wide[,2:4],1)     # round cols 2-4 to one decimal

# mojo_23_fall_wknd_wide[,2:4] <- round(mojo_23_fall_wknd_wide[,2:4])     # round cols 2-4 to whole numbers

mojo_23_fall_wknd_wide |> write_csv("data/Week_4_In_Class_First_Name_Last_Name.csv") # export as .csv 

mojo_23_fall_wknd_wide |> kable()       # create kable table (was not required in Week 3)

Month     F    Sa    Su
Sep    27.3  32.7  27.6
Oct    53.2  47.7  32.7
Nov    43.5  41.7  29.3
Dec    39.9  40.1  23.1
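
The alternative mentioned at the top of this slide, rounding inside mutate before reshaping, might look like the sketch below. To keep it self-contained, the long-format values are typed in from the table shown earlier rather than imported; the object names `fall_wknd_long` and `fall_wknd_wide_alt` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# long-format data typed in from the earlier table (max_top10g in dollars)
fall_wknd_long <- tibble(
  Month = factor(rep(c("Sep", "Oct", "Nov", "Dec"), each = 3),
                 levels = c("Sep", "Oct", "Nov", "Dec")),
  Day   = factor(rep(c("F", "Sa", "Su"), times = 4),
                 levels = c("F", "Sa", "Su")),
  max_top10g = c(27292508, 32699795, 27575389,
                 53212742, 47653696, 32681165,
                 43491965, 41719271, 29342997,
                 39891139, 40050370, 23078184))

fall_wknd_wide_alt <- fall_wknd_long |>
  mutate(max_top10g = round(max_top10g / 1000000, 1)) |>                   # millions, one decimal
  pivot_wider(id_cols = Month, names_from = Day, values_from = max_top10g) # reshape data

fall_wknd_wide_alt
```

Because the rounding happens before the reshape, no separate rounding step on columns 2-4 is needed afterwards.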

💥 Week 4 In-class Exercises - Q1 💥

Poll Everywhere - My User Name: penelopepoolereisenbies685

If all the columns in a dataset are numeric, you can round the whole dataset at once with the command round(<name of dataset>).

Why wouldn’t that work for the dataset in the previous exercise, mojo_23_fall_wknd_wide?

Hint: To answer this question, you are encouraged to

  • try running the command round(mojo_23_fall_wknd_wide)

  • examine the data using glimpse
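
For comparison, here is a minimal sketch of a case where rounding the whole dataset does work. It uses the built-in mtcars dataset, in which every column is numeric; the object name `mtcars_rounded` is made up for illustration.

```r
library(dplyr)

mtcars |> glimpse()               # every column is type <dbl>

mtcars_rounded <- round(mtcars)   # rounds all columns to whole numbers

mtcars_rounded |> head(3)
```

Comparing this glimpse output with the glimpse output for mojo_23_fall_wknd_wide is a good starting point for the question above.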

Review - Week 1

  • R, RStudio, R Projects, and Quarto files

    • Creating an R Project OR an R Quarto Project

    • Creating data and img folders.

  • Selecting data rows and columns by location using square brackets

  • Examining data using summary, unique, and table

  • Data types:

    • numeric (<dbl>, <int>)

    • character (<chr>)

    • logical (<lgl>)

    • factor (<fct>, <ord>)

      • In Week 3 we discussed how to convert character variables to factors.

      • Numeric variables can also be converted to factors.
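
A short sketch of these Week 1 ideas, using the built-in mtcars dataset. The gear variable is chosen only for illustration, and the names `first_rows`, `cars_fct`, and `gear_f` are made up.

```r
# square brackets select by location: data[rows, columns]
first_rows <- mtcars[1:5, 1:3]            # rows 1-5, columns 1-3

# examine a variable
summary(mtcars$mpg)                       # min, quartiles, mean, max
unique(mtcars$gear)                       # distinct values
table(mtcars$gear)                        # count of each value

# convert a numeric variable to a factor
cars_fct <- mtcars
cars_fct$gear_f <- factor(cars_fct$gear,
                          levels = c(3, 4, 5),
                          labels = c("3 gears", "4 gears", "5 gears"))

class(cars_fct$gear)                      # "numeric"
class(cars_fct$gear_f)                    # "factor"
```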

Review - Week 2

  • Review of Week 1 PLUS

  • dplyr package commands to select, modify, and summarize data:

    • select - used to select variables

    • filter - used to filter observations by their values

      • Can be used with

        • numeric values

        • character values

        • factor levels

    • slice - used to filter or select observations by location

    • mutate - used to modify variables or create new variables

    • factor - used to create a factor variable from another variable
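
The commands above can be sketched in one short pipeline. This example uses the built-in mtcars and iris datasets; the cutoff of 100 horsepower and the object names are arbitrary choices for illustration.

```r
library(dplyr)

cars_small <- mtcars |>
  select(mpg, hp, wt) |>          # select: keep three variables
  filter(hp > 100) |>             # filter: keep rows by a numeric value
  slice(1:5) |>                   # slice: keep rows 1-5 by location
  mutate(wt_lbs = wt * 1000)      # mutate: wt is recorded in 1000s of lbs

# filter also works with factor levels (Species is a factor in iris)
setosa_only <- iris |>
  filter(Species == "setosa")

cars_small
```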

Review - Week 3

  • Review of Weeks 1 and 2 PLUS

  • Coercion commands to coerce a variable to the type needed

    • as.integer, as.numeric, as.character
    • HW 3 included as.integer
    • Week 3 included a preview demo of as.numeric
  • dplyr commands

    • group_by and filter
    • group_by and summarize
  • Commands to reshape data:

    • pivot_wider and pivot_longer
  • Display data table using kable()
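
A brief sketch of the Week 3 commands, again using mtcars so that it runs on its own. Grouping by gear and the object names are illustrative choices, not taken from the homework.

```r
library(dplyr)
library(tidyr)

# coercion: convert a value to the type needed
as.integer("12")                  # character to integer
as.numeric("3.75")                # character to numeric
as.character(2025)                # numeric to character

# group_by and summarize: one row per group
gear_summary <- mtcars |>
  group_by(gear) |>
  summarize(mean_mpg = mean(mpg), n_cars = n())

# group_by and filter: keep the highest-hp car(s) within each group
top_hp <- mtcars |>
  group_by(gear) |>
  filter(hp == max(hp)) |>
  ungroup()

# reshape: wide to long, then back to wide
gear_long <- gear_summary |>
  pivot_longer(cols = c(mean_mpg, n_cars),
               names_to = "statistic", values_to = "value")

gear_wide <- gear_long |>
  pivot_wider(names_from = statistic, values_from = value)
```

Any of these tables can then be displayed with kable(), assuming the knitr package is loaded.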

Review - ggplot

  • ggplot geometries (geom) covered so far:

    • boxplot: geom_boxplot
    • barplot: geom_bar
    • scatterplot: geom_point
    • line plot: geom_line
    • area plot: geom_area
#|label: ggplot review
#|include: false
set.seed(999)                  # standardizes sample                         
my_diamonds <- diamonds |> 
  slice(sample(1:53940, 1000)) # example dataset

# print to screen without saving
my_diamonds |> ggplot() + 
  geom_point(aes(x=carat, y=price, 
                 color=clarity))

# save but don't print to screen
diamonds_plot <- my_diamonds |> ggplot() + 
  geom_point(aes(x=carat, y=price, 
                 color=clarity))

# save AND print to screen, 
# enclose all plot code in parentheses
(diamonds_plot <- my_diamonds |> ggplot() + 
  geom_point(aes(x=carat, y=price, 
                 color=clarity)))

# export most recent ggplot to img folder
ggsave("img/diamonds_plot_Week4.png", 
       width=6, height=4)
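
The code above only demonstrates geom_point. As a rough sketch, the other geometries in the list follow the same pattern. This example assumes the mpg and economics datasets that ship with ggplot2; the variables and object names were picked only for illustration.

```r
library(ggplot2)

# boxplot: distribution of highway mpg within each vehicle class
box_plot <- mpg |> ggplot() +
  geom_boxplot(aes(x=class, y=hwy))

# barplot: number of vehicles in each class
bar_plot <- mpg |> ggplot() +
  geom_bar(aes(x=class))

# line plot: unemployment over time
line_plot <- economics |> ggplot() +
  geom_line(aes(x=date, y=unemploy))

# area plot: same data with the area under the line filled in
area_plot <- economics |> ggplot() +
  geom_area(aes(x=date, y=unemploy))

box_plot   # print a saved plot by typing its name
```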

Format of Quiz 1

  • Students will have 70 minutes

  • Students with a time accommodation: we will schedule an alternative.

    • Tentative Time: Friday 9/26 at 1:00 PM
  • All students must work alone.

  • The quiz is intended to be long, and students may not finish.

    • All questions are equally weighted and independent.
    • There will be approximately 7-9 multi-part questions on Blackboard.
    • Each question will have multiple versions.
    • The questions will include instructions.
    • Students will have to examine the data and execute the specified tasks.

Format of Quiz 1 Continued

  • You will be provided with a zipped R project with a Quarto file template and data and img folders to complete your work.


  • The Quiz 1 Practice Questions and provided zipped R project are very similar to what you will see on Quiz 1


  • For each question you will:

    • Use the space provided to execute the specified tasks.

    • Answer the question(s) on Blackboard.

Grading of Quiz 1

  • Grading will take a little time. In addition to your Blackboard answers, you are required to submit:

    • your Quarto (.qmd) file
    • specified data or plot files
    • NOTE: YOU DO NOT SUBMIT A ZIPPED PROJECT for the quiz.
  • I cannot give you full credit if you do not show your work in the provided Quarto (.qmd) file.

  • Reminder: You can use different code than what is taught and will receive full credit if the result is correct.

  • For each question, the grade will be tallied as follows:

    • R code (.qmd file 10%) - quick check
    • Blackboard answers (90%)
  • Quiz 1 is worth 22.5% of your final grade in this course.

Practice Questions for Quiz 1

  • There is a set of practice questions posted on Blackboard.

  • Quiz 1 questions will be similar to these and will use the same or similar datasets.

  • Variable names are modified so that AI cannot easily answer questions.

  • You can use AI and are encouraged to do so when studying.

  • During the time-limited test, AI will be less helpful.

  • If other data are used:

    • I will post an announcement, so you can examine the data and documentation before the quiz.

Before Thursday

  • Download, unzip, and examine the R project.


  • Look over Blackboard Questions

    • Take notes on what is not clear for you.
    • I will answer questions on Thursday.


  • I am currently recording a new demo video playlist for the updated questions.

  • I will post an announcement when videos are posted.

Before Quiz 1:

  • Work through all the practice questions and write the code with comments to make sure you understand it.


  • Quiz 1 is open notes so you can use the code you create for the practice questions.


  • Make sure that your laptop has up-to-date versions of R and RStudio.

  • Make sure all packages listed in the setup for Quiz 1 are successfully installed and loaded in R on your laptop.

  • You can use AI during the test, but questions will be written so that AI cannot answer the question without your understanding and interpretation.

Examining Data and R Help files

Throughout the practice questions, you are asked to examine the data:

  • Examine data help files (see below)
  • Examine data using glimpse
  • Examine data in the Global Environment
  • Examine and sort the data


  • To view a data help file such as the documentation for mtcars, type ?mtcars in the R Console (lower left pane) and press Enter.

  • Documentation will appear in the lower right Help window.
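
Sorting can also be done in code rather than by clicking. The sketch below is one possible approach using mtcars; sorting by hp and the names `model` and `cars_sorted` are illustrative choices.

```r
library(dplyr)
library(tibble)

# ?mtcars                          # run in the Console to open the help file

cars_sorted <- mtcars |>
  rownames_to_column("model") |>   # move car names from row names to a column
  arrange(desc(hp))                # sort from highest to lowest horsepower

cars_sorted |> head(3)             # first three rows after sorting
```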

Examining Data with glimpse

mtcars |> glimpse()  # examine mtcars R dataset      
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

NOTE that the q1_cars dataset in the Quiz 1 Practice Questions is similar but the variable names have been modified.

💥 Week 4 In-class Exercises 💥

  • Download and unzip the Quiz 1 Practice Questions R Project.

  • Run the setup and the second chunk which saves the datasets to the Global Environment.

  • Once a dataset is saved to the Global Environment you can click on it.

  • Click on dataset name in Global Environment.

    • This will open the dataset in a tab in upper right pane.

    • Click on tab to view data.

    • Click on variables to sort them.

💥 Week 4 In-class Exercises Q2-Q3 💥

Q2. The q1_cars dataset is saved in the Global Environment once you download and unzip the Quiz 1 Practice Questions R project and run the first two chunks of code.

  • This dataset has ____ observations (rows).


Q3. Examine the data set in the Global Environment to answer this question.

The car with the LOWEST fuel efficiency (mpg) is the _______.

💥 Week 4 In-class Exercises Q4 💥

How many categories are in the cylinder variable in the q1_cars dataset?

Quiz 1 - Similar in Format to the Practice Questions

  • Chunk 1 is the setup chunk.

    • Running Chunk 1 will load (and install if needed) the required packages.
  • Chunk 2 saves the datasets to your Global Environment.

    • Running Chunk 2 will save the practice question datasets to your Global Environment.
  • Quiz 1 will have the same structure:

    • A zipped file that you will be asked to unzip and save to your desktop

    • Two R chunks that you have to run before beginning your timed quiz.

    • A set of numbered empty code chunks where you will complete your work for each question.

    • You will submit your .qmd file, not the whole R project, when the quiz is completed.

    • You will also submit requested plots and .csv files.

Overview Of Practice Questions

  • Each question will specify which dataset to use.

  • Instructions are written using very little R terminology so that AI cannot generate the correct code without your guidance and understanding.

  • Demo videos and AI will help you translate requested data management tasks into R commands if you get stuck.

  • Questions are designed to be short, but Quiz 1 questions will be a little shorter.

Practice Question 1

  • Examine the q1_cars dataset using the methods covered in class.

    1. How many rows are in this dataset?
  • Although there are many kinds of variables in this dataset, all of the variables except the first one are coded as one type for simplicity.

    1. Fill in the blank with the abbreviation for the primary variable type in this dataset.

      • Almost all of the variables in this dataset are type ____.
  • It also is very helpful to examine the data documentation in R.

    • In the R console, enter ?mtcars
    • Examine the dataset documentation in the Help window in the lower right pane.
    • Helpful hints like this will not be included on a quiz.
  • Save a new version of this dataset to a new name and filter the new dataset to only include cars with BOTH a straight engine and an automatic transmission.

  • Examine the new filtered data set.

Practice Question 1 Cont’d and HINTS and NOTES

  1. How many rows are in this new filtered dataset you created?
  • Interactively sort the new dataset to answer the following question.
  1. Within the new filtered dataset, the highest mpg (miles per gallon) is ____.

HINT:

  • Once a dataset is opened, remember that you can click on a column to sort the dataset by a specific variable.

NOTES:

  • On a quiz, I will tell you the name of the R dataset, but I will not tell you how to find the documentation.

  • I will also change variable names slightly so that AI cannot answer the questions correctly without you modifying the code.

Plan for Thursday

  • HW 3

    • Make progress on HW 3 before Thursday.

    • There will be time for questions about HW 3 if needed.
  • Read through the practice questions.

    • Take notes on questions you don’t understand

    • Let me know if you see typos.

    • I will go through specified questions as needed.