I. RStudio
The course notes for the statistics portion of this course will be contained in dynamic R Notebooks. While reading through the notes, you will be prompted to try executing R code snippets to see graphics and statistics. We will be using the software RStudio to work with R.
Download RStudio
Please follow the link to the RStudio Download page. Make sure you select the free version and the correct operating system: Windows or macOS. This is easy, because the webpage detects your system and suggests the correct package once you click the “download free version” button. Follow these steps:
- Install R first, using the STEP 1 link provided.
- Only after R has been installed, download RStudio.
- Run the installer - it will find all the R files it needs and connect the front end (RStudio) to the engine (R).
- Open RStudio.
RStudio: Working Directory
R saves everything to a working directory, and when you try to open or load files, it goes looking in the working directory. Let’s set it up now to avoid trouble later. In the main menu select Session and then \(\fbox{Set Working Directory}\). When RStudio was installed, it created a folder in MyDocuments called R. Use the browse option to surf to it and hit OK.
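The same check can be done from the R Console. The lines below are a minimal sketch using base R functions; the folder `~/R` is a hypothetical path chosen for illustration, so replace it with wherever your own R folder lives.

```r
# Show the folder R is currently using as its working directory
getwd()

# Hypothetical location of the R folder; adjust to match your machine
target <- path.expand("~/R")

# Only switch if the folder actually exists, to avoid an error
if (dir.exists(target)) {
  setwd(target)
}

# List the files R can see in the working directory
list.files()
```

If `list.files()` shows the files you expect to open, the working directory is set correctly.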
RStudio: Home Screen
The RStudio home screen is divided into four panels:
- Code Editor
- R Console
- Workspace and History
- Plots and Files
Most of our work will be in the Code Editor, the top-left panel. This is where we create reports and documents and run code blocks.
The R Console lets us see under the hood. These code lines show the input and output from the R engine. The remaining two areas have useful info for pros but are often left completely alone by beginners. You can resize all four windows to your own taste, but I suggest making the Code Editor very large since it will be our main work area.
RStudio: R Notebooks
To create a new notebook for your work, use the File menu and select \(\fbox{New File : R Notebook}\).
When you try to create a new R Notebook file, RStudio will ask if it can download and install the packages it needs. Click yes. R is a stripped-down, open-source, fast, and powerful statistical engine. This all comes at the cost of being a bit dense and difficult to work with, a problem we typically circumvent by loading packages and libraries to make our lives easier.
Once you can open the R Notebooks I have created for you and create new R Notebooks of your own, you have everything you need for the stats portion of the course working in RStudio.
RStudio: Mosaic
Designed by data scientists for introductory statistics students, the Mosaic library contains all the needed functions for basic stats. Better, the coding has been streamlined so the whole process is much more intuitive than in native R. Trust me, I’ve been using R since before there was an RStudio or Mosaic, and these are vast improvements.
We only have to install Mosaic once. Go to the Tools menu at the top, where the first option is \(\fbox{Install Packages...}\). When you start typing “mosaic” (lower case letters), the auto-complete will pop up so you know you’re on the right track.
When you run the installation on your machine the first time, RStudio will spit out a few dozen lines of red text in the console, and a download progress bar will keep popping up. This is normal, and you will not need to install Mosaic again (you will only see this again if you install a different package).
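The menu route can also be done by typing in the R Console. This is a minimal sketch using the base R functions `requireNamespace` and `install.packages`; the check around the install is my own addition so that the download only happens when the package is missing.

```r
# Install mosaic only if it is not already installed on this machine
if (!requireNamespace("mosaic", quietly = TRUE)) {
  install.packages("mosaic")
}
```

Running these lines a second time should produce no output, since the package is already in place.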
Now that the Mosaic package is installed, we still need to load it. We have to do this in every new session we create in RStudio. I suggest quitting with the “save session” option on so that the data you had loaded will reload the next time you open RStudio, but packages generally need to be loaded again, so run the line below at the start of each session.
library(mosaic)
II. Introduction to R
Let’s do some stats. We have everything set up properly, so let’s create some data and graphics. Why not flip a coin? Mosaic has a built-in coin flipper.
rflip()
Flipping 1 coin [ Prob(Heads) = 0.5 ] ...
T
Number of Heads: 0 [Proportion Heads: 0]
Interesting, but would it not be more interesting to flip 10 coins at once, and then do that same thing ten thousand times? Then we would have some data to plot, and you will have a glimpse into what RStudio can do.
rflip(10)
Flipping 10 coins [ Prob(Heads) = 0.5 ] ...
H T T T H T T H H H
Number of Heads: 5 [Proportion Heads: 0.5]
We can use the function do to repeat a process many times. The code below flips 10 coins, records the results, and repeats this for a total of 20 batches of 10 flips.
do(20) * rflip(10)
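Because the flips are random, your output will differ from anyone else's. If you want the same results every time you run the code, one option is to set a random seed first. The sketch below assumes the seed value 123, which is arbitrary, and stores the result in a variable named `small`, a name chosen only for this illustration.

```r
library(mosaic)

# Fix the random number generator so the flips are reproducible
set.seed(123)

# 20 batches of 10 coin flips, one row per batch
small <- do(20) * rflip(10)

# Each row records the batch size, heads, tails, and proportion of heads
names(small)
small
```

Each row is one batch of 10 flips, so the table has 20 rows.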
Tasks: Create a Variable
The next code block creates a variable flips to store all the data in. Once created, we can start plotting the data and making tables.
flips = do(10000) * rflip(10)
Notice there is no output. R has just computed the coin flips and stored the results in the variable flips. If we want to see what happened, we can print out the variable flips to see what’s in it.
flips
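Printing all ten thousand rows fills the console quickly. As a lighter alternative, the base R functions `head` and `nrow` give a quick look at the data; this sketch recreates `flips` so that it runs on its own.

```r
library(mosaic)

# Recreate the data: 10,000 batches of 10 coin flips
flips <- do(10000) * rflip(10)

# Show only the first six rows instead of all 10,000
head(flips)

# Confirm how many batches were recorded
nrow(flips)
```

The `heads` column is the one used in the tables and histograms that follow.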
Tasks: Tables and Histograms
We want to visualize what’s happening with this data, so we need some kind of summary, either a table, or a graph, or both. We can count the number of heads that have appeared in each group of ten coin flips. The tally function counts all possible values from the batches of 10 flips, and summarizes how many of each outcome occurred in the ten thousand trials.
tally(~ heads, data = flips)
heads
   0    1    2    3    4    5    6    7    8    9   10
   6  111  424 1155 1998 2483 2059 1188  466   99   11
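Raw counts depend on how many trials were run, so it can help to view the same table as proportions. The sketch below uses the `format = "proportion"` option of `tally`; the variable name `props` is my own, and the data are regenerated so the example stands alone, which means the numbers will not match the table above exactly.

```r
library(mosaic)

# Recreate the data: 10,000 batches of 10 coin flips
flips <- do(10000) * rflip(10)

# Same table as before, but as a share of all batches instead of counts
props <- tally(~ heads, data = flips, format = "proportion")
props

# The proportions cover every batch, so they add up to 1
sum(props)
```

Dividing each count in the original table by 10,000 gives the same kind of result by hand.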
histogram(~ heads, data = flips)

The histogram looks a bit wonky because the default bin width is too large, so we have empty categories or bins. Let’s tell the histogram function we want bin widths of 1 centered at 5, so it will split the data into its integer values and force the center of the histogram to be at 5.
histogram(~ heads, data = flips, width = 1, center = 5)
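To see what a bin width of 1 centered at 5 means, it can help to write out the bin edges by hand. The sketch below is an illustration in base R using `hist`, not a description of how the Mosaic histogram function works internally. The name `edges` is my own, and a smaller run of 1,000 batches is used to keep the example quick.

```r
library(mosaic)

# A smaller data set than in the notes, to keep this example fast
flips <- do(1000) * rflip(10)

# Edges at -0.5, 0.5, ..., 10.5: every bin has width 1,
# and one bin runs from 4.5 to 5.5, so it is centered at 5
edges <- seq(-0.5, 10.5, by = 1)
edges

# Base R histogram using those edges: one bar per whole number of heads
hist(flips$heads, breaks = edges, xlab = "heads", main = "Heads in 10 flips")
```

With these edges each possible number of heads, 0 through 10, falls in its own bin, which is why the empty gaps disappear.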

You should try different widths in the code above to help you understand how it’s working. At least try \(\fbox{width = 2}\).
Great! Now you’ve done some statistics with RStudio.
