Before We Start

The term “R” is used to refer to both the programming language and software that interprets the scripts written using it.
RStudio is currently a very popular way to not only write your R scripts but also to interact with the R software. To function correctly, RStudio needs R and therefore both need to be installed on your computer.

To make it easier to interact with R, we will use RStudio. RStudio is the most popular IDE (Integrated Development Environment) for R. An IDE is a piece of software that provides tools to make programming easier.

Reasons to Use R

R does not involve a lot of pointing and clicking, and that is a good thing
R is great for reproducibility
R is interdisciplinary and extensible
R works on data of all shapes and sizes
R produces high-quality graphics
R has a large and welcoming community
- You can always find help working through conundrums and issues with R by taking advantage of R’s extensive community. Check out websites such as Stack Overflow and RStudio Community.
R is FREE, open source, AND cross-platform
- Anyone can inspect the source code. Because of this transparency, there is less chance for mistakes, and if you (or someone else) find some, you can report and fix bugs.
- It is open source and supported by a large community of developers and users, so there is a large selection of third-party packages, which are freely available.

Finding Your Way Around R

When you open up RStudio, you will automatically see 3 panes. A simple way to see all 4 panes is to click on
View -> Panes -> Show all panes

Alternatively, another way to adjust the placement of these panes can be found through the menu:
Tools -> Global Options -> Pane Layout

Finally, you can also use the keyboard shortcut (which is the best practice):
Alt + Ctrl + Shift + 0

(The above is a GREAT example of how there are often many ways to achieve the same results in R)

Panes

Top Left - Source: Scripts and docs (Markdown)
Bottom Left - Console: shows the output of commands run from Source (a copy of the code plus its results); also the place to type anything that doesn’t need to be saved in the script for reproducibility
Top Right - Environment/History: look here for all command history (beyond coding); location of all created objects
Bottom Right - Files and more: see the contents of your project/working directory; any visualizations you have created; help details; Packages loaded or begin install process

New Script (vs. New Project)

There are many reasons to work within a script or a “Project”. For simplicity of navigation, today (and possibly next lesson) we will open a new script.
There are obvious benefits to working within a Project:

All changes to the project (i.e. script) are always saved in real time (no manual save)
Projects are conducive to sharing with others; a project can be sent to a collaborator, and all they have to do is open the project on their computer. Relative file paths keep working, so no adjustments have to be made on the user side.
One icon opens everything

My rule of thumb:

Day to day, I work in R (not in projects); this can be confusing to others, but I often have to switch between scripts/reports or want to have a script/report easily accessible. Having multiple scripts open at the same time has implications for data objects saved, and environment clutter, but working within those parameters is learnable and manageable.
When I am building a new script or altering an existing report, I will work in Projects, to focus explicitly on one project and keep the environment and script streamlined.

Start a New Script

To start a new script, click on
File -> New File -> New R Script

Note on Key Bindings

Many tasks we perform with the mouse can be achieved with a combination of keystrokes instead. These keyboard versions for performing tasks are referred to as key bindings. For example, we just showed you how to start a new script from the menu, but you can also use a key binding: Ctrl+Shift+N on Windows and Command+Shift+N on the Mac.

We highly recommend that you memorize key bindings for the operations you use most. RStudio provides a useful cheat sheet with the most widely used commands. You can get it from RStudio directly.
Fun Fact: you can ALSO access this in RStudio under the Tools Menu.

Tools -> Keyboard Shortcuts Help

R Script for Basic Navigation

Basic terminology

Coding involves writing down instructions for the computer to follow, and then telling the computer to follow those instructions. We write, or code, instructions in R because it is a common language that both the computer and we can understand. We call the instructions commands, and we tell the computer to follow the instructions by executing (also called running) those commands.

Interact within R

Console - Use the console for commands that do not have to be reproducible.
EXAMPLE: type 4 * 5 in the console and run the command... what happens? (R uses * for multiplication.)
EXAMPLE: type 4 * 5 in the source and run the command... what happens?

In both cases, the command was executed and you see the results directly in the console.
- If we only type the command in the console, the command and the results will be forgotten when you close your session. If, for some reason, calculating 4 * 5 is a command that you want to remember to run the next time you open the project, you would type it into the script.

Source - Location of code to be saved/currently saved. Code can be executed directly from here, and the results appear in the console.
- Also known as the script editor
- Allows code and workflow to be reproducible

Execute/Run Commands
Execute your commands (code) directly from the Script Editor by using the Ctrl + Enter shortcut (Cmd + Return on Mac). Alternatively, you can highlight your code with your cursor and click Run at the top of the Script Editor.
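As a quick illustration of the kind of commands you might run from either the console or the script editor, here is a minimal sketch using only base R arithmetic. The specific numbers are just examples.

```r
# Multiplication uses the * operator (typing "4 x 5" would give an error)
4 * 5

# Division and exponents work the same way
10 / 4
2 ^ 3
```

Place your cursor on any one of these lines in the script editor and press Ctrl + Enter to run just that line.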

Examples in R

## You need to do some math real quick? Type your equation in the Console
## Example: 125 exits with 13 NA's, 78 positive destinations at exits. What percentage of exits had positive destinations?

78 / (125 - 13) * 100 ## type into the console; 125 - 13 = 112 exits with a known destination
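If you wanted to keep this calculation in a script rather than the console, one possible way to write it is sketched below. The object names (total_exits, missing_exits, and so on) are made up for this illustration, and it assumes the 13 NA's should be left out of the denominator.

```r
# Figures from the example above
total_exits    <- 125  # all exits
missing_exits  <- 13   # exits with an NA destination
positive_exits <- 78   # exits to a positive destination

# Leave the NA's out of the denominator
valid_exits <- total_exits - missing_exits

# Share of positive destinations, as a percentage rounded to one decimal place
pct_positive <- round(positive_exits / valid_exits * 100, 1)
pct_positive
```

Naming each number makes it easier to rerun the same calculation next quarter by changing only the first three lines.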

Let’s Build a Script

Clear Environment

If you are working in a Project, this is irrelevant, but it is good practice when you may be working on multiple projects simultaneously.

The command ls() will list all objects in your environment. (Note: you see the result of this command in the Console on the bottom left.) Alternatively, you can click on Environment and see your data objects listed. If you have objects in your environment, it is best to remove them, so that you start fresh.
To do this, run the command: rm(list=ls())

ls()
rm(list=ls()) ## here we are telling R to *remove* (or rather, clear) all of the objects from the workspace so we can start with a clean environment.

Notes about “NOTES”

In your script, sometimes it is useful to write notes that help us understand why or what we are doing with our code, especially when sharing the script with another collaborator. This is where the hashtag (#) comes into play. Any notes we want to have within our script must begin with a hashtag; only 1 is necessary. If you fail to include the hashtag, R will try to run the notes as if they were a command, and you will get error messages. Plus, it throws off the rest of your script's formatting. We can now add hashtags next to ls() to remind us what it was for. As you become more proficient in R, and know the purpose of each command, notes are not always necessary. For now, after the code ls() we will add ## check to see what is in the environment. Then, after rm(list=ls()) we will add ## remove all objects from the environment
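Here is a small sketch of how notes can sit alongside code. The two objects (demo_number and demo_text) are invented for this example so that there is something to list and remove. The sketch removes only those two objects rather than clearing the whole environment.

```r
# Create two throwaway objects (hypothetical names, only for this example)
demo_number <- 1
demo_text   <- "hello"

ls()  ## check to see what is in the environment

rm(list = c("demo_number", "demo_text"))  ## remove the two objects created above

exists("demo_number")  ## FALSE once the object has been removed
```

Everything after a hashtag on a line is ignored by R, so a note can follow a command on the same line or sit on a line of its own.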

Set Working Directory

Again, if you are working within a Project, this step is not necessary; when you open or create a Project, RStudio by default sets the working directory to the project folder.

It is good practice to keep a set of related data, analyses, and text self-contained in a single folder called the working directory. All of the scripts within this folder can then use relative paths to files. Relative paths indicate where inside the project a file is located (as opposed to absolute paths, which point to where a file is on a specific computer). Working this way makes it a lot easier to move your project around on your computer and share it with others without having to directly modify file paths in the individual scripts.

RStudio provides a helpful set of tools to do this through its “Projects” interface, which not only creates a working directory for you but also remembers its location (allowing you to quickly navigate to it). The interface also (optionally) preserves custom settings and open files to make it easier to resume work after a break.

To point to your working directory via point and click:
Session -> Set Working Directory -> Choose Directory
This will allow you to browse your computer for your folder of choice.

Key Bindings Shortcut == Ctrl + Shift + H
Now that we have selected our working directory, we can see everything within that directory in the lower right pane: Files.

Once you have set your working directory, navigate to the Environment pane and click on History. You will see the command (setwd) that points to that specific directory. Click on it to highlight it, and then click To Source (green arrow). This will add that line of code into your script (hence, reproducibility). Anytime you need to use this script again, especially for reporting purposes, all you have to do is run the line of code to set the working directory. Note: if you switch between various scripts and run lines of code that load files, you will get an error message in the console when you are in a different working directory than the script requires.
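The same steps can be sketched in code. This example uses tempdir() as a stand-in for your own project folder, and example.csv is a made-up file created only to show how a relative path is resolved. In your own script you would put your folder's path inside setwd() instead.

```r
# Remember where we started so we can return afterwards
original_dir <- getwd()

# tempdir() stands in for your own project folder (assumption for this sketch)
project_dir <- tempdir()
setwd(project_dir)

# Confirm where R is now pointing
getwd()

# Create a small made-up file in the working directory
writeLines("id,status", "example.csv")

# A relative path is looked up inside the working directory
found_relative <- file.exists("example.csv")

# The same file reached through an absolute path
found_absolute <- file.exists(file.path(project_dir, "example.csv"))

# Clean up and go back to the original working directory
file.remove("example.csv")
setwd(original_dir)
```

Because the script refers to the file as just "example.csv", it would keep working if the whole folder were moved or shared, as long as the working directory points at that folder.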

Let’s save our script within our directory. To name it, click:
File -> Save As
Let’s call our script: Training One
You can see that it is now added to our Files.

Loading Packages

Packages are part of the beauty of R. R comes with many packages already installed, and even without any additional packages, base R has many powerful functions. However, packages are the window to the world of the REALLY AWESOME THINGS YOU CAN DO IN R!! You can create ho-hum graphics in base R (still pretty nice compared to Excel), but if you load the package “ggplot2” your world will be rocked! Packages also reflect a really cool aspect of R, in that it is open source. It is constantly evolving and becoming smarter and more efficient because of the community of R users and the free nature of the software. So new packages are constantly being developed to aid in super unique or specialized forms of data analysis, for all kinds of multi-disciplinary purposes (medicine, social sciences, psychology, trade, etc.). Let’s take a moment to peruse the packages that are already installed.

Note: packages are also referred to as “libraries”. This is because each package contains a “library” of commands that are specific to that package. You cannot execute commands that are specific to a particular package if you do not have that library loaded.

Even if the package you want is shown in the list on the right, you MUST load the package. If it appears on the list in the bottom right pane (Packages tab), it has been installed, but not necessarily loaded.

When do I need to install packages? When do I need to load packages (or libraries)?

Anytime you update R, you will need to re-install packages (for the most part). Any time you shut down R, you will need to reload the libraries you want to use. Again, you cannot execute (run) commands that are package specific without that particular package being loaded.
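To see the difference between installed and loaded in code, here is a minimal sketch. It uses the "tools" package only because it ships with R and is normally not loaded at startup; any package name could be substituted.

```r
# "tools" ships with R, so it is a safe stand-in for any package name
pkg <- "tools"

# Is the package installed on this computer?
is_installed <- requireNamespace(pkg, quietly = TRUE)

# Is the package loaded (attached) in this session? search() lists what is attached
attached_before <- "package:tools" %in% search()

# Load the library, then check again
library(tools)
attached_after <- "package:tools" %in% search()

is_installed
attached_before
attached_after
```

A package only needs to be installed once per R installation, but library() has to be run again in every new session.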

## Example: most arithmetic functions are already loaded in R.

log(15)
665/72.3

### There are commands in base R that are very similar to commands that are particular to "wrangle" packages. Base R is always loaded, so try loading a data set directly from your directory.

read.csv("active.csv", header = TRUE)

## now try to load that same data set with a more useful function for loading a csv, "read_csv", which comes from the tidyverse package

read_csv("active.csv")
## Notice you have an error message. This is because the tidyverse package is not loaded (well, it shouldn't be at this point), so you cannot access that command.

Wrangling Packages

For our purposes within R & E, the most useful packages we will use are related to Data Wrangling.
Data Wrangling is the process of transforming and mapping data from one “raw” form into another format, with the intent of making it more appropriate and valuable for a variety of purposes. Many employees within R & E already do data wrangling, but likely through Excel, which is without a doubt a longer, clunkier, and less accurate process than using software like R.

The packages we will use primarily are:

tidyverse
dplyr
readr
lubridate

Load Packages

You can simply search for your package in the list in the bottom right pane, under the Packages tab. If you find your package, you can check the box next to it, and TA DA, the package is loaded. In many cases, you may not find your package in the list. To get a new package, click on “Install” in the Packages tab. There you can type in the name of the package you want and it should show up. Click on it to install.

Of course, we want to practice the code to install packages, because this is far faster than searching or pointing and clicking, as long as you know the name of your package.

## to install our wrangle packages (the package name must be in quotes):

install.packages("tidyverse") ## question: do we want to add this code to the source or the console?
install.packages("dplyr")
install.packages("readr")

As a reminder, once the package is installed, we are not done. We have to load the library for that package to be able to actually use it.

## load the libraries

library(tidyverse)
library(dplyr)
library(readr)

Loading a Data Set

Before we stop for today, we will load a data set and take a look at it.
To load a data set from our directory into R, we will use the command: read_csv.
However, we want to do more than just load it. Loading it just shows the data in the console. We want to manipulate the data. This means we are going to turn the data into an object (in R speak, assign the data to an object), with a name we can easily use.
Let’s load our data set “Active Clients” with the name “active.q3”, indicating this is an active client list from quarter 3.

We assign the data to an object using the assignment operator: <-
You get it by typing the two characters < and -, OR you can use the key binding Alt + - (Option + - on Mac).
You can also think of <- as an equal sign as we get more into different commands we may want to use.
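A minimal sketch of assignment, using made-up object names (exits and na_count) and numbers chosen only for illustration:

```r
# Assign a value to an object with <-
exits <- 125

# Typing the object's name prints its value in the console
exits

# Objects can be used in later commands
exits - 13

# = also assigns in a script, but <- is the usual R convention
na_count = 13
exits - na_count
```

Once an object exists, it shows up in the Environment pane and stays available until you remove it or end the session.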


## note you have to include the file extension (for example, .csv) or the file will not load.
## for fun, let us first just load a data set.

read.csv("active.csv")

## you can now see the data has been loaded into R. However, we cannot do anything with the data as is, other than look at it. To manipulate the data, we must assign it to an object.


active <- read_csv("active.csv") ## we have assigned our data set to the object "active"

## note that in R "speak" we often refer to loaded csv's as "data frames", rather than data sets or any other name.
## we can learn what type of object it is with the following code:
class(active) ## the command "class" indicates what type of data we are working with
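If you do not have active.csv at hand, the load-then-assign pattern can be tried with a tiny made-up file. This sketch writes three invented rows to a temporary csv and reads them back with base R's read.csv, so it runs without the tidyverse loaded. The column names and the object name demo_clients are assumptions for this example only.

```r
# Write a tiny made-up csv to a temporary file (a stand-in for "active.csv")
csv_path <- tempfile(fileext = ".csv")
writeLines(c("client_id,status", "1,active", "2,active", "3,exited"), csv_path)

# Without assignment, the data is only printed to the console
read.csv(csv_path)

# With assignment, the data is stored as an object we can keep working with
demo_clients <- read.csv(csv_path)

class(demo_clients)  # "data.frame"
nrow(demo_clients)   # 3
names(demo_clients)
```

With read_csv from the tidyverse, class() reports a few extra class names alongside "data.frame", but the object is used in the same way.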

To learn more about any command, type ?commandname

## example
?read_csv

## We saw in ?read_csv that there is an option (show_col_types) to turn off the column type message shown when we load the data. Let's remove the object and reload, so that we don't have to see that message. If we want to see it later when it is relevant, there is code for that!

rm(active) ## the command rm() removes any objects we want from our environment; to remove several, separate their names with commas.
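Removing more than one object at a time might look like the sketch below. The object names q1, q2 and q3 are invented for this illustration.

```r
# Three illustrative objects
q1 <- 10
q2 <- 20
q3 <- 30

# Remove two of them in one call, separated by a comma
rm(q1, q2)

# Check which objects are still there
exists("q1")
exists("q3")
```

Unlike rm(list=ls()), this leaves everything else in your environment untouched.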

##now let's reload

active <- read_csv("active.csv", show_col_types = FALSE)

Let’s look at our data. To look using point and click, simply click on the Environment tab and click on the object you just made.

Let’s code instead!


view(active) ##view is the command we use, always followed by the name of the data in parentheses

##to go back to our script

## Ctrl + Shift + Tab

Session 1: Basic R Navigation

September 26th, 2024