The term “R” is used to refer to both the programming language and the
software that interprets the scripts written in it.
RStudio is currently a very popular way not only to write your R scripts
but also to interact with the R software. RStudio needs R to function,
so both need to be installed on your computer.
To make it easier to interact with R, we will use RStudio. RStudio is the most popular IDE (Integrated Development Environment) for R. An IDE is a piece of software that provides tools to make programming easier.
When you open RStudio, you will automatically see 3 panes. A
simple way to see all 4 panes is to click on
View -> Panes -> Show all panes
Alternatively, you can adjust the placement of these panes through
the menu:
Tools -> Global Options -> Pane Layout
Finally, you can also use the keyboard shortcut (which is the best
practice):
Alt + Ctrl + Shift + 0
(The above is a GREAT example of how there are often many ways to achieve the same result in RStudio.)
There are many reasons to work within a script or a “Project”. For
simplicity of navigation, today (and possibly next lesson) we will open
a new script.
There are obvious benefits to working within a Project:
My rule of thumb:
To start a new script, click on
File -> New File -> New R Script
Many tasks we perform with the mouse can instead be achieved with a combination of keystrokes. These keyboard versions of tasks are referred to as key bindings. For example, we just showed you how to start a new script with the mouse, but you can also use a key binding: Ctrl+Shift+N on Windows and Command+Shift+N on the Mac.
We highly recommend that you memorize key bindings for the operations
you use most. RStudio provides a useful cheat sheet with the most widely
used commands. You can get it directly from RStudio.
Fun fact: you can ALSO access this in RStudio under the Tools
menu:
Tools -> Keyboard Shortcuts Help
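## Need to do some math real quick? Type your equation in the Console.
## Example: 125 exits, 13 of them with NA (missing) destinations, and 78 positive destinations. What percentage of exits with a known destination were positive?
78/(125 - 13) * 100 ## type into the console; 125 - 13 = 112 known destinations, so this returns about 69.6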
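If you have the raw destination values rather than the counts, R can do the counting for you. This is a minimal sketch. It assumes a hypothetical vector `destinations` that reproduces the counts above: 78 positive, 34 other, and 13 missing, for 125 exits in total.

```r
## Hypothetical data: 78 positive, 34 other, and 13 missing (NA) destinations = 125 exits
destinations <- c(rep("positive", 78), rep("other", 34), rep(NA, 13))

length(destinations)                                       ## total exits: 125
known <- sum(!is.na(destinations))                         ## exits with a known destination: 112
positive <- sum(destinations == "positive", na.rm = TRUE)  ## positive destinations: 78

pct_positive <- round(positive / known * 100, 1)           ## percentage, rounded to 1 decimal
pct_positive                                               ## 69.6
```

Inside `sum()`, `na.rm = TRUE` skips the missing values. Without it, the result would be `NA`.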
If you are working in a Project, clearing your environment is not necessary, but it is good practice when you may be working on multiple projects simultaneously.
The command ls() will list all objects in your
environment. (Note: you see the result of this command in the
Console on the bottom left.) Alternatively, you can click on the
Environment tab and see your data objects listed. If you
have objects in your environment, it is best to remove them so that you
start fresh.
To do this, run the command: rm(list=ls())
ls()
rm(list=ls()) ## here we are telling R to *remove* (or rather, clear) all of the objects from the workspace so that we can start with a clean environment.
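To see these two commands work together, here is a short sketch. It creates two practice objects, lists them, and then clears them. The object names `clients` and `site` are made up for this example.

```r
## Create two practice objects (hypothetical names and values)
clients <- 125
site <- "Main office"

ls()             ## check what is in the environment: includes "clients" and "site"

rm(list = ls())  ## remove all objects from the environment

ls()             ## character(0) means the environment is now empty
```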
In your script, it is often useful to write notes that help us understand why or what we are doing with our code, especially when sharing the script with a collaborator. This is where the hashtag (#) comes into play. Any note within our script must begin with a hashtag; only 1 is necessary. R ignores everything from the hashtag to the end of that line. If you forget the hashtag, R will try to run the note as if it were a command, and you will get an error. Plus, it throws off the rest of your script's formatting. We can now add notes next to ls() to remind us what it was for. As you become more proficient in R and know the purpose of each command, notes are not always necessary. For now, after the code ls() we will add ## check to see what is in the environment. Then, after rm(list=ls()) we will add ## remove all objects from the environment.
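Here is a small illustration of how R treats comments. It shows a full-line comment, an inline comment after code, and a line that has been "commented out" so it never runs. The object name `total_exits` is hypothetical.

```r
## A full-line comment: R skips this entire line
total_exits <- 125   ## an inline comment: R runs the code before the ## and ignores the rest

## total_exits <- 0  ## this line is "commented out", so it never runs

total_exits          ## still 125
```

Commenting out a line like this is also a handy way to switch a command off temporarily without deleting it.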
Again, if you are working within a Project, this step is not necessary: when you open or create a Project, RStudio automatically sets the working directory to the Project folder.
It is good practice to keep a set of related data, analyses, and text self-contained in a single folder called the working directory. All of the scripts within this folder can then use relative paths to files. Relative paths indicate where inside the project a file is located (as opposed to absolute paths, which point to where a file is on a specific computer). Working this way makes it a lot easier to move your project around on your computer and share it with others without having to directly modify file paths in the individual scripts.
RStudio provides a helpful set of tools to do this through its “Projects” interface, which not only creates a working directory for you but also remembers its location (allowing you to quickly navigate to it). The interface also (optionally) preserves custom settings and open files to make it easier to resume work after a break.
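To see why relative paths travel well, here is a small sketch. It builds a throwaway project folder inside R's temporary directory and then refers to a file only by its location inside that folder. The folder names `training_project` and `data` and the file `example.csv` are all hypothetical.

```r
## Build a throwaway "project" folder with a data subfolder (hypothetical names)
project_dir <- file.path(tempdir(), "training_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

## An absolute path spells out the full location on this one computer
absolute_path <- file.path(project_dir, "data", "example.csv")
write.csv(data.frame(id = 1:3), absolute_path, row.names = FALSE)

## With the working directory set to the project folder,
## a relative path only needs the location inside the project
old_wd <- setwd(project_dir)                ## setwd() returns the previous directory
relative_path <- file.path("data", "example.csv")
file.exists(relative_path)                  ## TRUE
setwd(old_wd)                               ## restore the previous working directory
```

If someone copies the whole `training_project` folder to another computer, `file.path("data", "example.csv")` still works there. The absolute path would not.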
To point to your working directory via point and click:
Session -> Set Working Directory -> Choose Directory
This will allow you to browse your computer for your folder of
choice.
Key binding shortcut: Ctrl + Shift + H. Now that we have selected our working directory, we can see everything within that directory in the lower right pane: Files.
Once you have set your working directory, go to the Environment pane (top right) and click on the History tab. You will see the line of code that points to that specific directory. Click on it to highlight it, and then click To Source (the green arrow). This adds that line of code to your script (hence, reproducibility). Any time you need to use this script again, especially for reporting purposes, all you have to do is run that line of code to set the working directory. Note: if you switch between scripts and run lines of code that load files while you are in a different working directory than the script requires, you will get an error message in the console when R tries to load the file.
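The line that To Source copies into your script is a call to setwd(). Below is a sketch of what that line typically looks like. The folder path in the comment is hypothetical, so replace it with your own. The runnable part uses R's temporary directory so that it works on any computer.

```r
## What the copied line typically looks like (hypothetical path; use your own):
## setwd("C:/Users/yourname/Documents/R_training")

## Runnable version using R's temporary folder
setwd(tempdir())   ## set the working directory
getwd()            ## print the current working directory to confirm it changed
list.files()       ## list the files in it (the same files the Files pane shows)
```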
Let’s save our script within our working directory. To name it,
click:
File -> Save As
Let’s call our script: Training One
You can see that it has now been added to our Files pane.
Packages are part of the beauty of R. R comes with many packages already installed, and even without any extra packages, base R has many powerful functions. However, packages are the window to the world of the REALLY AWESOME THINGS YOU CAN DO IN R!! You can create ho-hum graphics in base R (still pretty nice compared to Excel), but if you load the package “ggplot2” your world will be rocked! Packages also reflect a really cool aspect of R: it is open source. It is constantly evolving and becoming smarter and more efficient because of the community of R users and the free nature of the software. New packages are constantly being developed to support highly specialized forms of data analysis for all kinds of multidisciplinary purposes (medicine, social sciences, psychology, trade, etc.). Let’s take a moment to peruse the packages that are already installed, listed in the Packages tab.
Note: packages are also often referred to as “libraries”, partly because you load them with the library() command. Each package contains its own collection of commands (functions). You cannot execute commands that are specific to a particular package if you do not have that package loaded.
Even if the package you want is shown in the list on the right, you MUST load it. If a package appears in the Packages tab in the bottom right pane, it has been installed, but not necessarily loaded. Loaded packages have a checked box next to their name.
Any time you update R, you will (for the most part) need to re-install your packages. Any time you shut down and restart R, you will need to load your packages again with library(). Again, you cannot execute (run) package-specific commands without that particular package being loaded.
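Here is a quick way to check the difference between installed and loaded from code. It uses the `tools` package as the example because it ships with every R installation but is not loaded by default. The file name passed to `file_ext()` is just a string, so no file needs to exist.

```r
## "tools" ships with every R installation, but it is not loaded by default
"tools" %in% rownames(installed.packages())   ## TRUE: installed
"package:tools" %in% search()                 ## FALSE in a fresh session: not loaded yet
exists("file_ext")                            ## FALSE in a fresh session: its functions are not available

library(tools)                                ## load the package

"package:tools" %in% search()                 ## TRUE: now loaded
file_ext("active.csv")                        ## a tools function, now usable: returns "csv"
```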
##Example: most arithmetic functions are part of base R, so they are always available.
log(15)
665/72.3
### There are commands in base R that are very similar to commands from the "Wrangle" packages. Now that your working directory is set, try loading a data set directly from it.
read.csv("active.csv", header = TRUE)
##now try to load that same data set with a more useful function for reading a csv, read_csv(), which comes from the readr package (part of the tidyverse)
read_csv("active.csv")
##Notice the error message: could not find function "read_csv". This is because the readr package (part of the tidyverse) is not loaded yet (well, it shouldn't be at this point), so R cannot find that command.
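Your active.csv lives on your own computer, so this self-contained sketch first writes a tiny stand-in file. The file and its columns `client_id` and `program` are hypothetical, and the values are made up. It is written into a temporary folder and then read back with the base R function, which needs no package.

```r
## Work in a temporary folder so this example does not touch your own files
old_wd <- setwd(tempdir())

## Write a tiny, made-up active.csv (hypothetical columns and values)
writeLines(c("client_id,program",
             "101,Housing",
             "102,Outreach"), "active.csv")

## Base R: no package needs to be loaded for read.csv()
active_base <- read.csv("active.csv", header = TRUE)
active_base

## readr's version only works once readr (or the tidyverse) is loaded:
## library(readr); read_csv("active.csv")

setwd(old_wd)   ## return to your previous working directory
```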
For our purposes within R & E, the most useful packages
we will use are related to Data Wrangling.
Data Wrangling: the process of transforming and
mapping data from one “raw” form into another format with
the intent of making it more appropriate and valuable for a variety of
purposes. Many employees within R & E already do data wrangling, but
likely through Excel, which is, without a doubt, a longer, clunkier and
less accurate process than using software like R.
The packages we will use primarily are:
##Load Packages
You can simply search for your package in the list in the bottom right pane, under the Packages tab. If you find your package, you can check the box next to it, and TA DA, the package is loaded. In many cases, you may not find your package in that list. To install a new package, click on “Install” in the Packages tab. There you can type in the name of the package you want; it should show up. Then click Install to install it.
Of course, we want to practice the code to install packages, because this is far faster than searching or pointing and clicking, as long as you know the name of your package.
## to install our wrangling packages (installing tidyverse also installs dplyr and readr):
install.packages("tidyverse") ## question: do we want to add this code to the source or the console?
install.packages("dplyr")
install.packages("readr")
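Installing is a one-time step per computer (and per R version), so re-running install.packages() every session wastes time. Below is a sketch of a hypothetical helper, `install_if_missing()`, which is not part of any package. It uses only base R and installs just the packages that are not already present.

```r
## Hypothetical helper: install only the packages that are not already installed
install_if_missing <- function(pkgs) {
  missing <- setdiff(pkgs, rownames(installed.packages()))  ## names not found locally
  if (length(missing) > 0) install.packages(missing)        ## install only those
  invisible(missing)                                        ## return what was missing
}

## Usage (run once, in the Console):
## install_if_missing(c("tidyverse", "dplyr", "readr"))
```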
As a reminder, once the package is installed, we are not done. We have to load the package with library() to be able to actually use it.
## load the libraries
library(tidyverse)
library(dplyr)
library(readr)
Before we stop for today, we will load a data set and take a look at
it.
To load a data set from our working directory into R, we will use the
command: read_csv.
However, we want to do more than just load it. Loading it just shows the
data in the console. We want to manipulate the data. This means
we are going to store the data in an object (in R speak, we
assign the data to an object), with a name we can easily use.
Let’s load our data set “Active Clients” with the name
“active.q3”, indicating this is an active client list from
quarter 3.
We assign the data to an object using the assignment operator:
<-
You can type it with the < and - keys, OR you can use the key binding
Alt + - (Option + - on the Mac).
You can also think of <- as an equal sign as we get
more into different commands we may want to use.
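Here is a short sketch of assignment using the numbers from the console example earlier. The object names `exits` and `missing_destinations` are hypothetical.

```r
## Assign the number of exits to an object named exits
exits <- 125

## The = sign also works for assignment, but <- is the usual R style
missing_destinations = 13

## Objects can then be used in calculations by name
exits - missing_destinations   ## 112
```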
##note: you have to include the file extension (e.g., .csv) in the file name or the file will not load.
##for fun, let us first just load a data set.
read.csv("active.csv")
#you can now see the data has been loaded into R. However, we cannot do anything with the data as is, other than look at it. To manipulate the data, we must assign it to an object.
active <- read_csv("active.csv") ##we have assigned our data set to the object "active"
## note that in R "speak" we often refer to loaded CSVs as "data frames", rather than data sets or any other name.
## we can learn what type of object it is with the following code:
class(active) ## the command "class" indicates what type of data we are working with
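class() works on any object, not just data you read from a file. In this sketch, `clients` is a small, made-up data frame with hypothetical columns. The character result for `program` assumes R 4.0 or later, where text columns are no longer converted to factors by default. For an object created with read_csv(), class() typically lists several classes ending in "data.frame", because the result is a tibble (the tidyverse's version of a data frame).

```r
## A small, made-up data frame (hypothetical columns and values)
clients <- data.frame(client_id = c(101, 102),
                      program   = c("Housing", "Outreach"))

class(clients)            ## "data.frame"
class(clients$client_id)  ## "numeric"
class(clients$program)    ## "character"
```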
To learn more about any command, type ?commandname in the Console.
## example
?read_csv
#In the ?read_csv help page we saw that the show_col_types argument lets us turn off the message describing the column types that is printed when we load the data. Let's remove the object and reload it, so that we don't have to see that message. If we want to see it later when it is relevant, there is code for that!
rm(active) ## rm() removes the objects we name from our environment; to remove several at once, separate their names with commas.
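Here is what removing several objects at once looks like. The object names and values are hypothetical practice objects.

```r
## Two practice objects (hypothetical names and values)
q3_count <- 78
q4_count <- 81

rm(q3_count, q4_count)   ## remove both at once; separate names with commas
exists("q3_count")       ## FALSE: the object is gone
```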
##now let's reload
active <- read_csv("active.csv", show_col_types = FALSE)
Let’s look at our data. To look using point and click, simply click on the Environment tab and then click on the object you just made.
Let’s code instead!
view(active) ## view() opens the data in a spreadsheet-style viewer; put the name of the data in the parentheses (base R's version is View(), with a capital V)
##to go back to our script:
#Ctrl + Shift + Tab
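The viewer is interactive, so when you just want a quick look from code (for example, in a script you will rerun), base R also has a few text-based helpers. The sketch below uses a small, made-up data frame, `active_demo`, as a stand-in for `active`. Its columns and values are hypothetical.

```r
## A small, made-up data frame standing in for active (hypothetical columns and values)
active_demo <- data.frame(
  client_id = 101:106,
  program   = c("Housing", "Outreach", "Housing", "Shelter", "Outreach", "Housing")
)

head(active_demo, 3)   ## first 3 rows
dim(active_demo)       ## number of rows and columns: 6 2
str(active_demo)       ## each column's name, type, and first few values
```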