BI412L Lab 1: Intro to R: Part 1

Navigating the RStudio Interface

RStudio’s default window layout is divided into four main panes, each serving different functions to help you code, analyze, and visualize data effectively. Here’s a breakdown of these panes and their typical contents:

Source/Script Editor (Top-Left Pane):
- Purpose: This pane is where you write and edit R scripts (.R files), R Markdown (.Rmd) documents, or other types of text files.
- Features:
  - Run Code: Use the “Run” button or Ctrl + Enter (Windows) / Cmd + Enter (Mac) to execute the line the cursor is on or any highlighted code.
  - Open Files: Click on the tabs to switch between open scripts, or use the file browser to open a new script.
  - Syntax Highlighting: The editor provides syntax highlighting for R code, making it easier to read and debug.
  - Find/Replace: Use Ctrl + F (Windows) / Cmd + F (Mac) to search within the script.
Console (Bottom-Left Pane):
- Purpose: The console is where you can directly type and execute R commands, and where R outputs the results of executed code.
- Features:
  - Type Commands: You can type R commands directly and press Enter to run them.
  - View Output: The results of executed commands, including error messages, will be displayed here.
  - Command History: Use the up and down arrow keys to scroll through previously entered commands.
Environment/History/Connections (Top-Right Pane):
1. Environment Tab:
- Purpose: Displays the objects (like data frames, vectors, etc.) that are currently defined and loaded in the R environment.
- Features: Click on objects to view their details or use the broom icon to clear the environment.
1. History Tab:
- Purpose: Shows a history of all the commands you’ve executed during the current session.
- Features: Double-click a command to send it to the console and re-run it, or clear the history as needed.
1. Connections Tab:
- Purpose: Allows you to manage connections to external databases.
- Features: Use the interface to connect to a new database or manage existing connections.
Files/Plots/Packages/Help/Viewer (Bottom-Right Pane):
1. Files Tab:
- Purpose: Shows the file structure of your working directory.
- Features: You can browse files, open them, or change the working directory.
1. Plots Tab:
- Purpose: Displays the plots generated by your R code.
- Features: You can export or clear plots, and navigate through plot history.
1. Packages Tab:
- Purpose: Lists all the installed R packages.
- Features: Install new packages, load them into your session, or update existing ones.
1. Help Tab:
- Purpose: Provides access to R documentation and help files.
- Features: Use the search bar to find help topics, or browse documentation.
1. Viewer Tab:
- Purpose: Displays HTML content, such as web pages or R Markdown reports.
- Features: Browse the content rendered within RStudio, like a mini-browser.

Keyboard Shortcuts for Easier Navigation

Keyboard shortcuts are not required, but they are an efficient way of working with R. Here are a few helpful shortcuts commonly used in RStudio:

Switch between panes: Ctrl + 1 to Ctrl + 4 (Windows) / Cmd + 1 to Cmd + 4 (Mac).
Run code: Ctrl + Enter (Windows) / Cmd + Enter (Mac).
Clear console: Ctrl + L (Windows) / Cmd + L (Mac).
Close all tabs in Source Editor: Ctrl + Shift + W (Windows) / Cmd + Shift + W (Mac).
Assignment arrow <- (for defining objects/variables): Alt + - (Windows) / Option + - (Mac).
Move cursor to beginning of line: Home (Windows) / Cmd + Left (Mac).
Move cursor to end of line: End (Windows) / Cmd + Right (Mac).
Highlight lines of code: Shift + Down Arrow (Windows) / Shift + Option + Down Arrow (Mac).

R as a Calculator

To understand R and what it can do, let’s start by thinking of R as a calculator. By default, R evaluates expressions following the order of operations: PEMDAS (Parentheses-Exponents-Multiplication-Division-Addition-Subtraction) or GEMS (Groupings-Exponents-Multiplication/Division-Subtraction/Addition). Multiplication and division have equal priority and are evaluated left to right, and the same is true of addition and subtraction. Later in this section, we will practice using parentheses to group expressions so they are evaluated in our desired order.
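A minimal sketch of the order of operations in action, using arbitrary numbers chosen only for illustration. Each comment spells out how R groups the expression:

```r
2 + 3 * 4     # multiplication first: 2 + 12 = 14
(2 + 3) * 4   # parentheses first: 5 * 4 = 20
8 / 2 * 4     # same priority, left to right: (8 / 2) * 4 = 16
10 - 4 - 3    # same priority, left to right: (10 - 4) - 3 = 3
```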

Basic Math Operations

Addition: We use the + to add values together.
```
2 + 2
```
```
## [1] 4
```
Subtraction: We use the - to subtract or find the difference between values.
```
6 - 7
```
```
## [1] -1
```
Multiplication: We use the * (asterisk) to multiply values.
```
4 * 3
```
```
## [1] 12
```
Division: We use the / (forward slash) to divide values.
```
2/3
```
```
## [1] 0.6666667
```
Logarithms: R uses log( ) to evaluate logarithms. By default, log( ) uses base e (the natural log) rather than base 10 (the common log).

Review:
- General Form: \(\log_{a}(b) = x \Longrightarrow b = a^{x}\)
- Common Log: \(\log_{10}(10) = x \Longrightarrow 10 = 10^{x}\)
- Natural Log: \(\ln(10) = x \Longrightarrow 10 = e^{x}\)
```
log(10) # natural log (base e), so this returns ln(10), not 1
```
```
## [1] 2.302585
```
Although R uses base e by default, we can set a different base for the log function.
```
log(10, base = 10) # set the base by name with "base = 10"; the result is now 1
```
```
## [1] 1
```
```
log(10, 10) # shorter: the second argument is the base, so this is also 1
```
```
## [1] 1
```
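Base R also provides shortcut functions for the most common bases, log10( ) for base 10 and log2( ) for base 2, and exp( ) (covered below) undoes the natural log. A few quick checks, with inputs chosen only for illustration:

```r
log10(1000)           # common log (base 10): 3
log2(8)               # base-2 log: 3
log(1000, base = 10)  # prints 3, like log10(1000)
exp(log(10))          # exp() undoes the natural log: prints 10
```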

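Square Root: R uses sqrt( ) to evaluate square roots only. (For other roots, see Bigger Roots/Fractional Exponents below.)
```
sqrt(16)
```
```
## [1] 4
```
```
sqrt(4)
```
```
## [1] 2
```
```
sqrt(6)
```
```
## [1] 2.44949
```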

Exponents: R uses ^ (caret symbol) to evaluate exponents.
```
2^2
```
```
## [1] 4
```
```
7^3
```
```
## [1] 343
```
```
2^37
```
```
## [1] 1.37439e+11
```
By default, R prints numbers with up to 7 significant digits, and it switches to scientific notation when writing the number out in full would take more characters. Only the display is rounded; R stores the full value.

\(2^{37}\) is displayed as \(1.37439e+11\), which means \(1.37439 \times 10^{11}\), or about \(137{,}439{,}000{,}000\). The exact value is \(137{,}438{,}953{,}472\); R rounds it only for display.

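Because the rounding happens only when R displays the number, we can ask for more digits. The sketch below uses print( ) with its digits argument and format( ) with scientific = FALSE, both base R functions; the comparison on the last line confirms that the stored value is exact:

```r
2^37                               # default display: 1.37439e+11
print(2^37, digits = 12)           # more significant digits: 137438953472
format(2^37, scientific = FALSE)   # the full number as text: "137438953472"
2^37 == 137438953472               # TRUE: the stored value is exact
```
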
Bigger Roots/Fractional Exponents: Sometimes we want to find higher-order roots of numbers, such as

\(\sqrt[3]{10}\) or \(\sqrt[5]{10^{2}}\)

We can compute these roots by first rewriting them as fractional exponents.

Review:

General form: \(\sqrt[a]{b^{c}} = b^{\frac{c}{a}}\)

Therefore,

\(\sqrt[3]{10} = 10^{1/3}\) and \(\sqrt[5]{10^{2}} = 10^{2/5}\). Since we know how to compute exponents in R, we can apply the same technique to fractional exponents. Be sure to wrap the fractional exponent in parentheses so that R does the division before raising the number to that power.
```
10^(1/3)
```
```
## [1] 2.154435
```
```
10^(2/5)
```
```
## [1] 2.511886
```
```
8^(1/3)
```
```
## [1] 2
```
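The parentheses around a fractional exponent matter because ^ is evaluated before /. The sketch below (inputs chosen for illustration) compares the two forms and also shows that an exponent of 1/2 gives the same result as sqrt( ):

```r
10^(1/3)   # cube root of 10: about 2.154435
10^1/3     # no parentheses: (10^1) / 3, about 3.333333
16^(1/2)   # an exponent of 1/2 ...
sqrt(16)   # ... matches sqrt(): both give 4
```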
e Raised to a Power: To evaluate something like \(e^{3}\), we do not write e^3. Instead, we write e raised to any power x as exp(x).

Therefore, \(e^3\) in R is:
```
exp(3)
```
```
## [1] 20.08554
```
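Typing e^3 in a fresh R session produces an "object not found" error, because base R does not define an object named e. Since exp( ) and log( ) are inverses, taking the natural log of exp(3) returns 3, which is a handy check. A short sketch:

```r
exp(3)        # e^3: about 20.08554
log(exp(3))   # the natural log undoes exp(): 3
exp(1)^3      # the constant e raised to the 3rd power: also about 20.08554
```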
Absolute Value: Absolute values can be expressed as abs( ).
```
abs(1)
```
```
## [1] 1
```
```
abs(-1)
```
```
## [1] 1
```

Math Constants

Pi (3.14…): Pi is expressed as pi in R.
```
3*pi
```
```
## [1] 9.424778
```
e (2.71…): We can express the constant e as exp(1).
```
exp(1) / 4
```
```
## [1] 0.6795705
```
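R stores pi and exp(1) with much more precision than the 7 significant digits it prints by default. print( ) with the digits argument reveals more of them, as in this short sketch:

```r
print(pi, digits = 10)       # 3.141592654
print(exp(1), digits = 10)   # 2.718281828
```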

Calculating Complex Expressions

In addition to basic math operations, we can calculate more complex expressions in R. As mathematical expressions get more complicated, it is a good idea to use parentheses to group and let R know what should be done first in terms of operations.

Let’s calculate the following in R:

\(\dfrac{(2 + 3)}{5}\)
```
(2+3)/5
```
```
## [1] 1
```
```
2+3/5 
```
```
## [1] 2.6
```

Notice: Although both expressions contain the same numbers and operators, the parentheses tell R what to do first. In the 1st expression, R computes 2+3 first and then divides by 5, because 2+3 is enclosed in parentheses. In the 2nd expression, R follows the default order of operations, so it computes 3/5 first and then adds 2 to that result.

\(\dfrac{5}{9} \times (40 - 32)\)
```
(5/9) * (40-32)
```
```
## [1] 4.444444
```

The preceding expression converts a temperature from degrees-Fahrenheit to degrees-Celsius: 40 is the given temperature in degrees-Fahrenheit, and R outputs the temperature in degrees-Celsius. Later we will introduce objects (variables), including vectors, which let us compute the same expression for new input values without retyping it each time.

\(|5-\sqrt{2} \left(\sqrt[3]{5^{5}} - 2\right)|\)
```
abs(5-(sqrt(2)*(5^(5/3)-2)))
```
```
## [1] 12.8475
```

\(\dfrac{1}{\sqrt{2 \pi (3.1)^{2}}} e^{-\dfrac{(12-10.7)^{2}}{2(3.1)}}\)

```
(1/(sqrt(2 * pi * 3.1^2))) * exp(-((12-10.7)^2)/(2 * 3.1))
```
```
## [1] 0.09798692
```

When you evaluate larger expressions with many groupings (parentheses), the console may show a “+” prompt instead of a result. The “+” means R thinks the command is incomplete, most often because a parenthesis is missing or unmatched. Be sure that every opening parenthesis has a matching closing one. You can press Esc to cancel the incomplete command. If the issue still persists, you may restart R by going to “Session” > “Restart R,” then run your code again.

If you are having a difficult time computing a complex expression like the one above, a helpful tip is to break it into simpler expressions and then combine them at the end. That way, we can check that each smaller piece runs correctly before running the bigger, complex expression.

```
(1/(sqrt(2 * pi * 3.1^2))) # 1st part of the equation
```
```
## [1] 0.1286911
```
```
exp(-((12-10.7)^2)/(2 * 3.1)) # 2nd part of the equation
```
```
## [1] 0.761412
```
```
(1/(sqrt(2 * pi * 3.1^2))) * exp(-((12-10.7)^2)/(2 * 3.1)) # 1st part multiplied by 2nd part
```
```
## [1] 0.09798692
```

Saving Your R Script and Code

When analyzing your data, it’s essential to keep a record of all commands and notes so you can retrace your steps later. In RStudio, create a script by selecting “File” > “New File” > “R Script.” This opens a new script window where you can type or paste commands (without the “>” console prompt). Save the script for future use by clicking the “floppy-disk” icon in the toolbar, or press Ctrl + S (Windows) / Cmd + S (Mac). It’s best to type and run commands from the script window, so that you keep a record of your session and can replicate it easily.

Comments

In scripts, it can be very useful to save a bit of text that R will not evaluate. You can leave a note to yourself (or a colleague) about what the next line is supposed to do, what its strengths and limitations are, or anything else you want to remember later. To leave a note, we use “comments”: text that starts with the hash symbol #. R ignores everything on a line after a #.

```
21 + 7 # This is a comment. R runs 21 + 7 and ignores everything after the #.
```
```
## [1] 28
```
```
# A line that starts with # is ignored entirely, so running it has no effect.
```

Defining Objects

Scalar Object - A Single Value Object

In R, we can store information of various sorts by assigning it to objects. For example, if we want to create an object called x and give it a value of 4, we would write
```
x <- 4
```

The middle part of this command, a less-than sign and a hyphen typed together to look like a left-pointing arrow, tells R to assign the value on the right to the object on the left. We can also type this symbol with the keyboard shortcut Alt + - (Windows) / Option + - (Mac). After running the command above, whenever we use x in a command, R replaces it with its value, 4. For example, if we add 3 to x, we would expect to get 7.

```
x + 3
```
```
## [1] 7
```

We can always reassign a new value to an object. If we now tell R that x is equal to 32:
```
x <- 32
```
then x will update and take its new value. Typing x on its own prints its current value:
```
x
```
```
## [1] 32
```
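Objects also make the earlier temperature conversion reusable. The sketch below uses hypothetical object names (fahrenheit and celsius) and shows one thing to watch for: an object computed from another object does not update automatically when that other object is reassigned, so the expression must be run again.

```r
fahrenheit <- 40                       # hypothetical input temperature
celsius <- (5/9) * (fahrenheit - 32)
celsius                                # about 4.444444

fahrenheit <- 212                      # reassign the input ...
celsius                                # ... celsius is unchanged: still about 4.444444

celsius <- (5/9) * (fahrenheit - 32)   # rerun the expression
celsius                                # recomputed: 100
```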

Vector Object - A Multiple Value Object

Besides scalars (single-value objects), we can create vectors: single objects that hold multiple values instead of just one. For example, suppose we want to define the object y and give it the values 6, 7, 3, 4, 2. We define it with the left-pointing arrow, as we would a single value, but wrap the set of values in c( ) and separate each value with a comma.
```
y <- c(6, 7, 3, 4, 2)
```
This is very useful when we enter data manually into a vector in order to put that vector in a data frame, which we will talk about more in the next lab. Notice that in the Environment pane, we now have an object y with num [1:5] 6 7 3 4 2 assigned to it. num is the type of the object; it is an abbreviation for numeric, the data type that represents numbers. [1:5] indicates that the vector contains 5 elements indexed 1 through 5, with the first element (the value 6) at index 1 and the last element (the value 2) at index 5.
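The same summary shown in the Environment pane can be printed in the console with str( ), and individual elements can be pulled out by their index using square brackets. A short sketch using the same values as y above:

```r
y <- c(6, 7, 3, 4, 2)
str(y)      # same summary as the Environment pane: num [1:5] 6 7 3 4 2
length(y)   # number of elements: 5
y[1]        # first element: 6
y[5]        # fifth (last) element: 2
```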

If we use a vector object in an expression, R does not output a single value. Instead, it outputs one value for each element of the vector, because the expression is applied to every value in the vector.

Let’s use the example with temperature conversion from degrees-Fahrenheit to degrees-Celsius:

\(C = \dfrac{5}{9} \times (F - 32)\)

Let’s first define a vector with a set of temperature values in degrees-Fahrenheit:
```
temp.f <- c(67, 43, 78, 90, 81)
```
Next, we write the degrees-Fahrenheit to degrees-Celsius conversion, replacing F with our object name:
```
(5/9) * (temp.f - 32)
```
```
## [1] 19.444444  6.111111 25.555556 32.222222 27.222222
```

Notice that R outputs 5 values. This is because R takes each value in the temp.f vector and plugs it into the conversion expression to get the corresponding temperature in degrees-Celsius. For example:

\(\dfrac{5}{9} (\textbf{67} - 32) = 19.444444\)

\(\dfrac{5}{9} (\textbf{43} - 32) = 6.111111\)

and so on.
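It is often convenient to store the converted values in a new object and round them for reporting. In this sketch, temp.c is a hypothetical name for the Celsius vector, and round( ) keeps 1 decimal place:

```r
temp.f <- c(67, 43, 78, 90, 81)
temp.c <- (5/9) * (temp.f - 32)   # store the 5 converted values
round(temp.c, 1)                  # 19.4  6.1 25.6 32.2 27.2
```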

Names/Naming Objects

When naming your objects, whether they are scalars or vectors, choose names that clearly describe the data you are using for analysis. R requires two main rules for object names: 1) object names DO NOT contain spaces, and 2) object names DO NOT start with a number. Also note that R is case sensitive when naming and defining objects, meaning weight.lbs is not the same as Weight.lbs.
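A quick way to confirm case sensitivity is the base R function exists( ), which reports whether an object with a given name is defined. The value 150 below is made up for illustration, and the last line assumes no object named Weight.lbs has been created in your session:

```r
weight.lbs <- 150      # hypothetical value
exists("weight.lbs")   # TRUE
exists("Weight.lbs")   # FALSE: a capital W makes it a different name
```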

Here are some conventional, acceptable ways of naming objects. In the examples below, we name a vector of caterpillar lengths in centimeters.

Mixed Letter Cases: Capitalize the first letter of every word and keep every other letter lowercase
- Example: CaterpillarLengthCM
```
CaterpillarLengthCM <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207)
```
Underscores in Between Words: Keep all letters lowercase and put an underscore between words
- Example: caterpillar_length_cm
```
caterpillar_length_cm <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207)
```
Periods in Between Words: Keep all letters lowercase and put a period between words
- Example: caterpillar.length.cm
```
caterpillar.length.cm <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207)
```

If we include spaces or put a number first for our object names, we get an error. For example:

```
21CaterpillarLengthCM <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207) # starts with a number
Caterpillar Length CM <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207) # contains spaces
```

If you try running either line yourself in R, it will output an error message.
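If you are unsure whether a name is valid, the base R function make.names( ) converts a string into a syntactically valid name: it adds an X in front of a name that starts with a number and replaces invalid characters such as spaces with periods. A short sketch using the names above:

```r
make.names("21CaterpillarLengthCM")   # "X21CaterpillarLengthCM"
make.names("Caterpillar Length CM")   # "Caterpillar.Length.CM"
make.names("caterpillar_length_cm")   # already valid, so unchanged
```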