To get a feel for what R can do, let’s start by using it as a calculator. By default, R evaluates expressions following the standard order of operations: PEMDAS (Parentheses, Exponents, Multiplication, Division, Addition, Subtraction) or GEMS (Groupings, Exponents, Multiplication/Division, Subtraction/Addition). Multiplication and division share the same priority and are evaluated left to right, as are addition and subtraction. Later in this section, we will practice using parentheses to group expressions so they are evaluated in the order we want.
Addition: We use the + to add values.
2 + 2
## [1] 4
Subtraction: We use the - to subtract or find the difference between values.
6 - 7
## [1] -1
Multiplication: We use the * (asterisk) to multiply values.
4 * 3
## [1] 12
Division: We use the / (forward slash) to divide values.
2/3
## [1] 0.6666667
Logarithms: R uses log( ) to evaluate logarithms. By default, log( ) uses base e (the natural log) rather than base 10 (the common log).
Review:
General Form: \(\log_{a}(b) = x \Longrightarrow b = a^{x}\)
Common Log: \(\log_{10}(10) = x \Longrightarrow 10 = 10^{x}\)
Natural Log: \(\ln(10) = x \Longrightarrow 10 = e^{x}\)
log(10)
## [1] 2.302585
#This will not output 1. Because log( ) uses base e, the result is ln(10), which is about 2.302585.
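As a quick check on the review above, the following minimal sketch confirms that log( ) and exp( ) undo each other, which is what \(\ln(b) = x \Longrightarrow b = e^{x}\) states. It uses only base R; exp( ) is covered later in this section, and all.equal( ) is used for the comparison because floating-point results can differ in their last digits.

```r
# log() is the natural log, so exp() reverses it
log(10)                        # about 2.302585
exp(log(10))                   # recovers 10, up to floating-point rounding

# Compare with a tolerance rather than exact equality
all.equal(exp(log(10)), 10)    # TRUE
all.equal(log(exp(1)), 1)      # TRUE
```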
Although R uses base e by default, we can specify a different base for the log function.
log(10, base = 10)
## [1] 1
#Here I set the base by adding a comma followed by "base = 10". We expect this to output 1 because the base is now 10 instead of e.
log(10, 10)
## [1] 1
#Even better, we can set the base by giving it as the second argument after a comma, without writing "base =".
#Both operations output 1.
Square Root: R uses sqrt( ) to evaluate square roots.
sqrt(16)
## [1] 4
sqrt(4)
## [1] 2
sqrt(6)
## [1] 2.44949
Exponents: R uses ^ (caret symbol) to evaluate exponents.
2^2
## [1] 4
7^3
## [1] 343
2^37
## [1] 1.37439e+11
By default, R prints up to 7 significant digits and switches to scientific notation when that form is shorter than writing the number out in full.
\(2^{37}\) is printed as \(1.37439e+11\), which means \(1.37439 \times 10^{11}\), or approximately \(137439000000\) (the exact value is \(137438953472\)).
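If you want to see every digit of a large number, one option is sketched below. It relies on the base R functions format( ) and print( ); note that format( ) returns a character string rather than a number, so it is meant for display only.

```r
2^37                               # printed in scientific notation by default
format(2^37, scientific = FALSE)   # the character string "137438953472"
print(2^37, digits = 12)           # print with up to 12 significant digits
```

The stored value is the same in every case; only the way R displays it changes.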
Bigger Roots/Fractional Exponents: Sometimes we want to find larger roots of numbers such as
\(\sqrt[3]{10}\) or \(\sqrt[5]{10^{2}}\)
We are able to compute larger roots if we first change these roots into fractional exponents.
Review:
General form: \(\sqrt[a]{b^{c}} = b^{\frac{c}{a}}\)
Therefore,
\(\sqrt[3]{10} = 10^{1/3}\) and \(\sqrt[5]{10^{2}} = 10^{2/5}\). Since we know how to compute exponents in R, we can apply the same technique to fractional exponents. Be sure to enclose the fractional exponent in parentheses; otherwise, R applies the exponent before the division.
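To see why the parentheses matter, here is a small illustration comparing the two ways of typing the cube root of 10. Without parentheses, R evaluates the exponent first and then divides.

```r
10^(1/3)   # cube root of 10, about 2.154435
10^1/3     # read as (10^1)/3, about 3.333333, which is not a cube root
```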
10^(1/3)
## [1] 2.154435
10^(2/5)
## [1] 2.511886
8^(1/3)
## [1] 2
e Raised to a Power: To evaluate something like \(e^{3}\), we do not write e^3, because R has no built-in object named e. Instead, we evaluate e raised to any power x as exp(x).
Therefore, \(e^3\) in R is:
exp(3)
## [1] 20.08554
Absolute Value: Absolute values can be expressed as abs( ).
abs(1)
## [1] 1
abs(-1)
## [1] 1
Pi (3.14…): Pi is expressed as pi in R.
3*pi
## [1] 9.424778
e (2.71…): We can express the constant e as exp(1).
exp(1) / 4
## [1] 0.6795705
In addition to basic math operations, we can calculate more complex expressions in R. As mathematical expressions get more complicated, it is a good idea to use parentheses to group terms and tell R which operations to perform first.
Let’s calculate the following in R:
\(\dfrac{(2 + 3)}{5}\)
(2+3)/5
## [1] 1
2+3/5
## [1] 2.6
Notice: Although the two expressions use the same numbers and operators, the parentheses tell R what to do first. In the first expression, R computes 2+3 and then divides by 5 because we enclosed 2+3 in parentheses. In the second expression, R follows the default order of operations: it computes 3/5 first and then adds 2 to that result.
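A few more cases of the same idea are sketched below. The first pair is a common surprise: R evaluates the exponent before the leading minus sign, so parentheses are needed to square a negative number.

```r
# The exponent is evaluated before the leading minus sign
-2^2          # -4, read as -(2^2)
(-2)^2        #  4

# Multiplication is evaluated before addition
2 + 3 * 4     # 14
(2 + 3) * 4   # 20
```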
\(\dfrac{5}{9} \times (40 - 32)\)
(5/9) * (40-32)
## [1] 4.444444
The preceding expression converts a temperature from degrees-Fahrenheit to degrees-Celsius: 40 is the given temperature in degrees-Fahrenheit, and R outputs the equivalent temperature in degrees-Celsius. Later we will define objects (scalars and vectors) so that the same expression can be reused without retyping it for each new input value.
\(|5-\sqrt{2} \left(\sqrt[3]{5^{5}} - 2\right)|\)
abs(5-(sqrt(2)*(5^(5/3)-2)))
## [1] 12.8475
\(\dfrac{1}{\sqrt{2 \pi (3.1)^{2}}} e^{-\dfrac{(12-10.7)^{2}}{2(3.1)}}\)
(1/(sqrt(2 * pi * 3.1^2))) * exp(-((12-10.7)^2)/(2 * 3.1))
## [1] 0.09798692
When evaluating larger expressions with groupings (parentheses), you may see a “+” in the console when you try to run the code. The “+” most often means that R is waiting for the rest of the expression, usually because a parenthesis is mismatched or missing. Check that every opening parenthesis has a matching closing parenthesis. You can press Esc to cancel the incomplete command. If the issue persists, you may restart R by going to “Session” > “Restart R” and then run your code again.
If you are having a difficult time computing more complex expressions like the one above, a helpful tip is to break the complex expression into simpler pieces and then combine them at the end. That way, we can check that each smaller piece runs correctly before running the full expression.
(1/(sqrt(2 * pi * 3.1^2))) #1st Part of equation
## [1] 0.1286911
exp(-((12-10.7)^2)/(2 * 3.1)) #2nd part of equation
## [1] 0.761412
(1/(sqrt(2 * pi * 3.1^2))) * exp(-((12-10.7)^2)/(2 * 3.1)) #1st part multiplied by 2nd part
## [1] 0.09798692
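One way to avoid retyping each piece is to store the pieces under names and then combine the names. The sketch below uses the assignment arrow <- that is introduced later in this section; part1 and part2 are hypothetical names chosen for illustration.

```r
# Store each piece of the expression under its own (hypothetical) name
part1 <- 1 / sqrt(2 * pi * 3.1^2)
part2 <- exp(-((12 - 10.7)^2) / (2 * 3.1))

# Combine the stored pieces; this matches the full expression above
part1 * part2
```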
When analyzing your data, it’s essential to keep a record of all commands and notes to retrace your steps later. In RStudio, create a script by selecting “File” > “New File” > “R Script.” This opens a new script window where you can type or paste commands (excluding the “>” prompt). Save the script to reuse it in the future by clicking on the “floppy-disk” logo at the top menu bar. It’s best to type and run commands from the script window to maintain a record of your session for easier replication.
In R, we can store information of various sorts by assigning it to objects. For example, if we want to create an object called x and give it a value of 4, we would write
x <- 4
The middle part of this, a less-than sign and a hyphen typed together to look like a left-pointing arrow, tells R to assign the value on the right to the object on the left. In RStudio, we can also type this symbol with a keyboard shortcut: Alt + - (Windows) or Option + - (Mac). After running the command above, whenever we use x in a command, it is replaced by its value, 4. For example, if we add 3 to x, we would expect to get 7.
x + 3
## [1] 7
We can always reassign a new value to an object. If we now tell R that x is equal to 32:
x <- 32
then x will update and take its new value.
x
## [1] 32
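One detail worth checking for yourself: an object stores a value, not the expression that produced it. In the sketch below, z is a hypothetical object name; it keeps the value computed at the time of assignment even after x changes.

```r
x <- 4
z <- x + 3   # z stores the value 7, not the expression "x + 3"
x <- 32      # reassigning x does not change z
z            # still 7
x + 3        # 35, because x is now 32
```

To update z, you would run the line z <- x + 3 again after changing x.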
Just as we create a scalar (a single-value object), we can create what we call a vector, in which a single object holds multiple values as opposed to a single value. For example, suppose we want to define the object y and give it the values 6, 7, 3, 4, 2. We can achieve this by defining the object with our left-pointing arrow, as we would for a single value, but wrapping our set of values in c( ) and separating each value with a comma.
y <- c(6, 7, 3, 4, 2)
This is very useful when we enter data manually into a vector that will later go into a data frame, which we will talk about more in the next lab. Notice that in the environment, we now have an object y with num [1:5] 6 7 3 4 2 assigned to it. num indicates the type of object; it is an abbreviation for numeric, a data type that represents numbers. [1:5] indicates that the vector contains 5 elements, with the first element (the value 6) at index 1 and the last element (the value 2) at index 5.
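The same information shown in the environment pane can be checked from the console. The following short sketch uses the base R functions str( ) and length( ), plus square brackets to pull out individual elements by their index.

```r
y <- c(6, 7, 3, 4, 2)

str(y)      # num [1:5] 6 7 3 4 2
length(y)   # 5 elements
y[1]        # first element: 6
y[5]        # fifth element: 2
```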
If we define an object as a vector with multiple values and use that object in an expression, R does not output a single value. Instead, it outputs as many values as there are elements in the vector, because the expression is applied to each element of the vector.
Let’s use the example with temperature conversion from degrees-Fahrenheit to degrees-Celsius:
\(C = \dfrac{5}{9} \times (F - 32)\)
Let’s first define a vector with a set of temperature values in degrees-Fahrenheit:
temp.f <- c(67, 43, 78, 90, 81)
We then write the conversion expression from degrees-Fahrenheit to degrees-Celsius, replacing the degrees-Fahrenheit value with our object name:
(5/9) * (temp.f - 32)
## [1] 19.444444 6.111111 25.555556 32.222222 27.222222
Notice that 5 values are output. This is because R takes each value in the temp.f vector and plugs it into the conversion expression to get the corresponding temperature in degrees-Celsius. For example:
\(\dfrac{5}{9} (\textbf{67} - 32) = 19.444444\)
\(\dfrac{5}{9} (\textbf{43} - 32) = 6.111111\)
and so on.
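The converted values can also be stored in their own vector for later use. In this sketch, temp.c is a hypothetical name, and round( ) is a base R function used here only to shorten the display to one decimal place.

```r
temp.f <- c(67, 43, 78, 90, 81)

# Store the converted temperatures (temp.c is a hypothetical name)
temp.c <- (5/9) * (temp.f - 32)

round(temp.c, 1)   # 19.4  6.1 25.6 32.2 27.2
length(temp.c)     # 5, one converted value per input value
```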
When naming your objects, whether they are scalars or vectors, choose names that are appropriate and that clearly describe the data you are analyzing. There are two main rules that object names must follow for R to read them: 1) object names DO NOT contain spaces, and 2) object names DO NOT start with a number. Also note that R is case sensitive when naming and referring to objects, meaning weight.lbs is not the same as Weight.lbs.
Here is a list of conventional and acceptable ways of naming objects. For the examples below, we will name a vector of caterpillar lengths in centimeters.
Mixed Letter Cases: Capitalize the first letter of every word and leave the other letters lowercase
CaterpillarLengthCM <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207)
Underscores in Between Words: Leave all letters lowercase and put an underscore between every word
caterpillar_length_cm <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207)
Periods in Between Words: Leave all letters lowercase and put a period between every word
caterpillar.length.cm <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207)
If we include spaces or put a number first for our object names, we get an error. For example:
21CaterpillarLengthCM <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207)
Caterpillar Length CM <- c(3.288878, 9.788281, 4.408389, 6.508248, 7.137628, 2.025207)
If you try running it yourself in R, it will output an error message.
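If you are unsure whether a name is acceptable, one way to check is the base R function make.names( ), which returns a corrected version of each name it is given. A name that comes back unchanged already follows the rules. The vector name candidates below is hypothetical.

```r
# Names to check (candidates is a hypothetical object name)
candidates <- c("21CaterpillarLengthCM",
                "Caterpillar Length CM",
                "caterpillar_length_cm")

make.names(candidates)
# "X21CaterpillarLengthCM" "Caterpillar.Length.CM" "caterpillar_length_cm"

# TRUE marks names that were already valid
make.names(candidates) == candidates   # FALSE FALSE TRUE
```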
Comments
In scripts, it can be very useful to save a bit of text that is not evaluated by R. You can leave a note to yourself (or a colleague) about what the next line is supposed to do, what its strengths and limitations are, or anything else you want to remember later. To leave a note, we use “comments”: text that starts with the hash symbol #. Anything on a line after a # is ignored by R.
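As a brief illustration, the sketch below reuses the temperature conversion from earlier in this section and shows a comment on its own line, a comment after code on the same line, and a line of code that is skipped because it starts with #.

```r
# Convert 40 degrees-Fahrenheit to degrees-Celsius
(5/9) * (40 - 32)   # a comment can also follow code on the same line

# 2 + 2   (not evaluated: the whole line is a comment)
```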