R Tutorial 1: The Basics

MKT 410: Marketing Analytics

Author

Levin Zhu

Introduction

Data analytics (and marketing analytics) is a fun and exciting area. The above figure outlines the main steps of a typical data or marketing analytics project workflow.

In these five tutorials, we will cover the main steps for working with data that we need before we can begin communicating data insights. This tutorial covers the basics of R. Tutorial 2 discusses data transformation, a crucial step in data analysis that lets you change and manipulate the data to make better sense of them. Tutorials 3 and 4 cover data importing and data tidying, the precursors to data transformation. These concepts are covered after transformation since they are, unfortunately, “boring”, and it is better to start with more interesting concepts such as data transformation. Finally, Tutorial 5 takes our data to visualizations, which allow us to understand vast datasets through simple, easy-to-digest figures. Note that we will not cover the modeling and communication portions of the above workflow.

Acknowledgements

These tutorials were sourced and modified mainly from two books: R for Data Science (2e) by Hadley Wickham, Mine Cetinkaya-Rundel, and Garrett Grolemund, and Hands-On Programming with R by Garrett Grolemund. Both are excellent, more in-depth resources that I encourage you to read for additional information on data analytics and programming in R.

RStudio User Interface

The RStudio interface is simple. You type R code into the bottom line of the RStudio console pane and then press Enter to run it. The code you type is called a command, and the line you type it into is called the command line.

For example, if you type 1 + 1 and hit Enter, RStudio will display:

> 1 + 1
[1] 2

The [1] to the left of the result tells you that this line begins with the first value of your result. If R needs multiple lines to show all the results, it starts each line with another number in square brackets [] that gives the position of the first value shown on that line.

> 1:100
 [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
[19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
[37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
[55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
[73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
[91]  91  92  93  94  95  96  97  98  99 100

In the example above, the command 1:100 creates a sequence of integers from 1 to 100.

If you type an incomplete command and press Enter, R will display a + prompt, which means R is waiting for you to type the rest of your command. Either finish the command or press Escape to start over.

> 5 -
+
+ 1
[1] 4

If you type a command that R doesn’t recognize, it will return an error message.

> 3 % 5
Error: unexpected input in "3 % 5"
> 
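
The error above arises because a single % is not an operator in R. If the intent was the remainder of a division, R's modulo operator is %%, and its companion %/% gives integer division. A minimal sketch (the object names remainder and quotient are only for illustration):

```r
# %% returns the remainder after division
remainder <- 3 %% 5
remainder
#> [1] 3

# %/% returns the whole-number part of the division
quotient <- 3 %/% 5
quotient
#> [1] 0
```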

Basic Calculations

You can use R to do basic math calculations:

1 / 200 * 30
#> [1] 0.15
(59 + 73 + 2) / 3
#> [1] 44.66667
sin(pi / 2)
#> [1] 1

Objects

You can create new objects with the assignment operator <-:

x <- 3 * 4

Note that the value of x is not printed; it is just stored. If you want to view the value, type x in the console.

x
#> [1] 12
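
As a small shortcut, wrapping an assignment in parentheses both stores the value and prints it. The sketch below uses two illustrative object names, y and z, that are not part of the tutorial:

```r
# wrapping an assignment in parentheses stores the value and prints it
(y <- 3 * 4 / 2)
#> [1] 6

# stored objects can be reused in later calculations
z <- y + 1
z
#> [1] 7
```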

All R statements where you create objects, assignment statements, have the same form:

object_name <- value

While <- is a pain to type, you can save time with RStudio’s keyboard shortcut: Alt + - (the minus sign). Notice that RStudio automatically surrounds <- with spaces, which is a good code formatting practice.

Naming Objects

Object names must start with a letter and can only contain letters, numbers, underscores (_), and periods (.). You want your object names to be descriptive, so you’ll need to adopt a convention for multiple words. A common convention is snake_case, where you separate lowercase words with _. There are other conventions as well.

i_use_snake_case
otherPeopleUseCamelCase
some.people.use.periods
And_aFew.People_RENOUNCEconvention
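
One rough way to check whether a candidate name follows these rules is to compare it with what base R's make.names() would turn it into. The helper is_valid_name() below is a hypothetical function written for this illustration, not something built into R, and the example names are made up:

```r
# hypothetical helper: TRUE when make.names() leaves the name unchanged
is_valid_name <- function(name) {
  make.names(name) == name
}

is_valid_name("i_use_snake_case")
#> [1] TRUE
is_valid_name("some.people.use.periods")
#> [1] TRUE
is_valid_name("2nd_attempt")   # starts with a number
#> [1] FALSE
is_valid_name("total sales")   # contains a space
#> [1] FALSE
```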

Make another assignment:

this_is_a_really_long_name <- 2.5

To inspect this object, try out RStudio’s completion facility: type “this”, press TAB, add characters until you have a unique prefix, then press return.

Let’s assume you made a mistake, and that the value of this_is_a_really_long_name should be 3.5, not 2.5. You can use another keyboard shortcut to help you fix it. For example, you can press ↑ to bring back the last command you typed and edit it. Or, type “this” then press Cmd/Ctrl + ↑ to list all the commands you’ve typed that start with those letters. Use the arrow keys to navigate, then press Enter to retype the command. Change 2.5 to 3.5 and rerun.

Make yet another assignment:

r_rocks <- 2^3

Let’s try to inspect it:

r_rock
#> Error: object 'r_rock' not found
R_rocks
#> Error: object 'R_rocks' not found

This illustrates the implied contract between you and R: R will do the tedious computations for you, but in exchange, you must be completely precise in your instructions. If not, you’re likely to get an error that says the object you’re looking for was not found.

  • Typos matter; R can’t read your mind and say, “oh, they probably meant r_rocks when they typed r_rock”.

  • Case matters; similarly, R can’t read your mind and say, “oh, they probably meant r_rocks when they typed R_rocks”.
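
If you are unsure whether a name matches an object you created, base R's exists() can check for you. A minimal sketch, assuming no other objects called r_rock or R_rocks have been created in your session:

```r
r_rocks <- 2^3

# exists() reports whether an object with exactly this name can be found
exists("r_rocks")
#> [1] TRUE
exists("r_rock")    # missing the final "s"
#> [1] FALSE
exists("R_rocks")   # capital "R"
#> [1] FALSE
```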

Vectorized Calculations

You can combine multiple elements into a vector with c():

primes <- c(2, 3, 5, 7, 11, 13)

And basic arithmetic on vectors is applied to every element of the vector (element-wise execution):

primes * 2
#> [1]  4  6 10 14 22 26
primes - 1
#> [1]  1  2  4  6 10 12

When you run arithmetic on multiple vectors, R lines up the vectors and applies the same operation between each pair of corresponding elements.

die <- 1:6
die * die
#> [1]  1  4  9 16 25 36

When we run die * die, R lines up the two die vectors and then multiplies the first element of vector 1 by the first element of vector 2, then multiplies the second element of vector 1 by the second element of vector 2, and so on.

If the vectors are of unequal length, R will repeat the shorter vector until it is as long as the longer vector, and then do the math. This repetition is only temporary: the shorter vector itself keeps its original length.

die * 1:2
#> [1]  1  4  3  8  5 12
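
To see the repetition spelled out, the sketch below uses base R's rep_len() to build the repeated vector by hand and confirms that it gives the same result. The object name recycled is only for illustration:

```r
die <- 1:6

# rep_len() writes out the repetition that R performs implicitly
recycled <- rep_len(1:2, length.out = length(die))
recycled
#> [1] 1 2 1 2 1 2

# so die * 1:2 is the same as multiplying by the repeated vector
identical(die * 1:2, die * recycled)
#> [1] TRUE
```

If the longer length is not a whole multiple of the shorter one (for example, die * 1:4), R still repeats the shorter vector but also issues a warning.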

Matrix Calculations

You can also do matrix calculations within R, using %*% for inner multiplication (the inner, or dot, product) and %o% for outer multiplication (the outer product).

# inner multiplication (dot product)
die %*% die
#>      [,1]
#> [1,]   91

# outer multiplication (outer product)
die %o% die
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    2    3    4    5    6
#> [2,]    2    4    6    8   10   12
#> [3,]    3    6    9   12   15   18
#> [4,]    4    8   12   16   20   24
#> [5,]    5   10   15   20   25   30
#> [6,]    6   12   18   24   30   36
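
To connect these operators to the element-wise arithmetic from the previous section, here is a short sketch. It checks that inner multiplication equals the sum of the element-wise products, and that %o% gives the same result as base R's outer(). The object name inner is only for illustration:

```r
die <- 1:6

# inner multiplication is the sum of the element-wise products
inner <- die %*% die
sum(die * die)
#> [1] 91
as.numeric(inner) == sum(die * die)
#> [1] TRUE

# %o% is shorthand for outer(), which multiplies every pair of elements
identical(die %o% die, outer(die, die))
#> [1] TRUE
```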

Comments

R will ignore any text after # for that line. This allows you to write comments, text that is ignored by R but read by other humans.

Comments can be helpful for briefly describing what the following code does.

# create vector of primes
primes <- c(2, 3, 5, 7, 11, 13)

# multiply primes by 2
primes * 2
#> [1]  4  6 10 14 22 26

With short pieces of code like this, leaving a comment for every single line of code might not be necessary. But as the code you’re writing gets more complex, comments can save you (and your collaborators) a lot of time figuring out what was done in the code.

Functions

R has a large collection of built-in functions that are called like this:

function_name(argument1 = value1, argument2 = value2, ...)

Let’s try using seq(), which makes regular sequences of numbers, and while we’re at it, learn more helpful features of RStudio. Type se and hit TAB. A popup shows you possible completions. Specify seq() by typing more (a q) to disambiguate or by using ↑/↓ arrows to select.

When you’ve selected the function you want, press TAB again. RStudio will add matching opening (() and closing ()) parentheses for you. Type the name of the first argument, from, and set it equal to 1. Then, type the name of the second argument, to, and set it equal to 10. Finally, hit return.

seq(from = 1, to = 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10

It is not necessary to include the names of the arguments, as long as the values of each argument are entered in order:

seq(1, 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10
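
Naming arguments becomes more useful once you go beyond the first two. As a brief sketch, seq() also accepts a by argument for the step size, and named arguments can be given in any order (the object name odds is only for illustration):

```r
# named arguments can be supplied in any order
seq(to = 10, from = 1)
#>  [1]  1  2  3  4  5  6  7  8  9 10

# naming an optional argument such as `by` keeps the call easy to read
odds <- seq(from = 1, to = 10, by = 2)
odds
#> [1] 1 3 5 7 9
```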

There are a lot of functions included in “base” R. Let’s check out a few of them.

round(3.1415)
#> [1] 3
factorial(3)
#> [1] 6

We can also pass the output of one function directly into another function.

mean(1:6)
#> [1] 3.5
mean(die)
#> [1] 3.5
round(mean(die))
#> [1] 4
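
A nested call such as round(mean(die)) is evaluated from the inside out. The sketch below writes the same calculation step by step for comparison; the object names average, rounded, and nested, and the numbers in the last call, are made up for this example:

```r
die <- 1:6

# step by step, storing the intermediate result
average <- mean(die)
rounded <- round(average)

# nested: R evaluates the innermost call first
nested <- round(mean(die))

rounded == nested
#> [1] TRUE

# extra arguments still go to the function they belong to
round(mean(c(2.25, 3.5, 4.1)), digits = 1)
#> [1] 3.3
```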

Writing Your Own Functions

You can write your own functions using the following syntax:

my_function <- function(arg1, arg2) {
  # do things
}

For example, let’s make a function for evaluating the sum of multiple dice rolls. For this function, we will call the sample() function which takes as arguments 1) the values to sample from, 2) the number of samples to take, and 3) whether to sample with or without replacement.

roll <- function(die = 1:6, rolls = 2) {
  # roll dice
  results <- sample(die, size = rolls, replace = TRUE)
  # sum the results
  sum(results)
}

Notice that we’ve included values next to the arguments in our new function. These are the “default” values, which R uses for any argument you do not specify when calling the function. Default values are optional when writing a new function.

# using default values
roll()
#> [1] 9
roll()
#> [1] 11
roll()
#> [1] 10

# rolling three 12-sided dice
roll(1:12, 3)
#> [1] 19
roll(1:12, 3)
#> [1] 17
roll(1:12, 3)
#> [1] 27
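
Because sample() draws at random, your results will differ from the ones shown above. If you need repeatable results, one option is base R's set.seed(). The sketch below repeats the roll() definition so that it runs on its own; the seed value 410 is an arbitrary choice, and the object names are only for illustration:

```r
roll <- function(die = 1:6, rolls = 2) {
  results <- sample(die, size = rolls, replace = TRUE)
  sum(results)
}

# fixing the seed makes the "random" rolls repeatable
set.seed(410)
first_try <- roll()
set.seed(410)
second_try <- roll()
first_try == second_try
#> [1] TRUE

# one argument can be overridden by name, leaving the other at its default
one_roll <- roll(rolls = 1)
```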

Environment

Note that the environment tab in the upper right pane displays all of the objects that you’ve created:

Scripts

Let’s say you want to edit the function roll() again. Instead of going back to re-type everything, it is easier to save a draft of the code in a script. An R script is just a plain text file that you save R code in. Create a new script in RStudio by going to File > New File > R script in the menu bar.

After editing your script, you can save it using File > Save As and naming the script.
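
Once code lives in a script, you can run the whole file at once with base R's source(). The sketch below writes a small script to a temporary file as a stand-in for one you saved from RStudio; the path script_path is therefore an assumption made for this example, and in practice you would pass the name of your own saved script:

```r
# write a tiny script to a temporary file (stand-in for a script you saved)
script_path <- tempfile(fileext = ".R")
writeLines(
  c(
    "roll <- function(die = 1:6, rolls = 2) {",
    "  sum(sample(die, size = rolls, replace = TRUE))",
    "}"
  ),
  script_path
)

# source() runs every line of the script, so roll() becomes available
source(script_path)
exists("roll")
#> [1] TRUE
```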

Projects

The Working Directory

Every time you work in R, there is a notion called the working directory. This is where R looks for files that you ask it to load, and where it will put any files that you ask it to save. RStudio shows you the current working directory at the top of the console:

And you can print this out in R code by running getwd():

getwd()
#> [1] "/Users/levinzhu/Dropbox/UH Manoa/Teaching/2024-2025/MKT 410 Marketing Analytics/Tutorial"

Think of the working directory as the “home” to an R session.
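
In practice, this means that a relative file path is read as starting from the working directory. A minimal sketch, where data/sales.csv is a hypothetical file used only for illustration (it does not need to exist for the code to run):

```r
# a relative path is resolved against the working directory
relative_path <- file.path("data", "sales.csv")   # hypothetical file
full_path <- file.path(getwd(), relative_path)

# both paths point at the same location
basename(full_path)
#> [1] "sales.csv"
file.exists(relative_path) == file.exists(full_path)
#> [1] TRUE
```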

To ensure all your scripts and code relevant for your project are all in one place, you can use an RStudio project.

RStudio Projects

Keeping all the files associated with a given project (input data, R scripts, analytical results, and figures) together in one directory is a very common practice and RStudio has built-in support for this via projects. To make a project, click File > New Project, then follow the steps as follows:

Name the project what you like (in the above example, I named it tutorial), and store it in a directory that you can easily find in the future. You can check that the current working directory is the one that you specified as the new “home” for your project, using getwd().

Creating an R Project will generate a new directory dedicated to your project as well as an associated .Rproj file. Now, whenever you quit RStudio and later want to get back to the same project, you can just double-click the .Rproj file to resume from where you left off.

Libraries

Before we start the rest of the tutorial, we will need to install a very important library in R. Libraries (or packages) contain functions and datasets that are not part of the “base” version of R, but are nonetheless very helpful for us when analyzing data.

The tidyverse

We will load the package tidyverse, which is one of the most important libraries when working with data in R. First, let’s install the package using the install.packages() function.

install.packages("tidyverse")

Once the library is installed, we can load it using the library() function.

library(tidyverse)
#> ── Attaching core tidyverse packages ───────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.0     ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
#> ✔ purrr     1.0.2     
#> ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The message shown says that loading tidyverse loads nine packages: dplyr, forcats, ggplot2, lubridate, purrr, readr, stringr, tibble, and tidyr. These are considered the core of the tidyverse because you’ll use them in almost every analysis.

Most of what we will cover in this tutorial uses functions from the various packages that are loaded within tidyverse.
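
A package only needs to be installed once per computer, while library() must be run in every new R session. If you keep both lines in a script, one possible pattern is to install only when the package is missing. The helper ensure_installed() below is a hypothetical function written for this sketch, not part of R or the tidyverse; it is demonstrated on stats, which ships with R, so that running the example does not trigger a download:

```r
# hypothetical helper: install a package only if it is not already available
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # report (invisibly) whether the package is available now
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" comes with R, so nothing is downloaded here
already_there <- ensure_installed("stats")
already_there
#> [1] TRUE

# for this tutorial you would run: ensure_installed("tidyverse")
```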

Next: Working with Data

Now that we have a sense of how R code works, let’s start talking about what we can do in R for data analysis and visualization.

There are four main tools in data analytics:

  1. Importing data
  2. Tidying data
  3. Transforming data
  4. Visualizing data

In the next four tutorials, we’ll cover each of these tools using R. We’ll first discuss data transformation, as that is the central component of understanding your data. We will then talk about the two steps that get the data into the right “state” for transformation: importing and tidying. Lastly, we will learn how to visualize our final datasets. In this last step, we will discuss visualization both within R and with Tableau.