0. Prerequisites

Before you get started, make sure you have R and RStudio installed on your machine. If you don’t have these installed on your machine, the following are instructions for you to install this software.

0.1 Installing R

R is a powerful, open-source statistical programming language that anyone can download for free! You can simply go to CRAN’s website and install R by following these steps:

0.1.1 Under the ‘Download’ heading, select ‘CRAN’

0.1.2 Install a CRAN Mirror

Install the CRAN mirror that’s nearest to your geographic location. For example, if you live in Orange County, California, you should install the UCLA CRAN Mirror.

0.1.3 Install your Machine’s Version of R

0.1.3.1 R for Windows

Install R for the first time.

After you click this link, follow the instructions given in the installation.

0.1.3.2 R for (Mac) OS X

After you click this link, follow the instructions given in the installation.

0.1.3.3 R for Linux

Install the R version relevant to your Linux server, and follow the instructions given in the installation.

0.2 Installing RStudio

RStudio is an open-source professional software that makes R much easier to use. Download the free, open-source license version from RStudio’s website. The installation steps are very similar to those of R’s for all operating systems.

1. Importing and Exporting Excel Files in R

The open-source community has developed a few smooth and convenient ways for you to upload and download Excel files in R.

1.1 Importing Excel Files to R

Install the readxl package into RStudio by typing

install.packages("readxl")

into the Console.

After the package is finished installing, call the package by typing

library(readxl)

into the Console. You can now upload your Excel file into R (and assign it to the variable data) by typing

data <- read_excel("D:/path/file.xlsx")

into the Console.

Another way you can import an Excel file into R after calling the readxl library is to, in RStudio, go to File > Import Dataset > From Excel, and browse for your Excel file.

After you’re satisfied with how your data is being imported, select Import on the bottom right of the window, and your Excel data will be imported into R.

1.2 Exporting Excel Files from R

Install the openxlsx package into RStudio by typing

install.packages("openxlsx")

into the Console.

After the package is finished installing, call the package by typing

library(openxlsx)

into the Console. You can now download your data (assigned to the variable data here) to a custom file name by typing

write.xlsx(data, file = "D:/path/your_file_name.xlsx")

into the Console.

2. How to Transition From Excel to R

2.1 Navigating RStudio

We will be working out of RStudio, a powerful IDE (Integrated Development Environment) for R.

2.2 Data Structures and Variables in R

Try to get comfortable with the different types of data structures in R.

Excel spreadsheets are typically recognized as dataframes in R, which are composed of multiple lists.

2.2.1 Creating Your First Dataframe

Start by creating some lists and assigning them to variables using <- (not =, as this may confuse R in some function calls):

teams <- c("Angels","Dodgers","Cubs") # c() command stands for combine; use whenever creating a list of objects in R
wins <- c(70,56,64)
ca_team <- c(TRUE, TRUE, FALSE)

baseball <- data.frame(teams,wins,ca_team) # combine lists into dataframe

baseball # view the dataframe

Now you have a dataframe with 3 columns, or lists (string, numeric, and boolean).

2.3 Working with Dataframes

2.3.1 Looking Up Number of Observations

Using the iris dataset that comes with R, and the nrow() function, you can find the number of observations or rows:

# View(iris) # use this command to view the iris dataset
nrow(iris) # displays the number of rows in the iris dataset

## [1] 150

Similarly, you can also find the number of columns:

ncol(iris) # displays the number of columns in the iris dataset

## [1] 5

To find the number of rows and columns at the same time, use dim():

dim(iris)

## [1] 150   5

2.3.2 Focusing on the First/Last X Rows

head(iris, 10) # displays the first 10 rows of the iris dataset

tail(iris, 10) # displays the last 10 rows of the iris dataset

2.3.3 Looking at the Structure of a Dataframe

str(iris) # investigate the structure of the iris dataframe

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

2.3.4 Focusing on Certain Columns, Rows, or Cells

Sometimes you may want to look at a single cell, column, or row. To do so, add brackets to the dataframe in the console, like:

iris[1,3] # look at the value in the 1st row, 3rd column

iris[1,] # look at all of the values in the 1st row; don't forget the comma!

iris[,3] # look at all of the values in the 3rd column; iris[3] also works

iris[1:3,] # look at all of the values in rows 1 through 3; ':' can also be applied to selecting columns

A helpful phrase to remember is “Data RoCks!” As in, Dataframe[Rownumber,Columnnumber].

Most of the time, it’s difficult to reference columns by number - especially when there’s so many columns. Here are alternative ways to select one or multiple columns:

iris$Sepal.Length # selecting the one Sepal.Length column from the iris dataset

iris$Sepal.Length[3] # selecting the 3rd value in the Sepal.Length column, or the 3rd row value in the Sepal.Length column

iris[c("Sepal.Length","Sepal.Width")] # selecting multiple columns (Sepal.Length, Sepal.Width) from the iris dataset

2.3.5 Changing the Format of a Column

Say you imported a dataset from Excel with $ or % signs appended to some numeric values. If you didn’t specify these values to be numeric when you imported your dataset, R will likely interpret those values to be strings instead of currency or percentages. This means that R will not be able to perform calculations with these values.

However, in the R console, you can perform the following short commands to remove $ or % signs and convert the string values to numeric values that can then be used in calculations:

# delete the dollar signs in a currency column in the dataset
dataset$currency <- gsub("\\$", "", dataset$currency)
# change the format of the currency column to numeric
dataset$currency <- as.numeric(dataset$currency)

# similarly, with percentages, delete the percent signs in a percentages column in the dataset
dataset$percentages <- gsub("\\%", "", dataset$percentages)
# change the format of the percentages column to numeric
dataset$percentages <- as.numeric(dataset$percentages)

2.3.6 Sorting a Dataframe Based on One Column

Back to the iris dataset:

iris_sort_lg <- iris[order(iris$Sepal.Length),] # order the iris dataset from least to greatest value of Sepal.Length
iris_sort_lg

iris_sort_gl <- iris[order(-iris$Sepal.Length),] # order the iris dataset from greatest to least value of Sepal.Length
iris_sort_gl

2.3.7 Filtering Data by Column Content

# Create a new dataset called 'iris2' that only contains Sepal.Length > 6.7
iris2 <- subset(iris, Sepal.Length > 6.7)
iris2

2.3.8 Calculating Quantities with Columns

# Adding 2 columns and assigning values to new column, Sepal_Sum
iris$Sepal_Sum <- iris$Sepal.Length + iris$Sepal.Width

# Subtracting 2 columns and assigning values to new column, Sepal_Diff
iris$Sepal_Diff <- iris$Sepal.Length - iris$Sepal.Width

head(iris)

2.3.9 Calculations on Columns

# Adding the values of the Sepal.Length column and assigning the value to sepal_sum
sepal_sum <- sum(iris$Sepal.Length)
sepal_sum

## [1] 876.5

# Finding the average of the values of the Sepal.Length column and assigning the value to sepal_avg
sepal_avg <- mean(iris$Sepal.Length)
sepal_avg

## [1] 5.843333

# Finding the median of the values of the Sepal.Length column and assigning the value to sepal_median
sepal_median <- median(iris$Sepal.Length)
sepal_median

## [1] 5.8

2.4 File Names and Version Control

2.4.1 File Names

As in Excel spreadsheets, the file names of R scripts are important, as they need to be discernable by both users and computers.

Here are some bad examples of file names:

myscript.R
Joe's Filenames Use Spaces and Punctuation;"'.R
figure 1.R
fig 2.R
JW7d^(2sl@pleasedontdeletethisWx2*.R

and here are some good examples of file names:

2010-07-17_disneyland-55th-bday.R
joes-filenames-are-getting-better.R
fig01_scatterplot-length-vs-interest.R
fig02_histogram-attendance.R
2018-06-18_predictive-modeling-contest-week1.R

The following sections detail why the above examples are categorized as good and bad file names.

2.4.1.1 Three Principles of Good File Names

1. Machine readable

Avoid spaces, punctuation, accented characters, and case sensitivity to achieve optimal machine readability. Make your file names easy to compute on with a deliberate use of delimiters.

The deliberate use of _ and - allows you to recover metadata from the file names.

For example, for the following filenames,

2018-07-05_Q2ANALYSIS_GMWB-Mix_T01.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_T02.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_T03.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_all.R
...
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_T01.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_T02.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_P01.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_P02.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_all.R

you can create a list of file names only containing “Non-GMWB-Mix” in R by typing the following into the Console.

# Make sure you have the `dplyr` package installed first!
flist <- list.files(pattern = "Non-GMWB-Mix") %>% head

If you also type

# Make sure you have the `stringr` package installed first!
stringr::str_split_fixed(flist,"[_\\.]",4)

into the Console, the date, quarter analysis, analysis type, and output type will be displayed separately, creating a nicely organized metadata table.

The underscore _ is used to delimit units of metadata you want to retrieve later, and the hyphen - is used to delimit separate words within these units of metadata.

By making your file names machine readable, you make it easy to:

search for files later,
narrow file lists based on names, and
extract information from file names (e.g. by splitting).

2. Human readable

Include information on content in your file names to make them human readable. This connects with the concept of slugs from semantic URLs.

For example,

2018-07-05_Q2ANALYSIS_GMWB-Mix_T01.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_T02.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_T03.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_all.R
...
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_T01.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_T02.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_P01.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_P02.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_all.R

these file names “embrace the slug” by having each of the subanalyses and subtypes in the Q2 Analysis labeled at the end of each file name.

“Embracing the slug” allows you to more easily figure out which R files you’re looking at or trying to find.

3. Plays well with default ordering

To play well with default ordering, you need to:

put something numeric first,
use the ISO 8601 standard for dates (YYYY-MM-DD), and
left-pad other numbers with zeros.

Our example file names

2018-07-05_Q2ANALYSIS_GMWB-Mix_T01.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_T02.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_T03.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_all.R
...
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_T01.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_T02.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_P01.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_P02.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_all.R

put something numeric first (the date, which also fits the ISO 8601 standard), so they’re organized in chronological order. Also, the version numbers at the end of each file name are left-padded with zeros so when double-digit versions are created, the later versions won’t be ordered before the single-digit versions.

2.4.2 Version Control

An example of a series of good file names for version control is:

2018-07-05_Q2ANALYSIS_GMWB-Mix_T01.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_T02.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_T03.R
2018-07-05_Q2ANALYSIS_GMWB-Mix_all.R
...
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_T01.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_T02.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_P01.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_P02.R
2018-07-18_Q2ANALYSIS_Non-GMWB-Mix_all.R

Note how these file names not only follow all of the conventions for good file names, but they also indicate the type of script (with T for table(s) and P for plot(s)) and version number at the end.

3. Common Excel Functions in R

To find more information about how to use each of these R functions, type ?function (e.g. ?sum) in the RStudio Console, and you’ll be provided with comprehensive and clear R documentation. If you need even more information about an R function, type “how to use ___ function in R” or something similar into Google.

Excel Formula	R Function	Type
ABS	`abs`	Arithmetic
ADDRESS	`assign`	Essentials
AND	`&`,`&&`,`all`	Boolean
AVERAGE, AVG, AVERAGEIF	`mean`	Arithmetic
BETADIST	`pbeta`	Statistics
BETAINV	`qbeta`	Statistics
BINOMDIST	`pbinom` when cumulative,`dbinom` when not	Statistics
CEILING	`ceiling`	Arithmetic
CELL	`str` has the same idea	Essentials
CHIDIST, CHISQDIST	`pchisq`	Statistics
CHIINV, CHISQINV	`qchisq`	Statistics
CHITEST	`chisq.test`	Statistics
CHOOSE	`switch`	Essentials
CLEAN	`gsub`	Text
COLS, COLUMNS	`ncol`	Essentials
COLUMN	`col`,`:`,`seq`	Essentials
COMBIN	`choose`	Essentals
CONCATENATE	`paste`	Text
CONFIDENCE	`-qnorm(alpha/2)*std/sqrt(n)`	Statistics
CORREL	`cor`	Statistics
COUNT, COUNTIF	`length`	Arithmetic
COVAR	`cov`	Statistics
CRITBINOM	`qbinom`	Statistics
DELTA	`identical`	Boolean
EXACT	`==`	Boolean
EXP	`exp`	Arithmetic
EXPONDIST	`pexp` when cumulative,`dexp` when not	Statistics
FACT	`factorial`	Arithmetic
FACTDOUBLE	`dfactorial` in the `phangorn` package	Arithmetic
FDIST	`pf`	Statistics
FIND	`regexpr`,`grepl`,`grep`	Text
FINV	`qf`	Statistics
FISHER	`atanh`	Arithmetic
FISHERINV	`tanh`	Arithmetic
FIXED	`format`,`sprintf`,`formatC`	Essentials
FLOOR	`floor`	Arithmetic
FORECAST	`predict` on an `lm` object	Statistics
FREQUENCY	`cut`,`table`	Arithmetic
FTEST	`var.test`	Statistics
GAMMADIST	`pgamma` if last argument T,`dgamma` if last arg. F	Statistics
GAMMAINV	`qgamma`	Statistics
GAMMALN	`lgamma`	Statistics
GAUSS	`pnorm(x) - 0.5`	Statistics
GCD	`gcd`	Arithmetic
GEOMEAN	`exp(mean(log(x)))`	Arithmetic
GESTEP	`>=`	Boolean
HARMEAN	`harmonic.mean` in the `psych` package	Arithmetic
HLOOKUP	`match`,`merge`	Essentials
HYPGEOMDIST	`dhyper`	Statistics
IF	`if`,`ifelse`	Essentials
IFERROR	`try`,`tryCatch`	Essentials
INDEX	`x[y,z]`	Essentials
INDIRECT	`get`	Essentials
INT	`as.integer`(not for negative numbers),`floor`	Arithmetic
INTERCEPT	first element of `coef` of an `lm` object	Statistics
ISLOGICAL	`is.logical`	Boolean
ISNA	`is.na`	Boolean
ISNUMBER	`is.numeric`	Boolean
ISTEXT	`is.character`	Boolean
KURT	`kurtosis` in the `moments` package	Statistics
LARGE	`sort`	Statistics
LCM	`scm` in the `schoolmath` package	Arithmetic
LEFT	`substr`	Text
LEN, LENGTH	`nchar`	Text
LINEST	`lm`	Statistics
LN, LOG	`log`	Arithmetic
LOG10	`log10`	Arithmetic
LOGINV	`qlnorm`	Statistics
LOGNORMDIST	`plnorm`	Statistics
LOWER	`tolower`	Text
MATCH	`match`,`which`	Essentials
MAX	`max` (sometimes `pmax`)	Arithmetic
MDETERM	`det`	Arithmetic
MEDIAN	`median`	Arithmetic
MID	`substr`	Text
MIN	`min` (sometimes `pmin`)	Arithmetic
MINVERSE	`solve`	Arithmetic
MMULT	`%*%`	Arithmetic
MOD	`%%`	Arithmetic
MODE	`as.numeric(names(which.max(table(x))))`	Arithmetic
MUNIT	`diag`	Arithmetic
N	`as.numeric`	Arithmetic
NEGBINOMDIST	`dnbinom`	Statistics
NORMDIST, NORMSDIST	`pnorm` when cumulative,`dnorm` when not	Statistics
NORMINV, NORMSINV	`qnorm`	Statistics
NOT	`!`	Boolean
NOW	`date`,`Sys.time`	Essentials
OR	`\|`,`\|\|`,`any`	Boolean
PEARSON	`cor`	Statistics
PERCENTILE	`quantile`	Statistics
PERCENTRANK	`ecdf`	Statistics
PERMUT	`function(n,k) {choose(n,k)*factorial(k)}`	Arithmetic
PERMUTATIONA	`n^k`	Arithmetic
PHI	`dnorm`	Statistics
POISSON	`ppois` when cumulatic,`dpois` when not	Statistics
POWER	`^`	Arithmetic
PROB	`ecdf`	Statistics
PRODUCT	`prod`	Arithmetic
PROPER	`toupper`	Text
QUARTILE	`quantile`	Arithmetic
QUOTIENT	`%/%`	Arithmetic
RAND	`runif`	Arithmetic
RANDBETWEEN	`sample`	Arithmetic
RANK	`rank`	Essentials
REPLACE	`sub`,`gsub`	Text
REPT	`rep` and `paste` or `paste0`	Text
RIGHT	`substring`	Text
ROUND	`round`	Arithmetic
ROUNDDOWN	`floor`	Arithmetic
ROUNDUP	`ceiling`	Arithmetic
ROW	`row`,`:`,`seq`	Essentials
ROWS	`nrow`	Essentials
RSQ	`summary` of `lm` object	Statistics
SEARCH	`regexpr`,`grep`	Text
SIGN	`sign`	Arithmetic
SKEW	`skewness` in the `moments` package	Statistics
SLOPE	in `coef` of `lm` object	Statistics
SMALL	`sort`	Arithmetic
SQRT	`sqrt`	Arithmetic
STANDARDIZE	`scale`	Statitics
STD, STDEV	`sd`	Arithmetic
STEYX	`predict` on an `lm` object	Statistics
STRING	`format`,`sprintf`,`formatC`	Text
SUBSTITUTE	`sub`,`gsub`,`paste`	Essentials
SUM, SUMIF	`sum`	Arithmetic
SUMPRODUCT	`crossprod`	Arithmetic
TDIST	`pt`	Statistics
TEXT	`format`,`sprintf`,`formatC`	Text
TINV	`abs(qt(x/2,data))`	Statistics
TODAY	`Sys.Date`	Essentials
TRANSPOSE	`t`	Arithmetic
TREND	`fitted` of an `lm` object	Statistics
TRIM	`sub`	Essentials
TRIMMEAN	`mean(x,trim=tr/2)`	Arithmetic
TRUNC	`trunc`	Essentials
TTEST	`t.test`	Statistics
TYPE	`typeof`,`mode`,`class`	Essentials
UPPER	`toupper`	Text
VALUE	`as.numeric`	Arithmetic
VAR	`var`	Essentials
VLOOKUP	`match`,`merge`	Essentials
WEEKDAY	`weekdays`	Essentials
WEIBULL	`pweibull` when cumulative,`dweibull` when not	Statistics
ZTEST	`pnorm`	Statistics

4. Common Excel Plots in R

4.1 Distribution Plots

4.1.1 Density Plot

Using the iris dataset that’s pre-installed in R,

# If you don't have the ggplot2 library, be sure to
# install it by typing `install.packages("ggplot2")`
# into the Console.
library(ggplot2)  

ggplot(iris, aes(Petal.Width, colour=Species, fill=Species)) +
  geom_density(alpha=0.55)

4.1.2 Histogram

Using the iris dataset that’s pre-installed in R,

# If you don't have the ggplot2 library, be sure to
# install it by typing `install.packages("ggplot2")`
# into the Console.
library(ggplot2)

# Basic histogram
ggplot(iris, aes(x=Petal.Width)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Custom Bin Width
ggplot(iris, aes(x=Petal.Width)) + geom_histogram(binwidth = 0.05)

# Custom color
ggplot(iris, aes(x=Petal.Width)) + 
    geom_histogram(binwidth = 0.2, color="white",
                   fill=rgb(0.2,0.7,0.1,0.4))

4.1.3 Boxplot (Box and Whisker Plot)

Using the iris dataset that’s pre-installed in R,

# If you don't have the ggplot2 library, be sure to
# install it by typing `install.packages("ggplot2")`
# into the Console.
library(ggplot2)

ggplot(iris, aes(x=Species, y=Petal.Width)) + 
    geom_boxplot(fill="slateblue", alpha=0.2)

4.2 Correlation Plots

4.2.1 Scatterplot

Using the iris dataset that’s pre-installed in R,

# If you don't have the ggplot2 library, be sure to
# install it by typing `install.packages("ggplot2")`
# into the Console.
library(ggplot2)

# Basic scatterplot
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) + 
    geom_point()

# Basic scatterplot with regression line
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
    geom_smooth(method = 'lm', formula = y~x) +
    geom_point()

# Basic connected scatterplot
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) + 
    geom_line() +
    geom_point()

4.2.2 Heatmap

Using the mtcars dataset that’s pre-installed in R,

# converting `mtcars` from a dataframe to a matrix
data <- as.matrix(mtcars) 
# making the heatmap
heatmap(data, Colv = NA, Rowv = NA, scale="column")

4.2.3 Bubble

Using the iris dataset that’s pre-installed in R,

# If you don't have the ggplot2 library, be sure to
# install it by typing `install.packages("ggplot2")`
# into the Console.
library(ggplot2)

# Bubble Chart
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, 
                 size = Petal.Width, color = Species)) + 
    geom_point()

4.3 Ranking Plots

4.3.1 Barplot

Using the mtcars dataset that’s pre-installed in R,

# If you don't have the ggplot2 library, be sure to
# install it by typing `install.packages("ggplot2")`
# into the Console.
library(ggplot2)

# Vertical Barplot
ggplot(mtcars, aes(x=as.factor(cyl), fill=as.factor(cyl))) +
  geom_bar()

# Horizontal Barplot
ggplot(mtcars, aes(x=as.factor(cyl), fill=as.factor(cyl))) +
  geom_bar() + 
  coord_flip()

# Horizontal Barplot with Custom Bar Width
ggplot(mtcars, aes(x=as.factor(cyl), fill=as.factor(cyl))) +
  geom_bar(width=0.4) +
  coord_flip()

For stacked and grouped barplots:

# Create the data
set.seed(1234)
data=matrix(sample(1:30,15), nrow=3)
colnames(data)=c("A","B","C","D","E")
rownames(data)=c("var1","var2","var3")
head(data)

##       A  B  C  D  E
## var1  4 17  1 11 26
## var2 19 23  6 14 16
## var3 18 27 15 21  5

# Stacked barplot
barplot(data, col=colors()[c(23,89,12)], border="white",
        space=0.04, font.axis=2, xlab="group")

# Grouped barplot
barplot(data, col=colors()[c(23,89,12)], border="white",
        font.axis=2, beside=T, legend=rownames(data),
        xlab="group", font.lab=2)

4.4 “Part of a Whole” Plots

4.4.1 Treemap

Using the mtcars dataset that’s pre-installed in R,

# If you don't have the ggplot2 or treemapify libraries, 
# be sure to install them by typing 
# `install.packages("ggplot2")` and
# `install.packages("treemapify")
# into the Console.
library(ggplot2)
library(treemapify)

# creating a smaller copy of `mtcars`
cars <- head(mtcars)
# creating a new column (`carname`) in `cars`
cars$carname <- rownames(cars)
# making `cyl` in `cars` a factor
cars$cyl <- as.factor(cars$cyl)
# making the treemap
ggplot(cars, aes(area = disp, fill = cyl, label = carname)) +
  geom_treemap() +
  geom_treemap_text()

4.4.2 Pie Chart

# Create Data
Prop=c(3,7,9,1,2)
 
# Default Pie Chart
pie(Prop)

# Pie Chart with Custom Labels
pie(Prop, labels = c("Gr-A","Gr-B","Gr-C","Gr-D","Gr-E"))

4.5 Evolution/Time Series Plots

4.5.1 Area Chart

# If you don't have the ggplot2 library, be sure to
# install it by typing `install.packages("ggplot2")`
# into the Console.
library(ggplot2)

# Create data
data=data.frame(my_x=seq(1,10), my_size=sample(seq(1,20),10))
 
# Area Chart
ggplot(data, aes(x=my_x , y=my_size)) + 
  geom_area( fill="blue", alpha=.2) + 
  geom_line()

4.5.2 Stacked Area Chart

Using the iris dataset that’s pre-installed in R,

# If you don't have the ggplot2 or RColorBrewer libraries,
# be sure to install them by typing 
# `install.packages("ggplot2")` and
# `install.packages("RColorBrewer") into the Console.
library(ggplot2)
library(RColorBrewer)

# Creating data
set.seed(1234)
Sector <- rep(c("S01","S02","S03","S04",
                "S05","S06","S07"),times=7)
Year <- as.numeric(rep(c("1950","1960","1970",
                         "1980","1990","2000","2010"),each=7))
Value <- runif(49, 10, 100)
data <- data.frame(Sector,Year,Value)

# Stacked Area Chart
ggplot(data, aes(x=Year, y=Value, fill=Sector)) + 
    geom_area()

# Stacked Area Chart with Custom Color Palette
ggplot(data, aes(x=Year, y=Value, fill=Sector)) +
    geom_area(colour="black", size=.2, alpha=.4) +
    scale_fill_brewer(palette="Greens",
                      breaks=rev(levels(data$Sector)))

4.6 Interactive Plots

To make any of the above plots interactive, use the plotly package. Here is an example of how to use plotly:

# If you don't have the ggplot2 or plotly libraries, be sure to
# install them by typing `install.packages("ggplot2")` and 
# `install.packages("plotly")` into the Console.
library(ggplot2)
library(plotly)

# Basic scatterplot with interactivity
graph <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
    geom_smooth(method = 'lm', formula = y~x) +
    geom_point()
ggplotly(graph)

4.7 Other Plots

If you can’t find your desired plot in this manual, please refer to the R Graph Gallery.

5. Integrating R and Excel with BERT (Basic Excel R Toolkit)

BERT is a tool for connecting Excel and R on Windows machines. Specifically, BERT is designed to:

Support running R functions from Excel spreadsheet cells. In Excel terms, this is known as writing User-Defined Functions (UDFs) in R.

All you have to do is write the function. Everything else - loading the function into Excel, managing parameters, and handling type conversion - is done automatically for you.

Control Excel in real time, right from your R code and a convenient BERT console.
Call R functions from VBA.

5.1 Installing BERT

Since BERT is free and open-source (just like R), you can download it without any assistance from IT. Go to this download page and follow the instructions given for downloading the software.

5.2 Getting Started with BERT

Refer to BERT’s well-written documentation for help getting started with BERT.

5.2.1 BERT Example Functions

BERT has conveniently written some example functions for basic data management and analysis tasks.

5.2.2 Excel Scripting Interface in R

You can talk to Excel from R using the Excel scripting interface. Refer to BERT’s well-written documentation for help getting started.

Appendix

Updating R and RStudio

If the aforementioned packages and functions start to not work after an extended period of time, you may need to update your versions of R, R packages, and RStudio software to the latest versions.

Updating R

To update your version of R, first close any R or RStudio windows you have open.

Updating R on Windows

Open the R GUI (x64, not i386). This is not the same as RStudio. The R GUI program icon should look very similar to this:

Install the installr package into R by typing

install.packages("installr")

into the Console.

After the package is finished installing, call the package by typing

library(installr)

into the Console. You can now update your R software and packages to the latest versions by typing

updateR()

into the Console. Then R will walk you through a detailed and intuitive process of updating your R software and packages to the latest versions.

Updating R on (Mac) OS X

Open RStudio again, and type the following lines of code into the Console:

install.packages('devtools') #assuming it isn't already installed
library(devtools)
install_github('andreacirilloac/updateR')
library(updateR)
updateR(admin_password = "os_admin_user_password")

R will then walk you through a detailed and intuitive process of updating your R software and packages to the latest versions.

Updating R on Linux

This resource will walk you through how to update your R software and packages to the latest versions on Linux.

Updating R Packages Without Updating R

Updating out-of-date packages that were installed from CRAN (with install.packages()) is easy with the update.packages() function. Type this function into the RStudio Console.

update.packages()

After entering this function, it will ask you what packages you want to update. To update all packages at once, use ask = FALSE.

update.packages(ask = FALSE)

To update packages installed from devtools::install_github(), type the following function into your RStudio Console (I would also recommend saving this function in an R Script for later use):

update_github_pkgs <- function() {
  # check/load necessary packages
  # devtools package
  if (!("package:devtools" %in% search())) {
    tryCatch(require(devtools), error = function(x) {warning(x); cat("Cannot load devtools package \n")})
    on.exit(detach("package:devtools", unload=TRUE))
  }

  pkgs <- installed.packages(fields = "RemoteType")
  github_pkgs <- pkgs[pkgs[, "RemoteType"] %in% "github", "Package"]

  print(github_pkgs)
  lapply(github_pkgs, function(pac) {
    message("Updating ", pac, " from GitHub...")

    repo = packageDescription(pac, fields = "GithubRepo")
    username = packageDescription(pac, fields = "GithubUsername")

    install_github(repo = paste0(username, "/", repo))
  })
}

Then call the function.

update_github_pkgs()

Updating RStudio

To update RStudio, open RStudio and go to Help > Check for Updates to install the newest version.

References

Contact

If you have any comments or suggestions about this manual, please contact the author, Alyssa Columbus.

Please consider a donation!

I am a huge advocate of open source software (e.g. R) and open science, so a version of this manual will always be freely available! That said, updating this manual takes a bit of time, so if you like this book and want to see it get even better, consider donating at paypal.me/AlyssaColumbus. It really means a lot!

How To Use R With Excel

A Comprehensive Guide to Transitioning from Excel to R

A Comprehensive Guide to Transitioning from Excel to R

Alyssa Columbus

July 30, 2018

0. Prerequisites

0.1 Installing R

0.1.1 Under the ‘Download’ heading, select ‘CRAN’

0.1.2 Install a CRAN Mirror

0.1.3 Install your Machine’s Version of R

0.1.3.1 R for Windows

0.1.3.2 R for (Mac) OS X

0.1.3.3 R for Linux

0.2 Installing RStudio

1. Importing and Exporting Excel Files in R

1.1 Importing Excel Files to R

1.2 Exporting Excel Files from R

2. How to Transition From Excel to R

2.1 Navigating RStudio

2.2 Data Structures and Variables in R

2.2.1 Creating Your First Dataframe

2.3 Working with Dataframes

2.3.1 Looking Up Number of Observations

2.3.2 Focusing on the First/Last X Rows

2.3.3 Looking at the Structure of a Dataframe

2.3.4 Focusing on Certain Columns, Rows, or Cells

2.3.5 Changing the Format of a Column

2.3.6 Sorting a Dataframe Based on One Column

2.3.7 Filtering Data by Column Content

2.3.8 Calculating Quantities with Columns

2.3.9 Calculations on Columns

2.4 File Names and Version Control

2.4.1 File Names

2.4.1.1 Three Principles of Good File Names

2.4.2 Version Control

3. Common Excel Functions in R

4. Common Excel Plots in R

4.1 Distribution Plots

4.1.1 Density Plot

4.1.2 Histogram

4.1.3 Boxplot (Box and Whisker Plot)

4.2 Correlation Plots

4.2.1 Scatterplot

4.2.2 Heatmap

4.2.3 Bubble

4.3 Ranking Plots

4.3.1 Barplot

4.4 “Part of a Whole” Plots

4.4.1 Treemap

4.4.2 Pie Chart

4.5 Evolution/Time Series Plots

4.5.1 Area Chart

4.5.2 Stacked Area Chart

4.6 Interactive Plots

4.7 Other Plots

5. Integrating R and Excel with BERT (Basic Excel R Toolkit)

5.1 Installing BERT

5.2 Getting Started with BERT

5.2.1 BERT Example Functions

5.2.2 Excel Scripting Interface in R

Appendix

Updating R and RStudio

Updating R

Updating R on Windows

Updating R on (Mac) OS X

Updating R on Linux

Updating R Packages Without Updating R

Updating RStudio

References

Contact

Please consider a donation!