class: center, middle, inverse, title-slide

# Introduction to R and RStudio
## You will never look back…
### Michael Hunt
### Eden Project Learning
### 2021-01-11 (updated: 2022-01-25)

---
class: center, middle

### Have I got R and RStudio on my machine?

You should already have R and RStudio installed on your laptop. If you have, then on your machine you will see these icons

.pull-left[

__R__

__Never__ open this
]

.pull-right[

__RStudio__

__Always__ open this
]

R is the engine, RStudio is the 'IDE' - the window through which you use R.

IDE = 'Integrated Development Environment'

---

### Why use R?

* It is free
* It is open source - hence benefits from a global community of expertise
* It will always be available to you
* It works on any operating system
* Anything in statistics can be done in R (and anything in GIS, and the plotting is fantastic)
* It aids reproducible research
* Along with Python, it has become the de facto standard language for data analysis within the life sciences community.

--

### Why use RStudio?

You don't have to, but...

It makes using R _so_ much easier and more productive

---

## Let's play

<img src="./figures/rs_icon.png" width="50" height="50" /> Open RStudio

You should see something like this:

---

### The Console window

.pull-left[

]

.pull-right[
The pane on the left is the Console pane. This is the window where you give instructions to R. R works on them, and the answer then appears back in the Console pane.
]

---

### The Environment Window

.pull-left[
The top right pane has several tabs. We normally only look at the Environment tab. This shows all the things that R has in its head, such as datasets and other objects. It is probably empty right now, but will soon fill up.

(Note though the important button: Import Dataset. I never use it, but you may wish to.)
]

.pull-right[

]

---

### The Plot (and other things) window

<br/><br/>
<br/><br/>
<br/><br/>
<br/><br/>

.pull-left[
The pane at bottom right has several tabs which are fairly self-explanatory. We will discuss each of them in turn as they come up. As we go on, watch what appears in them.
]

.pull-right[

]

---

### Back to the console window

.pull-left[

<br/><br/>

The `[1]` in the answer is telling us that 2 is the first and in this case the only part of the answer. Some answers have more than one part.
]

.pull-right[
When you start RStudio, the console pane gives you some information about R. The most useful thing by far however is the __<span style="color: darkred;">command prompt: ></span>__

This is where you enter instructions in the Console. Try it!

Type `1+1` and then press enter/return

```r
1 + 1
```

```
## [1] 2
```
]

---

### R can be used as a giant calculator

R can do all the arithmetic that you can do on a normal calculator, and much (_much_) else besides. For example, it does BODMAS, trigonometry, logarithms, exponentials, raises numbers to powers and more.

Try typing in these simple examples and see if you get what I get

.pull-left[

```r
2*4
```

```
## [1] 8
```

```r
3/8
```

```
## [1] 0.375
```

```r
11.75-4.83
```

```
## [1] 6.92
```
]

.pull-right[

```r
10^2
```

```
## [1] 100
```

```r
log(10)
```

```
## [1] 2.302585
```

```r
7 < 10
```

```
## [1] TRUE
```
]

---

### R uses functions

For example, the `seq()` function will return a sequence of numbers.

```r
seq(from=0,to=10,by=1)
```

```
## [1] 0 1 2 3 4 5 6 7 8 9 10
```

Most functions have __arguments__, enclosed by brackets/parentheses `()`. These are the bits of information that the function needs to give us the answer. Some functions have one argument, most have many.

In `seq()`, the first argument tells `seq()` at which number to start the sequence, the second tells it where to end and the third tells it the size of the increment.

What is the argument of the square root function being used below?
```r
sqrt(25)
```

```
## [1] 5
```

---

### I'll never remember all this!

Fear not! There is lots and lots and _lots_ of help available, not least within R itself.

.pull-left[
For answers on what arguments a function needs, you can type `?<name of function>` into the console window.

For example, if you want some help on the `seq()` function, you would type

> \> ?seq

Try it!

For 'How do I...?' queries you will also almost always get a good answer if you type your exact question into Google. Look out for StackExchange answers. The R community is very large, and is growing.
]

.pull-right[

]

---

### Now for something Really Important!

So far we have used functions to calculate answers, but we have not asked R to save those answers anywhere. This means that we cannot use them later.

To get around that, we can __assign__ the answer of a question to an __object__. We do this using the _assignment arrow_ `<-`, which is a 'less than' sign followed by a hyphen (minus sign). Together they look like a left-pointing arrow.

Try this:

```r
a <- sqrt(9)
```

Now just type the letter `a` at the prompt. What do you see?

--

```
>a
```

```
## [1] 3
```

We have created an object called `a` with the value 3. This object is now in R's brain and can be used in subsequent calculations. Try typing `2 * a` or `a^2` and so on.

---

### Now for something Really Important! (2)

The objects we create are listed in the environment window. The object `a` now appears there.

<img src="./figures/a_in_env_pane.png" width="200" height="200" />

--

Sometimes, we wish to 'clear the decks' and remove all objects from R's brain. Try this:

```r
rm(list=ls())
```

Poof!

We __do not__ need\* to start scripts with this command, but occasionally it can be useful as a way of starting afresh.

\*In fact, [Jenny Bryan](https://www.stat.ubc.ca/users/jennifer-bryan) would come into your office and set fire to your computer if you did, so maybe best not.

---

### R works with vectors

Vectors are sequences of objects.
The objects can be numbers, text and more, and there is more than one way to set them up. For example:

```r
seq1<-1:10
seq1
```

```
## [1] 1 2 3 4 5 6 7 8 9 10
```

```r
seq2<-10:1
seq2
```

```
## [1] 10 9 8 7 6 5 4 3 2 1
```

```r
seq3<-c(1,2,3,4,5,6,7,8,9,10) # c(x,y,z...) means combine x, y and z into a single entity.
seq3
```

```
## [1] 1 2 3 4 5 6 7 8 9 10
```

```r
seq4<-c("a","b","c")
seq4
```

```
## [1] "a" "b" "c"
```

---

### Vectorised calculations

We have used a compact way to create the first two sequences, `seq1` and `seq2`.

```r
seq1
```

```
## [1] 1 2 3 4 5 6 7 8 9 10
```

```r
seq2
```

```
## [1] 10 9 8 7 6 5 4 3 2 1
```

Let us add `seq1` to `seq2`.

--

```r
seq1+seq2
```

```
## [1] 11 11 11 11 11 11 11 11 11 11
```

--

Do you see how we can add the two vectors together? R adds them element by element, which is like doing lots of sums all at once. This is called vectorisation and can massively speed up calculations when you have a lot of data.

This is a really cool and powerful feature of R.

---

### Our First Script

If we are doing anything that we want to preserve and use again, we write our sequence of commands as a __script__, and save it.

We do this in the script pane, which we open by clicking on the two boxes at the top right of the Console pane:

.pull-left[
<img src="./figures/open_script_pane.png" height="200" />

Our RStudio window now has four panes and looks something like this:
]

--

.pull-right[
<img src="./figures/four_panes.png" height="300" />

The script window is top-left.
]

---

### Get organised

We are going to write a notebook script to read in some data from a file. But first, let's get organised.

In a sensible place on your laptop, or somewhere else you can easily get at again, create the following folder and sub-folders:

.pull-left[
<img src="./figures/rstuff.png" height="400" />
]

.pull-right[
<br/><br/>
<br/><br/>
Use lower case for the subfolder names.
<br/><br/>
Guess what is going to go in them?
<br/><br/>
Let's start by saving a new script to the scripts folder.
We have set up this folder structure in Teams/Files
]

---

### Make your RStuff folder a project

This obscure act will pay dividends, so I recommend you do it.

Having set up your RStuff folder, make it a project like this:

<img src="./figures/new_project1.png" height="200" />
<img src="./figures/new_project2.png" height="200" />

then navigate to your RStuff folder and press Create Project.

This will make your RStuff folder the centre of your R world and isolate it from turbulence elsewhere in the R universe. A .Rproj file will appear inside it.

---

### So let's start a notebook: open a new R notebook file

<img src="./figures/new_notebook.png" height="250" />

Save it into your scripts folder, using a suitable name. No spaces.

You now have somewhere to write your instructions.

The section between the two lines of three dashes is called the `yaml` header. It controls the way your file will be output, but we can ignore it for now.

Delete everything beneath it.

---

### R Markdown

From here on, everything will be written in R Markdown. This enables us to mix human readable text with R code. There is help on how to do this in the Help menu.

Actual R code is included as a series of 'chunks', like this:

````r
```{r,a succinct label that indicates what this chunk does}
library(tidyverse)
```
````

The label is not necessary, but is good practice. In longer documents, the label can be useful for cross-referencing of Figures, and in general having one can help with debugging.

In between the chunks we can have human readable text that we can format according to a few simple rules, all listed in the Help/Markdown Quick Reference guide.

For more extensive guidance on R Markdown, see the [R Markdown cheat sheet](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf) or the [R Markdown cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/).

---

### Packages

Much of the power of R derives from the use of 'packages' which add extra functions.
If you have never used a package on your machine you first have to install it, for example like this for the `tidyverse` package.

> \>`install.packages("tidyverse")`

You only need to do this once, so we would type this line in the console window rather than in a script.

Thereafter, to make a package available, you need to load it using the `library` function each time you want to use it. Thus, the line of code that does this _is_ included in your script. To load the `tidyverse` package we would have a code chunk like this:

````r
```{r}
library(tidyverse)
```
````

---

### The `tidyverse` and `here` packages

If we want to read some data from a file into R, R has to know where to look. The `here` package is incredibly useful for this.

The `tidyverse` package is all-round plain awesome and useful. We will use it all the time.

To load these packages you need a chunk like this in your script:

````r
### Load Packages
```{r}
library(tidyverse)
library(here)
```
````

The `### Load Packages` line is not part of the chunk. It is a header, a reminder to us of what this chunk does. It is a good idea to pepper your script with headers and explanatory text like this so that next time you come to read the script, you will know what every part of it is supposed to be doing.

---

### The data to be read into R

Data for use in R is typically stored as a `.csv` file, which stands for **c**omma **s**eparated **v**alues. If you have an Excel file and want to use it in R, the simplest thing is to save it as a `.csv` file, although there are packages that enable Excel files and other formats to be read in directly.

We are going to read in the `iris.csv` file which you should have in your `data` folder.
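
To see what a `.csv` file actually contains, here is a minimal sketch that uses only base R, so it runs without any packages. The file contents and the names `plants`, `name` and `height_cm` are made up for illustration and are not part of the course data. It uses base R's `read.csv()`, which is an assumption made for the demo rather than the `tidyverse` reader used in these slides.

```r
# Write a tiny, made-up table to a temporary .csv file (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,height_cm", "oak,1200", "rose,90"), tmp)

# Look at the raw text: one row per line, values separated by commas
readLines(tmp)

# Read it back in as a data frame; the first line supplies the column names
plants <- read.csv(tmp)
plants
```

The first line of the file becomes the column names, and each later line becomes one row.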
Now that we have the `here` and `tidyverse` packages loaded, we can read in the data and then inspect it like this:

````r
```{r read-in-data}
filepath<-here("data","iris.csv")
iris<-read_csv(filepath)
glimpse(iris)
```
````

Put the cursor on each of these lines and press `Ctrl-Enter` (`Cmd-Enter` on a Mac). See what has appeared in the Environment pane.

---

### Exploring data in R

Our data has been read in as an object called `iris`. It is a type of object known as a `data frame` (or, by modern types, as a `tibble`).

You can think of it as being like a spreadsheet, with columns of data.

Each column is a `variable`

Each row is an `observation`

<img src="./figures/tidy-1.png" height="200" />

How many variables and how many observations do we have in our data set? Is that what you were expecting?

We will see that R likes data to be 'tidy'. This means having one observation per row and one variable per column. Often, that is not the case for data collection sheets as we might write them in Excel. Hence one necessary step in data analysis is often to tidy the data. We can do this easily in R.

---

### Different ways to explore the data

Now type each of these commands into your script. Run each one in turn by putting the cursor on the new line and pressing `Ctrl-Enter`, or `Cmd-Enter` on a Mac.

```r
str(iris)
```

```r
head(iris)
```

```r
summary(iris)
```

What does each command tell you about the data?

---

### Plot the data (1)

.pull-left[
We won't explore plotting in R in any detail today, but for kicks, let's just plot this data using the simplest way that R offers:

As loaded, R already has powerful plotting capability, as used here. However, we are more likely to use the `ggplot2` package for even better plots.

Note the use of the `$` symbol, which enables us to pick out particular columns from a data frame.
]

.pull-right[

```r
x<-iris$Sepal.Width
y<-iris$Sepal.Length
plot(x,y,
     xlab="Sepal Width",
     ylab="Sepal Length",
     col="red",
     pch=19 # choose the symbol
     )
```

<img src="intro_to_R_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />
]

---

### Plot the data (2)

For even more kicks we could plot this using the awesomely powerful plotting package `ggplot2` that is part of `tidyverse`. This is an example of what we could get:

<img src="./figures/iris.png" height="220" />

```r
iris %>%
  ggplot(aes(x=Sepal.Width,y=Sepal.Length,colour=Species))+
  geom_point()+
  xlab("Sepal Width")+
  ylab("Sepal Length")+
  theme_bw()
```

`ggplot2` and `dplyr`, which is used for data manipulation, are two of the most useful packages in R. We will next explore their use in some detail.

---

### And finally

.pull-left[
Finally, our script might look something like this:

There could be much more text between the code chunks. In fact there are many ways to elaborately format the knitted document.

Knitted? Press the Knit button at the top and see what happens.

In development, it is useful to run the chunks one by one by pressing the green arrows at the top-right of each chunk.
]

.pull-right[
<img src="./figures/sample_script.png" height="500" />
]
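
---

### And finally (2)

The main R commands met in these slides can be gathered into one short recap. This is only a sketch under one stated assumption: it uses the `iris` data frame that ships with R instead of reading `iris.csv`, so it runs without any files or packages. The names `ratio` and `doubled` are illustrative only.

```r
# Assumption: use R's built-in iris data frame rather than iris.csv
data(iris)

# Three ways to explore the data
str(iris)      # structure: variables and their types
head(iris)     # the first six rows
summary(iris)  # a summary of each column

# Pick out columns with $ and do vectorised arithmetic on them
ratio <- iris$Sepal.Length / iris$Sepal.Width
doubled <- 2 * iris$Sepal.Length

# One answer per observation, all calculated at once
length(ratio)
mean(ratio)
```

Each calculation here works on a whole column in one step, which is the vectorisation described earlier.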