Getting started
1 Assessment
This session is assessed using MCQs (questions highlighted below). The actual MCQs can be found on the BS1040 Blackboard site under Assessments and Feedback/Data analysis MCQs. The deadline is listed here and on the front page of the BS1040 Blackboard site. This assessment contributes 2.5% of module marks. You will receive feedback on this assessment after the submission deadline.
2 Getting R and Rstudio onto your computer
This of course depends on what type of computer you have.
2.1 Mac
2.1.1 Install the latest version of R.
- On a web browser open up https://cran.r-project.org/bin/macosx/
- Click on the latest binary (R 4.0.2.pkg, when this was written). It is the blue link on the left-hand side
- Follow the installation instructions
2.1.2 Install the latest version of Rstudio.
- Click on the Mac installer for Rstudio Desktop
2.2 Windows
2.2.1 Install the latest version of R.
- On a web browser open up https://cran.r-project.org/bin/windows/base/
- Click on the ‘Download R 4.0.2 for Windows’ link (as of time of writing)
- R is distributed as an installer, R-4.0.2-win.exe. Just run it and follow the Windows-style installation prompts.
2.2.2 Install the latest version of Rstudio.
- Click on the Windows installer for Rstudio Desktop
2.3 Linux
- R is part of many Linux distributions, so check your Linux package management system first.
- If you don’t have it, have a look here https://cloud.r-project.org/ for your distribution.
2.3.1 Install the latest version of Rstudio.
- Click on your distribution’s installer for Rstudio Desktop
2.4 Chromebook/Android Tablet/iPad (Actually any computer)
A Chromebook doesn’t really allow you to download software. It wants you to work on the cloud. So let’s do that.
- On a web browser open in a new window
- Log in or sign up
- Create a new project (Blue New Project button)
- On the cloud you’ll first have to upload (bottom right pane) the fish data before you import it (see below).
2.5 University PCs (for when you want to work in the computer rooms)
2.5.1 Install the latest version of R.
- You can install R using the Software center (click on the windows icon and look at the second column) on CFS machines
- Choose R and click install
- R should now be in your program list.
2.5.2 Install the latest version of Rstudio.
- Same as above, but you might need to search for it in the Software center. It wasn’t on the front page when I looked.
- Choose Rstudio and click install
- Rstudio should now be in your program list. Open it.
3 R, Rstudio what?
R is the programming language we will be using during this course. Rstudio is an integrated development environment (IDE) for R. They are separate programs, but we will use R through Rstudio. This is an easier and all round better way of using R. You can use R by itself but everything is harder. The console on the bottom left in Rstudio is what R looks like by itself. You need to install both R and Rstudio.
4 Getting the data into R
There are lots of ways of getting data into R. It’s one of the most annoying things about it as a beginner. During most of the sessions, I’m going to try to use built-in datasets. But for some things in the course, and for your own data, you are going to need to know how to get data into R. Most people will initially have their data in an Excel sheet (.xlsx). But a .csv (comma-separated values) file is a simpler and more useful format to keep data in. So the first thing you’ll usually have to do when you are doing a data analysis in the wild is to convert .xlsx to .csv. Full instructions here. BUT you don’t need to do this during this course, as any external data I give you will already be in .csv form.
- Look at the top right-hand window in Rstudio and find the Import Dataset button. Use it to import the data as a text file (From Text (base) in newer versions). Make sure that the heading option is on.
- Notice that the command you actually ran was displayed in the console.
- That means if you typed that command into the console you would get the same effect (with your file path, not mine).
- Have a look at the data. It should have 235 observations of 10 variables.
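As a rough sketch of what the import button does for you: the command it shows in the console is typically a call to base R's read.csv. The example below is self-contained, so the file, the column names and the object name example.data are all made up for illustration. They are not the fish data.

```r
# Hypothetical stand-in for a real data file: write a tiny CSV to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,length_cm", "1,23.2", "2,12.9", "3,30.0"), csv_path)

# header = TRUE tells R that the first row holds the column names
# (this corresponds to switching the heading option on in the import dialog).
example.data <- read.csv(csv_path, header = TRUE)

dim(example.data)   # number of rows (observations) and columns (variables)
str(example.data)   # column names and types
head(example.data)  # the first few rows
```

With your own file you would replace csv_path with the path to the .csv on your computer.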
5 Some R basics
5.1 Using R as a ridiculously overpowered calculator
In the console window, type some mathematical operations. I’ll get you started.
[1] 6
[1] -2.449294e-16
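The commands that produced this output are not shown here. As an assumption, input along these lines gives results of that form:

```r
2 * 3          # multiplication
sin(2 * pi)    # mathematically zero, but floating point arithmetic returns a tiny non-zero number
```

The second result is a reminder that computers store numbers with limited precision, so a value like -2.449294e-16 should be read as zero.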
Try adding, subtracting and dividing. What does log do? Hint: not what you think.
5.2 Creating variables
An important skill you need is creating variables. At its simplest, I want x to be equal to 2.
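In R, that assignment can be written like this (a minimal sketch):

```r
x <- 2   # assign the value 2 to the variable x
x        # typing the name on its own prints the value
```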
What’s with <-? <- is called the assignment operator. Why don’t we use =? Short answer, that’s just the way R is. Long answer, if you’re interested.
A bit more complicated: make y equal to 0, 2, 4, 6, 8, 10.
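One way to do this is with seq. The argument names from, to and by are those of base R's seq, but the exact call used originally is an assumption:

```r
y <- seq(from = 0, to = 10, by = 2)  # start at 0, stop at 10, step by 2
y                                    # prints the six values 0 2 4 6 8 10
```

Typing ?seq in the console opens the help page for the function.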
You just used a function. It’s called seq. Want to find out about it? Use your Google-fu skills. More and more during the practicals, we won’t give you the answers. Instead, we’ll expect you to find them yourself. Why? Because that’s how everyone does analysis. Once you figure that out, you’ll realise that with the basics we are teaching you and the ability to look things up, no analysis or complicated data visualization is beyond your abilities.
Task: Calculate z which is the product of x and y. (Hint: In mathematics, a product is the result of multiplying)
Blackboard MCQ: z is a vector with six values. What is the highest value?
5.4 R packages
What we have been using so far is called base R. It’s the stuff that comes working out of the box with R. You can do an amazing amount with this. But the capabilities of R have been extended hugely over the years with packages. The below is from Hadley Wickham’s R package book:
In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. As of January 2015, there were over 6,000 packages available on the Comprehensive R Archive Network, or CRAN, the public clearing house for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package.
- You install them from CRAN with install.packages("x").
- You use them in R with library("x").
- You get help on them with package?x and help(package = "x").
So let’s use a package called skimr. The install.packages command only has to be used the first time you use a package (from then on it’s on your computer). After that, the library command turns it on in each new session.
install.packages("skimr") #You will only have to do this once on your computer.
library("skimr") #You will only have to do this once per session. It 'turns on' the package
skim(iris) #skim is a function from the skimr package. iris is a built-in dataset we are going to use.
Name | iris |
Number of rows | 150 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Species | 0 | 1 | FALSE | 3 | set: 50, ver: 50, vir: 50 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Sepal.Length | 0 | 1 | 5.84 | 0.83 | 4.3 | 5.1 | 5.80 | 6.4 | 7.9 | ▆▇▇▅▂ |
Sepal.Width | 0 | 1 | 3.06 | 0.44 | 2.0 | 2.8 | 3.00 | 3.3 | 4.4 | ▁▆▇▂▁ |
Petal.Length | 0 | 1 | 3.76 | 1.77 | 1.0 | 1.6 | 4.35 | 5.1 | 6.9 | ▇▁▆▇▂ |
Petal.Width | 0 | 1 | 1.20 | 0.76 | 0.1 | 0.3 | 1.30 | 1.8 | 2.5 | ▇▁▇▅▃ |
I haven’t run this package on all possible computers. If you are having problems with it, we could get similar information using base R with a command unsurprisingly called summary.
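The call that produces the output below is, as far as can be told from the text, simply:

```r
summary(iris)  # iris is the same built-in dataset that was passed to skim
```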
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
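If you want one piece of code that works whether or not skimr installed properly, a sketch like the following may help. requireNamespace is a base R function that reports whether a package is available. The variable name iris_overview is made up for illustration.

```r
# Use skimr if it is installed; otherwise fall back to base R's summary().
if (requireNamespace("skimr", quietly = TRUE)) {
  iris_overview <- skimr::skim(iris)
} else {
  iris_overview <- summary(iris)
}
iris_overview  # print whichever overview was produced
```

Writing skimr::skim calls the function straight from the package, so no separate library("skimr") line is needed in this sketch.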
Task: Perform the skim (or summary if there is a problem) command on your fish.data (the data you loaded from a csv file in Section 4)
Blackboard MCQ: What’s the mean value of Standard.Length? (clue: The mean value for Sepal.Length was 5.84 from skim and 5.843 from summary)
5.3 Comments
Comments are remarks in a program that are intended to help human readers understand what is going on, but are ignored by the computer. Comments in R start with a # character and run to the end of the line. Why do we use comments? To remind ourselves or to tell our collaborators what a line of code was meant to do. Remember the adage: we write comments for the next idiot to read the code, because it will probably be us.
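A small sketch of both styles of comment, using made-up values:

```r
# A comment on its own line: radius of a circle in centimetres (made-up value)
radius <- 3

area <- pi * radius^2  # a comment can also follow code on the same line
area
```

R ignores everything from the # to the end of the line, so the comments change nothing about the result.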