This session is assessed using MCQs (questions highlighted below). The actual MCQs can be found on the BS1040 Blackboard site under Assessments and Feedback/Data analysis MCQs. The deadline is listed here and on the front page of the BS1040 Blackboard site. This assessment contributes 2.5% of module marks. You will receive feedback on this assessment after the submission deadline.
This of course depends on what type of computer you have.
A Chromebook doesn’t really allow you to download software. It wants you to work in the cloud. So let’s do that.
R is the programming language we will be using during this course. RStudio is an integrated development environment (IDE) for R. They are separate programs, but we will use R through RStudio. This is an easier and all-round better way of using R. You can use R by itself, but everything is harder. The console on the bottom left in RStudio is what R looks like by itself. You need to install both R and RStudio.
There are lots of ways of getting data into R. It’s one of the most annoying things about it as a beginner. During most of the sessions, I’m going to try to use built-in datasets. But for some things in the course, and for your own data, you are going to need to know how to get data into R. Most people will have their data initially in an Excel sheet (.xlsx). But a .csv (comma-separated values) file is a simpler and more useful format to keep data in. So the first thing you’ll usually have to do when you are doing a data analysis in the wild is to convert .xlsx to .csv. Full instructions here. BUT you don’t need to do this during this course, as any external data I give you will already be in .csv form.
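As a rough sketch of the most common route, base R's `read.csv` reads a .csv file into a data frame. The file contents, the column names and the object name `my.data` are made up for illustration; the example writes its own small file to a temporary location so that it runs on any computer.

```r
# Create a small example .csv file in a temporary location (illustration only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,length_mm", "1,10.5", "2,12.1", "3,9.8"), csv_path)

# Read the file into a data frame; the first row is used as the column names.
my.data <- read.csv(csv_path)

# Look at the first few rows and at the structure of what was read in.
head(my.data)
str(my.data)
```

For a real file, `csv_path` would be replaced by the path to the .csv file on your computer.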
In the console window, type some mathematical operations. I’ll get you started.
[1] 6
[1] -2.449294e-16
Try adding, subtracting and dividing. What does log do? Hint: not what you think.
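A few operations to try in the console, as a sketch. The numbers are arbitrary and chosen only for illustration; type each line and press Enter to see the result.

```r
7 + 5     # addition
7 - 5     # subtraction
7 * 5     # multiplication
7 / 5     # division
7^2       # raising to a power
sqrt(49)  # square root
```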
Blackboard MCQ 1: In your calculator section, R prints sin(2*pi) as -2.449294e-16. The best interpretation is:
An important skill you need is creating variables. At its simplest, I want x to be equal to 2.
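A minimal sketch of that assignment, using the assignment operator `<-`:

```r
# Create a variable called x and give it the value 2.
x <- 2

# Typing the name of a variable prints its value.
x
```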
What’s with <-? <- is called the assignment operator. Why don’t we use =? Short answer: that’s just the way R is. Long answer.
Blackboard MCQ 2: Why do we use <- for assignment in this course? Select the best reason.
A bit more complicated: make y equal to 0, 2, 4, 6, 8, 10.
You just used a function. It’s called seq. Want to find out about it? Use your Google-fu skills. More and more during the practicals, we won’t give you the answers. Instead, we’ll expect you to find them yourself. Why? Because that’s how everyone does analysis. Once you figure that out, you’ll realise that with the basics we are teaching you and the ability to look things up, no analysis or complicated data visualization is beyond your abilities.
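One way to build this is sketched below with the seq function. The argument values are an assumption about how the sequence is generated; other calls would give the same result.

```r
# Build the sequence 0, 2, 4, 6, 8, 10: start at 0, stop at 10, step by 2.
y <- seq(from = 0, to = 10, by = 2)

# Print y to check it.
y
```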
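Alongside a web search, R ships with its own documentation. As a sketch, these are two built-in ways of finding out about seq:

```r
# Open the help page for seq (typing ?seq in the console does the same thing).
help("seq")

# Show the arguments that the default version of seq accepts.
args(seq.default)
```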
Task: Calculate z which is the product of x and y. (Hint: In mathematics, a product is the result of multiplying)
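As a reminder of how arithmetic works on vectors, here is a sketch with made-up values (a and b are hypothetical names, not part of the task). Multiplying a single number by a vector multiplies every element in turn.

```r
# A single number and a vector of three numbers (illustrative values only).
a <- 3
b <- c(1, 2, 3)

# Each element of b is multiplied by a, giving a vector of the same length as b.
a * b
```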
Blackboard MCQ 3: z is a vector with six values. What is the highest value?
What we have been using so far is called base R. It’s the stuff that comes working out of the box with R. You can do an amazing amount with this. But the capabilities of R have been extended hugely over the years with packages. The passage below is from Hadley Wickham’s R Packages book:
In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. As of 2nd October 2025, there were 22,818 packages available on the Comprehensive R Archive Network, or CRAN, the public clearing house for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package.
So let’s use a package called skimr. The install.packages command only has to be used the first time you use a package (from then on it’s on your computer). After that, the library command turns it on.
install.packages("skimr") #You will only have to do this once on your computer.
library("skimr") #You will only have to do this once per session. It 'turns on' the package
skim(iris) #skim is a command in R. iris is a built-in dataset we are going to use.

| | |
|---|---|
| Name | iris |
| Number of rows | 150 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| factor | 1 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| Species | 0 | 1 | FALSE | 3 | set: 50, ver: 50, vir: 50 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Sepal.Length | 0 | 1 | 5.84 | 0.83 | 4.3 | 5.1 | 5.80 | 6.4 | 7.9 | ▆▇▇▅▂ |
| Sepal.Width | 0 | 1 | 3.06 | 0.44 | 2.0 | 2.8 | 3.00 | 3.3 | 4.4 | ▁▆▇▂▁ |
| Petal.Length | 0 | 1 | 3.76 | 1.77 | 1.0 | 1.6 | 4.35 | 5.1 | 6.9 | ▇▁▆▇▂ |
| Petal.Width | 0 | 1 | 1.20 | 0.76 | 0.1 | 0.3 | 1.30 | 1.8 | 2.5 | ▇▁▇▅▃ |
I haven’t run this package on all possible computers. If you are having problems with it, we could get similar information using base R with a command unsurprisingly called summary.
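A minimal sketch of that fallback, using the same built-in iris dataset as the skim example:

```r
# Summarise every column of the built-in iris dataset using base R only.
summary(iris)
```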
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
       Species
 setosa    :50
 versicolor:50
 virginica :50
Task: Perform the skim (or summary if there is a problem) command on your fish.data (the data you loaded from a csv file in Section 4)
Blackboard MCQ 4: What’s the mean value of Standard.Length? (Clue: the mean value for Sepal.Length was 5.84 from skim and 5.843 from summary.)
5.3 Comments
Comments are remarks in a program that are intended to help human readers understand what is going on, but are ignored by the computer. Comments in R start with a # character and run to the end of the line. Why do we use comments? To remind ourselves, or to tell our collaborators, what a line of code was meant to do. Remember the adage: we write comments for the next idiot to read the code, because it will probably be us.
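A short sketch of both styles of comment, reusing the x and y variables from earlier in the session (the seq call is one assumed way of building y):

```r
# This whole line is a comment, so R ignores it.
x <- 2  # A comment can also follow code on the same line.

# Comments are handy for recording what a line is for:
# y holds the even numbers from 0 to 10.
y <- seq(0, 10, by = 2)
```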