Getting started

1 Assessment

This session is assessed using MCQs (the questions are highlighted below). The actual MCQs can be found on the BS1040 Blackboard site under Assessments and Feedback/Data analysis MCQs. The deadline is listed here and on the front page of the BS1040 Blackboard site. This assessment contributes 2.5% of module marks. You will receive feedback on this assessment after the submission deadline.

2 Getting R and Rstudio onto your computer

This of course depends on what type of computer you have.

2.1 Mac

2.1.1 Install the latest version of R.

  • On a web browser open up https://cran.r-project.org/bin/macosx/
  • Click on the latest binary (R 3.6.0.pkg, when this was written). You can find it as the yellow icon on the left-hand side.
  • Follow the installation instructions
  • Possible issue: To run dfSummary you need an X11 library to do the graphics. This is a part of macOS that is sometimes already installed and sometimes not. When you get to dfSummary, try it; if an error comes up mentioning X11 or XQuartz, you will need to install XQuartz. Download it from here and follow the installation instructions.

2.1.2 Install the latest version of Rstudio.

2.2 Windows

2.2.1 Install the latest version of R.

  • On a web browser open up https://cran.r-project.org/bin/windows/base/
  • Click on the ‘Download R 3.6.0 for Windows’ link
  • R is distributed as an installer, R-3.6.0-win.exe. Just run it and follow the Windows-style installer.

2.2.2 Install the latest version of Rstudio.

2.3 Linux

  • R is part of many Linux distributions, so first check your Linux package management system.
  • If you don’t have it, have a look at https://cloud.r-project.org/ for your distribution.

2.3.1 Install the latest version of Rstudio.

2.4 Chromebook/Tablet

A Chromebook doesn’t really allow you to download software. It wants you to work on the cloud. So let’s do that.

  • On a web browser open https://rstudio.cloud/
  • Log in or sign up
  • Create a new project (Blue New Project button)
  • On the cloud you’ll first have to upload (bottom right pane) the fish data before you import it (see below).

2.5 University PCs (for when you want to work in the computer rooms)

2.5.1 Install the latest version of R.

  • On CFS machines, you can install R using the Software center (click on the Windows icon and look at the second column)
  • Choose R and click install
  • R should now be in your program list.

2.5.2 Install the latest version of Rstudio.

  • Same as above, but you might need to search for it in the Software center. It wasn’t on the front page when I looked.
  • Choose Rstudio and click install
  • Rstudio should now be in your program list. Open it.

3 Getting the data into R

There are lots of ways of getting data into R. It’s one of the most annoying things about it as a beginner. During most of the sessions, I’m going to try to use built-in datasets. But for some things in the course, and for your own data, you are going to need to know how to get data into R. Most people will have their data initially in an Excel sheet (.xlsx). But a .csv (comma-separated values) file is a simpler and more useful format to keep data in. So the first thing you’ll usually have to do when you are doing a data analysis in the wild is to convert .xlsx to .csv. Full instructions here. BUT you don’t need to do this during this course, as any external data I give you will already be in .csv form.

  1. Look at the top right-hand window in Rstudio. See the Import Dataset button. Use this to import the data as a text file (From Text (base) in newer versions). Make sure that the heading option is on.
  2. Notice that what you really did was displayed in the console.
  3. That means if you typed that into the console you would get the same effect (with your filepath, not mine).
  4. Have a look at the data; it should have 235 observations of 10 variables.
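
As a rough sketch of what the Import Dataset button does behind the scenes, here is the same idea typed by hand. Everything in it is made up for illustration (the toy file, its column names and the name toy.data); with the course data you would point read.csv at wherever your own fish .csv file lives instead.

```r
# Write a tiny stand-in .csv to a temporary file so this runs anywhere
# (hypothetical data, NOT the course fish data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,length", "a,10.5", "b,12.0"), csv_path)

# header = TRUE tells R that the first row holds the column names,
# which is what the heading option in the import dialog switches on
toy.data <- read.csv(csv_path, header = TRUE)

# Check what came in: number of observations (rows) and variables (columns)
dim(toy.data)
str(toy.data)
```

The same dim and str checks are a quick way to confirm that an imported dataset has the number of observations and variables you expect.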

4 Some R basics

4.1 Using R as a ridiculously overpowered calculator

In the console window, type some mathematical operations. I’ll get you started.
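
The commands that produced the two results shown below did not survive on this page. As an assumption, expressions like these give the same two outputs when typed into the console:

```r
# Simple multiplication; prints [1] 6
2 * 3

# The sine of 2*pi is mathematically 0, but floating-point rounding
# leaves a tiny non-zero value; prints [1] -2.449294e-16
sin(2 * pi)
```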

[1] 6
[1] -2.449294e-16

Try adding, subtracting and dividing. What does log do? Hint: not what you think.

4.2 Creating variables

An important skill you need is creating variables. At its simplest, I want x to be equal to 2.
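
The command itself is not shown here, so take this as a minimal sketch of the usual way to write it:

```r
# Store the value 2 in a variable called x
x <- 2

# Typing the name on its own prints the stored value
x
```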

What’s with <-? <- is called the assignment operator. Why don’t we use =? Short answer, that’s just the way R is. Long answer, if you’re interested.

A bit more complicated: make y equal to 0, 2, 4, 6, 8, 10.
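
Again the original command is missing from this page. One way to do it, assuming the seq function mentioned in the next paragraph was used with a step of 2, is:

```r
# Build the sequence from 0 to 10 in steps of 2 and store it in y
y <- seq(from = 0, to = 10, by = 2)

# Print y to check that it holds six values
y
```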

You just used a function. It’s called seq. Want to find out about it? Use your Google-fu skills. More and more during the practicals, we won’t give you the answers. Instead, we’ll expect you to find them yourself. Why? Because that’s how everyone does analysis. Once you figure that out, you’ll realise that with the basics we are teaching you and the ability to look things up, no analysis or complicated data visualization is beyond your abilities.

Task: Calculate z which is the product of x and y. (Hint: In mathematics, a product is the result of multiplying)

Blackboard MCQ: z is a vector with six values. What is the highest value?

4.4 R packages

What we have been using so far is called base R. It’s the stuff that comes working out of the box with R. You can do an amazing amount with this. But the capabilities of R have been extended hugely over the years with packages. The following is from Hadley Wickham’s R packages book:

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. As of January 2015, there were over 6,000 packages available on the Comprehensive R Archive Network, or CRAN, the public clearing house for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package.

  • You install them from CRAN with install.packages("x").
  • You use them in R with library("x").
  • You get help on them with package?x and help(package = "x").

So let’s use a package called summarytools. The install.packages command only has to be used the first time you use a package (from then on it’s on your computer). After that, the library command turns it on.
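
The summary shown below is a dfSummary of the built-in iris dataset. The commands that produced it are not shown on this page; a sketch of what they probably looked like follows. The requireNamespace check is my own addition, so that the install step is skipped once the package is already on your computer.

```r
# One-off install from CRAN; skipped if summarytools is already installed
if (!requireNamespace("summarytools", quietly = TRUE)) {
  install.packages("summarytools")
}

# Turn the package on for this session
library(summarytools)

# Summarise every column of the built-in iris data frame
dfSummary(iris)
```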

Data Frame Summary  
iris  
Dimensions: 150 x 5  
Duplicates: 1  

----------------------------------------------------------------------------------------------------------------------
No   Variable        Stats / Values           Freqs (% of Valid)   Graph                            Valid    Missing  
---- --------------- ------------------------ -------------------- -------------------------------- -------- ---------
1    Sepal.Length    Mean (sd) : 5.8 (0.8)    35 distinct values     . . : :                        150      0        
     [numeric]       min < med < max:                                : : : :                        (100%)   (0%)     
                     4.3 < 5.8 < 7.9                                 : : : : :                                        
                     IQR (CV) : 1.3 (0.1)                            : : : : :                                        
                                                                   : : : : : : : :                                    

2    Sepal.Width     Mean (sd) : 3.1 (0.4)    23 distinct values           :                        150      0        
     [numeric]       min < med < max:                                      :                        (100%)   (0%)     
                     2 < 3 < 4.4                                         . :                                          
                     IQR (CV) : 0.5 (0.1)                              : : : :                                        
                                                                   . . : : : : : :                                    

3    Petal.Length    Mean (sd) : 3.8 (1.8)    43 distinct values   :                                150      0        
     [numeric]       min < med < max:                              :         . :                    (100%)   (0%)     
                     1 < 4.3 < 6.9                                 :         : : .                                    
                     IQR (CV) : 3.5 (0.5)                          : :       : : : .                                  
                                                                   : :   . : : : : : .                                

4    Petal.Width     Mean (sd) : 1.2 (0.8)    22 distinct values   :                                150      0        
     [numeric]       min < med < max:                              :                                (100%)   (0%)     
                     0.1 < 1.3 < 2.5                               :       . .   :                                    
                     IQR (CV) : 1.5 (0.6)                          :       : :   :   .                                
                                                                   : :   : : : . : : :                                

5    Species         1. setosa                50 (33.3%)           IIIIII                           150      0        
     [factor]        2. versicolor            50 (33.3%)           IIIIII                           (100%)   (0%)     
                     3. virginica             50 (33.3%)           IIIIII                                             
----------------------------------------------------------------------------------------------------------------------

Task: Perform the above summary command on your fish.data

Blackboard MCQ: What’s the mean value of Standard.Length? (Clue: the mean value for Sepal.Length was 5.8.)

4.5 Getting your session info

Task: Finally, as a test that you have R up and running on your computer, I’d like you to type the following command
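
The command is missing from this page. Going by the section title, it was most likely base R’s sessionInfo function (an assumption on my part):

```r
# Print details of the current session: the first line of the output
# gives the R version, followed by the platform and loaded packages
sessionInfo()
```

If you only want the version line, R.version.string holds the same text.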

Blackboard MCQ: What version of R do you have? For example, if you got R version 1.4.3 (2010-04-26), select 1 as the answer.

Eamonn Mallon

2020-01-17