Downloading R and RStudio (for homework)
In this course we will use the statistical package R to complement the material we cover in lectures.
All of these practicals will be done using RStudio, which is an add-on environment for working with the R statistical package.
The R package can be downloaded from R Download. It must be installed before you can use RStudio.
The RStudio interface can be downloaded from RStudio Download. Choose the free option when you download, as this has all the features we will require in this module.
The R Notebook Environment
Viewing the notebook
This is an R Markdown Notebook. This means everything we write in this white space is written in the Markdown language.
To see how this notebook appears when we post it to a web browser, you can click the Preview tab above, and this will render the contents of this notebook in HTML.
This workbook can also be viewed in a web browser by clicking the Open in Browser tab which appears in the pop-up window.
Creating and running R code
To input R code into a workbook we need to create a code-chunk as shown below.
```{r}
# Code chunk (# creates a comment in the code chunk)
```
You can type the characters at the start and end of the code-chunk shown above to create one directly.
Alternatively, you can use the shortcut Ctrl+Alt+I to insert a code-chunk.
Otherwise, you can insert a code-chunk from the menu by clicking the Code tab above and selecting the Insert Chunk option from that menu.
To execute a code-chunk, click the Run button (green triangle) within the chunk.
- Alternatively, placing your cursor inside the chunk and pressing Ctrl+Shift+Enter will also execute the code.
Example:
- Run the code chunk below. cars is a data set prepackaged with R, and plot is a command that plots the data in that data set.
```{r}
plot(cars)
```
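Before plotting, it can help to look at what the data set contains. The sketch below uses only base R functions on the built-in cars data set; the axis labels are wording chosen here, based on the units given in R's help page for cars.

```r
# cars is a built-in data frame with two columns: speed and dist
head(cars)      # first six rows
str(cars)       # column names and types
nrow(cars)      # number of observations

# Plot stopping distance against speed with labelled axes
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```

Calling plot(cars) on the whole data frame gives the same picture, with the column names used as axis labels.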
Creating and Importing Data Files
In many of the practicals we will obtain data files from various sources and use R to perform various tasks with this data.
We will also create our own data sets and import them into R.
In most cases, the data files we create will be of .csv type (csv = comma-separated values).
These files are easily created in spreadsheet packages like Excel, LibreOffice etc.
To import these files we use the embedded commands read.csv(file.choose()): file.choose() opens a window in which you select the file, and read.csv() reads the selected file into R.
Example:
The cars in a car-park were counted by make, with the following data collected:
| Make | Number |
|---|---|
| Audi | 3 |
| BMW | 2 |
| Citroen | 5 |
| Ford | 8 |
| Hyundai | 9 |
| Opel | 6 |
| Toyota | 8 |
| VW | 6 |
Create a file to represent this data in Excel and save this file with the extension .csv.
Import this data file as Data1 into R using the embedded commands read.csv(file.choose()).
```{r}
Data1 <- read.csv(file.choose())
```
In the code chunk above I have imported a data file I created myself, and saved the contents of this file as a data structure I call Data1.
To display this data structure, simply run Data1 inside a code chunk as follows:
```{r}
Data1
```
The column titles Make and Number are the names I chose in the data file I created.
To select the data from one of these columns we simply call Data1$ColumnName. For example, to display all the car makes in the data file we run:
```{r}
Data1$Make
```
- Display the number of cars of each type, using the data structure you created.
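If you would rather not use Excel, the same file can be produced from R itself. The following is a minimal sketch, assuming the column titles Make and Number used above; the name cars_counted and the temporary file path are introduced here for illustration, and passing a path to read.csv() takes the place of the file.choose() window.

```r
# Build the car-park data as a data frame (column names as in the text)
cars_counted <- data.frame(
  Make   = c("Audi", "BMW", "Citroen", "Ford", "Hyundai", "Opel", "Toyota", "VW"),
  Number = c(3, 2, 5, 8, 9, 6, 8, 6)
)

# Write it to a .csv file; tempfile() is used here only for illustration
csv_path <- tempfile(fileext = ".csv")
write.csv(cars_counted, csv_path, row.names = FALSE)

# Read the file back in, giving the path directly instead of file.choose()
Data1 <- read.csv(csv_path)
Data1$Make
Data1$Number
sum(Data1$Number)   # total number of cars counted
```

Giving the path explicitly is convenient when a notebook has to be re-run several times, since no window needs to be clicked through each time.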
Exercise 1
The closing prices of shares in Glanbia PLC on the Irish Stock Exchange, from 28th August to 5th September 2018, are shown in the table below.
| Date | Closing price |
|---|---|
| 5/9/2018 | 14.80 |
| 4/9/2018 | 14.83 |
| 3/9/2018 | 14.71 |
| 31/8/2018 | 14.53 |
| 30/8/2018 | 14.53 |
| 29/8/2018 | 14.50 |
| 28/8/2018 | 14.53 |
Create a .csv file for this data.
Import the data from this file into R.
Display the data from the individual columns of this data structure.
R Functions on Data Sets
The main aim of this course is to find the best way to represent the information in a data set. For that reason, there will be a certain amount of mathematical work involved in this course, which we will cover using simple examples and simple data sets during lectures.
While the mathematics involved is not particularly difficult, real-world data sets tend to be large, and the calculations then become cumbersome, tedious and practically impossible to complete by hand.
The R statistical package can do a huge amount of this work for us for these larger data sets, and during the practicals we will try to implement some of the material we learnt during lectures.
Some of the mathematical functions we will be applying to data sets include:
- Mean (i.e. Average)
- Median
- Standard Deviation
Example:
Create a data structure to represent the data set \[S=\{1,5,-32,1,1,4,33,6,-6,10,12,-15,22,3,3,-4,18,-19,2,-2,2,1\}\]
Using this data structure answer the following:
Find the mean of \(S\)
Find the median of \(S\)
Find the standard deviation of \(S^2\), where \(S^2\) means each element of \(S\) should be squared individually.
Find the mean of \(S^2+3S\)
Find the median of \(4S^2+S+4\)
To begin we create a data structure which we will name S as follows:
```{r}
S <- c(1,5,-32,1,1,4,33,6,-6,10,12,-15,22,3,3,-4,18,-19,2,-2,2,1)
S
```
- Typing S again after the assignment makes R display the data in S.
Note: It is crucial that a c is put before the parentheses of the data set. The function c() combines the values into a single data structure; without it, R will not interpret this as a data set.
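The role of c() is easiest to see on a very short data set. A small sketch, using a made-up vector x that is not part of the example:

```r
# c() combines individual values into a single vector
x <- c(4, 8, 15)

length(x)   # number of values stored in x
x[2]        # the second value

# Without c(), R cannot read the comma-separated list and reports an error,
# so the line below is left as a comment:
# x <- (4, 8, 15)
```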
- The mean is found using the function mean()
```{r}
mean(S)
```
- So the average value of all the numbers in \(S\) is 2.090909.
- The median is found with median()
```{r}
median(S)
```
- So the median value of all the numbers in \(S\) is 2. This means that at least half the values in \(S\) are less than or equal to 2 and at least half are greater than or equal to 2.
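To see where this value comes from, the median can be reproduced by sorting the data and looking at the middle. The sketch below repeats the definition of S so that it runs on its own; the names sorted_S, n and middle are introduced here for illustration. Because S has an even number of values, the median is the average of the two middle values.

```r
S <- c(1,5,-32,1,1,4,33,6,-6,10,12,-15,22,3,3,-4,18,-19,2,-2,2,1)

sorted_S <- sort(S)     # arrange the values from smallest to largest
n <- length(S)          # how many values there are

# The two middle values are in positions n/2 and n/2 + 1
middle <- sorted_S[c(n / 2, n / 2 + 1)]
middle
mean(middle)            # agrees with median(S)
```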
- To find \(S^2\) we simply write
```{r}
S^2
```
- We see that each individual value in \(S\) has been squared.
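Arithmetic on a data structure like this is applied to every value in turn, which is easiest to check by hand on a short one. A small sketch, using a made-up vector x that is not part of the example:

```r
x <- c(1, 2, 3)

x^2          # each value squared: 1 4 9
3 * x        # each value multiplied by 3: 3 6 9
x^2 + 3 * x  # the two results added position by position: 4 10 18
```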
Exercise 2
The closing price of shares in
- The standard deviation is found using the function sd()
```{r}
sd(S^2)
```
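As a check on what sd() does, the same number can be computed step by step. This is a sketch assuming the sample standard deviation (dividing by n - 1), which is the version R's sd() computes; the names x, n and by_hand are introduced here, and S is redefined so the block runs on its own.

```r
S <- c(1,5,-32,1,1,4,33,6,-6,10,12,-15,22,3,3,-4,18,-19,2,-2,2,1)
x <- S^2
n <- length(x)

# Sum of squared distances from the mean, divided by n - 1, then square-rooted
by_hand <- sqrt(sum((x - mean(x))^2) / (n - 1))

by_hand
all.equal(by_hand, sd(x))   # TRUE when the two calculations agree
```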
- The values of \(S^2+3S\) are given by
```{r}
S^2 + 3*S
```
Note: Do not forget to use * when you want to multiply. R does not know how to interpret 3S, for example, and reports an error, which is why the line below is written as a comment.
```{r}
# 3S
```
- The mean of \(S^2+3S\) is
```{r}
mean(S^2 + 3*S)
```
- The median of \(4S^2+S+4\) is
```{r}
median(4*S^2 + S + 4)
```
Exercise 3
The historical data of shares in Amazon.com, Inc. on the NASDAQ Stock Exchange, from 6th May 2018 to 6th September 2018, is shown in the table below. The data for this table is available at Amazon.com NASDAQ Price, and can be downloaded as a .csv file from Moodle->Data Visualisation->Data Files->Quotes(AMZN).csv.
```{r}
AMZN <- read.csv(file.choose())
AMZN
```
- Extract the close, open, high and low data columns from this data structure (see Exercise 1) and answer the following:
Find the mean closing price of the shares over this period, i.e. find the mean of close
Find the median difference between the opening and closing values, i.e. the median of ( close - open )
Find the standard deviation of high
Find the mean, median and standard deviation of ( high - low )
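The same pattern applies to any data structure with these columns. The sketch below uses a tiny made-up data frame named prices with the column names close, open, high and low mentioned above. The numbers are invented for illustration and are not Amazon prices, and the column titles in your downloaded file may be spelled or capitalised differently, so check them with names(AMZN) first.

```r
# Invented values for illustration only (not real share prices)
prices <- data.frame(
  open  = c(10.0, 10.5, 11.0, 10.8),
  high  = c(10.8, 11.2, 11.4, 11.0),
  low   = c( 9.9, 10.4, 10.7, 10.2),
  close = c(10.5, 11.0, 10.8, 10.4)
)

names(prices)                            # check the column titles first

mean(prices$close)                       # mean of one column

change <- prices$close - prices$open     # difference of two columns, row by row
median(change)

daily_range <- prices$high - prices$low  # another row-by-row difference
mean(daily_range)
median(daily_range)
sd(daily_range)
```

Storing a difference such as change under its own name keeps each later line short and makes it easy to reuse in several calculations.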
Exercise 4
The historical data of shares in Red Hat, Inc. on the NASDAQ Stock Exchange is available at Red Hat NASDAQ Price. Download the historical data for the past 18 months as a .csv file from this website. Import this data into R as RHT and repeat the steps of Exercise 3 for this data structure.
