---
title: "Data Visualisation 2018 -- Introduction to R"
output:
  html_notebook: default
  pdf_document: default
---
## Downloading R and RStudio (for homework)

* In this course we will use the statistical package __R__ to complement the material we cover in lectures.

* All of the practicals will be done using __RStudio__, which is an add-on environment for using the __R__ statistical package.

* The __R__ package can be downloaded from 
[R Download](http://ftp.heanet.ie/mirrors/cran.r-project.org/).
This must be installed before you can use __RStudio__.

* The __RStudio__ interface can be downloaded from
[RStudio Download](https://www.rstudio.com/products/rstudio/download/).
Choose the free option when you download, as this has all the features we will require in this module.


# The R Notebook Environment

## Viewing the notebook

* This is an R Markdown Notebook. This means everything we write in this white space is written in the __Markdown language__.

* To see how this notebook appears when we post it to a web browser, you can click the __Preview__ tab above, and this will render the contents of this notebook in HTML.

* This workbook can also be viewed in a web browser by clicking the __Open in Browser__ tab which appears in the pop-up window.



## Creating and running __R__ code

To input __R__ code into a workbook we need to create a code-chunk as shown below.

```{r}
# Code chunk (# creates a comment in the code chunk)
```

* You can type the characters at start and end of the code-chunk shown above to create one directly.

* Alternatively, you can use the shortcut __Ctrl+Alt+I__ to insert a code-chunk.

* Otherwise, you can insert a code-chunk from the menu by clicking the __Code__ tab above and selecting the __Insert Chunk__ option from that menu.

To __execute__ a code-chunk, click the __Run button__ (green triangle) within the chunk.

* Alternatively, placing your cursor inside the chunk and pressing __Ctrl+Shift+Enter__ will also execute the code.

### Example:

* Run the code chunk below. Here _cars_ is a data set prepackaged with __R__, and __plot__ is a command to plot the data in the data set.

```{r}
plot(cars)
```
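
Before plotting, it can help to look at what the data set contains. The chunk below is a minimal sketch of that idea. The axis labels are our own wording, based on the description of the data set in its help page (__?cars__), and are not part of the original example.

```{r}
# Show the first six rows of the built-in cars data set
head(cars)

# List the column names: speed and dist
names(cars)

# Same scatter plot as plot(cars), with the columns and axis labels written out
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)")
```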



# Creating and Importing Data Files

* In many of the practicals we will obtain data files from various sources and use __R__ to perform various tasks with this data.

* We will also create our own data sets and import them into __R__.

* In most cases, the data files we create will be of __.csv__ type (__csv__ = __comma separated values__).

* These files are easily created in spreadsheet packages such as __Excel__ or __LibreOffice__.

* To __import__ these files we use the embedded commands __read.csv(file.choose())__. Here __file.choose()__ opens a file dialog and returns the path of the file you select, and __read.csv()__ reads the file at that path.

### Example:

The cars in a car-park were counted by make, and the following data was collected:

|  Make  |  Number |
|--------|---------|
| Audi   |    3    |
| BMW    |    2    |
| Citroen|    5    |
| Ford   |    8    |
| Hyundai|    9    |
| Opel   |    6    |
| Toyota |    8    |
| VW     |    6    |

1. Create a file to represent this data in __Excel__ and save this file with the extension __.csv__ 

2. Import this data file  as __Data1__ into __R__ using the embedded commands __read.csv(file.choose())__


```{r}
# Choose the .csv file in the dialog that opens; its contents are stored in Data1
Data1 <- read.csv(file.choose())
```


* In the code chunk above I have imported a data file I created myself, and saved the contents of this file as a data structure I call __Data1__.


* To display this data structure, simply run __Data1__ inside a code chunk as follows:
```{r}
Data1
```

* The column titles __Make__ and __Number__ are the names I chose in the data file I created.

* To select the data from one of these columns we simply call __Data1$ColumnName__. For example, to display all the car makes in the data file we run
```{r}
Data1$Make
```

3. Display the number of cars of each type, using the data structure you created.
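
The import step can also be tried without Excel or a file dialog. The sketch below is one possible approach, not the method used in this practical: it writes the car-park table to a temporary __.csv__ file and reads it back with __read.csv()__. The names __csv_lines__, __csv_path__ and __CarPark__ are our own and purely illustrative.

```{r}
# The text of a .csv file: one header line, then one line per row
csv_lines <- c("Make,Number",
               "Audi,3", "BMW,2", "Citroen,5", "Ford,8",
               "Hyundai,9", "Opel,6", "Toyota,8", "VW,6")

# Write the lines to a temporary file (a stand-in for a file saved from Excel)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# read.csv() accepts a file path directly;
# file.choose() only supplies that path interactively
CarPark <- read.csv(csv_path)

# Check the column names and the number of rows
names(CarPark)
nrow(CarPark)
```

Opening a saved __.csv__ file in a plain text editor should show the same layout as __csv_lines__: values separated by commas, one row per line.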


### Exercise 1

The closing prices of shares in __Glanbia PLC__ on the Irish Stock Exchange, from 28th August -- 5th September 2018, are shown in the table below.


|   Date   | Price |
|----------|-------|
| 5/9/2018 | 14.80 |
| 4/9/2018 | 14.83 |
| 3/9/2018 | 14.71 |
| 31/8/2018| 14.53 |
| 30/8/2018| 14.53 |
| 29/8/2018| 14.50 |
| 28/8/2018| 14.53 |

1. Create a __.csv__ file for this data.

2. Import the data from this file into __R__.

3. Display the data from the individual columns of this data structure.


# R Functions on Data Sets

* The main aim of this course is to find the best way to represent the information in a data set. For that reason, there will be a certain amount of mathematical work involved in this course, which we will cover using simple examples and simple data sets during lectures.

* While the mathematics involved is not particularly difficult, real-world data sets tend to be large, and the calculations then become cumbersome, tedious, and practically impossible to complete by hand.

* The __R__ statistical package can do a huge amount of this work for us for these larger data sets, and during the practicals we will try to implement some of the material we learnt during lectures.


* Some of the mathematical functions we will be applying to data sets, among others, will be
  
    * __Mean__ (i.e.  __Average__)
    * __Median__
    * __Standard Deviation__ 
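
To connect these functions with the formulas from lectures, here is a small sketch that computes the mean and standard deviation by hand and compares them with __mean()__ and __sd()__. The vector __x__ is made up for illustration. Note that __sd()__ divides by $n-1$ (the sample standard deviation); we assume this matches the definition used in lectures.

```{r}
# A small made-up data set (illustrative only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Mean: sum of the values divided by how many there are
mean_by_hand <- sum(x) / n

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1
sd_by_hand <- sqrt(sum((x - mean_by_hand)^2) / (n - 1))

# Both comparisons should return TRUE
all.equal(mean_by_hand, mean(x))
all.equal(sd_by_hand, sd(x))
```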

### Example:

* Create a data structure to represent the data set
\[S=\{1,5,-32,1,1,4,33,6,-6,10,12,-15,22,3,3,-4,18,-19,2,-2,2,1\}\]

* Using this data structure answer the following:

  1. Find the __mean__ of $S$
  
  2. Find the __median__ of $S$
  
  3. Find the __standard deviation__ of $S^2$, where $S^2$ means each element of $S$ should be squared individually.
  
  4. Find the __mean__ of $S^2+3S$
  
  5. Find the __median__ of $4S^2+S+4$

To begin we create a data structure which we will name __S__ as follows:
```{r}
S<-c(1,5,-32,1,1,4,33,6,-6,10,12,-15,22,3,3,-4,18,-19,2,-2,2,1)
S
```
* Typing __S__ on a line of its own makes __R__ display the data in __S__.

__Note:__ It is crucial that a __c__ is put before the parentheses of the data set. The function __c()__ combines the values into a single data structure; without it, __R__ will not interpret this as a data set. 
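
A quick way to confirm that __c()__ has built the data set correctly is to check its length and type. This is a small sketch; the checks shown are our suggestion rather than a required step.

```{r}
# c() combines its arguments into a single vector
S <- c(1,5,-32,1,1,4,33,6,-6,10,12,-15,22,3,3,-4,18,-19,2,-2,2,1)

# Number of values stored in S
length(S)

# Confirm that S holds numbers
is.numeric(S)

# Individual values are selected by position, e.g. the third value
S[3]
```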

1. The mean is found using the function __mean()__

```{r}
mean(S)
```
  * So the average value of all the numbers in $S$ is 2.090909.
  
2. The median is found with __median()__

```{r}
median(S)
```
* So the median value of all the numbers in $S$ is 2. This means at least __half the data__ in $S$ is less than or equal to 2 and at least half is greater than or equal to 2.
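
The median can be checked by sorting the data and looking at the middle. As a sketch: $S$ has 22 values, an even number, so the median is the average of the 11th and 12th sorted values. The names __sorted_S__ and __middle__ are our own.

```{r}
S <- c(1,5,-32,1,1,4,33,6,-6,10,12,-15,22,3,3,-4,18,-19,2,-2,2,1)

# Sort the values from smallest to largest
sorted_S <- sort(S)

# With 22 values, the two middle positions are 11 and 12
middle <- sorted_S[c(11, 12)]
middle

# Their average agrees with median(S)
mean(middle) == median(S)
```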

3. To find $S^2$ we simply write
```{r}
S^2
```
* We see that each individual value in $S$ has been squared.

### Exercise 2

The closing price of shares in 

* The standard deviation is found using the function __sd()__
```{r}
sd(S^2)
```

4. The values of $S^2+3S$ are given by
```{r}
S^2+3*S
```
__Note:__ Do not forget to use * when you want to apply multiplication. __R__ does not know how to interpret 3S and reports a syntax error, which is why the line below is left as a comment:
```{r}
# 3S
```
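
Arithmetic on a data set is applied to every element in turn. The sketch below shows this on a short made-up vector __v__ (our own name), so each result can be checked by eye.

```{r}
# A short made-up vector (illustrative only)
v <- c(1, 2, 3)

# Squaring acts on each element
v^2

# Multiplication must be written explicitly with *
3 * v

# Expressions combine element by element: 1+3, 4+6, 9+9
v^2 + 3 * v
```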
* The mean of $S^2+3S$ is 
```{r}
mean(S^2+3*S)
```


* The median of $4S^2+S+4$ is
```{r}
median(4*S^2+S+4)
```

### Exercise 3

The historical data of shares in __Amazon.com, Inc.__ on the NASDAQ Stock Exchange, from 6th May 2018 -- 6th September 2018, is shown in the table below. The data for this table is available at [Amazon.com NASDAQ Price](https://www.nasdaq.com/symbol/amzn/historical), and can be downloaded as a __.csv__ file from __Moodle->Data Visualisation->Data Files->Quotes(AMZN).csv__.

```{r}
AMZN<-read.csv(file.choose())
AMZN
```

* Extract the __close__, __open__, __high__ and __low__ data columns from this data structure (see Exercise 1) and answer the following:

1. Find the mean closing price of the shares over this period, i.e. find the mean of __close__

2. Find the median difference between the opening and closing values, i.e. the median of ( __close__ - __open__ )

3. Find the standard deviation of __high__ 

4. Find the mean, median and standard deviation of ( __high__ - __low__ )
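
The same functions work on the difference of two columns. Below is a sketch on a small made-up data frame: the numbers are not real share prices, and the names __quotes__ and __change__ are ours. The column names in the downloaded file may be spelled or capitalised differently, so it is worth running __names(AMZN)__ first.

```{r}
# Made-up prices for three days (illustrative only, not real AMZN data)
quotes <- data.frame(open  = c(10.0, 11.0, 12.0),
                     close = c(11.0, 10.5, 13.0),
                     high  = c(11.5, 11.2, 13.4),
                     low   = c( 9.8, 10.3, 11.9))

# Subtracting two columns gives one difference per row
change <- quotes$close - quotes$open
change

# Summary functions apply to the differences like any other data set
mean(change)
median(change)
sd(quotes$high - quotes$low)
```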

### Exercise 4

The historical data of shares in __Red Hat, Inc.__ on the NASDAQ Stock Exchange is available at [Red Hat NASDAQ Price](https://www.nasdaq.com/symbol/rht/historical). Download the historical data for the past __18 months__ as a __.csv__ file from this website. Import this data into __R__ as __RHT__ and repeat the steps of Exercise 3 for this data structure.
