This R Markdown document contains exercises to accompany the course “Data analysis and visualization using R”.
This is part 1 of the exercises.

This document contains the exercises themselves plus (in most cases) an R code chunk to complete, correct or create.

Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
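As a minimal sketch of what such a plotting chunk might contain: the choice of the built-in pressure dataset and the labels are assumptions made for illustration. In the .Rmd source the chunk header would carry echo = FALSE, so only the figure is rendered.

```r
# Sketch only: in the .Rmd source the chunk header would read {r, echo = FALSE}
# 'pressure' is a built-in dataset (temperature vs. vapor pressure of mercury)
plot(pressure,
     main = "Vapor pressure of mercury",
     xlab = "Temperature (deg C)",
     ylab = "Pressure (mm Hg)")
```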

Before knitting this document, check whether the devtools package is installed by typing library(devtools) in the console. If this fails, install it by typing install.packages("devtools").
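The same check can be done without triggering an error. The sketch below uses requireNamespace(), which returns TRUE or FALSE; the variable name has_devtools is only illustrative, and nothing is installed automatically.

```r
# requireNamespace() returns TRUE/FALSE instead of raising an error
has_devtools <- requireNamespace("devtools", quietly = TRUE)

if (!has_devtools) {
  # Only report the problem; installing is left to the user
  message('devtools is not installed; run install.packages("devtools") in the console')
}
```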

Section 1. R as calculator

In this section you will need to use R as a calculator to get the requested output. The first question is already solved in part so you can see what is expected of you.
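For orientation, here is a short sketch of the arithmetic operators and rounding functions that base R offers. The numbers are arbitrary and chosen for illustration; they are not taken from the exercises.

```r
17 / 4               # ordinary division: 4.25
17 %/% 4             # integer division: 4
17 %% 4              # remainder (modulo): 1
floor(4.25)          # round down: 4
ceiling(4.25)        # round up: 5
round(3.14159, 2)    # round to 2 decimal places: 3.14
```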

Exercise 1.1. Simple number operations

Calculate the following

(a) 42 times 3

42 * 3

## [1] 126

(b) 20 divided by 7

## your code here

(c) the remainder of 20 divided by 7

## your code here

(d) 13 divided by 3, rounded down

## your code here

(e) 13 divided by 3, rounded up

## your code here

(f) 5 divided by 2, rounded

## your code here

(g) 7 divided by 2, rounded

## your code here

(h) can you explain the two results of questions f and g?
<YOUR ANSWER HERE>

Exercise 1.2. Simple vector operations

Calculate/carry out the following

(a) create a vector with the numbers 2, 4, 7, 1, 5 and assign it to a variable called myNumbers

##your code here

(b) create a new vector, squared, that contains the numbers from myNumbers, but squared

##your code here

(c) create a new logical vector, isEven, that holds a Boolean value for each number in myNumbers, indicating whether it is an even number or not

##your code here

(d) create a vector with 9 empty strings and (bonus) put the string “hello, world” in the 5th position

##your code here

(e) create a factor with the values “blue”, “red”, “blue”, “green”, “blue”, “red”. Plot the frequencies using plot()

##your code here

(f) create a vector with the numbers 6 to 10 in it

##your code here

(g) create a vector with the values 1 2 3 1 2 3 1 2 3 without exactly typing this sequence

##your code here

(h) create a vector with the values 1 1 1 2 2 2 4 4 5 5 6 6 4 4 5 5 6 6, but without exactly typing this sequence

##your code here

(i) create a vector with the sequence of numbers 3.5, 3.7, 3.9, … 4.7 in it, but without exactly typing this sequence

##your code here

Section 2. Basic plotting

Calculate/carry out the following. With all plots, take care to adhere to the rules regarding titles and other decorations. Tip: the site Quick-R has nice detailed information with examples on the different plot types and their configuration. Especially the section on plotting is helpful for these assignments.
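As a rough sketch of a decorated base R plot: the data are hypothetical, and it is assumed here that "decorations" means at least a title and axis labels with units. The exact rules are those given in the course.

```r
# Hypothetical data, for illustration only
temperature <- c(10, 15, 20, 25, 30)       # degrees Celsius
growth_rate <- c(1.2, 1.9, 2.8, 3.1, 2.6)  # arbitrary units

plot(temperature, growth_rate,
     type = "b",                                 # points connected by lines
     main = "Growth rate versus temperature",    # plot title
     xlab = "Temperature (degrees Celsius)",     # x-axis label with unit
     ylab = "Growth rate (arbitrary units)",     # y-axis label with unit
     col = "darkgreen")
```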

Exercise 2.1.

The vectors below hold data for a staircase walking experiment. A subject of normal weight and height was asked to ascend a (long) staircase while wearing a heart-rate monitor. The subject’s heart rate was recorded at different stair heights. Create a line (!) plot showing the relationship between heart rate and stair height.

#number of steps on the stairs
stair_height <- c(0, 5, 10, 15, 20, 25, 30, 35)
#heart rate after ascending the stairs
heart_rate <- c(66, 65, 67, 69, 73, 79, 86, 97)
##your code here creating the plot

Exercise 2.2.

The experiment from the previous question was extended with three more subjects. One of these subjects was also of normal weight, while two of the subjects were obese. The data are given below. Create a single scatter plot with connector lines between the points showing the data for all four subjects. Give the normal-weight subjects a green line/marker and the obese subjects a red line/marker. You can add new data series to a plot by using the points(x, y) function. Use the ylim argument of plot() to adjust the Y-axis range.
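In base R graphics, ylim is passed to plot() as an argument of the form c(lower, upper). One way to pick limits that fit every series is range(). The sketch below uses hypothetical data and illustrative names (x, y1, y2, y_limits).

```r
# Hypothetical measurements, for illustration only
x  <- 1:4
y1 <- c(2, 4, 3, 5)
y2 <- c(1, 6, 8, 9)

# range() over all series gives limits that fit every point
y_limits <- range(y1, y2)

plot(x, y1, type = "b", col = "green", ylim = y_limits,
     xlab = "x", ylab = "y", main = "Two series on shared axes")
points(x, y2, type = "b", col = "red")   # add the second series
```

Without ylim, the axis range is taken from the first series only, so later series can fall outside the plotting area.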

#number of steps on the stairs
stair_height <- c(0, 5, 10, 15, 20, 25, 30, 35)
#heart rates for subjects with normal weight
heart_rate_1 <- c(66, 65, 67, 69, 73, 79, 86, 97)
heart_rate_2 <- c(61, 61, 63, 68, 74, 81, 89, 104)
#heart rates for obese subjects
heart_rate_3 <- c(58, 60, 67, 71, 78, 89, 104, 121)
heart_rate_4 <- c(69, 73, 77, 83, 88, 96, 102, 127)

##your code here creating the plot

Exercise 2.3.

The body weights of chicks were measured at birth and every second day thereafter until day 20. They were also measured on day 21. There were four groups of chicks on different protein diets. Here are the data for the first four chicks. Chicks one and two were on diet 1 and chicks three and four on diet 2. Create a single line plot showing the data for all four chicks. Give each chick its own color.

# chick weight data
time <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21)
chick_1 <- c(42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205)
chick_2 <- c(40, 49, 58, 72, 84, 103, 122, 138, 162, 187, 209, 215)
chick_3 <- c(42, 53, 62, 73, 85, 102, 123, 138, 170, 204, 235, 256)
chick_4 <- c(41, 49, 61, 74, 98, 109, 128, 154, 192, 232, 280, 290)

##your code here creating the plot

Exercise 2.4.

With the data from the previous question, create a barplot of the maximum weights of the chicks.

##your code here

Exercise 2.5.

The R language comes with a wealth of datasets for you to use as practice materials. We will see many of these. One of them is the time-series dataset discoveries, which holds the numbers of “great” inventions and scientific discoveries in each year from 1860 to 1959. Create plot(s) answering these three questions:

(a) What is the frequency distribution of numbers of discoveries per year?

(b) What is the 5-number summary of discoveries per year?

(c) What is the trend over time for the numbers of discoveries per year?

PS: strictly speaking this is not a simple vector but a vector with some time-related attributes, called a time series (class ts). This does not really matter for this assignment.
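The time-related attributes can be inspected with a few base R functions, as in the sketch below. The variable names years and counts are illustrative.

```r
class(discoveries)        # "ts"
start(discoveries)        # 1860 1
end(discoveries)          # 1959 1
frequency(discoveries)    # 1 observation per year

# Dropping the time-series attributes leaves ordinary numeric vectors
years  <- as.numeric(time(discoveries))   # 1860, 1861, ..., 1959
counts <- as.numeric(discoveries)         # discoveries per year
length(counts)            # 100
```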

#load datasets, if not already loaded
library(datasets)
#look at the discoveries dataset
discoveries

## Time Series:
## Start = 1860 
## End = 1959 
## Frequency = 1 
##   [1]  5  3  0  2  0  3  2  3  6  1  2  1  2  1  3  3  3  5  2  4  4  0  2
##  [24]  3  7 12  3 10  9  2  3  7  7  2  3  3  6  2  4  3  5  2  2  4  0  4
##  [47]  2  5  2  3  3  6  5  8  3  6  6  0  5  2  2  2  6  3  4  4  2  2  4
##  [70]  7  5  3  3  0  2  2  2  1  3  4  2  2  1  1  1  2  1  4  4  3  2  1
##  [93]  4  1  1  1  0  0  2  0

##your code here

Exercise 2.6.

The R datasets package has three related time-series datasets on deaths from lung diseases (bronchitis, emphysema and asthma) in the UK. These are ldeaths, mdeaths and fdeaths for total, male and female deaths, respectively. Create a line plot showing the monthly mortality for all three of these datasets. Use the legend() function to add a legend to the plot, as shown in this example:

t <- 1:5
y1 <- c(2, 3, 5, 4, 6)
y2 <- c(1, 3, 4, 5, 7)
plot(t, y1, type = "b", ylab = "response", ylim = c(0, 8))
points(t, y2, col = "blue", type = "b")
legend("topleft", legend = c("series 1", "series 2"), col = c("black", "blue"), pch = 1, lty = 1)

(a) Create the mentioned line plot. Do you see trends and/or patterns and if so, can you explain these?

(b) Create a combined boxplot of the three time-series. Are there outliers? If so, can you figure out when this occurred?

#load datasets, if not already loaded
library(datasets)
#look at the fdeaths dataset
fdeaths

##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1974  901  689  827  677  522  406  441  393  387  582  578  666
## 1975  830  752  785  664  467  438  421  412  343  440  531  771
## 1976  767 1141  896  532  447  420  376  330  357  445  546  764
## 1977  862  660  663  643  502  392  411  348  387  385  411  638
## 1978  796  853  737  546  530  446  431  362  387  430  425  679
## 1979  821  785  727  612  478  429  405  379  393  411  487  574

##your code here

Section 3. Advanced vector operations

Calculate/carry out the following

Exercise 3.1.

create a vector with the numbers 2, 4, 7, 1, 5 and assign it to a variable called myNumbers

##your code here

Exercise 3.2.

given the vectors below, generate a logical vector that has TRUE values for each position where a is greater than b

a <- c(6, 1, 4, 1, 5, 1, 2)
b <- c(1, 3, 4, 2, 7, 1, 5)

Exercise 3.3.

given the vectors a and b from Exercise 3.2, generate a logical vector that has TRUE values for each position where a is even or smaller than b

##your code here

Exercise 3.4.

give the actual values of both a and b at the positions selected in the previous question

##your code here

Exercises part 1: basics

Course: Data analysis and visualization using R