Assignment 1 (10%)

[Ryan Tarafder]

[CIND123 D30, 501025354]


Instructions

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.

Use RStudio for this assignment. Complete the assignment by inserting your code wherever you see the string “#INSERT YOUR ANSWER HERE”.

When you click the Knit button, a document (PDF, Word, or HTML format) will be generated that includes both the assignment content as well as the output of any embedded R code chunks.

NOTE: YOU SHOULD NEVER HAVE install.packages IN YOUR CODE; OTHERWISE, THE Knit OPTION WILL GIVE AN ERROR. COMMENT OUT ALL PACKAGE INSTALLATIONS.

Submit both the rmd and generated output files. Failing to submit both files will be subject to mark deduction. PDF or HTML is preferred.

Sample Question and Solution

Use seq() to create the vector (3, 5, ..., 29).

seq(3, 30, 2)
##  [1]  3  5  7  9 11 13 15 17 19 21 23 25 27 29
seq(3, 29, 2)
##  [1]  3  5  7  9 11 13 15 17 19 21 23 25 27 29

Question 1 (32 points)

Q1a (8 points)

Create and print a vector x with all integers from 1 to 80 and a vector y containing multiples of 3 in the same range. Hint: use the seq() function. Calculate the difference in lengths of the vectors x and y. Hint: use length().

#INSERT YOUR ANSWER HERE
x<-c(1:80)
x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## [76] 76 77 78 79 80
y<-seq(3,80,3)
y
##  [1]  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75
## [26] 78
length(x)-length(y)
## [1] 54

Q1b (8 points)

Create a new vector, x_square, with the square of elements at indices 1, 3, 5, 7, 9, 15, 20, 23, 24, and 29 from the variable x. Hint: Use indexing rather than a for loop. Calculate the mean and median of the FIRST five values from x_square.

#INSERT YOUR ANSWER HERE
# Index x with the requested positions instead of overwriting x
idx<-c(1,3,5,7,9,15,20,23,24,29)
x_square<-x[idx]^2
mean(x_square[1:5])
## [1] 33
median(x_square[1:5])
## [1] 25

Q1c (8 points)

To convert factor to number, would it be correct to use the following commands? Explain your answer.

factorVar <- factor(c(1, 6, 5.4, 3.2))
as.numeric(factorVar)

#INSERT YOUR ANSWER HERE
#No, not as written. as.numeric() applied directly to a factor returns the 
#internal integer codes of the levels (here 1, 4, 3, 2), not the original 
#values (1, 6, 5.4, 3.2). The result is also only printed, so factorVar itself 
#is still a factor (class(factorVar) = "factor"). To recover the original 
#values, convert the factor to character first, then to numeric, and assign 
#the result, as shown below.

factorVar <- factor(c(1, 6, 5.4, 3.2))
factorVar<-as.numeric(as.character(factorVar))

## To check if the factor converted to a number
class(factorVar)
## [1] "numeric"
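
The difference between level codes and level values can be seen side by side in a small sketch. The variable names (`f`, `codes`, `values_chr`, `values_lvl`) are illustrative and not part of the assignment; the second conversion route, indexing the numeric levels by the factor, is a commonly used alternative shown here for comparison.

```r
# Same factor as in the question
f <- factor(c(1, 6, 5.4, 3.2))

# Direct conversion returns the position of each value among the sorted levels
codes <- as.numeric(f)

# Going through character recovers the original values
values_chr <- as.numeric(as.character(f))

# Equivalent route: convert the levels once, then index them by the factor
values_lvl <- as.numeric(levels(f))[f]

print(levels(f))
print(codes)
print(values_chr)
print(values_lvl)
```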

Q1d (8 points)

A comma-separated values file dataset.csv consists of missing values represented by Not A Number (NaN) and question mark (?). How can you read this type of file in R? NOTE: Please make sure you have saved the dataset.csv file at your current working directory.

#INSERT YOUR ANSWER HERE
getwd()
## [1] "C:/Users/ryant/OneDrive/Desktop"
# Read the file from the working directory, treating "?" and "NaN" as missing values
dataset <- read.csv(file = "dataset.csv", header = FALSE, stringsAsFactors = FALSE, na.strings = c("?", "NaN"))
na.omit(dataset)
##     V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
## 1    1   2   3   4   5   6   7   8   9  10
## 2   11  12  13  14  15  16  17  18  19  20
## 3   21  22  23  24  25  26  27  28  29  30
## 4   31  32  33  34  35  36  37  38  39  40
## 7   61  62  63  64  65  66  67  68  69  70
## 10  91  92  93  94  95  96  97  98  99 100
## 12 111 112 113 114 115 116 117 118 119 120
## 13 121 122 123 124 125 126 127 128 129 130
## 15 141 142 143 144 145 146 147 148 149 150
## 16 151 152 153 154 155 156 157 158 159 160
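
The effect of `na.strings` can be checked without the course file. The sketch below writes a tiny made-up CSV to a temporary file (the contents and the names `tmp` and `demo` are assumptions for illustration only) and reads it back with both markers declared as missing.

```r
# Write a small illustrative CSV to a temporary file (made-up contents)
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3", "4,?,6", "7,8,NaN"), tmp)

# Declare both "?" and "NaN" as missing-value markers while reading
demo <- read.csv(tmp, header = FALSE, na.strings = c("?", "NaN"))

print(demo)
print(colSums(is.na(demo)))  # number of missing values per column

# Clean up the temporary file
unlink(tmp)
```

Because the markers are turned into NA at read time, the affected columns stay numeric rather than being read as text.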

Question 2 (32 points)

Q2a (8 points)

Compute:

$$\sum_{n=1}^{50} \frac{3^n}{n!}$$
Hint: Use factorial(n) to compute n!.

#INSERT YOUR ANSWER HERE
n<-1:50
sum(3^n/(factorial(n)))
## [1] 19.08554
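
As a sanity check, the Taylor series of the exponential function gives e^3 as the sum of 3^n / n! over n = 0, 1, 2, ..., so the sum from n = 1 to 50 should be very close to e^3 - 1. A minimal sketch of that comparison follows; the names `partial_sum` and `closed_form` are illustrative.

```r
n <- 1:50

# Finite sum from the question
partial_sum <- sum(3^n / factorial(n))

# Limit of the infinite series, minus the n = 0 term (which equals 1)
closed_form <- exp(3) - 1

print(partial_sum)
print(closed_form)
print(isTRUE(all.equal(partial_sum, closed_form)))
```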

Q2b (8 points)

Compute:

$$\prod_{n=2}^{22} \left(3n + \frac{3}{\sqrt[3]{n}}\right)$$
NOTE: The symbol Π represents multiplication.

#INSERT YOUR ANSWER HERE
n<-2:22
# Use a name that does not mask the base pie() function, then print the result
product<-prod((3*n)+(3/n^(1/3)))
product
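
One way to cross-check a long product is to compute it a second time as the exponential of a sum of logarithms, which should agree with prod() up to floating-point tolerance. This is a sketch only; `terms`, `direct`, and `via_logs` are illustrative names.

```r
n <- 2:22

# Individual factors of the product: 3n + 3 / cube root of n
terms <- 3 * n + 3 / n^(1/3)

# Direct product
direct <- prod(terms)

# Same quantity via logarithms: prod(x) = exp(sum(log(x))) for positive x
via_logs <- exp(sum(log(terms)))

print(direct)
print(isTRUE(all.equal(direct, via_logs)))
```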

Q2c (8 points)

Describe what the following R command does: c(0:5)[NA]

#INSERT YOUR ANSWER HERE
#The command creates the integer vector 0, 1, 2, 3, 4, 5 (six elements) and 
#indexes it with NA. A bare NA is a logical value, so it is "recycled" to the 
#length of the vector, and every NA index returns NA. The command therefore 
#prints NA six times.
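
The role of the index type can be seen by comparing a logical NA with an integer NA. The following sketch uses illustrative names (`v`, `res_logical`, `res_integer`).

```r
v <- c(0:5)

# A bare NA is logical, so it is recycled to the length of v
res_logical <- v[NA]

# An integer NA is treated as a single (unknown) position
res_integer <- v[NA_integer_]

print(class(NA))
print(res_logical)
print(length(res_logical))
print(res_integer)
print(length(res_integer))
```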

Q2d (8 points)

What is the difference between is.vector() and is.numeric() functions?

#INSERT YOUR ANSWER HERE
#is.vector() checks the structure of an object: it returns TRUE if the object 
#is a vector (atomic or list) with no attributes other than names. 
#is.numeric() checks the type of the data: it returns TRUE if the object 
#holds integer or double values. Both functions return a single TRUE or FALSE.
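
A few objects for which the two functions disagree make the distinction concrete. The objects below (`chr`, `m`, `lst`) are made up for illustration.

```r
chr <- c("a", "b")           # a vector, but not numeric
m   <- matrix(1:4, nrow = 2) # numeric, but has a dim attribute
lst <- list(1, 2)            # a list counts as a vector, but is not numeric

results <- data.frame(
  object     = c("character vector", "integer matrix", "list"),
  is_vector  = c(is.vector(chr), is.vector(m), is.vector(lst)),
  is_numeric = c(is.numeric(chr), is.numeric(m), is.numeric(lst))
)

print(results)
```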

Question 3 (36 points)

The airquality dataset contains daily air quality measurements in New York from May to September 1973. The variables include Ozone level, Solar radiation, wind speed, temperature in Fahrenheit, month, and day. Please see the detailed description using help("airquality").

Install the airquality data set on your computer using the command install.packages("datasets"). Then load the datasets package into your session.

# install.packages("datasets")  # commented out so that Knit does not fail; datasets ships with base R
library(datasets)

Q3a (4 points)

Display the first 6 rows of the airquality data set.

#INSERT YOUR ANSWER HERE
head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

Q3b (8 points)

Compute the average of the first four variables (Ozone, Solar.R, Wind and Temp) using the sapply() function. Hint: You might need to consider removing the NA values; otherwise, the average will not be computed.

#INSERT YOUR ANSWER HERE
airquality<-na.omit(airquality)

sapply(airquality[1:4],mean)
##     Ozone   Solar.R      Wind      Temp 
##  42.09910 184.80180   9.93964  77.79279
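
Note that na.omit() drops every row that has an NA in any column, so Wind and Temp also lose observations. An alternative, sketched below under the assumption that column-wise removal is acceptable, passes `na.rm = TRUE` through sapply() so each column keeps all of its own non-missing values. The names `aq`, `means_per_column`, and `means_complete_rows` are illustrative.

```r
# Work on a fresh copy of the built-in data set
aq <- datasets::airquality

# Remove NA values column by column
means_per_column <- sapply(aq[1:4], mean, na.rm = TRUE)

# Remove whole rows containing any NA, then average
means_complete_rows <- sapply(na.omit(aq)[1:4], mean)

print(means_per_column)
print(means_complete_rows)
```

The two approaches generally give slightly different averages, since they are computed over different sets of rows.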

Q3c (8 points)

Construct a boxplot for the Ozone and Solar.R variables, then display the values of all the outliers which lie beyond the whiskers.

#INSERT YOUR ANSWER HERE
#airquality<-na.omit(airquality)
par(mfcol=c(1,2))

ozone<-boxplot(airquality$Ozone, main="Ozone")
ozone$out
## [1] 135 168
solar<-boxplot(airquality$Solar.R, main="Solar")

solar$out
## numeric(0)
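
The outliers can also be obtained without drawing a plot by calling boxplot.stats(), which boxplot() uses internally. By default the whiskers extend to the most extreme points within 1.5 times the hinge spread, and anything beyond is reported in `out`. The sketch uses a made-up vector `x` so the result is easy to verify by hand.

```r
# Made-up values with one obvious outlier
x <- c(1, 2, 3, 4, 5, 100)

stats <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# Points lying beyond the whiskers
print(stats$out)
```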

Q3d (8 points)

Compute the upper quartile of the Ozone variable with two different methods. HINT: Only show the upper quartile using indexing. For the type of quartile, please see https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/quantile.

#INSERT YOUR ANSWER HERE
#airquality<-na.omit(airquality)
quartile <- quantile(airquality$Ozone, na.rm = TRUE)
quartile[4]
## 75% 
##  62
quartile2 <- quantile(airquality$Ozone, probs = seq(0, 1, 1/4), na.rm = TRUE) 
quartile2[4]
## 75% 
##  62
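
The `type` argument of quantile() selects among several quantile definitions, and fivenum() reports hinges, so genuinely different methods can give different upper quartiles on the same data. A small sketch on a made-up vector (the names `x`, `q_type7`, `q_type6`, and `hinge` are illustrative):

```r
# Made-up values, small enough to check by hand
x <- c(1, 2, 3, 4, 5)

# Default definition (type 7)
q_type7 <- quantile(x, probs = 0.75, type = 7)

# An alternative definition (type 6)
q_type6 <- quantile(x, probs = 0.75, type = 6)

# Upper hinge from Tukey's five-number summary
hinge <- fivenum(x)[4]

print(q_type7)
print(q_type6)
print(hinge)
```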

Q3e (8 points)

Construct a pie chart to describe the number of entries by Month. HINT: use the table() function to count and tabulate the number of entries within a Month.

#INSERT YOUR ANSWER HERE
# Remove the NA-filtered copy so that the original built-in data set is used again
rm(airquality)
month_names<-c("May (5)","June (6)","July (7)","August (8)","September (9)")
months<-table(airquality$Month)
names(months)<-month_names

pie(months,main="Entries by Month (NA values included)")
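
The counts behind the chart can be inspected directly, and slice labels can be built from the table itself rather than typed by hand. This is a sketch; `aq`, `month_counts`, and `slice_labels` are illustrative names, and the built-in constant `month.name` supplies the English month names.

```r
# Fresh copy of the built-in data set
aq <- datasets::airquality

# Number of entries per month
month_counts <- table(aq$Month)

# Replace the numeric month codes with month names
names(month_counts) <- month.name[as.integer(names(month_counts))]

# Labels that include the count for each slice
slice_labels <- paste0(names(month_counts), " (", month_counts, ")")

print(month_counts)
print(slice_labels)
```

The resulting `slice_labels` vector could then be passed to pie() through its `labels` argument.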

END of Assignment #1.