This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.
Use RStudio for this assignment. Complete the assignment by inserting your code wherever you see the string “#INSERT YOUR ANSWER HERE”.
When you click the Knit button, a document (PDF, Word, or HTML format) will be generated that includes both the assignment content as well as the output of any embedded R code chunks.
NOTE: YOU SHOULD NEVER HAVE
install.packages IN YOUR CODE; OTHERWISE, THE
Knit OPTION WILL GIVE AN ERROR. COMMENT OUT ALL PACKAGE
INSTALLATIONS.
Submit both the rmd and generated
output files. Failing to submit both files will be subject
to mark deduction. PDF or HTML is preferred.
Use seq() to create the vector \((3,5,\ldots,29)\).
seq(3, 30, 2)
## [1] 3 5 7 9 11 13 15 17 19 21 23 25 27 29
seq(3, 29, 2)
## [1] 3 5 7 9 11 13 15 17 19 21 23 25 27 29
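Both calls above give the same result because seq() never goes past its upper limit, so an end point of 30 with a step of 2 still stops at 29. As a small sketch (the names a and b are illustrative only), the same vector can also be requested by length instead of by step:

```r
# Two equivalent ways to build 3, 5, ..., 29
a <- seq(from = 3, to = 29, by = 2)           # fixed step size
b <- seq(from = 3, to = 29, length.out = 14)  # fixed number of elements

length(a)
all.equal(a, b)
```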
Create and print a vector x with all integers from 4 to
115 and a vector y containing multiples of 4 in the same
range. Hint: use the seq() function. Calculate the difference in
lengths of the vectors x and y. Hint: use
length()
x<-c(4:115)
x
## [1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## [19] 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
## [37] 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
## [55] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## [73] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93
## [91] 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
## [109] 112 113 114 115
y<-seq(4,115,by=4)
length(x)-length(y)
## [1] 84
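As a cross-check, the multiples of 4 can also be picked out of x with the modulo operator rather than generated with seq(). This is only a sketch of an alternative route; y_mod is a name introduced here for illustration.

```r
x <- 4:115

# Keep the elements of x that leave no remainder when divided by 4
y_mod <- x[x %% 4 == 0]

length(x)                  # number of integers from 4 to 115
length(y_mod)              # number of multiples of 4 in that range
length(x) - length(y_mod)  # difference in lengths
```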
Create a new vector, y_square, with the square of
elements at indices 1, 3, 7, 12, 17, 20, 22, and 24 from the variable
y. Hint: Use indexing rather than a for loop.
Calculate the mean and median of the FIRST five values from
y_square.
y_square<-y[c(1,3,7,12,17,20,22,24)]^2
mean(y_square[1:5])
## [1] 1574.4
median(y_square[1:5])
## [1] 784
For a given factor variable of
factorVar <- factor(c(1, 6, 5.4, 3.2)), would it be
correct to use the following commands to convert factor to number? If not, explain your answer and provide the correct one.
as.numeric(factorVar)
No. This command returns only the integer codes of the factor's levels. The levels are sorted in increasing order (1, 3.2, 5.4, 6), so the result is 1 4 3 2 rather than the original values. To recover the values, convert the factor to character first and then to numeric.
factorVar <- factor(c(1,6,5.4,3.2))
as.numeric(as.character(factorVar))
## [1] 1.0 6.0 5.4 3.2
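The difference between level codes and level labels can be seen side by side in the short sketch below. The names codes, via_char and via_levels are illustrative; the last line is a second common way to do the conversion, going through the levels instead of the character values.

```r
factorVar <- factor(c(1, 6, 5.4, 3.2))

levels(factorVar)                 # labels, sorted in increasing order
codes <- as.numeric(factorVar)    # position of each value among the levels
codes

# Two ways to get the original numbers back
via_char   <- as.numeric(as.character(factorVar))
via_levels <- as.numeric(levels(factorVar))[factorVar]
via_char
via_levels
```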
A comma-separated values file dataset.csv contains
missing values represented by Not A Number (null) and
question mark (?). How can you read this type of file in
R? NOTE: Please make sure you have saved the dataset.csv
file in your current working directory.
Passing “null” and “?” to the na.strings argument of read.csv() tells R to treat those strings as NA (missing) values.
null_dataset<-read.csv("dataset.csv",
na.strings = c("null","?"))
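Since dataset.csv itself is not shown here, the effect of na.strings can be sketched on a small made-up file. The file contents and the column names id, score and grade are hypothetical and exist only for this illustration.

```r
# Write a tiny CSV with both missing-value markers to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score,grade",
             "1,null,A",
             "2,7.5,?",
             "3,9,B"), tmp)

demo <- read.csv(tmp, na.strings = c("null", "?"))

is.na(demo)   # TRUE where "null" or "?" appeared in the file
str(demo)     # score is read as numeric because "null" became NA

unlink(tmp)   # remove the temporary file
```

Without na.strings, the score column would be read as text, because "null" cannot be parsed as a number.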
Compute: \[\sum_{n=5}^{20}\frac{(-1)^{n}}{(n!)^2}\]
Hint: Use factorial(n) to compute \(n!\).
sum((-1)^(5:20) / factorial(5:20)^2)
## [1] -6.755419e-05
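To see why the result is so close to the first term, the individual terms can be inspected and the vectorised sum compared with an explicit loop. This is a sketch for checking the answer; the names n, terms and total are illustrative.

```r
n <- 5:20
terms <- (-1)^n / factorial(n)^2

# The terms alternate in sign and shrink very quickly
signif(terms[1:3], 4)

# Same sum, accumulated one term at a time
total <- 0
for (k in n) {
  total <- total + (-1)^k / factorial(k)^2
}

all.equal(total, sum(terms))
```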
Compute: \[\prod_{n=1}^{5} \left( 4n + \frac{1}{2^n} \right)\] NOTE: The symbol \(\Pi\) represents multiplication.
prod(4*(1:5)+1/(2^(1:5)))
## [1] 144833.6
Describe what the following R command does: c(0:5)[NA]
c(0:5) creates an integer vector with the values 0 to 5, and [ ] is the
indexing operator, which extracts the elements at the positions given
inside the brackets. NA is not a valid position, so the value returned
for it is NA. Because a bare NA is a logical value, it is recycled to
the length of the vector like any other logical index, which is why the
result contains six NAs rather than one.
c(0:5)[NA]
## [1] NA NA NA NA NA NA
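The role of the index type can be checked directly. In the sketch below (v is an illustrative name), a logical NA is recycled across the whole vector, while an integer NA asks for a single unknown position.

```r
v <- c(0:5)

class(NA)         # a bare NA is logical
v[NA]             # logical index, recycled to the length of v
v[NA_integer_]    # integer index, only one position requested
v[c(1, NA, 3)]    # known positions are extracted, the NA position gives NA
```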
Describe the purpose of is.vector(),
is.character(), is.numeric(), and
is.na() functions? Please use
x <- c("a", "b", NA, 2) to explain your description.
Each of these functions checks a particular property of an object.
is.vector() checks whether the variable is a vector: TRUE = variable is a vector, FALSE = variable is not a vector. All elements of an atomic vector share one mode (type), so c() coerces mixed input to a common type; for x that type is character. Note that is.vector() also returns FALSE for a vector that carries attributes other than names.
is.character() checks whether the vector is of character type: TRUE = vector elements are character, FALSE = vector elements are not character. For x it returns TRUE, which matters because the number 2 is stored as the string “2” rather than as a number, and that affects later computations.
is.numeric() checks whether the vector is of numeric type: TRUE = vector elements are numeric, FALSE = vector elements are not numeric. For x it returns FALSE because every element was coerced to character.
is.na() tests each element and returns a logical vector: TRUE = corresponding element is NA, FALSE = corresponding element is not NA. For x, only the third element is NA.
x <- c("a", "b", NA, 2)
is.vector(x)
## [1] TRUE
is.character(x)
## [1] TRUE
is.numeric(x)
## [1] FALSE
is.na(x)
## [1] FALSE FALSE TRUE FALSE
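The coercion behind these results can be made visible with a few extra checks. This is a sketch only: x_attr, x_num and the "unit" attribute are names made up for the illustration.

```r
x <- c("a", "b", NA, 2)

typeof(x)   # c() coerced every element to one common type
x[4]        # the 2 is now a string

# is.vector() becomes FALSE once an attribute other than names is attached
x_attr <- x
attr(x_attr, "unit") <- "cm"   # illustrative attribute
is.vector(x)
is.vector(x_attr)

# Converting to numeric turns the non-numeric strings into NA
# (R warns about this coercion; the warning is silenced here)
x_num <- suppressWarnings(as.numeric(x))
x_num
is.na(x_num)
```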
The airquality dataset contains daily air quality
measurements in New York from May to September 1973. The variables
include Ozone level, Solar radiation, wind speed, temperature in
Fahrenheit, month, and day. Please see the detailed description using
help("airquality").
Install the airquality data set on your computer using
the command install.packages("datasets"). Then load the
datasets package into your session.
library(datasets)
airquality<-read.csv("airquality.csv")
Display the first 10 rows of the airquality data
set.
head(airquality, n=10)
## X Ozone Solar.R Wind Temp Month Day
## 1 1 41 190 7.4 67 5 1
## 2 2 36 118 8.0 72 5 2
## 3 3 12 149 12.6 74 5 3
## 4 4 18 313 11.5 62 5 4
## 5 5 NA NA 14.3 56 5 5
## 6 6 28 NA 14.9 66 5 6
## 7 7 23 299 8.6 65 5 7
## 8 8 19 99 13.8 59 5 8
## 9 9 8 19 20.1 61 5 9
## 10 10 NA 194 8.6 69 5 10
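The X column in the output above comes from reading a CSV export, where the row numbers were saved as an extra column. As a sketch of the alternative, and assuming the CSV is a copy of the built-in data, the dataset shipped with the datasets package can be loaded directly and has no such column:

```r
# Load the copy of airquality that ships with base R
data("airquality", package = "datasets")

names(airquality)   # six variables, no X column
dim(airquality)     # rows and columns
head(airquality, n = 10)
```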
Compute the average of the first four variables (Ozone, Solar.R, Wind
and Temp) for the fifth month using the sapply() function.
Hint: You might need to consider removing the NA values;
otherwise, the average will not be computed.
x[!is.na(x)] keeps only the non-NA values of a vector x; passing
na.rm = TRUE to mean() has the same effect.
averages_may <- sapply(airquality[airquality$Month == 5, c("Ozone", "Solar.R", "Wind", "Temp")], mean, na.rm = TRUE)
averages_may
## Ozone Solar.R Wind Temp
## 23.61538 181.29630 11.62258 65.54839
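The effect of the NA values on the average can be checked with the sketch below. It assumes the built-in datasets::airquality copy and selects the four variables by name, so it does not depend on column positions; may, via_sapply and via_colMeans are illustrative names.

```r
# May rows only, restricted to the four measurement variables
may <- subset(datasets::airquality, Month == 5,
              select = c(Ozone, Solar.R, Wind, Temp))

colSums(is.na(may))   # how many values are missing per variable
mean(may$Ozone)       # NA, because missing values are not removed

via_sapply   <- sapply(may, mean, na.rm = TRUE)
via_colMeans <- colMeans(may, na.rm = TRUE)
all.equal(via_sapply, via_colMeans)
```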
Construct a boxplot for all the Wind and
Temp variables, then display the values of all the outliers
which lie beyond the whiskers.
boxplot(airquality[, c("Wind", "Temp")],
col = "blue")
boxplot(airquality$Wind, main="Boxplot of Wind", col="blue")
boxplot(airquality$Temp, main="Boxplot of Temp", col="red")
bp_wind <- boxplot(airquality$Wind, plot = FALSE)
wind_outliers <- bp_wind$out
print(wind_outliers)
## [1] 20.1 18.4 20.7
bp_temp <- boxplot(airquality$Temp, plot = FALSE)
temp_outliers <- bp_temp$out
print(temp_outliers)
## numeric(0)
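The values in $out are the points lying more than 1.5 times the box height beyond the hinges. A minimal sketch of that rule, assuming the built-in datasets::airquality copy and the default whisker coefficient of 1.5 (w, h, box_height and manual_out are illustrative names):

```r
w <- datasets::airquality$Wind

h <- fivenum(w)              # min, lower hinge, median, upper hinge, max
box_height <- h[4] - h[2]    # distance between the hinges

# Points beyond the whisker limits
lower_limit <- h[2] - 1.5 * box_height
upper_limit <- h[4] + 1.5 * box_height
manual_out <- w[w < lower_limit | w > upper_limit]
manual_out

# Same values as reported by the helper that boxplot() uses
boxplot.stats(w)$out
```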
Compute the upper quartile of the Wind variable with two
different methods. HINT: Only show the upper quartile using indexing.
For the type of quartile, please see https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/quantile.
quantile(airquality$Wind)[4]
## 75%
## 11.5
quantile(airquality$Wind, 0.75)
## 75%
## 11.5
quantile(airquality$Wind, probs = 0.75, na.rm = FALSE,
names = TRUE, type = 3)
## 75%
## 11.5
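Other base R functions report the same upper quartile, and the type argument of quantile() can be varied to compare the nine estimation methods. The sketch below assumes the built-in datasets::airquality copy; w, q_default and by_type are illustrative names.

```r
w <- datasets::airquality$Wind

q_default <- quantile(w, 0.75, names = FALSE)   # type 7 is the default
q_default
summary(w)[["3rd Qu."]]   # third quartile from summary()
fivenum(w)[4]             # upper hinge

# Upper quartile under each of the nine quantile types
by_type <- sapply(1:9, function(t) quantile(w, 0.75, type = t, names = FALSE))
by_type
```

With small samples the nine types can disagree noticeably, so it is worth stating which one was used.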
Construct a pie chart to describe the number of entries by
Month. HINT: use the table() function to count
and tabulate the number of entries within a Month.
table(airquality$Month)
##
## 5 6 7 8 9
## 31 30 31 31 30
air.month<-table(airquality$Month)
names(air.month)<-c("May","June","July","August","September")
pie(air.month)
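The month labels can also be derived from the month numbers with the built-in month.name constant instead of being typed by hand. The sketch below assumes the built-in datasets::airquality copy; counts and pct are illustrative names, and the percentage labels are an optional extra.

```r
counts <- table(datasets::airquality$Month)

# Turn month numbers 5..9 into month names
names(counts) <- month.name[as.integer(names(counts))]

# Share of all entries per month, in percent
pct <- round(100 * counts / sum(counts), 1)

pie(counts,
    labels = paste0(names(counts), " (", pct, "%)"),
    main = "Entries per month")
```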
END of Assignment #1.