This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.
Use RStudio for this assignment. Complete the assignment by inserting your code wherever you see the string “#INSERT YOUR ANSWER HERE”.
When you click the Knit button, a document (PDF, Word, or HTML format) will be generated that includes both the assignment content as well as the output of any embedded R code chunks.
NOTE: YOU SHOULD NEVER HAVE install.packages IN YOUR CODE; OTHERWISE, THE Knit OPTION WILL GIVE AN ERROR. COMMENT OUT ALL PACKAGE INSTALLATIONS.
Submit both the Rmd file and the generated output file. Failing to submit both files will result in a mark deduction. PDF or HTML output is preferred.
Use seq() to create the vector \((3, 5, \ldots, 29)\).
seq(3, 30, 2)
## [1] 3 5 7 9 11 13 15 17 19 21 23 25 27 29
seq(3, 29, 2)
## [1] 3 5 7 9 11 13 15 17 19 21 23 25 27 29
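seq() stops at the last value that does not pass `to`. That is why both calls above return the same vector. The sketch below compares them with two other ways of building the same odd numbers. The `length.out` form and the modulo filter on 3:29 are alternatives added for illustration, not part of the original answer.

```r
# seq() never overshoots `to`, so an upper bound of 30 or 29 gives the same result
a <- seq(3, 30, by = 2)
b <- seq(3, 29, by = 2)
# Same vector described by its start, step, and number of elements
c_len <- seq(3, by = 2, length.out = 14)
# Same vector obtained by keeping only the odd integers from 3 to 29
d_odd <- (3:29)[(3:29) %% 2 == 1]
print(all(a == b) && all(b == c_len) && all(c_len == d_odd))
```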
Create and print a vector x with all integers from 15 to 100 and a vector y containing the multiples of 5 in the same range. Hint: use the seq() function. Then calculate the difference in lengths of the vectors x and y. Hint: use length().
x <- seq(15, 100)
y <- seq(15, 100, by = 5)
length_diff <- length(x) - length(y)
print(x)
## [1] 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
## [20] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
## [39] 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
## [58] 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [77] 91 92 93 94 95 96 97 98 99 100
print(y)
## [1] 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
print(length_diff)
## [1] 68
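As a cross-check, you can predict the lengths with the counting rule (last − first) / step + 1. This gives 86 for x and 18 for y. You can also confirm that y is exactly the elements of x divisible by 5. This is a minimal sketch; the variable names n_x, n_y, and y_from_x are illustrative.

```r
x <- seq(15, 100)
y <- seq(15, 100, by = 5)
# Counting formula: (last - first) / step + 1
n_x <- (100 - 15) / 1 + 1   # 86
n_y <- (100 - 15) / 5 + 1   # 18
# y is exactly the elements of x that are divisible by 5
y_from_x <- x[x %% 5 == 0]
print(c(n_x, n_y, n_x - n_y))
```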
Create a new vector, x_square, containing the squares of the elements at indices 1, 11, 21, 31, 41, 51, 61, and 71 of x. Hint: use indexing rather than a for loop. Then calculate the mean and median of the FIRST five values of x_square.
indices <- seq(1, 71, by = 10)
x_square <- x[indices]^2
mean_first_five <- mean(x_square[1:5])
median_first_five <- median(x_square[1:5])
print(x_square)
## [1] 225 625 1225 2025 3025 4225 5625 7225
print(mean_first_five)
## [1] 1425
print(median_first_five)
## [1] 1225
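Because x starts at 15, its element at index i equals 14 + i. The same squares therefore follow directly from the indices. The sketch below checks this. It then uses summary(), which reports the mean and median (plus quartiles) in one call. The name x_square_formula is illustrative only.

```r
x <- seq(15, 100)
indices <- seq(1, 71, by = 10)
x_square <- x[indices]^2
# Because x[i] equals 14 + i, the same squares follow directly from the indices
x_square_formula <- (14 + indices)^2
# summary() reports min, quartiles, median, and mean in a single call
print(summary(x_square[1:5]))
```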
Consider the factor variable factorVar <- factor(c(10.8, 2.7, 5.0, 3.5)). To convert the factor to numbers, you can either: 1) use levels() to extract the level labels, then use as.numeric() to convert the labels to numbers, or 2) use as.character() to convert the values in factorVar to strings, then use as.numeric() to convert those strings to numbers. Please provide both solutions.
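# Method 1: convert the level labels, indexed by the factor's integer codes
factorVar <- factor(c(10.8, 2.7, 5.0, 3.5))
level_labels <- levels(factorVar)  # avoid naming a variable "levels", which masks the function
numericVar <- as.numeric(level_labels[factorVar])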
print(numericVar)
## [1] 10.8 2.7 5.0 3.5
# Method 2
factorVar <- factor(c(10.8, 2.7, 5.0, 3.5))
characterVar <- as.character(factorVar)
numericVar <- as.numeric(characterVar)
print(numericVar)
## [1] 10.8 2.7 5.0 3.5
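Both methods are needed because a factor stores integer codes that point into its sorted level labels. Calling as.numeric() directly on the factor returns those codes, not the original values. The sketch below shows this pitfall next to the two correct conversions. It also shows the compact `as.numeric(levels(f))[f]` form, a common idiom that is equivalent to Method 1.

```r
factorVar <- factor(c(10.8, 2.7, 5.0, 3.5))
# Levels are stored as sorted character labels: "2.7" "3.5" "5" "10.8"
print(levels(factorVar))
# Pitfall: as.numeric() on a factor returns the internal integer codes, not the labels
print(as.numeric(factorVar))   # 4 1 3 2
# Both correct conversions recover the original values
method1 <- as.numeric(levels(factorVar))[factorVar]
method2 <- as.numeric(as.character(factorVar))
print(method1)
print(method2)
```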
The comma-separated values file dataset.csv contains missing values that are represented by the string null (not a number) and by a question mark (?). How can you read this type of file in R? NOTE: make sure dataset.csv is saved in your current working directory.
# Set the file path
file_path <- "dataset.csv"
# Read the CSV file, specifying the missing values
data <- read.csv(file_path, na.strings = c("null", "?"))
# Print the data
print(data)
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
## 1 11 12 13 14 15 16 17 18 19 20
## 2 21 22 23 24 25 26 27 28 29 30
## 3 31 32 33 34 35 36 37 38 39 40
## 4 41 42 43 44 45 NA 47 48 49 50
## 5 51 52 53 NA 55 56 57 NA 59 60
## 6 61 62 63 64 65 66 67 68 69 70
## 7 71 72 NA 74 75 76 77 78 79 80
## 8 81 82 83 84 85 86 87 88 89 NA
## 9 91 92 93 94 95 96 97 98 99 100
## 10 NA 102 103 104 105 106 107 108 109 110
## 11 111 112 113 114 115 116 117 118 119 120
## 12 121 122 123 124 125 126 127 128 129 130
## 13 131 132 133 134 135 136 137 138 139 NA
## 14 141 142 143 144 145 146 147 148 149 150
## 15 151 152 153 154 155 156 157 158 159 160
## 16 161 162 163 164 NA 166 167 168 169 170
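If dataset.csv is not at hand, you can try the same na.strings behaviour on a small inline CSV. Pass it through the text argument, which read.csv() forwards to read.table(). The three-row data below is hypothetical and is not the contents of dataset.csv. Because "null" and "?" become NA, the affected columns are still parsed as numbers.

```r
# Hypothetical inline data standing in for dataset.csv (not the real file)
csv_text <- "X1,X2,X3\n11,null,13\n?,22,23"
data <- read.csv(text = csv_text, na.strings = c("null", "?"))
print(data)
# Count the missing values per column
print(colSums(is.na(data)))
```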
Compute: \[\frac{1}{4!} \sum_{n=10}^{40} 3^{n}\] Hint: use factorial(n) to compute \(n!\).
# Compute the factorial of 4
factorial_4 <- factorial(4)
# Initialize the sum variable
sum_value <- 0
# Iterate from 10 to 40
for (n in 10:40) {
sum_value <- sum_value + 3^n
}
# Compute the final result
result <- (1 / factorial_4) * sum_value
# Print the result
print(result)
## [1] 7.598541e+17
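R arithmetic is vectorized, so the loop can be replaced by a single sum() over 3^(10:40). A geometric-series identity gives an independent check: \(\sum_{n=a}^{b} 3^n = (3^{b+1} - 3^a)/2\). Values such as 3^41 exceed the range where doubles store integers exactly. The sketch therefore compares results with all.equal() rather than ==.

```r
n <- 10:40
# Vectorized: 3^n is computed for all n at once, then summed
result_vec <- sum(3^n) / factorial(4)
# Geometric-series closed form: sum_{n=a}^{b} 3^n = (3^(b+1) - 3^a) / (3 - 1)
result_closed <- (3^41 - 3^10) / 2 / factorial(4)
print(result_vec)
print(result_closed)
```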
Compute: \[\prod_{n=1}^{20} \left( 3n + \frac{1}{n} \right)\] NOTE: The symbol \(\Pi\) denotes a product, that is, repeated multiplication of the terms.
# Initialize the product variable
product_value <- 1
# Iterate from 1 to 20
for (n in 1:20) {
term <- 3*n + 1/n
product_value <- product_value * term
}
# Print the result
print(product_value)
## [1] 1.373708e+28
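As with the sum above, prod() over a vector replaces the loop. The rewrite \(3n + 1/n = (3n^2 + 1)/n\) gives a second route: \(\prod (3n^2+1) / 20!\). This sketch checks that the three forms agree up to floating-point tolerance.

```r
n <- 1:20
# Vectorized: build all 20 factors at once, then multiply them together
product_vec <- prod(3 * n + 1 / n)
# Equivalent rewrite: 3n + 1/n = (3n^2 + 1) / n, so the product is prod(3n^2 + 1) / 20!
product_alt <- prod(3 * n^2 + 1) / factorial(20)
# Reference loop, mirroring the answer above
product_loop <- 1
for (k in n) product_loop <- product_loop * (3 * k + 1 / k)
print(product_vec)
```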
Describe what the following R command does:
c(0:5)[NA]
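# c(0:5)[NA] builds the integer vector 0, 1, 2, 3, 4, 5 (wrapping 0:5 in c() is redundant) and then indexes it with NA.
# A bare NA is a logical value, so R recycles it to the length of the vector. Each of the six positions is selected with an unknown (NA) condition, so the result is a vector of six NAs rather than a single element.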
# Create the vector with values from 0 to 5 (avoid the name "vector", which masks a base function)
v <- c(0:5)
# Index with a logical NA: it is recycled to length(v), so every position returns NA
result <- v[NA]
# Print the result
print(result)
## [1] NA NA NA NA NA NA
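The type of NA matters here. A logical NA is recycled like any logical index. An integer NA is a single unknown position and yields just one NA. The sketch below contrasts the cases using NA_integer_, the typed integer missing value in base R.

```r
v <- 0:5
# Logical NA is recycled to length(v): six unknown selections -> six NAs
print(v[NA])
# An integer NA is a single (unknown) position -> one NA
print(v[NA_integer_])
# Mixing NA with a valid position keeps the valid element
print(v[c(1, NA)])   # 0 NA
```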
Describe the purpose of the is.vector(), is.character(), is.numeric(), and is.na() functions. Use x <- c("a","b",NA,2) to illustrate your description.
# The code below demonstrates is.vector(), is.character(), is.numeric(), and is.na() on x <- c("a", "b", NA, 2). Note that c() coerces all elements to a common type, so x is a character vector and 2 is stored as the string "2".
# Create the vector
x <- c("a", "b", NA, 2)
# Check if x is a vector
is_vector <- is.vector(x)
print(is_vector)
## [1] TRUE
# Check whether x as a whole is a character vector (one TRUE/FALSE for the entire vector)
is_character <- is.character(x)
print(is_character)
## [1] TRUE
# Check whether x as a whole is a numeric vector (one TRUE/FALSE for the entire vector)
is_numeric <- is.numeric(x)
print(is_numeric)
## [1] FALSE
# Check if the elements of x are missing values (NA)
is_na <- is.na(x)
print(is_na)
## [1] FALSE FALSE TRUE FALSE
# In this code:
# is.vector(x) checks whether x is a vector with no attributes other than names. It returns TRUE.
# is.character(x) checks whether x as a whole is a character vector. Because c() coerced every element to character ("2" is a string and NA became NA_character_), it returns a single TRUE.
# is.numeric(x) checks whether x is a numeric (integer or double) vector. For the same reason it returns a single FALSE; even the element "2" is a string, not a number.
# is.na(x) is vectorized: it tests each element for a missing value and returns FALSE FALSE TRUE FALSE, flagging only the third element.
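A list keeps each element's original type. Applying the checks element by element gives the per-element answers that a single atomic vector cannot. The sketch uses a list version of x (x_list, introduced here for illustration) with sapply(). It also confirms the coercion in the original x with typeof().

```r
x <- c("a", "b", NA, 2)
# c() coerces everything to a common type, so x is a character vector
print(typeof(x))     # "character"
print(x[4] == "2")   # TRUE: the number 2 is now the string "2"
# A list keeps each element's own type, so element-wise checks differ
x_list <- list("a", "b", NA, 2)
print(sapply(x_list, is.character))   # TRUE TRUE FALSE FALSE
print(sapply(x_list, is.numeric))     # FALSE FALSE FALSE TRUE
```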
The airquality dataset contains daily air quality measurements in New York from May to September 1973. The variables are ozone level, solar radiation, wind speed, temperature in degrees Fahrenheit, month, and day. See help("airquality") for a detailed description.
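Install the airquality data set on your computer using the command install.packages("datasets"), then load the datasets package into your session. (datasets is a base R package that ships with every R installation, so it is normally already installed. As noted above, keep any install.packages() call commented out.)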
library(datasets)
help("airquality")
Display the first 6 rows of the airquality data set.
library(datasets)
head(airquality, n = 6)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
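Before averaging, it helps to see where the missing values are. This short sketch reports the size of the data set and counts NAs per column. In airquality, only Ozone and Solar.R contain NAs, which is why the next step needs NA handling.

```r
library(datasets)
# Dimensions: 153 daily records, 6 variables
print(dim(airquality))
# Missing values per column; only Ozone and Solar.R contain NAs
print(colSums(is.na(airquality)))
```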
Compute the average of the first four variables (Ozone, Solar.R, Wind, and Temp) for the fifth month (May) using the sapply() function. Hint: you need to handle the NA values; otherwise, mean() returns NA for any variable that contains them.
library(datasets)
data(airquality)
data_fifth_month <- subset(airquality, Month == 5)
# Keep the four variables and drop every row with an NA in any of them (complete cases only)
data_fifth_month <- na.omit(data_fifth_month[, c("Ozone", "Solar.R", "Wind", "Temp")])
average_values <- sapply(data_fifth_month, mean)
print(average_values)
## Ozone Solar.R Wind Temp
## 24.12500 182.04167 11.50417 66.45833
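na.omit() drops a whole row when any of the four variables is missing. As a result, Wind and Temp above are averaged over the complete May rows only. If the intent is to drop NAs per variable instead, one alternative is to pass na.rm = TRUE through sapply(). Which variant the assignment expects is an assumption. The sketch also reports how many values each mean uses.

```r
library(datasets)
may <- subset(airquality, Month == 5)
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
# na.rm = TRUE drops NAs column by column, so Wind and Temp keep all 31 May days
avg_by_column <- sapply(may[, vars], mean, na.rm = TRUE)
# Number of non-missing values each mean is based on
n_used <- sapply(may[, vars], function(v) sum(!is.na(v)))
print(rbind(mean = avg_by_column, n = n_used))
```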
Construct a boxplot of the Wind and Temp variables, then display the values of all outliers that lie beyond the whiskers.
library(datasets)
data(airquality)
boxplot(airquality$Wind, airquality$Temp, names = c("Wind", "Temp"), outline = TRUE)
outliers <- boxplot.stats(airquality$Wind)$out
outliers <- c(outliers, boxplot.stats(airquality$Temp)$out)
print(outliers)
## [1] 20.1 18.4 20.7
# All three outliers come from Wind; Temp has no values beyond its whiskers, so boxplot.stats(airquality$Temp)$out is empty.
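To see why only Wind contributes outliers, you can reproduce the whisker rule that boxplot.stats() applies. Points more than 1.5 × IQR beyond the hinges (the fivenum() quartiles) are flagged. The helper whisker_outliers() is a hypothetical name used for this sketch. It keeps the results labelled by variable instead of concatenating them into one unlabelled vector.

```r
library(datasets)
# Hypothetical helper: flag points more than coef * IQR beyond the fivenum() hinges,
# the same rule boxplot.stats() uses by default (coef = 1.5)
whisker_outliers <- function(v, coef = 1.5) {
  h <- fivenum(v, na.rm = TRUE)[c(2, 4)]
  iqr <- h[2] - h[1]
  v[!is.na(v) & (v < h[1] - coef * iqr | v > h[2] + coef * iqr)]
}
# Keep the outliers labelled by variable instead of concatenating them
outliers_by_var <- lapply(airquality[, c("Wind", "Temp")], whisker_outliers)
print(outliers_by_var)
```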
Compute the upper quartile of the Wind variable with two different methods. HINT: use indexing so that only the upper quartile is shown. For the available quantile types, see https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/quantile.
library(datasets)
data(airquality)
# Method 1: quantile() with the default interpolation rule (type = 7)
upper_quartile_method1 <- quantile(airquality$Wind, probs = 0.75, type = 7)
print(upper_quartile_method1)
## 75%
## 11.5
# Method 2: take the order statistic at position ceiling(0.75 * n) of the sorted data (equivalent to quantile(type = 1))
sorted_wind <- sort(airquality$Wind)
n <- length(sorted_wind)
upper_quartile_method2 <- sorted_wind[ceiling(n * 0.75)]
print(upper_quartile_method2)
## [1] 11.5
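Another reading of the hint is to call quantile() with two different type values. You then index the result with [4] so that only the 75% entry is shown. This sketch uses types 7 and 1 as one possible pair; other types listed on the linked page would work the same way. For the 153 Wind values, both types land exactly on the 115th sorted value.

```r
library(datasets)
# quantile() returns the 0%, 25%, 50%, 75%, 100% points; [4] keeps only the upper quartile
q_type7 <- quantile(airquality$Wind, type = 7)[4]   # default, linear interpolation
q_type1 <- quantile(airquality$Wind, type = 1)[4]   # inverse empirical CDF, no interpolation
print(c(q_type7, q_type1))
```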
Construct a pie chart showing the number of entries in each Month. HINT: use the table() function to count the number of entries within each Month.
library(datasets)
data(airquality)
# Count the number of entries by Month
month_counts <- table(airquality$Month)
# Create a pie chart
pie(month_counts, labels = names(month_counts), main = "Number of Entries by Month")
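The table names are the month codes 5 to 9, so the slices above are labelled with numbers. This sketch is an optional variation. It maps the codes to names with R's built-in month.name constant and adds each slice's count and share of the total to its label. The label format is a presentation choice, not part of the assignment.

```r
library(datasets)
month_counts <- table(airquality$Month)
# Replace the codes 5-9 with month names and append each slice's count and share of the total
month_labels <- month.name[as.integer(names(month_counts))]
pct <- round(100 * as.vector(month_counts) / sum(month_counts), 1)
pie(month_counts,
    labels = paste0(month_labels, " (", month_counts, ", ", pct, "%)"),
    main = "Number of Entries by Month")
```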
END of Assignment #1.