Download the coding_assignment_1.Rmd file from LMS.
Open coding_assignment_1.Rmd in RStudio.
Replace the “Your Name Here” text in the author: field with your own name.
Supply your solutions to the homework by editing coding_assignment_1.Rmd.
When you have completed the homework, check that your code runs in the Console and knits correctly when you click Knit HTML. Then rename the R Markdown file to coding_assignment_1_YourNameHere.Rmd (replacing YourNameHere with your own name) and submit BOTH the .Rmd file AND the HTML file on LMS.
| Keystroke | Description |
|---|---|
| <tab> | Autocompletes commands and filenames, and lists arguments for functions. |
| <up> | Cycles through previous commands in the console prompt. |
| <ctrl-up> | Lists history of previous commands matching an unfinished one. |
| <ctrl-enter> | Runs the current line from the source window in the Console. Good for trying out ideas from a source file. |
| <ESC> | Aborts an unfinished command and gets out of the + prompt. |
Note: Shown above are the Windows/Linux keys. For Mac OS X, the <ctrl> key should be substituted with the <command> (⌘) key.
Instead of sending code line-by-line with <ctrl-enter>, you can send entire code chunks, and even run all of the code chunks in your .Rmd file. Look under the
Run your code in the Console and Knit HTML frequently to check for errors.
You may find it easier to solve a problem by interacting only with the Console at first.
Tip: Note that each of the code blocks in this Problem contains the expression eval = FALSE. This tells R Markdown to display the code contained in the block, but not to evaluate it. To check that your answer makes sense, be sure to try it out in the console with various choices of values for the variable x.
# Install dplyr package
install.packages("dplyr")
# Load dplyr package
library("dplyr")
Given a variable x, write a Boolean expression that evaluates to TRUE if the variable x is equal to -20 (the numeric value).
x <- c(1, 2, 3, 94842, -20, NA)
x == -20 ## checking if x is equal to -20
Given a variable x, write a Boolean expression that evaluates to TRUE if the variable x is not NA (i.e., is not missing).
x <- c(1, 2, 3, 94842, -20, NA)
!is.na(x) ## checking that x is not NA; a comparison such as x != "NA" returns NA, not FALSE, for a missing value
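Comparing a value with NA never produces TRUE or FALSE, which is why is.na() is the usual test for missingness. A minimal sketch with the same x; the name not_missing is introduced here only for illustration:

```r
x <- c(1, 2, 3, 94842, -20, NA)

# Comparing against NA propagates the missing value instead of testing for it
x == NA                  # every element is NA

# Comparing against the string "NA" also leaves the missing entry as NA
(x != "NA")[6]           # NA, not FALSE

# is.na() tests for missingness; negate it to flag the non-missing entries
not_missing <- !is.na(x)
not_missing              # TRUE for the first five entries, FALSE for the last
```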
Given a (possibly negative) number x, write a Boolean expression that returns TRUE if and only if x is smaller than -10 or bigger than 40.
x <- c(1, 2, 3, 94842, -20, NA)
x < -10 | x > 40 ## checking if x is less than -10 or greater than 40
Given an integer number x, write a Boolean expression that returns TRUE if and only if x is an odd number between -8 and 12 or 100 and 150.
x <- c(1, 2, 3, 94842, -20, NA)
x %% 2 == 1 & (between(x, -8, 12) | between(x, 100, 150)) ## odd, and inside either of the two ranges
Tip: Recall the modulus operator we saw in lecture 7: %%. For integers x and y, x %% y is the remainder of x divided by y.
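Because & is evaluated before |, the placement of parentheses matters in an expression like this one. The sketch below illustrates the difference; in_range() is a hypothetical base R stand-in for dplyr's between(), and the extra values 101 and 102 were added to x purely for demonstration.

```r
x <- c(1, 2, 3, 94842, -20, NA, 101, 102)

# Hypothetical helper standing in for dplyr::between(), so the sketch runs in base R
in_range <- function(v, lo, hi) v >= lo & v <= hi

is_odd <- x %% 2 == 1

# Without parentheses, & is evaluated before |, so the oddness test
# applies only to the first range and the even value 102 slips through
loose <- is_odd & in_range(x, -8, 12) | in_range(x, 100, 150)

# Parentheses make the oddness test apply to both ranges
strict <- is_odd & (in_range(x, -8, 12) | in_range(x, 100, 150))

loose[8]    # TRUE: 102 is even but lies inside the second range
strict[8]   # FALSE
(-7) %% 2   # 1: in R the remainder takes the sign of the divisor
```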
R's Boolean operators come in two forms: single (&, |) and double (&&, ||). One of these forms takes advantage of something called lazy evaluation while the other does not. They also don’t behave the same way when applied to vectors.
Read the help file (help("||")) and construct some examples to help figure out how the two behave.
To help you get started, try out the following two examples in your console:
# Example: The variable y.prob2a is never defined.
# (Do not define it!)
# What happens when you run this code?
x.prob2a <- 5
(x.prob2a < 10) | (y.prob2a > 2)
(x.prob2a < 10) || (y.prob2a > 2)
x.prob1a <- -3
(x.prob1a<5) | (y.prob1a>-10)
(x.prob1a<5) || (y.prob1a>-10)
# Define vectors
x.prob2a.vec <- c(TRUE, FALSE, FALSE)
y.prob2a.vec <- c(TRUE, TRUE, FALSE)
# Apply various Boolean operations to see what happens
x.prob2a.vec & y.prob2a.vec
x.prob2a.vec && y.prob2a.vec
x.prob2a.vec | y.prob2a.vec
x.prob2a.vec || y.prob2a.vec
# Self made examples
##Define vectors
x.prob1a.vec <- c(TRUE, FALSE, FALSE)
y.prob1a.vec <- c(FALSE, FALSE, TRUE)
## Apply various Boolean operations to see what happens
x.prob1a.vec & y.prob1a.vec
x.prob1a.vec && y.prob1a.vec
x.prob1a.vec | y.prob1a.vec
x.prob1a.vec || y.prob1a.vec
Can you explain what’s happening? Write up a brief explanation below.
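The & and | operators work element by element: & gives TRUE only where both entries are TRUE, and | gives TRUE where at least one entry is TRUE and FALSE only where both are FALSE. The && and || operators are meant for single values and use lazy (short-circuit) evaluation: they evaluate the right-hand side only when the left-hand side does not already decide the result. That is why (x.prob2a < 10) || (y.prob2a > 2) returns TRUE even though y.prob2a is undefined, while the | version fails because y.prob2a cannot be found. When given vectors, older versions of R used only the first element of each vector for && and ||; since R 4.3.0 this is an error.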
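One way to observe lazy evaluation directly is to count how often the right-hand side is evaluated. This is only a sketch; the counter calls and the helper count_and_true() are hypothetical names introduced for the demonstration.

```r
calls <- 0

# Hypothetical helper: records each time it is evaluated, then returns TRUE
count_and_true <- function() {
  calls <<- calls + 1
  TRUE
}

# || stops as soon as the left-hand side decides the result,
# so count_and_true() is never called and calls stays at 0
short <- TRUE || count_and_true()

# | always evaluates both sides, so calls becomes 1
full <- TRUE | count_and_true()

calls   # 1

# & and | work element by element on vectors
c(TRUE, FALSE, FALSE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
c(TRUE, FALSE, FALSE) | c(TRUE, TRUE, FALSE)   # TRUE TRUE FALSE
```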
all()
Two people were asked to give their preferences between two options: [Facebook, Twitter], [Safari, Chrome], [Mac, PC], [Summer, Winter]. Their results are given below.
alice.prefs <- c("Twitter", "Safari", "Mac", "Summer")
bob.prefs <- c("Facebook", "Chrome", "PC", "Summer")
Use the all() function to determine if the two people have identical preferences. (Your code should output a single Boolean value, either TRUE or FALSE.)
##Your code here
all(alice.prefs == bob.prefs) ## checking if both people have exact same preferences
## [1] FALSE
any()
Use the any() function to determine if the two people have any preferences in common. (Your code should output a single Boolean value, either TRUE or FALSE.)
##Your code here
any(alice.prefs == bob.prefs) ## checking if the two people share at least one preference
## [1] TRUE
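Both all() and any() summarise the same element-wise comparison, so it can help to look at that intermediate vector first. A short sketch using the preference vectors above; the name same is introduced only for illustration.

```r
alice.prefs <- c("Twitter", "Safari", "Mac", "Summer")
bob.prefs   <- c("Facebook", "Chrome", "PC", "Summer")

# Element-wise comparison yields one logical value per question
same <- alice.prefs == bob.prefs
same                 # FALSE FALSE FALSE TRUE

all(same)            # FALSE: they disagree on at least one question
any(same)            # TRUE: they agree on at least one question
sum(same)            # 1: number of shared preferences
alice.prefs[same]    # "Summer"
```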
Let age be the vector defined below.
age <- c(18, NA, 25, 71, NA, 45, NA, NA, 18)
Write a Boolean expression that checks whether each entry of age is missing (recall missing values are denoted by NA). Your expression should return a Boolean vector having the same length as age.
##Your code here
## checking whether each entry of age is missing
is.na(age)
## [1] FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
which() practice
Write code that returns the indexes of age that are missing.
##Your code here
which(is.na(age))
## [1] 2 5 7 8
Write code that uses negative indexes and your solution from (a) to return only the values of age that are not missing. (i.e., your code should result in a vector with elements: 18, 25, 71, 45, 18)
##Your code here
age[-which(is.na(age))] ## negative indexes drop the missing positions found in (a)
## [1] 18 25 71 45 18
Add the negation operator ! just before the is.na() function to write an expression that returns only the values of age that are not missing.
##Your code here
age[which(!is.na(age))]
## [1] 18 25 71 45 18
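The three indexing styles give the same result here, but they differ when nothing is missing. The sketch below compares them; the vector complete is a hypothetical example with no NA values.

```r
age <- c(18, NA, 25, 71, NA, 45, NA, NA, 18)

# Three equivalent ways to keep the non-missing values
by_negative <- age[-which(is.na(age))]
by_which    <- age[which(!is.na(age))]
by_logical  <- age[!is.na(age)]

# Pitfall: when nothing is missing, which() returns integer(0),
# and indexing with -integer(0) selects nothing at all
complete <- c(18, 25, 71)            # hypothetical vector with no NAs
complete[-which(is.na(complete))]    # numeric(0)
complete[!is.na(complete)]           # 18 25 71
```

Logical indexing with !is.na() avoids the empty-result pitfall, which is one reason it is often preferred over negative indexes.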
which() practice
For the next three problems we’ll use the preloaded cars data set.
speed <- cars$speed
dist <- cars$dist
dist
## [1] 2 10 4 22 16 10 18 26 34 17 28 14 20 24 28 26 34 34 46
## [20] 26 36 60 80 20 26 54 32 40 32 40 50 42 56 76 84 36 46 68
## [39] 32 48 52 56 64 66 54 70 92 93 120 85
Write code to figure out which cars had a stopping distance of 30 feet or more.
which(dist >= 30) ## the cars that had a stopping distance of 30 feet or more
## [1] 9 17 18 19 21 22 23 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
## [26] 44 45 46 47 48 49 50
which.min, which.max practice
Use the which.min() function to figure out which car had the shortest stopping distance. (Your code should return the car’s index.)
which.min(dist)
## [1] 1
Use the which.max() function to figure out the speed of the car that had the longest stopping distance. (Your code should return the car’s speed.)
speed[which.max(dist)] ## speed of the car with the longest stopping distance
## [1] 24
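Since which.max() returns a position, that position can index any vector that is aligned with dist. A brief sketch on the built-in cars data; the name longest is introduced only for illustration.

```r
speed <- cars$speed
dist  <- cars$dist

longest <- which.max(dist)   # index of the car with the longest stopping distance
longest                      # 49
dist[longest]                # 120 (the distance itself)
speed[longest]               # 24 (the speed of that same car)

# which.min() and which.max() return only the first position when there are ties;
# which() with a comparison returns every matching position
which(dist == min(dist))
```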
Use the read.csv() function to import the survey data provided to you on LMS into a variable called survey.
survey <- read.csv(file="E:/work/semester 3/Telling stories with data/assignments/survey_data.csv" )
$ notation
Use the $ operator to select the Rexperience column from the survey data.
survey$Rexperience
## [1] "Never used" "Never used" "Basic competence"
## [4] "Basic competence" "Never used" "Never used"
## [7] "Basic competence" "Basic competence" "Basic competence"
## [10] "Never used" "Basic competence" "Installed on machine"
## [13] "Basic competence" "Never used" "Basic competence"
## [16] "Installed on machine" "Basic competence" "Installed on machine"
## [19] "Installed on machine" "Never used" "Basic competence"
## [22] "Installed on machine" "Basic competence" "Basic competence"
## [25] "Basic competence" "R Wizard" "Basic competence"
## [28] "Installed on machine" "Installed on machine" "Never used"
## [31] "Never used"
Repeat part (b) using [,] notation, this time selecting the TVhours column from the survey data by name (i.e., obtain this column by using the name “TVhours” instead of using the column number).
survey[,"TVhours"]
## [1] 2 15 16 0 2 5 0 4 0 0 14 6 10 15 4 4 10 7 33 8 8 3 0 0 0
## [26] 0 6 1 0 0 4
Repeat part (c) with [[]] notation.
survey[["TVhours"]]
## [1] 2 15 16 0 2 5 0 4 0 0 14 6 10 15 4 4 10 7 33 8 8 3 0 0 0
## [26] 0 6 1 0 0 4
Repeat part (d), but this time using single brackets ([ ]) notation.
(Observe that this returns a new single-column data frame, not just a vector.)
survey["TVhours"]
## TVhours
## 1 2
## 2 15
## 3 16
## 4 0
## 5 2
## 6 5
## 7 0
## 8 4
## 9 0
## 10 0
## 11 14
## 12 6
## 13 10
## 14 15
## 15 4
## 16 4
## 17 10
## 18 7
## 19 33
## 20 8
## 21 8
## 22 3
## 23 0
## 24 0
## 25 0
## 26 0
## 27 6
## 28 1
## 29 0
## 30 0
## 31 4
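The four selection styles differ mainly in what they return. The sketch below checks this on a hypothetical three-row data frame named toy, which stands in for the survey data.

```r
# Hypothetical three-row data frame standing in for the survey data
toy <- data.frame(
  Rexperience = c("Never used", "Basic competence", "R Wizard"),
  TVhours     = c(2L, 15L, 0L)
)

class(toy$TVhours)        # "integer": a plain vector
class(toy[, "TVhours"])   # "integer": a plain vector
class(toy[["TVhours"]])   # "integer": a plain vector
class(toy["TVhours"])     # "data.frame": a one-column data frame

# drop = FALSE keeps the data frame structure with [ , ] notation
class(toy[, "TVhours", drop = FALSE])   # "data.frame"
```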
What are the data types of the Rexperience and TVhours columns in the data provided? Write code to confirm the data types.
#First method
library(tidyverse)
survey_specific <- select(survey, TVhours, Rexperience)
sapply(survey_specific, class)
##alternate method
class(survey[["TVhours"]])
class(survey[["Rexperience"]])
Rexperience is a character column and TVhours is an integer column.
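The class of a text column depends on how the file was read. Since R 4.0.0, read.csv() keeps strings as character by default; earlier versions converted them to factors. The sketch below makes the choice explicit, using a hypothetical two-row CSV supplied inline so that no file is needed.

```r
# Hypothetical two-row CSV supplied inline so the sketch needs no file
csv_text <- "Rexperience,TVhours\nNever used,2\nBasic competence,15"

as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

sapply(as_chr, class)   # Rexperience "character", TVhours "integer"
sapply(as_fct, class)   # Rexperience "factor",    TVhours "integer"
```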
subset() practice
Use the subset() function to select all the survey data on Program, OperatingSystem, and TVhours for respondents whose Rexperience is “Basic competence” or who watched 8 or more hours of TV last week.
## keep the rows where Rexperience is "Basic competence" or TVhours is at least 8,
## and only the Program, OperatingSystem, and TVhours columns
subset(survey,
       Rexperience == "Basic competence" | TVhours >= 8,
       select = c(Program, OperatingSystem, TVhours))
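subset() can filter rows and pick columns in a single call through its select argument. A minimal sketch on a hypothetical four-row data frame; the data frame toy and the program name "MBA" are made up for the example.

```r
# Hypothetical miniature of the survey data
toy <- data.frame(
  Program         = c("HSS", "MBA", "HSS", "MBA"),
  Rexperience     = c("Basic competence", "Never used", "Never used", "R Wizard"),
  OperatingSystem = c("Windows", "Mac OS X", "Windows", "Mac OS X"),
  TVhours         = c(0, 10, 3, 2)
)

# The second argument filters rows; select = picks columns
picked <- subset(toy,
                 Rexperience == "Basic competence" | TVhours >= 8,
                 select = c(Program, OperatingSystem, TVhours))
picked   # rows 1 and 2, three columns
```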
One of the greatest benefits of R Markdown is its ability to process inline code. Copy the text inside the code chunk below to a space outside the code chunk just below this paragraph and see what happens when you knit the file.
Students of this class on average watch `r round(mean(survey$TVhours), 2)` hours of TV.
Students of this class on average watch 5.71 hours of TV.
subset(survey, Program == "HSS" & Rexperience == "Basic competence")
## X Program PriorExp Rexperience OperatingSystem TVhours
## 3 3 HSS Some experience Basic competence Mac OS X 16
## 7 7 HSS Some experience Basic competence Mac OS X 0
## 9 9 HSS Some experience Basic competence Windows 0
## 15 15 HSS Some experience Basic competence Mac OS X 4
## 17 17 HSS Some experience Basic competence Mac OS X 10
## 21 21 HSS Some experience Basic competence Windows 8
## 23 23 HSS Some experience Basic competence Windows 0
## 24 24 HSS Some experience Basic competence Windows 0
## 25 25 HSS Some experience Basic competence Windows 0
## 27 27 HSS Some experience Basic competence Mac OS X 6
## Editor
## 3 Microsoft Word
## 7 Microsoft Word
## 9 Microsoft Word
## 15 Microsoft Word
## 17 R Markdown
## 21 Microsoft Word
## 23 Microsoft Word
## 24 Excel
## 25 R Markdown
## 27 R Markdown
Replace all occurrences of ___ in the paragraph below with an inline code chunk supplying the appropriate information.
Of the 31 survey respondents, 10 were NOT from the HSS program. We found that 35.4% of all students in the class use the Mac OS X operating system. 47.6% of HSS students report having Basic competence in R.
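Counts and percentages like these can be computed with sum() and mean() on logical vectors. The sketch below uses a hypothetical four-row data frame, so its numbers do not match the survey; in the .Rmd file each expression would sit inside an inline chunk such as `r n_not_hss`.

```r
# Hypothetical miniature of the survey data
toy <- data.frame(
  Program         = c("HSS", "HSS", "Other", "HSS"),
  OperatingSystem = c("Mac OS X", "Windows", "Mac OS X", "Windows"),
  Rexperience     = c("Basic competence", "Never used", "Never used", "Basic competence")
)

n_total   <- nrow(toy)                     # 4
n_not_hss <- sum(toy$Program != "HSS")     # 1

# The mean of a logical vector is the proportion of TRUE values
pct_mac <- round(100 * mean(toy$OperatingSystem == "Mac OS X"), 1)        # 50

# Restrict to HSS students before taking the proportion
hss       <- subset(toy, Program == "HSS")
pct_basic <- round(100 * mean(hss$Rexperience == "Basic competence"), 1)  # 66.7
```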