Enter your name and today’s date in Lines 3 & 4, and then run this chunk. Note: Numbering corresponds to chunk numbers. Chunk 1 specified the knitting parameters.
Reminder: This is the R Proficiency Exam Part 1 of 3. Please note the other parts of the R Proficiency Exam are in separate files. Your submission should include BOTH HTML and Rmd files from ALL PARTS. If your file does not knit correctly, submit the Rmd file for partial credit.
Goal: Demonstrate Introductory R Proficiency
Directions for the exam:
1. Your name on Line 3 and your NetID in Chunk 2 indicate your compliance with the Fuqua Honor Code (https://www.fuqua.duke.edu/honorcode) for this part of the Final Exam. FILLING IN YOUR NAME AND NETID ARE BOTH REQUIRED TO RECEIVE CREDIT FOR THIS EXAM.
2. This exam is open book/internet/notes/etc.; you must cite all sources you consult (using the URL or other identifiable information).
3. You may not discuss the exam with anyone except Dr. Salman Azhar.
4. This exam uses data similar to the data used in the Practice Exams. While the data is slightly different, the structure is similar.
5. To maintain consistency across students, we will only answer logistical questions during the exam. For instance, “When I have submitted my Exam, can I walk away from my computer?” (Note: You can). If you have a logistical question, send it to Dr. Salman Azhar in the Zoom chat window. If you have an emergency, call +1 408-806-3500, and we’ll solve it together.
Replace “sa239” with your specific NetID (without removing the quotes). Your NetID is composed of your initials followed by numbers. My NetID is sa239. For instance, if your NetID is abc123, then your code should read: myNetID = “abc123”. THIS IS REQUIRED FOR YOU TO GET CREDIT FOR THIS EXAM. Your NetID in this chunk indicates your compliance with the Fuqua Honor Code (https://www.fuqua.duke.edu/honorcode) for this exam.
#REPLACE netid WITH YOUR NETID and DO NOT CHANGE ANY OTHER PART OF THIS CODE CHUNK (just run it).
myNetID = "ks715"
Run the following code chunk to initialize 7 vectors; each vector has 12 elements.
a. Each element of each vector is a simulated daily projected vehicle count for I-885’s Exit 14 after it opens.
b. The 7 vectors correspond to the days of the week (Mon, Tue, Wed, Thu, Fri, Sat, Sun).
c. The 12 elements of each vector correspond to the first 12 weeks in calendar order; the 1st element corresponds to the 1st week, the 2nd element corresponds to the 2nd week, etc.
Tip: You do not need (and should not try) to understand the code in this chunk. You need to run it and understand the above description.
#DO NOT CHANGE ANY PART OF THIS CODE CHUNK (just run it).
splitted = strsplit(myNetID, "")
seed = nchar(myNetID)
isnumber = c()
myNetLetters = c()
for (i in 1:nchar(myNetID)) {
  isnumber = c(isnumber, is.numeric(splitted[[1]][i]))
  myNetLetters = c(myNetLetters, splitted[[1]][i])
}
if (FALSE == (("0" <= myNetLetters[nchar(myNetID)]) & (myNetLetters[nchar(myNetID)] <= "9"))) {
  myNetLetters[nchar(myNetID)] = "0"
}
mySeed = 2*(as.numeric(myNetLetters[nchar(myNetID)]) + seed)/2 - nchar(myNetID)
iMax = 7
jMax = 12
ijMax = iMax*jMax
distance = ( (seed-1) %% 4 + 1)*100
set.seed(1)
set.seed(mySeed)
middle = 0.222
stddev = middle/3.012
base = c(9000, rep(10000, 3), 9000, 5000, 3000)
case = round(sqrt(base)/2,2)
weeks12 = paste(c(rep("Week", 12)), c(1:12))
dase = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")
wase = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
check = c()
rain = c()
vec = c(1:ijMax)
mat = matrix(data = vec, nrow = iMax, byrow = TRUE)
for (i in 1:iMax) {
  for (j in 1:jMax) {
    mat[i,j] = trunc(0.5 + rnorm(1, mean = base[i], sd = case[i]))
  }
}
Mon = mat[1,]
Tue = mat[2,]
Wed = mat[3,]
Thu = mat[4,]
Fri = mat[5,]
Sat = mat[6,]
Sun = mat[7,]
for (i in 0:seed) {
  set.seed(i)
  focusDistance = sample(1:7, 1, replace = FALSE)
}
rownames(mat) = wase
colnames(mat) = paste("Week", 1:12)
"Success! Your data is ready, as you can see below..."
## [1] "Success! Your data is ready, as you can see below..."
mat
## Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10
## Mon 8960 9066 8940 9003 9081 8971 8978 8970 8986 9007
## Tue 9946 9992 9946 9993 9970 9891 10012 9987 10045 10047
## Wed 10041 9985 10071 10075 9967 9957 10016 10055 10111 10061
## Thu 9950 9900 9912 9993 10078 9960 9996 10095 9977 10028
## Fri 8966 8997 9069 9009 9048 8972 8995 8956 9036 8995
## Sat 4960 5030 4980 5018 4973 4988 4926 4989 4955 4990
## Sun 3010 3001 3011 2996 3027 3003 3005 2985 3014 2952
## Week 11 Week 12
## Mon 9058 8962
## Tue 10073 10035
## Wed 10074 10048
## Thu 9956 9977
## Fri 8997 9011
## Sat 4993 4992
## Sun 3027 2999
set.seed(mySeed)
myPrint4 = wase[(mySeed %% 7) + 1]
myPrint5 = myPrint4
myPrint6 = myPrint4
myPrint13x = colSums(mat)
myPrint13x = c(myPrint13x, mean(myPrint13x))
myPrint13 = round(sd(myPrint13x)/2,0)
myPrint13
## [1] 63
rm(mat)
weeks12 = paste(c(rep("Week", 12)), c(1:12))
dase = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")
wase = c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
check = c()
rain = c()
Run the following code chunk to get the directions for your next chunk.
#DO NOT CHANGE ANY PART OF THIS CODE CHUNK (just run it).
paste("Now, print the structure and statistical summary (in exactly that order) of the vector named ", myPrint4, " based on the simulated data.", sep = "")
## [1] "Now, print the structure and statistical summary (in exactly that order) of the vector named Sat based on the simulated data."
Follow the directions printed by Chunk 4 above after “[1]” (starting with “Now, print …”). Rubric: 1 point each for printing each item.
str(Sat)  # str() prints the structure; structure() only returns the object unchanged
## num [1:12] 4960 5030 4980 5018 4973 ...
summary(Sat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4926 4970 4988 4983 4992 5030
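For reference, a minimal sketch of how `str()`, `structure()`, and `summary()` differ. The vector `x` holds a few sample values chosen for illustration, and the variable names are hypothetical.

```r
# A few sample values, for illustration only
x <- c(4960, 5030, 4980, 5018)

# structure() with no extra attributes returns its argument unchanged,
# so it prints the values rather than a description of them
same <- identical(structure(x), x)

# str() prints a compact one-line description: type, length, leading values
strLine <- capture.output(str(x))

# summary() returns the six-number summary as a named object
sm <- summary(x)

print(same)
print(strLine)
print(names(sm))
```

The "structure" asked for in the directions is what `str()` reports; `structure()` is a helper for attaching attributes to an object.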
Run the following code chunk to get the directions for your next chunk.
#DO NOT CHANGE ANY PART OF THIS CODE CHUNK (just run it).
paste("Compute the minimum, median, average, maximum, and standard deviation of ", myPrint6, "; store the results in variables named minVehicles, medVehicles, avgVehicles, maxVehicles, and sdVehicles, respectively. Then, print the values of these variables in the order listed above, and check if they look consistent with your results in the previous chunk.", sep = "")
## [1] "Compute the minimum, median, average, maximum, and standard deviation of Sat; store the results in variables named minVehicles, medVehicles, avgVehicles, maxVehicles, and sdVehicles, respectively. Then, print the values of these variables in the order listed above, and check if they look consistent with your results in the previous chunk."
Follow the directions printed by Chunk 6 above after “[1]” (starting with “Compute the …”). Rubric: 1 point each for computing and printing each statistic.
minVehicles = min(Sat)
medVehicles = median(Sat)
avgVehicles = mean(Sat)
maxVehicles = max(Sat)
sdVehicles = sd(Sat)
minVehicles
## [1] 4926
medVehicles
## [1] 4988.5
avgVehicles
## [1] 4982.833
maxVehicles
## [1] 5030
sdVehicles
## [1] 27.62684
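One way to check consistency with the previous chunk is sketched below. It re-enters the Sat values printed earlier so that it runs on its own; the name `stats` is hypothetical. Note that `summary()` prints only about four significant digits by default, which is why the median 4988.5 appeared as 4988 and the mean 4982.833 as 4983 in the earlier output.

```r
# Sat values as printed earlier in this document
Sat <- c(4960, 5030, 4980, 5018, 4973, 4988, 4926, 4989, 4955, 4990, 4993, 4992)

# Individual statistics, collected in one named vector for easy comparison
stats <- c(min = min(Sat), med = median(Sat), avg = mean(Sat),
           max = max(Sat), sd = sd(Sat))
print(stats)

# Compare against summary(); a loose tolerance absorbs display rounding
sm <- summary(Sat)
print(abs(sm[["Median"]] - stats[["med"]]) < 1)
print(abs(sm[["Mean"]] - stats[["avg"]]) < 1)
```

The standard deviation has no counterpart in `summary()`, so it can only be sanity-checked against the spread between the minimum and maximum.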
Now, combine all 7 vectors in order, starting with the vector named Mon and ending with Sun, and store the result in a matrix named traffic. Specifically, the order is Mon, Tue, Wed, Thu, Fri, Sat, Sun. Then, print the matrix traffic to validate your code. Tip: The matrix traffic should have one column for each day of the week (Mon, Tue, Wed, Thu, Fri, Sat, Sun) and one row for each week in numeric order (Week 1 to Week 12). Tip: Each value is the projected traffic for the corresponding day and week. Tip: Confirm the resulting matrix is the transpose of Chunk 3’s result. Specifically, the first column corresponds to Mon; the second column corresponds to Tue; etc. The first row corresponds to Week 1; the second corresponds to Week 2; etc. Tip: You will specify the row names later. Rubric: 3 points for combining the vectors, 1 point for printing the matrix.
traffic = cbind(Mon,Tue,Wed,Thu,Fri,Sat,Sun)
traffic
## Mon Tue Wed Thu Fri Sat Sun
## [1,] 8960 9946 10041 9950 8966 4960 3010
## [2,] 9066 9992 9985 9900 8997 5030 3001
## [3,] 8940 9946 10071 9912 9069 4980 3011
## [4,] 9003 9993 10075 9993 9009 5018 2996
## [5,] 9081 9970 9967 10078 9048 4973 3027
## [6,] 8971 9891 9957 9960 8972 4988 3003
## [7,] 8978 10012 10016 9996 8995 4926 3005
## [8,] 8970 9987 10055 10095 8956 4989 2985
## [9,] 8986 10045 10111 9977 9036 4955 3014
## [10,] 9007 10047 10061 10028 8995 4990 2952
## [11,] 9058 10073 10074 9956 8997 4993 3027
## [12,] 8962 10035 10048 9977 9011 4992 2999
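The transpose tip can be checked with a small sketch like the one below. Because `mat` was removed at the end of Chunk 3, the sketch rebuilds a days-as-rows matrix with `rbind()` from the first three weeks of Mon and Tue (values taken from the output above); `byDay`, `byWeek`, and `isTranspose` are hypothetical names.

```r
# First three weeks of two of the day vectors, as printed above
Mon <- c(8960, 9066, 8940)
Tue <- c(9946, 9992, 9946)

byDay  <- rbind(Mon, Tue)  # one row per day, like Chunk 3's mat
byWeek <- cbind(Mon, Tue)  # one column per day, like traffic

# cbind() of the vectors should equal the transpose of rbind()
isTranspose <- identical(t(byDay), byWeek)
print(dim(byWeek))
print(isTranspose)
```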
Rename the rows in the traffic matrix with the following names: Week 1, Week 2, …, Week 12, respectively. Then, print traffic to validate your code. Tip: You can use the paste() command to concatenate/combine chr strings and vectors. Tip: m:n generates numbers from m to n automatically. Rubric: 3 points for setting the rownames, 1 point for printing the matrix.
# paste0() with 1:12 builds the same 12 labels without typing each one;
# paste("Week", 1:12) would give the spaced form "Week 1", ..., "Week 12" listed in the prompt
rownames(traffic) <- paste0("Week", 1:12)
traffic
## Mon Tue Wed Thu Fri Sat Sun
## Week1 8960 9946 10041 9950 8966 4960 3010
## Week2 9066 9992 9985 9900 8997 5030 3001
## Week3 8940 9946 10071 9912 9069 4980 3011
## Week4 9003 9993 10075 9993 9009 5018 2996
## Week5 9081 9970 9967 10078 9048 4973 3027
## Week6 8971 9891 9957 9960 8972 4988 3003
## Week7 8978 10012 10016 9996 8995 4926 3005
## Week8 8970 9987 10055 10095 8956 4989 2985
## Week9 8986 10045 10111 9977 9036 4955 3014
## Week10 9007 10047 10061 10028 8995 4990 2952
## Week11 9058 10073 10074 9956 8997 4993 3027
## Week12 8962 10035 10048 9977 9011 4992 2999
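The two tips about `paste()` and `m:n` can be combined as in this short sketch. The placeholder matrix `m` and the names `spaced` and `compact` are hypothetical.

```r
# paste() joins its arguments with a space by default and recycles "Week"
spaced <- paste("Week", 1:12)    # "Week 1", "Week 2", ...

# paste0() joins with no separator
compact <- paste0("Week", 1:12)  # "Week1", "Week2", ...

# Placeholder 12-row matrix, only to show the assignment
m <- matrix(0, nrow = 12, ncol = 2)
rownames(m) <- spaced

print(head(spaced, 3))
print(head(compact, 3))
print(rownames(m)[12])
```

Generating the labels this way keeps the code valid if the number of weeks changes, provided `1:12` is replaced with `1:nrow(m)`.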
Add a row named avgDay as the last row of the traffic matrix. Tip: The new row should contain the average traffic for each day of the week (over all weeks). Note: The new row must be the last row of traffic and not shift any previous rows up or down. Then, print traffic to validate your code. Rubric: 3 points for computing avgDay, 2 points for combining the avgDay row, 1 point for printing the matrix.
traffic <- rbind(traffic, avgDay = colMeans(traffic))  # name the new row avgDay, as the prompt requires
traffic
## Mon Tue Wed Thu Fri Sat Sun
## Week1 8960.0 9946.00 10041.00 9950.000 8966.00 4960.000 3010.0
## Week2 9066.0 9992.00 9985.00 9900.000 8997.00 5030.000 3001.0
## Week3 8940.0 9946.00 10071.00 9912.000 9069.00 4980.000 3011.0
## Week4 9003.0 9993.00 10075.00 9993.000 9009.00 5018.000 2996.0
## Week5 9081.0 9970.00 9967.00 10078.000 9048.00 4973.000 3027.0
## Week6 8971.0 9891.00 9957.00 9960.000 8972.00 4988.000 3003.0
## Week7 8978.0 10012.00 10016.00 9996.000 8995.00 4926.000 3005.0
## Week8 8970.0 9987.00 10055.00 10095.000 8956.00 4989.000 2985.0
## Week9 8986.0 10045.00 10111.00 9977.000 9036.00 4955.000 3014.0
## Week10 9007.0 10047.00 10061.00 10028.000 8995.00 4990.000 2952.0
## Week11 9058.0 10073.00 10074.00 9956.000 8997.00 4993.000 3027.0
## Week12 8962.0 10035.00 10048.00 9977.000 9011.00 4992.000 2999.0
## avgDay 8998.5 9994.75 10038.42 9985.167 9004.25 4982.833 3002.5
Now, using a for-loop, compute a vector named weeklyTraffic that contains the total traffic in the week (by adding up the daily traffic for each day), including the row named avgDay. Then, print the weeklyTraffic vector and the number of elements in it to validate your code. Note: Your code must work for any number of rows and columns in the traffic matrix. Tip: The number of elements in this vector should equal the number of rows in the traffic matrix. Rubric: 4 points for computing weeklyTraffic, 1 point for printing weeklyTraffic, 1 point for printing the number of elements in weeklyTraffic.
weeklyTraffic = c()
for (i in 1:nrow(traffic)) {
  # total of row i across every day column, appended as element i
  weeklyTraffic = c(weeklyTraffic, sum(traffic[i, ]))
}
weeklyTraffic
## [1] 55833.00 55971.00 55929.00 56087.00 56144.00 55742.00 55928.00 56037.00
## [9] 56124.00 56080.00 56178.00 56024.00 56006.42
length(weeklyTraffic)
## [1] 13
Now, combine the vector weeklyTraffic with the matrix traffic as the first column of the resulting traffic matrix. Then, print the traffic matrix to validate your code. Note: The new column will become the first column of the resulting traffic matrix (and will shift all previous columns to the right). Rubric: 2 points for combining the weeklyTraffic column, 1 point for printing the matrix.
traffic <- cbind(weeklyTraffic, traffic)  # vector first, so it becomes the leftmost column
traffic
## weeklyTraffic Mon Tue Wed Thu Fri Sat Sun
## Week1 55833.00 8960.0 9946.00 10041.00 9950.000 8966.00 4960.000 3010.0
## Week2 55971.00 9066.0 9992.00 9985.00 9900.000 8997.00 5030.000 3001.0
## Week3 55929.00 8940.0 9946.00 10071.00 9912.000 9069.00 4980.000 3011.0
## Week4 56087.00 9003.0 9993.00 10075.00 9993.000 9009.00 5018.000 2996.0
## Week5 56144.00 9081.0 9970.00 9967.00 10078.000 9048.00 4973.000 3027.0
## Week6 55742.00 8971.0 9891.00 9957.00 9960.000 8972.00 4988.000 3003.0
## Week7 55928.00 8978.0 10012.00 10016.00 9996.000 8995.00 4926.000 3005.0
## Week8 56037.00 8970.0 9987.00 10055.00 10095.000 8956.00 4989.000 2985.0
## Week9 56124.00 8986.0 10045.00 10111.00 9977.000 9036.00 4955.000 3014.0
## Week10 56080.00 9007.0 10047.00 10061.00 10028.000 8995.00 4990.000 2952.0
## Week11 56178.00 9058.0 10073.00 10074.00 9956.000 8997.00 4993.000 3027.0
## Week12 56024.00 8962.0 10035.00 10048.00 9977.000 9011.00 4992.000 2999.0
## avgDay 56006.42 8998.5 9994.75 10038.42 9985.167 9004.25 4982.833 3002.5
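A for-loop over rows can be cross-checked against the vectorized `rowSums()`, as in the sketch below. It uses only the Week 1 and Week 2 values from the matrix printed earlier so that it is self-contained; `m` and `loopTotals` are hypothetical names.

```r
# Week 1 and Week 2 daily values, taken from the matrix printed earlier
m <- rbind(Week1 = c(8960, 9946, 10041, 9950, 8966, 4960, 3010),
           Week2 = c(9066, 9992, 9985, 9900, 8997, 5030, 3001))
colnames(m) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Append the per-day average as a named last row
m <- rbind(m, avgDay = colMeans(m))

# Row totals with a loop; seq_len() stays safe even for a 0-row matrix
loopTotals <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  loopTotals[i] <- sum(m[i, ])
}

print(loopTotals)
# The loop result should agree with the built-in row sums
print(isTRUE(all.equal(loopTotals, unname(rowSums(m)))))
```

A loop that adds whole rows to a running total (`total + m[i, ]`) produces one value per column instead of one per row; summing inside the loop with `sum(m[i, ])` is what yields a vector with `nrow(m)` elements.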
Run the following code chunk to get the directions for your next chunk.
#DO NOT CHANGE ANY PART OF THIS CODE CHUNK (just run it).
cutoff = myPrint13
paste("For the next chunk, use a cut-off of", cutoff, "such that:")
## [1] "For the next chunk, use a cut-off of 63 such that:"
paste("a. ‘high’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is strictly greater than the average weekly traffic (based on avgDay row's weeklyTraffic column) plus ", cutoff, ". (For example, if the avgDay row's weeklyTraffic column is 55000, then ‘high’ corresponds to any week with traffic greater than or equal to ", 55000 + cutoff, ".)", sep = "")
## [1] "a. ‘high’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is strictly greater than the average weekly traffic (based on avgDay row's weeklyTraffic column) plus 63. (For example, if the avgDay row's weeklyTraffic column is 55000, then ‘high’ corresponds to any week with traffic greater than or equal to 55063.)"
paste("b. ‘low’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is strictly less than the average weekly traffic (based on avgDay row's weeklyTraffic column) minus ", cutoff, ". (For example, if the avgDay row's weeklyTraffic column is 55000, then ‘low’ corresponds to any week with traffic less than or equal to ", 55000 - cutoff, ".)", sep = "")
## [1] "b. ‘low’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is strictly less than the average weekly traffic (based on avgDay row's weeklyTraffic column) minus 63. (For example, if the avgDay row's weeklyTraffic column is 55000, then ‘low’ corresponds to any week with traffic less than or equal to 54937.)"
paste("c. ‘mid’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is +/- ", cutoff, " (inclusive) of the average weekly traffic (based on avgDay row's weeklyTraffic column). (For example, if the avgDay row's weeklyTraffic column is 55000, then ‘mid’ corresponds to any week with traffic between, but excluding, ", 55000 - cutoff, " and ", 55000 + cutoff, ".)", sep = "")
## [1] "c. ‘mid’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is +/- 63 (inclusive) of the average weekly traffic (based on avgDay row's weeklyTraffic column). (For example, if the avgDay row's weeklyTraffic column is 55000, then ‘mid’ corresponds to any week with traffic between, but excluding, 54937 and 55063.)"
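The three rules can be sketched as a small helper, using the 55000 average and cut-off of 63 from the printed example. The function name `bandOf` and its arguments are hypothetical. The sketch assumes the rule as stated first in each item (strictly greater for high, strictly less for low, inclusive for mid); the parenthetical examples read differently at the exact boundary values, so the two readings only disagree when a weekly total lands exactly on 54937 or 55063.

```r
# Hypothetical helper: classify one weekly total against the average
# Assumption: strict comparisons for high/low, boundaries count as mid
bandOf <- function(weekly, avgWeekly, cutoff) {
  if (weekly > avgWeekly + cutoff) {
    "high"
  } else if (weekly < avgWeekly - cutoff) {
    "low"
  } else {
    "mid"
  }
}

# Values just outside, on, and inside the boundaries 54937 and 55063
testTotals <- c(55064, 55063, 55000, 54937, 54936)
bands <- sapply(testTotals, bandOf, avgWeekly = 55000, cutoff = 63)
print(bands)
```

Because the final `else` catches everything that is neither high nor low, the average itself is always classified as mid.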
Using a for-loop, compute a vector named trafficBand (of type chr) based on the following specifications: 1. The for-loop should work regardless of the number of rows and columns in traffic. 2. The value of trafficBand for each week must be one of three possible values (high, mid, and low) based on the weeklyTraffic for that week compared to the weeklyTraffic value in the avgDay row. Compute trafficBand’s value (high, mid, and low) for each week based on the criteria specified in the four lines printed by the previous chunk, starting with “For the next chunk…” (ignore the “[1]” at the start of each line). Then, print trafficBand to validate your code. Tip: trafficBand will have 13 values, one for each week (the first 12 values) and one for the average of all weeks (which is the last value, which should be mid). Rubric: 1/2 point each for each week, 1 point for printing trafficBand.
trafficBand = c()
avgWeekly = traffic[nrow(traffic), "weeklyTraffic"]  # weeklyTraffic value in the last (avgDay) row
for (i in 1:nrow(traffic)) {
  if (traffic[i, "weeklyTraffic"] > avgWeekly + cutoff) {
    trafficBand = c(trafficBand, "high")
  } else if (traffic[i, "weeklyTraffic"] < avgWeekly - cutoff) {
    trafficBand = c(trafficBand, "low")
  } else {
    # within +/- cutoff of the average, boundaries included
    trafficBand = c(trafficBand, "mid")
  }
}
trafficBand
Now, print the absolute value of the difference between the number of “low” weeks and the number of “high” weeks. Tip: You can do this in 1 to 3 lines of code. Tip: The result can never be negative and should be 0 or close to 0. Tip: The last value should be “mid” (so you do not need to worry about handling it specially).
abs(sum(trafficBand == "low") - sum(trafficBand == "high"))
Add a column named trafficRank to traffic as its first (leftmost) column. This new column should have the type ordered factor with three possible values: high, mid, and low. Each value in trafficRank is based on the corresponding value in trafficBand (low = “low”, mid = “mid”, high = “high”). The values high, mid, and low (in trafficRank) must correspond to the factor values of 3, 2, and 1, respectively. Then, print traffic to validate your code. Note: The new column will become the traffic’s first column and shift all the previous columns right by one. Tip: You can do this in three steps (define trafficRank, then add to traffic, and then print traffic). Rubric: 5 points for computing trafficRank, 2 points for combining trafficRank and traffic, and 1 point for printing traffic.
trafficRank = factor(trafficBand, ordered = TRUE, levels = c("low", "mid", "high"))
traffic = cbind(trafficRank, traffic)
traffic
Now, convert traffic to a dataframe named dftraffic. Then, add a column named trafficBand to dftraffic. This new column should go on the left of dftraffic as its first column (and should correspond to the trafficBand vector that you computed earlier). Then, print dftraffic to validate your code. Rubric: 2 points for converting to dataframe, 2 points for combining trafficBand and traffic, and 1 point for printing traffic.
dftraffic = as.data.frame(traffic)
dftraffic = cbind(trafficBand, dftraffic)
dftraffic
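The link between level order and factor values can be seen in a short sketch. The vectors `band` and `total` are made-up illustration data, and all names are hypothetical. The order given to `levels` determines the integer codes, so listing low, mid, high yields 1, 2, 3.

```r
# Made-up band labels and totals, for illustration only
band  <- c("low", "mid", "high", "mid")
total <- c(10, 20, 30, 20)

# Levels listed from lowest to highest: low = 1, mid = 2, high = 3
rank <- factor(band, levels = c("low", "mid", "high"), ordered = TRUE)
codes <- as.integer(rank)
print(codes)

# Ordered factors support comparisons between levels
print(rank[1] < rank[3])

# cbind() with a numeric vector keeps only the integer codes (matrix result)
m <- cbind(rank, total)
print(m)

# A data frame keeps the labels; argument order sets column order
df <- data.frame(band, rank, total)
print(names(df))
```

Placing the new vector first in `cbind()` or `data.frame()` is what makes it the leftmost column.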
Knit to HTML after eliminating all the errors. Save this .html file. After you have completed all parts, submit the .Rmd and .html for all parts to Canvas. If your file does not knit correctly, just submit the Rmd file. Tip: Do not worry about minor formatting issues.
#No code needed