[1 point] Q1.

Enter your name and today’s date in Lines 3 & 4, and then run this chunk. Note: Question numbers correspond to chunk numbers. Chunk 1 specifies the knitting parameters.

R Markdown

Reminder: This is the R Proficiency Exam Part 1 of 3. Please note the other parts of the R Proficiency Exam are in separate files. Your submission should include BOTH HTML and Rmd files from ALL PARTS. If your file does not knit correctly, submit the Rmd file for partial credit.

Goal: Demonstrate Introductory R Proficiency

Directions for the exam:
1. Your name on Line 3 and NetID in Chunk 2 indicate your compliance with the Fuqua Honor Code (https://www.fuqua.duke.edu/honorcode) for this part of the Final Exam. FILLING IN BOTH YOUR NAME AND YOUR NETID IS REQUIRED TO RECEIVE CREDIT FOR THIS EXAM.
2. This exam is open book/internet/notes/etc.; you must cite all sources you consult (using the URL or other identifiable information).
3. You may not discuss the exam with any living being except Dr. Salman Azhar.
4. This exam uses data similar to the data used in the Practice Exams. While the data is slightly different, the structure is similar.
5. To maintain consistency across students, we will only answer logistical questions during the exam. For instance, “When I have submitted my Exam, can I walk away from my computer?” (Note: You can.) If you have a logistical question, send it to Dr. Salman Azhar in the Zoom chat window. If you have an emergency, call +1 408-806-3500, and we’ll solve it together.

[1 point] Q2.

Replace “sa239” with your specific NetID (without removing the quotes). Your NetID consists of your initials followed by numbers. My NetID is sa239. For instance, if your NetID is abc123, then your code should read: myNetID = "abc123". THIS IS REQUIRED FOR YOU TO GET CREDIT FOR THIS EXAM. Your NetID in this chunk indicates your compliance with the Fuqua Honor Code (https://www.fuqua.duke.edu/honorcode) for this exam.

#REPLACE netid WITH YOUR NETID and DO NOT CHANGE ANY OTHER PART OF THIS CODE CHUNK (just run it).
myNetID = "ks715"

[1 point] Q3.

Run the following code chunk to initialize 7 vectors; each vector has 12 elements.
a. Each element of each vector is a simulated daily count of the vehicles projected to use I-885’s Exit 14 after it opens.
b. The 7 vectors correspond to the days of the week (Mon, Tue, Wed, Thu, Fri, Sat, Sun).
c. The 12 elements of each vector correspond to the first 12 weeks in calendar order: the 1st element corresponds to the 1st week, the 2nd element to the 2nd week, etc.
Tip: You do not need (and should not try) to understand the code in this chunk. Just run it and make sure you understand the description above.

#DO NOT CHANGE ANY PART OF THIS CODE CHUNK (just run it).
splitted = strsplit(myNetID, "")
seed = nchar(myNetID)
isnumber = c()
myNetLetters = c()
for (i in 1:nchar(myNetID)) {
  isnumber = c(isnumber, is.numeric(splitted[[1]][i]))
  myNetLetters = c(myNetLetters, splitted[[1]][i])
}
if (FALSE == (("0" <= myNetLetters[nchar(myNetID)]) & (myNetLetters[nchar(myNetID)] <= "9"))) {
  myNetLetters[nchar(myNetID)] = "0"
}
mySeed = 2*(as.numeric(myNetLetters[nchar(myNetID)]) + seed)/2 - nchar(myNetID)
iMax = 7
jMax = 12
ijMax = iMax*jMax
distance = ( (seed-1) %% 4 + 1)*100
set.seed(1)
set.seed(mySeed)
middle = 0.222
stddev = middle/3.012
base = c(9000, rep(10000, 3), 9000, 5000, 3000)
case = round(sqrt(base)/2,2)
weeks12 = paste(c(rep("Week", 12)), c(1:12))
dase = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")
wase = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
check = c()
rain = c()
vec = c(1:ijMax)
mat = matrix(data = vec, nrow = iMax, byrow = TRUE)
for (i in 1:iMax){
  for (j in 1:jMax) {
    mat[i,j] = trunc(0.5 + rnorm(1, mean = base[i], sd = case[i]))
  }
}
Mon = mat[1,]
Tue = mat[2,]
Wed = mat[3,]
Thu = mat[4,]
Fri = mat[5,]
Sat = mat[6,]
Sun = mat[7,]
for (i in 0:seed) {
  set.seed(i)
  focusDistance = sample(1:7, 1, replace = FALSE)
}
rownames(mat) = wase
colnames(mat) = paste("Week", 1:12)
"Success! Your data is ready, as you can see below..."
## [1] "Success! Your data is ready, as you can see below..."
mat
##     Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10
## Mon   8960   9066   8940   9003   9081   8971   8978   8970   8986    9007
## Tue   9946   9992   9946   9993   9970   9891  10012   9987  10045   10047
## Wed  10041   9985  10071  10075   9967   9957  10016  10055  10111   10061
## Thu   9950   9900   9912   9993  10078   9960   9996  10095   9977   10028
## Fri   8966   8997   9069   9009   9048   8972   8995   8956   9036    8995
## Sat   4960   5030   4980   5018   4973   4988   4926   4989   4955    4990
## Sun   3010   3001   3011   2996   3027   3003   3005   2985   3014    2952
##     Week 11 Week 12
## Mon    9058    8962
## Tue   10073   10035
## Wed   10074   10048
## Thu    9956    9977
## Fri    8997    9011
## Sat    4993    4992
## Sun    3027    2999
set.seed(mySeed)
myPrint4 = wase[(mySeed %% 7) + 1]
myPrint5 = myPrint4
myPrint6 = myPrint4
myPrint13x = colSums(mat)
myPrint13x = c(myPrint13x, mean(myPrint13x))
myPrint13 = round(sd(myPrint13x)/2,0)
myPrint13
## [1] 63
rm(mat)
weeks12 = paste(c(rep("Week", 12)), c(1:12))
dase = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")
wase = c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
check = c()
rain = c()

[1 point] Q4.

Run the following code chunk to get the directions for your next chunk.

#DO NOT CHANGE ANY PART OF THIS CODE CHUNK (just run it).
paste("Now, print the structure and statistical summary (in exactly that order) of the vector named ", myPrint4, " based on the simulated data.", sep = "")
## [1] "Now, print the structure and statistical summary (in exactly that order) of the vector named Sat based on the simulated data."

[2 points] Q5.

Follow the directions printed by Chunk 4 above after “[1]” (starting with “Now, print …”). Rubric: 1 point for printing each item.

str(Sat)
##  num [1:12] 4960 5030 4980 5018 4973 ...
summary(Sat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4926    4970    4988    4983    4992    5030
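A note on the two functions involved: `str()` prints a compact description of an object (type, length, first few values), whereas `structure()` returns its first argument with any extra attributes attached, so `structure(Sat)` simply returns and prints the vector itself. A minimal sketch on a small made-up vector `x` (not the exam data; the attribute name `units` is arbitrary):

```r
x = c(10, 20, 30)                      # hypothetical example values

# structure() returns the object, optionally with attributes attached
y = structure(x, units = "vehicles")
print(attr(y, "units"))                # [1] "vehicles"
print(identical(as.vector(y), x))      # [1] TRUE: the values are unchanged

# str() describes the object instead of returning it
str(x)                                 #  num [1:3] 10 20 30
```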

[1 point] Q6.

Run the following code chunk to get the directions for your next chunk.

#DO NOT CHANGE ANY PART OF THIS CODE CHUNK (just run it).
paste("Compute the minimum, median, average, maximum, and standard deviation of ", myPrint6, "; store the results in variables named minVehicles, medVehicles, avgVehicles, maxVehicles, and sdVehicles, respectively. Then, print the values of these variables in the order listed above, and check if they look consistent with your results in the previous chunk.", sep = "")
## [1] "Compute the minimum, median, average, maximum, and standard deviation of Sat; store the results in variables named minVehicles, medVehicles, avgVehicles, maxVehicles, and sdVehicles, respectively. Then, print the values of these variables in the order listed above, and check if they look consistent with your results in the previous chunk."

[5 points] Q7.

Follow the directions printed by Chunk 6 above after “[1]” (starting with “Compute the …”). Rubric: 1 point for computing and printing each statistic.

minVehicles = min(Sat)
medVehicles = median(Sat)
avgVehicles = mean(Sat)
maxVehicles = max(Sat)
sdVehicles = sd(Sat)
minVehicles
## [1] 4926
medVehicles
## [1] 4988.5
avgVehicles
## [1] 4982.833
maxVehicles
## [1] 5030
sdVehicles
## [1] 27.62684
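The last instruction of Chunk 6 (check that these values agree with the earlier summary) can also be done in code instead of by eye. A sketch on a made-up vector `x`, not the exam data; it relies on summary() returning its six values in the order Min., 1st Qu., Median, Mean, 3rd Qu., Max.:

```r
x = c(1, 2, 3, 4, 10)                                  # hypothetical sample

fromSummary = as.numeric(summary(x))[c(1, 3, 4, 6)]    # Min., Median, Mean, Max.
computed = c(min(x), median(x), mean(x), max(x))

isTRUE(all.equal(fromSummary, computed))               # TRUE when the two agree
sd(x)                                                  # not part of summary() output
```

summary() reports no standard deviation, so sdVehicles has nothing to be checked against in the previous chunk.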

[4 points] Q8.

Now, combine all 7 vectors in order, starting with the vector named Mon and ending with Sun, and store the result in a matrix named traffic. Specifically, the order is Mon, Tue, Wed, Thu, Fri, Sat, Sun. Then, print the matrix traffic to validate your code. Tip: The matrix traffic should have one column for each day of the week (Mon, Tue, Wed, Thu, Fri, Sat, Sun) and one row for each week in numeric order (Week 1 to Week 12). Tip: Each value is the projected traffic for the corresponding day and week. Tip: Confirm that the resulting matrix is the transpose of Chunk 3’s result; specifically, the first column corresponds to Mon, the second column to Tue, etc., and the first row corresponds to Week 1, the second row to Week 2, etc. Tip: You will specify the row names later. Rubric: 3 points for combining the vectors, 1 point for printing the matrix.

traffic = cbind(Mon,Tue,Wed,Thu,Fri,Sat,Sun)
traffic
##        Mon   Tue   Wed   Thu  Fri  Sat  Sun
##  [1,] 8960  9946 10041  9950 8966 4960 3010
##  [2,] 9066  9992  9985  9900 8997 5030 3001
##  [3,] 8940  9946 10071  9912 9069 4980 3011
##  [4,] 9003  9993 10075  9993 9009 5018 2996
##  [5,] 9081  9970  9967 10078 9048 4973 3027
##  [6,] 8971  9891  9957  9960 8972 4988 3003
##  [7,] 8978 10012 10016  9996 8995 4926 3005
##  [8,] 8970  9987 10055 10095 8956 4989 2985
##  [9,] 8986 10045 10111  9977 9036 4955 3014
## [10,] 9007 10047 10061 10028 8995 4990 2952
## [11,] 9058 10073 10074  9956 8997 4993 3027
## [12,] 8962 10035 10048  9977 9011 4992 2999
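Chunk 3 deletes `mat`, so the transpose tip cannot be checked against it directly, but the relationship is easy to see on a toy example. A sketch with two hypothetical three-week vectors, `dayA` and `dayB` (new names, chosen so that the exam's `Mon` to `Sun` vectors are not overwritten):

```r
dayA = c(1, 2, 3)    # hypothetical traffic for one day over three weeks
dayB = c(4, 5, 6)    # hypothetical traffic for another day

byRow = rbind(dayA, dayB)   # days as rows, weeks as columns (the layout of mat in Chunk 3)
byCol = cbind(dayA, dayB)   # days as columns, weeks as rows (the layout of traffic)

identical(t(byRow), byCol)  # TRUE: each layout is the transpose of the other
```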

[4 points] Q9.

Rename the rows in the traffic matrix with the following names: Week 1, Week 2, …, Week 12, respectively. Then, print traffic to validate your code. Tip: You can use the paste() command to concatenate/combine chr strings and vectors. Tip: m:n generates numbers from m to n automatically. Rubric: 3 points for setting the rownames, 1 point for printing the matrix.
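Combining the two tips: `paste()` recycles the single string "Week" across the numbers generated by `m:n` and separates the pieces with a space by default, while `paste0()` uses no separator. The difference matters because the requested labels are Week 1, Week 2, and so on. A short illustration with three weeks and a hypothetical `toy` matrix:

```r
withSpace = paste("Week", 1:3)    # "Week 1" "Week 2" "Week 3"
noSpace = paste0("Week", 1:3)     # "Week1"  "Week2"  "Week3"

# the upper bound can follow the data instead of a hard-coded number
toy = matrix(0, nrow = 3, ncol = 2)
rownames(toy) = paste("Week", 1:nrow(toy))
```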

rownames(traffic) = paste("Week", 1:nrow(traffic))   # "Week 1", ..., "Week 12"
traffic
##          Mon   Tue   Wed   Thu  Fri  Sat  Sun
## Week 1  8960  9946 10041  9950 8966 4960 3010
## Week 2  9066  9992  9985  9900 8997 5030 3001
## Week 3  8940  9946 10071  9912 9069 4980 3011
## Week 4  9003  9993 10075  9993 9009 5018 2996
## Week 5  9081  9970  9967 10078 9048 4973 3027
## Week 6  8971  9891  9957  9960 8972 4988 3003
## Week 7  8978 10012 10016  9996 8995 4926 3005
## Week 8  8970  9987 10055 10095 8956 4989 2985
## Week 9  8986 10045 10111  9977 9036 4955 3014
## Week 10 9007 10047 10061 10028 8995 4990 2952
## Week 11 9058 10073 10074  9956 8997 4993 3027
## Week 12 8962 10035 10048  9977 9011 4992 2999

[6 points] Q10.

Add a row named avgDay as the last row of the traffic matrix. Tip: The new row should contain the average traffic for each day of the week (over all weeks). Note: The new row must be the last row of traffic and not shift any previous rows up or down. Then, print traffic to validate your code. Rubric: 3 points for computing avgDay, 2 points for combining the avgDay row, 1 point for printing the matrix.

avgDay = colMeans(traffic)                  # average traffic for each day over all weeks
traffic = rbind(traffic, avgDay = avgDay)   # append it as the last row, named avgDay
traffic
##            Mon      Tue      Wed       Thu     Fri      Sat    Sun
## Week 1  8960.0  9946.00 10041.00  9950.000 8966.00 4960.000 3010.0
## Week 2  9066.0  9992.00  9985.00  9900.000 8997.00 5030.000 3001.0
## Week 3  8940.0  9946.00 10071.00  9912.000 9069.00 4980.000 3011.0
## Week 4  9003.0  9993.00 10075.00  9993.000 9009.00 5018.000 2996.0
## Week 5  9081.0  9970.00  9967.00 10078.000 9048.00 4973.000 3027.0
## Week 6  8971.0  9891.00  9957.00  9960.000 8972.00 4988.000 3003.0
## Week 7  8978.0 10012.00 10016.00  9996.000 8995.00 4926.000 3005.0
## Week 8  8970.0  9987.00 10055.00 10095.000 8956.00 4989.000 2985.0
## Week 9  8986.0 10045.00 10111.00  9977.000 9036.00 4955.000 3014.0
## Week 10 9007.0 10047.00 10061.00 10028.000 8995.00 4990.000 2952.0
## Week 11 9058.0 10073.00 10074.00  9956.000 8997.00 4993.000 3027.0
## Week 12 8962.0 10035.00 10048.00  9977.000 9011.00 4992.000 2999.0
## avgDay  8998.5  9994.75 10038.42  9985.167 9004.25 4982.833 3002.5

[6 points] Q11.

Now, using a for-loop, compute a vector named weeklyTraffic that contains the total traffic for each week (by adding up the daily traffic for each day), including the row named avgDay. Then, print the weeklyTraffic vector and the number of elements in it to validate your code. Note: Your code must work for any number of rows and columns in the traffic matrix. Tip: The number of elements in this vector should equal the number of rows in the traffic matrix. Rubric: 4 points for computing weeklyTraffic, 1 point for printing weeklyTraffic, 1 point for printing the number of elements in weeklyTraffic.

weeklyTraffic = numeric(nrow(traffic))     # one total per row, including avgDay
for (i in 1:nrow(traffic)) {
  weeklyTraffic[i] = sum(traffic[i, ])     # add up the daily traffic in row i
}
weeklyTraffic
##  [1] 55833.00 55971.00 55929.00 56087.00 56144.00 55742.00 55928.00 56037.00
##  [9] 56124.00 56080.00 56178.00 56024.00 56006.42
length(weeklyTraffic)
## [1] 13

[3 points] Q12.

Now, combine the vector weeklyTraffic with the matrix traffic as the first column of the resulting traffic matrix. Then, print the traffic matrix to validate your code. Note: The new column will become the first column of the resulting traffic matrix (and will shift all previous columns to the right). Rubric: 2 points for combining the weeklyTraffic column, 1 point for printing the matrix.

traffic = cbind(weeklyTraffic, traffic)    # weeklyTraffic first, then Mon ... Sun
traffic
##         weeklyTraffic    Mon      Tue      Wed       Thu     Fri      Sat
## Week 1       55833.00 8960.0  9946.00 10041.00  9950.000 8966.00 4960.000
## Week 2       55971.00 9066.0  9992.00  9985.00  9900.000 8997.00 5030.000
## Week 3       55929.00 8940.0  9946.00 10071.00  9912.000 9069.00 4980.000
## Week 4       56087.00 9003.0  9993.00 10075.00  9993.000 9009.00 5018.000
## Week 5       56144.00 9081.0  9970.00  9967.00 10078.000 9048.00 4973.000
## Week 6       55742.00 8971.0  9891.00  9957.00  9960.000 8972.00 4988.000
## Week 7       55928.00 8978.0 10012.00 10016.00  9996.000 8995.00 4926.000
## Week 8       56037.00 8970.0  9987.00 10055.00 10095.000 8956.00 4989.000
## Week 9       56124.00 8986.0 10045.00 10111.00  9977.000 9036.00 4955.000
## Week 10      56080.00 9007.0 10047.00 10061.00 10028.000 8995.00 4990.000
## Week 11      56178.00 9058.0 10073.00 10074.00  9956.000 8997.00 4993.000
## Week 12      56024.00 8962.0 10035.00 10048.00  9977.000 9011.00 4992.000
## avgDay       56006.42 8998.5  9994.75 10038.42  9985.167 9004.25 4982.833
##            Sun
## Week 1  3010.0
## Week 2  3001.0
## Week 3  3011.0
## Week 4  2996.0
## Week 5  3027.0
## Week 6  3003.0
## Week 7  3005.0
## Week 8  2985.0
## Week 9  3014.0
## Week 10 2952.0
## Week 11 3027.0
## Week 12 2999.0
## avgDay  3002.5

[1 point] Q13.

Run the following code chunk to get the directions for your next chunk.

#DO NOT CHANGE ANY PART OF THIS CODE CHUNK (just run it).
cutoff = myPrint13
paste("For the next chunk, use a cut-off of", cutoff, "such that:")
## [1] "For the next chunk, use a cut-off of 63 such that:"
paste("a. ‘high’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is strictly greater than the average weekly traffic (based on avgDay row's weeklyTraffic column) plus ", cutoff, ". (For example, if the weeklyTraffic row's avgDay column is 55000, then ‘high’ corresponds to any week with traffic greater than or equal to ", 55000 + cutoff, ".)", sep = "")
## [1] "a. ‘high’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is strictly greater than the average weekly traffic (based on avgDay row's weeklyTraffic column) plus 63. (For example, if the weeklyTraffic row's avgDay column is 55000, then ‘high’ corresponds to any week with traffic greater than or equal to 55063.)"
paste("b. ‘low’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is strictly less than the average weekly traffic (based on avgDay row's weeklyTraffic column) minus ", cutoff, ". (For example, if the weeklyTraffic row's avgDay column is 55000, then ‘low’ corresponds to any week with traffic less than or equal to ", 55000 - cutoff, ".)", sep = "")
## [1] "b. ‘low’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is strictly less than the average weekly traffic (based on avgDay row's weeklyTraffic column) minus 63. (For example, if the weeklyTraffic row's avgDay column is 55000, then ‘low’ corresponds to any week with traffic less than or equal to 54937.)"
paste("c. ‘mid’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is +/- ", cutoff, " (inclusive) of the average weekly traffic (based on avgDay row's weeklyTraffic column). (For example, if the weeklyTraffic row's avgDay column is 55000, then ‘mid’ corresponds to any week with traffic between, but excluding, ", 55000 - cutoff, " and ", 55000 + cutoff, ".)", sep = "")
## [1] "c. ‘mid’ is any week in which the weekly traffic (based on the weeklyTraffic column of that week) is +/- 63 (inclusive) of the average weekly traffic (based on avgDay row's weeklyTraffic column). (For example, if the weeklyTraffic row's avgDay column is 55000, then ‘mid’ corresponds to any week with traffic between, but excluding, 54937 and 55063.)"

[7 points] Q14.

Using a for-loop, compute a vector named trafficBand (of type chr) based on the following specifications: 1. The for-loop should work regardless of the number of rows and columns in traffic. 2. The value of trafficBand for each week must be one of three values (high, mid, or low), based on that week's weeklyTraffic compared with the weeklyTraffic value in the avgDay row. Compute trafficBand’s value (high, mid, or low) for each week based on the criteria specified in the four lines printed by the previous chunk, starting with “For the next chunk…” (ignore the “[1]” at the start of each line). Then, print trafficBand to validate your code. Tip: trafficBand will have 13 values, one for each week (the first 12 values) and one for the average of all weeks (the last value, which should be mid). Rubric: 1/2 point for each week, 1 point for printing trafficBand.

avgWeekly = traffic[nrow(traffic), "weeklyTraffic"]   # avgDay is the last row
trafficBand = character(nrow(traffic))
for (i in 1:nrow(traffic)) {
  if (traffic[i, "weeklyTraffic"] > avgWeekly + cutoff) {          # rule a: strictly above
    trafficBand[i] = "high"
  } else if (traffic[i, "weeklyTraffic"] < avgWeekly - cutoff) {   # rule b: strictly below
    trafficBand[i] = "low"
  } else {                                                          # rule c: within +/- cutoff
    trafficBand[i] = "mid"
  }
}
trafficBand
##  [1] "low"  "mid"  "low"  "high" "high" "low"  "low"  "mid"  "high" "high"
## [11] "high" "mid"  "mid"

[3 points] Q15.

Now, print the absolute value of the difference between the number of “low” weeks and the number of “high” weeks. Tip: You can do this in 1 to 3 lines of code. Tip: The result can never be negative and should be 0 or close to 0. Tip: The last value should be “mid” (so you do not need to worry about handling it specially).

abs(sum(trafficBand == "low") - sum(trafficBand == "high"))   # counts TRUE values
## [1] 1

[8 points] Q16.

Add a column named trafficRank to traffic as its first (leftmost) column. This new column should have the type ordered factor with three possible values: high, mid, and low. Each value in trafficRank is based on the corresponding value in trafficBand (low = “low”, mid = “mid”, high = “high”). The values high, mid, and low (in trafficRank) must correspond to the factor values of 3, 2, and 1, respectively. Then, print traffic to validate your code. Note: The new column will become the traffic’s first column and shift all the previous columns right by one. Tip: You can do this in three steps (define trafficRank, then add to traffic, and then print traffic). Rubric: 5 points for computing trafficRank, 2 points for combining trafficRank and traffic, and 1 point for printing traffic.

trafficRank = factor(trafficBand, levels = c("low", "mid", "high"), ordered = TRUE)
# a matrix cannot store factors, so cbind() keeps the codes low = 1, mid = 2, high = 3
traffic = cbind(trafficRank, traffic)
traffic
##         trafficRank weeklyTraffic    Mon      Tue      Wed       Thu     Fri
## Week 1            1      55833.00 8960.0  9946.00 10041.00  9950.000 8966.00
## Week 2            2      55971.00 9066.0  9992.00  9985.00  9900.000 8997.00
## Week 3            1      55929.00 8940.0  9946.00 10071.00  9912.000 9069.00
## Week 4            3      56087.00 9003.0  9993.00 10075.00  9993.000 9009.00
## Week 5            3      56144.00 9081.0  9970.00  9967.00 10078.000 9048.00
## Week 6            1      55742.00 8971.0  9891.00  9957.00  9960.000 8972.00
## Week 7            1      55928.00 8978.0 10012.00 10016.00  9996.000 8995.00
## Week 8            2      56037.00 8970.0  9987.00 10055.00 10095.000 8956.00
## Week 9            3      56124.00 8986.0 10045.00 10111.00  9977.000 9036.00
## Week 10           3      56080.00 9007.0 10047.00 10061.00 10028.000 8995.00
## Week 11           3      56178.00 9058.0 10073.00 10074.00  9956.000 8997.00
## Week 12           2      56024.00 8962.0 10035.00 10048.00  9977.000 9011.00
## avgDay            2      56006.42 8998.5  9994.75 10038.42  9985.167 9004.25
##              Sat    Sun
## Week 1  4960.000 3010.0
## Week 2  5030.000 3001.0
## Week 3  4980.000 3011.0
## Week 4  5018.000 2996.0
## Week 5  4973.000 3027.0
## Week 6  4988.000 3003.0
## Week 7  4926.000 3005.0
## Week 8  4989.000 2985.0
## Week 9  4955.000 3014.0
## Week 10 4990.000 2952.0
## Week 11 4993.000 3027.0
## Week 12 4992.000 2999.0
## avgDay  4982.833 3002.5

[5 points] Q17.

Now, convert traffic to a dataframe named dftraffic. Then, add a column named trafficBand to dftraffic. This new column should go on the left of dftraffic as its first column (and should correspond to the trafficBand vector that you computed earlier). Then, print dftraffic to validate your code. Rubric: 2 points for converting to a dataframe, 2 points for combining trafficBand and dftraffic, and 1 point for printing dftraffic.

dftraffic = as.data.frame(traffic)          # keeps the row names Week 1 ... avgDay
dftraffic = cbind(trafficBand, dftraffic)   # trafficBand becomes the first column
dftraffic

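Q16 and Q17 both turn on how R stores factors: a matrix holds a single atomic type, so cbind() keeps only a factor's integer codes, while a data frame column can hold the ordered factor itself. A sketch with hypothetical band labels (not the exam results); the names `bands`, `bandRank`, `toyMatrix`, and `toyFrame` are illustrative only:

```r
bands = c("low", "high", "mid")                  # hypothetical band labels
bandRank = factor(bands, levels = c("low", "mid", "high"), ordered = TRUE)

as.integer(bandRank)        # 1 3 2: codes follow the order of `levels`
bandRank[1] < bandRank[2]   # TRUE: ordered factors support < and >

toyMatrix = cbind(bandRank, value = c(10, 20, 30))
toyMatrix[, "bandRank"]     # plain numbers: the factor type is lost in a matrix

toyFrame = data.frame(bandRank, value = c(10, 20, 30))
is.ordered(toyFrame$bandRank)   # TRUE: the data frame column keeps the ordered factor
```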
[4 points] Q18.

Knit to HTML after eliminating all errors. Save this .html file. After you have completed all parts, submit the .Rmd and .html files for all parts to Canvas. If your file does not knit correctly, just submit the .Rmd file. Tip: Do not worry about minor formatting issues.

#No code needed