Below is your take-home midterm exam for DSCI 101 - Fall 2024. By taking this exam, you agree not to work with any other student in the class or to use resources outside of this class. You may consult your class notes or the help buttons in R. If you are found to be working with someone else (classmate, tutor, etc.), an automatic 0 will be given on the midterm exam (both in-class and take-home) and you will be reported to the Dean’s Office.
Use functions learned from the tidyverse package to
complete this exam. Load all the packages you will use throughout this
exam in the loading packages section below.
Make sure to answer in full sentences if you see YOUR WRITTEN ANSWER HERE after a question.
Answer each question using this same file. Your code should go inside the code chunks.
You will be examining the dataset from the class survey that you took at the beginning of the semester. I have cleaned the dataset for general use. Please refer back to the survey here.
(15 points) Submit the HTML file in Sakai before the deadline. 15 points will be deducted if the exam is submitted in Rmd format.
(5 points)
library(tidyverse)
library(mdsr)
(20 points) Using the midterm_data excel file found in
the midterm_exam folder, answer the following questions:
midterm_data<-read.csv("~/DSCI_101/midterm_exam/midterm_data.csv")
It should be a numeric class, but it is showing as a character. This is because one of the responses is not a number, but instead an explanation with words.
class(midterm_data$exercise)
## [1] "character"
midterm_data$exercise<-as.numeric(midterm_data$exercise)
class(midterm_data$exercise)
## [1] "numeric"
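The coercion above turns any response that is not a number into `NA` and emits a warning. As a minimal sketch, using a made-up vector rather than the actual survey responses, the following shows one way to locate such entries before converting and to average around them afterwards:

```r
# Hypothetical responses; the real survey values are not shown in this document
exercise_raw <- c("3", "5.5", "about two hours", "0")

# as.numeric() returns NA for text it cannot parse and emits a warning;
# suppressWarnings() is used here only to keep the output quiet
exercise_num <- suppressWarnings(as.numeric(exercise_raw))

# Entries that were present as text but failed to convert
bad_entries <- exercise_raw[is.na(exercise_num) & !is.na(exercise_raw)]
print(bad_entries)

# Averages must then skip the NA explicitly
mean_exercise <- mean(exercise_num, na.rm = TRUE)
print(mean_exercise)
```

Inspecting the failed entries first makes it clear which responses are dropped from later averages.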
Answer the following questions:
Freshman with 33 students.
question_2a<-midterm_data %>%
group_by(grad_class) %>%
summarize(total_grad = n())
Gemini
question_2b<-midterm_data %>%
group_by(zodiac) %>%
summarize(total_zodiac = n()) %>%
arrange(desc(total_zodiac))%>%
head(1)
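The group-and-tally pattern used in these answers can also be written with `count()` from dplyr, which groups and counts in one call. This is only a sketch on a made-up data frame; the column name `zodiac` follows the exam's dataset, but the values are invented:

```r
library(dplyr)

# Hypothetical toy data, not the class survey
toy <- data.frame(
  zodiac = c("Gemini", "Leo", "Gemini", "Aries", "Leo", "Gemini")
)

# count() tallies rows per group; sort = TRUE orders by descending count,
# and name sets the name of the count column
top_sign <- toy %>%
  count(zodiac, sort = TRUE, name = "total_zodiac") %>%
  head(1)

print(top_sign)
```

Both forms should give the same top row; `count()` is simply shorter.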
US Midwest
question_2c<-midterm_data %>%
group_by(born) %>%
summarize(total_born = n()) %>%
filter(born != "Illinois" & born != "Outside of the US")
question_2d<-midterm_data %>%
group_by(grade) %>%
summarize(total_grade = n()) %>%
arrange(desc(grade))
(15 points) If a person is 5’7 (5 feet and seven inches), then their total height in inches is 67 inches (since there are 12 inches per foot). Find each student’s total height in inches. Then find the average height in inches across all students. Your final output should be a printed dataframe that has one column reflecting the average height (in inches) across all students
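question_3<-midterm_data %>%
mutate(total_inches = (height_feet * 12) + height_inches) %>%
# collapse to a single column holding the average height in inches
summarize(average_height = mean(total_inches, na.rm = TRUE))
question_3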
(20 points) Is there a difference in the average exercise time between students who are seniors and those who are not seniors? Your final output should be a printed dataframe that has two columns: one reflecting the graduating class (seniors & not seniors) and another reflecting the average exercise time.
Yes, there is a slight difference in average exercise between classes, with Juniors exercising the most and Freshmen the least.
question_4<-midterm_data %>%
group_by(grad_class) %>%
summarize(class_exercise = mean(exercise, na.rm=TRUE))
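The question asks for two groups (seniors and not seniors), while grouping by `grad_class` yields one row per class. One possible way to collapse the classes first is sketched below on made-up data. The label `"Senior"` is an assumption about how `grad_class` is coded in the real dataset, and `senior_status` is a hypothetical column name:

```r
library(dplyr)

# Hypothetical toy data; "Senior" is an assumed label for grad_class
toy <- data.frame(
  grad_class = c("Freshman", "Senior", "Junior", "Senior", "Sophomore"),
  exercise   = c(2, 4, 6, NA, 1)
)

senior_summary <- toy %>%
  # Collapse all classes into two groups
  mutate(senior_status = if_else(grad_class == "Senior", "Senior", "Not senior")) %>%
  group_by(senior_status) %>%
  # na.rm = TRUE skips missing exercise values
  summarize(avg_exercise = mean(exercise, na.rm = TRUE))

print(senior_summary)
```

The result has exactly two rows and two columns, which matches the output format the question describes.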
(20 points) In all of Prof. Abdalla’s classes, the final grade that a student gets at the end of the semester is computed as a weighted average. What this means is that some categories weigh more than other categories when computing your final average. In DSCI 101, the final grade is computed using the following weights:
| Categories | Percentage |
|---|---|
| Participation | 5% |
| Homework | 25% |
| 2 Midterms | 45% |
| Final Exams | 25% |
The way this works is that every weight is multiplied by the final
average score you get in every category. For example, if you get an
average grade of 98, 100, 85, and 90 on participation, homework,
midterms, and final exam, your final grade will be computed as:
0.05*98+0.25*100+0.45*85+0.25*90=90.65
Create a function
such that the argument is a vector of scores where the first, second,
third, and fourth elements of the vector represent a student’s average
score on participation, homework, midterms, and final exam,
respectively, and the outcome is the student’s final grade in the class.
Once you are done building the function you have to test it out with the
following vector (to make sure it’s working properly):
scores = c(98,100,85,90)
90.65
final_grade<-function(scores){
  # weights for participation, homework, midterms, and final exam, in that order
  weights<-c(0.05, 0.25, 0.45, 0.25)
  grade<-sum(weights*scores)
  return(grade)
}
scores<-c(98,100,85,90)
final_grade(scores)
## [1] 90.65
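As a cross-check on the arithmetic, base R's `weighted.mean()` should give the same value when the weights sum to 1. The variable names `weights` and `check` below are illustrative only:

```r
# Weights from the table above, in the order participation, homework,
# midterms, final exam
weights <- c(0.05, 0.25, 0.45, 0.25)
scores <- c(98, 100, 85, 90)

# weighted.mean() computes sum(x * w) / sum(w); because the weights sum
# to 1, this equals the weighted sum in the exam's formula
check <- weighted.mean(scores, weights)
print(check)
```

Comparing with `all.equal()` rather than `==` avoids false mismatches caused by floating-point rounding.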
(20 points) In a company, employees get different kinds of perks based on the number of hours they worked during the year. If the employee worked less than or equal to 2000 hours, they will not get a perk and they will get paid $10 an hour. If they worked more than 2000 hours but less than or equal to 3000 hours, they will get a paid trip to Hawaii and they will get paid $20 an hour. Finally, if they worked more than 3000 hours, they will get a brand new car and they will get paid $30 an hour.
employee <- c("Mary", "John", "Tim", "Holly", "Glen")
hours <- c(2080, 3120, 2600, 1900, 4000)
Create a for loop that computes and prints each employee’s perk along with their annual salary (hours worked * hourly rate). The final output should look like:
“Mary will get a paid trip to Hawaii and a salary of 41600”
“John will get a brand new car and a salary of 93600”… etc.
for (i in seq_along(employee)) {
  if (hours[i] <= 2000) {
    # up to and including 2000 hours: no perk, $10 an hour
    print(paste(employee[i], "will get a salary of", hours[i] * 10))
  } else if (hours[i] <= 3000) {
    # more than 2000 and up to 3000 hours: Hawaii trip, $20 an hour
    print(paste(employee[i], "will get a paid trip to Hawaii and a salary of", hours[i] * 20))
  } else {
    # more than 3000 hours: new car, $30 an hour
    print(paste(employee[i], "will get a brand new car and a salary of", hours[i] * 30))
  }
}
## [1] "Mary will get a paid trip to Hawaii and a salary of 41600"
## [1] "John will get a brand new car and a salary of 93600"
## [1] "Tim will get a paid trip to Hawaii and a salary of 52000"
## [1] "Holly will get a salary of 19000"
## [1] "Glen will get a brand new car and a salary of 120000"
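The rules in the question include the boundaries (exactly 2000 hours earns no perk, exactly 3000 hours earns the Hawaii trip), and none of the five employees falls on a boundary. A small sketch for testing those edge cases follows. `perk_message` is a hypothetical helper, not part of the exam, and the employees "A", "B", and "C" are made up:

```r
# Hypothetical helper: returns the message for a single employee
perk_message <- function(name, hours_worked) {
  if (hours_worked <= 2000) {
    paste(name, "will get a salary of", hours_worked * 10)
  } else if (hours_worked <= 3000) {
    paste(name, "will get a paid trip to Hawaii and a salary of", hours_worked * 20)
  } else {
    paste(name, "will get a brand new car and a salary of", hours_worked * 30)
  }
}

# Boundary cases: exactly 2000, exactly 3000, and just over 3000 hours
print(perk_message("A", 2000))
print(perk_message("B", 3000))
print(perk_message("C", 3001))
```

Each `else if` only needs an upper bound, because the earlier branches have already ruled out the lower ranges.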