Answer all questions. The test lasts 2 hours; please STOP after 2 hours. Every question carries equal weight. Most questions DO NOT have a single correct answer, so leave a comment whenever you make an assumption or a choice. You may use any available resource to solve the questions (including Google and RStudio), and you are free to discuss them with others. Original thinking is given more weight, so letting others copy your code directly may not be a good idea (although discussion can clarify your own thought process by bringing in other perspectives).
Some questions are modelled on real-world requests. They may convey an ask from a stakeholder and may not be 100% clear. Make intelligent assumptions or simplifications, and state them clearly as comments.
Evaluation criteria
Submit a <yourname>.R file and a <yourname>_app.R file (for the Shiny app submission), i.e. two .R files per person. Add a library() call to your submission files for every package you use, so that each file runs as is when it is evaluated.
y <- list("x", "y", "z") and q <- list("X", "Y", "Z", "x", "y", "z"). Write code that will return all elements of q that are not in y, with the following result:
[[1]]
[1] "X"
[[2]]
[1] "Y"
[[3]]
[1] "Z"
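One possible approach to the list question above, offered as a sketch rather than the expected answer: flatten both lists to character vectors, test membership with %in%, and use the resulting logical vector to subset q so that the result is still a list. It assumes every element of q and y is a single string, as in the question.

```r
# Lists given in the question
y <- list("x", "y", "z")
q <- list("X", "Y", "Z", "x", "y", "z")

# Assumption: every element is a length-one character value,
# so unlist() gives one entry per list element.
in_y <- unlist(q) %in% unlist(y)

# Keep the elements of q that do not appear in y; single-bracket
# subsetting keeps the result as a list.
result <- q[!in_y]
print(result)
```

setdiff(q, y) is another option for this input; note that it also removes duplicate elements, which may or may not be what is wanted.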
Use the iris dataset. Create 2 new columns called sepal.size and petal.size. Each should be the average of the corresponding width and length (for example, sepal.size is the average of Sepal.Length and Sepal.Width). Draw a scatter plot of the two size variables, colouring each point by its Species. Write down (as comments after the code) any 2 obvious insights from this scatter plot.
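A minimal sketch of the column creation and plot, using base R graphics only so that no extra library() call is needed. The name iris_sized and the colour choices are illustrative assumptions; the insights are deliberately left to the candidate.

```r
# Work on a copy so the built-in iris dataset is left untouched
# (iris_sized is a hypothetical name, not part of the question).
iris_sized <- iris

# Each size is the mean of the corresponding length and width.
iris_sized$sepal.size <- (iris_sized$Sepal.Length + iris_sized$Sepal.Width) / 2
iris_sized$petal.size <- (iris_sized$Petal.Length + iris_sized$Petal.Width) / 2

# One colour per Species level (arbitrary choice of colours).
species_colours <- c("steelblue", "darkorange", "forestgreen")

plot(iris_sized$sepal.size, iris_sized$petal.size,
     col = species_colours[as.integer(iris_sized$Species)],
     pch = 19,
     xlab = "sepal.size", ylab = "petal.size",
     main = "Petal size vs sepal size by Species")
legend("topleft", legend = levels(iris_sized$Species),
       col = species_colours, pch = 19)
```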
Write code for a regression model (of your choice) that classifies the iris dataset. The target variable is Species. Check the accuracy of your model.
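As one illustration only, the sketch below fits a multinomial logistic regression with nnet::multinom and measures accuracy on a held-out set. The choice of model, the 70/30 split, and the seed are all assumptions; any other regression-based classifier would be an equally valid answer.

```r
library(nnet)  # provides multinom(); ships as a recommended package with R

set.seed(42)  # arbitrary seed so the split is reproducible

# Assumption: a simple 70/30 train/test split
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Multinomial logistic regression on all four measurements
fit <- multinom(Species ~ ., data = train, trace = FALSE)

# Predicted class for each held-out row
pred <- predict(fit, newdata = test)

# Confusion matrix and overall accuracy on the held-out rows
print(table(predicted = pred, actual = test$Species))
accuracy <- mean(pred == test$Species)
print(accuracy)
```

Accuracy on a single small test split is a noisy estimate; cross-validation would be a reasonable refinement to mention in a comment.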
Write a function that prints out the missing values for each column in a dataframe. The function should be generic and work on any dataframe.
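A possible sketch, under the assumption that "missing values" means the count of NA entries per column. The function name print_missing_counts and the small example dataframe are hypothetical.

```r
# Prints the number of NA values in each column of df and returns
# the counts invisibly so they can also be reused.
# Assumption: "missing" means NA (empty strings are not counted).
print_missing_counts <- function(df) {
  stopifnot(is.data.frame(df))
  counts <- vapply(df, function(col) sum(is.na(col)), integer(1))
  print(counts)
  invisible(counts)
}

# Hypothetical example data with a few missing entries
example_df <- data.frame(a = c(1, NA, 3), b = c("x", NA, NA))
missing_counts <- print_missing_counts(example_df)
```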
Write a function that prints out frequency counts for each factor column in a dataframe. Consider a column to be a factor column if it has 10 or fewer factor levels. The function should be generic and work on any dataframe.
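The following sketch interprets "factor column" as any column with at most 10 distinct non-missing values, whatever its storage type; that interpretation, the function name print_factor_counts, and the max_levels parameter are assumptions.

```r
# Prints a frequency table for every column that has at most
# max_levels distinct non-missing values, and returns the tables
# invisibly as a named list.
print_factor_counts <- function(df, max_levels = 10) {
  stopifnot(is.data.frame(df))
  result <- list()
  for (name in names(df)) {
    n_distinct <- length(unique(na.omit(df[[name]])))
    if (n_distinct <= max_levels) {
      # useNA = "ifany" reports missing values as their own category
      result[[name]] <- table(df[[name]], useNA = "ifany")
      cat("Column:", name, "\n")
      print(result[[name]])
    }
  }
  invisible(result)
}

# Usage on a built-in dataset
factor_counts <- print_factor_counts(iris)
```

A stricter reading would only consider columns for which is.factor() is TRUE; stating which reading was chosen, as a comment, is in the spirit of the instructions.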
Import the Test_Superstore_Sales.xls dataset that has been mailed to you. Import its 3 tabs as 3 separate datasets.
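A hedged sketch of the import, assuming the readxl package is installed and the file sits in the working directory. The tab names are not given here, so the sketch reads them from the workbook instead of hard-coding them; import_all_sheets is a hypothetical helper name.

```r
# Reads every sheet of an Excel workbook into a named list of
# dataframes, one entry per tab.
# Assumption: the readxl package is available.
import_all_sheets <- function(path) {
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("The readxl package is required for this sketch.")
  }
  sheets <- readxl::excel_sheets(path)
  datasets <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s))
  names(datasets) <- sheets
  datasets
}

# Assumption: the mailed file has been saved in the working directory
path <- "Test_Superstore_Sales.xls"
if (file.exists(path)) {
  superstore <- import_all_sheets(path)
  str(superstore, max.level = 1)  # one dataframe per tab
}
```

Each tab can then be pulled out of the list as its own dataset, for example with superstore[[1]].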
order_amount == Unit Price * Order Quantity