Programming for Analytics Midterm

1. Creating Functions

Assign the numbers 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 to variable Y
a) Create a function that takes Y as an input and counts the elements in Y. The output of the function should be formatted as follows: ‘Your input has xx elements’, where xx is the number of elements in Y.

Y <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
# Define the function
myfunc <- function(Y) {
  counts <- length(Y)
  sprintf("Your input has %s elements", counts)
}
# Load the function
# Call the function
myfunc(Y)
## [1] "Your input has 20 elements"
b) Create a function called myfun2 that counts the numbers in Y that are greater than five. The output of the function should be formatted as follows: ‘Your input has xx elements greater than five’, where xx is the number of elements in Y greater than five.
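myfun2 <- function(Y) {
  counter <- 0
  # Walk over every position in Y and count the values above five
  for (i in seq_along(Y)) {
    if (Y[i] > 5) {
      counter <- counter + 1
    }
  }
  sprintf("Your input has %s elements greater than five", counter)
}
myfun2(Y)
## [1] "Your input has 15 elements greater than five"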
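The loop in myfun2 can also be written without an explicit counter: comparing a vector to a number gives a logical vector, and sum() counts its TRUE values. A minimal sketch, where the name count_greater_than and its threshold argument are hypothetical and not part of the assignment:

```r
# Hypothetical helper (not part of the assignment): vectorized count of
# the elements above a threshold.
count_greater_than <- function(values, threshold = 5) {
  # values > threshold is a logical vector; sum() counts the TRUE entries
  sum(values > threshold)
}

Y <- 1:20
sprintf("Your input has %d elements greater than five", count_greater_than(Y))
```

For Y containing 1 through 20 this gives the same count as the loop, 15.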
c) Create a function called myfun3 that counts the even numbers in Y. The output of the function should be formatted as follows: ‘Your input has xx even numbers’, where xx is the number of even numbers in Y.
myfun3 <- function(Y) {
  counter3 <- 0
  # Loop over the values of Y directly and test each one for evenness
  for (value in Y) {
    if (value %% 2 == 0) {
      counter3 <- counter3 + 1
    }
  }
  sprintf("Your input has %s even numbers", counter3)
}
myfun3(Y)
## [1] "Your input has 10 even numbers"
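A count like this is easier to check on a vector whose values differ from their positions, because that exposes any mix-up between values and indices. A minimal vectorized sketch, assuming a hypothetical helper name count_even:

```r
# Hypothetical helper (not part of the assignment): vectorized count of
# even numbers.
count_even <- function(values) {
  # values %% 2 == 0 is TRUE for even entries; sum() counts them
  sum(values %% 2 == 0)
}

count_even(1:20)                    # 10
count_even(c(10, 11, 13, 14, 16))   # 3
```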
d) Create a function called myfun4 that takes Y and an integer, n, as arguments. The function should do the following:
  1. Check that n is less than or equal to 20 and greater than or equal to 1
  2. If that checks out, then compute the average of the first n numbers in the array that is passed.
  3. The function should return the average.
  4. The calling program should print the following output: ‘The average of the first n numbers is xx.xx’
# Step 1: Define the function
myfun4 <- function(myvector, n) {
  # Check that n is less than or equal to 20 and greater than or equal to 1
  if ((n <= 20) && (n >= 1)) {
    # Keep the first n elements of the vector (by position)
    myset <- head(myvector, n)
    myavg <- mean(myset)
    return(myavg)
  } else {
    return(NULL)
  }
}
# Step 2: Load the function into memory

# Step 3: Call the function
mylimit <- 4
myfun4(Y,3)
## [1] 2
myout <- sprintf('The average of the first %d numbers is %.2f', mylimit, myfun4(Y, mylimit))

print(myout)
## [1] "The average of the first 4 numbers is 2.50"
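Since myfun4 returns NULL when the check on n fails, the calling program may want to handle that case before printing. A minimal sketch of one way to do so; the names first_n_mean and describe_first_n_mean are hypothetical, and bounding n by length(values) instead of the fixed 20 is an assumption that goes beyond the assignment:

```r
# Hypothetical variant of myfun4: the upper bound is the vector length
# (an assumption; the assignment fixes it at 20).
first_n_mean <- function(values, n) {
  if (n >= 1 && n <= length(values)) {
    mean(values[seq_len(n)])  # average of the first n elements by position
  } else {
    NULL                      # signal that the check failed
  }
}

# Hypothetical caller: builds the required message, or a notice if n is invalid.
describe_first_n_mean <- function(values, n) {
  avg <- first_n_mean(values, n)
  if (is.null(avg)) {
    return(sprintf("n must be between 1 and %d", length(values)))
  }
  sprintf("The average of the first %d numbers is %.2f", n, avg)
}

describe_first_n_mean(1:20, 4)   # "The average of the first 4 numbers is 2.50"
describe_first_n_mean(1:20, 25)  # "n must be between 1 and 20"
```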

2. Loops: for-loop and while-loops, loop inside loop

a) Create a list that contains the following values: 23, 45, 60, 30, 49. Write a for-loop that prints only the numbers that are larger than 40.
mylist <- list(23,45,60,30,49)
for (i in mylist) {
  if(i > 40)
    print(i)
}
## [1] 45
## [1] 60
## [1] 49
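The same filter can be written as a while-loop, which needs an explicit index that is advanced by hand. A minimal sketch; the names values and printed are hypothetical, and printed exists only to record what the loop printed:

```r
values <- list(23, 45, 60, 30, 49)
printed <- c()   # records the printed numbers so the result can be checked
i <- 1
while (i <= length(values)) {
  if (values[[i]] > 40) {
    print(values[[i]])
    printed <- c(printed, values[[i]])
  }
  i <- i + 1     # without this increment the loop would never end
}
```

The loop prints 45, 60 and 49, matching the for-loop above.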
b) Create a list that contains the following letters: p,r,o,g,r,a,m,m,i,n,g. Write a for-loop that prints only the letters that are vowels (i.e., a, e, i, o, u).
mylist2 <- list('p','r','o','g','r','a','m','m','i','n','g')
for (z in mylist2) {
  if ((z == 'a') || (z == 'e') || (z == 'i') || (z == 'o') || (z == 'u'))
    print(z)
}
## [1] "o"
## [1] "a"
## [1] "i"
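A chain of equality tests is easy to mistype, so the membership operator %in% may be a safer way to express the vowel check. A minimal sketch, with hypothetical names letters_in_word, vowels and found:

```r
letters_in_word <- list('p','r','o','g','r','a','m','m','i','n','g')
vowels <- c('a', 'e', 'i', 'o', 'u')

# Loop version: %in% tests membership in the vowel set
for (letter in letters_in_word) {
  if (letter %in% vowels) print(letter)
}

# Loop-free version: Filter() keeps the list elements that pass the test
found <- Filter(function(letter) letter %in% vowels, letters_in_word)
unlist(found)   # "o" "a" "i"
```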

4. Subset in combination with dplyr

a) Create a subset of all players who scored more than 1000 points and had a salary higher than 20,000,000
# Option 1: subset command
basketball <- read.csv("/Users/jackcarlson/Downloads/basketball.csv")
subset(basketball, points_scored > 1000 & salary > 20000000)
##        player   salary points_scored games_played minutes_played
## 2   ChrisPaul 23180790          1154           80           2791
## 3  DwayneWade 20644400          1743           69           2493
## 7 LeBronJames 20068563          1564           82           2857
##   active_or_retired
## 2                 1
## 3                 1
## 7                 1
# Option 2: using dplyr
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
filter(basketball, points_scored > 1000 & salary > 20000000)
##        player   salary points_scored games_played minutes_played
## 1   ChrisPaul 23180790          1154           80           2791
## 2  DwayneWade 20644400          1743           69           2493
## 3 LeBronJames 20068563          1564           82           2857
##   active_or_retired
## 1                 1
## 2                 1
## 3                 1
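The two results contain the same players but different row labels: subset() keeps the original row names (2, 3, 7), while filter() renumbers the rows from 1. A base-R sketch of the subset() side on a small made-up data frame; toy and its values are hypothetical, not the midterm data:

```r
# Made-up data for illustration only (not basketball.csv)
toy <- data.frame(
  player = c("A", "B", "C", "D"),
  points_scored = c(500, 1200, 900, 1500),
  salary = c(25e6, 21e6, 30e6, 22e6)
)

kept <- subset(toy, points_scored > 1000 & salary > 20000000)
original_labels <- rownames(kept)   # "2" "4": positions in the original data

# Resetting the row names reproduces the 1, 2, ... labels
rownames(kept) <- NULL
rownames(kept)                      # "1" "2"
```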
b) Create a new dataframe with only the variables salary and active_or_retired for the players who played fewer than 50 games or who scored fewer than 900 points
newdf <- subset(basketball, games_played < 50 | points_scored < 900, select=c(salary, active_or_retired))
newdf
##      salary active_or_retired
## 1  23500000                 0
## 4  22458000                 1
## 5  21436271                 1
## 6  20644400                 0
## 8  18995624                 1
## 12  9000000                 0
## 13  9500000                 0
## 14  7500000                 0
# With dplyr
basketball %>% filter(games_played < 50 | points_scored < 900) %>% select(salary, active_or_retired)
##     salary active_or_retired
## 1 23500000                 0
## 2 22458000                 1
## 3 21436271                 1
## 4 20644400                 0
## 5 18995624                 1
## 6  9000000                 0
## 7  9500000                 0
## 8  7500000                 0
c) Using the dataframe from part b), group the players into two groups based on the variable active_or_retired. Create a summary showing the average salary for the two groups
newdf %>% group_by(active_or_retired) %>% summarise(avg = mean(salary))
## # A tibble: 2 x 2
##   active_or_retired       avg
##               <int>     <dbl>
## 1                 0 14028880
## 2                 1 20963298.
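For comparison, a grouped average can also be computed in base R with aggregate(). The sketch below uses a small made-up data frame (toy and its values are hypothetical, not the midterm data), so it runs without basketball.csv:

```r
# Made-up data for illustration only (not basketball.csv)
toy <- data.frame(
  salary = c(100, 200, 300, 400),
  active_or_retired = c(0, 1, 0, 1)
)

# The formula reads "salary grouped by active_or_retired"; FUN is applied per group
avg_by_status <- aggregate(salary ~ active_or_retired, data = toy, FUN = mean)
avg_by_status
```

The result has one row per group, with the group mean in the salary column (200 for group 0 and 300 for group 1 in this toy data).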

8. Shiny

See separate file.