Case-scenario 1
This is the fourth season of outfielder Luis Robert with the Chicago White Sox. If he hit 11, 13, and 12 home runs during his first three seasons, how many does he need this season for his overall average to be at least 20 home runs per season?
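The calculation in the code below follows from writing the goal as an inequality on the unknown season-4 total $x_4$:

$$
\frac{11 + 13 + 12 + x_4}{4} \ge 20
\quad\Longleftrightarrow\quad
x_4 \ge 4 \cdot 20 - (11 + 13 + 12) = 80 - 36 = 44
$$

In general, with $n$ seasons, a target average $\bar{y}$, and totals $y_1, \dots, y_{n-1}$ so far, the final season needs at least $n\bar{y} - \sum_{i=1}^{n-1} y_i$.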
# Home runs so far
HR_before <- c(11, 13, 12)
# Target average number of home runs per season
wanted_HR <- 20
# Number of seasons
n_seasons <- 4
# Home runs needed in season 4
x_4 <- n_seasons*wanted_HR - sum(HR_before)
# Minimum number of home runs Robert needs
x_4
## [1] 44
According to the calculations above, Robert must hit at least 44 home runs this season to average at least 20 home runs per season. We can confirm this with R's mean() function:
# Robert's home runs over the four seasons, with 44 in season 4
Robert_HRs <- c(11, 13, 12, 44)
# Find mean
mean(Robert_HRs)
## [1] 20
# Find standard deviation
sd(Robert_HRs)
## [1] 16.02082
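sd() returns the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. The sketch below reproduces 16.02082 by hand for Robert's four seasons; sd_by_hand is an illustrative name, not part of the original exercise.

```r
# Robert's four seasons, including the 44 home runs needed in season 4
Robert_HRs <- c(11, 13, 12, 44)
n <- length(Robert_HRs)
# Deviations from the mean of 20: -9, -7, -8, 24
deviations <- Robert_HRs - mean(Robert_HRs)
# Sample standard deviation: square root of (sum of squared deviations) / (n - 1)
sd_by_hand <- sqrt(sum(deviations^2) / (n - 1))
sd_by_hand                            # 770 / 3 = 256.67, whose square root is about 16.02082
all.equal(sd_by_hand, sd(Robert_HRs)) # TRUE
```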
# Find the maximum number of home runs over the four-season period
max(Robert_HRs)
## [1] 44
# Find the minimum number of home runs over the four-season period
min(Robert_HRs)
## [1] 11
# We can also use the summary() function to find basic statistics, including the median!
summary(Robert_HRs)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 11.75 12.50 20.00 20.75 44.00
# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(Robert_HRs)
## [1] 11.0 11.5 12.5 28.5 44.0
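summary() and fivenum() report different values between the minimum and the median (11.75 and 20.75 versus 11.5 and 28.5) because they use different definitions: summary() takes its quartiles from quantile() with R's default interpolation (type 7), while fivenum() returns Tukey's hinges. For a four-value vector the hinges are simply the medians of the lower two and upper two sorted values, as this sketch checks.

```r
Robert_HRs <- c(11, 13, 12, 44)
# Quartiles with quantile()'s default type 7, the same values summary() shows
quantile(Robert_HRs, probs = c(0.25, 0.75))   # 11.75 and 20.75
# Tukey's hinges for n = 4: medians of the lower and upper halves of the sorted data
sorted <- sort(Robert_HRs)                    # 11 12 13 44
hinges <- c(median(sorted[1:2]), median(sorted[3:4]))
hinges                                        # 11.5 and 28.5, as in fivenum()
```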
hist(Robert_HRs)
Question 1
Now complete the problem below, which describes a similar scenario. You may use the steps from Case-scenario 1 as a template for your solution.
This is the sixth season of outfielder Juan Soto in the majors. If he drew 79, 108, 41, 145, and 135 walks during his first five seasons, how many walks does he need this season for his average number of walks per season to be at least 100?
# Walks so far
soto_walks_before <- c(79, 108, 41, 145, 135)
# Target average number of walks per season
wanted_walks <- 100
# Number of seasons
n_soto_seasons <- 6
# Walks needed in season 6
soto_walks_6 <- n_soto_seasons*wanted_walks - sum(soto_walks_before)
# Minimum number of walks Soto needs
soto_walks_6
## [1] 92
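Case-scenario 1 and Question 1 are the same computation with different inputs, so it can be wrapped in a function. needed_for_average() is a hypothetical helper, not part of the original exercise; it returns 0 when the target average is already guaranteed, since a negative count is impossible.

```r
# Hypothetical helper: minimum total needed in the final season so that the
# average over n_total seasons is at least `target` (0 if already guaranteed)
needed_for_average <- function(previous, target, n_total = length(previous) + 1) {
  max(0, n_total * target - sum(previous))
}

needed_for_average(c(11, 13, 12), target = 20)              # Robert: 44 home runs
needed_for_average(c(79, 108, 41, 145, 135), target = 100)  # Soto: 92 walks
```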
Case-scenario 2
The average salary of 10 baseball players is 72,000 dollars a week, and the average salary of 4 soccer players is 84,000 dollars a week. Find the mean salary of all 14 professional players.
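n_1 <- 10       # number of baseball players
n_2 <- 4        # number of soccer players
y_1 <- 72000    # mean weekly salary of the baseball players (dollars)
y_2 <- 84000    # mean weekly salary of the soccer players (dollars)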
# Mean salary overall
salary_ave <- (n_1*y_1 + n_2*y_2)/(n_1+n_2)
salary_ave
## [1] 75428.57
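The formula in salary_ave is a weighted mean of the two group means, with group sizes as weights, so stats::weighted.mean() gives the same answer. Averaging the two group means directly, (72000 + 84000) / 2 = 78000, would overstate the result because it gives the 4 soccer players as much weight as the 10 baseball players.

```r
# Group mean weekly salaries (dollars) and group sizes
group_means <- c(72000, 84000)   # baseball, soccer
group_sizes <- c(10, 4)
# Weighted mean: sum(mean * size) / sum(size)
weighted.mean(group_means, w = group_sizes)   # 75428.57
```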
Question 2
The average salary of 7 basketball players is 102,000 dollars a week, and the average salary of 9 NFL players is 91,000 dollars a week. Find the mean salary of all 16 professional players.
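n_3 <- 7        # number of basketball players
n_4 <- 9        # number of NFL players
y_3 <- 102000   # mean weekly salary of the basketball players (dollars)
y_4 <- 91000    # mean weekly salary of the NFL players (dollars)
# Mean salary overall
salary_ave_2 <- (n_3*y_3 + n_4*y_4)/(n_3+n_4)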
salary_ave_2
## [1] 95812.5
Case-scenario 3
The data below, read from allcontracts.csv, give the number of years left on the contracts of active players in the Barclays Premier League.
contract_length <- read.table("allcontracts.csv", header = TRUE, sep = ",")
contract_years <- contract_length$years
# Mean
contracts_mean <- mean(contract_years)
contracts_mean
## [1] 3.458918
# Median
contracts_median <- median(contract_years)
contracts_median
## [1] 3
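A raw column like contract_years can be summarised as a frequency distribution with table(), and the mean recovered from that table (each value weighted by its count) equals mean() on the raw data. The vector below is hypothetical, not the allcontracts.csv data; with the real file the first step would be table(contract_years).

```r
# Hypothetical years-left values, used only to illustrate the technique
years_example <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
# Frequency distribution: each distinct value and how often it occurs
freq <- table(years_example)
freq
values <- as.numeric(names(freq))   # 1 2 3 4 5
counts <- as.vector(freq)           # 1 2 3 2 1
# Mean from the frequency table: sum(value * count) / sum(count)
mean_from_table <- sum(values * counts) / sum(counts)
mean_from_table                     # 3, the same as mean(years_example)
```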
# Find number of observations
contracts_n <- length(contract_years)
# Find standard deviation
contracts_sd <- sd(contract_years)
# abs() makes this a two-sided check: values far below the mean are excluded too
contracts_w1sd <- sum(abs(contract_years - contracts_mean) / contracts_sd < 1) / contracts_n
# Proportion of observations within one standard deviation of the mean
contracts_w1sd
# Difference from the empirical rule (68%)
contracts_w1sd - 0.68
# Within 2 standard deviations
contracts_w2sd <- sum(abs(contract_years - contracts_mean) / contracts_sd < 2) / contracts_n
contracts_w2sd
# Difference from the empirical rule (95%)
contracts_w2sd - 0.95
# Within 3 standard deviations
contracts_w3sd <- sum(abs(contract_years - contracts_mean) / contracts_sd < 3) / contracts_n
contracts_w3sd
If every observation already lies within two standard deviations of the mean, this last check is redundant, because any value within two standard deviations is also within three.
# Difference from the empirical rule (99.73%)
contracts_w3sd - 0.9973
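The three within-k checks can be wrapped in one helper that uses the absolute z-score, so that values far below the mean are excluded as well as values far above it. prop_within_k_sd() is a hypothetical name; since allcontracts.csv is not reproduced here, the sketch runs on Robert's four seasons, and with the contract data the call would be prop_within_k_sd(contract_years, k).

```r
# Proportion of observations with |z| < k, where z = (x - mean(x)) / sd(x)
prop_within_k_sd <- function(x, k) {
  z <- (x - mean(x)) / sd(x)
  mean(abs(z) < k)   # mean of a logical vector = proportion of TRUE values
}

empirical <- c(0.68, 0.95, 0.9973)
observed <- sapply(1:3, function(k) prop_within_k_sd(c(11, 13, 12, 44), k))
observed              # 0.75 1.00 1.00 (44 has |z| of about 1.5)
observed - empirical  # difference from the empirical rule
```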
# Histogram of years left, requesting about 5 bins
hist(contract_years, xlab = "Years Left in Contract", col = "green", border = "red",
     xlim = c(0, 8), ylim = c(0, 225), breaks = 5)
# Same data, requesting about 8 bins, with different colours and axis limits
hist(contract_years, xlab = "Years Left in Contract", col = "blue", border = "black",
     xlim = c(0, 7), ylim = c(0, 200), breaks = 8)
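In hist(), a single number passed as breaks is only a suggestion: R sets the cut points to "pretty" values, so breaks = 5 and breaks = 8 may not yield exactly 5 or 8 bars. Passing a vector of cut points fixes the bins. This sketch assumes whole-year values and uses a hypothetical vector rather than allcontracts.csv; plot = FALSE returns the bins and counts without drawing.

```r
# Hypothetical whole-year values between 0 and 7
years_example <- c(0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 7)
# Cut points at -0.5, 0.5, ..., 7.5 give one bin centred on each whole year
h <- hist(years_example, breaks = seq(-0.5, 7.5, by = 1), plot = FALSE)
h$breaks   # 9 cut points, so 8 bins
h$counts   # 1 2 3 4 2 1 1 1
```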
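# Open the data in a spreadsheet-style viewer (interactive sessions only)
View(contract_length)
# Default histogram
hist(contract_years)
# Index plot: each value of years left, in file order
plot(contract_years)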