Question 1

Now, you must complete the problem below which represents a similar case scenario. You may use the steps that we executed in Case-scenario 1 as a template for your solution.

This is outfielder Juan Soto's sixth season in the majors. If he received 79, 108, 41, 145, and 135 walks during his first five seasons, how many does he need this season for his average number of walks per season to be at least 100?

# Walks so far
Walks_before = c(79, 108, 41, 145, 135)

# Average Number of walks per season wanted
wanted_Walks = 100

# Number of seasons
n_seasons = 6

# Needed walks on season 6
x_6 <- n_seasons*wanted_Walks - sum(Walks_before)

# Minimum number of walks needed by Soto
x_6
## [1] 92

Question 1 Answer:

According to the calculations above, Soto needs at least 92 walks to get a per season average of at least 100.
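The same result can be written as a worked inequality. This is only a restatement of the code above, with x_6 standing for the number of walks in season 6 and 508 being the sum of the first five seasons.

```latex
\[
\frac{79 + 108 + 41 + 145 + 135 + x_6}{6} \ge 100
\;\Longrightarrow\;
508 + x_6 \ge 600
\;\Longrightarrow\;
x_6 \ge 92
\]
```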

Following the case-scenario solution, we can confirm this result using mean().

# Soto's performance
Soto_walks <- c(79, 108, 41, 145, 135, 92)
# Find mean
mean(Soto_walks)
## [1] 100
# Find standard deviation
sd(Soto_walks)
## [1] 38.20995
# Find the maximum number of walks during the six-season period
max(Soto_walks)
## [1] 145
# Find the minimum number of walks during the six-season period
min(Soto_walks)
## [1] 41
summary(Soto_walks)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   41.00   82.25  100.00  100.00  128.25  145.00
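
As a further check that 92 is the minimum and not just a sufficient number, the sketch below compares the six-season average with 91 walks and with 92 walks. The helper name `season_average` is hypothetical and introduced only for this illustration.

```r
# Walks from the first five seasons (taken from the problem statement)
walks_before <- c(79, 108, 41, 145, 135)

# Hypothetical helper: six-season average for a given sixth-season total
season_average <- function(x_6) mean(c(walks_before, x_6))

avg_91 <- season_average(91)  # one walk fewer than the computed answer
avg_92 <- season_average(92)  # the computed minimum

# Does each scenario reach the target of 100 walks per season?
avg_91 >= 100
avg_92 >= 100
```

With 91 walks the total is 599, which falls just short of the 600 needed, so 92 is the smallest whole number that meets the target.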

Question 2

The average salary of 7 basketball players is 102,000 dollars a week and the average salary of 9 NFL players is 91,000 dollars a week. Find the mean salary of all 16 professional players.

n_1 = 7
n_2 = 9
y_1 = 102000
y_2 = 91000

# Mean salary overall
salary_ave =  (n_1*y_1 + n_2*y_2)/(n_1+n_2)
salary_ave
## [1] 95812.5

Question 2 Answer:

Following the example in case scenario 2, we can see that the average weekly salary of all 16 professional players is $95,812.50.
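
The combined mean is a weighted mean of the two group means, with the group sizes as weights. As a cross-check, base R's `weighted.mean()` should give the same value; the vector names `group_sizes` and `group_means` are assumptions made for this sketch.

```r
# Group sizes and group means from the problem statement
group_sizes <- c(basketball = 7, nfl = 9)
group_means <- c(basketball = 102000, nfl = 91000)

# Weighted mean: sum(mean_i * n_i) / sum(n_i)
overall_mean <- weighted.mean(group_means, w = group_sizes)
overall_mean
```

Note that simply averaging the two group means would give 96,500, which overstates the result because the smaller basketball group would count as much as the larger NFL group.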

Question 3

Use the skills learned in case scenario number 3 on one of the following data sets. You may choose only one dataset. They are both available in Canvas.

(I chose to upload doubles_hit.csv)

getwd()
## [1] "D:/17865/Documents"
doubles_hit = read.csv("doubles_hit.csv", header = TRUE, sep = ",")

d_hit = doubles_hit$doubles_hit

d_hit
##   [1] 37  4  6  7  9 25 18 11  8 13 15  1 30 30  6 23 14 26 33 23 34 32  9  4 23
##  [26] 34 19 29 15 27 18 35  7  7 19  4 38  2 16 15 26 15  3 19 24 33 34 33 38 29
##  [51] 19 18  7  7 30 15 31 12 17 21 11  9 35  1 27 27 27 10 35 34 13  5 40 40 11
##  [76] 40 29 23 37 22 29 15 24 25 40 14  1 47 49 45 42 40 40 46 41 42 43 45 48 46

Question 3 Answers:

  1. Find the mean, the median, and the standard deviation.

Using the code below, we can see that:

Mean DH = 23.55

Median DH = 23.5

Number of DH = 100

SD DH = 13.37371

#Mean
d_hit_mean = mean(d_hit)
d_hit_mean
## [1] 23.55
#Median
d_hit_med = median(d_hit)
d_hit_med
## [1] 23.5
#Number of Observations
d_hit_n = length(d_hit)
d_hit_n
## [1] 100
#Standard Deviation
d_hit_sd = sd(d_hit)
d_hit_sd
## [1] 13.37371
  2. What percentage of the data lies within one standard deviation of the mean?

Using the code below, we can see that 58% of the observations lie within one standard deviation of the mean, which is 10 percentage points below the 68% predicted by the empirical rule.

# abs() measures distance on both sides of the mean; without it, every
# value far below the mean would also be counted as within one sd
d_hit_w1sd <- sum(abs(d_hit - d_hit_mean)/d_hit_sd < 1)/ d_hit_n

# Proportion of observations within one standard deviation of the mean
d_hit_w1sd
## [1] 0.58
## Difference from empirical 
d_hit_w1sd - 0.68
## [1] -0.1
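  3. What percentage of the data lies within two standard deviations of the mean?

All of the observations (100%) lie within two standard deviations of the mean, 5 percentage points above the 95% predicted by the empirical rule.

## Within 2 sd (abs() measures distance on both sides of the mean)
d_hit_w2sd <- sum(abs(d_hit - d_hit_mean)/ d_hit_sd < 2)/d_hit_n
d_hit_w2sd
## [1] 1
## Difference from empirical 
d_hit_w2sd - 0.95
## [1] 0.05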
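  4. What percentage of the data lies within three standard deviations of the mean?

All of the observations (100%) lie within three standard deviations of the mean, 0.27 percentage points above the 99.73% predicted by the empirical rule.

## Within 3 sd (abs() measures distance on both sides of the mean)
d_hit_w3sd <- sum(abs(d_hit - d_hit_mean)/ d_hit_sd < 3)/d_hit_n
d_hit_w3sd
## [1] 1
## Difference from empirical 
d_hit_w3sd - 0.9973
## [1] 0.0027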
  5. Draw a histogram to illustrate the data.
# Create histogram
hist(d_hit, xlab = "Number of Doubles Hit", col = "green", border = "red",
     xlim = c(0, 60), ylim = c(0, 30), breaks = 5)
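
The three empirical-rule checks in Question 3 repeat the same computation with a different multiple of the standard deviation, so they could be folded into one small function. The sketch below is one possible way to do that; the function name `within_k_sd` and the vector `example` are hypothetical, and `example` is a made-up vector used only so the block runs without the CSV file.

```r
# Hypothetical helper: proportion of x lying within k standard deviations
# of its mean, counting observations on both sides of the mean
within_k_sd <- function(x, k) {
  z <- (x - mean(x)) / sd(x)  # z-scores using the sample sd
  mean(abs(z) < k)            # share of observations with |z| < k
}

# Made-up illustrative data (not the doubles_hit data set)
example <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Proportions within 1, 2, and 3 standard deviations
props <- sapply(1:3, function(k) within_k_sd(example, k))
props

# Differences from the empirical-rule benchmarks
props - c(0.68, 0.95, 0.9973)
```

To apply it to the homework data, one would call `within_k_sd(d_hit, 1)`, `within_k_sd(d_hit, 2)`, and `within_k_sd(d_hit, 3)` after loading the file.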