This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

Name: Oscar Alexnader Tobar

Course: CAP-4936-2253-4282

```r
plot(cars)
```


Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I*.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

# Case-scenario 1

This is the fourth season of outfielder Luis Robert with the Chicago White Sox. If he hit 11, 13, and 12 home runs during the first three seasons, how many does he need this season for his overall average to be at least 20?



```r
# Home runs so far
HR_before <- c(11, 13, 12)
# Average number of home runs per season wanted
wanted_HR <- 20
# Number of seasons
n_seasons <- 4
# Home runs needed in season 4
x_4 <- n_seasons * wanted_HR - sum(HR_before)
# Minimum number of home runs needed by Robert
x_4
#> [1] 44

# Solution
#
# Given that x1 = 11, x2 = 13, x3 = 12,
# we want to find x4 such that the mean number of home runs is at least 20.
#
# Notice that in this case n = 4.
#
# According to the information above: 20 × 4 = 11 + 13 + 12 + x4,
# so x4 = 80 - 36 = 44.
#
# When x4 = 44, the home-run average will be exactly 20.
Robert_HRs <- c(11, 13, 12, 44)
mean(Robert_HRs)
#> [1] 20
sd(Robert_HRs)
#> [1] 16.02082
```
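The same arithmetic can be wrapped in a small function so it can be reused for other players. This is only a sketch: `needed_in_final_season()` is a hypothetical helper name, not something from the original notebook, and it assumes the final season is the only one still missing.

```r
# Hypothetical helper (an assumption, not part of the original notebook):
# value needed in the one remaining season so that the overall mean
# reaches `target_mean`.
needed_in_final_season <- function(previous, target_mean) {
  n_seasons <- length(previous) + 1          # seasons played plus the current one
  n_seasons * target_mean - sum(previous)    # required total minus what is already there
}

# Robert's first three seasons, target average of 20
robert_needed <- needed_in_final_season(c(11, 13, 12), 20)
robert_needed
#> [1] 44
```

A negative result would mean the target average is already guaranteed.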

Continuing with the assignment on March 8th, 2025 at 9:10 am.

```r
# Find the maximum number of home runs during the four-season period
max(Robert_HRs)
#> [1] 44
# Find the minimum number of home runs during the four-season period
min(Robert_HRs)
#> [1] 11
summary(Robert_HRs)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   11.00   11.75   12.50   20.00   20.75   44.00 
```

# Question 1

Now complete the problem below, which represents a similar case scenario. You may use the steps from Case-scenario 1 as a template for your solution.

This is the sixth season of outfielder Juan Soto in the majors. If he received 79, 108, 41, 145, and 135 walks during the first five seasons, how many does he need this season for his overall number of walks per season to be at least 100?

```r
soto_walks <- c(79, 108, 41, 145, 135)
wanted_walks <- 100
number_seasons <- 6
# Walks needed in season 6
walks_6 <- number_seasons * wanted_walks - sum(soto_walks)
walks_6
#> [1] 92
```
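A quick way to check the answer is to append it to the five known seasons and take the mean, then confirm that one walk fewer would fall short. The variable names `soto_all` and `soto_short` are introduced here only for illustration.

```r
# Five known seasons plus the computed 92 walks
soto_all <- c(79, 108, 41, 145, 135, 92)
mean(soto_all)
#> [1] 100

# With 91 walks the average would fall just below the target
soto_short <- c(79, 108, 41, 145, 135, 91)
mean(soto_short) >= 100
#> [1] FALSE
```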

# Case-scenario 2

The average salary of 10 baseball players is 72,000 dollars a week and the average salary of 4 soccer players is 84,000. Find the mean salary of all 14 professional players.

Solution

We can find the combined mean by multiplying each group's mean by its group size, adding the two totals, and dividing by the total number of people.

Let $n_1 = 10$ denote the number of baseball players and $y_1 = 72000$ their mean salary. Let $n_2 = 4$ denote the number of soccer players and $y_2 = 84000$ their mean salary. Then the mean salary of all 14 individuals is:

$$\bar{y} = \frac{n_1 y_1 + n_2 y_2}{n_1 + n_2}$$

We can compute this in R as follows:

```r
n_1 <- 10
n_2 <- 4
y_1 <- 72000
y_2 <- 84000
# Mean salary overall
salary_ave <- (n_1 * y_1 + n_2 * y_2) / (n_1 + n_2)
salary_ave
#> [1] 75428.57
```
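The combined mean is a weighted mean of the two group means, with the group sizes as weights. As an alternative sketch, base R's `weighted.mean()` performs the same calculation; the variable names below are chosen for illustration.

```r
group_means <- c(72000, 84000)  # baseball, soccer
group_sizes <- c(10, 4)

# weighted.mean() computes sum(x * w) / sum(w)
salary_combined <- weighted.mean(group_means, w = group_sizes)
salary_combined
#> [1] 75428.57
```

Note that the plain average of the two means, `mean(group_means)`, gives 78000, which overstates the result because it ignores that there are more baseball players than soccer players.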

# Question 2

The average salary of 7 basketball players is 102,000 dollars a week and the average salary of 9 NFL players is 91,000. Find the mean salary of all 16 professional players.

```r
bp_1 <- 7
fp_1 <- 9
w_1 <- 102000
w_2 <- 91000
# Mean salary overall
salary_average <- (bp_1 * w_1 + fp_1 * w_2) / (bp_1 + fp_1) # Note this for the exam
salary_average
#> [1] 95812.5
```

# Case-scenario 3

The frequency distribution below lists the number of active players in the Barclays Premier League and the time left in their contracts.

| Years | Number of players |
|-------|-------------------|
| 6     | 28                |
| 5     | 72                |
| 4     | 201               |
| 3     | 109               |
| 2     | 56                |
| 1     | 34                |

Find the mean, the median, and the standard deviation.

What percentage of the data lies within one standard deviation of the mean?

What percentage of the data lies within two standard deviations of the mean?

What percentage of the data lies within three standard deviations of the mean?

Draw a histogram to illustrate the data.

Solution

The allcontracts.csv file contains the contract lengths of all the players. We can read this file in R using the read.csv() function.

```r
# Get the CSV file
getwd()
#> [1] "C:/Users/OAT meal/Documents/In class Activity 5 for Sports analytics"
contract_length <- read.csv("allcontracts.csv", header = TRUE, sep = ",")

contract_years <- contract_length$years
# Mean, rounded to two decimals
contract_mean <- mean(contract_years)
contract_mean <- round(contract_mean, digits = 2)
contract_mean
#> [1] 3.46
# Median
contracts_median <- median(contract_years)
contracts_median
#> [1] 3
# Find number of observations
contracts_n <- length(contract_years)
# Find standard deviation
contracts_sd <- sd(contract_years)
```
What percentage of the data lies within one standard deviation of the mean?

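```r
# Note: the condition below is one-sided. It counts every value less than
# one SD above the mean, including any values more than one SD below it.
contracts_w1sd <- sum((contract_years - contract_mean) / contracts_sd < 1) / contracts_n
# Proportion of observations counted by the condition above
contracts_w1sd
#> [1] 0.8416834
## Difference from the empirical rule (68%)
contracts_w1sd - 0.68
#> [1] 0.1616834
```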

What percentage of the data lies within two standard deviations of the mean?

```r
## Within 2 sd
contracts_w2sd <- sum((contract_years - contract_mean) / contracts_sd < 2) / contracts_n
contracts_w2sd
#> [1] 1
## Difference from the empirical rule (95%)
contracts_w2sd - 0.95
#> [1] 0.05
```

What percent of the data lies within three standard deviations of the mean?

```r
## Within 3 sd
contracts_w3sd <- sum((contract_years - contract_mean) / contracts_sd < 3) / contracts_n
contracts_w3sd
#> [1] 1
## Difference from the empirical rule (99.73%)
contracts_w3sd - 0.9973
#> [1] 0.0027
```
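"Within k standard deviations of the mean" is a two-sided condition, so the z-score should be compared in absolute value. The sketch below shows that version on data rebuilt from the frequency table (an assumption, since the CSV may differ slightly); `prop_within_k_sd()` is a hypothetical helper name.

```r
# Data rebuilt from the frequency table in the problem statement
years_tab_data <- rep(c(6, 5, 4, 3, 2, 1), times = c(28, 72, 201, 109, 56, 34))

# Hypothetical helper: proportion of values with |z| < k
prop_within_k_sd <- function(x, k) {
  z <- (x - mean(x)) / sd(x)   # z-scores
  mean(abs(z) < k)             # share of observations on both sides of the mean
}

within_1 <- prop_within_k_sd(years_tab_data, 1)
within_2 <- prop_within_k_sd(years_tab_data, 2)
within_3 <- prop_within_k_sd(years_tab_data, 3)
c(within_1, within_2, within_3)
#> [1] 0.620 0.932 1.000
```

For the table data, the two-sided proportions sit closer to the empirical-rule values of 0.68 and 0.95 than the one-sided counts do.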

Create a histogram

```r
# Create histogram
hist(contract_years, xlab = "Years Left in Contract", col = "green", border = "red",
     xlim = c(0, 8), ylim = c(0, 225), breaks = 5)

# Same histogram with fewer suggested breaks
hist(contract_years, xlab = "Years Left in Contract", col = "green", border = "red",
     xlim = c(0, 8), ylim = c(0, 250), breaks = 3)

# Box plot of the same data
boxplot(contract_years, main = "Years Left in Contract", ylab = "Years",
        col = "lightblue", border = "blue", horizontal = FALSE)
```
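When `breaks` is a single number, `hist()` treats it only as a suggestion. For whole-number data such as contract years, one option is to pass explicit break points centered on each integer so that every bar corresponds to one value. The sketch below uses data rebuilt from the frequency table (an assumption, since the CSV may differ) and `plot = FALSE` so the bin counts can be inspected directly.

```r
# Data rebuilt from the frequency table in the problem statement
years_hist_data <- rep(c(6, 5, 4, 3, 2, 1), times = c(28, 72, 201, 109, 56, 34))

# One bin per whole year: (0.5, 1.5], (1.5, 2.5], ..., (5.5, 6.5]
year_breaks <- seq(0.5, 6.5, by = 1)
h <- hist(years_hist_data, breaks = year_breaks, plot = FALSE)

h$counts
#> [1]  34  56 109 201  72  28
```

Calling `plot(h, xlab = "Years Left in Contract", main = "Contract Length")` afterwards draws the histogram with these bins.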

# Question 3

Use the skills learned in Case-scenario 3 on one of the following data sets. You may choose only one data set. Both are available in Canvas: (double-hit.csv) and (triples_hit.csv).

```r
doubles <- read.table("doubles_hit.csv", header = TRUE, sep = ",")
doubles_hit <- doubles$doubles_hit
# Mean
doubles_hit_mean <- mean(doubles_hit)
doubles_hit_mean
#> [1] 23.55
# Median
doubles_hit_median <- median(doubles_hit)
doubles_hit_median
#> [1] 23.5
# Find number of observations
doubles_hit_n <- length(doubles_hit)
# Find standard deviation
doubles_hit_sd <- sd(doubles_hit)
# Note: the conditions below are one-sided. They count every value less than
# k SDs above the mean, including any values more than k SDs below it.
doubles_hit_w1sd <- sum((doubles_hit - doubles_hit_mean) / doubles_hit_sd < 1) / doubles_hit_n
# Proportion of observations counted by the condition above
doubles_hit_w1sd
#> [1] 0.79
## Difference from the empirical rule (68%)
doubles_hit_w1sd - 0.68
#> [1] 0.11
## Within 2 sd
doubles_hit_w2sd <- sum((doubles_hit - doubles_hit_mean) / doubles_hit_sd < 2) / doubles_hit_n
doubles_hit_w2sd
#> [1] 1

## Difference from the empirical rule (95%)
doubles_hit_w2sd - 0.95
#> [1] 0.05
## Within 3 sd
doubles_hit_w3sd <- sum((doubles_hit - doubles_hit_mean) / doubles_hit_sd < 3) / doubles_hit_n
doubles_hit_w3sd
#> [1] 1
## Difference from the empirical rule (99.73%)
doubles_hit_w3sd - 0.9973
#> [1] 0.0027
```

Create a box plot

```r
# Labels changed from the contract example to describe the doubles data
boxplot(doubles_hit, main = "Doubles Hit", ylab = "Doubles",
        col = "lightblue", border = "blue", horizontal = FALSE)
```

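A box plot works better than a histogram for this data set. The histogram call below shows why: it reuses the settings from the contract data, and with `xlim = c(0, 8)` at least half of the values (the median is 23.5) fall outside the plotted range.

```r
hist(doubles_hit, xlab = "Years Left in Contract", col = "green", border = "red",
     xlim = c(0, 8), ylim = c(0, 250), breaks = 3)
```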
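A histogram can still work for data like this if the limits come from the data instead of being copied from another example. The sketch below is illustrative only: `doubles_example` is a made-up vector, not the contents of doubles_hit.csv, and the breaks are taken from `pretty()` applied to the data range.

```r
# Hypothetical values for illustration only (NOT the real doubles_hit data)
doubles_example <- c(12, 18, 20, 22, 23, 24, 25, 27, 31, 38)

# Break points that cover the full range of the data
example_breaks <- pretty(range(doubles_example), n = 8)
h_doubles <- hist(doubles_example, breaks = example_breaks, plot = FALSE)

# Every observation lands in some bin, so nothing is cut off
sum(h_doubles$counts) == length(doubles_example)
#> [1] TRUE
```

Calling `plot(h_doubles, xlab = "Doubles Hit", main = "Doubles Hit")` draws the result; leaving out `xlim` lets R size the axis from the breaks.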

