##      Game League Runs Margin Pitchers Attendance Time
## 1 CLE-DET     AL   14      6        6      38774  168
## 2 CHI-BAL     AL   11      5        5      15398  164
## 3 BOS-NYY     AL   10      4       11      55058  202
## 4 TOR-TAM     AL    8      4       10      13478  172
## 5  TEX-KC     AL    3      1        4      17004  151
## 6 OAK-LAA     AL    6      4        4      37431  133
##                   [,1]
## Runs        0.68131437
## Margin     -0.07135831
## Pitchers    0.89430821
## Attendance  0.25719248
## Time        1.00000000
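The column of correlations above could be produced by a call along the following lines. This is only a sketch: it rebuilds the six games printed at the top as a small data frame (the name `games` and the column selection are assumptions, not the original code), so the values it prints will differ from the full-data correlations shown.

```r
# First six games as printed above; the full dataset contains more games.
games <- data.frame(
  Game       = c("CLE-DET", "CHI-BAL", "BOS-NYY", "TOR-TAM", "TEX-KC", "OAK-LAA"),
  League     = "AL",
  Runs       = c(14, 11, 10, 8, 3, 6),
  Margin     = c(6, 5, 4, 4, 1, 4),
  Pitchers   = c(6, 5, 11, 10, 4, 4),
  Attendance = c(38774, 15398, 55058, 13478, 17004, 37431),
  Time       = c(168, 164, 202, 172, 151, 133)
)

# Correlate every numeric column with Time; the result is a one-column matrix
numeric_cols <- c("Runs", "Margin", "Pitchers", "Attendance", "Time")
cors <- cor(games[, numeric_cols], games$Time)
print(cors)
```

The last row is the correlation of `Time` with itself, which is always 1 and serves as a quick sanity check.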
##
## Call:
## lm(formula = Time ~ Pitchers, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -37.945  -8.445  -3.104   9.751  50.794
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   94.843     13.387   7.085 8.24e-06 ***
## Pitchers      10.710      1.486   7.206 6.88e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.46 on 13 degrees of freedom
## Multiple R-squared: 0.7998, Adjusted R-squared: 0.7844
## F-statistic: 51.93 on 1 and 13 DF, p-value: 6.884e-06
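The numbers in this summary can be cross-checked against each other. In a simple linear regression with one predictor, the multiple R-squared equals the squared correlation between the two variables, the t value is the estimate divided by its standard error, and the F-statistic equals the squared t value of the slope. The sketch below uses only values copied from the output above; the variable names are illustrative, and small differences are expected because the printed values are rounded.

```r
# Values copied from the output above
r        <- 0.89430821   # correlation between Pitchers and Time
estimate <- 10.710       # slope estimate for Pitchers
std_err  <- 1.486        # standard error of the slope
t_value  <- 7.206        # reported t value for Pitchers

r_squared <- r^2                 # compare with Multiple R-squared (0.7998)
t_check   <- estimate / std_err  # compare with the reported t value (7.206)
f_check   <- t_value^2           # compare with the F-statistic (51.93)

print(round(c(r_squared = r_squared, t_check = t_check, f_check = f_check), 4))
```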
Regression equation: the fitted linear model can be expressed as:
Time = 94.843 + 10.710 × Pitchers
We can notice a few things from the plot. The main feature is the linear pattern, which suggests that, as the correlation of 0.894 indicated earlier, there is a strong positive linear relationship between the number of pitchers used and the time of the game. Second, there are a few outliers, but most of the points follow the linear pattern we expected. Both observations allow us to say with confidence that there is a strong relationship between the length of a baseball game and the number of pitchers used.
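To make the fitted equation concrete, the following sketch applies the reported coefficients to the six games printed at the top of the output and draws a scatter plot with the regression line overlaid. The helper `predict_time` and the vector names are hypothetical, and the plot covers only those six games, so it is an illustration rather than a reproduction of the original figure.

```r
# Coefficients copied from the regression output above
intercept <- 94.843
slope     <- 10.710

# Hypothetical helper: predicted game time for a given number of pitchers
predict_time <- function(pitchers) intercept + slope * pitchers

# First six games as printed at the top of the output
pitchers <- c(6, 5, 11, 10, 4, 4)
time     <- c(168, 164, 202, 172, 151, 133)

fitted_time <- predict_time(pitchers)
residuals6  <- time - fitted_time   # observed minus predicted
print(data.frame(pitchers, time, fitted_time, residuals6))

# Scatter plot of the six games with the reported regression line
plot(pitchers, time, xlab = "Pitchers", ylab = "Time",
     main = "Game time vs. pitchers used")
abline(a = intercept, b = slope)
```

Points far above or below the line correspond to large residuals, which is how the outliers mentioned above would show up.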
This project allowed us to learn how to find the correlation between two data vectors in R, as well as how to fit a linear model to assess the significance of the relationship between two variables. Using linear regression, we determined that there is a strong relationship between the length of a baseball game and the number of pitchers used in the game.
# Load the dataset
data <- read.csv("BaseballTimes.csv")
# Fit the linear regression model
model <- lm(Time ~ Pitchers, data = data)
# Show a summary of the model
summary(model)
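Beyond printing the summary, individual quantities can be pulled out of a fitted model directly. The sketch below is a self-contained illustration that fits the same formula to only the six games shown in the output (the names `games` and `fit` are assumptions), so its numbers will not match the full-data results reported above.

```r
# Six games copied from the printed head of the dataset
games <- data.frame(
  Pitchers = c(6, 5, 11, 10, 4, 4),
  Time     = c(168, 164, 202, 172, 151, 133)
)

fit <- lm(Time ~ Pitchers, data = games)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))
slope      <- coef_table["Pitchers", "Estimate"]
p_value    <- coef_table["Pitchers", "Pr(>|t|)"]
r2         <- summary(fit)$r.squared

print(c(slope = slope, p_value = p_value, r_squared = r2))

# 95% confidence interval for the slope
print(confint(fit, "Pitchers", level = 0.95))
```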