Final Project: Comparing the mean number of passing yards per game in the AFC East to the mean number of passing yards per game in the AFC West, 2020-2024

Project Overview
The AFC East and AFC West are two divisions within the National Football League. For this project, I randomly sampled 40 games from the AFC East and 40 games from the AFC West. Each division played 400 or more games from 2020-2024. To sample, I listed each division's games in chronological order and assigned each game an integer from 1 to the number of games played. I then used a random number generator to choose 40 of those numbers.
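The selection step can be sketched in R as below. This is only an illustration and not the exact procedure used. The total of 400 games and the seed are hypothetical placeholders, since the text only says each division played "400 or more" games. `sample.int()` stands in for whichever random number generator was actually used.

```r
# Hypothetical total number of games for one division (the text says "400 or more")
n_games <- 400

# Hypothetical seed so that the draw can be reproduced
set.seed(2024)

# Games are numbered 1..n_games in chronological order;
# draw 40 distinct game numbers without replacement
selected <- sort(sample.int(n_games, size = 40))
selected
```

Drawing without replacement guarantees that no game is counted twice. Sorting the result makes it easy to look each chosen game up in the chronological list.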

The Sample Data

East <- c(344, 155, 190, 118, 266, 19, 329, 366, 270, 405, 239, 103, 186, 115, 147, 304, 202, 154, 166, 320, 243, 316, 301, 173, 272, 263, 274, 167, 169, 281, 243, 139, 184, 196, 225, 209, 257, 297, 283, 338)

West <- c(208, 302, 385, 236, 347, 248, 282, 348, 243, 302, 337, 343, 206, 323, 222, 406, 257, 410, 237, 383, 340, 235, 188, 292, 293, 196, 329, 335, 252, 222, 228, 305, 167, 306, 424, 209, 271, 94, 291, 306)

Summary Statistics - Sample Data

meanEast <- round(mean(East),2)
sdEast <- round(sd(East), 2)

meanWest <- round(mean(West), 2)
sdWest <- round(sd(West), 2)

The mean of the sample data from the AFC East is 230.7.

The standard deviation of the sample data from the AFC East is 82.34.

The mean of the sample data from the AFC West is 282.7.

The standard deviation of the sample data from the AFC West is 72.16.

Distribution of the Sample Data

hist(East, breaks = 8, col = "red")

hist(West, breaks = 8, col = "blue")

boxplot(East, horizontal = TRUE, col = "red")

boxplot(West, horizontal = TRUE, col = "blue")

Statement about the Distribution of the Sample Data Sets

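The East's data is more spread out than the West's (standard deviation 82.34 versus 72.16). It is roughly bell-shaped. One unusually low game (19 yards) pulls the mean (230.7) slightly below the median (241), so any skew is slight and to the left.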

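The West's data is skewed left. Its mean (282.7) is below its median (291.5), and its lowest game (94 yards) sits well below the rest of the sample.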
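A quick numerical check on these shape descriptions is to compare each sample's mean with its median. As a rough heuristic, a mean below the median often signals a longer left tail. The sketch below reuses the two sample vectors.

```r
# Sample data (passing yards per game)
East <- c(344, 155, 190, 118, 266, 19, 329, 366, 270, 405, 239, 103, 186, 115, 147, 304, 202, 154, 166, 320, 243, 316, 301, 173, 272, 263, 274, 167, 169, 281, 243, 139, 184, 196, 225, 209, 257, 297, 283, 338)
West <- c(208, 302, 385, 236, 347, 248, 282, 348, 243, 302, 337, 343, 206, 323, 222, 406, 257, 410, 237, 383, 340, 235, 188, 292, 293, 196, 329, 335, 252, 222, 228, 305, 167, 306, 424, 209, 271, 94, 291, 306)

# Center and extremes side by side; mean < median hints at a left tail
shape <- data.frame(
  division = c("East", "West"),
  mean     = c(mean(East), mean(West)),
  median   = c(median(East), median(West)),
  min      = c(min(East), min(West)),
  max      = c(max(East), max(West))
)
shape
```

For these samples, both means sit below their medians (230.7 versus 241 for the East, 282.7 versus 291.5 for the West). The histograms and boxplots remain the main evidence for shape.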

Hypotheses

I want to show that the mean number of passing yards per game in the AFC West is greater than the mean number of passing yards per game in the AFC East.

Hypothesis Test for Two Means

Step 1: State the Hypotheses

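\(H_0: \mu_W \le \mu_E\) \(H_a: \mu_W \gt \mu_E\), where \(\mu_W\) and \(\mu_E\) are the mean passing yards per game in the AFC West and the AFC East.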

Step 2: Write the probability statement

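\(P(\bar{x}_W - \bar{x}_E \gt 52)\), computed assuming \(H_0\) is true, where \(52 = 282.7 - 230.7\) is the observed difference in sample means.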
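The probability statement can be turned into a test statistic by hand. The sketch below uses the unpooled (Welch) standard error and the Welch-Satterthwaite degrees of freedom, which is the approximation that `var.equal = FALSE` requests. It takes the rounded summary statistics reported above as inputs.

```r
# Rounded summary statistics from the samples
mean_E <- 230.7; s_E <- 82.34; n_E <- 40
mean_W <- 282.7; s_W <- 72.16; n_W <- 40

# Observed difference in sample means (West minus East)
diff_obs <- mean_W - mean_E

# Unpooled standard error of the difference
v_E <- s_E^2 / n_E
v_W <- s_W^2 / n_W
se  <- sqrt(v_E + v_W)

# Welch t statistic, assuming mu_W - mu_E = 0
t_stat <- diff_obs / se

# Welch-Satterthwaite degrees of freedom
df_welch <- (v_E + v_W)^2 / (v_E^2 / (n_E - 1) + v_W^2 / (n_W - 1))

# Upper-tail probability P(T > t_stat)
p_value <- pt(t_stat, df = df_welch, lower.tail = FALSE)
c(diff = diff_obs, se = se, t = t_stat, df = df_welch, p = p_value)
```

The observed difference of 52 yards is about 3 standard errors above zero. That distance is what drives the small p-value.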

Step 3: Find the p-value

library(BSDA)  # tsum.test() comes from the BSDA package
result <- tsum.test(mean.x = 230.7, s.x = 82.34, n.x = 40, mean.y = 282.7, s.y = 72.16, n.y = 40, alternative = "less", var.equal = FALSE)  # x = East, y = West

The p-value is 0.0017997.
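As a cross-check, base R's `t.test()` can run the same one-sided Welch test directly on the raw samples, without the BSDA package or rounded summary statistics. Putting `West` first and using `alternative = "greater"` matches the goal of showing that the West's mean is greater than the East's.

```r
# Sample data (passing yards per game)
East <- c(344, 155, 190, 118, 266, 19, 329, 366, 270, 405, 239, 103, 186, 115, 147, 304, 202, 154, 166, 320, 243, 316, 301, 173, 272, 263, 274, 167, 169, 281, 243, 139, 184, 196, 225, 209, 257, 297, 283, 338)
West <- c(208, 302, 385, 236, 347, 248, 282, 348, 243, 302, 337, 343, 206, 323, 222, 406, 257, 410, 237, 383, 340, 235, 188, 292, 293, 196, 329, 335, 252, 222, 228, 305, 167, 306, 424, 209, 271, 94, 291, 306)

# One-sided Welch two-sample t-test: H_a is mean(West) > mean(East)
welch <- t.test(West, East, alternative = "greater", var.equal = FALSE)

welch$statistic  # t statistic
welch$parameter  # Welch degrees of freedom
welch$p.value    # one-sided p-value
```

The standard deviations passed to `tsum.test()` were rounded to two decimals. The p-value from the raw data can therefore differ from 0.0017997 in the later decimal places.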

Step 4: Make a Conclusion

Since the p-value (about 0.0018) is well below any conventional significance level, such as \(\alpha = 0.05\), we reject the null hypothesis. There is strong statistical evidence that the mean number of passing yards per game in the AFC West is greater than in the AFC East.

Statement of Learning

In this class, I learned how to graph data and analyze its shape, center, and spread. I also learned how to organize data and turn it into an R Markdown document. Topics from this class that will help me in the future include counting, probabilities, and density curves.