Our research question compares the medals won by men and women across all Olympic competitions. We aim to examine the gender disparities that persist even at the Olympics. Our data capture the year and the number of medals won by each gender at each Summer Olympics. The data were provided by the IOC Research and Reference Center, which recorded every medal awarded to each athlete from 1896 to 2014. One limitation of the data set is that women were excluded from certain events for many years, which lowers their total medal count. For example, rowing, basketball, and handball only opened to women in 1976, so in overall medal count women have been catching up ever since.
Initial observations of the visualization show that both genders have a positive relationship between medals won and year. The women’s regression line is steeper than the men’s. The mean and median medals won by men are about twice those of women.
| Gender | Mean | Median |
|---|---|---|
| Men | 937.8235 | 958 |
| Women | 471.6471 | 429 |
The numerical explanatory variable in this model is year. The categorical explanatory variable is gender. Our outcome variable is total medals won per year by men and women in all sports of the Olympics. The interaction model is used to find the regression lines.
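One way to see what the interaction model does: with a two-level categorical variable, its fitted lines coincide with the lines obtained by fitting a separate least-squares line to each gender. The sketch below is a minimal standard-library illustration of that idea. The function names are hypothetical, and the data points are made-up toy values, not the actual medal counts.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interaction_terms(years, men, women):
    """Express two per-gender lines as interaction-model terms (baseline: Men)."""
    men_b0, men_b1 = fit_line(years, men)
    women_b0, women_b1 = fit_line(years, women)
    return {
        "intercept": men_b0,                    # men's intercept
        "GenderWomen": women_b0 - men_b0,       # intercept offset for women
        "Year": men_b1,                         # men's slope
        "GenderWomen:Year": women_b1 - men_b1,  # slope offset for women
    }


# Toy values for illustration only; NOT the Olympic medal data.
toy_years = [1948, 1952, 1956, 1960]
toy_men = [10, 14, 15, 19]
toy_women = [2, 5, 9, 14]

terms = interaction_terms(toy_years, toy_men, toy_women)
print(terms)
```

Only the point estimates coincide with separate per-gender fits; the standard errors of the interaction model come from the pooled residuals of both groups.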
| term | estimate | std_error | statistic | p_value | conf_low | conf_high |
|---|---|---|---|---|---|---|
| intercept | -12606.735 | 1534.859 | -8.214 | 0 | -15741.335 | -9472.136 |
| GenderWomen | -16796.324 | 2170.618 | -7.738 | 0 | -21229.316 | -12363.331 |
| Year | 6.841 | 0.775 | 8.825 | 0 | 5.258 | 8.424 |
| GenderWomen:Year | 8.248 | 1.096 | 7.524 | 0 | 6.009 | 10.486 |
Men: \(\widehat{medalcount} = -12607 + 6.84 * year\)
Women: \(\widehat{medalcount} = -29403 + 15.09 *year\)
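As a quick check of how the two equations follow from the coefficient table, the sketch below adds the `GenderWomen` offset to the intercept and the `GenderWomen:Year` offset to the `Year` slope. The Python names are illustrative assumptions and not part of the original analysis; only the four estimates come from the table.

```python
# Estimates copied from the regression table (baseline group: Men).
COEF = {
    "intercept": -12606.735,
    "GenderWomen": -16796.324,
    "Year": 6.841,
    "GenderWomen:Year": 8.248,
}


def line_for(gender):
    """Return (intercept, slope) of the fitted line for 'Men' or 'Women'."""
    is_women = 1 if gender == "Women" else 0
    intercept = COEF["intercept"] + is_women * COEF["GenderWomen"]
    slope = COEF["Year"] + is_women * COEF["GenderWomen:Year"]
    return intercept, slope


def predict(gender, year):
    """Predicted medal count for a gender in a given year."""
    intercept, slope = line_for(gender)
    return intercept + slope * year


for gender in ("Men", "Women"):
    b0, b1 = line_for(gender)
    print(f"{gender}: medalcount_hat = {b0:.0f} + {b1:.2f} * year")
```

Rounded to the precision used in the report, this reproduces the intercepts -12607 and -29403 and the slopes 6.84 and 15.09.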
The regression model we applied to the data allows non-parallel slopes. The intercept for men is -12607, which is the model’s predicted average medal count for men at year 0. The intercept for women is -12607 + (-16796) = -29403, the predicted average medal count for women at year 0. Neither intercept has a practical interpretation, because year 0 lies far outside the range of the data. The men’s slope is 6.84: for every 1-year increase, there is an associated average increase of 6.84 medals won by men. The women’s slope is 6.84 + 8.25 = 15.09: for every 1-year increase, there is an associated average increase of 15.09 medals won by women. The men’s regression line is therefore flatter than the women’s, which matches our exploratory data analysis. A potential limitation of our analysis is that we began our computations from the year 1940, which may affect the estimated intercepts and slopes for both genders.
We chose the interaction model over the parallel slopes model because the visualization clearly shows that the slopes are not equal. In addition, the 95% confidence interval for the interaction term GenderWomen:Year, [6.009, 10.486], does not include zero, which is evidence that the two slopes differ.
On the graph, the slope of the men’s curve decreases over time while the women’s increases. Because the women’s line is steeper than the men’s, the difference in medals won between the genders shrinks over the years. The model also gives women a lower intercept than men, that is, a lower predicted medal count at year 0.
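The narrowing gap can be made concrete by asking where the two fitted lines would meet. Setting the men’s and women’s equations equal gives year = -GenderWomen / GenderWomen:Year. The sketch below computes that value from the table estimates. The function name is hypothetical, and the result is an extrapolation beyond the observed years, so it is better read as an illustration of the model’s geometry than as a forecast.

```python
def crossover_year(gender_offset, interaction):
    """Year at which the two fitted lines intersect.

    Men:   b0 + b1 * year
    Women: (b0 + gender_offset) + (b1 + interaction) * year
    The lines are equal when gender_offset + interaction * year == 0.
    """
    if interaction == 0:
        raise ValueError("parallel lines never intersect")
    return -gender_offset / interaction


# GenderWomen and GenderWomen:Year estimates from the regression table.
year = crossover_year(gender_offset=-16796.324, interaction=8.248)
print(round(year, 1))
```

With these estimates the lines meet in the mid-2030s, which assumes the linear trends continue unchanged; the report’s own limitations section gives reasons to doubt that.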
Null Hypothesis: There is no difference in medals won over time between men and women.
Alternative Hypothesis: There is a difference in medals won over time between men and women.
Confidence interval for men’s slope: [5.258, 8.424] Confidence interval for women’s slope: [12.850, 17.327]
We are 95% confident that the true population slope for men is between 5.258 and 8.424, and 95% confident that the true population slope for women is between 12.850 and 17.327. On average, medals won by women increase more steeply over time than medals won by men. Both intervals exclude zero, which suggests that medals won change from year to year for each gender, and at a different rate for each.
Alpha Value = 0.05
Because the reported p-values for the Year and GenderWomen:Year terms both round to 0 (that is, p < 0.001), which is below our alpha of 0.05, we reject the null hypothesis and conclude that there is a difference in medals won over time between men and women. There is statistically significant evidence that the slopes are not the same.
In the residual plots, the histogram of residuals is approximately normal, which satisfies the normality condition. The scatterplot with residuals on the y-axis shows roughly constant spread. Additionally, the original scatterplot shows a linear relationship between year and medal count.
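The `statistic` column behind these p-values is each estimate divided by its standard error. The sketch below recomputes it from the rounded table values, so small differences in the last digit are expected. The dictionary layout and function names are illustrative assumptions.

```python
# (estimate, std_error, conf_low, conf_high) copied from the regression table.
TABLE = {
    "intercept": (-12606.735, 1534.859, -15741.335, -9472.136),
    "GenderWomen": (-16796.324, 2170.618, -21229.316, -12363.331),
    "Year": (6.841, 0.775, 5.258, 8.424),
    "GenderWomen:Year": (8.248, 1.096, 6.009, 10.486),
}


def t_statistic(term):
    """Test statistic for H0: coefficient = 0."""
    estimate, std_error, _, _ = TABLE[term]
    return estimate / std_error


def excludes_zero(term):
    """True when the reported confidence interval does not contain 0."""
    _, _, low, high = TABLE[term]
    return low > 0 or high < 0


for term in TABLE:
    print(f"{term}: t = {t_statistic(term):.2f}, "
          f"CI excludes 0: {excludes_zero(term)}")
```

The row that bears on the comparison of slopes is `GenderWomen:Year`: a large statistic and an interval that excludes zero are what indicate that the two slopes differ.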
Based on our statistical analysis, the slope for the women’s medal count is 15.09, versus 6.84 for men. These values are consistent with a rise in women’s participation in the Olympics, although the totals are not yet equal. The number of medals won by women is gradually approaching the men’s total, as reflected in the larger slope for female athletes in this data set. For our hypothesis test, we concluded that there is a difference in medals won over time between men and women because our p-value (reported as 0) is less than our alpha value of 0.05. We found a statistically significant difference between the slopes of men and women. The conditions for statistical inference appear to be satisfied by our original graph and residual plots.
Our analysis supports our research question: gender disparities continue to permeate the Olympics. However, medals won by women are increasing at a faster pace than medals won by men, which is likely due to the introduction of women’s events that did not exist in prior years. In that sense, the Olympics are trending toward equal representation of both genders in events. We hope that at some point in the near future the Olympics will truly represent both genders, allowing them to compete in the same number of events and to have the opportunity to win an equal number of medals, rather than men winning more than women.
Many limitations of this data set stem from the underrepresentation of women in the early years of the Olympics. There is a large disparity between the numbers of male and female Olympic participants before 1950, which suggests that men had significantly more opportunities to win medals. Medal opportunities were greater for men from the first Olympics onward. Over time, medal opportunities for both genders increased as more sports joined the Olympic Games. Another possible source of misinterpretation is the slope of the women’s regression line. We see a steep slope between 1940 and 2012, which shows the increase in medals won by women. This steepness is most likely due to the increase in medal opportunities and in women participants over time.
One limitation that could also motivate future work is that we did not account for gender differences within specific countries. Doing so would allow us to see which countries are being more proactive about gender equality. Below is a data visualization that could be a model for future research.
Another opportunity for future work originates from the growing inclusivity of the Olympics, whether that stems from the addition of women’s events in sports that have been open primarily to men or from the creation of events that are only for women. It would be interesting to see whether the total medals won by women will equal the total medals won by men, or even exceed it.
Link for online data set: https://www.kaggle.com/fangya/summer-olympic-medal/data
This visualization could foreshadow future work because it shows the differences between men and women for 5 of the largest countries represented in the Olympics. For specific sports, you can see which gender has more medals. It also allows us to see which countries have a large or small gender gap. This is a small sample, but it could suggest patterns in other countries that closely resemble these. Therefore, this future work could identify which countries are moving toward equality and gender representation at the Olympics.