Chinese Summer Olympics Medal Analysis

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.3
summer<-read.csv("C:/Users/atan/Desktop/All/1 study/5 kaggle/summer.csv", header=TRUE)
summary(summer)
##       Year               City              Sport             Discipline   
##  Min.   :1896   London     : 3567   Aquatics  : 4170   Athletics  : 3638  
##  1st Qu.:1948   Athens     : 2149   Athletics : 3638   Rowing     : 2667  
##  Median :1980   Los Angeles: 2074   Rowing    : 2667   Swimming   : 2628  
##  Mean   :1970   Beijing    : 2042   Gymnastics: 2307   Artistic G.: 2103  
##  3rd Qu.:2000   Sydney     : 2015   Fencing   : 1613   Fencing    : 1613  
##  Max.   :2012   Atlanta    : 1859   Football  : 1497   Football   : 1497  
##                 (Other)    :17459   (Other)   :15273   (Other)    :17019  
##                  Athlete         Country        Gender     
##  PHELPS, Michael     :   22   USA    : 4585   Men  :22746  
##  LATYNINA, Larisa    :   18   URS    : 2049   Women: 8419  
##  ANDRIANOV, Nikolay  :   15   GBR    : 1720                
##  MANGIAROTTI, Edoardo:   13   FRA    : 1396                
##  ONO, Takashi        :   13   GER    : 1305                
##  SHAKHLIN, Boris     :   13   ITA    : 1296                
##  (Other)             :31071   (Other):18814                
##               Event          Medal      
##  Football        : 1497   Bronze:10369  
##  Hockey          : 1422   Gold  :10486  
##  Team Competition: 1147   Silver:10310  
##  Basketball      : 1012                 
##  Handball        :  973                 
##  Water Polo      :  958                 
##  (Other)         :24156

Exploring the Medal status of China

1. Total number and the distribution of Medals

First we look the total number of Medals over the time from 1984 to 2012, then we look the what is the distribution of the Gold, Silver and Bronze Medals.

all <- summer[which(summer$Country=="CHN" ),] 

t2<- all[, c(1,9)]
c2<- ggplot(data=all, aes(x=Year, fill=Medal ))+ geom_bar()+ xlab("Olympic Year")+ ylab("Total Medal Number Count")+xlim(1980, 2016)
c2

c3<- ggplot(data=all, aes(x=Year,fill=Medal))+geom_bar(position=position_dodge())+ xlab("Olympic Year")+ ylab("Bronze, Gold and Silver Medal Number")+xlim(1980, 2016)
c3

From the total medal count graph we can see the gradual increasing of medals from 1988 to 1996, then remained stable in 2000 and 2004, there is an absolute max in 2008, and a drop a little in 2012. We have noticed this the reason caused this issue is in 2008 the Olympics is in Beijing, CHINA. So chinese athletes did an amazing job.

From the medal distribution graph, we noticed the stable increasing of Golden medals dispite the fluctuation of total medal number counts from 1988 to 2008. Specially from 2000, the Gold Medal number exceeds Silver and Bronze Medals. We think one of the reason cause this phenomenon is in addition to the prize Chinese central government give the Gold Medal athletes, each province government and city government of the athletes’ native town would give them extra awards. Not mentioning the potential chances to be on TV shows, Advertising fees. In conclusion, the athletes are very motivated to earn the Gold Medal. Therefore now we are going to explore the Gold medal sports in next section

2.Gold Medal distribution in sports

Now we investigate which Sports has the most gold medals.

library(vcd)
## Warning: package 'vcd' was built under R version 3.4.3
## Loading required package: grid
library(forcats)
## Warning: package 'forcats' was built under R version 3.4.3
g2<- all[, c(1,3)]
s3<- ggplot(data=g2, aes(fct_infreq(Sport), fill=Sport))+geom_bar()+ xlab("Sports")+ ylab("Total Gold Medal number")+coord_flip()
s3

g3<- g2[c(2)] 
g4<-data.frame(ftable(g3))
g5 <- g4[order(-g4$Freq),]
g5$a <- c(matrix(1:43))
g6 <- g5[,c(2,3) ]
g7 <- g6[c(1:18),]
g8 <- t(as.matrix(g7))
g9 <- g8[c(1),]

barplot(g9, xlab=" Gold Medal Sport ", ylab="Number of Gold Medals" , col="lightgreen")

fit <-goodfit(g7, type="poisson")
summary(fit)
## 
##   Goodness-of-fit test for poisson distribution
## 
##                       X^2 df      P(> X^2)
## Likelihood Ratio 1274.914 16 1.228669e-261
fit1 <-goodfit(g7, type="nbinomial")
summary(fit1)
## 
##   Goodness-of-fit test for nbinomial distribution
## 
##                       X^2 df     P(> X^2)
## Likelihood Ratio 248.7631 15 2.231905e-44
plot(fit, main="Poisson")

plot(fit1, main="Negative Binomial")

distplot(g7, type="poisson")

distplot(g7, type="nbinomial")

Ord_plot(g7, main="Gold medal distribution")

The previous graph showed the sports with the most gold medal is Aquatics, Gymnastics, Table Tennis, Weightlighting and Badminton. The trend of the table shows it might be a poisson or negative binomial distribution. We use the goodfit function to examine our guesses. Then we use ord_plot to show how or where the distribution does not fit.

Finally, we will look the Chinese athletes who won the most Gold Medals

a1 <- all[, c(1,5)]
a2<- a1[c(2)] 
a3<-data.frame(ftable(a2))
a4 <- a3[order(-a3$Freq),]
a5<- a4[(1:15),]

b1<- ggplot(data=a5, aes(x=Freq, fill=x))+geom_bar()+ xlab("Number of Gold Medal for per Athlete ")+ ylab("Number of Athletes ") +ggtitle(" Top Individual Chinese Athletes Gold Medals Count")
b1

by 2012, ZOU Kai set the record for 5 Gold Medals, followed by 14 people with 3 or 4 Gold Medals. As we mentioned earlier, many Gold Medal Athletes are doing great besides the professional life. All the top gold medal winners are/ were advertisement spokenperson for big company like Coke Cola. Especially, LI xiaopeng, Guo Jinjjing and Wu minxia are constantly on all kinds of TV shows, Xiong ni and Deng yaping were working for government. In addition, Li Ning has a large clothes company in China.