library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.3
summer<-read.csv("C:/Users/atan/Desktop/All/1 study/5 kaggle/summer.csv", header=TRUE)
summary(summer)
## Year City Sport Discipline
## Min. :1896 London : 3567 Aquatics : 4170 Athletics : 3638
## 1st Qu.:1948 Athens : 2149 Athletics : 3638 Rowing : 2667
## Median :1980 Los Angeles: 2074 Rowing : 2667 Swimming : 2628
## Mean :1970 Beijing : 2042 Gymnastics: 2307 Artistic G.: 2103
## 3rd Qu.:2000 Sydney : 2015 Fencing : 1613 Fencing : 1613
## Max. :2012 Atlanta : 1859 Football : 1497 Football : 1497
## (Other) :17459 (Other) :15273 (Other) :17019
## Athlete Country Gender
## PHELPS, Michael : 22 USA : 4585 Men :22746
## LATYNINA, Larisa : 18 URS : 2049 Women: 8419
## ANDRIANOV, Nikolay : 15 GBR : 1720
## MANGIAROTTI, Edoardo: 13 FRA : 1396
## ONO, Takashi : 13 GER : 1305
## SHAKHLIN, Boris : 13 ITA : 1296
## (Other) :31071 (Other):18814
## Event Medal
## Football : 1497 Bronze:10369
## Hockey : 1422 Gold :10486
## Team Competition: 1147 Silver:10310
## Basketball : 1012
## Handball : 973
## Water Polo : 958
## (Other) :24156
First we look the total number of Medals over the time from 1984 to 2012, then we look the what is the distribution of the Gold, Silver and Bronze Medals.
all <- summer[which(summer$Country=="CHN" ),]
t2<- all[, c(1,9)]
c2<- ggplot(data=all, aes(x=Year, fill=Medal ))+ geom_bar()+ xlab("Olympic Year")+ ylab("Total Medal Number Count")+xlim(1980, 2016)
c2
c3<- ggplot(data=all, aes(x=Year,fill=Medal))+geom_bar(position=position_dodge())+ xlab("Olympic Year")+ ylab("Bronze, Gold and Silver Medal Number")+xlim(1980, 2016)
c3
From the total medal count graph we can see the gradual increasing of medals from 1988 to 1996, then remained stable in 2000 and 2004, there is an absolute max in 2008, and a drop a little in 2012. We have noticed this the reason caused this issue is in 2008 the Olympics is in Beijing, CHINA. So chinese athletes did an amazing job.
From the medal distribution graph, we noticed the stable increasing of Golden medals dispite the fluctuation of total medal number counts from 1988 to 2008. Specially from 2000, the Gold Medal number exceeds Silver and Bronze Medals. We think one of the reason cause this phenomenon is in addition to the prize Chinese central government give the Gold Medal athletes, each province government and city government of the athletes’ native town would give them extra awards. Not mentioning the potential chances to be on TV shows, Advertising fees. In conclusion, the athletes are very motivated to earn the Gold Medal. Therefore now we are going to explore the Gold medal sports in next section
Now we investigate which Sports has the most gold medals.
library(vcd)
## Warning: package 'vcd' was built under R version 3.4.3
## Loading required package: grid
library(forcats)
## Warning: package 'forcats' was built under R version 3.4.3
g2<- all[, c(1,3)]
s3<- ggplot(data=g2, aes(fct_infreq(Sport), fill=Sport))+geom_bar()+ xlab("Sports")+ ylab("Total Gold Medal number")+coord_flip()
s3
g3<- g2[c(2)]
g4<-data.frame(ftable(g3))
g5 <- g4[order(-g4$Freq),]
g5$a <- c(matrix(1:43))
g6 <- g5[,c(2,3) ]
g7 <- g6[c(1:18),]
g8 <- t(as.matrix(g7))
g9 <- g8[c(1),]
barplot(g9, xlab=" Gold Medal Sport ", ylab="Number of Gold Medals" , col="lightgreen")
fit <-goodfit(g7, type="poisson")
summary(fit)
##
## Goodness-of-fit test for poisson distribution
##
## X^2 df P(> X^2)
## Likelihood Ratio 1274.914 16 1.228669e-261
fit1 <-goodfit(g7, type="nbinomial")
summary(fit1)
##
## Goodness-of-fit test for nbinomial distribution
##
## X^2 df P(> X^2)
## Likelihood Ratio 248.7631 15 2.231905e-44
plot(fit, main="Poisson")
plot(fit1, main="Negative Binomial")
distplot(g7, type="poisson")
distplot(g7, type="nbinomial")
Ord_plot(g7, main="Gold medal distribution")
The previous graph showed the sports with the most gold medal is Aquatics, Gymnastics, Table Tennis, Weightlighting and Badminton. The trend of the table shows it might be a poisson or negative binomial distribution. We use the goodfit function to examine our guesses. Then we use ord_plot to show how or where the distribution does not fit.
Finally, we will look the Chinese athletes who won the most Gold Medals
a1 <- all[, c(1,5)]
a2<- a1[c(2)]
a3<-data.frame(ftable(a2))
a4 <- a3[order(-a3$Freq),]
a5<- a4[(1:15),]
b1<- ggplot(data=a5, aes(x=Freq, fill=x))+geom_bar()+ xlab("Number of Gold Medal for per Athlete ")+ ylab("Number of Athletes ") +ggtitle(" Top Individual Chinese Athletes Gold Medals Count")
b1
by 2012, ZOU Kai set the record for 5 Gold Medals, followed by 14 people with 3 or 4 Gold Medals. As we mentioned earlier, many Gold Medal Athletes are doing great besides the professional life. All the top gold medal winners are/ were advertisement spokenperson for big company like Coke Cola. Especially, LI xiaopeng, Guo Jinjjing and Wu minxia are constantly on all kinds of TV shows, Xiong ni and Deng yaping were working for government. In addition, Li Ning has a large clothes company in China.