data = read.csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vSDpJqmVSks0f4vLzzcmcTfPJ8TSu4ziCNpTFy_fIY6LibZksRXzCfJYXj9qZd4NiofejxoYSkmLMwu/pub?output=csv')
For this first assignment, I want you to get into R and load your data!
I discovered this data set at https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results. To create my data file, I clicked the icon to upload it to Google Sheets, published it to the web as a CSV, and copied the link. Some data files can be uploaded to GitHub instead, but GitHub will not accept a file that is too large. Either way, the data needs to be in raw form as a CSV.
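Here is a minimal sketch of the loading step and a few quick checks that it worked. To keep it self-contained, it reads a tiny inline CSV instead of a published link; `csv_text` and `sample_data` are made-up names for this illustration, and the three rows are copied from the Olympic data shown further down. With a real file you would pass the link or file path to read.csv instead of `text =`.

```r
# Stand-in for a published CSV: a few rows typed inline (illustration only)
csv_text <- "ID,Name,Sex,Age,Height,Weight
1,A Dijiang,M,24,180,80
2,A Lamusi,M,23,170,60
3,Gunnar Nielsen Aaby,M,24,NA,NA"

# read.csv() takes a file path or URL; text = lets it read from a string
sample_data <- read.csv(text = csv_text)

# Quick checks that the load worked
dim(sample_data)    # number of rows and number of columns
names(sample_data)  # column names taken from the header row
str(sample_data)    # the type R assigned to each column
```

If the row and column counts or the column names look wrong, the link is probably not pointing at a raw CSV.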
Once you have your data loaded into R, let's do two things with it! First, I am going to show a snippet of the data with the head command. Create an R code chunk using three backticks (the key next to the tilde) followed by {r}, and close it with three more backticks. Anything inside is run as R code; anything outside is just text explaining your project.
head(data)
##   ID                     Name Sex Age Height Weight           Team NOC
## 1  1                A Dijiang   M  24    180     80          China CHN
## 2  2                 A Lamusi   M  23    170     60          China CHN
## 3  3      Gunnar Nielsen Aaby   M  24     NA     NA        Denmark DEN
## 4  4     Edgar Lindenau Aabye   M  34     NA     NA Denmark/Sweden DEN
## 5  5 Christine Jacoba Aaftink   F  21    185     82    Netherlands NED
## 6  5 Christine Jacoba Aaftink   F  21    185     82    Netherlands NED
##         Games Year Season      City         Sport
## 1 1992 Summer 1992 Summer Barcelona    Basketball
## 2 2012 Summer 2012 Summer    London          Judo
## 3 1920 Summer 1920 Summer Antwerpen      Football
## 4 1900 Summer 1900 Summer     Paris    Tug-Of-War
## 5 1988 Winter 1988 Winter   Calgary Speed Skating
## 6 1988 Winter 1988 Winter   Calgary Speed Skating
##                                Event Medal
## 1        Basketball Men's Basketball  <NA>
## 2       Judo Men's Extra-Lightweight  <NA>
## 3            Football Men's Football  <NA>
## 4        Tug-Of-War Men's Tug-Of-War  Gold
## 5   Speed Skating Women's 500 metres  <NA>
## 6 Speed Skating Women's 1,000 metres  <NA>
So we see a few athletes from different times and events. Notice that Aaftink is repeated: rows 5 and 6 are the same athlete in two different events (the 500 metres and the 1,000 metres). We might worry about that later. Lastly, I'll run some quick statistics.
summary(data)
## ID Name Sex Age
## Min. : 1 Length:271116 Length:271116 Min. :10.00
## 1st Qu.: 34643 Class :character Class :character 1st Qu.:21.00
## Median : 68205 Mode :character Mode :character Median :24.00
## Mean : 68249 Mean :25.56
## 3rd Qu.:102097 3rd Qu.:28.00
## Max. :135571 Max. :97.00
## NA's :9474
## Height Weight Team NOC
## Min. :127.0 Min. : 25.0 Length:271116 Length:271116
## 1st Qu.:168.0 1st Qu.: 60.0 Class :character Class :character
## Median :175.0 Median : 70.0 Mode :character Mode :character
## Mean :175.3 Mean : 70.7
## 3rd Qu.:183.0 3rd Qu.: 79.0
## Max. :226.0 Max. :214.0
## NA's :60171 NA's :62875
## Games Year Season City
## Length:271116 Min. :1896 Length:271116 Length:271116
## Class :character 1st Qu.:1960 Class :character Class :character
## Mode :character Median :1988 Mode :character Mode :character
## Mean :1978
## 3rd Qu.:2002
## Max. :2016
##
## Sport Event Medal
## Length:271116 Length:271116 Length:271116
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
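This went through all the columns and computed summary statistics for the quantitative variables, including the number of missing values (NA's) where there are any. For the character columns it only reports the length, class, and mode. Okay, that is all I want for the first assignment!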
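If you want to go one step further, here is a rough sketch of two follow-up checks: counting missing values per column, and counting rows per athlete to see repeats like Aaftink's. It runs on a made-up three-row data frame (`mini`, built from rows 3, 5 and 6 of the head output) so that it works without the full file; the variable names are only for illustration.

```r
# Miniature of the Olympic data, typed in by hand (illustration only)
mini <- data.frame(
  ID = c(3, 5, 5),
  Name = c("Gunnar Nielsen Aaby",
           "Christine Jacoba Aaftink",
           "Christine Jacoba Aaftink"),
  Height = c(NA, 185, 185),
  Event = c("Football Men's Football",
            "Speed Skating Women's 500 metres",
            "Speed Skating Women's 1,000 metres"),
  stringsAsFactors = FALSE
)

# Missing values in each column: is.na() marks the NA cells,
# colSums() adds them up column by column
na_counts <- colSums(is.na(mini))
na_counts

# Rows per athlete ID: a count above 1 means the athlete
# appears once for each event they entered
rows_per_id <- table(mini$ID)
rows_per_id

# table() also gives counts for a character column,
# which summary() does not break down
event_counts <- table(mini$Event)
event_counts
```

The same three calls should work on the full data frame by replacing `mini` with `data`.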