#1a. import the data
library(readr)
olympic <- read_csv("Documents/NYU MSQM/mydata/olympic.csv")

#1b. view the data
View(olympic)

#1c. look at column names
names(olympic)

#1d. look at dimensions of data (rows and columns)
dim(olympic)

#attach the data frame so columns can be referenced by name
attach(olympic)

#2. sort your data by total medals and country, assign sorted data to a new data frame called sort_total
sort_total <- olympic[order(Total, NOC), ]
sort_total

#3. use describe() function to look at data
#import library Hmisc
library(Hmisc)
describe(olympic)

#4a and 4b answers were found in the describe() statistics
#4a. what is median of gold, silver, bronze and total medals?
#the median of gold is 2.5, silver is 3, bronze is 2, and total is 8

#4b. Also look at the mean and total number of GSB and T medals?
#Mean of gold is 3.808, silver is 3.731, bronze is 3.808, and total is 11.35
#total number of GSBT medals
sum(olympic$Gold)
sum(olympic$Silver)
sum(olympic$Bronze)
sum(olympic$Total)
#total number of gold is 99, silver is 97, bronze is 99, and total is 295

#5. More Statistics
#5a. For Gold, look at summary stats, including: IQR, min, max, mean, var, sd, skew
min(olympic$Gold)
max(olympic$Gold)
mean(olympic$Gold)
IQR(olympic$Gold)
var(olympic$Gold)
sd(olympic$Gold)
library(psych)
skew(olympic$Gold)
#min = 0, max = 13, mean = 3.808, IQR = 4.75, var = 14.642, sd = 3.826, skew = 0.879
summary(olympic$Gold)
#note: psych is loaded after Hmisc, so describe() here resolves to psych::describe
describe(olympic$Gold)

#6. More statistics - subset
#6a. Redo above stats, group by Region
describeBy(olympic, group = olympic$Region)

#6b. Which region won the highest total medals?
describeBy(olympic$Total, group = olympic$Region)
#North America has the highest mean

#6c. How many countries in this region?
northamerica <- subset(olympic, Region == 'NORTH_A')
nrow(northamerica)
#There are 2 countries in North America

#6d. How many countries are in Europe group?
europe <- subset(olympic, Region == 'EUROPE')
nrow(europe)
#15 countries are in the Europe group

#6e. What is the max number of medals won?
max(olympic$Total)
#the max number of medals won is 33

#6f. What country won the max?
subset(olympic, Total == max(Total))
#Russia won the max medals

#7. More stats - correlations
#7a. explore correlations between Total medals and number of Gold and Bronze
cor(olympic$Total, olympic$Gold)
#correlation = 0.919
cor(olympic$Total, olympic$Bronze)
#correlation = 0.899
cor(olympic$Gold, olympic$Bronze)
#correlation = 0.725

#7b. What is the correlation between Rank and Total medals? Is this expected or surprising?
cor(olympic$Rank, olympic$Total)
#cor = -0.875
#This is expected. A better rank is written as a smaller number (rank 1 is best),
#so countries with smaller rank numbers have more medals. More medals paired with
#a smaller Rank value shows up as a negative correlation.
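
#Illustration for the max-medals lookup and the Rank/Total correlation: a minimal sketch on made-up data.
#medals_demo and its values are hypothetical and are NOT taken from olympic.csv;
#they only show why the sign of the Rank/Total correlation comes out negative
#and how to pull the top row without typing the max value by hand.
medals_demo <- data.frame(
  NOC   = c("A", "B", "C", "D"),   #hypothetical country codes
  Rank  = 1:4,                     #1 = best rank
  Total = c(20, 12, 7, 3)          #hypothetical medal totals
)

#smaller Rank goes with larger Total, so the correlation is negative
demo_cor <- cor(medals_demo$Rank, medals_demo$Total)
demo_cor

#row with the largest Total; which.max() returns the first row if there is a tie
top_row <- medals_demo[which.max(medals_demo$Total), ]
top_row

#alternative that keeps every tied row
top_rows <- medals_demo[medals_demo$Total == max(medals_demo$Total), ]
top_rows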