MSDS Winter 2018 R Workshop

Jiadi Li

Assignment #2 Data Wrangling

Read Data File

horsePrices <- read.csv("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/Stat2Data/HorsePrices.csv", header = TRUE)

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

summary(horsePrices)
mean(horsePrices$Price)
mean(horsePrices$Age)
median(horsePrices$Price)
median(horsePrices$Age)

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

oldHorse <- subset(horsePrices,Age>=8,select=c(HorseID,Price,Age,Sex))

3. Create new column names for the new data frame.

colnames(oldHorse)[4] <- "Gender"

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(oldHorse)
mean(oldHorse$Price)
mean(oldHorse$Age)
median(oldHorse$Price)
median(oldHorse$Age)
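The prompt asks for a comparison, which is easier to read when the statistics for both data frames sit side by side. The following is a minimal sketch of that idea, assuming a small made-up data frame `toy` standing in for `horsePrices`; the names `toy`, `toyOld`, and `comparison` and all values are hypothetical and not taken from the data set.

```r
# Toy stand-in for horsePrices; the values are made up for illustration.
toy <- data.frame(
  Price = c(10, 20, 30, 40),
  Age   = c(4, 6, 8, 10)
)

# Same kind of filter as in step 2: keep rows with Age >= 8.
toyOld <- subset(toy, Age >= 8)

# Collect mean and median of both data frames in one table.
comparison <- data.frame(
  Statistic = c("mean Price", "median Price", "mean Age", "median Age"),
  Full      = c(mean(toy$Price), median(toy$Price),
                mean(toy$Age), median(toy$Age)),
  Subset    = c(mean(toyOld$Price), median(toyOld$Price),
                mean(toyOld$Age), median(toyOld$Age))
)
comparison
```

Replacing `toy` and `toyOld` with `horsePrices` and `oldHorse` should give the same kind of table for the real data.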

5. Rename at least 3 values in a column so that every value in that column is renamed.

oldHorse <- within(oldHorse, Price[Gender == "m"] <- Price[Gender == "m"] * 0.9)
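The line above changes prices rather than labels. If the goal is to rename every value in a column, one possible approach is a named lookup vector. This is only a sketch under assumptions: the data frame `toy`, the lookup `labels`, and the new labels "Male" and "Female" are hypothetical, and the code "f" is assumed to be the counterpart of the "m" code used above.

```r
# Toy stand-in for oldHorse; the values are made up for illustration.
toy <- data.frame(
  HorseID = 1:4,
  Gender  = c("m", "f", "m", "f"),
  stringsAsFactors = FALSE
)

# Lookup vector: names are the old codes, values are the new labels.
labels <- c(m = "Male", f = "Female")

# as.character() guards against factor columns, which would otherwise be
# used as integer positions; unname() drops the leftover element names.
toy$Gender <- unname(labels[as.character(toy$Gender)])
toy
```

The `as.character()` call matters when `read.csv` has turned the column into a factor, which was the default behavior in R versions before 4.0.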

6. Display enough rows to see examples of all of steps 1-5 above.

oldHorse
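Printing the whole data frame works for a small subset. For a longer one, `head()` limits the output to the first rows. A minimal sketch, assuming a made-up data frame `toy` (hypothetical name and values):

```r
# Toy data frame with 20 rows; the values are made up for illustration.
toy <- data.frame(HorseID = 1:20, Price = seq(5, 100, by = 5))

# Show only the first 10 rows instead of the full data frame.
firstRows <- head(toy, 10)
firstRows
```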

7. BONUS - Place the original .csv file in a GitHub repository and have R read it from the link.

horseCsv <- read.csv("https://raw.githubusercontent.com/xiaoxiaogao-DD/String-Class/master/HorsePrices.csv", header = TRUE)
horseCsv
summary(horseCsv)