Bonus: Place the original .csv in a GitHub repository and have R read it from the link. This will be a
very useful skill as you progress in your data science education and career.
library(RCurl)  # provides getURL()
url <- getURL("https://rawgit.com/nschettini/MSDSBridgeR/master/mtcars.csv")
mtcarsx <- read.csv(text = url)
head(mtcarsx)
## X mpg cyl disp hp drat wt qsec vs am gear carb
## 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
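The first column prints as X because the CSV stores the car names in an unnamed leading column. Below is a minimal sketch of one way to handle that. It uses a temporary file written from the built-in mtcars data as a stand-in for the downloaded CSV; tmp, cars_default, and cars_named are illustrative names, not part of the original answer.

```r
# Stand-in for the downloaded file: write the built-in mtcars data to a temporary CSV.
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp)

# Default read: the row names arrive as an unnamed first column, which R labels "X".
cars_default <- read.csv(tmp)
names(cars_default)[1]   # "X"

# Treat the first column as row names instead of data.
cars_named <- read.csv(tmp, row.names = 1)
head(cars_named, 3)
```

In recent R versions read.csv() can usually be pointed at an http(s) URL directly, in which case the same row.names = 1 argument would apply. That has not been checked against the link above.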
Question 1: Use the summary function to gain an overview of the data set. Then display the mean and
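A minimal sketch for Question 1, assuming the two attributes are mpg and hp (the pair compared under Question 4) and using the built-in mtcars as a stand-in for mtcarsx:

```r
# Assumption: the built-in mtcars has the same values as the CSV read into mtcarsx.
cars <- mtcars

# Overview of every column (min, quartiles, mean, max).
summary(cars)

# Mean and median for two attributes (mpg and hp are assumed choices).
cat("The mean of mpg is", mean(cars$mpg), "\n")
cat("The median of mpg is", median(cars$mpg), "\n")
cat("The mean of hp is", mean(cars$hp), "\n")
cat("The median of hp is", median(cars$hp), "\n")
```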
Question 2: Create a new data frame with a subset of the columns and rows. Make sure to rename it
df1 <- subset(mtcarsx, hp > 100 & mpg > 15, c('mpg','hp', 'wt','cyl'))
df1
## mpg hp wt cyl
## 1 21.0 110 2.620 6
## 2 21.0 110 2.875 6
## 4 21.4 110 3.215 6
## 5 18.7 175 3.440 8
## 6 18.1 105 3.460 6
## 10 19.2 123 3.440 6
## 11 17.8 123 3.440 6
## 12 16.4 180 4.070 8
## 13 17.3 180 3.730 8
## 14 15.2 180 3.780 8
## 22 15.5 150 3.520 8
## 23 15.2 150 3.435 8
## 25 19.2 175 3.845 8
## 28 30.4 113 1.513 4
## 29 15.8 264 3.170 8
## 30 19.7 175 2.770 6
## 32 21.4 109 2.780 4
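The same selection can be written with bracket indexing, which makes the row filter and the column list explicit. A sketch, assuming the built-in mtcars as a stand-in for mtcarsx; keep_rows, keep_cols, and df1_alt are illustrative names:

```r
# Assumption: the built-in mtcars has the same values as the CSV read into mtcarsx.
cars <- mtcars

# Logical vector marking the rows to keep, and the columns to keep.
keep_rows <- cars$hp > 100 & cars$mpg > 15
keep_cols <- c("mpg", "hp", "wt", "cyl")

# data[rows, columns] returns the filtered data frame.
df1_alt <- cars[keep_rows, keep_cols]
nrow(df1_alt)   # 17, matching the subset() output above
```

Unlike subset(), bracket indexing keeps rows where the condition evaluates to NA, so the two forms agree only when the filtered columns have no missing values, as is the case here.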
Question 3: Create new column names for the new data frame.
library(dplyr)  # provides rename(); the syntax is new_name = old_name
df2 <- rename(df1, horse_power = hp, miles_per_gallon = mpg, weight = wt, cylinders = cyl)
df2
## miles_per_gallon horse_power weight cylinders
## 1 21.0 110 2.620 6
## 2 21.0 110 2.875 6
## 4 21.4 110 3.215 6
## 5 18.7 175 3.440 8
## 6 18.1 105 3.460 6
## 10 19.2 123 3.440 6
## 11 17.8 123 3.440 6
## 12 16.4 180 4.070 8
## 13 17.3 180 3.730 8
## 14 15.2 180 3.780 8
## 22 15.5 150 3.520 8
## 23 15.2 150 3.435 8
## 25 19.2 175 3.845 8
## 28 30.4 113 1.513 4
## 29 15.8 264 3.170 8
## 30 19.7 175 2.770 6
## 32 21.4 109 2.780 4
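If dplyr is not available, the columns can also be renamed in base R through names(). A sketch on a small hand-made data frame; df and new_names are hypothetical and only illustrate the lookup:

```r
# Two rows copied from the output above, enough to show the renaming.
df <- data.frame(mpg = c(21.0, 18.7), hp = c(110, 175),
                 wt = c(2.620, 3.440), cyl = c(6, 8))

# Lookup vector: names are the old column names, values are the new ones.
new_names <- c(mpg = "miles_per_gallon", hp = "horse_power",
               wt = "weight", cyl = "cylinders")

# Index the lookup by the current names so the column order does not matter.
names(df) <- unname(new_names[names(df)])
names(df)
```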
Question 4: Use the summary function to create an overview of your new data frame. Then print the mean
and median for the same two attributes. Please compare.
summary(df2)
## miles_per_gallon horse_power weight cylinders
## Min. :15.20 Min. :105.0 Min. :1.513 Min. :4.000
## 1st Qu.:16.40 1st Qu.:110.0 1st Qu.:2.875 1st Qu.:6.000
## Median :18.70 Median :150.0 Median :3.440 Median :6.000
## Mean :19.02 Mean :148.9 Mean :3.241 Mean :6.706
## 3rd Qu.:21.00 3rd Qu.:175.0 3rd Qu.:3.520 3rd Qu.:8.000
## Max. :30.40 Max. :264.0 Max. :4.070 Max. :8.000
cat("The mean of mpg is", mean(df2$miles_per_gallon), "\n")
## The mean of mpg is 19.01765
cat("The mean of hp is", mean(df2$horse_power), "\n")
## The mean of hp is 148.9412
cat("The median of mpg is", median(df2$miles_per_gallon), "\n")
## The median of mpg is 18.7
cat("The median of hp is", median(df2$horse_power), "\n")
## The median of hp is 150
MPG_compared <- c('Original Mean','New Mean', 'Original median','New Median')
mpg <- c(mean(mtcarsx$mpg), mean(df2$miles_per_gallon), median(mtcarsx$mpg), median(df2$miles_per_gallon))
table_mpg <- data.frame(MPG_compared, mpg)
table_mpg
## MPG_compared mpg
## 1 Original Mean 20.09062
## 2 New Mean 19.01765
## 3 Original median 19.20000
## 4 New Median 18.70000
HP_compared <- c('Original Mean','New Mean', 'Original median','New Median')
hp <- c(mean(mtcarsx$hp), mean(df2$horse_power), median(mtcarsx$hp), median(df2$horse_power))
table_hp <- data.frame(HP_compared, hp)
table_hp
## HP_compared hp
## 1 Original Mean 146.6875
## 2 New Mean 148.9412
## 3 Original median 123.0000
## 4 New Median 150.0000
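Both comparisons can also be produced in one pass, which avoids repeating the table-building code for each attribute. A sketch, assuming the built-in mtcars matches the CSV; compare_stats() is a hypothetical helper, not part of the original answer:

```r
# Assumption: the built-in mtcars has the same values as the CSV read into mtcarsx.
original <- mtcars
filtered <- subset(mtcars, hp > 100 & mpg > 15)

# Mean and median of one column, before and after filtering.
compare_stats <- function(col) {
  c(original_mean   = mean(original[[col]]),
    new_mean        = mean(filtered[[col]]),
    original_median = median(original[[col]]),
    new_median      = median(filtered[[col]]))
}

# One row per attribute, one column per statistic.
comparison <- t(sapply(c("mpg", "hp"), compare_stats))
round(comparison, 2)
```

Read across each row: filtering lowers the mean and median of mpg, while for hp the mean barely moves and the median rises from 123 to 150.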
Question 5: For at least 3 values in a column, rename them so that every occurrence of each value in that
column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those
values so that all 20 would show as “excellent”.
i <- 1
for (x in df2$cylinders) {
  if (x == 4) {
    df2$cylinders[i] <- "four Cylinders"
  } else if (x == 6) {
    df2$cylinders[i] <- "six cylinders"
  } else if (x == 8) {
    df2$cylinders[i] <- "eight cylinders"
  }
  i <- i + 1
}
df2
## miles_per_gallon horse_power weight cylinders
## 1 21.0 110 2.620 six cylinders
## 2 21.0 110 2.875 six cylinders
## 4 21.4 110 3.215 six cylinders
## 5 18.7 175 3.440 eight cylinders
## 6 18.1 105 3.460 six cylinders
## 10 19.2 123 3.440 six cylinders
## 11 17.8 123 3.440 six cylinders
## 12 16.4 180 4.070 eight cylinders
## 13 17.3 180 3.730 eight cylinders
## 14 15.2 180 3.780 eight cylinders
## 22 15.5 150 3.520 eight cylinders
## 23 15.2 150 3.435 eight cylinders
## 25 19.2 175 3.845 eight cylinders
## 28 30.4 113 1.513 four Cylinders
## 29 15.8 264 3.170 eight cylinders
## 30 19.7 175 2.770 six cylinders
## 32 21.4 109 2.780 four Cylinders
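The loop works, but R can recode a whole column at once with a named lookup vector. A sketch on a short hypothetical vector; cyl, labels, and recoded are illustrative names, and the labels use consistent lower case, unlike the output above:

```r
# Hypothetical cylinder counts standing in for df2$cylinders.
cyl <- c(6, 6, 8, 4)

# Lookup vector: names are the old values (as text), values are the new labels.
labels <- c("4" = "four cylinders",
            "6" = "six cylinders",
            "8" = "eight cylinders")

# Index the lookup by the old values; unname() drops the leftover "6", "8", ... names.
recoded <- unname(labels[as.character(cyl)])
recoded
```

Applied to the data frame, this would be a single assignment of the recoded vector back to the cylinders column, with no counter variable to maintain.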
Question 6: Display enough rows to see examples of all of steps 1-5 above.