Master’s in Data Science: R Bridge Week 2 Assignment

Bonus: Place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

library(RCurl)  # provides getURL()
url <- getURL("https://rawgit.com/nschettini/MSDSBridgeR/master/mtcars.csv")
mtcarsx <- read.csv(text = url)
head(mtcarsx)
##                   X  mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
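As a hedged alternative to the RCurl approach above, base R's read.csv() can usually take a URL directly in its file argument. The sketch below is only an illustration: so that it runs offline, it writes the built-in mtcars data to a temporary file (csv_path and cars_from_csv are made-up names) and reads it back. A raw GitHub link could be passed in the same position.

```r
# Write the built-in mtcars data to a temporary CSV so the example runs offline.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)

# read.csv() takes a local path or an http(s) URL in the same argument,
# so a raw GitHub link could be used here instead of csv_path.
cars_from_csv <- read.csv(csv_path)

# The row names written by write.csv() come back as a first column named "X".
head(cars_from_csv)
```

This mirrors the X column seen in the output above, which holds the car names.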

Question 1: Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

summary(mtcarsx)
##                   X           mpg             cyl             disp      
##  AMC Javelin       : 1   Min.   :10.40   Min.   :4.000   Min.   : 71.1  
##  Cadillac Fleetwood: 1   1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8  
##  Camaro Z28        : 1   Median :19.20   Median :6.000   Median :196.3  
##  Chrysler Imperial : 1   Mean   :20.09   Mean   :6.188   Mean   :230.7  
##  Datsun 710        : 1   3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0  
##  Dodge Challenger  : 1   Max.   :33.90   Max.   :8.000   Max.   :472.0  
##  (Other)           :26                                                  
##        hp             drat             wt             qsec      
##  Min.   : 52.0   Min.   :2.760   Min.   :1.513   Min.   :14.50  
##  1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89  
##  Median :123.0   Median :3.695   Median :3.325   Median :17.71  
##  Mean   :146.7   Mean   :3.597   Mean   :3.217   Mean   :17.85  
##  3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90  
##  Max.   :335.0   Max.   :4.930   Max.   :5.424   Max.   :22.90  
##                                                                 
##        vs               am              gear            carb      
##  Min.   :0.0000   Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4375   Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :5.000   Max.   :8.000  
## 
cat("The mean of mpg is", mean(mtcarsx$mpg), "\n")
## The mean of mpg is 20.09062
cat("The mean of hp is", mean(mtcarsx$hp), "\n")
## The mean of hp is 146.6875
cat("The median of mpg is", median(mtcarsx$mpg), "\n")
## The median of mpg is 19.2
cat("The median of hp is", median(mtcarsx$hp), "\n")
## The median of hp is 123
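The four cat() calls above could also be collapsed into one step. The following is a minimal sketch, assuming the built-in mtcars data stands in for the CSV copy; cols and stats are illustrative names, not part of the assignment.

```r
# Columns to summarise; the built-in mtcars stands in for the CSV copy.
cols <- c("mpg", "hp")

# For each selected column, compute the mean and the median.
# sapply() returns a matrix with one column per attribute.
stats <- sapply(mtcars[cols], function(v) c(mean = mean(v), median = median(v)))
stats
```

Each cell can then be looked up by name, for example stats["median", "hp"].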

Question 2: Create a new data frame with a subset of the columns and rows. Make sure to rename it.

df1 <- subset(mtcarsx, hp > 100 & mpg > 15, c('mpg','hp', 'wt','cyl'))
df1
##     mpg  hp    wt cyl
## 1  21.0 110 2.620   6
## 2  21.0 110 2.875   6
## 4  21.4 110 3.215   6
## 5  18.7 175 3.440   8
## 6  18.1 105 3.460   6
## 10 19.2 123 3.440   6
## 11 17.8 123 3.440   6
## 12 16.4 180 4.070   8
## 13 17.3 180 3.730   8
## 14 15.2 180 3.780   8
## 22 15.5 150 3.520   8
## 23 15.2 150 3.435   8
## 25 19.2 175 3.845   8
## 28 30.4 113 1.513   4
## 29 15.8 264 3.170   8
## 30 19.7 175 2.770   6
## 32 21.4 109 2.780   4
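The same selection can be written with bracket indexing, which makes the row filter and the column choice explicit. This is a sketch under the assumption that the built-in mtcars matches the CSV data; keep_rows and df_sub are illustrative names.

```r
# Logical vector marking the rows that pass both conditions.
keep_rows <- mtcars$hp > 100 & mtcars$mpg > 15

# Rows go before the comma, columns after it.
df_sub <- mtcars[keep_rows, c("mpg", "hp", "wt", "cyl")]

nrow(df_sub)
head(df_sub)
```

One difference from the output above: the built-in mtcars keeps the car names as row names, whereas the CSV copy shows numeric row labels.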

Question 3: Create new column names for the new data frame.

library(dplyr)  # rename(new_name = old_name) is dplyr's syntax
df2 <- rename(df1, horse_power = hp, miles_per_gallon = mpg, weight = wt, cylinders = cyl)
df2
##    miles_per_gallon horse_power weight cylinders
## 1              21.0         110  2.620         6
## 2              21.0         110  2.875         6
## 4              21.4         110  3.215         6
## 5              18.7         175  3.440         8
## 6              18.1         105  3.460         6
## 10             19.2         123  3.440         6
## 11             17.8         123  3.440         6
## 12             16.4         180  4.070         8
## 13             17.3         180  3.730         8
## 14             15.2         180  3.780         8
## 22             15.5         150  3.520         8
## 23             15.2         150  3.435         8
## 25             19.2         175  3.845         8
## 28             30.4         113  1.513         4
## 29             15.8         264  3.170         8
## 30             19.7         175  2.770         6
## 32             21.4         109  2.780         4
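If dplyr is not available, the renaming can be sketched in base R with a named lookup vector. The names df_small and new_names below are hypothetical, and the built-in mtcars is assumed as a stand-in for the subset.

```r
# A small stand-in for the subset created in Question 2.
df_small <- head(mtcars[, c("mpg", "hp", "wt", "cyl")])

# Map each old column name to its new name.
new_names <- c(mpg = "miles_per_gallon", hp = "horse_power",
               wt = "weight", cyl = "cylinders")

# Look up the new name for every existing column, in column order.
names(df_small) <- unname(new_names[names(df_small)])
df_small
```

Looking names up by key rather than by position means the result does not depend on the column order.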

Question 4: Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(df2)
##  miles_per_gallon  horse_power        weight        cylinders    
##  Min.   :15.20    Min.   :105.0   Min.   :1.513   Min.   :4.000  
##  1st Qu.:16.40    1st Qu.:110.0   1st Qu.:2.875   1st Qu.:6.000  
##  Median :18.70    Median :150.0   Median :3.440   Median :6.000  
##  Mean   :19.02    Mean   :148.9   Mean   :3.241   Mean   :6.706  
##  3rd Qu.:21.00    3rd Qu.:175.0   3rd Qu.:3.520   3rd Qu.:8.000  
##  Max.   :30.40    Max.   :264.0   Max.   :4.070   Max.   :8.000
cat("The mean of mpg is", mean(df2$miles_per_gallon), "\n")
## The mean of mpg is 19.01765
cat("The mean of hp is", mean(df2$horse_power), "\n")
## The mean of hp is 148.9412
cat("The median of mpg is", median(df2$miles_per_gallon), "\n")
## The median of mpg is 18.7
cat("The median of hp is", median(df2$horse_power), "\n")
## The median of hp is 150
MPG_compared <- c('Original Mean','New Mean', 'Original median','New Median')
mpg <- c(mean(mtcarsx$mpg), mean(df2$miles_per_gallon), median(mtcarsx$mpg), median(df2$miles_per_gallon))
table_mpg <- data.frame(MPG_compared, mpg)
table_mpg
##      MPG_compared      mpg
## 1   Original Mean 20.09062
## 2        New Mean 19.01765
## 3 Original median 19.20000
## 4      New Median 18.70000
HP_compared <- c('Original Mean','New Mean', 'Original median','New Median')
hp <- c(mean(mtcarsx$hp), mean(df2$horse_power), median(mtcarsx$hp), median(df2$horse_power))
table_hp <- data.frame(HP_compared, hp)
table_hp
##       HP_compared       hp
## 1   Original Mean 146.6875
## 2        New Mean 148.9412
## 3 Original median 123.0000
## 4      New Median 150.0000
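To make the comparison easier to read, the two tables could be combined into one with explicit change columns. This is a rough sketch, assuming the built-in mtcars matches the CSV data; full, part and comparison are illustrative names.

```r
# Full data and the filtered subset from Question 2.
full <- mtcars
part <- subset(mtcars, hp > 100 & mpg > 15)

# One row per attribute, original and subset statistics side by side.
comparison <- data.frame(
  attribute       = c("mpg", "hp"),
  original_mean   = c(mean(full$mpg), mean(full$hp)),
  subset_mean     = c(mean(part$mpg), mean(part$hp)),
  original_median = c(median(full$mpg), median(full$hp)),
  subset_median   = c(median(part$mpg), median(part$hp))
)

# Positive values mean the statistic is higher in the subset.
comparison$mean_change   <- comparison$subset_mean - comparison$original_mean
comparison$median_change <- comparison$subset_median - comparison$original_median
comparison
```

In the tables above, filtering out low-horsepower and low-mpg cars lowers the mean mpg from about 20.09 to 19.02 and raises the median hp from 123 to 150.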

Question 5: For at least 3 values in a column, rename them so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

i <- 1
for (x in df2$cylinders){
  if (x == 4){
    df2$cylinders[i] <- "four cylinders"
  }else if (x == 6){
    df2$cylinders[i] <- "six cylinders"
  }else if (x == 8){
    df2$cylinders[i] <- "eight cylinders"
  }
  i <- i + 1
}
df2
##    miles_per_gallon horse_power weight       cylinders
## 1              21.0         110  2.620   six cylinders
## 2              21.0         110  2.875   six cylinders
## 4              21.4         110  3.215   six cylinders
## 5              18.7         175  3.440 eight cylinders
## 6              18.1         105  3.460   six cylinders
## 10             19.2         123  3.440   six cylinders
## 11             17.8         123  3.440   six cylinders
## 12             16.4         180  4.070 eight cylinders
## 13             17.3         180  3.730 eight cylinders
## 14             15.2         180  3.780 eight cylinders
## 22             15.5         150  3.520 eight cylinders
## 23             15.2         150  3.435 eight cylinders
## 25             19.2         175  3.845 eight cylinders
## 28             30.4         113  1.513  four cylinders
## 29             15.8         264  3.170 eight cylinders
## 30             19.7         175  2.770   six cylinders
## 32             21.4         109  2.780  four cylinders
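The loop above works, but the same recoding can be done without an explicit counter by indexing a named lookup vector. The sketch below uses a short made-up vector (cyl) rather than the real column, and the all-lowercase labels are an assumption.

```r
# A short made-up sample of cylinder counts.
cyl <- c(6, 6, 8, 4)

# Named lookup vector: the names are the old values, the elements the new labels.
labels <- c("4" = "four cylinders",
            "6" = "six cylinders",
            "8" = "eight cylinders")

# Index the lookup by the old values converted to character, then drop the names.
recoded <- unname(labels[as.character(cyl)])
recoded
```

Applied to the data frame, this would be a single assignment of the form df2$cylinders <- unname(labels[as.character(df2$cylinders)]), run before the column has been recoded.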

Question 6: Display enough rows to see examples of all of steps 1-5 above.
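One possible way to approach this, sketched in base R under the assumption that the built-in mtcars matches the CSV data, is to keep the result of each step in its own object and print the first few rows of each. The names cars, step2, step3 and step5 are illustrative only.

```r
# Step 1: the full data set.
cars <- mtcars

# Step 2: subset of rows and columns.
step2 <- subset(cars, hp > 100 & mpg > 15, c("mpg", "hp", "wt", "cyl"))

# Step 3: new column names.
step3 <- setNames(step2, c("miles_per_gallon", "horse_power", "weight", "cylinders"))

# Step 5: recode every value in the cylinders column.
words <- c("4" = "four", "6" = "six", "8" = "eight")
step5 <- step3
step5$cylinders <- paste(words[as.character(step3$cylinders)], "cylinders")

# Show a few rows from each stage, plus the summary from step 4.
head(cars, 3)
head(step2, 3)
head(step3, 3)
summary(step3)
head(step5, 3)
```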