[Dataset](https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv)

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

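# stringsAsFactors = TRUE keeps the model names in X as a factor, matching the output below (R >= 4.0 defaults to FALSE)
mt.cars <- read.csv("mtcars.csv", stringsAsFactors = TRUE)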
print(summary(mt.cars))
##                   X           mpg             cyl             disp      
##  AMC Javelin       : 1   Min.   :10.40   Min.   :4.000   Min.   : 71.1  
##  Cadillac Fleetwood: 1   1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8  
##  Camaro Z28        : 1   Median :19.20   Median :6.000   Median :196.3  
##  Chrysler Imperial : 1   Mean   :20.09   Mean   :6.188   Mean   :230.7  
##  Datsun 710        : 1   3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0  
##  Dodge Challenger  : 1   Max.   :33.90   Max.   :8.000   Max.   :472.0  
##  (Other)           :26                                                  
##        hp             drat             wt             qsec      
##  Min.   : 52.0   Min.   :2.760   Min.   :1.513   Min.   :14.50  
##  1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89  
##  Median :123.0   Median :3.695   Median :3.325   Median :17.71  
##  Mean   :146.7   Mean   :3.597   Mean   :3.217   Mean   :17.85  
##  3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90  
##  Max.   :335.0   Max.   :4.930   Max.   :5.424   Max.   :22.90  
##                                                                 
##        vs               am              gear            carb      
##  Min.   :0.0000   Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4375   Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :5.000   Max.   :8.000  
## 
meanMpg <- mean(mt.cars$mpg, na.rm = TRUE)
print(meanMpg)
## [1] 20.09062
medianMpg <- median(mt.cars$mpg, na.rm = TRUE)
print(medianMpg)
## [1] 19.2
meanDisp <- mean(mt.cars$disp, na.rm = TRUE)
print(meanDisp)
## [1] 230.7219
medianDisp <- median(mt.cars$disp, na.rm = TRUE)
print(medianDisp)
## [1] 196.3
meanHp <- mean(mt.cars$hp, na.rm = TRUE)
print(meanHp)
## [1] 146.6875
medianHp <- median(mt.cars$hp, na.rm = TRUE)
print(medianHp)
## [1] 123
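Computing each statistic with its own `mean()`/`median()` call gets repetitive as more attributes are added. As a sketch, `sapply()` can produce both statistics for several columns at once. It uses R's built-in `mtcars` data frame (same values as the CSV) so it runs without the downloaded file; the variable names are illustrative.

```r
# Columns to summarise (the same three attributes used above)
cols <- c("mpg", "disp", "hp")

# sapply() applies the function to each selected column and returns a
# matrix with one row per statistic and one column per attribute.
stats_by_col <- sapply(mtcars[cols], function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
})
print(stats_by_col)
```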

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

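# install.packages("stringr")  # run once if stringr is not installed
library(stringr)
# Keep the rows whose model name (column X) contains "Merc", plus five columns
mercCars <- mt.cars[str_detect(mt.cars$X, "Merc"), c("X", "vs", "am", "gear", "carb")]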
print(mercCars)
##              X vs am gear carb
## 8    Merc 240D  1  0    4    2
## 9     Merc 230  1  0    4    2
## 10    Merc 280  1  0    4    4
## 11   Merc 280C  1  0    4    4
## 12  Merc 450SE  0  0    3    3
## 13  Merc 450SL  0  0    3    3
## 14 Merc 450SLC  0  0    3    3
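The same subset can also be taken in base R without loading stringr. A sketch, assuming the built-in `mtcars` data frame, which stores the model names as row names rather than in an `X` column, so they are copied into one first; `cars_df`, `is_merc` and `merc_subset` are illustrative names.

```r
# Copy the row names into an X column; row.names = NULL gives plain 1..32
# row numbers, as read.csv() does for the CSV version.
cars_df <- data.frame(X = rownames(mtcars), mtcars, row.names = NULL)

# grepl() returns TRUE for every model name containing "Merc";
# fixed = TRUE treats the pattern as a literal string, not a regex.
is_merc <- grepl("Merc", cars_df$X, fixed = TRUE)
merc_subset <- cars_df[is_merc, c("X", "vs", "am", "gear", "carb")]
print(merc_subset)
```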

3. Create new column names for the new data frame.

# Names starting with "_" are not syntactic R names, so they must be quoted when used with $
colnames(mercCars) <- c("model", "_vs", "_am", "_gear", "_carb")
print(mercCars)
##          model _vs _am _gear _carb
## 8    Merc 240D   1   0     4     2
## 9     Merc 230   1   0     4     2
## 10    Merc 280   1   0     4     4
## 11   Merc 280C   1   0     4     4
## 12  Merc 450SE   0   0     3     3
## 13  Merc 450SL   0   0     3     3
## 14 Merc 450SLC   0   0     3     3
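Assigning a full vector to `colnames()` depends on column order. A sketch of renaming a single column by its current name instead, plus a check with `make.names()` showing that a leading underscore is not a valid R name. It uses built-in `mtcars`; the name "carburetors" is only an illustration.

```r
sub_df <- mtcars[, c("vs", "am", "gear", "carb")]

# Rename one column by name, so the code still works if the order changes.
names(sub_df)[names(sub_df) == "carb"] <- "carburetors"

# make.names() shows how R would repair a name: "X" is prepended to "_vs".
print(make.names(c("_vs", "_am")))

# Non-syntactic names still work, but need backticks (or quotes) after $.
names(sub_df)[names(sub_df) == "vs"] <- "_vs"
print(mean(sub_df$`_vs`))
```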

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(mercCars)
##         model        _vs              _am        _gear           _carb    
##  Merc 230  :1   Min.   :0.0000   Min.   :0   Min.   :3.000   Min.   :2.0  
##  Merc 240D :1   1st Qu.:0.0000   1st Qu.:0   1st Qu.:3.000   1st Qu.:2.5  
##  Merc 280  :1   Median :1.0000   Median :0   Median :4.000   Median :3.0  
##  Merc 280C :1   Mean   :0.5714   Mean   :0   Mean   :3.571   Mean   :3.0  
##  Merc 450SE:1   3rd Qu.:1.0000   3rd Qu.:0   3rd Qu.:4.000   3rd Qu.:3.5  
##  Merc 450SL:1   Max.   :1.0000   Max.   :0   Max.   :4.000   Max.   :4.0  
##  (Other)   :1
meanGear <- mean(mercCars$`_gear`, na.rm = TRUE)
print(meanGear)
## [1] 3.571429
medianGear <- median(mercCars$`_gear`, na.rm = TRUE)
print(medianGear)
## [1] 4
meanCarb <- mean(mercCars$`_carb`, na.rm = TRUE)
print(meanCarb)
## [1] 3
medianCarb <- median(mercCars$`_carb`, na.rm = TRUE)
print(medianCarb)
## [1] 3
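The Mercedes subset does not contain mpg, disp or hp, so gear and carb are the attributes that can be compared with the full data set. A sketch that puts both data sets side by side, using built-in `mtcars`; `centre_of()` is a hypothetical helper written for this example.

```r
# Mercedes rows of the built-in data (model names are row names here)
merc <- mtcars[grepl("Merc", rownames(mtcars), fixed = TRUE), ]

# Hypothetical helper: mean and median of one column of a data frame
centre_of <- function(data, col) {
  c(mean = mean(data[[col]]), median = median(data[[col]]))
}

comparison <- rbind(
  all_gear  = centre_of(mtcars, "gear"),
  merc_gear = centre_of(merc,   "gear"),
  all_carb  = centre_of(mtcars, "carb"),
  merc_carb = centre_of(merc,   "carb")
)
print(round(comparison, 3))
```

Read against the printed values above: the Mercedes cars have a slightly lower mean gear count than all 32 cars (3.571 vs 3.688) with the same median of 4, and more carburetors on average (mean and median of 3 vs 2.812 and 2).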

5. Rename at least 3 distinct values in a column so that every occurrence of each value is renamed. For example, suppose a column contains the letter “e” 20 times. Rename those values so that all 20 show as “excellent”.

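# Anchor each pattern with ^ and $ so only whole values are replaced, never a digit inside a longer value
mercCars$`_carb` <- as.character(mercCars$`_carb`)
mercCars$`_carb` <- gsub("^2$", "20", mercCars$`_carb`)
mercCars$`_carb` <- gsub("^3$", "30", mercCars$`_carb`)
mercCars$`_carb` <- gsub("^4$", "40", mercCars$`_carb`)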
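`gsub()` works on substrings, so a lookup table keyed by the exact value is a common alternative when every occurrence of a value should get a new label. A sketch using built-in `mtcars`; the labels "two", "three" and "four" are made-up examples in the spirit of the “e” to “excellent” example, not values the assignment asks for.

```r
# Named lookup vector: names are the old values, elements the new labels
carb_labels <- c("2" = "two", "3" = "three", "4" = "four")

merc <- mtcars[grepl("Merc", rownames(mtcars), fixed = TRUE), ]

# Indexing the lookup by each value (as character) maps whole values only;
# unname() drops the names that indexing carries over.
merc$carb_label <- unname(carb_labels[as.character(merc$carb)])
print(merc[, c("carb", "carb_label")])
```

Any value missing from the lookup comes back as `NA`, which makes unmapped values easy to spot.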

6. Display enough rows to see examples of all of steps 1-5 above.

print(mercCars)
##          model _vs _am _gear _carb
## 8    Merc 240D   1   0     4    20
## 9     Merc 230   1   0     4    20
## 10    Merc 280   1   0     4    40
## 11   Merc 280C   1   0     4    40
## 12  Merc 450SE   0   0     3    30
## 13  Merc 450SL   0   0     3    30
## 14 Merc 450SLC   0   0     3    30

7. BONUS – place the original .csv in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.

#install.packages("RCurl")
library(RCurl)
# getURL() downloads the file and returns its contents as one character string
csvText <- getURL("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv")
mtCars <- read.csv(text = csvText)
print(head(mtCars))
##                   X  mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
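`read.csv(text = ...)` parses a character string as if it were the contents of a file, which is why it pairs with the string that `getURL()` returns. A sketch without network access: a short string (the header and first two rows shown above, trimmed to four columns) stands in for the downloaded text.

```r
# Stand-in for the text getURL() would return (trimmed to four columns)
csv_text <- "X,mpg,cyl,disp\nMazda RX4,21.0,6,160\nMazda RX4 Wag,21.0,6,160\n"

# text = tells read.csv() to parse the string instead of opening a file
parsed <- read.csv(text = csv_text)
print(parsed)
str(parsed)
```

In current versions of R, `read.csv()` can usually also open an https URL directly, e.g. `read.csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv")`, which avoids the RCurl dependency; it is worth checking on your own setup.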