R Bridge Week 02 Assignment

[Dataset](https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv)

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

mt.cars <- read.csv("mtcars.csv")
print(summary(mt.cars))

##                   X           mpg             cyl             disp      
##  AMC Javelin       : 1   Min.   :10.40   Min.   :4.000   Min.   : 71.1  
##  Cadillac Fleetwood: 1   1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8  
##  Camaro Z28        : 1   Median :19.20   Median :6.000   Median :196.3  
##  Chrysler Imperial : 1   Mean   :20.09   Mean   :6.188   Mean   :230.7  
##  Datsun 710        : 1   3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0  
##  Dodge Challenger  : 1   Max.   :33.90   Max.   :8.000   Max.   :472.0  
##  (Other)           :26                                                  
##        hp             drat             wt             qsec      
##  Min.   : 52.0   Min.   :2.760   Min.   :1.513   Min.   :14.50  
##  1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89  
##  Median :123.0   Median :3.695   Median :3.325   Median :17.71  
##  Mean   :146.7   Mean   :3.597   Mean   :3.217   Mean   :17.85  
##  3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90  
##  Max.   :335.0   Max.   :4.930   Max.   :5.424   Max.   :22.90  
##                                                                 
##        vs               am              gear            carb      
##  Min.   :0.0000   Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4375   Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :5.000   Max.   :8.000  
##

meanMpg <- mean(mt.cars$mpg, na.rm = TRUE)
print(meanMpg)

## [1] 20.09062

medianMpg <- median(mt.cars$mpg, na.rm = TRUE)
print(medianMpg)

## [1] 19.2

meanDisp <- mean(mt.cars$disp, na.rm = TRUE)
print(meanDisp)

## [1] 230.7219

medianDisp <- median(mt.cars$disp, na.rm = TRUE)
print(medianDisp)

## [1] 196.3

meanHp <- mean(mt.cars$hp, na.rm = TRUE)
print(meanHp)

## [1] 146.6875

medianHp <- median(mt.cars$hp, na.rm = TRUE)
print(medianHp)

## [1] 123
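
The six separate calls above can also be collapsed into one pass over the chosen columns. The following is a minimal sketch, assuming R's built-in `mtcars` data frame holds the same values as the CSV; the names `cols` and `stats` are illustrative and not part of the assignment.

```r
# Assumption: the built-in `mtcars` data frame stands in for the CSV loaded as mt.cars.
cols <- c("mpg", "disp", "hp")

# For each selected column, compute the mean and the median in one call.
stats <- sapply(mtcars[cols], function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
})

print(round(stats, 4))
```

The result is a matrix with one column per attribute and the rows `mean` and `median`, so `stats["median", "mpg"]` retrieves a single value.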

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

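# install.packages("stringr")  # run once if stringr is not installed
library(stringr)  # unlike require(), library() stops with an error if the package is missing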
mercCars <- mt.cars[str_detect(mt.cars$X, "Merc"), c("X", "vs", "am", "gear", "carb")]
print(mercCars)

##              X vs am gear carb
## 8    Merc 240D  1  0    4    2
## 9     Merc 230  1  0    4    2
## 10    Merc 280  1  0    4    4
## 11   Merc 280C  1  0    4    4
## 12  Merc 450SE  0  0    3    3
## 13  Merc 450SL  0  0    3    3
## 14 Merc 450SLC  0  0    3    3
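
The same row filter can be written without an extra package. Below is a hedged sketch using base R's `grepl`, assuming the built-in `mtcars` as a stand-in for the CSV; the names `cars`, `isMerc`, and `mercBase` are illustrative.

```r
# Assumption: rebuild the CSV layout from the built-in `mtcars`,
# moving the row names (car models) into a column named X.
cars <- cbind(X = rownames(mtcars), mtcars)
rownames(cars) <- NULL

# grepl() returns TRUE for every model name that starts with "Merc".
isMerc <- grepl("^Merc", cars$X)

# Keep the matching rows and the same five columns as above.
mercBase <- cars[isMerc, c("X", "vs", "am", "gear", "carb")]
print(mercBase)
```

Anchoring the pattern with `^` restricts the match to names that begin with "Merc", which is slightly stricter than matching the text anywhere in the name.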

3. Create new column names for the new data frame.

colnames(mercCars) <- c("model", "_vs", "_am", "_gear", "_carb")
print(mercCars)

##          model _vs _am _gear _carb
## 8    Merc 240D   1   0     4     2
## 9     Merc 230   1   0     4     2
## 10    Merc 280   1   0     4     4
## 11   Merc 280C   1   0     4     4
## 12  Merc 450SE   0   0     3     3
## 13  Merc 450SL   0   0     3     3
## 14 Merc 450SLC   0   0     3     3
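
Names that begin with an underscore are not syntactic in R, which is why the later steps have to quote them after `$`. The sketch below, on a small hypothetical data frame `df`, shows how to rename a single column by its current name and two ways to read a column with a non-syntactic name.

```r
# Hypothetical two-row data frame, used only to illustrate the renaming.
df <- data.frame(X = c("Merc 230", "Merc 280"), gear = c(4, 4))

# Rename a single column by matching its current name.
names(df)[names(df) == "X"] <- "model"

# A name starting with an underscore must be quoted when used with $.
names(df)[names(df) == "gear"] <- "_gear"
print(df$`_gear`)     # backticks around the name
print(df[["_gear"]])  # or [[ ]] with the name as a string
```

Renaming by match rather than by position keeps the code correct if the column order changes.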

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(mercCars)

##         model        _vs              _am        _gear           _carb    
##  Merc 230  :1   Min.   :0.0000   Min.   :0   Min.   :3.000   Min.   :2.0  
##  Merc 240D :1   1st Qu.:0.0000   1st Qu.:0   1st Qu.:3.000   1st Qu.:2.5  
##  Merc 280  :1   Median :1.0000   Median :0   Median :4.000   Median :3.0  
##  Merc 280C :1   Mean   :0.5714   Mean   :0   Mean   :3.571   Mean   :3.0  
##  Merc 450SE:1   3rd Qu.:1.0000   3rd Qu.:0   3rd Qu.:4.000   3rd Qu.:3.5  
##  Merc 450SL:1   Max.   :1.0000   Max.   :0   Max.   :4.000   Max.   :4.0  
##  (Other)   :1

meanGear <- mean(mercCars$'_gear', na.rm = TRUE)
print(meanGear)

## [1] 3.571429

medianGear <- median(mercCars$'_gear', na.rm = TRUE)
print(medianGear)

## [1] 4

meanCarb <- mean(mercCars$'_carb', na.rm = TRUE)
print(meanCarb)

## [1] 3

medianCarb <- median(mercCars$'_carb', na.rm = TRUE)
print(medianCarb)

## [1] 3
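
To make the requested comparison explicit, the statistics for the full data set and for the Mercedes subset can be placed side by side. This is a sketch under the assumption that the built-in `mtcars` matches the CSV; `centre` and `comparison` are illustrative names.

```r
# Assumption: the built-in `mtcars` stands in for the CSV; its row names hold the model names.
isMerc <- grepl("^Merc", rownames(mtcars))

# Mean and median of one numeric vector, returned as a named vector.
centre <- function(x) c(mean = mean(x), median = median(x))

# One row per attribute and group: all cars versus the Mercedes subset.
comparison <- rbind(
  all_gear  = centre(mtcars$gear),
  merc_gear = centre(mtcars$gear[isMerc]),
  all_carb  = centre(mtcars$carb),
  merc_carb = centre(mtcars$carb[isMerc])
)
print(round(comparison, 3))
```

Using the values reported in the two summaries above, the Mercedes subset has a slightly lower mean gear count (3.571 vs. 3.688) with the same median (4), and a higher mean and median carburetor count (3 and 3 vs. 2.812 and 2).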

5. For at least 3 values in a column, rename each one so that every occurrence of it in that column is renamed. For example, if a column holds 20 values of the letter “e”, rename them so that all 20 show as “excellent”.

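# Recode each carburetor count; the column becomes character after the first call.
# The anchors (^...$) match whole values only, so a later pattern cannot alter an earlier replacement.
mercCars$'_carb' <- gsub("^2$", "20", as.character(mercCars$'_carb'))
mercCars$'_carb' <- gsub("^3$", "30", mercCars$'_carb')
mercCars$'_carb' <- gsub("^4$", "40", mercCars$'_carb')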
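
Chained substitutions depend on the order of the calls. An alternative is a lookup table that maps every old value to its replacement in a single step. The sketch below uses a hypothetical vector `carb` that mirrors the values in the Mercedes subset; `lookup` and `recoded` are illustrative names.

```r
# Hypothetical input mirroring the carb values of the Mercedes subset.
carb <- c(2, 2, 4, 4, 3, 3, 3)

# Named vector: the names are the old values, the elements are the replacements.
lookup <- c("2" = "20", "3" = "30", "4" = "40")

# Index the lookup by the old value (as character); unname() drops the leftover names.
recoded <- unname(lookup[as.character(carb)])
print(recoded)
```

A value without an entry in `lookup` comes back as `NA`, which makes unmapped values easy to spot.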

6. Display enough rows to see examples of all of steps 1-5 above.

print(mercCars)

##          model _vs _am _gear _carb
## 8    Merc 240D   1   0     4    20
## 9     Merc 230   1   0     4    20
## 10    Merc 280   1   0     4    40
## 11   Merc 280C   1   0     4    40
## 12  Merc 450SE   0   0     3    30
## 13  Merc 450SL   0   0     3    30
## 14 Merc 450SLC   0   0     3    30

7. BONUS – place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

#install.packages("RCurl")
library(RCurl)
gitHubUrl <- getURL("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv")
mtCars <- read.csv(text = gitHubUrl)
print(head(mtCars))

##                   X  mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
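
`read.csv` also accepts a URL in place of a file path, so the download step can be done without RCurl. The sketch below assumes network access and an R build with HTTPS support; `csvUrl` and `carsFromUrl` are illustrative names, and the `tryCatch` wrapper is added only so the script keeps running when the download fails.

```r
csvUrl <- "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv"

# read.csv() opens the URL directly; on a connection error the handler returns NULL.
carsFromUrl <- tryCatch(read.csv(csvUrl), error = function(e) NULL)

if (is.null(carsFromUrl)) {
  message("Download failed; check the network connection or the URL.")
} else {
  print(head(carsFromUrl))
}
```

The name of the first column depends on the header in the hosted file, so it is worth checking `names(carsFromUrl)` before selecting columns by name.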

Binish Kurian Chandy

Jan 07, 2018
