The dataset “Auto.csv” was downloaded to the local machine to work on. The R code that creates a folder and downloads the .csv file into it is given in Point 7 of this report. Source courtesy: http://vincentarelbundock.github.io/Rdatasets/

autodata <- read.csv("Auto.csv")  # To read .csv file
head(autodata)

##   X mpg cylinders displacement horsepower weight acceleration year origin
## 1 1  18         8          307        130   3504         12.0   70      1
## 2 2  15         8          350        165   3693         11.5   70      1
## 3 3  18         8          318        150   3436         11.0   70      1
## 4 4  16         8          304        150   3433         12.0   70      1
## 5 5  17         8          302        140   3449         10.5   70      1
## 6 6  15         8          429        198   4341         10.0   70      1
##                        name
## 1 chevrolet chevelle malibu
## 2         buick skylark 320
## 3        plymouth satellite
## 4             amc rebel sst
## 5               ford torino
## 6          ford galaxie 500

1) Summary of the “Auto.csv” file, with the mean and median of the ‘displacement’ and ‘weight’ attributes. The same attributes are used throughout the report.

summary(autodata)

##        X               mpg          cylinders      displacement  
##  Min.   :  1.00   Min.   : 9.00   Min.   :3.000   Min.   : 68.0  
##  1st Qu.: 99.75   1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0  
##  Median :198.50   Median :22.75   Median :4.000   Median :151.0  
##  Mean   :198.52   Mean   :23.45   Mean   :5.472   Mean   :194.4  
##  3rd Qu.:296.25   3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8  
##  Max.   :397.00   Max.   :46.60   Max.   :8.000   Max.   :455.0  
##                                                                  
##    horsepower        weight      acceleration        year      
##  Min.   : 46.0   Min.   :1613   Min.   : 8.00   Min.   :70.00  
##  1st Qu.: 75.0   1st Qu.:2225   1st Qu.:13.78   1st Qu.:73.00  
##  Median : 93.5   Median :2804   Median :15.50   Median :76.00  
##  Mean   :104.5   Mean   :2978   Mean   :15.54   Mean   :75.98  
##  3rd Qu.:126.0   3rd Qu.:3615   3rd Qu.:17.02   3rd Qu.:79.00  
##  Max.   :230.0   Max.   :5140   Max.   :24.80   Max.   :82.00  
##                                                                
##      origin                      name    
##  Min.   :1.000   amc matador       :  5  
##  1st Qu.:1.000   ford pinto        :  5  
##  Median :1.000   toyota corolla    :  5  
##  Mean   :1.577   amc gremlin       :  4  
##  3rd Qu.:2.000   amc hornet        :  4  
##  Max.   :3.000   chevrolet chevette:  4  
##                  (Other)           :365
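The ‘name’ column is summarized above as counts per car model because, when this report was written, read.csv() converted text columns to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, and summary() would then report only Length, Class and Mode for ‘name’. A minimal sketch of reproducing the factor summary on a current R version is to request factors explicitly. The inline CSV text is an assumed stand-in for “Auto.csv”, built from three rows of the head() output only to keep the example self-contained.

```r
# Three rows copied from the head() output, standing in for "Auto.csv"
csv_text <- 'X,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
1,18,8,307,130,3504,12,70,1,chevrolet chevelle malibu
2,15,8,350,165,3693,11.5,70,1,buick skylark 320
3,18,8,318,150,3436,11,70,1,plymouth satellite'

# stringsAsFactors = TRUE restores the pre-R-4.0 behavior, so summary()
# reports a count per car name instead of Length/Class/Mode
autosample <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(autosample$name)
summary(autosample$name)
```

On the real file the equivalent call would be read.csv("Auto.csv", stringsAsFactors = TRUE).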

mean(autodata$displacement)

## [1] 194.412

median(autodata$displacement)

## [1] 151

mean(autodata$weight)

## [1] 2977.584

median(autodata$weight)

## [1] 2803.5
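The four separate mean() and median() calls can also be collapsed into one step. The sketch below uses sapply() over the two columns; the small data frame holds the six rows from the head() output so the example runs on its own, and on the real data you would pass autodata instead of the hypothetical autohead.

```r
# First six rows of "Auto.csv" (values from the head() output)
autohead <- data.frame(
  displacement = c(307, 350, 318, 304, 302, 429),
  weight       = c(3504, 3693, 3436, 3433, 3449, 4341)
)

# Apply mean and median to each selected column; the result is a 2 x 2
# matrix with the statistics as rows and the attributes as columns
stats <- sapply(autohead[c("displacement", "weight")],
                function(x) c(mean = mean(x), median = median(x)))
stats
```

For these six rows, displacement has mean 335 and median 312.5, and weight has median 3476.5.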

2) Create a new data frame with a subset of rows and columns. Rename it.

The subset “subauto1”, with 50 rows and the three columns ‘displacement’, ‘weight’ and ‘name’, is created with the code chunk below. The ‘tail’ command is used to check that the subset ends at row 50.

subauto <- subset(autodata,select = c("displacement","weight","name"))
subauto1 <- subauto[1:50,]
tail(subauto1)

##    displacement weight                       name
## 45          258   2962 amc hornet sportabout (sw)
## 46          140   2408        chevrolet vega (sw)
## 47          250   3282           pontiac firebird
## 48          250   3139               ford mustang
## 49          122   2220         mercury capri 2000
## 50          116   2123                  opel 1900

summary(subauto1)

##   displacement       weight                            name   
##  Min.   : 97.0   Min.   :1835   amc gremlin              : 2  
##  1st Qu.:154.5   1st Qu.:2599   chevrolet chevelle malibu: 2  
##  Median :280.0   Median :3381   chevrolet impala         : 2  
##  Mean   :268.8   Mean   :3366   datsun pl510             : 2  
##  3rd Qu.:357.8   3rd Qu.:4195   ford galaxie 500         : 2  
##  Max.   :455.0   Max.   :5140   plymouth fury iii        : 2  
##                                 (Other)                  :38
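The two-step subset above (columns with subset(), then rows with [1:50, ]) can also be written as a single indexing call, and nrow() reports the row count directly instead of reading it off tail(). A minimal sketch, using a hypothetical six-row stand-in for autodata and keeping the first three rows:

```r
# Six rows from the head() output, standing in for autodata
autohead <- data.frame(
  mpg          = c(18, 15, 18, 16, 17, 15),
  displacement = c(307, 350, 318, 304, 302, 429),
  weight       = c(3504, 3693, 3436, 3433, 3449, 4341),
  name         = c("chevrolet chevelle malibu", "buick skylark 320",
                   "plymouth satellite", "amc rebel sst",
                   "ford torino", "ford galaxie 500")
)

# Rows and columns selected in one call: rows 1 to 3, three named columns
firstrows <- autohead[1:3, c("displacement", "weight", "name")]

# nrow() gives the number of rows directly
nrow(firstrows)
names(firstrows)
```

On the real data the same pattern would be autodata[1:50, c("displacement", "weight", "name")].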

3) Create new column names for the new data frame.

The column names are replaced with shorter names: ‘displacement’ becomes ‘disp’, ‘weight’ becomes ‘wt’ and ‘name’ becomes ‘carmodel’.

head(subauto1)

##   displacement weight                      name
## 1          307   3504 chevrolet chevelle malibu
## 2          350   3693         buick skylark 320
## 3          318   3436        plymouth satellite
## 4          304   3433             amc rebel sst
## 5          302   3449               ford torino
## 6          429   4341          ford galaxie 500

subauto1 <- setNames(subauto1,c("disp","wt","carmodel"))
names(subauto1)

## [1] "disp"     "wt"       "carmodel"

tail(subauto1)

##    disp   wt                   carmodel
## 45  258 2962 amc hornet sportabout (sw)
## 46  140 2408        chevrolet vega (sw)
## 47  250 3282           pontiac firebird
## 48  250 3139               ford mustang
## 49  122 2220         mercury capri 2000
## 50  116 2123                  opel 1900
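setNames() renames all columns by position. A sketch of two base R alternatives: assigning to names() directly has the same effect, while matching on the current name renames one column without depending on the column order. The data frame subsample is a hypothetical three-row stand-in for subauto1.

```r
# Hypothetical three-row stand-in for subauto1 before renaming
subsample <- data.frame(
  displacement = c(307, 350, 318),
  weight       = c(3504, 3693, 3436),
  name         = c("chevrolet chevelle malibu", "buick skylark 320",
                   "plymouth satellite")
)

# Rename every column at once (same result as setNames())
renamed_all <- subsample
names(renamed_all) <- c("disp", "wt", "carmodel")

# Rename a single column by its current name; other columns keep their names
renamed_one <- subsample
names(renamed_one)[names(renamed_one) == "weight"] <- "wt"
names(renamed_one)
```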

4) Use the summary function and print the mean and median for the same attributes. Please compare.

The summary is displayed in the code chunk for Point 2. The mean and median of the ‘disp’ and ‘wt’ attributes are computed below.

mean(subauto1$disp)

## [1] 268.76

median(subauto1$disp)

## [1] 280

mean(subauto1$wt)

## [1] 3366.46

median(subauto1$wt)

## [1] 3381

COMPARISON

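Because “subauto1” keeps only the first 50 of the 392 rows of “autodata”, its mean and median differ from those of the full data set. Both statistics are higher in the subset: displacement has mean 268.76 and median 280 versus 194.412 and 151, and weight has mean 3366.46 and median 3381 versus 2977.584 and 2803.5. In the full data the median is below the mean for both attributes, which suggests right skew, while in the subset the median is slightly above the mean. The values would be identical only if the subset kept all rows of the original data set.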
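The comparison can be sketched as a small function that computes the mean and median of a column for the full data and for its first n rows, the way subauto1 was built from autodata. The function name compare_first_n is hypothetical, and the input vector is the displacement of the six cars in the head() output. Keeping all rows reproduces the full-data statistics exactly.

```r
# Mean and median of a numeric vector, for all values and for the first n
compare_first_n <- function(x, n) {
  sub <- x[seq_len(n)]
  data.frame(
    statistic = c("mean", "median"),
    full      = c(mean(x), median(x)),
    subset    = c(mean(sub), median(sub))
  )
}

# Displacement of the first six cars from the head() output
displacement <- c(307, 350, 318, 304, 302, 429)

compare_first_n(displacement, 3)   # subset statistics differ from the full data
compare_first_n(displacement, 6)   # all rows kept: statistics are identical
```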

5) For at least 3 values in a column, rename them so that every value in that column is renamed.

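To replace the letter “c” with “modelc” in every carmodel value, the ‘gsub’ command is used. It belongs to the ‘grep’ family of functions for pattern matching and replacement. Unlike ‘sub’, which replaces only the first match, ‘gsub’ replaces every occurrence of ‘c’ in each value of the carmodel column. Values that contain no ‘c’, such as “plymouth satellite” and “ford torino”, are left unchanged.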

subauto1$carmodel <- gsub("c","modelc",subauto1$carmodel)
head(subauto1)

##   disp   wt                            carmodel
## 1  307 3504 modelchevrolet modelchevelle malibu
## 2  350 3693              buimodelck skylark 320
## 3  318 3436                  plymouth satellite
## 4  304 3433                  ammodelc rebel sst
## 5  302 3449                         ford torino
## 6  429 4341                    ford galaxie 500
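The difference between sub() and gsub(), and the fact that names without a ‘c’ are not renamed, can be seen on three of the car names. The fixed = TRUE argument treats "c" as a literal string rather than a regular expression. The "model_" prefix in the last line is a hypothetical renaming scheme, shown only as one way to guarantee that every value changes.

```r
carmodel <- c("chevrolet chevelle malibu", "buick skylark 320",
              "plymouth satellite")

# gsub() replaces every match in each string; sub() replaces only the first
gsub("c", "modelc", carmodel, fixed = TRUE)
sub("c", "modelc", carmodel, fixed = TRUE)

# A name with no "c" comes back unchanged from both functions;
# prefixing each name changes every value
paste0("model_", carmodel)
```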

6) Display enough rows to see examples of all steps 1-5 above.

Each step displays six rows of the data: head() shows rows 1 to 6, and tail() shows the last six rows (45 to 50) of the subset.

7) BONUS: place the original .csv in a GitHub file and have R read it from the link.

Two ways of reading the file with R are demonstrated below.

The code below reads the file directly from the URL, without downloading it to the machine.

fileUrl <- "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/ISLR/Auto.csv"
# read.csv() accepts a URL directly; 'autoweb' avoids masking base R's t()
autoweb <- read.csv(fileUrl, header = TRUE)
head(autoweb)

##   X mpg cylinders displacement horsepower weight acceleration year origin
## 1 1  18         8          307        130   3504         12.0   70      1
## 2 2  15         8          350        165   3693         11.5   70      1
## 3 3  18         8          318        150   3436         11.0   70      1
## 4 4  16         8          304        150   3433         12.0   70      1
## 5 5  17         8          302        140   3449         10.5   70      1
## 6 6  15         8          429        198   4341         10.0   70      1
##                        name
## 1 chevrolet chevelle malibu
## 2         buick skylark 320
## 3        plymouth satellite
## 4             amc rebel sst
## 5               ford torino
## 6          ford galaxie 500

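To download the file and create the folder on your machine instead, use the code below. The argument method = "curl" makes download.file() use the external curl command-line tool; R also offers a separate “curl” package for downloading files. The BONUS part of the assignment was very enjoyable to execute and a good way to learn about new packages.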

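# Create the target folder only if it does not already exist
if (!file.exists("./Rbridge")) {dir.create("./Rbridge")}
fileUrl <- "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/ISLR/Auto.csv"
# method = "curl" requires the curl command-line tool to be installed
download.file(fileUrl, destfile = "./Rbridge/Auto.csv", method = "curl")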
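The two approaches above can be combined so the file is downloaded only once and read from disk afterwards. The helper read_cached_csv below is hypothetical, not part of base R. It leaves the download method at R's default instead of assuming the curl tool is installed. The final lines demonstrate it offline: because the demo file already exists, the placeholder URL is never contacted.

```r
# Read a CSV from a local copy if it exists; otherwise download it first
read_cached_csv <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Create the folder (and any parents) quietly if needed
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    # Default method lets R choose the download backend; mode = "wb" keeps bytes intact
    download.file(url, destfile = destfile, mode = "wb")
  }
  read.csv(destfile, header = TRUE)
}

# Offline demonstration: an existing file is read without downloading
demo_file <- file.path(tempdir(), "Auto_demo.csv")
write.csv(data.frame(displacement = c(307, 350), weight = c(3504, 3693)),
          demo_file, row.names = FALSE)
cached <- read_cached_csv("https://example.invalid/Auto.csv", demo_file)
```

With the real data, the call would be read_cached_csv(fileUrl, "./Rbridge/Auto.csv").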

CUNY/week2/assignment2

jagruti

8/1/2017