The file is downloaded to the local machine to work on the dataset “Auto.csv”. The R code that creates the folder and downloads the .csv file is given in Point 7 of the report. Source courtesy: http://vincentarelbundock.github.io/Rdatasets/
autodata <- read.csv("Auto.csv") # To read .csv file
head(autodata)
## X mpg cylinders displacement horsepower weight acceleration year origin
## 1 1 18 8 307 130 3504 12.0 70 1
## 2 2 15 8 350 165 3693 11.5 70 1
## 3 3 18 8 318 150 3436 11.0 70 1
## 4 4 16 8 304 150 3433 12.0 70 1
## 5 5 17 8 302 140 3449 10.5 70 1
## 6 6 15 8 429 198 4341 10.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
## 5 ford torino
## 6 ford galaxie 500
summary(autodata)
## X mpg cylinders displacement
## Min. : 1.00 Min. : 9.00 Min. :3.000 Min. : 68.0
## 1st Qu.: 99.75 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0
## Median :198.50 Median :22.75 Median :4.000 Median :151.0
## Mean :198.52 Mean :23.45 Mean :5.472 Mean :194.4
## 3rd Qu.:296.25 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8
## Max. :397.00 Max. :46.60 Max. :8.000 Max. :455.0
##
## horsepower weight acceleration year
## Min. : 46.0 Min. :1613 Min. : 8.00 Min. :70.00
## 1st Qu.: 75.0 1st Qu.:2225 1st Qu.:13.78 1st Qu.:73.00
## Median : 93.5 Median :2804 Median :15.50 Median :76.00
## Mean :104.5 Mean :2978 Mean :15.54 Mean :75.98
## 3rd Qu.:126.0 3rd Qu.:3615 3rd Qu.:17.02 3rd Qu.:79.00
## Max. :230.0 Max. :5140 Max. :24.80 Max. :82.00
##
## origin name
## Min. :1.000 amc matador : 5
## 1st Qu.:1.000 ford pinto : 5
## Median :1.000 toyota corolla : 5
## Mean :1.577 amc gremlin : 4
## 3rd Qu.:2.000 amc hornet : 4
## Max. :3.000 chevrolet chevette: 4
## (Other) :365
mean(autodata$displacement)
## [1] 194.412
median(autodata$displacement)
## [1] 151
mean(autodata$weight)
## [1] 2977.584
median(autodata$weight)
## [1] 2803.5
The subset “subauto1”, with 50 rows and the three columns ‘displacement’, ‘weight’ and ‘name’, is created with the code chunk below. The ‘tail’ command is used to check the number of rows: the last row label printed is 50.
subauto <- subset(autodata,select = c("displacement","weight","name"))
subauto1 <- subauto[1:50,]
tail(subauto1)
## displacement weight name
## 45 258 2962 amc hornet sportabout (sw)
## 46 140 2408 chevrolet vega (sw)
## 47 250 3282 pontiac firebird
## 48 250 3139 ford mustang
## 49 122 2220 mercury capri 2000
## 50 116 2123 opel 1900
summary(subauto1)
## displacement weight name
## Min. : 97.0 Min. :1835 amc gremlin : 2
## 1st Qu.:154.5 1st Qu.:2599 chevrolet chevelle malibu: 2
## Median :280.0 Median :3381 chevrolet impala : 2
## Mean :268.8 Mean :3366 datsun pl510 : 2
## 3rd Qu.:357.8 3rd Qu.:4195 ford galaxie 500 : 2
## Max. :455.0 Max. :5140 plymouth fury iii : 2
## (Other) :38
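The row count can also be checked directly rather than read off the last row label printed by ‘tail’. A minimal sketch, using R's built-in mtcars data frame as a stand-in for “autodata” (the name subcars is hypothetical and not part of the report):

```r
# Stand-in for autodata: mtcars ships with base R.
# Select two columns by name and the first 10 rows in one indexing step.
subcars <- mtcars[1:10, c("disp", "wt")]

# nrow(), ncol() and dim() report the size of the subset directly.
nrow(subcars)  # 10
ncol(subcars)  # 2
dim(subcars)   # 10 2
```

Indexing with [rows, columns] gives the same result as calling subset(..., select = ...) and then taking the first rows, as done above in two steps.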
The column names are replaced with abbreviations using ‘setNames’: ‘displacement’ becomes ‘disp’, ‘weight’ becomes ‘wt’ and ‘name’ becomes ‘carmodel’.
head(subauto1)
## displacement weight name
## 1 307 3504 chevrolet chevelle malibu
## 2 350 3693 buick skylark 320
## 3 318 3436 plymouth satellite
## 4 304 3433 amc rebel sst
## 5 302 3449 ford torino
## 6 429 4341 ford galaxie 500
subauto1 <- setNames(subauto1,c("disp","wt","carmodel"))
names(subauto1)
## [1] "disp" "wt" "carmodel"
tail(subauto1)
## disp wt carmodel
## 45 258 2962 amc hornet sportabout (sw)
## 46 140 2408 chevrolet vega (sw)
## 47 250 3282 pontiac firebird
## 48 250 3139 ford mustang
## 49 122 2220 mercury capri 2000
## 50 116 2123 opel 1900
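An equivalent rename can be done by assigning to names(), which also allows changing a single column without retyping the others. A sketch on a small made-up data frame (the data frame df and the new name wgt are illustrative only):

```r
# Two rows copied from the head() output above, for illustration.
df <- data.frame(displacement = c(307, 350), weight = c(3504, 3693))

# Rename every column at once; same effect as setNames(df, c("disp", "wt")).
names(df) <- c("disp", "wt")

# Rename a single column by looking up its current name.
names(df)[names(df) == "wt"] <- "wgt"

names(df)  # "disp" "wgt"
```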
The summary is displayed in the code chunk of Point 2. The mean and median of the ‘disp’ and ‘wt’ attributes are computed below.
mean(subauto1$disp)
## [1] 268.76
median(subauto1$disp)
## [1] 280
mean(subauto1$wt)
## [1] 3366.46
median(subauto1$wt)
## [1] 3381
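Because “subauto1” contains only the first 50 rows of “autodata”, its ‘mean’ and ‘median’ differ from those of the full data set. For displacement the mean is 268.76 versus 194.412 and the median is 280 versus 151; for weight the mean is 3366.46 versus 2977.584 and the median is 3381 versus 2803.5. The values would have been the same if the subset had kept all rows of the original data set.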
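The effect can be reproduced on a tiny made-up vector: when the first few values are not representative of the whole, the summary statistics of the subset move away from those of the full data. The numbers below are illustrative and are not taken from Auto.csv.

```r
# Illustrative values only: the first three happen to be the largest.
full   <- c(10, 12, 14, 2, 4, 6, 3, 5)
first3 <- full[1:3]

mean(full)      # 7
median(full)    # 5.5
mean(first3)    # 12
median(first3)  # 12
```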
To replace the letter “c” with “modelc” in every carmodel field, the “gsub” command is used. The command belongs to the “grep” family of functions, which are used for pattern matching and replacement. ‘gsub’ replaces every occurrence of ‘c’ in the carmodel column with ‘modelc’.
subauto1$carmodel <- gsub("c","modelc",subauto1$carmodel)
head(subauto1)
## disp wt carmodel
## 1 307 3504 modelchevrolet modelchevelle malibu
## 2 350 3693 buimodelck skylark 320
## 3 318 3436 plymouth satellite
## 4 304 3433 ammodelc rebel sst
## 5 302 3449 ford torino
## 6 429 4341 ford galaxie 500
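Because the pattern "c" matches anywhere in the string, letters inside words are replaced too, as in “buimodelck” above. If only the first match, or only a “c” at the start of a word, were meant to change, ‘sub’ or a word-boundary pattern could be used instead. A sketch on three of the names shown above; which behaviour is intended is an assumption, and the variable names are hypothetical:

```r
cars <- c("chevrolet chevelle malibu", "buick skylark 320", "amc rebel sst")

# gsub(): every "c" is replaced (what the report does).
all_c <- gsub("c", "modelc", cars)

# sub(): only the first "c" in each string is replaced.
first_c <- sub("c", "modelc", cars)

# \\b is a word boundary, so only a "c" that starts a word is replaced.
word_start <- gsub("\\bc", "modelc", cars, perl = TRUE)

all_c
first_c
word_start
```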
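Each code chunk above displays six rows for the step taken: ‘head’ shows rows 1-6 and ‘tail’ shows the last six rows.
Two ways to read the file with R are demonstrated below: reading it directly from the URL, and downloading it into a local folder first.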
fileUrl <- "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/ISLR/Auto.csv"
autoweb <- read.csv(fileUrl, header = TRUE) # named autoweb rather than t, which would mask base::t()
head(autoweb)
## X mpg cylinders displacement horsepower weight acceleration year origin
## 1 1 18 8 307 130 3504 12.0 70 1
## 2 2 15 8 350 165 3693 11.5 70 1
## 3 3 18 8 318 150 3436 11.0 70 1
## 4 4 16 8 304 150 3433 12.0 70 1
## 5 5 17 8 302 140 3449 10.5 70 1
## 6 6 15 8 429 198 4341 10.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
## 5 ford torino
## 6 ford galaxie 500
if (!file.exists("./Rbridge")){dir.create("./Rbridge")}
fileUrl <- "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/ISLR/Auto.csv"
download.file(fileUrl,destfile="./Rbridge/Auto.csv",method="curl")
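The second approach stops after the download; the saved copy can then be read with ‘read.csv’. The helper below is a hypothetical sketch (fetch_csv is not part of the report) that downloads only when the local copy is missing and builds the path with file.path. It leaves the download method at R's default rather than "curl", which is an assumption about the environment.

```r
# Hypothetical helper: download `url` into `dest_dir` once, then read the local copy.
fetch_csv <- function(url, dest_dir, file_name = basename(url)) {
  # Create the target folder if it does not exist yet.
  if (!dir.exists(dest_dir)) dir.create(dest_dir, recursive = TRUE)
  dest <- file.path(dest_dir, file_name)
  # Skip the download when the file is already present.
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  read.csv(dest, header = TRUE)
}

# Usage with the report's URL and folder (requires network access):
# autodata <- fetch_csv(fileUrl, "./Rbridge")
```

Checking file.exists before downloading keeps repeated runs of the report from fetching the same file again.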