You might see a warning message just above this file, something like “R Markdown requires the knitr package (version 1.2 or higher)”. Don’t worry about this for now; we’ll address it at the end of this file.
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
str(cars)
## 'data.frame': 50 obs. of 2 variables:
## $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
## $ dist : num 2 10 4 22 16 10 18 26 34 17 ...
If you know about quantiles, then the output of summary() should look familiar. If not, you probably recognize the min (minimum), median, mean, and max (maximum). We’ll go over quantiles in Lesson 3, so don’t worry if the output seems overwhelming.
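For reference, the quartiles that summary() prints can be computed directly with quantile(). A minimal sketch using the built-in mtcars data (mpg_quartiles is just an illustrative name):

```r
# By default quantile() returns the 0%, 25%, 50%, 75%, and 100% quantiles,
# i.e. the Min., 1st Qu., Median, 3rd Qu., and Max. shown by summary()
mpg_quartiles <- quantile(mtcars$mpg)
mpg_quartiles
# The 25% value is 15.425, which summary() rounds to 15.43
```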
The str() and summary() functions are helpful commands when working with a new data set. The str() function gives us the number of observations and variables, each variable’s name and type, and its first few values. The summary() function gives us an idea of the values a variable can take on: its minimum, quartiles, mean, and maximum.
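If you only need one piece of what str() shows, a few base R helpers return it as a value you can reuse. A small sketch (these helpers are not used elsewhere in the lesson):

```r
dims <- dim(mtcars)                 # number of rows (observations) and columns (variables)
var_names <- names(mtcars)          # variable names, as listed by str()
var_types <- sapply(mtcars, class)  # class of each column; every mtcars column is numeric
dims
var_types
```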
efficient <- subset(mtcars, mpg > 23)
efficient
## mpg cyl disp hp drat wt qsec vs am gear carb
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
efficient_cars <- nrow(efficient)
efficient_cars
## [1] 7
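Because a comparison like mpg > 23 produces a logical vector, you can also count the matching cars without building a subset first: sum() treats TRUE as 1 and FALSE as 0. A small sketch (is_efficient and n_efficient are illustrative names):

```r
# Logical vector: TRUE for each car with mpg above 23
is_efficient <- mtcars$mpg > 23
# Counting the TRUE values gives the same answer as nrow(subset(mtcars, mpg > 23))
n_efficient <- sum(is_efficient)
n_efficient
```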
subset(mtcars, mpg > 30 & hp > 100)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
There’s only one car that gets more than 30 mpg and has more than 100 hp.
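The same condition can be written on its own as a logical vector, which is what subset() evaluates row by row. A sketch that recovers the matching car’s name from the row names (powerful_efficient is an illustrative name):

```r
# & compares element by element, giving one TRUE/FALSE per car
powerful_efficient <- mtcars$mpg > 30 & mtcars$hp > 100
sum(powerful_efficient)                 # number of matching cars
rownames(mtcars)[powerful_efficient]    # their names
```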
subset(mtcars, mpg < 14 | disp > 390)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
## Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2
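Note: You may be familiar with the && and || operators from Java. When comparing vectors element by element, as in subset(), R uses a single & for the logical operator AND and a single | for the logical operator OR. R also has && and ||, but they are meant for single TRUE/FALSE values.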
The command above returns a data frame of the cars that have an mpg less than 14 OR a displacement greater than 390. A car only needs to satisfy one of the conditions to make it into the subset: the Chrysler Imperial, for example, gets 14.7 mpg but has a displacement of 440. All of the cars that fit the criteria are printed to the console.
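One way to check an OR result is to count each condition separately. Cars that meet both conditions are counted only once by |, so the size of the subset is the two counts added together minus the overlap. A sketch (low_mpg and big_disp are illustrative names):

```r
low_mpg  <- mtcars$mpg < 14
big_disp <- mtcars$disp > 390
sum(low_mpg)              # cars with mpg below 14
sum(big_disp)             # cars with displacement above 390
sum(low_mpg & big_disp)   # cars meeting both conditions
sum(low_mpg | big_disp)   # cars meeting at least one: 3 + 4 - 2 = 5
```

By contrast, && and || are intended for a single TRUE/FALSE value, such as the condition of an if statement; in R 4.3.0 and later, giving them vectors longer than one is an error.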
Now you try some.
subset(mtcars, qsec <= 16.90)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
lightcars <- subset(mtcars, wt <= 2.0)
lightcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
mtcars$year <- 1974
Notice how the number of variables for mtcars changed in the workspace, from 11 to 12. You can also see the result by double-clicking on mtcars in the workspace and examining the data in a table: year is 1974 in every row.
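Assigning a single value to a new column repeats (recycles) that value for every row. A sketch that works on a copy of the built-in data, so the lesson’s mtcars is left alone (cars_copy is an illustrative name):

```r
cars_copy <- datasets::mtcars   # fresh copy of the built-in data (11 columns)
cars_copy$year <- 1974          # one value, recycled to all 32 rows
ncol(cars_copy)                 # 12: the original 11 plus year
unique(cars_copy$year)          # only one distinct value
```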
To drop a variable, use subset() with the select argument and put a minus sign in front of the variable you want to drop.
mtcars <- subset(mtcars, select = -year)
Notice that we are back to 11 variables in the data frame.
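Another common way to drop a column is to assign NULL to it. A sketch, again on a copy of the built-in data (cars_copy is an illustrative name):

```r
cars_copy <- datasets::mtcars   # fresh copy of the built-in data (11 columns)
cars_copy$year <- 1974          # add a column (12 columns)
cars_copy$year <- NULL          # remove it again (back to 11 columns)
ncol(cars_copy)
"year" %in% names(cars_copy)    # FALSE: the column is gone
```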
mtcars$year <- c(1973, 1974)
Open the table of values to see what values year takes on.
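With two values and 32 rows, R recycles c(1973, 1974) 16 times, so year alternates down the table. A length that does not divide the number of rows evenly is rejected instead. A sketch on a copy of the built-in data (cars_copy and result are illustrative names):

```r
cars_copy <- datasets::mtcars          # fresh 32-row copy of the built-in data
cars_copy$year <- c(1973, 1974)        # 2 values recycled over 32 rows
head(cars_copy$year)                   # alternates 1973, 1974, ...
table(cars_copy$year)                  # 16 rows for each year

# Three values do not divide 32 rows evenly, so the assignment fails
result <- tryCatch({
  cars_copy$bad <- c(1, 2, 3)
  "no error"
}, error = function(e) conditionMessage(e))
result
```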
Drop the year variable from the data set.
mtcars <- subset(mtcars, select = -year)
If you are having trouble, don’t worry: we will review the ifelse() function at the end of Lesson 3. You won’t be quizzed on it, and it’s not essential for continuing in this course. We just want you to try to get familiar with more code.
mtcars$wt
## [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440
## [12] 4.070 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520
## [23] 3.435 3.840 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780
cond <- mtcars$wt < 3
cond
## [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
## [23] FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
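A logical vector like cond can also be used directly: which() returns the positions of the TRUE values, and indexing the row names with it lists the cars. A sketch (light is an illustrative name for the same test as cond; wt is measured in units of 1000 lbs):

```r
light <- mtcars$wt < 3            # same test as cond above
which(light)                      # row positions of cars under 3000 lbs
rownames(mtcars)[light]           # names of those cars
sum(light)                        # TRUE counts as 1, so this is the number of light cars
```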
mtcars$weight_class <- ifelse(cond, 'light', 'average')
mtcars$weight_class
## [1] "light" "light" "light" "average" "average" "average" "average"
## [8] "average" "average" "average" "average" "average" "average" "average"
## [15] "average" "average" "average" "light" "light" "light" "light"
## [22] "average" "average" "average" "average" "light" "light" "light"
## [29] "average" "light" "average" "light"
cond <- mtcars$wt > 3.5
mtcars$weight_class <- ifelse(cond, 'heavy', mtcars$weight_class)
mtcars$weight_class
## [1] "light" "light" "light" "average" "average" "average" "heavy"
## [8] "average" "average" "average" "average" "heavy" "heavy" "heavy"
## [15] "heavy" "heavy" "heavy" "light" "light" "light" "light"
## [22] "heavy" "average" "heavy" "heavy" "light" "light" "light"
## [29] "average" "light" "heavy" "light"
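The two-step approach above can also be written as a single nested ifelse(): cars under 3 are light, otherwise cars over 3.5 are heavy, and everything else is average. Because the two conditions never overlap, both versions give the same labels. A sketch that checks this (two_step and nested are illustrative names):

```r
wt <- mtcars$wt
# Step-by-step version, as in the lesson
two_step <- ifelse(wt < 3, "light", "average")
two_step <- ifelse(wt > 3.5, "heavy", two_step)
# Nested version: the inner ifelse() only decides for cars that are not light
nested <- ifelse(wt < 3, "light", ifelse(wt > 3.5, "heavy", "average"))
identical(two_step, nested)
table(nested)
```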
You now have some variables in your workspace (or environment), like cond and efficient. Be careful not to bring too much data into R at once, since R holds all of it in working memory. These objects are small, so we have nothing to worry about here, but let’s delete them from the workspace.
rm(cond)
rm(efficient)
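To see what is in the workspace and remove several objects in one call, ls() lists the object names and rm() accepts a character vector through its list argument. A sketch using throwaway objects (demo_a and demo_b are hypothetical names):

```r
demo_a <- 1:10                      # hypothetical objects for illustration
demo_b <- subset(mtcars, mpg > 23)
"demo_a" %in% ls()                  # TRUE: the object exists
rm(list = c("demo_a", "demo_b"))    # remove both at once
exists("demo_a")                    # FALSE after removal
```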
Save this file if you haven’t done so yet.
You’ll have the opportunity to create one Rmd file for the final project in this class and to submit both the Rmd file and its knitted output (an HTML file). You’ll need the knitr package to do that, so let’s install it now. Uncomment the following two lines of code and run them.
# install.packages('knitr', dependencies = TRUE)
# library(knitr)
Once you’ve installed knitr, comment out the two lines of code above. When you click the Knit HTML button, a web page will be generated that includes both the content (text and text formatting from Markdown) and the output of any embedded R code chunks within the document.
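If you would rather not comment and uncomment the install line, a common pattern is to install a package only when it is missing; requireNamespace() returns FALSE when a package is not installed. A sketch (ensure_package is a hypothetical helper, not part of knitr):

```r
ensure_package <- function(pkg) {
  # Install only if the package cannot be found
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # TRUE if the package is now available
  requireNamespace(pkg, quietly = TRUE)
}

# ensure_package("knitr")   # uncomment to check for (and, if needed, install) knitr
ensure_package("stats")      # stats ships with R, so nothing is installed
```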
You’ve reached the end of the file, so now it’s time to write some code to answer a question and continue on in Lesson 2.
Which car(s) have an mpg (miles per gallon) greater than or equal to 30 OR hp (horsepower) less than 60? Create an R chunk of code to answer the question.
subset(mtcars, mpg >= 30 | hp < 60)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## weight_class
## Fiat 128 light
## Honda Civic light
## Toyota Corolla light
## Lotus Europa light
mtcars[mtcars$mpg >= 30 | mtcars$hp < 60, ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## weight_class
## Fiat 128 light
## Honda Civic light
## Toyota Corolla light
## Lotus Europa light
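Both commands return the same four cars here. They differ when the condition involves missing values: subset() drops rows where the condition is NA, while [ ] keeps them as rows of NAs. mtcars has no missing values, so the small data frame df below is made up for illustration:

```r
# On the built-in mtcars (no NAs), both approaches give identical results
same <- identical(
  subset(datasets::mtcars, mpg >= 30 | hp < 60),
  datasets::mtcars[datasets::mtcars$mpg >= 30 | datasets::mtcars$hp < 60, ]
)
same

# Hypothetical data with a missing value in column a
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"))
nrow(subset(df, a > 1))   # 1: the NA row is dropped
nrow(df[df$a > 1, ])      # 2: the NA row comes back as a row of NAs
```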
Once you have the answer, go to the Udacity website to continue with Lesson 2.
Note: To create a link, put the link text in square brackets, followed immediately by a pair of parentheses. There must be no space between the closing bracket and the opening parenthesis. Paste or type the link into the parentheses. This also works on the discussions!
And if you want to see all of your HARD WORK from this file, click the Knit HTML button now. (You may need to restart R first.)