There are a lot of ways to do Excel things in R. In Part 3, we will explore working with columns, formatting, sorting, conditionals, using LookUps, pivot tables, and something similar to Solver or Goal Seek.
We will work in this set of problems with the iris data set. First, let’s take look at it and get an idea what we are working with.
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
We can remove the Species column easily enough. It’s non-numeric and won’t fit into our plans well.
iris.num <- iris[, -5]
head(iris.num)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 5.1 3.5 1.4 0.2
## 2 4.9 3.0 1.4 0.2
## 3 4.7 3.2 1.3 0.2
## 4 4.6 3.1 1.5 0.2
## 5 5.0 3.6 1.4 0.2
## 6 5.4 3.9 1.7 0.4
Since we removed the non-numeric Species column, we can work with the colMeans and colSums commands.
colSums(iris.num)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 876.5 458.6 563.7 179.9
colMeans(iris.num)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 5.843333 3.057333 3.758000 1.199333
There are two different ways we can find minimum and maximum values in a column. They are the apply and sapply commands.
apply(iris.num, 2, min) #or
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 4.3 2.0 1.0 0.1
sapply(iris.num, min)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 4.3 2.0 1.0 0.1
apply(iris.num, 2, max) #or
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 7.9 4.4 6.9 2.5
sapply(iris.num, max)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 7.9 4.4 6.9 2.5
Yet another variation on the theme is this:
apply(iris.num[, c('Sepal.Length','Sepal.Width', 'Petal.Length', 'Petal.Width')], 2, range)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## [1,] 4.3 2.0 1.0 0.1
## [2,] 7.9 4.4 6.9 2.5
If you don’t like the column order for these values, feel free to move them around.
apply(iris.num[, c('Petal.Length', 'Petal.Width', 'Sepal.Length', 'Sepal.Width')], 2, range)
## Petal.Length Petal.Width Sepal.Length Sepal.Width
## [1,] 1.0 0.1 4.3 2.0
## [2,] 6.9 2.5 7.9 4.4
Formatting in R isn’t as easy as it is in Excel, but it’s not very difficult either. Here is an example using numbers in the format command.
format(12345678.9, digits = 9, decimal.mark = '.', big.mark = ',', small.mark = ',', small.interval = 3)
## [1] "12,345,678.9"
We can use the format command to format data in a data frame, as well.
x <- colMeans(mtcars[, 1:4])
format(x, digits = 2, nsmall = 2)
## mpg cyl disp hp
## " 20.09" " 6.19" "230.72" "146.69"
Here’s a way to convert decimals into percentages.
x <- seq(0.5, 0.55, 0.01)
sprintf('%.1f%%', 100 * x)
## [1] "50.0%" "51.0%" "52.0%" "53.0%" "54.0%" "55.0%"
Likewise, we can set currency symbols.
set.seed(1)
x <- 1000 * runif(5)
sprintf('$%3.2f', x)
## [1] "$265.51" "$372.12" "$572.85" "$908.21" "$201.68"
We can edit text and combine values. Here we show the prices of bread and cheese.
shopping <- c('dozen eggs', 'bacon')
prices <- c(1.89, 6.99)
sprintf('%s costs $%3.2f', shopping, prices)
## [1] "dozen eggs costs $1.89" "bacon costs $6.99"
Data can be sorted in R in various methods. Here we use the with command to sort all the data in the mtcars data frame by horsepower (hp). Note that ascending order is the default here.
with(mtcars, mtcars[order(hp), ])
## mpg cyl disp hp drat wt qsec vs am gear carb
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
To sort in descending order, we have to specify using a switch.
with(mtcars, mtcars[order(hp, decreasing = TRUE), ])
## mpg cyl disp hp drat wt qsec vs am gear carb
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
We discussed something like an Excel IF statements in the first part of this series. Here is another approach using the witin and ifelse commands in R. They work very much like an IF function in Excel.
The psuedocode version of the following command would be something like, “Create an column in mtcars called mpgClass. If mpg is less than the average mpg, tag it as”Low“. Otherwise, tag it”High“.
mtcars <- within(mtcars, mpgClass <- ifelse(mpg < mean(mpg), 'Low', 'High'))
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## mpgClass
## Mazda RX4 High
## Mazda RX4 Wag High
## Datsun 710 High
## Hornet 4 Drive High
## Hornet Sportabout Low
## Valiant Low
## Duster 360 Low
## Merc 240D High
## Merc 230 High
## Merc 280 Low
## Merc 280C Low
## Merc 450SE Low
## Merc 450SL Low
## Merc 450SLC Low
## Cadillac Fleetwood Low
## Lincoln Continental Low
## Chrysler Imperial Low
## Fiat 128 High
## Honda Civic High
## Toyota Corolla High
## Toyota Corona High
## Dodge Challenger Low
## AMC Javelin Low
## Camaro Z28 Low
## Pontiac Firebird Low
## Fiat X1-9 High
## Porsche 914-2 High
## Lotus Europa High
## Ford Pantera L Low
## Ferrari Dino Low
## Maserati Bora Low
## Volvo 142E High
We can now break the data into groups of High mpg and Low mpg vehicles. Note there are a couple ways to do this. We can describe the Low mpg vehicles as == Low or we can describe them as != High, meaning Not High.
mtcars[mtcars$mpgClass == 'High', ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## mpgClass
## Mazda RX4 High
## Mazda RX4 Wag High
## Datsun 710 High
## Hornet 4 Drive High
## Merc 240D High
## Merc 230 High
## Fiat 128 High
## Honda Civic High
## Toyota Corolla High
## Toyota Corona High
## Fiat X1-9 High
## Porsche 914-2 High
## Lotus Europa High
## Volvo 142E High
mtcars[mtcars$mpgClass != 'High', ] #or
## mpg cyl disp hp drat wt qsec vs am gear carb
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## mpgClass
## Hornet Sportabout Low
## Valiant Low
## Duster 360 Low
## Merc 280 Low
## Merc 280C Low
## Merc 450SE Low
## Merc 450SL Low
## Merc 450SLC Low
## Cadillac Fleetwood Low
## Lincoln Continental Low
## Chrysler Imperial Low
## Dodge Challenger Low
## AMC Javelin Low
## Camaro Z28 Low
## Pontiac Firebird Low
## Ford Pantera L Low
## Ferrari Dino Low
## Maserati Bora Low
mtcars[mtcars$mpgClass == 'Low', ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## mpgClass
## Hornet Sportabout Low
## Valiant Low
## Duster 360 Low
## Merc 280 Low
## Merc 280C Low
## Merc 450SE Low
## Merc 450SL Low
## Merc 450SLC Low
## Cadillac Fleetwood Low
## Lincoln Continental Low
## Chrysler Imperial Low
## Dodge Challenger Low
## AMC Javelin Low
## Camaro Z28 Low
## Pontiac Firebird Low
## Ford Pantera L Low
## Ferrari Dino Low
## Maserati Bora Low
Conditionals can be created in R using the with command. Here, we want to know the average mpg for vehicles with horsepower greater than and less than 150.
First, we find the mean mpg. Then we find mean mpg by horsepower. The last with command tells us the number of vehicles with 150 horsepower or greater.
with(mtcars, mean(mpg))
## [1] 20.09062
with(mtcars, mean(mpg[hp < 150]))
## [1] 24.22353
with(mtcars, mean(mpg[hp <= 150]))
## [1] 23.28947
with(mtcars, mean(mpg[hp > 150]))
## [1] 15.41538
with(mtcars, mean(mpg[hp >= 150]))
## [1] 15.40667
with(mtcars, length(mpg[hp >= 150]))
## [1] 15
We can transpose columns and rows easily enough in R. First, let’s create a matrix with 4 columns and 6 rows.
x <- matrix (1:24, ncol = 4)
x
## [,1] [,2] [,3] [,4]
## [1,] 1 7 13 19
## [2,] 2 8 14 20
## [3,] 3 9 15 21
## [4,] 4 10 16 22
## [5,] 5 11 17 23
## [6,] 6 12 18 24
The t command is used to transpose the object, our matrix, into one of 6 columns and 4 rows.
t(x)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 2 3 4 5 6
## [2,] 7 8 9 10 11 12
## [3,] 13 14 15 16 17 18
## [4,] 19 20 21 22 23 24
We can revert back to the original matrix by tranposing the transposition, if we want. Or we can just call it by the name of the object, x.
t(t(x))
## [,1] [,2] [,3] [,4]
## [1,] 1 7 13 19
## [2,] 2 8 14 20
## [3,] 3 9 15 21
## [4,] 4 10 16 22
## [5,] 5 11 17 23
## [6,] 6 12 18 24
x
## [,1] [,2] [,3] [,4]
## [1,] 1 7 13 19
## [2,] 2 8 14 20
## [3,] 3 9 15 21
## [4,] 4 10 16 22
## [5,] 5 11 17 23
## [6,] 6 12 18 24
This is an important point. We transposed x but we did not impact the original object x in our environment. R is like that.
Here is another example in which we transpose the first four rows of the mtcars data.
mtcars[1:4, ]
## mpg cyl disp hp drat wt qsec vs am gear carb mpgClass
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 High
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 High
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 High
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 High
t(mtcars[1:4, ])
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## mpg "21.0" "21.0" "22.8" "21.4"
## cyl "6" "6" "4" "6"
## disp "160" "160" "108" "258"
## hp "110" "110" " 93" "110"
## drat "3.90" "3.90" "3.85" "3.08"
## wt "2.620" "2.875" "2.320" "3.215"
## qsec "16.46" "17.02" "18.61" "19.44"
## vs "0" "0" "1" "1"
## am "1" "1" "1" "0"
## gear "4" "4" "4" "3"
## carb "4" "4" "1" "1"
## mpgClass "High" "High" "High" "High"
It is sometimes necessary to search through large data sets to find any duplicate data points, or to identify unique ones.
Let’s find the unique values in the cyl column of the mtcars data.
unique(mtcars$cyl)
## [1] 6 4 8
That’s as expected. Cars typically have 4, 6, or 8 cylinders.
Any duplicated values in the iris data set?
dupes <- duplicated(iris)
dupes
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
It looks like there is just one: Row 143. Can we confirm that?
which(dupes)
## [1] 143
What’s in Row 143? What’s not in Row 143? And how many rows are not duplicated?
iris[dupes, ]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 143 5.8 2.7 5.1 1.9 virginica
iris[!dupes, ]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
## 51 7.0 3.2 4.7 1.4 versicolor
## 52 6.4 3.2 4.5 1.5 versicolor
## 53 6.9 3.1 4.9 1.5 versicolor
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor
## 56 5.7 2.8 4.5 1.3 versicolor
## 57 6.3 3.3 4.7 1.6 versicolor
## 58 4.9 2.4 3.3 1.0 versicolor
## 59 6.6 2.9 4.6 1.3 versicolor
## 60 5.2 2.7 3.9 1.4 versicolor
## 61 5.0 2.0 3.5 1.0 versicolor
## 62 5.9 3.0 4.2 1.5 versicolor
## 63 6.0 2.2 4.0 1.0 versicolor
## 64 6.1 2.9 4.7 1.4 versicolor
## 65 5.6 2.9 3.6 1.3 versicolor
## 66 6.7 3.1 4.4 1.4 versicolor
## 67 5.6 3.0 4.5 1.5 versicolor
## 68 5.8 2.7 4.1 1.0 versicolor
## 69 6.2 2.2 4.5 1.5 versicolor
## 70 5.6 2.5 3.9 1.1 versicolor
## 71 5.9 3.2 4.8 1.8 versicolor
## 72 6.1 2.8 4.0 1.3 versicolor
## 73 6.3 2.5 4.9 1.5 versicolor
## 74 6.1 2.8 4.7 1.2 versicolor
## 75 6.4 2.9 4.3 1.3 versicolor
## 76 6.6 3.0 4.4 1.4 versicolor
## 77 6.8 2.8 4.8 1.4 versicolor
## 78 6.7 3.0 5.0 1.7 versicolor
## 79 6.0 2.9 4.5 1.5 versicolor
## 80 5.7 2.6 3.5 1.0 versicolor
## 81 5.5 2.4 3.8 1.1 versicolor
## 82 5.5 2.4 3.7 1.0 versicolor
## 83 5.8 2.7 3.9 1.2 versicolor
## 84 6.0 2.7 5.1 1.6 versicolor
## 85 5.4 3.0 4.5 1.5 versicolor
## 86 6.0 3.4 4.5 1.6 versicolor
## 87 6.7 3.1 4.7 1.5 versicolor
## 88 6.3 2.3 4.4 1.3 versicolor
## 89 5.6 3.0 4.1 1.3 versicolor
## 90 5.5 2.5 4.0 1.3 versicolor
## 91 5.5 2.6 4.4 1.2 versicolor
## 92 6.1 3.0 4.6 1.4 versicolor
## 93 5.8 2.6 4.0 1.2 versicolor
## 94 5.0 2.3 3.3 1.0 versicolor
## 95 5.6 2.7 4.2 1.3 versicolor
## 96 5.7 3.0 4.2 1.2 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 98 6.2 2.9 4.3 1.3 versicolor
## 99 5.1 2.5 3.0 1.1 versicolor
## 100 5.7 2.8 4.1 1.3 versicolor
## 101 6.3 3.3 6.0 2.5 virginica
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 104 6.3 2.9 5.6 1.8 virginica
## 105 6.5 3.0 5.8 2.2 virginica
## 106 7.6 3.0 6.6 2.1 virginica
## 107 4.9 2.5 4.5 1.7 virginica
## 108 7.3 2.9 6.3 1.8 virginica
## 109 6.7 2.5 5.8 1.8 virginica
## 110 7.2 3.6 6.1 2.5 virginica
## 111 6.5 3.2 5.1 2.0 virginica
## 112 6.4 2.7 5.3 1.9 virginica
## 113 6.8 3.0 5.5 2.1 virginica
## 114 5.7 2.5 5.0 2.0 virginica
## 115 5.8 2.8 5.1 2.4 virginica
## 116 6.4 3.2 5.3 2.3 virginica
## 117 6.5 3.0 5.5 1.8 virginica
## 118 7.7 3.8 6.7 2.2 virginica
## 119 7.7 2.6 6.9 2.3 virginica
## 120 6.0 2.2 5.0 1.5 virginica
## 121 6.9 3.2 5.7 2.3 virginica
## 122 5.6 2.8 4.9 2.0 virginica
## 123 7.7 2.8 6.7 2.0 virginica
## 124 6.3 2.7 4.9 1.8 virginica
## 125 6.7 3.3 5.7 2.1 virginica
## 126 7.2 3.2 6.0 1.8 virginica
## 127 6.2 2.8 4.8 1.8 virginica
## 128 6.1 3.0 4.9 1.8 virginica
## 129 6.4 2.8 5.6 2.1 virginica
## 130 7.2 3.0 5.8 1.6 virginica
## 131 7.4 2.8 6.1 1.9 virginica
## 132 7.9 3.8 6.4 2.0 virginica
## 133 6.4 2.8 5.6 2.2 virginica
## 134 6.3 2.8 5.1 1.5 virginica
## 135 6.1 2.6 5.6 1.4 virginica
## 136 7.7 3.0 6.1 2.3 virginica
## 137 6.3 3.4 5.6 2.4 virginica
## 138 6.4 3.1 5.5 1.8 virginica
## 139 6.0 3.0 4.8 1.8 virginica
## 140 6.9 3.1 5.4 2.1 virginica
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
nrow(iris[!dupes, ])
## [1] 149
We can use the match command in R to tell us more about our data.
For example, where cqan we find the Toyota Corolla rows in the mtcars data?
index <- match('Toyota Corolla', rownames(mtcars))
index
## [1] 20
It’s in Row 20. Let’s get a quick summary of the first four columns.
mtcars[index, 1:4]
## mpg cyl disp hp
## Toyota Corolla 33.9 4 71.1 65
We’ve seen elsewhere how to create a pivot table in R. Here is another method using the with and aggregate commands.
In this case, we want to create a table of the mean horsepower of vehicles in the mtcars data set by number of cylinders and number of gears. We list cyl first, so the rows will be cylinders. Columns will be gear.
with(mtcars, tapply(hp, list(cyl, gear), mean))
## 3 4 5
## 4 97.0000 76.0 102.0
## 6 107.5000 116.5 175.0
## 8 194.1667 NA 299.5
We can reverse the order of cyl and gear easily enough.
with(mtcars, tapply(hp, list(gear, cyl), mean))
## 4 6 8
## 3 97 107.5 194.1667
## 4 76 116.5 NA
## 5 102 175.0 299.5000
In this example, we are finding the mean horsepower of vehicles by the number of cylinders, the number of gears, and the type of transmission (am stands for automatic or manual.
aggregate( hp ~ cyl + gear + am, mtcars, mean)
## cyl gear am hp
## 1 4 3 0 97.00000
## 2 6 3 0 107.50000
## 3 8 3 0 194.16667
## 4 4 4 0 78.50000
## 5 6 4 0 123.00000
## 6 4 4 1 75.16667
## 7 6 4 1 110.00000
## 8 4 5 1 102.00000
## 9 6 5 1 175.00000
## 10 8 5 1 299.50000
R does not have the capability to perform Goal Seek and Solver the way Excel does, but it is possible to mimic them by writing a function in R. You will find writing functions is one of the most powerful features of R. Sure, you can create functions in a cell in Excel, or even write a VBA function, but functions in R are much more flexible and useful.
In this example we are creating a demand curve for a product (sales is a function of price) and then finding the price at which total sales are optimized. We create the functions then use them to graph the relationships.
sales <- function(price) {
100 - 0.5 * price #a linear function
}
revenues <- function(price) {
price * sales(price) #a non-linear function
}
par(mfrow = c(1, 2))
curve(sales, from = 0, to = 300, xname = 'Price', ylab = 'Sales', main = 'Sales')
curve(revenues, from = 0, to = 300, xname = 'Price', ylab = 'Revenues', main = 'Revenues')
We can eyeball the revenue graph and see pretty easily that a price of $100 maximizes revenue at about $5000, but we can confirm that using the optimize command.
par(mfrow = c(1, 1))
optimize(revenues, interval = c(50, 150), maximum = TRUE)
## $maximum
## [1] 100
##
## $objective
## [1] 5000