This tutorial is part of the dplyr training series. Here is the YouTube video link: https://youtu.be/AbSz6cXKqz8
dplyr is a great tool for working with data in R.
The commands may look long and overwhelming to someone who has not used dplyr, but that is not the case.
Once you learn the basics, it is very intuitive to use, just like forming sentences once you have learnt the basic words of a language.
This tutorial is for beginners and experienced R users who want to learn the various dplyr commands.
We will cover all practical aspects of the dplyr::slice command. All YouTube videos in the series are available in a single playlist:
https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR
We will be using the gapminder dataset, which comes with the gapminder package, in this tutorial.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.3
library(gapminder)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
We need to install the gapminder package to use the sample dataset. We will add an additional column called RowID, which will be useful later in the tutorial.
d <- gapminder::gapminder
d <- d %>%
  dplyr::mutate(RowID = row_number(), .before = country)
d
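To make the RowID step easier to follow, here is a minimal sketch on a small made-up data frame. The `toy` data and its column values are assumptions for illustration only; `row_number()` and the `.before` argument are used the same way as in the code above.

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder
toy <- data.frame(country = c("A", "B", "C"), pop = c(1, 2, 3))

# Add a running row number and place it in front of the country column
toy_with_id <- toy %>%
  dplyr::mutate(RowID = row_number(), .before = country)

names(toy_with_id)
toy_with_id$RowID
```

Having RowID as the first column makes it easy to see which original rows each slice call returns.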
Get the first row.
d1 <- d %>%
  dplyr::slice(1)
d1
Get every row except the first one.
d2 <- d %>%
  dplyr::slice(-1)
d2
Get row 10 only.
d3 <- d %>%
  dplyr::slice(10)
d3
Get rows 1 to 10.
d4 <- d %>%
  dplyr::slice(1:10)
d4
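The positional calls above can be checked on a tiny data frame where the result is easy to read. This is only a sketch; the `toy` data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: five rows with a RowID column, mirroring the tutorial's setup
toy <- data.frame(RowID = 1:5, value = c("a", "b", "c", "d", "e"))

first_row   <- toy %>% dplyr::slice(1)        # keeps row 1 only
without_1st <- toy %>% dplyr::slice(-1)       # drops row 1, keeps rows 2 to 5
rows_2_to_4 <- toy %>% dplyr::slice(2:4)      # keeps a contiguous range
picked      <- toy %>% dplyr::slice(1, 3, 5)  # keeps several positions at once

picked$RowID
```

Positive positions keep rows and negative positions drop them.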
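The base R head function also returns the first 10 rows.
d5 <- d %>%
  head(10)
d5
slice_head does the same thing within dplyr.
d6 <- d %>%
  dplyr::slice_head(n = 10)
d6
slice_tail returns the last 10 rows.
d7 <- d %>%
  dplyr::slice_tail(n = 10)
d7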
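As a quick check that head and slice_head pick the same rows on ungrouped data, here is a small sketch. The `toy` data frame is an assumption made for illustration.

```r
library(dplyr)

# Hypothetical toy data with six rows
toy <- data.frame(RowID = 1:6, value = letters[1:6])

top3_slice <- toy %>% dplyr::slice_head(n = 3)  # first 3 rows via dplyr
top3_head  <- toy %>% head(3)                   # first 3 rows via base R
last2      <- toy %>% dplyr::slice_tail(n = 2)  # last 2 rows

# TRUE when both approaches return the same rows
same_rows <- identical(top3_slice$RowID, top3_head$RowID)
same_rows
```

One practical difference is that slice_head and slice_tail work per group on grouped data, while head simply returns the first rows of the whole table.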
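Starting from row 100, we get all remaining rows in the dataset. Here n() returns the total number of rows.
d8 <- d %>%
  dplyr::slice(100:n())
d8
A negative range drops rows. Here we drop the first 10 rows and keep the rest.
d9 <- d %>%
  dplyr::slice(-(1:10))
d9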
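The two patterns above, using n() as the last position and negating a range, can be sketched on a small data frame. The `toy` data is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data with eight rows
toy <- data.frame(RowID = 1:8)

from_5       <- toy %>% dplyr::slice(5:n())            # row 5 through the last row
last_row     <- toy %>% dplyr::slice(n())              # the last row only
last_3       <- toy %>% dplyr::slice((n() - 2):n())    # the last three rows
drop_first_2 <- toy %>% dplyr::slice(-(1:2))           # everything except rows 1 and 2

# Note the parentheses: -(1:2) means c(-1, -2), whereas -1:2 means c(-1, 0, 1, 2),
# which mixes negative and positive positions and is rejected by slice().
drop_first_2$RowID
```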
In this example we want the 5 rows with the lowest values of lifeExp.
d10 <- d %>%
  dplyr::slice_min(lifeExp, n = 5)
d10
In this example we want the 5 rows with the highest values of lifeExp.
d11 <- d %>%
  dplyr::slice_max(lifeExp, n = 5)
d11
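A small sketch of slice_min and slice_max on made-up data shows which rows are kept. The `toy` data frame and its `score` column are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data with distinct scores (no ties)
toy <- data.frame(
  name  = c("a", "b", "c", "d", "e"),
  score = c(50, 20, 90, 10, 70)
)

lowest_2  <- toy %>% dplyr::slice_min(score, n = 2)  # rows with the two smallest scores
highest_2 <- toy %>% dplyr::slice_max(score, n = 2)  # rows with the two largest scores

lowest_2
highest_2
```

The returned rows come back ordered by the chosen column, smallest first for slice_min and largest first for slice_max.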
This example shows that although we asked for n = 1, the output contains more than one row. The reason is that we asked for the rows with the minimum value of the cyl (cylinders) column, so if there is a tie, all tied rows are returned.
mtcars
d12 <- mtcars %>%
  dplyr::slice_min(cyl, n = 1)
d12
You can explicitly tell dplyr that with_ties = FALSE. It will then return only one row containing the minimum value of the cyl column.
d13 <- mtcars %>%
  dplyr::slice_min(cyl, n = 1, with_ties = FALSE)
d13
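The effect of with_ties is easiest to see by counting rows. The following sketch uses a made-up `toy` data frame in which three rows share the minimum score; the data is an assumption for illustration.

```r
library(dplyr)

# Hypothetical toy data: the minimum score (1) appears three times
toy <- data.frame(RowID = 1:5, score = c(3, 1, 1, 2, 1))

tied_rows  <- toy %>% dplyr::slice_min(score, n = 1)                     # keeps every tied row
single_row <- toy %>% dplyr::slice_min(score, n = 1, with_ties = FALSE)  # keeps exactly one row

nrow(tied_rows)
nrow(single_row)
```

Use with_ties = FALSE when you need exactly n rows and do not mind which of the tied rows is kept.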
You can use the slice_sample command to pick random rows. In this example we said n = 5, so we get 5 random rows from our dataset.
d14 <- d %>%
  dplyr::slice_sample(n = 5)
d14
With replace = TRUE, rows are sampled with replacement, so the same row may be drawn more than once.
d15 <- d %>%
  dplyr::slice_sample(n = 5, replace = TRUE)
d15
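To see the difference replace = TRUE makes, the sketch below samples from a made-up three-row data frame. The `toy` data and the seed value are assumptions chosen for illustration.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the draw is repeatable

# Hypothetical toy data with only three rows
toy <- data.frame(RowID = 1:3)

# Without replacement each row can appear at most once
no_replace <- toy %>% dplyr::slice_sample(n = 3)

# With replacement we can draw more rows than exist, so repeats are unavoidable here
with_replace <- toy %>% dplyr::slice_sample(n = 10, replace = TRUE)

no_replace$RowID
with_replace$RowID
```

Calling set.seed before sampling is a common way to get the same random rows each time the code is run.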
We are telling dplyr to draw 5 random rows from our data with replace = TRUE (so the same row can be drawn more than once). We also pass the lifeExp column as weight_by, so rows with higher lifeExp values are more likely to be drawn.
d16 <- d %>%
  dplyr::slice_sample(weight_by = lifeExp, replace = TRUE, n = 5)
d16
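An extreme case makes the role of weight_by visible. In this sketch, which uses a made-up `toy` data frame and a hypothetical weight column `w`, only one row has a non-zero weight, so every draw returns that row.

```r
library(dplyr)

# Hypothetical toy data: only row 3 has a positive sampling weight
toy <- data.frame(RowID = 1:3, w = c(0, 0, 1))

weighted_draw <- toy %>%
  dplyr::slice_sample(n = 5, weight_by = w, replace = TRUE)

weighted_draw$RowID
```

With less extreme weights, such as lifeExp in the example above, all rows can be drawn but rows with larger weights come up more often.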
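Get the most recent record for each country: sort by year in descending order, then take the first row of each group.
d17 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::arrange(-year) %>%
  dplyr::slice_head(n = 1)
d17
Get the first row for each combination of year and country.
d18 <- d %>%
  dplyr::group_by(year, country) %>%
  dplyr::slice_head(n = 1)
d18
Get the row with the largest population for each country.
d19 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::slice_max(pop, n = 1)
d19
Get the row with the smallest population for each country.
d20 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::slice_min(pop, n = 1)
d20
Get the first row for each country.
d21 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::slice(1)
d21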
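The grouped examples above can be sketched on a small made-up data frame with two countries. The `toy` data is an assumption for illustration, and using slice_max on year is shown as one possible alternative to sorting first, not as the method used in this tutorial.

```r
library(dplyr)

# Hypothetical toy data: country A has three years, country B has two
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year    = c(2000, 2005, 2010, 2000, 2005),
  pop     = c(10, 30, 20, 5, 7)
)

by_country <- toy %>% dplyr::group_by(country)

latest      <- by_country %>% dplyr::slice_max(year, n = 1)  # most recent year per country
largest_pop <- by_country %>% dplyr::slice_max(pop, n = 1)   # largest population per country
first_rows  <- by_country %>% dplyr::slice(1)                # first row per country

latest
largest_pop
first_rows
```

Every slice function is applied separately within each group, so the result has one row per country here.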
In the following example we group the data by country and then take the top 10% of rows for each country. The number of rows is rounded down, so with 12 rows per country we get 1 row for each.
d22 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::slice_head(prop = 0.1)
d22
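How prop turns into a row count per group can be sketched with groups of different sizes. The `toy` data and the choice of prop = 0.5 are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data: country A has 4 rows, country B has 5 rows
toy <- data.frame(
  country = rep(c("A", "B"), times = c(4, 5)),
  RowID   = 1:9
)

top_half <- toy %>%
  dplyr::group_by(country) %>%
  dplyr::slice_head(prop = 0.5)

# Count how many rows were kept in each group
rows_per_group <- top_half %>%
  dplyr::summarise(rows = n())

top_half$RowID
rows_per_group
```

Half of 4 rows is 2, and half of 5 rows is 2.5, which is rounded down to 2, so both groups keep two rows.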
We first compute the mean of the life expectancy column (lifeExp) for the whole dataset. We then keep only the rows whose lifeExp is above that mean and pick the first of those rows.
mean_lifeExp <- mean(d$lifeExp)
mean_lifeExp
## [1] 59.47444
d24 <- d %>%
  dplyr::filter(lifeExp > mean_lifeExp) %>%
  dplyr::slice(1)
d24
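The same filter-then-slice pattern can be traced by hand on a small made-up data frame. The `toy` data and the variable names are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data; the mean of lifeExp is (40 + 55 + 70 + 65 + 80) / 5 = 62
toy <- data.frame(RowID = 1:5, lifeExp = c(40, 55, 70, 65, 80))

mean_life_exp <- mean(toy$lifeExp)

# Keep rows above the mean (rows 3, 4 and 5), then take the first of them
first_above_mean <- toy %>%
  dplyr::filter(lifeExp > mean_life_exp) %>%
  dplyr::slice(1)

first_above_mean
```

Because slice runs after filter, position 1 refers to the first row of the filtered data, not the first row of the original data.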
Watch our complete tutorial series on all aspects of dplyr.
https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR