Why dplyr
Audience
DPLYR : slice
Create sample dataset
- Have a look at the sample dataset.
Get the top row only
Get all rows except the top row
Get the tenth row only
Get the first ten rows
you could also use the head() command
Get the top 10 rows
Get the bottom 10 rows
Get the rows from 100 to the end of the dataset
First 10 rows are removed and the remaining rows shown
Slice_min can be used by giving it a column name
Slice_max can be used by giving it a column name
Following example uses mtcars dataset
Extract random samples
Extract random samples - with replace = TRUE
Defining a column to give the weightage when drawing a random sample.
Using arrange to sort the data and then slicing it.
Using group by and then slicing it.
We want to get one row for each country where the population was maximum.
We want to get one row for each country where the population was minimum.
Using group by and then slicing it.
Using proportions to pick some percentage of the rows.
Select data above mean

This tutorial is part of the dplyr training series. Here is the YouTube video link: https://youtu.be/AbSz6cXKqz8

Why dplyr

dplyr is an R package for manipulating data frames: filtering, sorting, summarising and slicing rows.

The commands may look long and overwhelming to someone who has not used dplyr, but they are built from a small set of verbs chained together.

Once you learn the basics, it is very intuitive to use, just like making a sentence once you have learnt the basic words of a language.

Audience

This tutorial is for beginners and experienced R users who want to learn the various dplyr commands.

DPLYR : slice

We will be covering all practical aspects of the dplyr::slice command in this tutorial. It is part of a series of tutorials on all practical aspects of dplyr. All the videos are available in a single playlist on YouTube.

https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR

Create sample dataset

We will be using the gapminder dataset from the gapminder package in this tutorial.

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.1.3

library(gapminder)
library(dplyr)

## Warning: package 'dplyr' was built under R version 4.1.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Have a look at the sample dataset.

We need to install the gapminder package to make use of the sample dataset. We will add an additional data column called RowID, which will be useful later in the tutorial.

d <- gapminder::gapminder

d <- d %>%
  dplyr::mutate(RowID = row_number(), .before = country)
d

Get the top row only

d1 <- d %>%
  dplyr::slice(1)

d1

Get all rows except the top row

d2 <- d %>%
  dplyr::slice(-1)

d2

Get the tenth row only

d3 <- d %>%
  dplyr::slice(10)

d3

Get the first ten rows

d4 <- d %>%
  dplyr::slice(1:10)

d4

you could also use the head() command

d5 <- d %>%
  head(10)

d5

Get the top 10 rows

d6 <- d %>%
  dplyr::slice_head(n = 10)

d6

Get the bottom 10 rows

d7 <- d %>%
  dplyr::slice_tail(n = 10)

d7
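The three ways of getting the first rows shown above should give the same result. Here is a small sketch to check that, using a made-up data frame called toy (an assumption for illustration, not the gapminder data).

```r
library(dplyr)

# Hypothetical toy data: 20 rows with an id column.
toy <- data.frame(id = 1:20, value = (1:20)^2)

first_a <- toy %>% dplyr::slice(1:3)          # explicit row positions
first_b <- toy %>% dplyr::slice_head(n = 3)   # helper for "first n rows"
first_c <- toy %>% head(3)                    # base R equivalent

last_a <- toy %>% dplyr::slice_tail(n = 3)    # helper for "last n rows"
last_b <- toy %>% tail(3)                     # base R equivalent

first_a$id   # 1 2 3
last_a$id    # 18 19 20
```

slice_head and slice_tail also respect groups, which becomes useful once group_by is involved.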

Get the rows from 100 to the end of the dataset

Starting from row 100, we will get all remaining rows in the dataset. Inside slice(), n() gives the total number of rows.

d8 <- d %>%
  dplyr::slice(100:n())

d8

First 10 rows are removed and the remaining rows shown

d9 <- d %>%
  dplyr::slice(-(1:10))

d9
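Here is a small sketch of n() and negative positions on a made-up 8 row data frame (toy is an assumption for illustration). Note the brackets in -(1:3): without them, -1:3 is read by R as (-1):3, which mixes negative and positive positions and is not what we want.

```r
library(dplyr)

# Hypothetical toy data with 8 rows.
toy <- data.frame(id = 1:8)

from_five  <- toy %>% dplyr::slice(5:n())    # rows 5 to the last row
drop_three <- toy %>% dplyr::slice(-(1:3))   # everything except rows 1 to 3
last_row   <- toy %>% dplyr::slice(n())      # the last row only

from_five$id    # 5 6 7 8
drop_three$id   # 4 5 6 7 8
last_row$id     # 8
```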

Slice_min can be used by giving it a column name

In this example we want to get the 5 rows with the smallest values of lifeExp.

d10 <- d %>%
  dplyr::slice_min(lifeExp, n = 5)

d10

Slice_max can be used by giving it a column name

In this example we want to get the 5 rows with the largest values of lifeExp.

d11 <- d %>%
  dplyr::slice_max(lifeExp, n = 5)

d11
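To see slice_min and slice_max on something small enough to check by eye, here is a sketch with a made-up scores data frame (the names and values are assumptions for illustration only).

```r
library(dplyr)

# Hypothetical scores, no tied values.
scores <- data.frame(
  name  = c("a", "b", "c", "d", "e"),
  score = c(50, 90, 70, 10, 30)
)

lowest_two  <- scores %>% dplyr::slice_min(score, n = 2)   # rows d (10) and e (30)
highest_two <- scores %>% dplyr::slice_max(score, n = 2)   # rows b (90) and c (70)

lowest_two
highest_two
```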

Following example uses mtcars dataset

This example shows that although we said n = 1, the output contains more than one row. The reason is that we asked for the rows with the minimum value of the cyl (cylinders) column, and by default slice_min keeps ties. So if several rows share the minimum value, all of them are shown.

mtcars

d12 <- mtcars %>%
  dplyr::slice_min(cyl, n = 1)

d12

You can explicitly tell dplyr that with_ties = FALSE. Then it will return only one row, even though several rows contain the min value for the cyl column.

d13 <- mtcars %>%
  dplyr::slice_min(cyl, n = 1, with_ties = FALSE)

d13
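A quick way to see the effect of with_ties is to count the rows returned in each case. This sketch uses the same mtcars data; the object names tied and single are just for illustration.

```r
library(dplyr)

# Default behaviour: every row tied for the minimum cyl is kept.
tied <- mtcars %>% dplyr::slice_min(cyl, n = 1)

# with_ties = FALSE: exactly n rows are returned.
single <- mtcars %>% dplyr::slice_min(cyl, n = 1, with_ties = FALSE)

nrow(tied)     # 11, the number of 4 cylinder cars in mtcars
nrow(single)   # 1
```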

Extract random samples

You can use the slice_sample command to pick random rows. In this example we said n = 5, so we will get 5 random rows from our dataset.

d14 <- d %>%
  dplyr::slice_sample(n = 5)

d14

Extract random samples - with replace = TRUE

Because we set replace = TRUE, the same row can now be drawn more than once.

d15 <- d %>%
  dplyr::slice_sample(n = 5, replace = TRUE)

d15
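The difference is easiest to see on a tiny data frame. In this sketch, toy has only 3 rows and the seed value is arbitrary (both are assumptions for illustration). Without replacement each row can appear at most once; with replacement we can draw more rows than the data contains, so repeats must appear.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the draws can be repeated

# Hypothetical toy data with 3 rows.
toy <- data.frame(id = 1:3)

# Without replacement: each row appears at most once.
without_replacement <- toy %>% dplyr::slice_sample(n = 3)

# With replacement: 10 draws from 3 rows, so some rows repeat.
with_replacement <- toy %>% dplyr::slice_sample(n = 10, replace = TRUE)

anyDuplicated(without_replacement$id)   # 0 means no repeated rows
nrow(with_replacement)                  # 10
```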

Defining a column to give the weightage when drawing a random sample.

We are telling dplyr that we want to draw 5 random rows from our data. The replace option is TRUE (that means the same row can be drawn more than once). Lastly, we use the lifeExp column as weight_by, so rows with higher lifeExp values are more likely to be drawn.

d16 <- d %>%
  dplyr::slice_sample(weight_by = lifeExp, replace = TRUE, n = 5)

d16
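With only 5 draws from gapminder it is hard to see the weights at work. Here is a sketch with a made-up data frame where the weights are extreme (the labels, weights and seed are assumptions for illustration). A row with weight 0 is never drawn, and a row with weight 9 should be drawn far more often than a row with weight 1.

```r
library(dplyr)

set.seed(1)  # arbitrary seed so the draws can be repeated

# Hypothetical rows with very different sampling weights.
toy <- data.frame(
  id = c("never", "rare", "common"),
  w  = c(0, 1, 9)
)

draws <- toy %>%
  dplyr::slice_sample(n = 1000, weight_by = w, replace = TRUE)

# Count how often each row was drawn.
table(draws$id)
```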

Using arrange to sort the data and then slicing it.

d17 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::arrange(dplyr::desc(year)) %>%
  dplyr::slice_head(n = 1)

d17
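The arrange and slice_head combination above picks the latest year for each country. As a sketch, the same rows can probably be obtained with slice_max on the year column, without sorting first. The toy data frame below is an assumption for illustration.

```r
library(dplyr)

# Hypothetical toy data: two countries with unsorted years.
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year    = c(2000, 2010, 2005, 1990, 1995),
  pop     = c(1, 3, 2, 10, 20)
)

# Approach from the tutorial: sort by year (descending), take first row per group.
latest_a <- toy %>%
  dplyr::group_by(country) %>%
  dplyr::arrange(dplyr::desc(year)) %>%
  dplyr::slice_head(n = 1) %>%
  dplyr::ungroup()

# Alternative: ask directly for the row with the maximum year per group.
latest_b <- toy %>%
  dplyr::group_by(country) %>%
  dplyr::slice_max(year, n = 1, with_ties = FALSE) %>%
  dplyr::ungroup()

latest_a
latest_b
```

Both give year 2010 for country A and 1995 for country B.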

Using group by and then slicing it.

d18 <- d %>%
  dplyr::group_by(year, country) %>%
  dplyr::slice_head(n=1)

d18

We want to get one row for each country where the population was maximum.

d19 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::slice_max(pop, n = 1)

d19

We want to get one row for each country where the population was minimum.

d20 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::slice_min(pop, n = 1)

d20

Using group by and then slicing it.

d21 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::slice(1)

d21

Using proportions to pick some percentage of the rows.

In the following example we have grouped the data by country and then asked for the top 10% of rows for each country.

d22 <- d %>%
  dplyr::group_by(country) %>%
  dplyr::slice_head(prop = 0.1)

d22
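One thing to watch with prop is that the number of rows per group is rounded down. Each country in gapminder has 12 rows, so 10% of 12 is 1.2, which gives 1 row per country. The sketch below shows this on a made-up data frame (toy is an assumption for illustration) where one group is so small that 10% rounds down to zero rows.

```r
library(dplyr)

# Hypothetical toy data: group "a" has 12 rows, group "b" has 5 rows.
toy <- data.frame(
  g  = rep(c("a", "b"), times = c(12, 5)),
  id = 1:17
)

top_tenth <- toy %>%
  dplyr::group_by(g) %>%
  dplyr::slice_head(prop = 0.1) %>%
  dplyr::ungroup()

# Group "a": 12 * 0.1 = 1.2, rounded down to 1 row.
# Group "b":  5 * 0.1 = 0.5, rounded down to 0 rows, so "b" disappears.
top_tenth
```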

Select data above mean

We first calculate the mean of the life expectancy column (lifeExp) over the whole dataset. Then we filter the rows to those above the mean, and finally pick the first record only.

# Use a descriptive name so the base function mean() is not shadowed.
mean_life_exp <- mean(d$lifeExp)
mean_life_exp

## [1] 59.47444

d24 <- d %>%
  dplyr::filter(lifeExp > mean_life_exp) %>%
  dplyr::slice(1)

d24
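The same idea can be tried on a tiny made-up data frame where the answer is easy to check by hand. The names toy and avg_value are assumptions for illustration. The second version computes the mean inside filter, so no separate variable is needed.

```r
library(dplyr)

# Hypothetical toy data with a mean of 30.
toy <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))

# Version 1: store the mean first, then filter and slice.
avg_value <- mean(toy$value)
first_above <- toy %>%
  dplyr::filter(value > avg_value) %>%
  dplyr::slice(1)

# Version 2: compute the mean inside filter().
first_above_inline <- toy %>%
  dplyr::filter(value > mean(value)) %>%
  dplyr::slice(1)

first_above$id          # 4, the first row with value above 30
first_above_inline$id   # 4
```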

Watch our complete tutorial on all aspects of DPLYR.

https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR

DPLYR tutorial for SLICE command

techanswers88

https://www.youtube.com/channel/UCIHPu9hJFY4Rb86r5qhZk1g
