Create New Variables in R with dplyr

In data analysis, R stands out as a powerful tool. In this article, we cover the basics of data manipulation with dplyr: creating, modifying, and removing variables in a data frame.

Understanding the Code

Let’s start by breaking down the provided R code, step by step, to ensure a clear understanding.

Setting Up the Data Frame

Creating the Data Frame

We create a data frame, df, with 10 rows and 4 columns (name, age, gender, and score), containing information about individuals. Its first five rows are shown below.

##      name age gender score
## 1   Alice  25      F    85
## 2     Bob  32      M    76
## 3 Charlie  28      M    92
## 4   David  24      M    81
## 5     Eve  27      F    88
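A minimal sketch of how such a data frame could be built is shown here. The values are taken from the outputs printed in this article; the exact call used originally is not shown, so the construction itself is an assumption.

```r
# Build the example data frame from the values printed in this article.
# stringsAsFactors = FALSE keeps name and gender as character columns
# (this is already the default from R 4.0.0 onward).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

dim(df)  # number of rows, then number of columns
```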

Exploring the Data

The head function gives a quick overview of the first five rows of the data frame, as printed above. The full data frame is listed below for reference.

A data frame with 10 rows and 4 columns

##       name age gender score
## 1    Alice  25      F    85
## 2      Bob  32      M    76
## 3  Charlie  28      M    92
## 4    David  24      M    81
## 5      Eve  27      F    88
## 6    Frank  29      M    79
## 7    Grace  31      F    94
## 8    Harry  26      M    83
## 9      Ivy  30      F    90
## 10    Jack  33      M    86
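As a rough illustration of the head call described in this section, the sketch below rebuilds the data frame and takes its first five rows. The name first_five is only illustrative. Note that head returns six rows by default, so n = 5 has to be passed explicitly.

```r
# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# head() defaults to n = 6; ask for five rows explicitly.
first_five <- head(df, n = 5)
print(first_five)
```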

Adding Variables

Introducing Grades

We introduce a new variable, ‘grade,’ categorizing individuals based on their scores.

##       name age gender score grade
## 1    Alice  25      F    85     B
## 2      Bob  32      M    76     C
## 3  Charlie  28      M    92     A
## 4    David  24      M    81     B
## 5      Eve  27      F    88     B
## 6    Frank  29      M    79     C
## 7    Grace  31      F    94     A
## 8    Harry  26      M    83     B
## 9      Ivy  30      F    90     A
## 10    Jack  33      M    86     B
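One way to produce a grade column like the one above is mutate combined with case_when. The cutoffs used here (90 and above for A, 80 and above for B, otherwise C) are an assumption inferred from the printed output, not thresholds stated in the article, and df_grade is an illustrative name.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# case_when() checks the conditions in order and uses the first match.
# The cutoffs are assumed; they reproduce the grades shown above.
df_grade <- df %>%
  mutate(grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE        ~ "C"
  ))

print(df_grade)
```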

Age Modification

The ‘age’ variable undergoes a transformation, incrementing each individual’s age by one.

##       name age gender score
## 1    Alice  26      F    85
## 2      Bob  33      M    76
## 3  Charlie  29      M    92
## 4    David  25      M    81
## 5      Eve  28      F    88
## 6    Frank  30      M    79
## 7    Grace  32      F    94
## 8    Harry  27      M    83
## 9      Ivy  31      F    90
## 10    Jack  34      M    86
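A sketch of this step: mutate can overwrite an existing column by reusing its name on the left-hand side. The name df_older is illustrative only.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Reusing the column name replaces the column instead of adding a new one.
df_older <- df %>%
  mutate(age = age + 1)

print(df_older)
```

Because the result is assigned to a new object, the original df keeps the unmodified ages.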

Introducing Height and Weight

Two new variables, ‘height’ and ‘weight,’ are created with random values within specified ranges.

##       name age gender score   height   weight
## 1    Alice  25      F    85 159.0612 72.20515
## 2      Bob  32      M    76 174.5540 63.71620
## 3  Charlie  28      M    92 198.6179 75.73870
## 4    David  24      M    81 156.6031 73.91578
## 5      Eve  27      F    88 167.1612 97.90217
## 6    Frank  29      M    79 193.2816 54.55027
## 7    Grace  31      F    94 177.4369 98.49600
## 8    Harry  26      M    83 187.4201 97.41685
## 9      Ivy  30      F    90 191.7643 54.76353
## 10    Jack  33      M    86 178.5300 85.79550
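The ranges are not stated in the article, so the sketch below assumes uniform draws between 150 and 200 for height and between 50 and 100 for weight, which is consistent with the printed values. The seed and the name df_body are also assumptions; with a different seed the numbers will differ from those shown above.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

set.seed(123)  # assumed seed, only to make the draws reproducible

# n() gives the number of rows, so one random value is drawn per row.
# Both ranges are assumptions inferred from the output above.
df_body <- df %>%
  mutate(
    height = runif(n(), min = 150, max = 200),
    weight = runif(n(), min = 50, max = 100)
  )

print(df_body)
```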

Variable Deletion

The ‘score’ variable is removed from the data frame.

##       name age gender
## 1    Alice  25      F
## 2      Bob  32      M
## 3  Charlie  28      M
## 4    David  24      M
## 5      Eve  27      F
## 6    Frank  29      M
## 7    Grace  31      F
## 8    Harry  26      M
## 9      Ivy  30      F
## 10    Jack  33      M
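Dropping a column can be sketched with select and a minus sign in front of the column name. The name df_no_score is illustrative.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# A leading minus keeps every column except the named one.
df_no_score <- df %>%
  select(-score)

print(df_no_score)
```

An alternative with the same effect is mutate(score = NULL).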

Rounding Numeric Variables

The numeric variables in the data frame are selected and rounded using the mutate_all function, so the output contains only the numeric columns, age and score.

##    age score
## 1   25    85
## 2   32    76
## 3   28    92
## 4   24    81
## 5   27    88
## 6   29    79
## 7   31    94
## 8   26    83
## 9   30    90
## 10  33    86
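The sketch below shows one plausible form of this step. Since the output holds only age and score, it assumes the numeric columns were selected first. mutate_all still works but is superseded in dplyr 1.0.0 and later, so the across form is shown next to it; the names rounded_all and rounded_across are illustrative.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Older style: keep the numeric columns, then round every column.
rounded_all <- df %>%
  select_if(is.numeric) %>%
  mutate_all(round)

# Current style (dplyr >= 1.0.0): the same result with where() and across().
rounded_across <- df %>%
  select(where(is.numeric)) %>%
  mutate(across(everything(), round))

print(rounded_across)
```

The example values are already whole numbers, so rounding leaves them unchanged; the effect becomes visible on columns such as the random height and weight.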

Text Transformation

The ‘gender’ variable is converted to uppercase, enhancing uniformity.

##       name age gender score
## 1    Alice  25      F    85
## 2      Bob  32      M    76
## 3  Charlie  28      M    92
## 4    David  24      M    81
## 5      Eve  27      F    88
## 6    Frank  29      M    79
## 7    Grace  31      F    94
## 8    Harry  26      M    83
## 9      Ivy  30      F    90
## 10    Jack  33      M    86
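A short sketch of the conversion, using base R's toupper inside mutate. The name df_upper is illustrative.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# toupper() converts every character in the column to upper case.
df_upper <- df %>%
  mutate(gender = toupper(gender))

print(df_upper)
```

The codes in this data set are already upper case, so the output looks the same as the input; the step matters when the raw data mixes values such as "f" and "F".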

Ranking and Pass/Fail Classification

Ranking Individuals

We introduce a new variable, ‘rank,’ by grouping the data by gender and assigning ranks based on scores within each group, so that the highest score in each group receives rank 1.

## # A tibble: 10 × 5
##    name      age gender score  rank
##    <chr>   <dbl> <chr>  <dbl> <dbl>
##  1 Alice      25 F         85     4
##  2 Bob        32 M         76     6
##  3 Charlie    28 M         92     1
##  4 David      24 M         81     4
##  5 Eve        27 F         88     3
##  6 Frank      29 M         79     5
##  7 Grace      31 F         94     1
##  8 Harry      26 M         83     3
##  9 Ivy        30 F         90     2
## 10 Jack       33 M         86     2
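One way to obtain these ranks is group_by followed by mutate with rank applied to the scores in descending order. The exact call behind the output is not shown in the article, so this is a sketch that reproduces the printed ranks; df_rank is an illustrative name.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Within each gender, rank the scores from highest (1) to lowest.
# ungroup() removes the grouping so later steps act on the whole table.
df_rank <- df %>%
  group_by(gender) %>%
  mutate(rank = rank(desc(score))) %>%
  ungroup()

print(df_rank)
```

Grouping turns the data frame into a tibble, which is why the output above is printed with column types.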

Pass/Fail Classification

A ‘pass’ variable is introduced, categorizing individuals as ‘Yes’ or ‘No’ based on their scores.

##       name age gender score pass
## 1    Alice  25      F    85  Yes
## 2      Bob  32      M    76   No
## 3  Charlie  28      M    92  Yes
## 4    David  24      M    81  Yes
## 5      Eve  27      F    88  Yes
## 6    Frank  29      M    79   No
## 7    Grace  31      F    94  Yes
## 8    Harry  26      M    83  Yes
## 9      Ivy  30      F    90  Yes
## 10    Jack  33      M    86  Yes
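The pass mark is not stated in the article. The sketch below assumes a cutoff of 80, which matches the output (81 passes, 79 does not), and uses if_else for the two-way split. The name df_pass is illustrative.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

pass_mark <- 80  # assumed cutoff, inferred from the output above

# if_else() returns "Yes" where the condition holds and "No" elsewhere.
df_pass <- df %>%
  mutate(pass = if_else(score >= pass_mark, "Yes", "No"))

print(df_pass)
```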

Body Mass Index (BMI) Calculation

BMI Calculation

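We generate dummy data for weight and height, creating a new data frame, ‘df1,’ with 100 rows. The BMI, weight in kilograms divided by the square of height in meters, is calculated and added as a new variable, ‘bmi.’ Height is stored in centimeters here, so it is divided by 100 first.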

## # A tibble: 100 × 3
##    weight height   bmi
##     <dbl>  <dbl> <dbl>
##  1   64.4   174.  21.3
##  2   89.4   163.  33.5
##  3   70.4   170.  24.5
##  4   94.2   188.  26.6
##  5   97.0   169.  33.8
##  6   52.3   186.  15.2
##  7   76.4   187.  21.9
##  8   94.6   174.  31.1
##  9   77.6   166.  28.0
## 10   72.8   156.  30.0
## # ℹ 90 more rows
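A sketch of this calculation follows. The sampling ranges and the seed are assumptions, so the generated numbers will not match the output above, but the formula can be checked against it: for the first printed row, 64.4 / 1.74^2 is roughly 21.3.

```r
library(dplyr)

set.seed(42)  # assumed seed, only to make the draws reproducible

# Dummy data: weight in kilograms and height in centimeters.
# Both ranges are assumptions chosen to resemble the printed values.
df1 <- tibble(
  weight = runif(100, min = 50, max = 100),
  height = runif(100, min = 150, max = 200)
) %>%
  # BMI = weight (kg) / height (m)^2; convert centimeters to meters first.
  mutate(bmi = weight / (height / 100)^2)

print(df1)
```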

Log Transformation

A new variable, ‘log_score,’ is introduced by taking the natural logarithm of the ‘score’ variable.

##       name age gender score log_score
## 1    Alice  25      F    85  4.442651
## 2      Bob  32      M    76  4.330733
## 3  Charlie  28      M    92  4.521789
## 4    David  24      M    81  4.394449
## 5      Eve  27      F    88  4.477337
## 6    Frank  29      M    79  4.369448
## 7    Grace  31      F    94  4.543295
## 8    Harry  26      M    83  4.418841
## 9      Ivy  30      F    90  4.499810
## 10    Jack  33      M    86  4.454347
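A brief sketch of the transformation. In R, log computes the natural logarithm unless a base is given, which agrees with the values above (the log of 85 is about 4.4427). The name df_log is illustrative.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# log() uses base e by default; log10() or log(x, base = 2) change the base.
df_log <- df %>%
  mutate(log_score = log(score))

print(df_log)
```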

Unique Identifier

An ‘id’ variable is created by combining the ‘name’ and ‘age’ variables.

##       name age gender score         id
## 1    Alice  25      F    85   Alice_25
## 2      Bob  32      M    76     Bob_32
## 3  Charlie  28      M    92 Charlie_28
## 4    David  24      M    81   David_24
## 5      Eve  27      F    88     Eve_27
## 6    Frank  29      M    79   Frank_29
## 7    Grace  31      F    94   Grace_31
## 8    Harry  26      M    83   Harry_26
## 9      Ivy  30      F    90     Ivy_30
## 10    Jack  33      M    86    Jack_33
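The identifier can be sketched with paste and an underscore separator, which matches the format shown above. The name df_id is illustrative.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# paste() converts age to text and joins it to name, row by row.
df_id <- df %>%
  mutate(id = paste(name, age, sep = "_"))

print(df_id)
```

Such an identifier is unique only as long as no two rows share the same name and age, which holds for this data set.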

Data Structure

We inspect the structure of the ‘df’ data frame using the glimpse function.

## Rows: 10
## Columns: 4
## $ name   <chr> "Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "H…
## $ age    <dbl> 25, 32, 28, 24, 27, 29, 31, 26, 30, 33
## $ gender <chr> "F", "M", "M", "M", "F", "M", "F", "M", "F", "M"
## $ score  <dbl> 85, 76, 92, 81, 88, 79, 94, 83, 90, 86
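For completeness, a sketch of the call that would print a summary like the one above. glimpse lists one column per line with its type and first values.

```r
library(dplyr)

# Example data, rebuilt from the values printed in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Prints the row count, the column count, and each column's type and values.
glimpse(df)
```

The summary still shows four columns because each step in this article works on a copy and leaves df itself unchanged.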