Create New Variables in R with dplyr

Understanding the Code

Let’s start by breaking down the provided R code, step by step, to ensure a clear understanding.

Setting Up the Data Frame

Creating the Data Frame

We create a data frame with 10 rows and 4 columns (name, age, gender, score) containing information about individuals. The first five rows are shown below.

##      name age gender score
## 1   Alice  25      F    85
## 2     Bob  32      M    76
## 3 Charlie  28      M    92
## 4   David  24      M    81
## 5     Eve  27      F    88
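
A minimal sketch of how such a data frame could be built, using the values listed in this post. The object name `df` follows the text; the use of base R's `data.frame` (rather than `tibble`) is an assumption.

```r
library(dplyr)

# Values copied from the table shown in this post
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

print(df)
```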

Exploring the Data

The head function gives a quick overview of the first rows of the data frame (five rows in the output above). The full data frame is listed below.

A data frame with 10 rows and 4 columns
name	age	gender	score
Alice	25	F	85
Bob	32	M	76
Charlie	28	M	92
David	24	M	81
Eve	27	F	88
Frank	29	M	79
Grace	31	F	94
Harry	26	M	83
Ivy	30	F	90
Jack	33	M	86
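
As a rough illustration, the first rows and the dimensions can be inspected as follows. The data frame is rebuilt here so the snippet runs on its own; passing `5` to `head` is an assumption based on the five rows shown earlier.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

head(df, 5)  # first five rows
dim(df)      # number of rows and columns
```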

Adding Variables

Introducing Grades

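We introduce a new variable, ‘grade,’ categorizing individuals based on their scores. In the output, scores of 90 and above receive an A, scores in the 80s a B, and scores below 80 a C.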

##       name age gender score grade
## 1    Alice  25      F    85     B
## 2      Bob  32      M    76     C
## 3  Charlie  28      M    92     A
## 4    David  24      M    81     B
## 5      Eve  27      F    88     B
## 6    Frank  29      M    79     C
## 7    Grace  31      F    94     A
## 8    Harry  26      M    83     B
## 9      Ivy  30      F    90     A
## 10    Jack  33      M    86     B
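
One way to reproduce grades like these is `mutate` with `case_when`. The cutoffs (90 and 80) are assumptions inferred from the output above, not values stated in the post, and `df_graded` is a hypothetical name.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Assumed cutoffs: A for 90+, B for 80-89, C otherwise
df_graded <- df %>%
  mutate(grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE        ~ "C"
  ))

print(df_graded)
```

`case_when` evaluates its conditions in order, so the second branch only applies to scores below 90.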

Age Modification

The ‘age’ variable undergoes a transformation, incrementing each individual’s age by one.

##       name age gender score
## 1    Alice  26      F    85
## 2      Bob  33      M    76
## 3  Charlie  29      M    92
## 4    David  25      M    81
## 5      Eve  28      F    88
## 6    Frank  30      M    79
## 7    Grace  32      F    94
## 8    Harry  27      M    83
## 9      Ivy  31      F    90
## 10    Jack  34      M    86
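
A short sketch of this step: `mutate` overwrites an existing column when the new name matches it. `df_older` is a hypothetical name.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Reusing the name 'age' replaces the column instead of adding a new one
df_older <- df %>%
  mutate(age = age + 1)

print(df_older)
```

Because the result is assigned to a new object, `df` itself keeps the original ages.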

Introducing Height and Weight

Two new variables, ‘height’ and ‘weight,’ are created with random values within specified ranges.

##       name age gender score   height   weight
## 1    Alice  25      F    85 159.0612 72.20515
## 2      Bob  32      M    76 174.5540 63.71620
## 3  Charlie  28      M    92 198.6179 75.73870
## 4    David  24      M    81 156.6031 73.91578
## 5      Eve  27      F    88 167.1612 97.90217
## 6    Frank  29      M    79 193.2816 54.55027
## 7    Grace  31      F    94 177.4369 98.49600
## 8    Harry  26      M    83 187.4201 97.41685
## 9      Ivy  30      F    90 191.7643 54.76353
## 10    Jack  33      M    86 178.5300 85.79550
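
A sketch of how random columns like these might be generated with `runif`. The ranges (150 to 200 for height, 50 to 100 for weight) and the seed are assumptions chosen to resemble the output above, so the exact values will differ.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

set.seed(123)  # assumed seed, only to make the sketch reproducible

# n() gives the number of rows, so one random value is drawn per row
df_body <- df %>%
  mutate(
    height = runif(n(), min = 150, max = 200),  # assumed range
    weight = runif(n(), min = 50,  max = 100)   # assumed range
  )

print(df_body)
```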

Variable Deletion

The ‘score’ variable is removed from the data frame.

##       name age gender
## 1    Alice  25      F
## 2      Bob  32      M
## 3  Charlie  28      M
## 4    David  24      M
## 5      Eve  27      F
## 6    Frank  29      M
## 7    Grace  31      F
## 8    Harry  26      M
## 9      Ivy  30      F
## 10    Jack  33      M
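
A column can be dropped with `select` and a minus sign, as in this sketch (`df_no_score` is a hypothetical name).

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# A leading minus keeps every column except the one named
df_no_score <- df %>%
  select(-score)

print(df_no_score)
```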

Rounding Numeric Variables

The numeric variables in the data frame (age and score) are selected and rounded using the mutate_all function, which current dplyr versions supersede with across.

##    age score
## 1   25    85
## 2   32    76
## 3   28    92
## 4   24    81
## 5   27    88
## 6   29    79
## 7   31    94
## 8   26    83
## 9   30    90
## 10  33    86
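
The output above keeps only the numeric columns, so a plausible reconstruction first selects them and then rounds. The selection step is an assumption; the post only names `mutate_all`.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Keep numeric columns only (assumed step), then round each of them
df_rounded <- df %>%
  select(where(is.numeric)) %>%
  mutate_all(round)

# Equivalent form with across(), the current replacement for mutate_all()
df_rounded2 <- df %>%
  select(where(is.numeric)) %>%
  mutate(across(everything(), round))

print(df_rounded)
```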

Text Transformation

The ‘gender’ variable is converted to uppercase, enhancing uniformity. The values in this data set are already uppercase, so the output is unchanged.

##       name age gender score
## 1    Alice  25      F    85
## 2      Bob  32      M    76
## 3  Charlie  28      M    92
## 4    David  24      M    81
## 5      Eve  27      F    88
## 6    Frank  29      M    79
## 7    Grace  31      F    94
## 8    Harry  26      M    83
## 9      Ivy  30      F    90
## 10    Jack  33      M    86
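
A minimal sketch using base R's `toupper` inside `mutate` (`df_upper` is a hypothetical name).

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# toupper() would turn "f" into "F"; values already in uppercase stay as they are
df_upper <- df %>%
  mutate(gender = toupper(gender))

print(df_upper)
```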

Ranking and Pass/Fail Classification

Ranking Individuals

We introduce a new variable, ‘rank,’ by grouping the data by gender and assigning ranks based on scores within each group, with the highest score in each group receiving rank 1.

## # A tibble: 10 × 5
##    name      age gender score  rank
##    <chr>   <dbl> <chr>  <dbl> <dbl>
##  1 Alice      25 F         85     4
##  2 Bob        32 M         76     6
##  3 Charlie    28 M         92     1
##  4 David      24 M         81     4
##  5 Eve        27 F         88     3
##  6 Frank      29 M         79     5
##  7 Grace      31 F         94     1
##  8 Harry      26 M         83     3
##  9 Ivy        30 F         90     2
## 10 Jack       33 M         86     2
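
A sketch of a grouped ranking that matches the output above. Using base R's `rank` on the negated score is an assumption (it yields the `<dbl>` column type shown); `min_rank(desc(score))` would be a dplyr alternative returning integers.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Negating the score makes the highest score rank first within each gender
df_ranked <- df %>%
  group_by(gender) %>%
  mutate(rank = rank(-score)) %>%
  ungroup()  # drop the grouping so later steps act on all rows

print(df_ranked)
```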

Pass/Fail Classification

A ‘pass’ variable is introduced, categorizing individuals as ‘Yes’ or ‘No’ based on their scores. In the output, scores of 81 and above pass, while 76 and 79 do not.

##       name age gender score pass
## 1    Alice  25      F    85  Yes
## 2      Bob  32      M    76   No
## 3  Charlie  28      M    92  Yes
## 4    David  24      M    81  Yes
## 5      Eve  27      F    88  Yes
## 6    Frank  29      M    79   No
## 7    Grace  31      F    94  Yes
## 8    Harry  26      M    83  Yes
## 9      Ivy  30      F    90  Yes
## 10    Jack  33      M    86  Yes
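
A possible implementation with `if_else`. The pass mark of 80 is an assumption; the output only shows that it lies above 79 and no higher than 81.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

pass_mark <- 80  # assumed threshold, not stated in the post

df_pass <- df %>%
  mutate(pass = if_else(score >= pass_mark, "Yes", "No"))

print(df_pass)
```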

Body Mass Index (BMI) Calculation

BMI Calculation

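We generate dummy data for weight and height, creating a new data frame ‘df1’ with 100 rows. The BMI is calculated as weight divided by the square of height in metres and added as a new variable, ‘bmi.’ The height values appear to be in centimetres, so they are divided by 100 first.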

## # A tibble: 100 × 3
##    weight height   bmi
##     <dbl>  <dbl> <dbl>
##  1   64.4   174.  21.3
##  2   89.4   163.  33.5
##  3   70.4   170.  24.5
##  4   94.2   188.  26.6
##  5   97.0   169.  33.8
##  6   52.3   186.  15.2
##  7   76.4   187.  21.9
##  8   94.6   174.  31.1
##  9   77.6   166.  28.0
## 10   72.8   156.  30.0
## # ℹ 90 more rows
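
A sketch of the BMI step under stated assumptions: weights in kilograms drawn from 50 to 100, heights in centimetres drawn from 150 to 200, and an arbitrary seed. None of these are given in the post, so the values will not match the output above.

```r
library(dplyr)

set.seed(42)  # assumed seed, only for reproducibility

# Dummy data: 100 rows of weight (kg) and height (cm); ranges are assumptions
df1 <- tibble::tibble(
  weight = runif(100, min = 50,  max = 100),
  height = runif(100, min = 150, max = 200)
)

# BMI = weight in kg divided by squared height in metres
df1 <- df1 %>%
  mutate(bmi = weight / (height / 100)^2)

print(df1)
```

As a check against the first row of the output above, 64.4 / 1.74^2 is roughly 21.3.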

Log Transformation

A new variable, ‘log_score,’ is introduced by taking the natural logarithm of the ‘score’ variable.

##       name age gender score log_score
## 1    Alice  25      F    85  4.442651
## 2      Bob  32      M    76  4.330733
## 3  Charlie  28      M    92  4.521789
## 4    David  24      M    81  4.394449
## 5      Eve  27      F    88  4.477337
## 6    Frank  29      M    79  4.369448
## 7    Grace  31      F    94  4.543295
## 8    Harry  26      M    83  4.418841
## 9      Ivy  30      F    90  4.499810
## 10    Jack  33      M    86  4.454347
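
A short sketch: base R's `log` computes the natural logarithm by default, which agrees with the values above (`df_log` is a hypothetical name).

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# log() is base e by default; log10() or log2() would give other bases
df_log <- df %>%
  mutate(log_score = log(score))

print(df_log)
```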

Unique Identifier

An ‘id’ variable is created by joining the ‘name’ and ‘age’ variables with an underscore.

##       name age gender score         id
## 1    Alice  25      F    85   Alice_25
## 2      Bob  32      M    76     Bob_32
## 3  Charlie  28      M    92 Charlie_28
## 4    David  24      M    81   David_24
## 5      Eve  27      F    88     Eve_27
## 6    Frank  29      M    79   Frank_29
## 7    Grace  31      F    94   Grace_31
## 8    Harry  26      M    83   Harry_26
## 9      Ivy  30      F    90     Ivy_30
## 10    Jack  33      M    86    Jack_33
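
One way to build such an identifier is `paste` with an underscore separator, sketched below (`df_id` is a hypothetical name).

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# paste() converts age to text and joins it to name with "_"
df_id <- df %>%
  mutate(id = paste(name, age, sep = "_"))

print(df_id)
```

Such an id is only unique as long as no two people share both name and age.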

Data Structure

We inspect the structure of the ‘df’ data frame using the glimpse function.

## Rows: 10
## Columns: 4
## $ name   <chr> "Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "H…
## $ age    <dbl> 25, 32, 28, 24, 27, 29, 31, 26, 30, 33
## $ gender <chr> "F", "M", "M", "M", "F", "M", "F", "M", "F", "M"
## $ score  <dbl> 85, 76, 92, 81, 88, 79, 94, 83, 90, 86
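
For reference, a sketch of the inspection step. `glimpse` prints one line per column with its type and first values; the data frame is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Columns run down the page, so wide data frames stay readable
glimpse(df)
```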

data03.online

2023-12-20