
Imagine a world where data whispers its secrets, not in cryptic codes but in clear, concise narratives. Every variable is a crafted story, revealing the hidden pulse of your research. In this world, the power of analysis lies not in brute force calculations but in the delicate art of variable creation. So, tell me, data explorer: are you ready to become a storyteller, wielding the power of R to transform numbers into tales that captivate and inform?

Load the Packages

The tidyverse and data.table packages are loaded. These packages provide functions for data manipulation and analysis.
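Here is a minimal sketch of this step. It assumes both packages are already installed (for example via install.packages()):

```r
# Load the packages used in this walkthrough (assumes they are installed)
library(tidyverse)   # attaches dplyr, tibble, ggplot2 and related packages
library(data.table)  # enhanced data frames, loaded alongside tidyverse
```

Loading data.table after tidyverse masks a few same-named functions, such as between(), first(), and last() from dplyr. R prints a message listing them.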

Set the Seed

The seed for random number generation is set to ensure reproducibility of results.
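The sketch below shows what the seed provides: resetting it before each draw returns the same numbers twice. The seed value 123 is an arbitrary illustration because the article's actual seed is not shown.

```r
# Fix the seed (123 is an arbitrary, illustrative value) and draw five numbers
set.seed(123)
first_draw <- sample(1:100, 5)

# Reset to the same seed and draw again
set.seed(123)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE: same seed, same random numbers
```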

Generate the Sample Data Set

A data frame df is created with 20 rows and seven variables: name, age, gender, grade, score, height, and weight. Each variable is populated with randomly generated values.
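The article's generating code is not shown, so the following is a hypothetical reconstruction. Several details are assumptions: the seed, the list of names (taken from those visible in the output), and the value ranges (read off the summary statistics below). The values it produces will therefore not match the printout.

```r
# Hypothetical reconstruction: seed, names and ranges are assumptions
set.seed(123)
n <- 20  # 20 rows, as reported by dim() in the output

df <- data.frame(
  name   = sample(c("Bob", "Charlie", "David", "Eve", "Frank", "Jack"), n, replace = TRUE),
  age    = sample(10:18, n, replace = TRUE),     # ages 10 to 18
  gender = sample(c("F", "M"), n, replace = TRUE),
  grade  = sample(6:8, n, replace = TRUE),       # grades 6 to 8
  score  = sample(50:100, n, replace = TRUE),    # test scores
  height = sample(140:180, n, replace = TRUE),   # height in cm
  weight = sample(40:80, n, replace = TRUE)      # weight in kg
)
```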

View the Data

##      name age gender grade score height weight
## 1 Charlie  18      F     8    95    161     45
## 2 Charlie  12      F     6    79    151     55
## 3    Jack  13      F     7    84    159     63
## 4     Bob  10      M     7    63    156     71
## 5   Frank  16      F     6    78    174     60
## [1] 20  7
## Rows: 20
## Columns: 7
## $ name   <chr> "Charlie", "Charlie", "Jack", "Bob", "Frank", "Eve", "David", "…
## $ age    <int> 18, 12, 13, 10, 16, 14, 16, 18, 18, 16, 14, 16, 14, 15, 18, 11,…
## $ gender <chr> "F", "F", "F", "M", "F", "F", "F", "F", "M", "M", "F", "M", "M"…
## $ grade  <int> 8, 6, 7, 7, 6, 7, 6, 6, 7, 8, 8, 6, 7, 6, 7, 6, 8, 8, 7, 8
## $ score  <int> 95, 79, 84, 63, 78, 81, 56, 52, 72, 64, 70, 86, 57, 100, 59, 99…
## $ height <int> 161, 151, 159, 156, 174, 179, 169, 154, 163, 162, 146, 168, 154…
## $ weight <int> 45, 55, 63, 71, 60, 50, 75, 58, 64, 78, 65, 48, 46, 73, 52, 58,…
##      name                age           gender              grade     
##  Length:20          Min.   :10.00   Length:20          Min.   :6.00  
##  Class :character   1st Qu.:12.75   Class :character   1st Qu.:6.00  
##  Mode  :character   Median :14.50   Mode  :character   Median :7.00  
##                     Mean   :14.55                      Mean   :6.95  
##                     3rd Qu.:16.25                      3rd Qu.:8.00  
##                     Max.   :18.00                      Max.   :8.00  
##      score            height          weight     
##  Min.   : 52.00   Min.   :144.0   Min.   :40.00  
##  1st Qu.: 62.00   1st Qu.:154.0   1st Qu.:49.50  
##  Median : 78.50   Median :162.0   Median :59.00  
##  Mean   : 76.05   Mean   :161.3   Mean   :60.05  
##  3rd Qu.: 87.25   3rd Qu.:168.2   3rd Qu.:71.50  
##  Max.   :100.00   Max.   :179.0   Max.   :79.00

The head, dim, glimpse, and summary functions are used to view the data, its dimensions, a concise summary, and descriptive statistics, respectively.
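The viewing calls could be sketched as follows. To keep the block self-contained, it rebuilds df from the five rows printed above. As a result, dim() and summary() report 5 rows and different statistics from the 20-row output. glimpse() comes from dplyr, which tidyverse attaches.

```r
library(dplyr)  # provides glimpse()

# The first five rows from the output above, typed in by hand
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

head(df)     # first rows (six by default, so all five here)
dim(df)      # number of rows and columns: 5 7
glimpse(df)  # one line per column, with its type
summary(df)  # quartiles, mean, min and max for numeric columns
```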

Create a New Variable

##  [1] 17.36044 24.12175 24.91990 29.17488 19.81768 15.60501 26.25958 24.45606
##  [9] 24.08822 29.72108 30.49353 17.00680 19.39619 27.81588 19.10009 18.51320
## [17] 26.67487 20.17264 14.51589 38.09799

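A new variable, bmi, is created with the assign function. It holds each person's Body Mass Index (BMI), calculated as weight (kg) divided by the square of height (m); because height is recorded in centimeters, it is divided by 100 first. Note that assign creates a standalone vector in the workspace rather than a column of df. The bmi variable is then printed.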
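A sketch of the assign step, again using the five printed rows. The results match the first five BMI values shown above.

```r
# The first five rows from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# assign() binds a value to a name given as a string (here in the global
# environment); bmi is a separate vector, not a column of df.
# Height is in cm, so divide by 100 to get meters before squaring.
assign("bmi", df$weight / (df$height / 100)^2)
print(bmi)
```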

Attach the Data Frame

##  [1] 17.36044 24.12175 24.91990 29.17488 19.81768 15.60501 26.25958 24.45606
##  [9] 24.08822 29.72108 30.49353 17.00680 19.39619 27.81588 19.10009 18.51320
## [17] 26.67487 20.17264 14.51589 38.09799

The attach function adds the data frame df to the R search path, so its columns can be referenced by name (weight instead of df$weight). The bmi variable is recalculated this way, df is detached from the search path, and bmi is printed. The values are identical to those from the previous step.
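The attach workflow might look like this on the same five-row df:

```r
# The first five rows from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

attach(df)                         # put df's columns on the search path
bmi <- weight / (height / 100)^2   # weight and height resolve to df's columns
detach(df)                         # take df off the search path again
print(bmi)
```

attach() is convenient for quick exploration but is usually avoided in scripts. The global environment is searched first, so a global variable that shares a column's name (say, weight) silently hides the attached column.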

Create a New Variable Using the within Function

##      name age gender grade score height weight      bmi
## 1 Charlie  18      F     8    95    161     45 17.36044
## 2 Charlie  12      F     6    79    151     55 24.12175
## 3    Jack  13      F     7    84    159     63 24.91990
## 4     Bob  10      M     7    63    156     71 29.17488
## 5   Frank  16      F     6    78    174     60 19.81768

The within function evaluates the BMI calculation inside the data frame df and returns a modified copy, which is assigned back to df so that it gains a bmi column. The first five rows of the updated data frame are then printed.
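Here is a sketch of the within approach on the five printed rows. within() does not change df in place, so its result is assigned back to df:

```r
# The first five rows from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# The expression is evaluated with df's columns in scope; the new
# column is appended after the existing ones
df <- within(df, bmi <- weight / (height / 100)^2)
head(df, 5)
```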


Create a New Variable Using the transform Function

##      name age gender grade score height weight      bmi
## 1 Charlie  18      F     8    95    161     45 17.36044
## 2 Charlie  12      F     6    79    151     55 24.12175
## 3    Jack  13      F     7    84    159     63 24.91990
## 4     Bob  10      M     7    63    156     71 29.17488
## 5   Frank  16      F     6    78    174     60 19.81768

The transform function is used to create a new bmi variable within the data frame df. The top five rows of the updated data frame are then printed.
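transform() does the same job using name = expression arguments, as in this sketch on the five printed rows:

```r
# The first five rows from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# Each named argument becomes a column of the returned data frame
df <- transform(df, bmi = weight / (height / 100)^2)
head(df, 5)
```

transform() evaluates every argument against the original columns. A column created in one argument therefore cannot be referenced by a later argument in the same call. With within(), expressions are evaluated in sequence.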

Create a New Variable Using the ifelse Function

##      name age gender grade score height weight      bmi pass
## 1 Charlie  18      F     8    95    161     45 17.36044 Pass
## 2 Charlie  12      F     6    79    151     55 24.12175 Pass
## 3    Jack  13      F     7    84    159     63 24.91990 Pass
## 4     Bob  10      M     7    63    156     71 29.17488 Pass
## 5   Frank  16      F     6    78    174     60 19.81768 Pass

The ifelse function is used to create a new pass variable within the data frame df. This variable indicates whether the score is a pass (>= 60) or fail (< 60). The top five rows of the updated data frame are then printed.
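Here is a sketch of the pass/fail column. It first adds bmi so the columns line up with the output above. All five printed scores are at least 60, so every row gets "Pass".

```r
# The first five rows from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)
df$bmi <- df$weight / (df$height / 100)^2

# ifelse() is vectorized: it tests every score and picks "Pass" where
# score >= 60 is TRUE and "Fail" where it is FALSE
df$pass <- ifelse(df$score >= 60, "Pass", "Fail")
head(df, 5)
```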

Create a New Variable Using the mutate Function

##      name age gender grade score height weight      bmi pass
## 1 Charlie  18      F     8    95    161     45 17.36044 Pass
## 2 Charlie  12      F     6    79    151     55 24.12175 Pass
## 3    Jack  13      F     7    84    159     63 24.91990 Pass
## 4     Bob  10      M     7    63    156     71 29.17488 Pass
## 5   Frank  16      F     6    78    174     60 19.81768 Pass

The mutate function from the dplyr package (part of tidyverse) is used to create a new bmi variable within the data frame df. The top five rows of the updated data frame are then printed.
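The dplyr version can be sketched with the %>% pipe, which dplyr re-exports. This five-row df has no pass column, so the printed result has one column fewer than the output above. mutate() also returns a new data frame, so the result is assigned back:

```r
library(dplyr)

# The first five rows from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# mutate() adds (or overwrites) columns and keeps all existing ones
df <- df %>% mutate(bmi = weight / (height / 100)^2)
head(df, 5)
```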

Create a New Variable Using the transmute Function

##      name age gender grade score      bmi
## 1 Charlie  18      F     8    95 17.36044
## 2 Charlie  12      F     6    79 24.12175
## 3    Jack  13      F     7    84 24.91990
## 4     Bob  10      M     7    63 29.17488
## 5   Frank  16      F     6    78 19.81768

The transmute function from the dplyr package creates a new data frame that keeps only the listed variables (name, age, gender, grade, and score) plus a newly computed bmi variable; height, weight, and pass are dropped. The first five rows of the result are then printed.
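The following sketch uses transmute() to produce the six columns shown above. The name df_small is a hypothetical choice because the article's variable name is not shown.

```r
library(dplyr)

# The first five rows from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# transmute() keeps only the columns named in the call, in that order;
# height and weight are used to compute bmi but are not kept
df_small <- df %>%
  transmute(name, age, gender, grade, score,
            bmi = weight / (height / 100)^2)
head(df_small, 5)
```

In dplyr 1.1.0 and later, transmute() is marked as superseded in favor of mutate(.keep = "none"), though it continues to work.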

This walkthrough demonstrates several ways to create variables in an R data frame. It covers the base R functions assign, attach, within, transform, and ifelse, and the dplyr verbs mutate and transmute from the tidyverse.