Explore our Data Analysis project and solved examples: RStudioDataLab
Imagine a world where data whispers its secrets, not in cryptic codes but in clear, concise narratives. Every variable is a crafted story, revealing the hidden pulse of your research. In this world, the power of analysis lies not in brute force calculations but in the delicate art of variable creation. So, tell me, data explorer: are you ready to become a storyteller, wielding the power of R to transform numbers into tales that captivate and inform?

The tidyverse and data.table packages are loaded; both provide functions for data manipulation and analysis. The seed for random number generation is set with set.seed() so that the results are reproducible. A data frame df is then created with 20 rows and seven variables: name, age, gender, grade, score, height (in cm), and weight (in kg). Each variable is populated with randomly generated data.
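The original code is not reproduced here, so the following is only a minimal sketch of how a similar 20-row data frame could be generated with base R's sample(). The seed value, the name pool (names_pool is a hypothetical helper), and the value ranges are assumptions chosen to resemble the printed output, so the values it produces will differ from those shown below.

```r
# Seed value 123 is an assumption; any fixed seed makes the draws reproducible
set.seed(123)

n <- 20  # number of rows, matching dim(df) in the output

# Hypothetical name pool, picked to resemble the printed data
names_pool <- c("Alice", "Bob", "Charlie", "David", "Eve",
                "Frank", "Grace", "Henry", "Ivy", "Jack")

# Value ranges below are assumptions that roughly match the printed summary
df <- data.frame(
  name   = sample(names_pool, n, replace = TRUE),
  age    = sample(10:18, n, replace = TRUE),
  gender = sample(c("M", "F"), n, replace = TRUE),
  grade  = sample(6:8, n, replace = TRUE),
  score  = sample(50:100, n, replace = TRUE),
  height = sample(140:180, n, replace = TRUE),  # centimetres
  weight = sample(40:80, n, replace = TRUE),    # kilograms
  stringsAsFactors = FALSE                      # keep text columns as character
)

head(df, 5)
```

Only base R is needed for this step; the tidyverse becomes relevant later for glimpse, mutate, and transmute.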
## name age gender grade score height weight
## 1 Charlie 18 F 8 95 161 45
## 2 Charlie 12 F 6 79 151 55
## 3 Jack 13 F 7 84 159 63
## 4 Bob 10 M 7 63 156 71
## 5 Frank 16 F 6 78 174 60
## [1] 20 7
## Rows: 20
## Columns: 7
## $ name <chr> "Charlie", "Charlie", "Jack", "Bob", "Frank", "Eve", "David", "…
## $ age <int> 18, 12, 13, 10, 16, 14, 16, 18, 18, 16, 14, 16, 14, 15, 18, 11,…
## $ gender <chr> "F", "F", "F", "M", "F", "F", "F", "F", "M", "M", "F", "M", "M"…
## $ grade <int> 8, 6, 7, 7, 6, 7, 6, 6, 7, 8, 8, 6, 7, 6, 7, 6, 8, 8, 7, 8
## $ score <int> 95, 79, 84, 63, 78, 81, 56, 52, 72, 64, 70, 86, 57, 100, 59, 99…
## $ height <int> 161, 151, 159, 156, 174, 179, 169, 154, 163, 162, 146, 168, 154…
## $ weight <int> 45, 55, 63, 71, 60, 50, 75, 58, 64, 78, 65, 48, 46, 73, 52, 58,…
## name age gender grade
## Length:20 Min. :10.00 Length:20 Min. :6.00
## Class :character 1st Qu.:12.75 Class :character 1st Qu.:6.00
## Mode :character Median :14.50 Mode :character Median :7.00
## Mean :14.55 Mean :6.95
## 3rd Qu.:16.25 3rd Qu.:8.00
## Max. :18.00 Max. :8.00
## score height weight
## Min. : 52.00 Min. :144.0 Min. :40.00
## 1st Qu.: 62.00 1st Qu.:154.0 1st Qu.:49.50
## Median : 78.50 Median :162.0 Median :59.00
## Mean : 76.05 Mean :161.3 Mean :60.05
## 3rd Qu.: 87.25 3rd Qu.:168.2 3rd Qu.:71.50
## Max. :100.00 Max. :179.0 Max. :79.00
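The head, dim, glimpse, and summary functions are used to display the first five rows, the dimensions (20 rows and 7 columns), a compact column-by-column overview, and descriptive statistics, respectively. glimpse comes from dplyr (loaded with the tidyverse); the other three are base R.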
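A minimal sketch of these inspection calls, assuming a small stand-in for df built from the five rows in the head() output above. It sticks to base R, using str() as the base counterpart of dplyr's glimpse().

```r
# Small stand-in for df: the first five rows printed in the article
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),  # cm
  weight = c(45L, 55L, 63L, 71L, 60L),       # kg
  stringsAsFactors = FALSE
)

head(df, 5)   # first five rows
dim(df)       # number of rows and columns
str(df)       # compact structure; dplyr::glimpse(df) gives a similar view
summary(df)   # per-column descriptive statistics
```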
## [1] 17.36044 24.12175 24.91990 29.17488 19.81768 15.60501 26.25958 24.45606
## [9] 24.08822 29.72108 30.49353 17.00680 19.39619 27.81588 19.10009 18.51320
## [17] 26.67487 20.17264 14.51589 38.09799
A new variable bmi is created with the assign function. It holds the Body Mass Index (BMI), calculated as weight in kilograms divided by the square of height in metres; because height is recorded in centimetres, it is divided by 100 first. The bmi variable is then printed.
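A minimal sketch of the assign approach, assuming a five-row stand-in for df taken from the head() output above. assign() takes the variable name as a string, which makes it handy when names are built programmatically; for a single fixed name, `bmi <- ...` does the same thing.

```r
# Five-row stand-in for df (only the columns needed here)
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  height = c(161, 151, 159, 156, 174),  # cm
  weight = c(45, 55, 63, 71, 60)        # kg
)

# Create a variable called "bmi" in the current environment;
# height is converted from cm to m before squaring
assign("bmi", df$weight / (df$height / 100)^2)

print(bmi)
```

For these five rows the values match the first five entries of the printed bmi vector (17.36044, 24.12175, and so on).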
## [1] 17.36044 24.12175 24.91990 29.17488 19.81768 15.60501 26.25958 24.45606
## [9] 24.08822 29.72108 30.49353 17.00680 19.39619 27.81588 19.10009 18.51320
## [17] 26.67487 20.17264 14.51589 38.09799
The attach function adds the data frame df to the R search path, so its columns can be referred to directly by name (weight instead of df$weight). The bmi variable is recalculated this way, df is then removed from the search path with detach, and bmi is printed.
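A minimal sketch of the attach/detach pattern, again assuming a five-row stand-in for df. The new bmi is created in the calling environment, not inside df, and detaching afterwards avoids leaving df's columns on the search path where they could mask other objects.

```r
# Five-row stand-in for df
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  height = c(161, 151, 159, 156, 174),  # cm
  weight = c(45, 55, 63, 71, 60)        # kg
)

attach(df)                        # put df's columns on the search path
bmi <- weight / (height / 100)^2  # columns used by bare name
detach(df)                        # remove df from the search path again

print(bmi)
```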
within Function
## name age gender grade score height weight bmi
## 1 Charlie 18 F 8 95 161 45 17.36044
## 2 Charlie 12 F 6 79 151 55 24.12175
## 3 Jack 13 F 7 84 159 63 24.91990
## 4 Bob 10 M 7 63 156 71 29.17488
## 5 Frank 16 F 6 78 174 60 19.81768
The within function creates a new bmi
variable within the data frame df. The top five rows of the
updated data frame are then printed.
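A minimal sketch of within(), assuming a five-row stand-in for df. The expression is evaluated with the columns in scope, and any variable assigned inside it is added to the returned data frame, so the result has to be assigned back to df.

```r
# Five-row stand-in for df
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  height = c(161, 151, 159, 156, 174),  # cm
  weight = c(45, 55, 63, 71, 60)        # kg
)

# Assignments inside within() become new columns of the returned copy
df <- within(df, bmi <- weight / (height / 100)^2)

head(df, 5)
```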
transform Function
## name age gender grade score height weight bmi
## 1 Charlie 18 F 8 95 161 45 17.36044
## 2 Charlie 12 F 6 79 151 55 24.12175
## 3 Jack 13 F 7 84 159 63 24.91990
## 4 Bob 10 M 7 63 156 71 29.17488
## 5 Frank 16 F 6 78 174 60 19.81768
The transform function is used to create a new
bmi variable within the data frame df. The top
five rows of the updated data frame are then printed.
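A minimal sketch of transform(), assuming the same five-row stand-in. Unlike within(), new columns are given as name = expression arguments, much like dplyr's mutate.

```r
# Five-row stand-in for df
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  height = c(161, 151, 159, 156, 174),  # cm
  weight = c(45, 55, 63, 71, 60)        # kg
)

# Each name = expression argument adds (or replaces) a column
df <- transform(df, bmi = weight / (height / 100)^2)

head(df, 5)
```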
ifelse Function
## name age gender grade score height weight bmi pass
## 1 Charlie 18 F 8 95 161 45 17.36044 Pass
## 2 Charlie 12 F 6 79 151 55 24.12175 Pass
## 3 Jack 13 F 7 84 159 63 24.91990 Pass
## 4 Bob 10 M 7 63 156 71 29.17488 Pass
## 5 Frank 16 F 6 78 174 60 19.81768 Pass
The ifelse function is used to create a new
pass variable within the data frame df. This
variable indicates whether the score is a pass (>= 60) or fail (<
60). The top five rows of the updated data frame are then printed.
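A minimal sketch of the ifelse step, assuming a five-row stand-in for df and the labels "Pass" and "Fail" (only "Pass" appears in the printed rows, so "Fail" is an assumption). ifelse() is vectorised: it checks the condition element by element and picks the matching label.

```r
# Five-row stand-in for df
df <- data.frame(
  name  = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  score = c(95L, 79L, 84L, 63L, 78L)
)

# Pass mark of 60 comes from the text; the "Fail" label is assumed
df$pass <- ifelse(df$score >= 60, "Pass", "Fail")

# A below-threshold score would get the other label:
# ifelse(c(52, 60) >= 60, "Pass", "Fail") gives c("Fail", "Pass")

head(df, 5)
```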
mutate Function
## name age gender grade score height weight bmi pass
## 1 Charlie 18 F 8 95 161 45 17.36044 Pass
## 2 Charlie 12 F 6 79 151 55 24.12175 Pass
## 3 Jack 13 F 7 84 159 63 24.91990 Pass
## 4 Bob 10 M 7 63 156 71 29.17488 Pass
## 5 Frank 16 F 6 78 174 60 19.81768 Pass
The mutate function from the dplyr package
(part of tidyverse) is used to create a new
bmi variable within the data frame df. The top
five rows of the updated data frame are then printed.
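A minimal sketch with dplyr's mutate(), assuming dplyr is installed and using the five-row stand-in for df. Computing bmi and pass in one call is an illustrative choice; mutate() can create several columns at once and keeps all existing ones.

```r
library(dplyr)

# Five-row stand-in for df
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161, 151, 159, 156, 174),  # cm
  weight = c(45, 55, 63, 71, 60)        # kg
)

df <- df %>%
  mutate(
    bmi  = weight / (height / 100)^2,           # height converted cm -> m
    pass = ifelse(score >= 60, "Pass", "Fail")  # "Fail" label is assumed
  )

head(df, 5)
```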
transmute Function
## name age gender grade score bmi
## 1 Charlie 18 F 8 95 17.36044
## 2 Charlie 12 F 6 79 24.12175
## 3 Jack 13 F 7 84 24.91990
## 4 Bob 10 M 7 63 29.17488
## 5 Frank 16 F 6 78 19.81768
The transmute function from the dplyr
package is used to create a new data frame with selected variables and a
new bmi variable. The top five rows of the updated data
frame are then printed.
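A minimal sketch with dplyr's transmute(), assuming dplyr is installed and the five-row stand-in for df. Bare column names keep those columns, named expressions add new ones, and every other column is dropped, which gives the name, age, gender, grade, score, bmi layout printed above.

```r
library(dplyr)

# Five-row stand-in for df
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),  # cm
  weight = c(45L, 55L, 63L, 71L, 60L)        # kg
)

# Keep five existing columns, add bmi, drop height and weight
df_bmi <- df %>%
  transmute(name, age, gender, grade, score,
            bmi = weight / (height / 100)^2)

head(df_bmi, 5)
```

The original df is left unchanged. Recent dplyr versions label transmute() as superseded in favour of mutate(.keep = "none"), though it still works.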
This code walks through several ways to create and modify variables in an R data frame: the base R functions assign, attach, within, transform, and ifelse, and the mutate and transmute verbs from dplyr, which is part of the tidyverse.