Imagine a world where data whispers its secrets, not in cryptic codes but in clear, concise narratives. Every variable is a crafted story, revealing the hidden pulse of your research. In this world, the power of analysis lies not in brute force calculations but in the delicate art of variable creation. So, tell me, data explorer: are you ready to become a storyteller, wielding the power of R to transform numbers into tales that captivate and inform?
The tidyverse and data.table packages are loaded. These packages provide functions for data manipulation and analysis.
The seed for random number generation is set to ensure reproducibility of results.
A data frame df is created with seven variables: name, age, gender, grade, score, height, and weight. Each variable is populated with randomly generated data.
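The setup code itself is not shown above, so the following is only a minimal sketch of what these three steps might look like. The seed value, the pool of names, and the value ranges are assumptions inferred from the printed output, so it will not reproduce the exact rows shown below.

```r
# Load the two packages named in the text
library(tidyverse)
library(data.table)

# Fix the random seed so the same data is drawn on every run
set.seed(123)  # assumption: the article's actual seed is not shown

n <- 20  # the printed output reports 20 rows

# Assumption: the name pool and numeric ranges are guesses based on
# the printed summary, not the article's actual code
df <- data.frame(
  name   = sample(c("Bob", "Charlie", "David", "Eve", "Frank", "Jack"),
                  n, replace = TRUE),
  age    = sample(10:18, n, replace = TRUE),
  gender = sample(c("M", "F"), n, replace = TRUE),
  grade  = sample(6:8, n, replace = TRUE),
  score  = sample(50:100, n, replace = TRUE),
  height = sample(140:180, n, replace = TRUE),  # centimetres
  weight = sample(40:80, n, replace = TRUE)     # kilograms
)
```

Because set.seed is called before any sampling, rerunning the whole script yields the same df each time.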
## name age gender grade score height weight
## 1 Charlie 18 F 8 95 161 45
## 2 Charlie 12 F 6 79 151 55
## 3 Jack 13 F 7 84 159 63
## 4 Bob 10 M 7 63 156 71
## 5 Frank 16 F 6 78 174 60
## [1] 20 7
## Rows: 20
## Columns: 7
## $ name <chr> "Charlie", "Charlie", "Jack", "Bob", "Frank", "Eve", "David", "…
## $ age <int> 18, 12, 13, 10, 16, 14, 16, 18, 18, 16, 14, 16, 14, 15, 18, 11,…
## $ gender <chr> "F", "F", "F", "M", "F", "F", "F", "F", "M", "M", "F", "M", "M"…
## $ grade <int> 8, 6, 7, 7, 6, 7, 6, 6, 7, 8, 8, 6, 7, 6, 7, 6, 8, 8, 7, 8
## $ score <int> 95, 79, 84, 63, 78, 81, 56, 52, 72, 64, 70, 86, 57, 100, 59, 99…
## $ height <int> 161, 151, 159, 156, 174, 179, 169, 154, 163, 162, 146, 168, 154…
## $ weight <int> 45, 55, 63, 71, 60, 50, 75, 58, 64, 78, 65, 48, 46, 73, 52, 58,…
## name age gender grade
## Length:20 Min. :10.00 Length:20 Min. :6.00
## Class :character 1st Qu.:12.75 Class :character 1st Qu.:6.00
## Mode :character Median :14.50 Mode :character Median :7.00
## Mean :14.55 Mean :6.95
## 3rd Qu.:16.25 3rd Qu.:8.00
## Max. :18.00 Max. :8.00
## score height weight
## Min. : 52.00 Min. :144.0 Min. :40.00
## 1st Qu.: 62.00 1st Qu.:154.0 1st Qu.:49.50
## Median : 78.50 Median :162.0 Median :59.00
## Mean : 76.05 Mean :161.3 Mean :60.05
## 3rd Qu.: 87.25 3rd Qu.:168.2 3rd Qu.:71.50
## Max. :100.00 Max. :179.0 Max. :79.00
The head, dim, glimpse, and summary functions are used to view the data, its dimensions, a concise summary, and descriptive statistics, respectively.
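As a rough illustration, the four calls could look like the sketch below. To stay self-contained it rebuilds only the first five rows printed above, so dim reports 5 rows here rather than the 20 in the full data.

```r
library(dplyr)  # provides glimpse()

# First five rows, copied from the printed output above
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

head(df, 5)   # first five rows
dim(df)       # number of rows and columns
glimpse(df)   # one line per column: type and leading values
summary(df)   # descriptive statistics for each column
```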
## [1] 17.36044 24.12175 24.91990 29.17488 19.81768 15.60501 26.25958 24.45606
## [9] 24.08822 29.72108 30.49353 17.00680 19.39619 27.81588 19.10009 18.51320
## [17] 26.67487 20.17264 14.51589 38.09799
A new variable bmi is created using the assign function. It represents the Body Mass Index (BMI), calculated as weight (kg) divided by height (m) squared. Because height is recorded in centimetres, it is converted to metres first. The bmi variable is then printed.
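A minimal sketch of this step, assuming the height column is in centimetres (which is consistent with the printed BMI values) and using only the first five rows shown earlier:

```r
# Height (cm) and weight (kg) for the first five rows of the printed data
df <- data.frame(
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# assign() creates a variable whose name is given as a string;
# here it lives in the current environment, not inside df
assign("bmi", df$weight / (df$height / 100)^2)

print(bmi)
```

Note that bmi is a standalone vector at this point; df itself is unchanged.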
## [1] 17.36044 24.12175 24.91990 29.17488 19.81768 15.60501 26.25958 24.45606
## [9] 24.08822 29.72108 30.49353 17.00680 19.39619 27.81588 19.10009 18.51320
## [17] 26.67487 20.17264 14.51589 38.09799
The attach function is used to add the data frame df to the R search path. This allows for the direct use of its variables. The bmi variable is recalculated, and the data frame df is detached from the search path. The bmi variable is then printed.
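The attach and detach sequence might be sketched as follows, again assuming height is in centimetres and using the first five printed rows:

```r
# Height (cm) and weight (kg) for the first five rows of the printed data
df <- data.frame(
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

attach(df)                         # columns become visible by bare name
bmi <- weight / (height / 100)^2   # no df$ prefix needed while attached
detach(df)                         # remove df from the search path again

print(bmi)
```

Calling detach promptly matters: while df is attached, a column name can silently collide with another object of the same name. Wrapping the calculation in with(df, ...) is a common alternative that avoids touching the search path.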
within Function
## name age gender grade score height weight bmi
## 1 Charlie 18 F 8 95 161 45 17.36044
## 2 Charlie 12 F 6 79 151 55 24.12175
## 3 Jack 13 F 7 84 159 63 24.91990
## 4 Bob 10 M 7 63 156 71 29.17488
## 5 Frank 16 F 6 78 174 60 19.81768
The within function creates a new bmi variable within the data frame df. The top five rows of the updated data frame are then printed.
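A short sketch of the within approach, rebuilt on the five rows printed above and assuming height is in centimetres:

```r
# First five rows, copied from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# within() evaluates the expression using df's columns and returns
# a modified copy, so the result must be assigned back to df
df <- within(df, bmi <- weight / (height / 100)^2)

head(df, 5)
```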
transform Function
## name age gender grade score height weight bmi
## 1 Charlie 18 F 8 95 161 45 17.36044
## 2 Charlie 12 F 6 79 151 55 24.12175
## 3 Jack 13 F 7 84 159 63 24.91990
## 4 Bob 10 M 7 63 156 71 29.17488
## 5 Frank 16 F 6 78 174 60 19.81768
The transform function is used to create a new bmi variable within the data frame df. The top five rows of the updated data frame are then printed.
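The equivalent call with transform could look like this sketch (same five rows, height assumed to be in centimetres):

```r
# First five rows, copied from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# transform() takes new columns as name = expression pairs
# and returns a modified copy of the data frame
df <- transform(df, bmi = weight / (height / 100)^2)

head(df, 5)
```

Compared with within, transform uses name = value arguments instead of an assignment expression, which tends to read more naturally when adding a single column.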
ifelse Function
## name age gender grade score height weight bmi pass
## 1 Charlie 18 F 8 95 161 45 17.36044 Pass
## 2 Charlie 12 F 6 79 151 55 24.12175 Pass
## 3 Jack 13 F 7 84 159 63 24.91990 Pass
## 4 Bob 10 M 7 63 156 71 29.17488 Pass
## 5 Frank 16 F 6 78 174 60 19.81768 Pass
The ifelse function is used to create a new pass variable within the data frame df. This variable indicates whether the score is a pass (>= 60) or fail (< 60). The top five rows of the updated data frame are then printed.
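A minimal sketch of the pass/fail rule, using the names and scores from the five printed rows. The last line checks the boundary with two illustrative scores that are not part of the article's data.

```r
# Names and scores for the first five rows of the printed data
df <- data.frame(
  name  = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  score = c(95L, 79L, 84L, 63L, 78L)
)

# ifelse() is vectorised: it tests every score and picks a label for each
df$pass <- ifelse(df$score >= 60, "Pass", "Fail")

head(df, 5)

# Boundary check with illustrative values: 59 fails, 60 passes
boundary <- ifelse(c(59, 60) >= 60, "Pass", "Fail")
```

All five rows score at least 60, which is why the printed output shows only "Pass".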
mutate Function
## name age gender grade score height weight bmi pass
## 1 Charlie 18 F 8 95 161 45 17.36044 Pass
## 2 Charlie 12 F 6 79 151 55 24.12175 Pass
## 3 Jack 13 F 7 84 159 63 24.91990 Pass
## 4 Bob 10 M 7 63 156 71 29.17488 Pass
## 5 Frank 16 F 6 78 174 60 19.81768 Pass
The mutate function from the dplyr package (part of tidyverse) is used to create a new bmi variable within the data frame df. The top five rows of the updated data frame are then printed.
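With dplyr, the same calculation might be written as in the sketch below (five printed rows, height assumed to be in centimetres):

```r
library(dplyr)

# First five rows, copied from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# mutate() adds (or overwrites) columns and keeps all existing ones
df <- df %>%
  mutate(bmi = weight / (height / 100)^2)

head(df, 5)
```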
transmute Function
## name age gender grade score bmi
## 1 Charlie 18 F 8 95 17.36044
## 2 Charlie 12 F 6 79 24.12175
## 3 Jack 13 F 7 84 24.91990
## 4 Bob 10 M 7 63 29.17488
## 5 Frank 16 F 6 78 19.81768
The transmute function from the dplyr package is used to create a new data frame that keeps only the listed variables (name, age, gender, grade, and score) plus a new bmi variable; height and weight are dropped. The top five rows of the new data frame are then printed.
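A sketch of the transmute call that would produce the column layout shown above. The result name df_new is hypothetical, and height is assumed to be in centimetres.

```r
library(dplyr)

# First five rows, copied from the printed output
df <- data.frame(
  name   = c("Charlie", "Charlie", "Jack", "Bob", "Frank"),
  age    = c(18L, 12L, 13L, 10L, 16L),
  gender = c("F", "F", "F", "M", "F"),
  grade  = c(8L, 6L, 7L, 7L, 6L),
  score  = c(95L, 79L, 84L, 63L, 78L),
  height = c(161L, 151L, 159L, 156L, 174L),
  weight = c(45L, 55L, 63L, 71L, 60L)
)

# transmute() keeps only the columns named in the call; height and
# weight are still usable in the bmi expression but are not returned
df_new <- df %>%
  transmute(name, age, gender, grade, score,
            bmi = weight / (height / 100)^2)

head(df_new, 5)
```

Unlike mutate, which appends to the existing columns, transmute returns only what is listed, so df itself still holds height and weight afterwards.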
This walkthrough demonstrates several ways to create and manipulate variables within a data frame in R: the base R functions assign, attach, within, transform, and ifelse, and the mutate and transmute functions from dplyr (part of the tidyverse).