In data analysis, R stands out as a powerful tool. In this article, we will cover the basics of data manipulation and visualization.
Let’s start by breaking down the provided R code, step by step, to ensure a clear understanding.
We create a data frame with 10 rows and 4 columns, containing information about individuals.
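The code for this step is not reproduced here, so the following is a minimal base R sketch of how such a data frame could be built and previewed. The values are copied from the rows listed in this article; the name `first_five` is only illustrative.

```r
# Build the example data frame; values copied from the rows shown in this article.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# Preview the first five rows (head() returns six rows by default).
first_five <- head(df, 5)
print(first_five)
```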
## name age gender score
## 1 Alice 25 F 85
## 2 Bob 32 M 76
## 3 Charlie 28 M 92
## 4 David 24 M 81
## 5 Eve 27 F 88
Using the `head` function, we get a quick overview of the first five rows of the data frame, shown above. The full data frame is listed below.
name | age | gender | score |
---|---|---|---|
Alice | 25 | F | 85 |
Bob | 32 | M | 76 |
Charlie | 28 | M | 92 |
David | 24 | M | 81 |
Eve | 27 | F | 88 |
Frank | 29 | M | 79 |
Grace | 31 | F | 94 |
Harry | 26 | M | 83 |
Ivy | 30 | F | 90 |
Jack | 33 | M | 86 |
We introduce a new variable, ‘grade,’ categorizing individuals based on their scores.
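One way to write this step, sketched with dplyr's `mutate` and `case_when`. The cut-offs (90 and 80) are an assumption inferred from the output; the article does not state them, and `df_graded` is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# Assumed cut-offs: 90 and above is "A", 80 to 89 is "B", anything lower is "C".
df_graded <- df %>%
  mutate(grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE        ~ "C"
  ))
print(df_graded)
```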
## name age gender score grade
## 1 Alice 25 F 85 B
## 2 Bob 32 M 76 C
## 3 Charlie 28 M 92 A
## 4 David 24 M 81 B
## 5 Eve 27 F 88 B
## 6 Frank 29 M 79 C
## 7 Grace 31 F 94 A
## 8 Harry 26 M 83 B
## 9 Ivy 30 F 90 A
## 10 Jack 33 M 86 B
The ‘age’ variable is transformed by adding one to each individual’s age.
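A short sketch of this transformation, assuming dplyr is loaded. The result is stored under the illustrative name `df_older`, so the original `df` keeps its ages for the later steps.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# Overwrite the age column with age + 1; all other columns are untouched.
df_older <- df %>%
  mutate(age = age + 1)
print(df_older)
```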
## name age gender score
## 1 Alice 26 F 85
## 2 Bob 33 M 76
## 3 Charlie 29 M 92
## 4 David 25 M 81
## 5 Eve 28 F 88
## 6 Frank 30 M 79
## 7 Grace 32 F 94
## 8 Harry 27 M 83
## 9 Ivy 31 F 90
## 10 Jack 34 M 86
Two new variables, ‘height’ and ‘weight,’ are created with random values within specified ranges.
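The exact ranges and seed are not given in the article. The sketch below assumes uniform draws of 150 to 200 for height and 50 to 100 for weight, which is consistent with the output, and uses a hypothetical seed, so the printed numbers will differ from those shown.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

set.seed(42)  # hypothetical seed, only to make the sketch reproducible

# n() is the number of rows, so each person gets one random draw per column.
df_body <- df %>%
  mutate(
    height = runif(n(), min = 150, max = 200),  # assumed range
    weight = runif(n(), min = 50,  max = 100)   # assumed range
  )
print(df_body)
```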
## name age gender score height weight
## 1 Alice 25 F 85 159.0612 72.20515
## 2 Bob 32 M 76 174.5540 63.71620
## 3 Charlie 28 M 92 198.6179 75.73870
## 4 David 24 M 81 156.6031 73.91578
## 5 Eve 27 F 88 167.1612 97.90217
## 6 Frank 29 M 79 193.2816 54.55027
## 7 Grace 31 F 94 177.4369 98.49600
## 8 Harry 26 M 83 187.4201 97.41685
## 9 Ivy 30 F 90 191.7643 54.76353
## 10 Jack 33 M 86 178.5300 85.79550
The ‘score’ variable is removed from the data frame.
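Dropping a column can be sketched with dplyr's `select` and a minus sign. The name `df_no_score` is illustrative.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# A leading minus tells select() to keep every column except score.
df_no_score <- df %>%
  select(-score)
print(df_no_score)
```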
## name age gender
## 1 Alice 25 F
## 2 Bob 32 M
## 3 Charlie 28 M
## 4 David 24 M
## 5 Eve 27 F
## 6 Frank 29 M
## 7 Grace 31 F
## 8 Harry 26 M
## 9 Ivy 30 F
## 10 Jack 33 M
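Numeric variables in the data frame are rounded using the `mutate_all` function. Because `mutate_all` applies a function to every column, the numeric columns are selected first, which is why only ‘age’ and ‘score’ appear in the output.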
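A possible version of this step is sketched below. It assumes the numeric columns are picked with `select(where(is.numeric))` before rounding, which matches the two-column output but is not confirmed by the article. Since `mutate_all` is superseded in current dplyr, an equivalent `across` form is included for comparison.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns, then round every remaining column.
df_rounded <- df %>%
  select(where(is.numeric)) %>%
  mutate_all(round)

# Same result with across(), the currently recommended idiom.
df_rounded_across <- df %>%
  select(where(is.numeric)) %>%
  mutate(across(everything(), round))

print(df_rounded)
```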
## age score
## 1 25 85
## 2 32 76
## 3 28 92
## 4 24 81
## 5 27 88
## 6 29 79
## 7 31 94
## 8 26 83
## 9 30 90
## 10 33 86
The ‘gender’ variable is converted to uppercase, enhancing uniformity.
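A minimal sketch using base R's `toupper` inside `mutate`. The values in this data set are already uppercase, so the output is unchanged; the step matters when the raw data mixes cases. `df_upper` is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# toupper() converts every character in the column to uppercase.
df_upper <- df %>%
  mutate(gender = toupper(gender))
print(df_upper)
```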
## name age gender score
## 1 Alice 25 F 85
## 2 Bob 32 M 76
## 3 Charlie 28 M 92
## 4 David 24 M 81
## 5 Eve 27 F 88
## 6 Frank 29 M 79
## 7 Grace 31 F 94
## 8 Harry 26 M 83
## 9 Ivy 30 F 90
## 10 Jack 33 M 86
We introduce a new variable, ‘rank,’ by grouping the data by gender and assigning ranks based on scores within each group.
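The ranks in the output run from the highest score (rank 1) downward within each gender. That behavior can be sketched with `group_by`, `rank`, and `desc`; the trailing `ungroup()` and the name `df_ranked` are assumptions.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# Rank scores within each gender; desc() makes the highest score rank 1.
df_ranked <- df %>%
  group_by(gender) %>%
  mutate(rank = rank(desc(score))) %>%
  ungroup()   # drop the grouping so later steps act on the whole table
print(df_ranked)
```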
## # A tibble: 10 × 5
## name age gender score rank
## <chr> <dbl> <chr> <dbl> <dbl>
## 1 Alice 25 F 85 4
## 2 Bob 32 M 76 6
## 3 Charlie 28 M 92 1
## 4 David 24 M 81 4
## 5 Eve 27 F 88 3
## 6 Frank 29 M 79 5
## 7 Grace 31 F 94 1
## 8 Harry 26 M 83 3
## 9 Ivy 30 F 90 2
## 10 Jack 33 M 86 2
A ‘pass’ variable is introduced, categorizing individuals as ‘Yes’ or ‘No’ based on their scores.
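The pass mark is not stated in the article. The sketch below assumes a threshold of 80, which reproduces the output (79 fails, 81 passes); any cut-off above 79 and up to 81 would do the same.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

pass_mark <- 80  # assumed threshold, not given in the article

# ifelse() is vectorized: it returns "Yes" or "No" for every row.
df_pass <- df %>%
  mutate(pass = ifelse(score >= pass_mark, "Yes", "No"))
print(df_pass)
```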
## name age gender score pass
## 1 Alice 25 F 85 Yes
## 2 Bob 32 M 76 No
## 3 Charlie 28 M 92 Yes
## 4 David 24 M 81 Yes
## 5 Eve 27 F 88 Yes
## 6 Frank 29 M 79 No
## 7 Grace 31 F 94 Yes
## 8 Harry 26 M 83 Yes
## 9 Ivy 30 F 90 Yes
## 10 Jack 33 M 86 Yes
We generate dummy data for weight and height in a new data frame, ‘df1’. BMI is then calculated and added as a new variable, ‘bmi’.
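The printed values are consistent with the standard formula, BMI = weight in kilograms divided by the square of height in meters, with height stored in centimeters (for example, 64.4 / 1.74² ≈ 21.3). A sketch under that reading follows; the sampling ranges and the seed are assumptions, so the numbers will not match the output exactly.

```r
library(dplyr)

set.seed(123)  # hypothetical seed, only for reproducibility of the sketch

# 100 rows of dummy data; ranges are assumed, not taken from the article.
df1 <- tibble(
  weight = runif(100, min = 50,  max = 100),  # kilograms
  height = runif(100, min = 150, max = 200)   # centimeters
) %>%
  # Convert height to meters before squaring.
  mutate(bmi = weight / (height / 100)^2)
print(df1)
```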
## # A tibble: 100 × 3
## weight height bmi
## <dbl> <dbl> <dbl>
## 1 64.4 174. 21.3
## 2 89.4 163. 33.5
## 3 70.4 170. 24.5
## 4 94.2 188. 26.6
## 5 97.0 169. 33.8
## 6 52.3 186. 15.2
## 7 76.4 187. 21.9
## 8 94.6 174. 31.1
## 9 77.6 166. 28.0
## 10 72.8 156. 30.0
## # ℹ 90 more rows
A new variable, ‘log_score,’ is introduced by taking the logarithm of the ‘score’ variable.
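The values in the output match the natural logarithm, which is what R's `log` computes by default. A minimal sketch, with `df_log` as an illustrative name:

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# log() with no base argument is the natural logarithm (base e).
df_log <- df %>%
  mutate(log_score = log(score))
print(df_log)
```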
## name age gender score log_score
## 1 Alice 25 F 85 4.442651
## 2 Bob 32 M 76 4.330733
## 3 Charlie 28 M 92 4.521789
## 4 David 24 M 81 4.394449
## 5 Eve 27 F 88 4.477337
## 6 Frank 29 M 79 4.369448
## 7 Grace 31 F 94 4.543295
## 8 Harry 26 M 83 4.418841
## 9 Ivy 30 F 90 4.499810
## 10 Jack 33 M 86 4.454347
An ‘id’ variable is created by combining the ‘name’ and ‘age’ variables.
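Combining the two columns with an underscore can be sketched with base R's `paste`. The separator is read off the output, and `df_id` is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# paste() converts age to text and joins it to name with "_" in between.
df_id <- df %>%
  mutate(id = paste(name, age, sep = "_"))
print(df_id)
```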
## name age gender score id
## 1 Alice 25 F 85 Alice_25
## 2 Bob 32 M 76 Bob_32
## 3 Charlie 28 M 92 Charlie_28
## 4 David 24 M 81 David_24
## 5 Eve 27 F 88 Eve_27
## 6 Frank 29 M 79 Frank_29
## 7 Grace 31 F 94 Grace_31
## 8 Harry 26 M 83 Harry_26
## 9 Ivy 30 F 90 Ivy_30
## 10 Jack 33 M 86 Jack_33
We inspect the structure of the ‘df’ data frame using the `glimpse` function.
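A sketch of the call, assuming dplyr is loaded (`glimpse` is re-exported by dplyr) and `df` is built as in the rest of the article:

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE
)

# glimpse() prints one line per column: its name, type, and first values.
glimpse(df)
```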
## Rows: 10
## Columns: 4
## $ name <chr> "Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "H…
## $ age <dbl> 25, 32, 28, 24, 27, 29, 31, 26, 30, 33
## $ gender <chr> "F", "M", "M", "M", "F", "M", "F", "M", "F", "M"
## $ score <dbl> 85, 76, 92, 81, 88, 79, 94, 83, 90, 86