Data Manipulation is one of the initial processes done in Data Analysis. It involves arranging or rearranging data points to make it easier for users/data analysts to perform necessary insights or business directives. Data Manipulation encompasses a broad range of tools and languages, which may include coding and non-coding techniques. It is not only used extensively by Data Analysts but also by business people and accountants to view the budget of a certain project.
It also has its programming language, DML (Data Manipulation Language) which is used to alter data in databases. Let’s know what exactly Data manipulation is
#pfizer dataset
data=read.csv("C:/Users/moola/OneDrive/Desktop/pfizers.csv")
data
## Org_Indiv First_Plus First_Name Last_Name
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 2 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja
## 3 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 4 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 5 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 6 Abdullah Raffee Md Pc Abdullah Abdullah Raffee
## 7 Abebe, Sheila Y Sheila Y Sheila Abebe
## 8 Abebe, Sheila Y Sheila Y Sheila Abebe
## 9 Abilene Family Foot Center Galen Chris Galen Albritton
## 10 Abolnik, Igor Z Igor Z Igor Abolnik
## 11 Abolnik, Igor Z Igor Z Igor Abolnik
## City State Category Cash Other Total
## 1 New Orleans LA Professional Advising 2625 0 2625
## 2 Paso Robles CA Expert-Led Forums 1000 0 1000
## 3 Miami FL Business Related Travel 0 448 448
## 4 Miami FL Meals 0 119 119
## 5 Miami FL Professional Advising 1800 0 1800
## 6 Flint MI Expert-Led Forums 750 0 750
## 7 Indianapolis IN Educational Items 0 47 47
## 8 Indianapolis IN Expert-Led Forums 825 0 825
## 9 Abilene TX Professional Advising 3000 0 3000
## 10 Provo UT Business Related Travel 0 396 396
## 11 Provo Ut Expert-Led Forums 1750 0 1750
‘pfizer’ is a sample data.
str(data)
## 'data.frame': 11 obs. of 10 variables:
## $ Org_Indiv : chr "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
## $ First_Plus: chr "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
## $ First_Name: chr "Steven" "Aakash" "Lilian" "Lilian" ...
## $ Last_Name : chr "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
## $ City : chr "New Orleans" "Paso Robles" "Miami" "Miami" ...
## $ State : chr "LA" "CA" "FL" "FL" ...
## $ Category : chr "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
## $ Cash : int 2625 1000 0 0 1800 750 0 825 3000 0 ...
## $ Other : int 0 0 448 119 0 0 47 0 0 396 ...
## $ Total : int 2625 1000 448 119 1800 750 47 825 3000 396 ...
summary(data)#summary
## Org_Indiv First_Plus First_Name Last_Name
## Length:11 Length:11 Length:11 Length:11
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## City State Category Cash
## Length:11 Length:11 Length:11 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 0
## Mode :character Mode :character Mode :character Median : 825
## Mean :1068
## 3rd Qu.:1775
## Max. :3000
## Other Total
## Min. : 0.00 Min. : 47
## 1st Qu.: 0.00 1st Qu.: 422
## Median : 0.00 Median : 825
## Mean : 91.82 Mean :1160
## 3rd Qu.: 83.00 3rd Qu.:1775
## Max. :448.00 Max. :3000
#conversion functions
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Example 1: Convert a column to a different data type (e.g., from character to numeric)
my_data <- as.numeric(data$Total)
my_data
## [1] 2625 1000 448 119 1800 750 47 825 3000 396 1750
# Example 2: Convert a column to a factor
my_data <- as.factor(data$Category)
my_data
## [1] Professional Advising Expert-Led Forums Business Related Travel
## [4] Meals Professional Advising Expert-Led Forums
## [7] Educational Items Expert-Led Forums Professional Advising
## [10] Business Related Travel Expert-Led Forums
## 5 Levels: Business Related Travel Educational Items ... Professional Advising
# Example 3: Perform some data transformations using dplyr (e.g., calculate a new column)
my_data <- data %>%
mutate(TotalSquared = Total^2)
my_data
## Org_Indiv First_Plus First_Name Last_Name
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 2 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja
## 3 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 4 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 5 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 6 Abdullah Raffee Md Pc Abdullah Abdullah Raffee
## 7 Abebe, Sheila Y Sheila Y Sheila Abebe
## 8 Abebe, Sheila Y Sheila Y Sheila Abebe
## 9 Abilene Family Foot Center Galen Chris Galen Albritton
## 10 Abolnik, Igor Z Igor Z Igor Abolnik
## 11 Abolnik, Igor Z Igor Z Igor Abolnik
## City State Category Cash Other Total TotalSquared
## 1 New Orleans LA Professional Advising 2625 0 2625 6890625
## 2 Paso Robles CA Expert-Led Forums 1000 0 1000 1000000
## 3 Miami FL Business Related Travel 0 448 448 200704
## 4 Miami FL Meals 0 119 119 14161
## 5 Miami FL Professional Advising 1800 0 1800 3240000
## 6 Flint MI Expert-Led Forums 750 0 750 562500
## 7 Indianapolis IN Educational Items 0 47 47 2209
## 8 Indianapolis IN Expert-Led Forums 825 0 825 680625
## 9 Abilene TX Professional Advising 3000 0 3000 9000000
## 10 Provo UT Business Related Travel 0 396 396 156816
## 11 Provo Ut Expert-Led Forums 1750 0 1750 3062500
# After performing the desired conversion and transformation operations, you can view the modified data frame:
str(my_data)
## 'data.frame': 11 obs. of 11 variables:
## $ Org_Indiv : chr "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
## $ First_Plus : chr "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
## $ First_Name : chr "Steven" "Aakash" "Lilian" "Lilian" ...
## $ Last_Name : chr "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
## $ City : chr "New Orleans" "Paso Robles" "Miami" "Miami" ...
## $ State : chr "LA" "CA" "FL" "FL" ...
## $ Category : chr "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
## $ Cash : int 2625 1000 0 0 1800 750 0 825 3000 0 ...
## $ Other : int 0 0 448 119 0 0 47 0 0 396 ...
## $ Total : int 2625 1000 448 119 1800 750 47 825 3000 396 ...
## $ TotalSquared: num 6890625 1000000 200704 14161 3240000 ...
The ‘dplyr’ package in R Programming Language is a structure of data manipulation that provides a uniform set of verbs, helping to resolve the most frequent data manipulation hurdles. It offers easy-to-use functions like filtering, sorting, and summarizing data, following tidy data principles for analysis.
#c.head of the data
head(data)
## Org_Indiv First_Plus First_Name Last_Name City
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig New Orleans
## 2 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja Paso Robles
## 3 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo Miami
## 4 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo Miami
## 5 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo Miami
## 6 Abdullah Raffee Md Pc Abdullah Abdullah Raffee Flint
## State Category Cash Other Total
## 1 LA Professional Advising 2625 0 2625
## 2 CA Expert-Led Forums 1000 0 1000
## 3 FL Business Related Travel 0 448 448
## 4 FL Meals 0 119 119
## 5 FL Professional Advising 1800 0 1800
## 6 MI Expert-Led Forums 750 0 750
# d) Filter the data for payments related to 'Expert-Led Forums' or 'Professional Advising' and arrange alphabetically by doctor
filtered_data <- data %>%
filter(Category %in% c("Expert-Led Forums", "Professional Advising")) %>%
arrange(Last_Name, First_Name)
# Print filtered data
print("Filtered Data:")
## [1] "Filtered Data:"
print(filtered_data)
## Org_Indiv First_Plus First_Name Last_Name
## 1 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 2 Abebe, Sheila Y Sheila Y Sheila Abebe
## 3 Abolnik, Igor Z Igor Z Igor Abolnik
## 4 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja
## 5 Abilene Family Foot Center Galen Chris Galen Albritton
## 6 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 7 Abdullah Raffee Md Pc Abdullah Abdullah Raffee
## City State Category Cash Other Total
## 1 Miami FL Professional Advising 1800 0 1800
## 2 Indianapolis IN Expert-Led Forums 825 0 825
## 3 Provo Ut Expert-Led Forums 1750 0 1750
## 4 Paso Robles CA Expert-Led Forums 1000 0 1000
## 5 Abilene TX Professional Advising 3000 0 3000
## 6 New Orleans LA Professional Advising 2625 0 2625
## 7 Flint MI Expert-Led Forums 750 0 750
# Load the stringr package to use str_detect
library(stringr)
# e) Use pattern matching to filter text
pattern_matched_data <- data %>%
filter(str_detect(Category, "Professional Advising"))
# Print pattern matched data
print("Pattern Matched Data:")
## [1] "Pattern Matched Data:"
print(pattern_matched_data)
## Org_Indiv First_Plus First_Name Last_Name
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 2 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 3 Abilene Family Foot Center Galen Chris Galen Albritton
## City State Category Cash Other Total
## 1 New Orleans LA Professional Advising 2625 0 2625
## 2 Miami FL Professional Advising 1800 0 1800
## 3 Abilene TX Professional Advising 3000 0 3000
Character data plays a vital role in data analysis and manipulation using R programming. To facilitate these tasks, the Stringr package was developed by Hadley Wickham. This package offers a range of functions that help in working with character strings in R. The Stringr package simplifies the string manipulation process and enables easy extraction, replacement, and manipulation of substrings in R.
# f) Append one data frame to another
# Assuming df2 is another dataframe
df2 <- data.frame(Org_Indiv = c("abilene center"),
First_Plus = c("Rachel green"),
First_Name = c("Rachel"),
Last_Name = c("Green"),
City = c("baltimore"),
State = c("milton"),
Category = c("Professional Advising"),
Cash = c(1450),
Other = c(320),
Total = c(1700))
appended_data <- rbind(data, df2)
# Print appended data
print("Appended Data:")
## [1] "Appended Data:"
print(appended_data)
## Org_Indiv First_Plus First_Name Last_Name
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 2 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja
## 3 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 4 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 5 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 6 Abdullah Raffee Md Pc Abdullah Abdullah Raffee
## 7 Abebe, Sheila Y Sheila Y Sheila Abebe
## 8 Abebe, Sheila Y Sheila Y Sheila Abebe
## 9 Abilene Family Foot Center Galen Chris Galen Albritton
## 10 Abolnik, Igor Z Igor Z Igor Abolnik
## 11 Abolnik, Igor Z Igor Z Igor Abolnik
## 12 abilene center Rachel green Rachel Green
## City State Category Cash Other Total
## 1 New Orleans LA Professional Advising 2625 0 2625
## 2 Paso Robles CA Expert-Led Forums 1000 0 1000
## 3 Miami FL Business Related Travel 0 448 448
## 4 Miami FL Meals 0 119 119
## 5 Miami FL Professional Advising 1800 0 1800
## 6 Flint MI Expert-Led Forums 750 0 750
## 7 Indianapolis IN Educational Items 0 47 47
## 8 Indianapolis IN Expert-Led Forums 825 0 825
## 9 Abilene TX Professional Advising 3000 0 3000
## 10 Provo UT Business Related Travel 0 396 396
## 11 Provo Ut Expert-Led Forums 1750 0 1750
## 12 baltimore milton Professional Advising 1450 320 1700
# g) Write data to a CSV file
write.csv(data, file = "data.csv", row.names = FALSE)
# h) View the structure and summary of the given data
str(data)
## 'data.frame': 11 obs. of 10 variables:
## $ Org_Indiv : chr "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
## $ First_Plus: chr "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
## $ First_Name: chr "Steven" "Aakash" "Lilian" "Lilian" ...
## $ Last_Name : chr "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
## $ City : chr "New Orleans" "Paso Robles" "Miami" "Miami" ...
## $ State : chr "LA" "CA" "FL" "FL" ...
## $ Category : chr "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
## $ Cash : int 2625 1000 0 0 1800 750 0 825 3000 0 ...
## $ Other : int 0 0 448 119 0 0 47 0 0 396 ...
## $ Total : int 2625 1000 448 119 1800 750 47 825 3000 396 ...
summary(data)
## Org_Indiv First_Plus First_Name Last_Name
## Length:11 Length:11 Length:11 Length:11
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## City State Category Cash
## Length:11 Length:11 Length:11 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 0
## Mode :character Mode :character Mode :character Median : 825
## Mean :1068
## 3rd Qu.:1775
## Max. :3000
## Other Total
## Min. : 0.00 Min. : 47
## 1st Qu.: 0.00 1st Qu.: 422
## Median : 0.00 Median : 825
## Mean : 91.82 Mean :1160
## 3rd Qu.: 83.00 3rd Qu.:1775
## Max. :448.00 Max. :3000
# i) Convert the 'Total' column to a numeric variable
data$Total <- as.numeric(data$Total)
data$Total
## [1] 2625 1000 448 119 1800 750 47 825 3000 396 1750
#mtcars dataset
data("mtcars")
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
is.null(mtcars)
## [1] FALSE
mtcars is an in-built dataset of rstudio. no null values are there in this dataset.Large number of basic as well as other complex application can be done on this data set.
selecting specific columns using select function:
# Select the mpg and cyl columns
mtcars_select <- select(mtcars, wt, hp)
mtcars_select
## wt hp
## Mazda RX4 2.620 110
## Mazda RX4 Wag 2.875 110
## Datsun 710 2.320 93
## Hornet 4 Drive 3.215 110
## Hornet Sportabout 3.440 175
## Valiant 3.460 105
## Duster 360 3.570 245
## Merc 240D 3.190 62
## Merc 230 3.150 95
## Merc 280 3.440 123
## Merc 280C 3.440 123
## Merc 450SE 4.070 180
## Merc 450SL 3.730 180
## Merc 450SLC 3.780 180
## Cadillac Fleetwood 5.250 205
## Lincoln Continental 5.424 215
## Chrysler Imperial 5.345 230
## Fiat 128 2.200 66
## Honda Civic 1.615 52
## Toyota Corolla 1.835 65
## Toyota Corona 2.465 97
## Dodge Challenger 3.520 150
## AMC Javelin 3.435 150
## Camaro Z28 3.840 245
## Pontiac Firebird 3.845 175
## Fiat X1-9 1.935 66
## Porsche 914-2 2.140 91
## Lotus Europa 1.513 113
## Ford Pantera L 3.170 264
## Ferrari Dino 2.770 175
## Maserati Bora 3.570 335
## Volvo 142E 2.780 109
filtering the data through filter function using a condition:
# Filter for cars with more than 6 cylinders
mtcars_filter <- filter(mtcars, hp > 6)
mtcars_filter
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
sorting using arrange function:
# Arrange the dataset by descending mpg
mtcars_arrange <- arrange(mtcars, desc(cyl))
mtcars_arrange
## mpg cyl disp hp drat wt qsec vs am gear carb
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
desc function sorts data in descending order.
renaming a column using rename function:
# Rename the mpg column to miles_per_gallon
mtcars_rename <- rename(mtcars, miles_per_gallon = mpg)
mtcars_rename
## miles_per_gallon cyl disp hp drat wt qsec vs am gear
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4
## carb
## Mazda RX4 4
## Mazda RX4 Wag 4
## Datsun 710 1
## Hornet 4 Drive 1
## Hornet Sportabout 2
## Valiant 1
## Duster 360 4
## Merc 240D 2
## Merc 230 2
## Merc 280 4
## Merc 280C 4
## Merc 450SE 3
## Merc 450SL 3
## Merc 450SLC 3
## Cadillac Fleetwood 4
## Lincoln Continental 4
## Chrysler Imperial 4
## Fiat 128 1
## Honda Civic 2
## Toyota Corolla 1
## Toyota Corona 1
## Dodge Challenger 2
## AMC Javelin 2
## Camaro Z28 4
## Pontiac Firebird 2
## Fiat X1-9 1
## Porsche 914-2 2
## Lotus Europa 2
## Ford Pantera L 4
## Ferrari Dino 6
## Maserati Bora 8
## Volvo 142E 2
add a column using mutate function:
# Add a new column for horsepower per weight
mtcars_mutate <- mutate(mtcars, hp_per_wt = hp / wt)
mtcars_mutate
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## hp_per_wt
## Mazda RX4 41.98473
## Mazda RX4 Wag 38.26087
## Datsun 710 40.08621
## Hornet 4 Drive 34.21462
## Hornet Sportabout 50.87209
## Valiant 30.34682
## Duster 360 68.62745
## Merc 240D 19.43574
## Merc 230 30.15873
## Merc 280 35.75581
## Merc 280C 35.75581
## Merc 450SE 44.22604
## Merc 450SL 48.25737
## Merc 450SLC 47.61905
## Cadillac Fleetwood 39.04762
## Lincoln Continental 39.63864
## Chrysler Imperial 43.03087
## Fiat 128 30.00000
## Honda Civic 32.19814
## Toyota Corolla 35.42234
## Toyota Corona 39.35091
## Dodge Challenger 42.61364
## AMC Javelin 43.66812
## Camaro Z28 63.80208
## Pontiac Firebird 45.51365
## Fiat X1-9 34.10853
## Porsche 914-2 42.52336
## Lotus Europa 74.68605
## Ford Pantera L 83.28076
## Ferrari Dino 63.17690
## Maserati Bora 93.83754
## Volvo 142E 39.20863
# Combine two datasets with the same columns
mtcars_combined <- bind_rows(mtcars, mtcars)
mtcars_combined
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4...1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag...2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710...3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive...4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout...5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant...6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360...7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D...8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230...9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280...10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C...11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE...12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL...13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC...14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood...15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental...16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial...17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128...18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic...19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla...20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona...21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger...22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin...23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28...24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird...25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9...26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2...27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa...28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L...29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino...30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora...31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E...32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Mazda RX4...33 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag...34 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710...35 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive...36 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout...37 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant...38 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360...39 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D...40 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230...41 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280...42 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C...43 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE...44 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL...45 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC...46 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood...47 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental...48 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial...49 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128...50 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic...51 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla...52 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona...53 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger...54 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin...55 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28...56 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird...57 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9...58 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2...59 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa...60 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L...61 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino...62 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora...63 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E...64 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# Group the dataset by number of cylinders and summarize the mean wt
mtcars_grouped <- group_by(mtcars, wt)
mtcars_summarized <- summarize(mtcars_grouped, mean_wt = mean(wt))
mtcars_grouped
## # A tibble: 32 × 11
## # Groups: wt [29]
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
## # ℹ 22 more rows
mtcars_summarized
## # A tibble: 29 × 2
## wt mean_wt
## <dbl> <dbl>
## 1 1.51 1.51
## 2 1.62 1.62
## 3 1.84 1.84
## 4 1.94 1.94
## 5 2.14 2.14
## 6 2.2 2.2
## 7 2.32 2.32
## 8 2.46 2.46
## 9 2.62 2.62
## 10 2.77 2.77
## # ℹ 19 more rows
the above functions are some of the basic data manipulation functions. these functions are available in dplyr package ready to use.