Data manipulation in R involves the systematic process of refining and restructuring data to enhance its quality and prepare it for analysis. This includes handling missing values, outliers, and transforming variables. Techniques like aggregation, subsetting, merging, and reshaping data are pivotal for extracting meaningful insights, enabling informed decision-making.
#pfizer dataset
data=read.csv("C:/Users/sudha/OneDrive/Desktop/dv/data.csv")
data
## Org_Indiv First_Plus First_Name Last_Name
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 2 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja
## 3 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 4 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 5 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 6 Abdullah Raffee Md Pc Abdullah Abdullah Raffee
## 7 Abebe, Sheila Y Sheila Y Sheila Abebe
## 8 Abebe, Sheila Y Sheila Y Sheila Abebe
## 9 Abilene Family Foot Center Galen Chris Galen Albritton
## 10 Abolnik, Igor Z Igor Z Igor Abolnik
## 11 Abolnik, Igor Z Igor Z Igor Abolnik
## City State Category Cash Other Total
## 1 New Orleans LA Professional Advising 2625 0 2625
## 2 Paso Robles CA Expert-Led Forums 1000 0 1000
## 3 Miami FL Business Related Travel 0 448 448
## 4 Miami FL Meals 0 119 119
## 5 Miami FL Professional Advising 1800 0 1800
## 6 Flint MI Expert-Led Forums 750 0 750
## 7 Indianapolis IN Educational Items 0 47 47
## 8 Indianapolis IN Expert-Led Forums 825 0 825
## 9 Abilene TX Professional Advising 3000 0 3000
## 10 Provo UT Business Related Travel 0 396 396
## 11 Provo Ut Expert-Led Forums 1750 0 1750
‘pfizer’ is a sample data.
#a. and b. structure and summary of the data
str(data)#structure
## 'data.frame': 11 obs. of 10 variables:
## $ Org_Indiv : chr "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
## $ First_Plus: chr "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
## $ First_Name: chr "Steven" "Aakash" "Lilian" "Lilian" ...
## $ Last_Name : chr "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
## $ City : chr "New Orleans" "Paso Robles" "Miami" "Miami" ...
## $ State : chr "LA" "CA" "FL" "FL" ...
## $ Category : chr "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
## $ Cash : int 2625 1000 0 0 1800 750 0 825 3000 0 ...
## $ Other : int 0 0 448 119 0 0 47 0 0 396 ...
## $ Total : int 2625 1000 448 119 1800 750 47 825 3000 396 ...
summary(data)#summary
## Org_Indiv First_Plus First_Name Last_Name
## Length:11 Length:11 Length:11 Length:11
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## City State Category Cash
## Length:11 Length:11 Length:11 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 0
## Mode :character Mode :character Mode :character Median : 825
## Mean :1068
## 3rd Qu.:1775
## Max. :3000
## Other Total
## Min. : 0.00 Min. : 47
## 1st Qu.: 0.00 1st Qu.: 422
## Median : 0.00 Median : 825
## Mean : 91.82 Mean :1160
## 3rd Qu.: 83.00 3rd Qu.:1775
## Max. :448.00 Max. :3000
#conversion functions
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Example 1: Convert a column to a different data type (e.g., from character to numeric)
my_data <- as.numeric(data$Total)
my_data
## [1] 2625 1000 448 119 1800 750 47 825 3000 396 1750
# Example 2: Convert a column to a factor
my_data <- as.factor(data$Category)
my_data
## [1] Professional Advising Expert-Led Forums Business Related Travel
## [4] Meals Professional Advising Expert-Led Forums
## [7] Educational Items Expert-Led Forums Professional Advising
## [10] Business Related Travel Expert-Led Forums
## 5 Levels: Business Related Travel Educational Items ... Professional Advising
# Example 3: Perform some data transformations using dplyr (e.g., calculate a new column)
my_data <- data %>%
mutate(TotalSquared = Total^2)
my_data
## Org_Indiv First_Plus First_Name Last_Name
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 2 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja
## 3 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 4 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 5 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 6 Abdullah Raffee Md Pc Abdullah Abdullah Raffee
## 7 Abebe, Sheila Y Sheila Y Sheila Abebe
## 8 Abebe, Sheila Y Sheila Y Sheila Abebe
## 9 Abilene Family Foot Center Galen Chris Galen Albritton
## 10 Abolnik, Igor Z Igor Z Igor Abolnik
## 11 Abolnik, Igor Z Igor Z Igor Abolnik
## City State Category Cash Other Total TotalSquared
## 1 New Orleans LA Professional Advising 2625 0 2625 6890625
## 2 Paso Robles CA Expert-Led Forums 1000 0 1000 1000000
## 3 Miami FL Business Related Travel 0 448 448 200704
## 4 Miami FL Meals 0 119 119 14161
## 5 Miami FL Professional Advising 1800 0 1800 3240000
## 6 Flint MI Expert-Led Forums 750 0 750 562500
## 7 Indianapolis IN Educational Items 0 47 47 2209
## 8 Indianapolis IN Expert-Led Forums 825 0 825 680625
## 9 Abilene TX Professional Advising 3000 0 3000 9000000
## 10 Provo UT Business Related Travel 0 396 396 156816
## 11 Provo Ut Expert-Led Forums 1750 0 1750 3062500
# After performing the desired conversion and transformation operations, you can view the modified data frame:
str(my_data)
## 'data.frame': 11 obs. of 11 variables:
## $ Org_Indiv : chr "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
## $ First_Plus : chr "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
## $ First_Name : chr "Steven" "Aakash" "Lilian" "Lilian" ...
## $ Last_Name : chr "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
## $ City : chr "New Orleans" "Paso Robles" "Miami" "Miami" ...
## $ State : chr "LA" "CA" "FL" "FL" ...
## $ Category : chr "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
## $ Cash : int 2625 1000 0 0 1800 750 0 825 3000 0 ...
## $ Other : int 0 0 448 119 0 0 47 0 0 396 ...
## $ Total : int 2625 1000 448 119 1800 750 47 825 3000 396 ...
## $ TotalSquared: num 6890625 1000000 200704 14161 3240000 ...
dplyr is an R package for efficient data manipulation.
It offers easy-to-use functions like filtering, sorting, and summarizing
data, following tidy data principles for analysis.
#c.head of the data
head(data)
## Org_Indiv First_Plus First_Name Last_Name City
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig New Orleans
## 2 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja Paso Robles
## 3 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo Miami
## 4 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo Miami
## 5 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo Miami
## 6 Abdullah Raffee Md Pc Abdullah Abdullah Raffee Flint
## State Category Cash Other Total
## 1 LA Professional Advising 2625 0 2625
## 2 CA Expert-Led Forums 1000 0 1000
## 3 FL Business Related Travel 0 448 448
## 4 FL Meals 0 119 119
## 5 FL Professional Advising 1800 0 1800
## 6 MI Expert-Led Forums 750 0 750
# d) Filter the data for payments related to 'Expert-Led Forums' or 'Professional Advising' and arrange alphabetically by doctor
filtered_data <- data %>%
filter(Category %in% c("Expert-Led Forums", "Professional Advising")) %>%
arrange(Last_Name, First_Name)
# Print filtered data
print("Filtered Data:")
## [1] "Filtered Data:"
print(filtered_data)
## Org_Indiv First_Plus First_Name Last_Name
## 1 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 2 Abebe, Sheila Y Sheila Y Sheila Abebe
## 3 Abolnik, Igor Z Igor Z Igor Abolnik
## 4 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja
## 5 Abilene Family Foot Center Galen Chris Galen Albritton
## 6 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 7 Abdullah Raffee Md Pc Abdullah Abdullah Raffee
## City State Category Cash Other Total
## 1 Miami FL Professional Advising 1800 0 1800
## 2 Indianapolis IN Expert-Led Forums 825 0 825
## 3 Provo Ut Expert-Led Forums 1750 0 1750
## 4 Paso Robles CA Expert-Led Forums 1000 0 1000
## 5 Abilene TX Professional Advising 3000 0 3000
## 6 New Orleans LA Professional Advising 2625 0 2625
## 7 Flint MI Expert-Led Forums 750 0 750
# Load the stringr package to use str_detect
library(stringr)
# e) Use pattern matching to filter text
pattern_matched_data <- data %>%
filter(str_detect(Category, "Professional Advising"))
# Print pattern matched data
print("Pattern Matched Data:")
## [1] "Pattern Matched Data:"
print(pattern_matched_data)
## Org_Indiv First_Plus First_Name Last_Name
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 2 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 3 Abilene Family Foot Center Galen Chris Galen Albritton
## City State Category Cash Other Total
## 1 New Orleans LA Professional Advising 2625 0 2625
## 2 Miami FL Professional Advising 1800 0 1800
## 3 Abilene TX Professional Advising 3000 0 3000
The stringr package is an R package that provides a set
of functions for working with strings, making string manipulation and
pattern matching more convenient and intuitive.
# f) Append one data frame to another
# Assuming df2 is another dataframe
df2 <- data.frame(Org_Indiv = c("Sample Org"),
First_Plus = c("John Doe"),
First_Name = c("John"),
Last_Name = c("Doe"),
City = c("Sample City"),
State = c("XX"),
Category = c("Professional Advising"),
Cash = c(1500),
Other = c(200),
Total = c(1700))
appended_data <- rbind(data, df2)
# Print appended data
print("Appended Data:")
## [1] "Appended Data:"
print(appended_data)
## Org_Indiv First_Plus First_Name Last_Name
## 1 3-D Medical Services Llc Steven Bruce Steven Deitelzweig
## 2 Aa Doctors, Inc. Aakash Mohan Aakash Ahuja
## 3 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 4 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 5 Abbo, Lilian Margarita Lilian Margarita Lilian Abbo
## 6 Abdullah Raffee Md Pc Abdullah Abdullah Raffee
## 7 Abebe, Sheila Y Sheila Y Sheila Abebe
## 8 Abebe, Sheila Y Sheila Y Sheila Abebe
## 9 Abilene Family Foot Center Galen Chris Galen Albritton
## 10 Abolnik, Igor Z Igor Z Igor Abolnik
## 11 Abolnik, Igor Z Igor Z Igor Abolnik
## 12 Sample Org John Doe John Doe
## City State Category Cash Other Total
## 1 New Orleans LA Professional Advising 2625 0 2625
## 2 Paso Robles CA Expert-Led Forums 1000 0 1000
## 3 Miami FL Business Related Travel 0 448 448
## 4 Miami FL Meals 0 119 119
## 5 Miami FL Professional Advising 1800 0 1800
## 6 Flint MI Expert-Led Forums 750 0 750
## 7 Indianapolis IN Educational Items 0 47 47
## 8 Indianapolis IN Expert-Led Forums 825 0 825
## 9 Abilene TX Professional Advising 3000 0 3000
## 10 Provo UT Business Related Travel 0 396 396
## 11 Provo Ut Expert-Led Forums 1750 0 1750
## 12 Sample City XX Professional Advising 1500 200 1700
# g) Write data to a CSV file
write.csv(data, file = "data.csv", row.names = FALSE)
# h) View the structure and summary of the given data
str(data)
## 'data.frame': 11 obs. of 10 variables:
## $ Org_Indiv : chr "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
## $ First_Plus: chr "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
## $ First_Name: chr "Steven" "Aakash" "Lilian" "Lilian" ...
## $ Last_Name : chr "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
## $ City : chr "New Orleans" "Paso Robles" "Miami" "Miami" ...
## $ State : chr "LA" "CA" "FL" "FL" ...
## $ Category : chr "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
## $ Cash : int 2625 1000 0 0 1800 750 0 825 3000 0 ...
## $ Other : int 0 0 448 119 0 0 47 0 0 396 ...
## $ Total : int 2625 1000 448 119 1800 750 47 825 3000 396 ...
summary(data)
## Org_Indiv First_Plus First_Name Last_Name
## Length:11 Length:11 Length:11 Length:11
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## City State Category Cash
## Length:11 Length:11 Length:11 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 0
## Mode :character Mode :character Mode :character Median : 825
## Mean :1068
## 3rd Qu.:1775
## Max. :3000
## Other Total
## Min. : 0.00 Min. : 47
## 1st Qu.: 0.00 1st Qu.: 422
## Median : 0.00 Median : 825
## Mean : 91.82 Mean :1160
## 3rd Qu.: 83.00 3rd Qu.:1775
## Max. :448.00 Max. :3000
# i) Convert the 'Total' column to a numeric variable
data$Total <- as.numeric(data$Total)
data$Total
## [1] 2625 1000 448 119 1800 750 47 825 3000 396 1750
#mtcars dataset
data("mtcars")
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
is.null(mtcars)
## [1] FALSE
mtcars is an in-built dataset of rstudio. no null values are there in this dataset.
selecting specific columns using select function:
# Select the mpg and cyl columns
mtcars_select <- select(mtcars, mpg, cyl)
mtcars_select
## mpg cyl
## Mazda RX4 21.0 6
## Mazda RX4 Wag 21.0 6
## Datsun 710 22.8 4
## Hornet 4 Drive 21.4 6
## Hornet Sportabout 18.7 8
## Valiant 18.1 6
## Duster 360 14.3 8
## Merc 240D 24.4 4
## Merc 230 22.8 4
## Merc 280 19.2 6
## Merc 280C 17.8 6
## Merc 450SE 16.4 8
## Merc 450SL 17.3 8
## Merc 450SLC 15.2 8
## Cadillac Fleetwood 10.4 8
## Lincoln Continental 10.4 8
## Chrysler Imperial 14.7 8
## Fiat 128 32.4 4
## Honda Civic 30.4 4
## Toyota Corolla 33.9 4
## Toyota Corona 21.5 4
## Dodge Challenger 15.5 8
## AMC Javelin 15.2 8
## Camaro Z28 13.3 8
## Pontiac Firebird 19.2 8
## Fiat X1-9 27.3 4
## Porsche 914-2 26.0 4
## Lotus Europa 30.4 4
## Ford Pantera L 15.8 8
## Ferrari Dino 19.7 6
## Maserati Bora 15.0 8
## Volvo 142E 21.4 4
filtering the data through filter function using a condition:
# Filter for cars with more than 6 cylinders
mtcars_filter <- filter(mtcars, cyl > 6)
mtcars_filter
## mpg cyl disp hp drat wt qsec vs am gear carb
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
sorting using arrange function:
# Arrange the dataset by descending mpg
mtcars_arrange <- arrange(mtcars, desc(mpg))
mtcars_arrange
## mpg cyl disp hp drat wt qsec vs am gear carb
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
desc function sorts data in descending order.
renaming a column using rename function:
# Rename the mpg column to miles_per_gallon
mtcars_rename <- rename(mtcars, miles_per_gallon = mpg)
mtcars_rename
## miles_per_gallon cyl disp hp drat wt qsec vs am gear
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4
## carb
## Mazda RX4 4
## Mazda RX4 Wag 4
## Datsun 710 1
## Hornet 4 Drive 1
## Hornet Sportabout 2
## Valiant 1
## Duster 360 4
## Merc 240D 2
## Merc 230 2
## Merc 280 4
## Merc 280C 4
## Merc 450SE 3
## Merc 450SL 3
## Merc 450SLC 3
## Cadillac Fleetwood 4
## Lincoln Continental 4
## Chrysler Imperial 4
## Fiat 128 1
## Honda Civic 2
## Toyota Corolla 1
## Toyota Corona 1
## Dodge Challenger 2
## AMC Javelin 2
## Camaro Z28 4
## Pontiac Firebird 2
## Fiat X1-9 1
## Porsche 914-2 2
## Lotus Europa 2
## Ford Pantera L 4
## Ferrari Dino 6
## Maserati Bora 8
## Volvo 142E 2
add a column using mutate function:
# Add a new column for horsepower per weight
mtcars_mutate <- mutate(mtcars, hp_per_wt = hp / wt)
mtcars_mutate
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## hp_per_wt
## Mazda RX4 41.98473
## Mazda RX4 Wag 38.26087
## Datsun 710 40.08621
## Hornet 4 Drive 34.21462
## Hornet Sportabout 50.87209
## Valiant 30.34682
## Duster 360 68.62745
## Merc 240D 19.43574
## Merc 230 30.15873
## Merc 280 35.75581
## Merc 280C 35.75581
## Merc 450SE 44.22604
## Merc 450SL 48.25737
## Merc 450SLC 47.61905
## Cadillac Fleetwood 39.04762
## Lincoln Continental 39.63864
## Chrysler Imperial 43.03087
## Fiat 128 30.00000
## Honda Civic 32.19814
## Toyota Corolla 35.42234
## Toyota Corona 39.35091
## Dodge Challenger 42.61364
## AMC Javelin 43.66812
## Camaro Z28 63.80208
## Pontiac Firebird 45.51365
## Fiat X1-9 34.10853
## Porsche 914-2 42.52336
## Lotus Europa 74.68605
## Ford Pantera L 83.28076
## Ferrari Dino 63.17690
## Maserati Bora 93.83754
## Volvo 142E 39.20863
# Combine two datasets with the same columns
mtcars_combined <- bind_rows(mtcars, mtcars)
mtcars_combined
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4...1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag...2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710...3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive...4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout...5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant...6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360...7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D...8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230...9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280...10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C...11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE...12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL...13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC...14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood...15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental...16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial...17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128...18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic...19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla...20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona...21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger...22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin...23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28...24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird...25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9...26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2...27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa...28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L...29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino...30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora...31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E...32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Mazda RX4...33 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag...34 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710...35 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive...36 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout...37 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant...38 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360...39 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D...40 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230...41 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280...42 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C...43 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE...44 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL...45 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC...46 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood...47 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental...48 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial...49 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128...50 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic...51 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla...52 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona...53 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger...54 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin...55 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28...56 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird...57 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9...58 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2...59 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa...60 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L...61 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino...62 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora...63 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E...64 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# Group the dataset by number of cylinders and summarize the mean mpg
mtcars_grouped <- group_by(mtcars, cyl)
mtcars_summarized <- summarize(mtcars_grouped, mean_mpg = mean(mpg))
mtcars_grouped
## # A tibble: 32 × 11
## # Groups: cyl [3]
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
## # ℹ 22 more rows
mtcars_summarized
## # A tibble: 3 × 2
## cyl mean_mpg
## <dbl> <dbl>
## 1 4 26.7
## 2 6 19.7
## 3 8 15.1
the above functions are some of the data manipulation functions. these functions are available in dplyr package.