data manipulation in r

Data manipulation in R involves the systematic process of refining and restructuring data to enhance its quality and prepare it for analysis. This includes handling missing values, outliers, and transforming variables. Techniques like aggregation, subsetting, merging, and reshaping data are pivotal for extracting meaningful insights, enabling informed decision-making.

#pfizer dataset
data=read.csv("C:/Users/sudha/OneDrive/Desktop/dv/data.csv")
data
##                     Org_Indiv       First_Plus First_Name   Last_Name
## 1    3-D Medical Services Llc     Steven Bruce     Steven Deitelzweig
## 2            Aa Doctors, Inc.     Aakash Mohan     Aakash       Ahuja
## 3      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 4      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 5      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 6       Abdullah Raffee Md Pc         Abdullah   Abdullah      Raffee
## 7             Abebe, Sheila Y         Sheila Y     Sheila       Abebe
## 8             Abebe, Sheila Y         Sheila Y     Sheila       Abebe
## 9  Abilene Family Foot Center      Galen Chris      Galen   Albritton
## 10            Abolnik, Igor Z           Igor Z       Igor     Abolnik
## 11            Abolnik, Igor Z           Igor Z       Igor     Abolnik
##            City State                Category Cash Other Total
## 1   New Orleans    LA   Professional Advising 2625     0  2625
## 2   Paso Robles    CA       Expert-Led Forums 1000     0  1000
## 3         Miami    FL Business Related Travel    0   448   448
## 4         Miami    FL                   Meals    0   119   119
## 5         Miami    FL   Professional Advising 1800     0  1800
## 6         Flint    MI       Expert-Led Forums  750     0   750
## 7  Indianapolis    IN       Educational Items    0    47    47
## 8  Indianapolis    IN       Expert-Led Forums  825     0   825
## 9       Abilene    TX   Professional Advising 3000     0  3000
## 10        Provo    UT Business Related Travel    0   396   396
## 11        Provo    Ut       Expert-Led Forums 1750     0  1750

‘pfizer’ is a sample data.

#a. and b. structure and summary of the data
str(data)#structure
## 'data.frame':    11 obs. of  10 variables:
##  $ Org_Indiv : chr  "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
##  $ First_Plus: chr  "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
##  $ First_Name: chr  "Steven" "Aakash" "Lilian" "Lilian" ...
##  $ Last_Name : chr  "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
##  $ City      : chr  "New Orleans" "Paso Robles" "Miami" "Miami" ...
##  $ State     : chr  "LA" "CA" "FL" "FL" ...
##  $ Category  : chr  "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
##  $ Cash      : int  2625 1000 0 0 1800 750 0 825 3000 0 ...
##  $ Other     : int  0 0 448 119 0 0 47 0 0 396 ...
##  $ Total     : int  2625 1000 448 119 1800 750 47 825 3000 396 ...
summary(data)#summary
##   Org_Indiv          First_Plus         First_Name         Last_Name        
##  Length:11          Length:11          Length:11          Length:11         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      City              State             Category              Cash     
##  Length:11          Length:11          Length:11          Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:   0  
##  Mode  :character   Mode  :character   Mode  :character   Median : 825  
##                                                           Mean   :1068  
##                                                           3rd Qu.:1775  
##                                                           Max.   :3000  
##      Other            Total     
##  Min.   :  0.00   Min.   :  47  
##  1st Qu.:  0.00   1st Qu.: 422  
##  Median :  0.00   Median : 825  
##  Mean   : 91.82   Mean   :1160  
##  3rd Qu.: 83.00   3rd Qu.:1775  
##  Max.   :448.00   Max.   :3000
#conversion functions 
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Example 1: Convert a column to a different data type (e.g., from character to numeric)
my_data <- as.numeric(data$Total)
my_data
##  [1] 2625 1000  448  119 1800  750   47  825 3000  396 1750
# Example 2: Convert a column to a factor
my_data <- as.factor(data$Category)
my_data
##  [1] Professional Advising   Expert-Led Forums       Business Related Travel
##  [4] Meals                   Professional Advising   Expert-Led Forums      
##  [7] Educational Items       Expert-Led Forums       Professional Advising  
## [10] Business Related Travel Expert-Led Forums      
## 5 Levels: Business Related Travel Educational Items ... Professional Advising
# Example 3: Perform some data transformations using dplyr (e.g., calculate a new column)
my_data <- data %>%
  mutate(TotalSquared = Total^2)
my_data
##                     Org_Indiv       First_Plus First_Name   Last_Name
## 1    3-D Medical Services Llc     Steven Bruce     Steven Deitelzweig
## 2            Aa Doctors, Inc.     Aakash Mohan     Aakash       Ahuja
## 3      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 4      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 5      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 6       Abdullah Raffee Md Pc         Abdullah   Abdullah      Raffee
## 7             Abebe, Sheila Y         Sheila Y     Sheila       Abebe
## 8             Abebe, Sheila Y         Sheila Y     Sheila       Abebe
## 9  Abilene Family Foot Center      Galen Chris      Galen   Albritton
## 10            Abolnik, Igor Z           Igor Z       Igor     Abolnik
## 11            Abolnik, Igor Z           Igor Z       Igor     Abolnik
##            City State                Category Cash Other Total TotalSquared
## 1   New Orleans    LA   Professional Advising 2625     0  2625      6890625
## 2   Paso Robles    CA       Expert-Led Forums 1000     0  1000      1000000
## 3         Miami    FL Business Related Travel    0   448   448       200704
## 4         Miami    FL                   Meals    0   119   119        14161
## 5         Miami    FL   Professional Advising 1800     0  1800      3240000
## 6         Flint    MI       Expert-Led Forums  750     0   750       562500
## 7  Indianapolis    IN       Educational Items    0    47    47         2209
## 8  Indianapolis    IN       Expert-Led Forums  825     0   825       680625
## 9       Abilene    TX   Professional Advising 3000     0  3000      9000000
## 10        Provo    UT Business Related Travel    0   396   396       156816
## 11        Provo    Ut       Expert-Led Forums 1750     0  1750      3062500
# After performing the desired conversion and transformation operations, you can view the modified data frame:
str(my_data)
## 'data.frame':    11 obs. of  11 variables:
##  $ Org_Indiv   : chr  "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
##  $ First_Plus  : chr  "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
##  $ First_Name  : chr  "Steven" "Aakash" "Lilian" "Lilian" ...
##  $ Last_Name   : chr  "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
##  $ City        : chr  "New Orleans" "Paso Robles" "Miami" "Miami" ...
##  $ State       : chr  "LA" "CA" "FL" "FL" ...
##  $ Category    : chr  "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
##  $ Cash        : int  2625 1000 0 0 1800 750 0 825 3000 0 ...
##  $ Other       : int  0 0 448 119 0 0 47 0 0 396 ...
##  $ Total       : int  2625 1000 448 119 1800 750 47 825 3000 396 ...
##  $ TotalSquared: num  6890625 1000000 200704 14161 3240000 ...

dplyr is an R package for efficient data manipulation. It offers easy-to-use functions like filtering, sorting, and summarizing data, following tidy data principles for analysis.

#c.head of the data
head(data)
##                  Org_Indiv       First_Plus First_Name   Last_Name        City
## 1 3-D Medical Services Llc     Steven Bruce     Steven Deitelzweig New Orleans
## 2         Aa Doctors, Inc.     Aakash Mohan     Aakash       Ahuja Paso Robles
## 3   Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo       Miami
## 4   Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo       Miami
## 5   Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo       Miami
## 6    Abdullah Raffee Md Pc         Abdullah   Abdullah      Raffee       Flint
##   State                Category Cash Other Total
## 1    LA   Professional Advising 2625     0  2625
## 2    CA       Expert-Led Forums 1000     0  1000
## 3    FL Business Related Travel    0   448   448
## 4    FL                   Meals    0   119   119
## 5    FL   Professional Advising 1800     0  1800
## 6    MI       Expert-Led Forums  750     0   750
# d) Filter the data for payments related to 'Expert-Led Forums' or 'Professional Advising' and arrange alphabetically by doctor
filtered_data <- data %>%
  filter(Category %in% c("Expert-Led Forums", "Professional Advising")) %>%
  arrange(Last_Name, First_Name)

# Print filtered data
print("Filtered Data:")
## [1] "Filtered Data:"
print(filtered_data)
##                    Org_Indiv       First_Plus First_Name   Last_Name
## 1     Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 2            Abebe, Sheila Y         Sheila Y     Sheila       Abebe
## 3            Abolnik, Igor Z           Igor Z       Igor     Abolnik
## 4           Aa Doctors, Inc.     Aakash Mohan     Aakash       Ahuja
## 5 Abilene Family Foot Center      Galen Chris      Galen   Albritton
## 6   3-D Medical Services Llc     Steven Bruce     Steven Deitelzweig
## 7      Abdullah Raffee Md Pc         Abdullah   Abdullah      Raffee
##           City State              Category Cash Other Total
## 1        Miami    FL Professional Advising 1800     0  1800
## 2 Indianapolis    IN     Expert-Led Forums  825     0   825
## 3        Provo    Ut     Expert-Led Forums 1750     0  1750
## 4  Paso Robles    CA     Expert-Led Forums 1000     0  1000
## 5      Abilene    TX Professional Advising 3000     0  3000
## 6  New Orleans    LA Professional Advising 2625     0  2625
## 7        Flint    MI     Expert-Led Forums  750     0   750
# Load the stringr package to use str_detect
library(stringr)

# e) Use pattern matching to filter text
pattern_matched_data <- data %>%
  filter(str_detect(Category, "Professional Advising"))

# Print pattern matched data
print("Pattern Matched Data:")
## [1] "Pattern Matched Data:"
print(pattern_matched_data)
##                    Org_Indiv       First_Plus First_Name   Last_Name
## 1   3-D Medical Services Llc     Steven Bruce     Steven Deitelzweig
## 2     Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 3 Abilene Family Foot Center      Galen Chris      Galen   Albritton
##          City State              Category Cash Other Total
## 1 New Orleans    LA Professional Advising 2625     0  2625
## 2       Miami    FL Professional Advising 1800     0  1800
## 3     Abilene    TX Professional Advising 3000     0  3000

The stringr package is an R package that provides a set of functions for working with strings, making string manipulation and pattern matching more convenient and intuitive.

# f) Append one data frame to another
# Assuming df2 is another dataframe
df2 <- data.frame(Org_Indiv = c("Sample Org"), 
                  First_Plus = c("John Doe"), 
                  First_Name = c("John"), 
                  Last_Name = c("Doe"), 
                  City = c("Sample City"), 
                  State = c("XX"), 
                  Category = c("Professional Advising"), 
                  Cash = c(1500), 
                  Other = c(200), 
                  Total = c(1700))

appended_data <- rbind(data, df2)

# Print appended data
print("Appended Data:")
## [1] "Appended Data:"
print(appended_data)
##                     Org_Indiv       First_Plus First_Name   Last_Name
## 1    3-D Medical Services Llc     Steven Bruce     Steven Deitelzweig
## 2            Aa Doctors, Inc.     Aakash Mohan     Aakash       Ahuja
## 3      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 4      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 5      Abbo, Lilian Margarita Lilian Margarita     Lilian        Abbo
## 6       Abdullah Raffee Md Pc         Abdullah   Abdullah      Raffee
## 7             Abebe, Sheila Y         Sheila Y     Sheila       Abebe
## 8             Abebe, Sheila Y         Sheila Y     Sheila       Abebe
## 9  Abilene Family Foot Center      Galen Chris      Galen   Albritton
## 10            Abolnik, Igor Z           Igor Z       Igor     Abolnik
## 11            Abolnik, Igor Z           Igor Z       Igor     Abolnik
## 12                 Sample Org         John Doe       John         Doe
##            City State                Category Cash Other Total
## 1   New Orleans    LA   Professional Advising 2625     0  2625
## 2   Paso Robles    CA       Expert-Led Forums 1000     0  1000
## 3         Miami    FL Business Related Travel    0   448   448
## 4         Miami    FL                   Meals    0   119   119
## 5         Miami    FL   Professional Advising 1800     0  1800
## 6         Flint    MI       Expert-Led Forums  750     0   750
## 7  Indianapolis    IN       Educational Items    0    47    47
## 8  Indianapolis    IN       Expert-Led Forums  825     0   825
## 9       Abilene    TX   Professional Advising 3000     0  3000
## 10        Provo    UT Business Related Travel    0   396   396
## 11        Provo    Ut       Expert-Led Forums 1750     0  1750
## 12  Sample City    XX   Professional Advising 1500   200  1700
# g) Write data to a CSV file
write.csv(data, file = "data.csv", row.names = FALSE)
# h) View the structure and summary of the given data
str(data)
## 'data.frame':    11 obs. of  10 variables:
##  $ Org_Indiv : chr  "3-D Medical Services Llc" "Aa Doctors, Inc." "Abbo, Lilian Margarita" "Abbo, Lilian Margarita" ...
##  $ First_Plus: chr  "Steven Bruce" "Aakash Mohan" "Lilian Margarita" "Lilian Margarita" ...
##  $ First_Name: chr  "Steven" "Aakash" "Lilian" "Lilian" ...
##  $ Last_Name : chr  "Deitelzweig" "Ahuja" "Abbo" "Abbo" ...
##  $ City      : chr  "New Orleans" "Paso Robles" "Miami" "Miami" ...
##  $ State     : chr  "LA" "CA" "FL" "FL" ...
##  $ Category  : chr  "Professional Advising" "Expert-Led Forums" "Business Related Travel" "Meals" ...
##  $ Cash      : int  2625 1000 0 0 1800 750 0 825 3000 0 ...
##  $ Other     : int  0 0 448 119 0 0 47 0 0 396 ...
##  $ Total     : int  2625 1000 448 119 1800 750 47 825 3000 396 ...
summary(data)
##   Org_Indiv          First_Plus         First_Name         Last_Name        
##  Length:11          Length:11          Length:11          Length:11         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      City              State             Category              Cash     
##  Length:11          Length:11          Length:11          Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:   0  
##  Mode  :character   Mode  :character   Mode  :character   Median : 825  
##                                                           Mean   :1068  
##                                                           3rd Qu.:1775  
##                                                           Max.   :3000  
##      Other            Total     
##  Min.   :  0.00   Min.   :  47  
##  1st Qu.:  0.00   1st Qu.: 422  
##  Median :  0.00   Median : 825  
##  Mean   : 91.82   Mean   :1160  
##  3rd Qu.: 83.00   3rd Qu.:1775  
##  Max.   :448.00   Max.   :3000
# i) Convert the 'Total' column to a numeric variable
data$Total <- as.numeric(data$Total)
data$Total
##  [1] 2625 1000  448  119 1800  750   47  825 3000  396 1750
#mtcars dataset
data("mtcars")
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
is.null(mtcars)
## [1] FALSE

mtcars is an in-built dataset of rstudio. no null values are there in this dataset.

selecting specific columns using select function:

# Select the mpg and cyl columns
mtcars_select <- select(mtcars, mpg, cyl)
mtcars_select
##                      mpg cyl
## Mazda RX4           21.0   6
## Mazda RX4 Wag       21.0   6
## Datsun 710          22.8   4
## Hornet 4 Drive      21.4   6
## Hornet Sportabout   18.7   8
## Valiant             18.1   6
## Duster 360          14.3   8
## Merc 240D           24.4   4
## Merc 230            22.8   4
## Merc 280            19.2   6
## Merc 280C           17.8   6
## Merc 450SE          16.4   8
## Merc 450SL          17.3   8
## Merc 450SLC         15.2   8
## Cadillac Fleetwood  10.4   8
## Lincoln Continental 10.4   8
## Chrysler Imperial   14.7   8
## Fiat 128            32.4   4
## Honda Civic         30.4   4
## Toyota Corolla      33.9   4
## Toyota Corona       21.5   4
## Dodge Challenger    15.5   8
## AMC Javelin         15.2   8
## Camaro Z28          13.3   8
## Pontiac Firebird    19.2   8
## Fiat X1-9           27.3   4
## Porsche 914-2       26.0   4
## Lotus Europa        30.4   4
## Ford Pantera L      15.8   8
## Ferrari Dino        19.7   6
## Maserati Bora       15.0   8
## Volvo 142E          21.4   4

filtering the data through filter function using a condition:

# Filter for cars with more than 6 cylinders
mtcars_filter <- filter(mtcars, cyl > 6)
mtcars_filter
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

sorting using arrange function:

# Arrange the dataset by descending mpg
mtcars_arrange <- arrange(mtcars, desc(mpg))
mtcars_arrange
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4

desc function sorts data in descending order.

renaming a column using rename function:

# Rename the mpg column to miles_per_gallon
mtcars_rename <- rename(mtcars, miles_per_gallon = mpg)
mtcars_rename
##                     miles_per_gallon cyl  disp  hp drat    wt  qsec vs am gear
## Mazda RX4                       21.0   6 160.0 110 3.90 2.620 16.46  0  1    4
## Mazda RX4 Wag                   21.0   6 160.0 110 3.90 2.875 17.02  0  1    4
## Datsun 710                      22.8   4 108.0  93 3.85 2.320 18.61  1  1    4
## Hornet 4 Drive                  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3
## Hornet Sportabout               18.7   8 360.0 175 3.15 3.440 17.02  0  0    3
## Valiant                         18.1   6 225.0 105 2.76 3.460 20.22  1  0    3
## Duster 360                      14.3   8 360.0 245 3.21 3.570 15.84  0  0    3
## Merc 240D                       24.4   4 146.7  62 3.69 3.190 20.00  1  0    4
## Merc 230                        22.8   4 140.8  95 3.92 3.150 22.90  1  0    4
## Merc 280                        19.2   6 167.6 123 3.92 3.440 18.30  1  0    4
## Merc 280C                       17.8   6 167.6 123 3.92 3.440 18.90  1  0    4
## Merc 450SE                      16.4   8 275.8 180 3.07 4.070 17.40  0  0    3
## Merc 450SL                      17.3   8 275.8 180 3.07 3.730 17.60  0  0    3
## Merc 450SLC                     15.2   8 275.8 180 3.07 3.780 18.00  0  0    3
## Cadillac Fleetwood              10.4   8 472.0 205 2.93 5.250 17.98  0  0    3
## Lincoln Continental             10.4   8 460.0 215 3.00 5.424 17.82  0  0    3
## Chrysler Imperial               14.7   8 440.0 230 3.23 5.345 17.42  0  0    3
## Fiat 128                        32.4   4  78.7  66 4.08 2.200 19.47  1  1    4
## Honda Civic                     30.4   4  75.7  52 4.93 1.615 18.52  1  1    4
## Toyota Corolla                  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4
## Toyota Corona                   21.5   4 120.1  97 3.70 2.465 20.01  1  0    3
## Dodge Challenger                15.5   8 318.0 150 2.76 3.520 16.87  0  0    3
## AMC Javelin                     15.2   8 304.0 150 3.15 3.435 17.30  0  0    3
## Camaro Z28                      13.3   8 350.0 245 3.73 3.840 15.41  0  0    3
## Pontiac Firebird                19.2   8 400.0 175 3.08 3.845 17.05  0  0    3
## Fiat X1-9                       27.3   4  79.0  66 4.08 1.935 18.90  1  1    4
## Porsche 914-2                   26.0   4 120.3  91 4.43 2.140 16.70  0  1    5
## Lotus Europa                    30.4   4  95.1 113 3.77 1.513 16.90  1  1    5
## Ford Pantera L                  15.8   8 351.0 264 4.22 3.170 14.50  0  1    5
## Ferrari Dino                    19.7   6 145.0 175 3.62 2.770 15.50  0  1    5
## Maserati Bora                   15.0   8 301.0 335 3.54 3.570 14.60  0  1    5
## Volvo 142E                      21.4   4 121.0 109 4.11 2.780 18.60  1  1    4
##                     carb
## Mazda RX4              4
## Mazda RX4 Wag          4
## Datsun 710             1
## Hornet 4 Drive         1
## Hornet Sportabout      2
## Valiant                1
## Duster 360             4
## Merc 240D              2
## Merc 230               2
## Merc 280               4
## Merc 280C              4
## Merc 450SE             3
## Merc 450SL             3
## Merc 450SLC            3
## Cadillac Fleetwood     4
## Lincoln Continental    4
## Chrysler Imperial      4
## Fiat 128               1
## Honda Civic            2
## Toyota Corolla         1
## Toyota Corona          1
## Dodge Challenger       2
## AMC Javelin            2
## Camaro Z28             4
## Pontiac Firebird       2
## Fiat X1-9              1
## Porsche 914-2          2
## Lotus Europa           2
## Ford Pantera L         4
## Ferrari Dino           6
## Maserati Bora          8
## Volvo 142E             2

add a column using mutate function:

# Add a new column for horsepower per weight
mtcars_mutate <- mutate(mtcars, hp_per_wt = hp / wt)
mtcars_mutate
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
##                     hp_per_wt
## Mazda RX4            41.98473
## Mazda RX4 Wag        38.26087
## Datsun 710           40.08621
## Hornet 4 Drive       34.21462
## Hornet Sportabout    50.87209
## Valiant              30.34682
## Duster 360           68.62745
## Merc 240D            19.43574
## Merc 230             30.15873
## Merc 280             35.75581
## Merc 280C            35.75581
## Merc 450SE           44.22604
## Merc 450SL           48.25737
## Merc 450SLC          47.61905
## Cadillac Fleetwood   39.04762
## Lincoln Continental  39.63864
## Chrysler Imperial    43.03087
## Fiat 128             30.00000
## Honda Civic          32.19814
## Toyota Corolla       35.42234
## Toyota Corona        39.35091
## Dodge Challenger     42.61364
## AMC Javelin          43.66812
## Camaro Z28           63.80208
## Pontiac Firebird     45.51365
## Fiat X1-9            34.10853
## Porsche 914-2        42.52336
## Lotus Europa         74.68605
## Ford Pantera L       83.28076
## Ferrari Dino         63.17690
## Maserati Bora        93.83754
## Volvo 142E           39.20863
# Combine two datasets with the same columns
mtcars_combined <- bind_rows(mtcars, mtcars)
mtcars_combined
##                           mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4...1            21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag...2        21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710...3           22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive...4       21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout...5    18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant...6              18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360...7           14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D...8            24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230...9             22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280...10            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C...11           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE...12          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL...13          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC...14         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood...15  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental...16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial...17   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128...18            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic...19         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla...20      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona...21       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger...22    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin...23         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28...24          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird...25    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9...26           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2...27       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa...28        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L...29      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino...30        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora...31       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E...32          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Mazda RX4...33           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag...34       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710...35          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive...36      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout...37   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant...38             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360...39          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D...40           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230...41            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280...42            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C...43           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE...44          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL...45          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC...46         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood...47  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental...48 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial...49   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128...50            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic...51         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla...52      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona...53       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger...54    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin...55         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28...56          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird...57    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9...58           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2...59       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa...60        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L...61      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino...62        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora...63       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E...64          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# Group the dataset by number of cylinders and summarize the mean mpg
mtcars_grouped <- group_by(mtcars, cyl)
mtcars_summarized <- summarize(mtcars_grouped, mean_mpg = mean(mpg))
mtcars_grouped
## # A tibble: 32 × 11
## # Groups:   cyl [3]
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
##  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
##  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
##  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
##  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
##  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
##  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
##  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
## # ℹ 22 more rows
mtcars_summarized
## # A tibble: 3 × 2
##     cyl mean_mpg
##   <dbl>    <dbl>
## 1     4     26.7
## 2     6     19.7
## 3     8     15.1

the above functions are some of the data manipulation functions. these functions are available in dplyr package.