We will explore some uses of the dplyr package with a data set of contestants from seasons 11-15 of The Bachelorette.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
df <- read.csv("https://raw.githubusercontent.com/pmalo46/SPRING2020TIDYVERSE/master/BacheloretteDSFinal-Dogu.csv")
head(df)
## Season Name Age Hometown State
## 1 15 Jed Wyatt 25 Sevierville, Tennessee TN
## 2 15 Tyler Cameron 26 Jupiter, Florida FL
## 3 15 Peter Weber 27 Westlake Village, California CA
## 4 15 Luke Parker 24 Gainesville, Georgia GA
## 5 15 Garrett Powell 27 Homewood, Alabama AL
## 6 15 Mike Johnson 31 San Antonio, Texas TX
## College Occupation Win_Loss Height..cm.
## 1 Belmont University Singer/Sonwriter 1 190.50
## 2 Wake Forest General Contractor 0 187.96
## 3 Baylor University Pilot 0 175.25
## 4 Faulkner University Import/Export Manager 0 175.00
## 5 Mississippi State University Golf Pro 0 NaN
## 6 NaN Portfolio Manager 0 180.00
## Girlfriend.While.on.the.Show. Hair.Color Eye.Color
## 1 Yes Brown Brown
## 2 No Brown Green
## 3 No Brown Brown
## 4 No Blonde Brown
## 5 No Brown Green
## 6 No Brown Brown
One of the most useful functions in the dplyr package is filter(), which keeps only the rows that meet a given condition. Here, Win_Loss == 1 marks the winner of a season.
filter(df, Win_Loss == 1)
## Season Name Age Hometown State
## 1 15 Jed Wyatt 25 Sevierville, Tennessee TN
## 2 14 Jason Tartick 29 Buffalo, New York NY
## 3 13 Bryan Abasolo 37 Miami, Florida FL
## 4 12 Jordan Rodgers 27 Chico, California CA
## 5 11 Shawn Booth 28 Windsor Locks, Connecticut CT
## College Occupation Win_Loss Height..cm.
## 1 Belmont University Singer/Sonwriter 1 190.50
## 2 University of Rochester Senior Corporate Banker 1 175.26
## 3 University of Florida Chiropractor 1 187.96
## 4 Butte College Former Pro Quarterback 1 187.96
## 5 Keene State College Personal Trainer 1 187.96
## Girlfriend.While.on.the.Show. Hair.Color Eye.Color
## 1 Yes Brown Brown
## 2 No Brown Brown
## 3 No Brown Brown
## 4 No Brown Brown
## 5 No Brown Brown
The table above shows the winners of the five seasons in the data set (11-15), one per season. Another useful function is group_by().
group_by(df, State) %>%
summarise(mean(Height..cm.))
## # A tibble: 32 x 2
## State `mean(Height..cm.)`
## <fct> <dbl>
## 1 AL NaN
## 2 AR 188.
## 3 AZ 185.
## 4 CA NaN
## 5 CO 187.
## 6 CT NaN
## 7 FL NaN
## 8 GA NaN
## 9 IA NaN
## 10 ID 188.
## # ... with 22 more rows
The chunk above uses group_by() to group the contestants by home state and then takes the average height within each state. It also shows another dplyr function, summarise(), which reduces multiple values down to a single value per group. States that show NaN include at least one contestant whose height is missing, because mean() does not drop missing values unless na.rm = TRUE is set. Another useful function is arrange().
as_tibble(tail(arrange(df, Occupation), 15))
## # A tibble: 15 x 12
## Season Name Age Hometown State College Occupation Win_Loss Height..cm.
## <int> <fct> <int> <fct> <fct> <fct> <fct> <int> <dbl>
## 1 12 "Nic~ 26 San Fra~ CA Other "Software~ 0 NaN
## 2 11 "Ben~ 26 Warsaw,~ IN Indian~ "Software~ 0 198.
## 3 14 "Mic~ 27 Cincinn~ OH Univer~ "Sports A~ 0 NaN
## 4 12 "Pet~ 26 Rockdal~ IL Joliet~ "Staffing~ 0 180.
## 5 13 "Dea~ 26 Aspen, ~ CO Univer~ "Startup ~ 0 188
## 6 15 "Dev~ 27 Sherman~ CA Univer~ "Talent M~ 0 NaN
## 7 15 "Dyl~ 24 San Die~ CA Willia~ "Tech Ent~ 0 180.
## 8 12 "Jon~ 29 Vancouv~ Other Other "Technica~ 0 185.
## 9 12 "Chr~ 26 Los Ang~ CA Califo~ "Telecom ~ 0 180.
## 10 15 "Joe~ 30 Chicago~ IL North ~ "The Box ~ 0 NaN
## 11 12 "Ale~ 25 Oceansi~ CA Palm B~ "U.S. Mar~ 0 170.
## 12 13 "Bla~ 29 San Fra~ CA Other "U.S. Mar~ 0 180.
## 13 15 "Gra~ 30 San Cle~ CA Saddle~ "Unemploy~ 0 NaN
## 14 14 "Dav~ 25 Cherry ~ NJ Univer~ "Venture ~ 0 NaN
## 15 12 "Luk~ 31 Burnet,~ TX West P~ "War Vete~ 0 185.
## # ... with 3 more variables: Girlfriend.While.on.the.Show. <fct>,
## # Hair.Color <fct>, Eye.Color <fct>
The chunk above uses arrange() to sort the contestants alphabetically by occupation, tail() keeps the last 15 rows of that ordering, and as_tibble() makes the output easier to view.
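arrange() can also sort in descending order and break ties with a second column. The sketch below is only an illustration: the `contestants` data frame is a made-up stand-in with hypothetical values, not the actual Bachelorette data.

```r
library(dplyr)

# Hypothetical stand-in for a few columns of the contestants data
contestants <- data.frame(
  Name = c("A", "B", "C", "D"),
  Season = c(15, 14, 15, 14),
  Age = c(25, 31, 27, 29),
  stringsAsFactors = FALSE
)

# Newest season first; within a season, youngest contestant first
sorted <- arrange(contestants, desc(Season), Age)
sorted
```

Wrapping a column in desc() reverses its order, while any further columns are used only to order rows that tie on the earlier ones.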
These demonstrate some of the many uses of the great dplyr package.
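Many of the state averages shown earlier came back as NaN. One way to handle this, sketched here on a small made-up data frame (the column name `Height_cm` and all values are assumptions for illustration), is to pass na.rm = TRUE to mean() and to count how many heights were actually recorded in each group.

```r
library(dplyr)

# Hypothetical data: NaN marks a missing height, as in the original CSV
contestants <- data.frame(
  State = c("CA", "CA", "TX", "TX", "FL"),
  Height_cm = c(180, NaN, 185, 175, NaN),
  stringsAsFactors = FALSE
)

height_by_state <- contestants %>%
  group_by(State) %>%
  summarise(
    mean_height = mean(Height_cm, na.rm = TRUE),  # drop missing heights first
    n_measured = sum(!is.na(Height_cm))           # how many heights were recorded
  )
height_by_state
```

A state with no recorded heights at all (FL in this sketch) still yields NaN, so the `n_measured` column helps tell a real average apart from an empty one.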
In addition to the features above, dplyr can do much more. For example, it can extract a random sample of your data, either by selecting a fixed number of rows with sample_n() or a fraction of the rows with sample_frac().
n <- sample_n(df,20,replace = FALSE)
frac <- sample_frac(df,0.5,replace = FALSE)
n
## Season Name Age Hometown State
## 1 15 Devin Harris 27 Sherman Oaks, California CA
## 2 13 Peter Kraus 31 Madison, Wisconsin WI
## 3 15 Brian Bowles 30 Louisville, Kentucky KY
## 4 11 Daniel Finney 28 Nashville, Tennessee TN
## 5 13 Eric Bigger 29 Baltimore, Maryland MD
## 6 14 Connor Obrochta 25 St. Petersburg, Florida FL
## 7 12 William "Will" Haduch 26 Jersey City, New Jersey NJ
## 8 15 Luke Stone 29 Marion, Massachusetts MA
## 9 13 Mohit Sehgal 26 Pacifica, California CA
## 10 14 John Graham 28 Chicago, Illinois IL
## 11 15 Matt Donald 26 Los Gatos, California CA
## 12 14 Ryan Peterson 26 Mashpee, Massachusetts MA
## 13 15 Jonathan Saunders 27 Los Angeles, California CA
## 14 15 Tyler Cameron 26 Jupiter, Florida FL
## 15 13 Anthony Battle 26 Chicago, Illinois IL
## 16 13 Blake Killpack 29 San Francisco, California CA
## 17 12 Nicholas "Nick" Benvenutti 33 Carthage, Illinois IL
## 18 12 James Fuertes 34 Franklin, Tennessee TN
## 19 12 Wells Adams 31 Monterey, California CA
## 20 13 DeMario Jackson 30 Century City, California CA
## College Occupation Win_Loss
## 1 University of San Diego Talent Manager 0
## 2 Madison Technical College Business Owner 0
## 3 Centre College Math Teacher 0
## 4 Other Fashion Designer 0
## 5 Hampton University Personal Trainer 0
## 6 University of Tampa Fitness Coach 0
## 7 The College of New Jersey Civil Engineer 0
## 8 The George Washington University Political Consultant 0
## 9 Other Product Manager 0
## 10 Columbia University Software Engineer 0
## 11 High Point University Medical Device Salesman 0
## 12 University of California, Davis Banjoist 0
## 13 Other Server 0
## 14 Wake Forest General Contractor 0
## 15 Northwestern University Education Software Manager 0
## 16 Other U.S. Marine Veteran 0
## 17 Clemson University Electrical Engineer 0
## 18 Liberty University Boxing Club Owner 0
## 19 University of Oxford Radio DJ 0
## 20 University of California, Fresno Executive Recruiter 0
## Height..cm. Girlfriend.While.on.the.Show. Hair.Color Eye.Color
## 1 NaN No Brown Brown
## 2 190.50 No Brown Brown
## 3 NaN No Brown Brown
## 4 190.50 No Brown Blue
## 5 187.96 No Brown Brown
## 6 182.80 No Brown Brown
## 7 190.50 No Brown Green
## 8 187.96 No Brown Brown
## 9 175.00 No Brown Brown
## 10 NaN No Brown Brown
## 11 NaN No Blonde Green
## 12 182.80 No Brown Brown
## 13 NaN No Brown Brown
## 14 187.96 No Brown Green
## 15 190.50 No Brown Brown
## 16 180.33 No Brown Brown
## 17 185.00 No Brown Brown
## 18 187.96 No Brown Brown
## 19 182.88 No Brown Brown
## 20 193.00 Yes Brown Brown
frac
## Season Name Age Hometown State
## 1 14 Michael "Mike" Renner 27 Cincinnati, Ohio OH
## 2 12 Jordan Rodgers 27 Chico, California CA
## 3 11 Jonathan Holloway 33 Sylvan Lake, Michigan MI
## 4 15 Garrett Powell 27 Homewood, Alabama AL
## 5 12 Salvatore "Sal" DeJulio 28 Hubbard, Ohio OH
## 6 12 Chase McNary 27 Castle Rock, Colorado CO
## 7 14 Jason Tartick 29 Buffalo, New York NY
## 8 14 Nicholas "Nick" Spetsas 27 Palm Coast, Florida FL
## 9 15 Tyler Gwozdz 28 Boca Raton, Florida FL
## 10 14 Christian Estrada 28 San Diego, California CA
## 11 14 John Graham 28 Chicago, Illinois IL
## 12 14 Colton Underwood 26 Washington, Illinois IL
## 13 11 Joseph "Joe" Bailey 28 Glasgow, Kentucky KY
## 14 13 Blake Elarbee 31 Marina del Rey, California CA
## 15 11 Corey Stansell 30 New York, New York NY
## 16 13 Jedidiah Ballard 35 Augusta, Georgia GA
## 17 12 Vincent "Vinny" Ventiera 28 Kings Park, New York NY
## 18 14 Christon Staples 31 Los Angeles, California CA
## 19 12 Brandon Howell 28 Marysville, Washington WA
## 20 14 Darius Feaster 26 Sherman Oaks, California CA
## 21 11 Daniel Finney 28 Nashville, Tennessee TN
## 22 12 Alexander "Alex" Woytkiw 25 Oceanside, California CA
## 23 13 Dean Unglert 26 Aspen, Colorado CO
## 24 12 William "Will" Haduch 26 Jersey City, New Jersey NJ
## 25 14 Lincoln Adim 26 Boston, Massachusetts MA
## 26 12 Grant Kemp 27 San Francisco, California CA
## 27 11 Brady Toops 33 Wauseon, Ohio OH
## 28 15 Matt Donald 26 Los Gatos, California CA
## 29 15 Connor Jenkins 28 Newport Beach, California CA
## 30 14 Blake Horstmann 28 Bailey, Colorado CO
## 31 13 Blake Killpack 29 San Francisco, California CA
## 32 13 Matthew "Matt" Munson 32 Meriden, Connecticut CT
## 33 15 John Paul Jones 24 Lanham, Maryland MD
## 34 11 Joshua "Josh" Seiter 27 Chicago, Illinois IL
## 35 11 Ryan McDill 28 Kansas City, Missouri MO
## 36 13 Milton LaCroix 31 North Bay Village, Florida FL
## 37 15 Thomas Staton 27 Detroit, Michigan MI
## 38 15 Chasen Coscia 27 Ann Arbor, Michigan MI
## 39 12 Luke Pell 31 Burnet, Texas TX
## 40 13 Bryce Powers 30 Orlando, Florida FL
## 41 11 John "J.J." Lane III 32 Dacono, Colorado CO
## 42 15 Hunter Jones 24 Westchester, California CA
## 43 12 Nicholas "Nick" Benvenutti 33 Carthage, Illinois IL
## 44 14 Connor Obrochta 25 St. Petersburg, Florida FL
## 45 15 Jed Wyatt 25 Sevierville, Tennessee TN
## 46 15 Dylan Barbour 24 San Diego, California CA
## 47 11 Benjamin "Ben" Higgins 26 Warsaw, Indiana IN
## 48 11 Ian Thomson 28 Ramsey, New Jersey NJ
## 49 13 Peter Kraus 31 Madison, Wisconsin WI
## 50 12 Christian Bishop 26 Los Angeles, California CA
## 51 11 Bradley Cox 25 Duluth, Georgia GA
## 52 15 Connor Saeli 24 Birmingham, Michigan MI
## 53 11 Benedikt "Ben" Zorn 26 Falls Church, Virginia VA
## 54 14 Grant Vandevanter 27 Danville, California CA
## 55 12 Jake Patton 26 Playa Vista, California CA
## 56 15 Dustin Kendrick 30 Chicago, Illinois IL
## 57 12 Evan Bass 33 Hartford, Connecticut CT
## 58 14 Leandro "Leo" Dottavio 31 Los Angeles, California CA
## 59 11 Clinton "Clint" Arlis 27 Batavia, Illinois IL
## 60 15 Luke Stone 29 Marion, Massachusetts MA
## 61 15 Devin Harris 27 Sherman Oaks, California CA
## 62 11 Cory Shivar 35 Seven Springs, North Carolina NC
## 63 12 Wells Adams 31 Monterey, California CA
## 64 11 David Cox 28 Orlando, Florida FL
## 65 13 Lucas Yancey 30 Woodside, California CA
## 66 12 James Spadafore 27 Phoenix, Arizona AZ
## 67 13 Bryan Abasolo 37 Miami, Florida FL
## 68 11 Tanner Tolbert 28 Stilwell, Kansas KA
## 69 11 Nicholas "Nick" Viall 34 Waukesha, Wisconsin WI
## 70 15 Ryan Spirko 25 Philadelphia, Pennsylvania PA
## College Occupation
## 1 University of Notre Dame Sports Analyst
## 2 Butte College Former Pro Quarterback
## 3 Other Automotive Spokesman
## 4 Mississippi State University Golf Pro
## 5 Other Operations Manager
## 6 Colorado State University Medical Sales Rep
## 7 University of Rochester Senior Corporate Banker
## 8 Flagler University Attorney
## 9 Wake Forest Psychology Graduate Student
## 10 Other Banker
## 11 Columbia University Software Engineer
## 12 Illinois State University Former Pro Football Player
## 13 Other Insurance Agent
## 14 Other Aspiring Drummer
## 15 University of Houston Investment Banker
## 16 Other ER Physician
## 17 Five Towns College Barber
## 18 Wayne State University Former Harlem Globetrotter
## 19 Other Hipster
## 20 University of Wisconsin Pharmaceutical Sales Representative
## 21 Other Fashion Designer
## 22 Palm Beach State College U.S. Marine
## 23 University of Colorado Startup Recruiter
## 24 The College of New Jersey Civil Engineer
## 25 University of Kentucky Account Sales Executive
## 26 University of Pretoria Firefighter
## 27 Other Singer/Songwriter
## 28 High Point University Medical Device Salesman
## 29 University of Missouri-Columbia Sales Manager
## 30 Hastings College Sales Representative
## 31 Other U.S. Marine Veteran
## 32 Other Construction Sales Rep
## 33 Catholic University of America Financial Analyst
## 34 Chicago-Kent College Law Student/Exotic Dancer
## 35 Other Junkyard Specialist
## 36 Other Hotel Recreation Supervisor
## 37 Other International Pro Basketball Player
## 38 Other Pilot
## 39 West Point War Veteran
## 40 Other Firefighter
## 41 Westmont College Former Investment Banker
## 42 Other Pro Surfer
## 43 Clemson University Electrical Engineer
## 44 University of Tampa Fitness Coach
## 45 Belmont University Singer/Sonwriter
## 46 Williams College Tech Entrepreneur
## 47 Indiana University Software Salesman
## 48 Other Executive Recruiter
## 49 Madison Technical College Business Owner
## 50 California St Northridge Telecom Consultant
## 51 Other International Auto Shipper
## 52 Souther Methodist University Investment Analyst
## 53 San Jose State College Fitness Coach
## 54 Hastings College Electrician
## 55 Other Landscape Architect
## 56 Northeastern Illionois University Real Estate Broker
## 57 Other Erectile Dysfunction Specialist
## 58 California State University Actor
## 59 Other Architectural Engineer
## 60 The George Washington University Political Consultant
## 61 University of San Diego Talent Manager
## 62 Other Residential Developer
## 63 University of Oxford Radio DJ
## 64 Other Real Estate Agent
## 65 UC Berkeley Real Estate Agent
## 66 Onondaga Community College Bachelor Superfan
## 67 University of Florida Chiropractor
## 68 University of Kansas Auto Finance Manager
## 69 University of Wisconsin Software Sales Executive
## 70 Lehigh University Roller Boy
## Win_Loss Height..cm. Girlfriend.While.on.the.Show. Hair.Color Eye.Color
## 1 0 NaN No Blonde Brown
## 2 1 187.96 No Brown Brown
## 3 0 NaN No Brown Brown
## 4 0 NaN No Brown Green
## 5 0 187.96 No Brown Blue
## 6 0 190.50 No Brown Brown
## 7 1 175.26 No Brown Brown
## 8 0 NaN No Brown Brown
## 9 0 NaN No Brown Blue
## 10 0 182.80 No Brown Brown
## 11 0 NaN No Brown Brown
## 12 0 190.50 No Blonde Blue
## 13 0 NaN No Brown Brown
## 14 0 182.88 No Brown Brown
## 15 0 182.88 No Brown Brown
## 16 0 177.80 No Brown Brown
## 17 0 182.88 No Brown Brown
## 18 0 190.50 No Brown Brown
## 19 0 193.04 No Brown Brown
## 20 0 180.30 No Brown Brown
## 21 0 190.50 No Brown Blue
## 22 0 170.18 No Brown Brown
## 23 0 188.00 No Brown Blue
## 24 0 190.50 No Brown Green
## 25 0 NaN No Brown Brown
## 26 0 187.96 No Brown Brown
## 27 0 187.96 No Brown Brown
## 28 0 NaN No Blonde Green
## 29 0 NaN No Brown Brown
## 30 0 180.30 No Brown Brown
## 31 0 180.33 No Brown Brown
## 32 0 190.50 No Brown Brown
## 33 0 172.72 No Blonde Blue
## 34 0 187.96 No Brown Brown
## 35 0 190.50 No Brown Blue
## 36 0 193.04 No Brown Brown
## 37 0 NaN No Brown Brown
## 38 0 NaN No Brown Brown
## 39 0 185.42 No Brown Brown
## 40 0 187.96 No Brown Brown
## 41 0 185.00 No Brown Brown
## 42 0 NaN No Brown Brown
## 43 0 185.00 No Brown Brown
## 44 0 182.80 No Brown Brown
## 45 1 190.50 Yes Brown Brown
## 46 0 180.30 No Brown Brown
## 47 0 198.12 No Brown Brown
## 48 0 185.42 No Brown Brown
## 49 0 190.50 No Brown Brown
## 50 0 180.33 No Brown Brown
## 51 0 187.96 No Brown Brown
## 52 0 198.00 No Brown Brown
## 53 0 193.04 No Brown Blue
## 54 0 NaN No Brown Brown
## 55 0 187.96 No Brown Brown
## 56 0 188.00 No Brown Brown
## 57 0 180.33 No Brown Brown
## 58 0 193.00 No Brown Brown
## 59 0 175.25 No Blonde Blue
## 60 0 187.96 No Brown Brown
## 61 0 NaN No Brown Brown
## 62 0 182.88 No Brown Brown
## 63 0 182.88 No Brown Brown
## 64 0 NaN No Brown Brown
## 65 0 182.88 No Brown Brown
## 66 0 185.42 No Brown Brown
## 67 1 187.96 No Brown Brown
## 68 0 190.50 No Brown Brown
## 69 0 187.96 No Brown Brown
## 70 0 NaN No Brown Brown
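Random samples change on every run unless the random number generator is seeded. The sketch below assumes dplyr 1.0.0 or later, where slice_sample() supersedes sample_n() and sample_frac(); the `contestants` data frame and the seed value are made up for illustration.

```r
library(dplyr)

# Hypothetical 10-row data frame standing in for the contestants data
contestants <- data.frame(
  id = 1:10,
  age = c(25, 26, 27, 24, 27, 31, 29, 37, 27, 28)
)

set.seed(2020)  # fix the seed so the same rows are drawn on each run

n_rows <- slice_sample(contestants, n = 4)       # a fixed number of rows
half <- slice_sample(contestants, prop = 0.5)    # a fixed proportion of rows

nrow(n_rows)
nrow(half)
```

Both calls sample without replacement by default, so no row appears twice in either result.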
You can also keep the rows with the largest values of a variable using the top_n() function. Here it keeps the six states with the greatest mean height.
byState <- group_by(df, State) %>%
summarise(mean(Height..cm.)) %>%
top_n(6, `mean(Height..cm.)`)
byState
## # A tibble: 6 x 2
## State `mean(Height..cm.)`
## <fct> <dbl>
## 1 IN 198.
## 2 KA 190.
## 3 MO 189.
## 4 TN 190.
## 5 WA 193.
## 6 WI 189.
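The backticked column name `mean(Height..cm.)` is awkward to type, and top_n() leaves the rows in their original order rather than ranking them. A possible alternative, assuming dplyr 1.0.0 or later and using made-up values, is to name the summary column explicitly and select rows with slice_max(), which returns them from largest to smallest.

```r
library(dplyr)

# Hypothetical heights for a handful of states
contestants <- data.frame(
  State = c("IN", "IN", "WA", "CA", "CA", "TN"),
  Height_cm = c(198, 196, 193, 180, 182, 190),
  stringsAsFactors = FALSE
)

tallest_states <- contestants %>%
  group_by(State) %>%
  summarise(mean_height = mean(Height_cm)) %>%  # named summary column
  slice_max(mean_height, n = 2)                 # two largest means, sorted
tallest_states
```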
One last thing you can do with dplyr is use the count() function to see how often each value of a variable occurs in a data set.
topCollege <- top_n(count(df,College),5)
## Selecting by n
topHairColor <- top_n(count(df,Hair.Color),5)
## Selecting by n
topEyeColor <- top_n(count(df,Eye.Color),5)
## Selecting by n
topCollege
## # A tibble: 11 x 2
## College n
## <fct> <int>
## 1 Florida State University 2
## 2 Hastings College 2
## 3 Indiana University 2
## 4 Other 43
## 5 Palm Beach State College 2
## 6 Texas A&M University 2
## 7 University of California, Davis 2
## 8 University of Central Florida 2
## 9 University of Florida 2
## 10 University of Wisconsin 2
## 11 Wake Forest 2
topHairColor
## # A tibble: 2 x 2
## Hair.Color n
## <fct> <int>
## 1 Blonde 8
## 2 Brown 133
topEyeColor
## # A tibble: 3 x 2
## Eye.Color n
## <fct> <int>
## 1 Blue 14
## 2 Brown 120
## 3 Green 7
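Note that topCollege has eleven rows even though five were requested: top_n() keeps every row tied with the cut-off value, and ten colleges are tied at two contestants each. topHairColor and topEyeColor have fewer than five rows simply because those columns have fewer than five distinct values. The sketch below reproduces the tie behaviour on made-up data and shows one way to get an exact number of rows, assuming dplyr 1.0.0 or later for slice_max().

```r
library(dplyr)

# Hypothetical college column with a tie for second place
contestants <- data.frame(
  College = c("Other", "Other", "Other", "Wake Forest", "Wake Forest",
              "Hastings College", "Hastings College", "Baylor University"),
  stringsAsFactors = FALSE
)

# sort = TRUE lists the most frequent values first
college_counts <- count(contestants, College, sort = TRUE)

# top_n() keeps all rows tied with the cut-off: asking for 2 returns 3 here
with_ties <- top_n(college_counts, 2, n)

# slice_max() with with_ties = FALSE returns exactly 2 rows
exactly_two <- slice_max(college_counts, order_by = n, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(exactly_two)
```

Which of the tied rows survives with with_ties = FALSE is arbitrary, so keeping the ties is often the safer default when reporting results.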
This is just a small sample of all the amazing things that dplyr can do!