Introduction

We will explore some uses of the dplyr package by loading a data set of contestants from seasons 11-15 of The Bachelorette.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
df <- read.csv("https://raw.githubusercontent.com/pmalo46/SPRING2020TIDYVERSE/master/BacheloretteDSFinal-Dogu.csv")
head(df)
##   Season           Name Age                     Hometown State
## 1     15      Jed Wyatt  25       Sevierville, Tennessee    TN
## 2     15  Tyler Cameron  26             Jupiter, Florida    FL
## 3     15    Peter Weber  27 Westlake Village, California    CA
## 4     15    Luke Parker  24         Gainesville, Georgia    GA
## 5     15 Garrett Powell  27            Homewood, Alabama    AL
## 6     15   Mike Johnson  31           San Antonio, Texas    TX
##                        College            Occupation Win_Loss Height..cm.
## 1           Belmont University      Singer/Sonwriter        1      190.50
## 2                  Wake Forest    General Contractor        0      187.96
## 3            Baylor University                 Pilot        0      175.25
## 4          Faulkner University Import/Export Manager        0      175.00
## 5 Mississippi State University              Golf Pro        0         NaN
## 6                          NaN     Portfolio Manager        0      180.00
##   Girlfriend.While.on.the.Show. Hair.Color Eye.Color
## 1                           Yes      Brown     Brown
## 2                            No      Brown     Green
## 3                            No      Brown     Brown
## 4                            No     Blonde     Brown
## 5                            No      Brown     Green
## 6                            No      Brown     Brown

One of the most useful functions in the dplyr package is filter(), which keeps only the rows that meet a given condition.

filter(df, Win_Loss == 1)
##   Season           Name Age                   Hometown State
## 1     15      Jed Wyatt  25     Sevierville, Tennessee    TN
## 2     14  Jason Tartick  29          Buffalo, New York    NY
## 3     13  Bryan Abasolo  37             Miami, Florida    FL
## 4     12 Jordan Rodgers  27          Chico, California    CA
## 5     11    Shawn Booth  28 Windsor Locks, Connecticut    CT
##                   College              Occupation Win_Loss Height..cm.
## 1      Belmont University        Singer/Sonwriter        1      190.50
## 2 University of Rochester Senior Corporate Banker        1      175.26
## 3   University of Florida            Chiropractor        1      187.96
## 4           Butte College  Former Pro Quarterback        1      187.96
## 5     Keene State College        Personal Trainer        1      187.96
##   Girlfriend.While.on.the.Show. Hair.Color Eye.Color
## 1                           Yes      Brown     Brown
## 2                            No      Brown     Brown
## 3                            No      Brown     Brown
## 4                            No      Brown     Brown
## 5                            No      Brown     Brown
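
filter() also accepts several conditions at once. The sketch below is a minimal illustration, not part of the original analysis. It rebuilds six rows from the head(df) output by hand; the `contestants` data frame and the shortened `Height` column name are assumptions made so the example runs without downloading the file.

```r
library(dplyr)

# Six rows copied by hand from the head(df) output above.
# `contestants` and the short column name `Height` are illustrative assumptions.
contestants <- data.frame(
  Name     = c("Jed Wyatt", "Tyler Cameron", "Peter Weber",
               "Luke Parker", "Garrett Powell", "Mike Johnson"),
  Age      = c(25, 26, 27, 24, 27, 31),
  State    = c("TN", "FL", "CA", "GA", "AL", "TX"),
  Win_Loss = c(1, 0, 0, 0, 0, 0),
  Height   = c(190.50, 187.96, 175.25, 175.00, NaN, 180.00),
  stringsAsFactors = FALSE
)

# Comma-separated conditions are combined with AND.
tall_non_winners <- filter(contestants, Win_Loss == 0, Height > 180)

# Use | for OR and %in% to match any of several values.
young_or_southern <- filter(contestants, Age < 26 | State %in% c("AL", "GA"))
```

Rows whose condition evaluates to NA, such as a missing height compared against 180, are dropped by filter().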

The table above shows the winners of seasons 11-15. Another useful function is group_by().

group_by(df, State) %>%
  summarise(mean(Height..cm.))
## # A tibble: 32 x 2
##    State `mean(Height..cm.)`
##    <fct>               <dbl>
##  1 AL                   NaN 
##  2 AR                   188.
##  3 AZ                   185.
##  4 CA                   NaN 
##  5 CO                   187.
##  6 CT                   NaN 
##  7 FL                   NaN 
##  8 GA                   NaN 
##  9 IA                   NaN 
## 10 ID                   188.
## # ... with 22 more rows
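
Several of the state means above are NaN because some contestants have no recorded height, and mean() returns NaN as soon as one input is missing. One possible way around this, sketched below on a handful of rows copied from the outputs in this document, is to pass na.rm = TRUE and to name the summary column. The `heights` data frame and the column names `Height`, `mean_height`, and `n_measured` are assumptions made for the example.

```r
library(dplyr)

# A few State/height pairs copied by hand from the outputs in this document.
# `heights` and its column names are illustrative assumptions.
heights <- data.frame(
  State  = c("CA", "CA", "CA", "FL", "FL", "IL", "IL"),
  Height = c(NaN, 175.00, 180.33, 182.80, 187.96, NaN, 190.50),
  stringsAsFactors = FALSE
)

by_state <- heights %>%
  group_by(State) %>%
  summarise(
    # na.rm = TRUE drops missing heights before averaging.
    mean_height = mean(Height, na.rm = TRUE),
    # Count how many heights actually went into each mean.
    n_measured  = sum(!is.na(Height))
  )
```

Reporting the number of measured heights next to each mean makes it easier to see when an average rests on only one or two contestants.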

The code above uses group_by() to group the contestants by home state and then averages height within each state. It also shows summarise(), another dplyr function, which collapses each group's values into a single summary value. Another useful function is arrange().

as_tibble(tail(arrange(df, Occupation), 15))
## # A tibble: 15 x 12
##    Season Name    Age Hometown State College Occupation Win_Loss Height..cm.
##     <int> <fct> <int> <fct>    <fct> <fct>   <fct>         <int>       <dbl>
##  1     12 "Nic~    26 San Fra~ CA    Other   "Software~        0        NaN 
##  2     11 "Ben~    26 Warsaw,~ IN    Indian~ "Software~        0        198.
##  3     14 "Mic~    27 Cincinn~ OH    Univer~ "Sports A~        0        NaN 
##  4     12 "Pet~    26 Rockdal~ IL    Joliet~ "Staffing~        0        180.
##  5     13 "Dea~    26 Aspen, ~ CO    Univer~ "Startup ~        0        188 
##  6     15 "Dev~    27 Sherman~ CA    Univer~ "Talent M~        0        NaN 
##  7     15 "Dyl~    24 San Die~ CA    Willia~ "Tech Ent~        0        180.
##  8     12 "Jon~    29 Vancouv~ Other Other   "Technica~        0        185.
##  9     12 "Chr~    26 Los Ang~ CA    Califo~ "Telecom ~        0        180.
## 10     15 "Joe~    30 Chicago~ IL    North ~ "The Box ~        0        NaN 
## 11     12 "Ale~    25 Oceansi~ CA    Palm B~ "U.S. Mar~        0        170.
## 12     13 "Bla~    29 San Fra~ CA    Other   "U.S. Mar~        0        180.
## 13     15 "Gra~    30 San Cle~ CA    Saddle~ "Unemploy~        0        NaN 
## 14     14 "Dav~    25 Cherry ~ NJ    Univer~ "Venture ~        0        NaN 
## 15     12 "Luk~    31 Burnet,~ TX    West P~ "War Vete~        0        185.
## # ... with 3 more variables: Girlfriend.While.on.the.Show. <fct>,
## #   Hair.Color <fct>, Eye.Color <fct>

The chunk above uses arrange() to sort the contestants alphabetically by occupation, tail() to keep the last 15 rows, and as_tibble() to make the output easier to read.
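
The same steps can be written as a pipeline, and arrange() can also sort in descending order or by more than one column. The following is a minimal sketch on six rows copied from the head(df) output; the `contestants` data frame is an assumption introduced so the example is self-contained.

```r
library(dplyr)

# Six rows copied by hand from the head(df) output above.
# `contestants` is an illustrative assumption.
contestants <- data.frame(
  Name       = c("Jed Wyatt", "Tyler Cameron", "Peter Weber",
                 "Luke Parker", "Garrett Powell", "Mike Johnson"),
  Age        = c(25, 26, 27, 24, 27, 31),
  Occupation = c("Singer/Sonwriter", "General Contractor", "Pilot",
                 "Import/Export Manager", "Golf Pro", "Portfolio Manager"),
  stringsAsFactors = FALSE
)

# Pipeline form of as_tibble(tail(arrange(df, Occupation), n)).
last_two <- contestants %>%
  arrange(Occupation) %>%
  tail(2) %>%
  as_tibble()

# desc() reverses the order; later columns break ties in earlier ones.
oldest_first <- arrange(contestants, desc(Age), Name)
```

The pipeline reads in the order the steps run, which tends to be easier to follow than nested calls once more than two functions are involved.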

These examples demonstrate a few of the many uses of the dplyr package.

Extension

In addition to the features above, dplyr can do much more. For example, it can draw a random sample of rows, either a fixed number of rows with sample_n() or a fraction of the data with sample_frac().

n <- sample_n(df, 20, replace = FALSE)
frac <- sample_frac(df, 0.5, replace = FALSE)

n
##    Season                       Name Age                  Hometown State
## 1      15               Devin Harris  27  Sherman Oaks, California    CA
## 2      13                Peter Kraus  31        Madison, Wisconsin    WI
## 3      15               Brian Bowles  30      Louisville, Kentucky    KY
## 4      11              Daniel Finney  28      Nashville, Tennessee    TN
## 5      13                Eric Bigger  29       Baltimore, Maryland    MD
## 6      14            Connor Obrochta  25   St. Petersburg, Florida    FL
## 7      12      William "Will" Haduch  26   Jersey City, New Jersey    NJ
## 8      15                 Luke Stone  29     Marion, Massachusetts    MA
## 9      13               Mohit Sehgal  26      Pacifica, California    CA
## 10     14                John Graham  28         Chicago, Illinois    IL
## 11     15                Matt Donald  26     Los Gatos, California    CA
## 12     14              Ryan Peterson  26    Mashpee, Massachusetts    MA
## 13     15          Jonathan Saunders  27   Los Angeles, California    CA
## 14     15              Tyler Cameron  26          Jupiter, Florida    FL
## 15     13             Anthony Battle  26         Chicago, Illinois    IL
## 16     13             Blake Killpack  29 San Francisco, California    CA
## 17     12 Nicholas "Nick" Benvenutti  33        Carthage, Illinois    IL
## 18     12              James Fuertes  34       Franklin, Tennessee    TN
## 19     12                Wells Adams  31      Monterey, California    CA
## 20     13            DeMario Jackson  30  Century City, California    CA
##                             College                 Occupation Win_Loss
## 1           University of San Diego             Talent Manager        0
## 2         Madison Technical College             Business Owner        0
## 3                    Centre College               Math Teacher        0
## 4                             Other           Fashion Designer        0
## 5               Hampton University            Personal Trainer        0
## 6               University of Tampa              Fitness Coach        0
## 7         The College of New Jersey             Civil Engineer        0
## 8  The George Washington University       Political Consultant        0
## 9                             Other            Product Manager        0
## 10              Columbia University          Software Engineer        0
## 11            High Point University    Medical Device Salesman        0
## 12  University of California, Davis                   Banjoist        0
## 13                            Other                     Server        0
## 14                      Wake Forest         General Contractor        0
## 15          Northwestern University Education Software Manager        0
## 16                            Other        U.S. Marine Veteran        0
## 17               Clemson University        Electrical Engineer        0
## 18               Liberty University          Boxing Club Owner        0
## 19             University of Oxford                   Radio DJ        0
## 20 University of California, Fresno        Executive Recruiter        0
##    Height..cm. Girlfriend.While.on.the.Show. Hair.Color Eye.Color
## 1          NaN                            No      Brown     Brown
## 2       190.50                            No      Brown     Brown
## 3          NaN                            No      Brown     Brown
## 4       190.50                            No      Brown      Blue
## 5       187.96                            No      Brown     Brown
## 6       182.80                            No      Brown     Brown
## 7       190.50                            No      Brown     Green
## 8       187.96                            No      Brown     Brown
## 9       175.00                            No      Brown     Brown
## 10         NaN                            No      Brown     Brown
## 11         NaN                            No     Blonde     Green
## 12      182.80                            No      Brown     Brown
## 13         NaN                            No      Brown     Brown
## 14      187.96                            No      Brown     Green
## 15      190.50                            No      Brown     Brown
## 16      180.33                            No      Brown     Brown
## 17      185.00                            No      Brown     Brown
## 18      187.96                            No      Brown     Brown
## 19      182.88                            No      Brown     Brown
## 20      193.00                           Yes      Brown     Brown
frac
##    Season                       Name Age                      Hometown State
## 1      14      Michael "Mike" Renner  27              Cincinnati, Ohio    OH
## 2      12             Jordan Rodgers  27             Chico, California    CA
## 3      11          Jonathan Holloway  33         Sylvan Lake, Michigan    MI
## 4      15             Garrett Powell  27             Homewood, Alabama    AL
## 5      12    Salvatore "Sal" DeJulio  28                 Hubbard, Ohio    OH
## 6      12               Chase McNary  27         Castle Rock, Colorado    CO
## 7      14              Jason Tartick  29             Buffalo, New York    NY
## 8      14    Nicholas "Nick" Spetsas  27           Palm Coast, Florida    FL
## 9      15               Tyler Gwozdz  28           Boca Raton, Florida    FL
## 10     14          Christian Estrada  28         San Diego, California    CA
## 11     14                John Graham  28             Chicago, Illinois    IL
## 12     14           Colton Underwood  26          Washington, Illinois    IL
## 13     11        Joseph "Joe" Bailey  28             Glasgow, Kentucky    KY
## 14     13              Blake Elarbee  31    Marina del Rey, California    CA
## 15     11            Corey Stansell   30            New York, New York    NY
## 16     13           Jedidiah Ballard  35              Augusta, Georgia    GA
## 17     12   Vincent "Vinny" Ventiera  28          Kings Park, New York    NY
## 18     14           Christon Staples  31       Los Angeles, California    CA
## 19     12             Brandon Howell  28        Marysville, Washington    WA
## 20     14             Darius Feaster  26      Sherman Oaks, California    CA
## 21     11              Daniel Finney  28          Nashville, Tennessee    TN
## 22     12   Alexander "Alex" Woytkiw  25         Oceanside, California    CA
## 23     13               Dean Unglert  26               Aspen, Colorado    CO
## 24     12      William "Will" Haduch  26       Jersey City, New Jersey    NJ
## 25     14               Lincoln Adim  26         Boston, Massachusetts    MA
## 26     12                 Grant Kemp  27     San Francisco, California    CA
## 27     11                Brady Toops  33                 Wauseon, Ohio    OH
## 28     15                Matt Donald  26         Los Gatos, California    CA
## 29     15             Connor Jenkins  28     Newport Beach, California    CA
## 30     14            Blake Horstmann  28              Bailey, Colorado    CO
## 31     13             Blake Killpack  29     San Francisco, California    CA
## 32     13      Matthew "Matt" Munson  32          Meriden, Connecticut    CT
## 33     15            John Paul Jones  24              Lanham, Maryland    MD
## 34     11       Joshua "Josh" Seiter  27             Chicago, Illinois    IL
## 35     11                Ryan McDill  28         Kansas City, Missouri    MO
## 36     13             Milton LaCroix  31    North Bay Village, Florida    FL
## 37     15              Thomas Staton  27             Detroit, Michigan    MI
## 38     15              Chasen Coscia  27           Ann Arbor, Michigan    MI
## 39     12                  Luke Pell  31                 Burnet, Texas    TX
## 40     13               Bryce Powers  30              Orlando, Florida    FL
## 41     11       John "J.J." Lane III  32              Dacono, Colorado    CO
## 42     15               Hunter Jones  24       Westchester, California    CA
## 43     12 Nicholas "Nick" Benvenutti  33            Carthage, Illinois    IL
## 44     14            Connor Obrochta  25       St. Petersburg, Florida    FL
## 45     15                  Jed Wyatt  25        Sevierville, Tennessee    TN
## 46     15              Dylan Barbour  24         San Diego, California    CA
## 47     11     Benjamin "Ben" Higgins  26               Warsaw, Indiana    IN
## 48     11                Ian Thomson  28            Ramsey, New Jersey    NJ
## 49     13                Peter Kraus  31            Madison, Wisconsin    WI
## 50     12           Christian Bishop  26       Los Angeles, California    CA
## 51     11                Bradley Cox  25               Duluth, Georgia    GA
## 52     15               Connor Saeli  24          Birmingham, Michigan    MI
## 53     11        Benedikt "Ben" Zorn  26        Falls Church, Virginia    VA
## 54     14          Grant Vandevanter  27          Danville, California    CA
## 55     12                Jake Patton  26       Playa Vista, California    CA
## 56     15            Dustin Kendrick  30             Chicago, Illinois    IL
## 57     12                  Evan Bass  33         Hartford, Connecticut    CT
## 58     14     Leandro "Leo" Dottavio  31       Los Angeles, California    CA
## 59     11      Clinton "Clint" Arlis  27             Batavia, Illinois    IL
## 60     15                 Luke Stone  29         Marion, Massachusetts    MA
## 61     15               Devin Harris  27      Sherman Oaks, California    CA
## 62     11                Cory Shivar  35 Seven Springs, North Carolina    NC
## 63     12                Wells Adams  31          Monterey, California    CA
## 64     11                  David Cox  28              Orlando, Florida    FL
## 65     13               Lucas Yancey  30          Woodside, California    CA
## 66     12            James Spadafore  27              Phoenix, Arizona    AZ
## 67     13              Bryan Abasolo  37                Miami, Florida    FL
## 68     11             Tanner Tolbert  28              Stilwell, Kansas    KA
## 69     11      Nicholas "Nick" Viall  34           Waukesha, Wisconsin    WI
## 70     15                Ryan Spirko  25    Philadelphia, Pennsylvania    PA
##                              College                          Occupation
## 1           University of Notre Dame                      Sports Analyst
## 2                      Butte College              Former Pro Quarterback
## 3                              Other                Automotive Spokesman
## 4       Mississippi State University                            Golf Pro
## 5                              Other                  Operations Manager
## 6          Colorado State University                   Medical Sales Rep
## 7            University of Rochester             Senior Corporate Banker
## 8                 Flagler University                            Attorney
## 9                        Wake Forest         Psychology Graduate Student
## 10                             Other                              Banker
## 11               Columbia University                   Software Engineer
## 12         Illinois State University          Former Pro Football Player
## 13                             Other                     Insurance Agent
## 14                             Other                    Aspiring Drummer
## 15             University of Houston                   Investment Banker
## 16                             Other                        ER Physician
## 17                Five Towns College                              Barber
## 18            Wayne State University         Former Harlem Globetrotter
## 19                             Other                             Hipster
## 20           University of Wisconsin Pharmaceutical Sales Representative
## 21                             Other                    Fashion Designer
## 22          Palm Beach State College                         U.S. Marine
## 23            University of Colorado                   Startup Recruiter
## 24         The College of New Jersey                      Civil Engineer
## 25            University of Kentucky             Account Sales Executive
## 26            University of Pretoria                         Firefighter
## 27                             Other                   Singer/Songwriter
## 28             High Point University             Medical Device Salesman
## 29   University of Missouri-Columbia                       Sales Manager
## 30                  Hastings College                Sales Representative
## 31                             Other                 U.S. Marine Veteran
## 32                             Other              Construction Sales Rep
## 33    Catholic University of America                   Financial Analyst
## 34              Chicago-Kent College           Law Student/Exotic Dancer
## 35                             Other                 Junkyard Specialist
## 36                             Other         Hotel Recreation Supervisor
## 37                             Other International Pro Basketball Player
## 38                             Other                              Pilot 
## 39                        West Point                         War Veteran
## 40                             Other                         Firefighter
## 41                  Westmont College            Former Investment Banker
## 42                             Other                          Pro Surfer
## 43                Clemson University                 Electrical Engineer
## 44               University of Tampa                       Fitness Coach
## 45                Belmont University                    Singer/Sonwriter
## 46                  Williams College                  Tech Entrepreneur 
## 47                Indiana University                   Software Salesman
## 48                             Other                 Executive Recruiter
## 49         Madison Technical College                      Business Owner
## 50          California St Northridge                  Telecom Consultant
## 51                             Other          International Auto Shipper
## 52      Souther Methodist University                  Investment Analyst
## 53            San Jose State College                       Fitness Coach
## 54                  Hastings College                         Electrician
## 55                             Other                 Landscape Architect
## 56 Northeastern Illionois University                  Real Estate Broker
## 57                             Other     Erectile Dysfunction Specialist
## 58       California State University                               Actor
## 59                             Other              Architectural Engineer
## 60  The George Washington University                Political Consultant
## 61           University of San Diego                      Talent Manager
## 62                             Other               Residential Developer
## 63              University of Oxford                            Radio DJ
## 64                             Other                   Real Estate Agent
## 65                       UC Berkeley                   Real Estate Agent
## 66        Onondaga Community College                  Bachelor Superfan
## 67             University of Florida                        Chiropractor
## 68              University of Kansas                Auto Finance Manager
## 69           University of Wisconsin            Software Sales Executive
## 70                 Lehigh University                          Roller Boy
##    Win_Loss Height..cm. Girlfriend.While.on.the.Show. Hair.Color Eye.Color
## 1         0         NaN                            No     Blonde     Brown
## 2         1      187.96                            No      Brown     Brown
## 3         0         NaN                            No      Brown     Brown
## 4         0         NaN                            No      Brown     Green
## 5         0      187.96                            No      Brown      Blue
## 6         0      190.50                            No      Brown     Brown
## 7         1      175.26                            No      Brown     Brown
## 8         0         NaN                            No      Brown     Brown
## 9         0         NaN                            No      Brown      Blue
## 10        0      182.80                            No      Brown     Brown
## 11        0         NaN                            No      Brown     Brown
## 12        0      190.50                            No     Blonde      Blue
## 13        0         NaN                            No      Brown     Brown
## 14        0      182.88                            No      Brown     Brown
## 15        0      182.88                            No      Brown     Brown
## 16        0      177.80                            No      Brown     Brown
## 17        0      182.88                            No      Brown     Brown
## 18        0      190.50                            No      Brown     Brown
## 19        0      193.04                            No      Brown     Brown
## 20        0      180.30                            No      Brown     Brown
## 21        0      190.50                            No      Brown      Blue
## 22        0      170.18                            No      Brown     Brown
## 23        0      188.00                            No      Brown      Blue
## 24        0      190.50                            No      Brown     Green
## 25        0         NaN                            No      Brown     Brown
## 26        0      187.96                            No      Brown     Brown
## 27        0      187.96                            No      Brown     Brown
## 28        0         NaN                            No     Blonde     Green
## 29        0         NaN                            No      Brown     Brown
## 30        0      180.30                            No      Brown     Brown
## 31        0      180.33                            No      Brown     Brown
## 32        0      190.50                            No      Brown     Brown
## 33        0      172.72                            No     Blonde      Blue
## 34        0      187.96                            No      Brown     Brown
## 35        0      190.50                            No      Brown      Blue
## 36        0      193.04                            No      Brown     Brown
## 37        0         NaN                            No      Brown     Brown
## 38        0         NaN                            No      Brown     Brown
## 39        0      185.42                            No      Brown     Brown
## 40        0      187.96                            No      Brown     Brown
## 41        0      185.00                            No      Brown     Brown
## 42        0         NaN                            No      Brown     Brown
## 43        0      185.00                            No      Brown     Brown
## 44        0      182.80                            No      Brown     Brown
## 45        1      190.50                           Yes      Brown     Brown
## 46        0      180.30                            No      Brown     Brown
## 47        0      198.12                            No      Brown     Brown
## 48        0      185.42                            No      Brown     Brown
## 49        0      190.50                            No      Brown     Brown
## 50        0      180.33                            No      Brown     Brown
## 51        0      187.96                            No      Brown     Brown
## 52        0      198.00                            No      Brown     Brown
## 53        0      193.04                            No      Brown      Blue
## 54        0         NaN                            No      Brown     Brown
## 55        0      187.96                            No      Brown     Brown
## 56        0      188.00                            No      Brown     Brown
## 57        0      180.33                            No      Brown     Brown
## 58        0      193.00                            No      Brown     Brown
## 59        0      175.25                            No     Blonde      Blue
## 60        0      187.96                            No      Brown     Brown
## 61        0         NaN                            No      Brown     Brown
## 62        0      182.88                            No      Brown     Brown
## 63        0      182.88                            No      Brown     Brown
## 64        0         NaN                            No      Brown     Brown
## 65        0      182.88                            No      Brown     Brown
## 66        0      185.42                            No      Brown     Brown
## 67        1      187.96                            No      Brown     Brown
## 68        0      190.50                            No      Brown     Brown
## 69        0      187.96                            No      Brown     Brown
## 70        0         NaN                            No      Brown     Brown
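
Because the rows are drawn at random, the two samples above will differ on every run. A minimal sketch of one way to make a sample reproducible is to call set.seed() first. The example also uses slice_sample(), which replaces sample_n() and sample_frac() in dplyr 1.0.0 and later; the `contestants` data frame and the seed value 2020 are arbitrary assumptions for illustration.

```r
library(dplyr)

# Six names and ages copied by hand from the head(df) output above.
# `contestants` is an illustrative assumption.
contestants <- data.frame(
  Name = c("Jed Wyatt", "Tyler Cameron", "Peter Weber",
           "Luke Parker", "Garrett Powell", "Mike Johnson"),
  Age  = c(25, 26, 27, 24, 27, 31),
  stringsAsFactors = FALSE
)

# Fixing the seed makes the random draw repeatable (2020 is arbitrary).
set.seed(2020)
sample_a <- slice_sample(contestants, n = 3)

# Resetting the same seed reproduces the same rows.
set.seed(2020)
sample_b <- slice_sample(contestants, n = 3)

# prop draws a fraction of the rows, like sample_frac().
half <- slice_sample(contestants, prop = 0.5)
```

The same set.seed() call works in front of sample_n() and sample_frac() on older versions of dplyr.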

You can also select the rows with the largest values of a variable using the top_n function. Here it keeps the six states with the highest mean height.

byState <- group_by(df, State) %>%
  summarise(mean(Height..cm.)) %>%
  top_n(6, `mean(Height..cm.)`)

byState
## # A tibble: 6 x 2
##   State `mean(Height..cm.)`
##   <fct>               <dbl>
## 1 IN                   198.
## 2 KA                   190.
## 3 MO                   189.
## 4 TN                   190.
## 5 WA                   193.
## 6 WI                   189.
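
In dplyr 1.0.0 and later, slice_max() is the suggested replacement for top_n(). The sketch below shows one possible version of the same idea, with a named summary column so no backticks are needed. The `heights` data frame holds a few rows copied from the outputs in this document; it and the column names `Height` and `mean_height` are assumptions made for the example.

```r
library(dplyr)

# A few State/height pairs copied by hand from the outputs in this document.
# `heights` and its column names are illustrative assumptions.
heights <- data.frame(
  State  = c("CA", "CA", "CA", "FL", "FL", "TN", "TN", "MA"),
  Height = c(NaN, 175.00, 180.33, 182.80, 187.96, 190.50, 187.96, 187.96),
  stringsAsFactors = FALSE
)

top_states <- heights %>%
  group_by(State) %>%
  # Naming the column avoids referring to `mean(Height..cm.)` in backticks.
  summarise(mean_height = mean(Height, na.rm = TRUE)) %>%
  # Keep the two states with the largest mean height.
  slice_max(mean_height, n = 2)
```

slice_min() works the same way for the smallest values.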

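One last thing you can do with dplyr is use the count function to see how often each value of a variable occurs in a data set. Combined with top_n, it shows the most common values. Because top_n keeps ties, it can return more than the requested five rows, as in the college table below, and it returns fewer when the variable has fewer than five distinct values.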

topCollege <- top_n(count(df, College), 5)
## Selecting by n
topHairColor <- top_n(count(df, Hair.Color), 5)
## Selecting by n
topEyeColor <- top_n(count(df, Eye.Color), 5)
## Selecting by n
topCollege
## # A tibble: 11 x 2
##    College                             n
##    <fct>                           <int>
##  1 Florida State University            2
##  2 Hastings College                    2
##  3 Indiana University                  2
##  4 Other                              43
##  5 Palm Beach State College            2
##  6 Texas A&M University                2
##  7 University of California, Davis     2
##  8 University of Central Florida       2
##  9 University of Florida               2
## 10 University of Wisconsin             2
## 11 Wake Forest                         2
topHairColor
## # A tibble: 2 x 2
##   Hair.Color     n
##   <fct>      <int>
## 1 Blonde         8
## 2 Brown        133
topEyeColor
## # A tibble: 3 x 2
##   Eye.Color     n
##   <fct>     <int>
## 1 Blue         14
## 2 Brown       120
## 3 Green         7
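
The college table has 11 rows rather than 5 because ten colleges are tied at two contestants each. As a hedged sketch of how ties could be handled, the example below sorts the counts with sort = TRUE and uses slice_max() (dplyr 1.0.0 and later) with and without ties. The `colleges` data frame is a small hand-made subset of college names that appear in the outputs above, not the full data set.

```r
library(dplyr)

# A small subset of college names taken from the outputs above.
# `colleges` is an illustrative assumption, not the full data set.
colleges <- data.frame(
  College = c("Other", "Other", "Other",
              "Hastings College", "Hastings College",
              "University of Wisconsin", "University of Wisconsin",
              "Wake Forest", "Butte College"),
  stringsAsFactors = FALSE
)

# sort = TRUE orders the counts from most to least frequent.
college_counts <- count(colleges, College, sort = TRUE)

# The first `n` is the count column; `n = 2` is the number of rows requested.
# Ties are kept by default, so this can return more than two rows.
with_ties <- slice_max(college_counts, n, n = 2)

# with_ties = FALSE returns exactly the requested number of rows.
without_ties <- slice_max(college_counts, n, n = 2, with_ties = FALSE)
```

Whether to keep ties depends on the question: keeping them avoids an arbitrary cut among equally common values, while dropping them gives a fixed-size table.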

This is just a small sample of what dplyr can do.