## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Dplyr mostly uses five “verbs.”
Some basic behavior of R is altered. All dataframes touched by dplyr become tbl’s (pronounced tibble).
Typing the name of a dataframe is prevented from overwhelming the console by dumping the entire dataframe. Both rows and columns are restricted.
Use of View() and glimpse() give us ability to see everything.
## Rows: 53,940
## Columns: 10
## $ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
## $ cut <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver…
## $ color <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,…
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, …
## $ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
## $ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
## $ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
## $ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
## $ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
## $ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…
Piping with %>% is almost always used when dplyr is in use, although it is not strictly necessary.
Create a new version of diamonds, called d.
Use filter to eliminate any diamonds with x, y, or z = 0.
Cut, color and clarity should be simple character variables instead of factors.
Make a new variable ppc, the price per carat.
d <- diamonds %>%
filter(x>0, y>0, z>0) %>%
mutate(cut = as.character(cut),
color = as.character(color),
clarity = as.character(clarity),
ppc = price/carat) %>%
glimpse()
## Rows: 53,920
## Columns: 11
## $ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
## $ cut <chr> "Ideal", "Premium", "Good", "Premium", "Good", "Very Good", "V…
## $ color <chr> "E", "E", "E", "I", "J", "J", "I", "H", "E", "H", "J", "J", "F…
## $ clarity <chr> "SI2", "SI1", "VS1", "VS2", "SI2", "VVS2", "VVS1", "SI1", "VS2…
## $ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
## $ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
## $ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
## $ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
## $ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
## $ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…
## $ ppc <dbl> 1417.391, 1552.381, 1421.739, 1151.724, 1080.645, 1400.000, 14…
## carat cut color clarity
## Min. :0.2000 Length:53920 Length:53920 Length:53920
## 1st Qu.:0.4000 Class :character Class :character Class :character
## Median :0.7000 Mode :character Mode :character Mode :character
## Mean :0.7977
## 3rd Qu.:1.0400
## Max. :5.0100
## depth table price x
## Min. :43.00 Min. :43.00 Min. : 326 Min. : 3.730
## 1st Qu.:61.00 1st Qu.:56.00 1st Qu.: 949 1st Qu.: 4.710
## Median :61.80 Median :57.00 Median : 2401 Median : 5.700
## Mean :61.75 Mean :57.46 Mean : 3931 Mean : 5.732
## 3rd Qu.:62.50 3rd Qu.:59.00 3rd Qu.: 5323 3rd Qu.: 6.540
## Max. :79.00 Max. :95.00 Max. :18823 Max. :10.740
## y z ppc
## Min. : 3.680 Min. : 1.07 Min. : 1051
## 1st Qu.: 4.720 1st Qu.: 2.91 1st Qu.: 2478
## Median : 5.710 Median : 3.53 Median : 3495
## Mean : 5.735 Mean : 3.54 Mean : 4008
## 3rd Qu.: 6.540 3rd Qu.: 4.04 3rd Qu.: 4949
## Max. :58.900 Max. :31.80 Max. :17829
Use Dplyr to create a subset d1 of d with cut = “Ideal” and ppc greater than $5,000. Keep only the columns clarity, color, and ppc.
## clarity color ppc
## Length:4780 Length:4780 Min. : 5001
## Class :character Class :character 1st Qu.: 5753
## Mode :character Mode :character Median : 6758
## Mean : 7104
## 3rd Qu.: 8091
## Max. :17078
dcells <- d %>%
group_by(cut,color,clarity) %>%
summarize(count = n(),
mean_ppc = mean(ppc),
mean_carat = mean(carat),
median_ppc = median(ppc),
sd_ppc = sd(ppc),
IQR_ppc = IQR(ppc)) %>%
ungroup() %>% # Important
arrange(desc(mean_ppc))
## `summarise()` has grouped output by 'cut', 'color'. You can override using the
## `.groups` argument.