# A tibble: 5 × 4
state num5k num2mil numrows
<chr> <int> <int> <int>
1 IL 1 1 102
2 IN 0 0 92
3 MI 1 1 83
4 OH 0 0 88
5 WI 2 0 72
## Problem C
# part I
midwest %>%
  group_by(county) %>%
  summarize(x = n_distinct(state)) %>%
  arrange(desc(x)) %>%
  ungroup()
# A tibble: 320 × 2
county x
<chr> <int>
1 CRAWFORD 5
2 JACKSON 5
3 MONROE 5
4 ADAMS 4
5 BROWN 4
6 CLARK 4
7 CLINTON 4
8 JEFFERSON 4
9 LAKE 4
10 WASHINGTON 4
# ℹ 310 more rows
# part II
# How does n() differ from n_distinct()?
# When would they be the same? different?
midwest %>%
  group_by(county) %>%
  summarize(x = n()) %>%
  ungroup()
# A tibble: 320 × 2
county x
<chr> <int>
1 ADAMS 4
2 ALCONA 1
3 ALEXANDER 1
4 ALGER 1
5 ALLEGAN 1
6 ALLEN 2
7 ALPENA 1
8 ANTRIM 1
9 ARENAC 1
10 ASHLAND 2
# ℹ 310 more rows
# part III
# hint:
# - How many distinctly different counties are there for each county?
# - Can there be more than 1 (county) county in each county?
# - What if we replace 'county' with 'state'?
midwest %>%
  group_by(county) %>%
  summarize(x = n_distinct(county)) %>%
  ungroup()
# A tibble: 320 × 2
county x
<chr> <int>
1 ADAMS 1
2 ALCONA 1
3 ALEXANDER 1
4 ALGER 1
5 ALLEGAN 1
6 ALLEN 1
7 ALPENA 1
8 ANTRIM 1
9 ARENAC 1
10 ASHLAND 1
# ℹ 310 more rows
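The three parts of Problem C can be compared side by side on a small table. The sketch below uses a hypothetical toy tibble (`toy`, with made-up rows, not the real `midwest` data) in which one county name appears in two states and another appears twice in the same state, so that `n()` and `n_distinct()` visibly disagree.

```r
library(dplyr)

# Hypothetical toy data: ADAMS exists in two states; ALLEN is listed
# twice in the same state purely for illustration.
toy <- tibble(
  county = c("ADAMS", "ADAMS", "ALLEN", "ALLEN", "BROWN"),
  state  = c("IL",    "IN",    "OH",    "OH",    "WI")
)

counts <- toy %>%
  group_by(county) %>%
  summarize(
    n_rows     = n(),                # number of rows in the group
    n_states   = n_distinct(state),  # number of unique state values in the group
    n_counties = n_distinct(county)  # grouping variable is constant within a group
  ) %>%
  ungroup()

print(counts)
```

In this sketch `n_rows` and `n_states` agree for ADAMS and BROWN but differ for ALLEN, because the two ALLEN rows share one state. `n_counties` is 1 for every group, which matches the column of 1s in part III.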
## Problem E
# part I
diamonds %>%
  group_by(color, cut) %>%
  summarize(m = mean(price),
            s = sd(price)) %>%
  ungroup()
`summarise()` has grouped output by 'color'. You can override using the
`.groups` argument.
# A tibble: 35 × 4
color cut m s
<ord> <ord> <dbl> <dbl>
1 D Fair 4291. 3286.
2 D Good 3405. 3175.
3 D Very Good 3470. 3524.
4 D Premium 3631. 3712.
5 D Ideal 2629. 3001.
6 E Fair 3682. 2977.
7 E Good 3424. 3331.
8 E Very Good 3215. 3408.
9 E Premium 3539. 3795.
10 E Ideal 2598. 2956.
# ℹ 25 more rows
# part II
diamonds %>%
  group_by(cut, color) %>%
  summarize(m = mean(price),
            s = sd(price)) %>%
  ungroup()
`summarise()` has grouped output by 'cut'. You can override using the `.groups`
argument.
# A tibble: 35 × 4
cut color m s
<ord> <ord> <dbl> <dbl>
1 Fair D 4291. 3286.
2 Fair E 3682. 2977.
3 Fair F 3827. 3223.
4 Fair G 4239. 3610.
5 Fair H 5136. 3886.
6 Fair I 4685. 3730.
7 Fair J 4976. 4050.
8 Good D 3405. 3175.
9 Good E 3424. 3331.
10 Good F 3496. 3202.
# ℹ 25 more rows
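The message printed in parts I and II says that `summarize()` leaves the result grouped by the first variable and that this can be overridden with `.groups`. A minimal sketch of that override, assuming dplyr 1.0.0 or later and that `diamonds` comes from ggplot2:

```r
library(dplyr)
library(ggplot2)  # assumed source of the diamonds dataset

# .groups = "drop" returns an ungrouped tibble and silences the message,
# so a trailing ungroup() is no longer needed.
by_color_cut <- diamonds %>%
  group_by(color, cut) %>%
  summarize(m = mean(price), s = sd(price), .groups = "drop")

by_cut_color <- diamonds %>%
  group_by(cut, color) %>%
  summarize(m = mean(price), s = sd(price), .groups = "drop")

group_vars(by_color_cut)  # no grouping variables remain
nrow(by_cut_color)        # same number of color/cut combinations either way
```

Swapping the order inside `group_by()` changes the column order and row order of the result, not the set of color/cut combinations being summarized.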
# part III
# hint:
# - How good is the sale if the price of diamonds equaled msale?
# - e.g. The diamonds are x% off original price in msale.
diamonds %>%
  group_by(cut, color, clarity) %>%
  summarize(m = mean(price),
            s = sd(price),
            msale = m * 0.80) %>%
  ungroup()
`summarise()` has grouped output by 'cut', 'color'. You can override using the
`.groups` argument.
# A tibble: 276 × 6
cut color clarity m s msale
<ord> <ord> <ord> <dbl> <dbl> <dbl>
1 Fair D I1 7383 5899. 5906.
2 Fair D SI2 4355. 3260. 3484.
3 Fair D SI1 4273. 3019. 3419.
4 Fair D VS2 4513. 3383. 3610.
5 Fair D VS1 2921. 2550. 2337.
6 Fair D VVS2 3607 3629. 2886.
7 Fair D VVS1 4473 5457. 3578.
8 Fair D IF 1620. 525. 1296.
9 Fair E I1 2095. 824. 1676.
10 Fair E SI2 4172. 3055. 3338.
# ℹ 266 more rows
## Problem G
# part I
diamonds %>%
  group_by(color) %>%
  summarize(m = mean(price)) %>%
  mutate(x1 = str_c("Diamond color ", color),
         x2 = 5) %>%
  ungroup()
# A tibble: 7 × 4
color m x1 x2
<ord> <dbl> <chr> <dbl>
1 D 3170. Diamond color D 5
2 E 3077. Diamond color E 5
3 F 3725. Diamond color F 5
4 G 3999. Diamond color G 5
5 H 4487. Diamond color H 5
6 I 5092. Diamond color I 5
7 J 5324. Diamond color J 5
# part II
# What does the first ungroup() do? Is it useful here? Why/why not?
# Why isn't there a closing ungroup() after the mutate()?
diamonds %>%
  group_by(color) %>%
  summarize(m = mean(price)) %>%
  ungroup() %>%
  mutate(x1 = str_c("Diamond color ", color),
         x2 = 5)
# A tibble: 7 × 4
color m x1 x2
<ord> <dbl> <chr> <dbl>
1 D 3170. Diamond color D 5
2 E 3077. Diamond color E 5
3 F 3725. Diamond color F 5
4 G 3999. Diamond color G 5
5 H 4487. Diamond color H 5
6 I 5092. Diamond color I 5
7 J 5324. Diamond color J 5
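One way to reason about the questions in part II is to inspect the grouping state directly. The sketch below uses a hypothetical toy tibble (not the real `diamonds` data) and dplyr's `is_grouped_df()` and `group_vars()` to show what `summarize()` leaves behind with one versus two grouping variables.

```r
library(dplyr)

# Hypothetical toy data, for illustration only.
toy <- tibble(
  color = c("D", "D", "E", "E"),
  cut   = c("Fair", "Good", "Fair", "Fair"),
  price = c(100, 300, 200, 400)
)

# One grouping variable: summarize() drops it, so the result is ungrouped.
one_level <- toy %>%
  group_by(color) %>%
  summarize(m = mean(price))

# Two grouping variables: only the last one (cut) is dropped.
# "drop_last" is the default in this situation; it is spelled out here
# only to silence the informational message.
two_level <- toy %>%
  group_by(color, cut) %>%
  summarize(m = mean(price), .groups = "drop_last")

is_grouped_df(one_level)  # FALSE
group_vars(two_level)     # "color"
```

Under this reading, the result of `summarize()` in Problem G is already ungrouped, so both pipelines run `mutate()` on the same ungrouped tibble and print identical output.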
## Problem H
# part I
diamonds %>%
  group_by(color) %>%
  mutate(x1 = price * 0.5) %>%
  summarize(m = mean(x1)) %>%
  ungroup()
# A tibble: 7 × 2
color m
<ord> <dbl>
1 D 1585.
2 E 1538.
3 F 1862.
4 G 2000.
5 H 2243.
6 I 2546.
7 J 2662.
# part II
# What's the difference between part I and II?
diamonds %>%
  group_by(color) %>%
  mutate(x1 = price * 0.5) %>%
  ungroup() %>%
  summarize(m = mean(x1))
# A tibble: 1 × 1
m
<dbl>
1 1966.
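The contrast between the 7-row and 1-row results comes down to whether `summarize()` still sees the groups. A minimal sketch on a hypothetical three-row tibble (made-up prices, not the real `diamonds` data):

```r
library(dplyr)

# Hypothetical toy data, for illustration only.
toy <- tibble(
  color = c("D", "D", "E"),
  price = c(100, 300, 400)
)

# Groups are still in place when summarize() runs: one row per color.
per_group <- toy %>%
  group_by(color) %>%
  mutate(x1 = price * 0.5) %>%
  summarize(m = mean(x1)) %>%
  ungroup()

# Groups are removed first: summarize() collapses all rows into one.
overall <- toy %>%
  group_by(color) %>%
  mutate(x1 = price * 0.5) %>%
  ungroup() %>%
  summarize(m = mean(x1))

print(per_group)
print(overall)
```

Here `per_group` has one mean per color, while `overall` has a single mean over every row. The `mutate()` step gives the same `x1` in both pipelines because `price * 0.5` does not depend on the group.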
## Problem H
# part I
diamonds %>%
  group_by(color) %>%
  mutate(x1 = price * 0.5) %>%
  summarize(m = mean(x1))
# A tibble: 7 × 2
color m
<ord> <dbl>
1 D 1585.
2 E 1538.
3 F 1862.
4 G 2000.
5 H 2243.
6 I 2546.
7 J 2662.
Grouping data is essential because it organizes raw information into a structured format, making it easier to understand, analyze, and derive insights. Here are some key reasons why grouping data is necessary:
Grouping allows you to condense large volumes of data, making it easier to spot patterns, trends, and outliers.
Grouped data makes it straightforward to compare different categories, time periods, or locations.
Summarizing data into groups can make statistical analyses more efficient and manageable.
Ungrouping data, that is, removing the grouping structure so that later operations apply to all rows at once rather than within each group, can be equally important in data analysis.
Ungrouping allows for a deeper, more granular analysis of individual data points.
Individual data points can reveal outliers or unusual patterns that may be averaged out or overlooked in grouped data.
When should you ungroup data?
If the analysis requires a close look at individual data points, such as examining specific behaviors, events, or trends, ungrouping can reveal finer details that grouped data may obscure.
If you need to detect unusual patterns, anomalies, or variations within your data, ungrouping can help identify these outliers, which could otherwise be hidden within averages or summary statistics.
When the current grouping does not serve your analysis goals, ungrouping allows you to reorganize the data.
If the code does not contain group_by(), do you still need ungroup() at the end?
No. If your code does not contain group_by(), you don't need ungroup() at the end. The ungroup() function only removes the grouping structure that group_by() adds to a dataset. If no grouping was applied, there is no grouping structure to remove, so ungroup() is unnecessary.
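A short check of this answer, using a hypothetical toy tibble (names and values are made up): a pipeline with no `group_by()` produces an ungrouped result, and calling `ungroup()` on it returns the same object.

```r
library(dplyr)

# Hypothetical toy data, for illustration only.
toy <- tibble(
  state = c("IL", "IL", "WI"),
  pop   = c(10, 20, 30)
)

# No group_by() anywhere in this pipeline.
result <- toy %>%
  filter(pop > 10) %>%
  mutate(pop2 = pop * 2)

is_grouped_df(result)               # FALSE: nothing to remove
identical(result, ungroup(result))  # TRUE: ungroup() changes nothing here
```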