quarto_3

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Week 3

Problem A

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

midwest %>%
  group_by(state) %>%
  summarize(poptotalmean = mean(poptotal), # mean county population in each state
            poptotalmed = median(poptotal), # median county population in each state
            popmax = max(poptotal), # largest county population in each state
            popmin = min(poptotal), # smallest county population in each state
            popdistinct = n_distinct(poptotal), # number of unique poptotal values in each state
            popfirst = first(poptotal), # poptotal of the first row in each state
            popany = any(poptotal < 5000), # TRUE if any county has a poptotal below 5000
            popany2 = any(poptotal > 2000000)) %>% # TRUE if any county has a poptotal over 2000000
  ungroup() # removes the grouping created by group_by(), returning the data to an ungrouped state

# A tibble: 5 × 9
  state poptotalmean poptotalmed  popmax popmin popdistinct popfirst popany
  <chr>        <dbl>       <dbl>   <int>  <int>       <int>    <int> <lgl> 
1 IL         112065.      24486. 5105067   4373         101    66090 TRUE  
2 IN          60263.      30362.  797159   5315          92    31095 FALSE 
3 MI         111992.      37308  2111687   1701          83    10145 TRUE  
4 OH         123263.      54930. 1412140  11098          88    25371 FALSE 
5 WI          67941.      33528   959275   3890          72    15682 TRUE  
# ℹ 1 more variable: popany2 <lgl>

Problem B

midwest %>% 
  group_by(state) %>% # splits the data into one group per unique value of 'state'
  summarize(num5k = sum(poptotal < 5000), # counts the rows with poptotal below 5000 and stores it in a new column 'num5k'
            num2mil = sum(poptotal > 2000000), # counts the rows with poptotal over 2000000 and stores it in a new column 'num2mil'
            numrows = n()) %>% # counts the number of rows within each group
  ungroup() # removes the grouping created by group_by(), returning the data to an ungrouped state

# A tibble: 5 × 4
  state num5k num2mil numrows
  <chr> <int>   <int>   <int>
1 IL        1       1     102
2 IN        0       0      92
3 MI        1       1      83
4 OH        0       0      88
5 WI        2       0      72

Problem C Part 1

midwest %>% 
  group_by(county) %>%
  summarize(x = n_distinct(state)) %>% # counts how many different states each county name appears in
  arrange(desc(x)) %>% # arranges the rows from largest to smallest x
  ungroup() # removes the grouping created by group_by(), returning the data to an ungrouped state

# A tibble: 320 × 2
   county         x
   <chr>      <int>
 1 CRAWFORD       5
 2 JACKSON        5
 3 MONROE         5
 4 ADAMS          4
 5 BROWN          4
 6 CLARK          4
 7 CLINTON        4
 8 JEFFERSON      4
 9 LAKE           4
10 WASHINGTON     4
# ℹ 310 more rows

Part 2

midwest %>% 
  group_by(county) %>% 
  summarize(x = n())%>% # counts the number of rows within the group 'county'
  ungroup() # removes the group created by group_by which returns it to the ungrouped state

# A tibble: 320 × 2
   county        x
   <chr>     <int>
 1 ADAMS         4
 2 ALCONA        1
 3 ALEXANDER     1
 4 ALGER         1
 5 ALLEGAN       1
 6 ALLEN         2
 7 ALPENA        1
 8 ANTRIM        1
 9 ARENAC        1
10 ASHLAND       2
# ℹ 310 more rows

The difference between ‘n()’ and ‘n_distinct()’ is that ‘n_distinct()’ counts the unique values of a given column within a group, whereas ‘n()’ counts the total number of rows in the group. The two give the same result when every row in the group has a different value in that column, which is why ADAMS shows 4 in both Part 1 and Part 2.
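To make the difference concrete, here is a minimal sketch on a small made-up table (the toy values below are invented for illustration and are not taken from midwest). It shows a group where the two counts differ because one state is repeated.

```r
library(dplyr)

# Hypothetical toy data: ADAMS appears three times, twice in the same state
toy <- tibble(
  county = c("ADAMS", "ADAMS", "ADAMS", "LAKE", "LAKE"),
  state  = c("IL",    "IN",    "IN",    "IL",   "OH")
)

counts <- toy %>%
  group_by(county) %>%
  summarize(rows   = n(),               # every row in the group
            states = n_distinct(state)) # only the unique states in the group

counts
# ADAMS: rows = 3, states = 2 (IN is repeated)
# LAKE:  rows = 2, states = 2 (no repeats, so the counts agree)
```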

Part 3

midwest %>% 
  group_by(county)%>%
  summarize(x = n_distinct(county)) %>% # counts the number of unique 'county' values within each county group
  ungroup()

# A tibble: 320 × 2
   county        x
   <chr>     <int>
 1 ADAMS         1
 2 ALCONA        1
 3 ALEXANDER     1
 4 ALGER         1
 5 ALLEGAN       1
 6 ALLEN         1
 7 ALPENA        1
 8 ANTRIM        1
 9 ARENAC        1
10 ASHLAND       1
# ℹ 310 more rows

As shown above, every group contains exactly 1 distinct county, because the data are grouped by county itself. If the data were grouped by state instead, ‘n_distinct(county)’ would return the number of different counties in each state, which would be more than 1.
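As a rough check of this point, the sketch below groups by state rather than county, assuming the midwest dataset from ggplot2 is available. The column names ‘counties’ and ‘rows’ are illustrative choices, not part of the original exercise.

```r
library(dplyr)
library(ggplot2) # provides the midwest dataset

by_state <- midwest %>%
  group_by(state) %>%
  summarize(counties = n_distinct(county), # distinct county names within each state
            rows     = n()) %>%            # total rows within each state
  ungroup()

by_state
```

Printing both columns side by side makes it easy to see whether any county name is repeated within a state.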

Problem D

diamonds %>%
  group_by(clarity)%>%
  summarize(a = n_distinct(color),
            b = n_distinct(price),
            c = n()) %>% # summary of the data by counting unique values of colour and price and the total number of rows.
  ungroup()

# A tibble: 8 × 4
  clarity     a     b     c
  <ord>   <int> <int> <int>
1 I1          7   632   741
2 SI2         7  4904  9194
3 SI1         7  5380 13065
4 VS2         7  5051 12258
5 VS1         7  3926  8171
6 VVS2        7  2409  5066
7 VVS1        7  1623  3655
8 IF          7   902  1790

Problem E Part 1

diamonds %>%
  group_by(color, cut) %>%
  summarize(m = mean(price),
            s = sd(price))%>% # calculates the mean of price and standard deviation of price.
  ungroup()

`summarise()` has grouped output by 'color'. You can override using the
`.groups` argument.

# A tibble: 35 × 4
   color cut           m     s
   <ord> <ord>     <dbl> <dbl>
 1 D     Fair      4291. 3286.
 2 D     Good      3405. 3175.
 3 D     Very Good 3470. 3524.
 4 D     Premium   3631. 3712.
 5 D     Ideal     2629. 3001.
 6 E     Fair      3682. 2977.
 7 E     Good      3424. 3331.
 8 E     Very Good 3215. 3408.
 9 E     Premium   3539. 3795.
10 E     Ideal     2598. 2956.
# ℹ 25 more rows

Part 2

diamonds %>%
  group_by(cut, color) %>%
  summarize(m = mean(price),
            s = sd(price)) %>%
  ungroup()

`summarise()` has grouped output by 'cut'. You can override using the `.groups`
argument.

# A tibble: 35 × 4
   cut   color     m     s
   <ord> <ord> <dbl> <dbl>
 1 Fair  D     4291. 3286.
 2 Fair  E     3682. 2977.
 3 Fair  F     3827. 3223.
 4 Fair  G     4239. 3610.
 5 Fair  H     5136. 3886.
 6 Fair  I     4685. 3730.
 7 Fair  J     4976. 4050.
 8 Good  D     3405. 3175.
 9 Good  E     3424. 3331.
10 Good  F     3496. 3202.
# ℹ 25 more rows

Part 3

diamonds %>% 
  group_by(cut, color, clarity) %>% 
  summarize(m = mean(price),
            s = sd(price),
            msale = m * 0.80) %>% # calculates 'msale' by multiplying m by 0.8
  ungroup()

`summarise()` has grouped output by 'cut', 'color'. You can override using the
`.groups` argument.

# A tibble: 276 × 6
   cut   color clarity     m     s msale
   <ord> <ord> <ord>   <dbl> <dbl> <dbl>
 1 Fair  D     I1      7383  5899. 5906.
 2 Fair  D     SI2     4355. 3260. 3484.
 3 Fair  D     SI1     4273. 3019. 3419.
 4 Fair  D     VS2     4513. 3383. 3610.
 5 Fair  D     VS1     2921. 2550. 2337.
 6 Fair  D     VVS2    3607  3629. 2886.
 7 Fair  D     VVS1    4473  5457. 3578.
 8 Fair  D     IF      1620.  525. 1296.
 9 Fair  E     I1      2095.  824. 1676.
10 Fair  E     SI2     4172. 3055. 3338.
# ℹ 266 more rows

‘msale’ is 80% of the mean price within each cut, color and clarity group, so a diamond priced at ‘msale’ is selling at a 20% discount on its group's mean price.

Problem F

diamonds %>% 
  group_by(cut) %>%
  summarize(potato = mean(depth),
            pizza = mean(price),
            popcorn = median(y),
            pineapple = potato - pizza, # calculates the difference between potato (mean depth) and pizza (mean price) and assigns it to 'pineapple'
            papya = pineapple ^ 2, # squares the value of pineapple and assigns it to 'papya'
            peach = n()) %>% # counts the number of rows in each cut group and assigns it to 'peach'
  ungroup()

# A tibble: 5 × 7
  cut       potato pizza popcorn pineapple     papya peach
  <ord>      <dbl> <dbl>   <dbl>     <dbl>     <dbl> <int>
1 Fair        64.0 4359.    6.1     -4295. 18444586.  1610
2 Good        62.4 3929.    5.99    -3866. 14949811.  4906
3 Very Good   61.8 3982.    5.77    -3920. 15365942. 12082
4 Premium     61.3 4584.    6.06    -4523. 20457466. 13791
5 Ideal       61.7 3458.    5.26    -3396. 11531679. 21551

Problem G Part 1

diamonds %>%
  group_by(color) %>%
  summarize(m = mean(price)) %>% # calculates the mean price of diamonds for each color group and puts it in a new column 'm'
  mutate(x1 = str_c("diamond color", color), # new column 'x1' that joins the string "diamond color" to each color, giving a label per color
         x2 = 5) %>% # new column 'x2' holding the constant value 5 in every row
  ungroup()

# A tibble: 7 × 4
  color     m x1                x2
  <ord> <dbl> <chr>          <dbl>
1 D     3170. diamond colorD     5
2 E     3077. diamond colorE     5
3 F     3725. diamond colorF     5
4 G     3999. diamond colorG     5
5 H     4487. diamond colorH     5
6 I     5092. diamond colorI     5
7 J     5324. diamond colorJ     5

Part 2

diamonds %>%
  group_by(color) %>%
  summarize(m = mean(price)) %>%
  ungroup() %>%
  mutate(x1 = str_c("diamond color", color),
         x2 = 5) # adds two new columns: 'x1' uses str_c() to join "diamond color" to the corresponding color value, and 'x2' is the constant value 5

# A tibble: 7 × 4
  color     m x1                x2
  <ord> <dbl> <chr>          <dbl>
1 D     3170. diamond colorD     5
2 E     3077. diamond colorE     5
3 F     3725. diamond colorF     5
4 G     3999. diamond colorG     5
5 H     4487. diamond colorH     5
6 I     5092. diamond colorI     5
7 J     5324. diamond colorJ     5

The function ‘ungroup()’ removes the grouping structure, so that later operations act on the entire data frame. If ‘ungroup()’ is not used, any subsequent operation still respects whatever grouping remains.
In Part 1 ‘ungroup()’ comes after ‘mutate()’, and in Part 2 it comes before. The results are identical here because ‘summarize()’ has already dropped the only grouping variable (color), and neither new column depends on the groups.
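One way to see what grouping is left at each step is dplyr's group_vars(), which lists the current grouping variables. This is only a sketch for checking the explanation above; the object names ‘after_mutate’ and ‘after_summarize’ are made up for the example.

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

# mutate() keeps the grouping it was given
after_mutate <- diamonds %>%
  group_by(color) %>%
  mutate(x1 = price * 0.5)

# summarize() drops the last grouping variable, here the only one
after_summarize <- diamonds %>%
  group_by(color) %>%
  summarize(m = mean(price))

group_vars(after_mutate)           # still grouped by color
group_vars(ungroup(after_mutate))  # no grouping variables left
group_vars(after_summarize)        # no grouping variables left
```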

Problem H Part 1

diamonds %>%
  group_by(color) %>%
  mutate(x1 = price * 0.5) %>% # creates a new column 'x1' holding half of each price; the data stay grouped by color
  summarize(m = mean(x1)) %>% # calculates the mean of x1 within each color group and puts it in the new column 'm'
  ungroup()

# A tibble: 7 × 2
  color     m
  <ord> <dbl>
1 D     1585.
2 E     1538.
3 F     1862.
4 G     2000.
5 H     2243.
6 I     2546.
7 J     2662.

Part 2

diamonds %>% 
  group_by(color) %>% 
  mutate(x1 = price * 0.5) %>%
  ungroup() %>% # removes the group created by group_by which returns it to the ungrouped state
  summarize(m = mean(x1)) # calculates the mean of x1 and puts it into new column 'm'

# A tibble: 1 × 1
      m
  <dbl>
1 1966.

The difference between Part 1 and Part 2 is that Part 1 gives the mean of the halved prices separately for each color group, whereas Part 2 gives a single mean of the halved prices across all diamonds, regardless of color.

Added Notes

Grouping data is important for several reasons:
1. It lets you target a specific subset of the data, e.g. one color.
2. It prepares the data for statistical analysis: results summarized per group in a table are easier to make sense of and to turn into charts.
3. It makes per-group calculations such as the mean, median or standard deviation ‘sd’ simple to carry out.
Ungrouping data is also important for several reasons:
1. It lets later operations, such as extra calculations, apply to the entire dataset and not just within the groups.
2. It ensures that the following steps work on the intended data structure, which matters after a series of calculations or transformations.
3. It allows whole-table calculations. For example, to calculate a single mean across the color means, the following code could be used:

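diamonds %>%
  group_by(color) %>%
  summarize(mean_price = mean(price)) %>% # mean price within each color group
  ungroup() %>%
  mutate(overall_mean = mean(mean_price)) # unweighted mean of the seven color means (not the same as mean(diamonds$price))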

If group_by() has not been used then ungroup() is not needed, because ungroup() only removes grouping from a dataset that was previously grouped with group_by().
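The snippet above averages the seven color means, which is not quite the same as the mean price of all diamonds, because the color groups have different sizes. A short sketch of the difference, assuming the diamonds dataset from ggplot2 (the variable names are illustrative only):

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

color_means <- diamonds %>%
  group_by(color) %>%
  summarize(mean_price = mean(price), # mean price within each color
            n = n())                  # number of diamonds of each color

mean_of_means <- mean(color_means$mean_price)  # every color counts equally
overall_mean  <- mean(diamonds$price)          # every diamond counts equally
weighted_mean <- weighted.mean(color_means$mean_price, color_means$n)

c(mean_of_means = mean_of_means,
  overall_mean  = overall_mean,
  weighted_mean = weighted_mean)
```

Weighting each color mean by its group size recovers the overall mean, while the plain mean of means does not.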

Good and bad Question

Good question - In the diamonds dataset, does carat weight affect the price, and is there a correlation?

Bad question - Using the diamonds dataset, how many diamonds can you identify?

Week 4

library(tidyverse)
library(modeldata)
?ggplot
?crickets
view(crickets)

Planning a data visualization

The Basics

ggplot(crickets, aes(x = temp,
                     y = rate)) + # maps temp to the x-axis and rate to the y-axis
  geom_point() + # adds points to the plot, representing individual observations
  labs(x = "Temperature",
       y = "Chirp rate",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)")

ggplot(crickets, aes(x = temp,
                     y = rate,
                     color = species)) + # a ggplot object using the crickets dataset, mapping temp to the x-axis, rate to the y-axis and species to color
  geom_point() + # adds points to the plot, representing individual observations
  labs(x = "Temperature",
       y = "Chirp rate",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)") +
  scale_color_brewer(palette = "Dark2") # uses the "Dark2" palette for the species colors

Modifying basic properties of the plot

ggplot(crickets, aes(x = temp,
                     y = rate,
                     color = species)) + # a ggplot object using the crickets dataset, mapping temp to the x-axis, rate to the y-axis and species to color
  geom_point(color = "red", # a fixed color set here overrides the species color mapping for the points
             size = 2,
             alpha = .3,
             shape = "square") + # adds points to the plot, representing individual observations
  labs(x = "Temperature",
       y = "Chirp rate",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)")

Learn more about the options for the geom_point with ?geom_point

Adding another layer

ggplot(crickets, aes(x = temp,
                     y = rate)) +
  geom_point() +
  geom_smooth(method = "lm", # adds a single linear regression line for all the data
              se = FALSE) + # hides the confidence band around the line
  labs(x = "Temperature",
       y = "Chirp rate",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)")

`geom_smooth()` using formula = 'y ~ x'

ggplot(crickets, aes(x = temp,
                     y = rate,
                     color = species)) +
  geom_point() +
  geom_smooth(method = "lm", # adds one linear regression line per species
              se = FALSE) +
  labs(x = "Temperature",
       y = "Chirp rate",
       color = "Species",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)") +
  scale_color_brewer(palette = "Dark2")

`geom_smooth()` using formula = 'y ~ x'

Other plots

ggplot(crickets, aes(x = rate)) + 
  geom_histogram(bins = 15) # one quantitative variable

ggplot(crickets, aes(x = rate)) +
  geom_freqpoly(bins = 15)

ggplot(crickets, aes(x = species)) +
  geom_bar(color = "black",
           fill = "lightblue")

ggplot(crickets, aes(x = species,
                     fill = species)) + 
  geom_bar(show.legend = FALSE) +
  scale_fill_brewer(palette = "Dark2")

ggplot(crickets, aes(x = species,
                     y = rate,
                     color = species)) +
  geom_boxplot(show.legend = FALSE) +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal()

Faceting

# not great:
ggplot(crickets, aes(x = rate,
                     fill = species)) +
  geom_histogram(bins = 15) +
  scale_fill_brewer(palette = "Dark2")

ggplot(crickets, aes(x = rate,
                     fill = species)) +
  geom_histogram(bins = 15,
                 show.legend = FALSE) +
  facet_wrap(~species) +
  scale_fill_brewer(palette = "Dark2")

ggplot(crickets, aes(x = rate,
                     fill = species)) +
  geom_histogram(bins = 15,
                 show.legend = FALSE) +
  facet_wrap(~species,
             ncol = 1) +
  scale_fill_brewer(palette = "Dark2")

What is a good research hypothesis

A good research hypothesis must be clear, and it must be testable, for example through data analysis. Before writing it, decide which specific relationship between variables is of interest and state exactly what is being measured. For example, based on the cricket data above, a good hypothesis could be: “Temperature influences the rate of chirping in crickets, and this influence differs depending on the species.”
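One possible way to test a hypothesis like this, offered only as a sketch and not as the method used in these notes, is a linear model with an interaction between temperature and species. It assumes the crickets data from modeldata loaded earlier; the object name ‘fit’ is illustrative.

```r
library(modeldata) # provides the crickets dataset

# rate ~ temp * species expands to temp + species + temp:species,
# so the model allows a different temperature slope for each species
fit <- lm(rate ~ temp * species, data = crickets)

# Coefficient table: the temp row relates to the first part of the
# hypothesis, the temp:species row to whether the slope differs by species
coef(summary(fit))
```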

Week 5

Box plot

# Load the ggplot2 package
library(ggplot2)

# Create the box plot
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + # this maps Species to the x-axis and Sepal.Length to the y-axis. fill = Species colors each box by species.
  geom_boxplot() + # this adds the box plot layer 
  theme_minimal() + # optional, but it adds a clean theme
  labs(x = "Species", y = "Sepal Length") + # labels the x and y axis 
  scale_fill_manual(values = c("red", "green", "blue")) # sets colors for each species

Density plot

library(ggplot2)

ggplot(iris, aes(x = Petal.Length, color = Species)) + # this maps petal.length to x-axis and assigns different colors to each species
  geom_density() # creates density plot which shows the distribution of petal length for each species.

Scatter plot with line of regression

library(ggplot2)

ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) + # maps Petal.Length to the x-axis and Petal.Width to the y-axis.
  geom_point(mapping = aes(color = Species, shape = Species)) + # this adds the points to the plot, with the points' colors and shapes representing different species
  geom_smooth(method = "lm") # this adds a linear regression line (lm standing for linear model) to show the trend of petal width against petal length across all the species.

`geom_smooth()` using formula = 'y ~ x'

Creating a new variable

library(dplyr) # provides %>% and mutate()

iris %>%
  mutate(size=ifelse(Sepal.Length < median(Sepal.Length),
                     "small", "big"))

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  size
1            5.1         3.5          1.4         0.2     setosa small
2            4.9         3.0          1.4         0.2     setosa small
3            4.7         3.2          1.3         0.2     setosa small
4            4.6         3.1          1.5         0.2     setosa small
5            5.0         3.6          1.4         0.2     setosa small
6            5.4         3.9          1.7         0.4     setosa small
7            4.6         3.4          1.4         0.3     setosa small
8            5.0         3.4          1.5         0.2     setosa small
9            4.4         2.9          1.4         0.2     setosa small
10           4.9         3.1          1.5         0.1     setosa small
11           5.4         3.7          1.5         0.2     setosa small
12           4.8         3.4          1.6         0.2     setosa small
13           4.8         3.0          1.4         0.1     setosa small
14           4.3         3.0          1.1         0.1     setosa small
15           5.8         4.0          1.2         0.2     setosa   big
16           5.7         4.4          1.5         0.4     setosa small
17           5.4         3.9          1.3         0.4     setosa small
18           5.1         3.5          1.4         0.3     setosa small
19           5.7         3.8          1.7         0.3     setosa small
20           5.1         3.8          1.5         0.3     setosa small
21           5.4         3.4          1.7         0.2     setosa small
22           5.1         3.7          1.5         0.4     setosa small
23           4.6         3.6          1.0         0.2     setosa small
24           5.1         3.3          1.7         0.5     setosa small
25           4.8         3.4          1.9         0.2     setosa small
26           5.0         3.0          1.6         0.2     setosa small
27           5.0         3.4          1.6         0.4     setosa small
28           5.2         3.5          1.5         0.2     setosa small
29           5.2         3.4          1.4         0.2     setosa small
30           4.7         3.2          1.6         0.2     setosa small
31           4.8         3.1          1.6         0.2     setosa small
32           5.4         3.4          1.5         0.4     setosa small
33           5.2         4.1          1.5         0.1     setosa small
34           5.5         4.2          1.4         0.2     setosa small
35           4.9         3.1          1.5         0.2     setosa small
36           5.0         3.2          1.2         0.2     setosa small
37           5.5         3.5          1.3         0.2     setosa small
38           4.9         3.6          1.4         0.1     setosa small
39           4.4         3.0          1.3         0.2     setosa small
40           5.1         3.4          1.5         0.2     setosa small
41           5.0         3.5          1.3         0.3     setosa small
42           4.5         2.3          1.3         0.3     setosa small
43           4.4         3.2          1.3         0.2     setosa small
44           5.0         3.5          1.6         0.6     setosa small
45           5.1         3.8          1.9         0.4     setosa small
46           4.8         3.0          1.4         0.3     setosa small
47           5.1         3.8          1.6         0.2     setosa small
48           4.6         3.2          1.4         0.2     setosa small
49           5.3         3.7          1.5         0.2     setosa small
50           5.0         3.3          1.4         0.2     setosa small
51           7.0         3.2          4.7         1.4 versicolor   big
52           6.4         3.2          4.5         1.5 versicolor   big
53           6.9         3.1          4.9         1.5 versicolor   big
54           5.5         2.3          4.0         1.3 versicolor small
55           6.5         2.8          4.6         1.5 versicolor   big
56           5.7         2.8          4.5         1.3 versicolor small
57           6.3         3.3          4.7         1.6 versicolor   big
58           4.9         2.4          3.3         1.0 versicolor small
59           6.6         2.9          4.6         1.3 versicolor   big
60           5.2         2.7          3.9         1.4 versicolor small
61           5.0         2.0          3.5         1.0 versicolor small
62           5.9         3.0          4.2         1.5 versicolor   big
63           6.0         2.2          4.0         1.0 versicolor   big
64           6.1         2.9          4.7         1.4 versicolor   big
65           5.6         2.9          3.6         1.3 versicolor small
66           6.7         3.1          4.4         1.4 versicolor   big
67           5.6         3.0          4.5         1.5 versicolor small
68           5.8         2.7          4.1         1.0 versicolor   big
69           6.2         2.2          4.5         1.5 versicolor   big
70           5.6         2.5          3.9         1.1 versicolor small
71           5.9         3.2          4.8         1.8 versicolor   big
72           6.1         2.8          4.0         1.3 versicolor   big
73           6.3         2.5          4.9         1.5 versicolor   big
74           6.1         2.8          4.7         1.2 versicolor   big
75           6.4         2.9          4.3         1.3 versicolor   big
76           6.6         3.0          4.4         1.4 versicolor   big
77           6.8         2.8          4.8         1.4 versicolor   big
78           6.7         3.0          5.0         1.7 versicolor   big
79           6.0         2.9          4.5         1.5 versicolor   big
80           5.7         2.6          3.5         1.0 versicolor small
81           5.5         2.4          3.8         1.1 versicolor small
82           5.5         2.4          3.7         1.0 versicolor small
83           5.8         2.7          3.9         1.2 versicolor   big
84           6.0         2.7          5.1         1.6 versicolor   big
85           5.4         3.0          4.5         1.5 versicolor small
86           6.0         3.4          4.5         1.6 versicolor   big
87           6.7         3.1          4.7         1.5 versicolor   big
88           6.3         2.3          4.4         1.3 versicolor   big
89           5.6         3.0          4.1         1.3 versicolor small
90           5.5         2.5          4.0         1.3 versicolor small
91           5.5         2.6          4.4         1.2 versicolor small
92           6.1         3.0          4.6         1.4 versicolor   big
93           5.8         2.6          4.0         1.2 versicolor   big
94           5.0         2.3          3.3         1.0 versicolor small
95           5.6         2.7          4.2         1.3 versicolor small
96           5.7         3.0          4.2         1.2 versicolor small
97           5.7         2.9          4.2         1.3 versicolor small
98           6.2         2.9          4.3         1.3 versicolor   big
99           5.1         2.5          3.0         1.1 versicolor small
100          5.7         2.8          4.1         1.3 versicolor small
101          6.3         3.3          6.0         2.5  virginica   big
102          5.8         2.7          5.1         1.9  virginica   big
103          7.1         3.0          5.9         2.1  virginica   big
104          6.3         2.9          5.6         1.8  virginica   big
105          6.5         3.0          5.8         2.2  virginica   big
106          7.6         3.0          6.6         2.1  virginica   big
107          4.9         2.5          4.5         1.7  virginica small
108          7.3         2.9          6.3         1.8  virginica   big
109          6.7         2.5          5.8         1.8  virginica   big
110          7.2         3.6          6.1         2.5  virginica   big
111          6.5         3.2          5.1         2.0  virginica   big
112          6.4         2.7          5.3         1.9  virginica   big
113          6.8         3.0          5.5         2.1  virginica   big
114          5.7         2.5          5.0         2.0  virginica small
115          5.8         2.8          5.1         2.4  virginica   big
116          6.4         3.2          5.3         2.3  virginica   big
117          6.5         3.0          5.5         1.8  virginica   big
118          7.7         3.8          6.7         2.2  virginica   big
119          7.7         2.6          6.9         2.3  virginica   big
120          6.0         2.2          5.0         1.5  virginica   big
121          6.9         3.2          5.7         2.3  virginica   big
122          5.6         2.8          4.9         2.0  virginica small
123          7.7         2.8          6.7         2.0  virginica   big
124          6.3         2.7          4.9         1.8  virginica   big
125          6.7         3.3          5.7         2.1  virginica   big
126          7.2         3.2          6.0         1.8  virginica   big
127          6.2         2.8          4.8         1.8  virginica   big
128          6.1         3.0          4.9         1.8  virginica   big
129          6.4         2.8          5.6         2.1  virginica   big
130          7.2         3.0          5.8         1.6  virginica   big
131          7.4         2.8          6.1         1.9  virginica   big
132          7.9         3.8          6.4         2.0  virginica   big
133          6.4         2.8          5.6         2.2  virginica   big
134          6.3         2.8          5.1         1.5  virginica   big
135          6.1         2.6          5.6         1.4  virginica   big
136          7.7         3.0          6.1         2.3  virginica   big
137          6.3         3.4          5.6         2.4  virginica   big
138          6.4         3.1          5.5         1.8  virginica   big
139          6.0         3.0          4.8         1.8  virginica   big
140          6.9         3.1          5.4         2.1  virginica   big
141          6.7         3.1          5.6         2.4  virginica   big
142          6.9         3.1          5.1         2.3  virginica   big
143          5.8         2.7          5.1         1.9  virginica   big
144          6.8         3.2          5.9         2.3  virginica   big
145          6.7         3.3          5.7         2.5  virginica   big
146          6.7         3.0          5.2         2.3  virginica   big
147          6.3         2.5          5.0         1.9  virginica   big
148          6.5         3.0          5.2         2.0  virginica   big
149          6.2         3.4          5.4         2.3  virginica   big
150          5.9         3.0          5.1         1.8  virginica   big

Bar chart comparing size in species

library(ggplot2)
library(dplyr) # provides %>% and mutate()
data("iris")

# creates a new dataset with an extra column 'size' that labels each row
# "small" if its Sepal.Length is below the overall median and "big" otherwise
iris.new <- iris %>%
  mutate(size = ifelse(Sepal.Length < median(Sepal.Length),
                       "small", "big"))

ggplot(iris.new, aes(x = Species,
                     fill = size)) + # initializes a ggplot, mapping Species to the x-axis and size to the fill color
  geom_bar(position = "dodge") + # draws one bar per size, placed next to each other for comparison
  scale_fill_brewer(palette = "Dark2") # applies the "Dark2" color palette to the fill aesthetic

The bar chart shows the distribution of iris flowers by species and sepal length, classified as either “small” or “big.” Setosa flowers are mostly small, versicolor has a balanced mix of small and big, and virginica flowers are mostly big. This highlights size differences in sepal length across species.
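The heights of the bars can also be read off as numbers. The sketch below repeats the same ‘size’ rule and counts the rows per species and size with dplyr's count(); the name ‘size_counts’ is just an illustrative choice.

```r
library(dplyr)

size_counts <- iris %>%
  mutate(size = ifelse(Sepal.Length < median(Sepal.Length),
                       "small", "big")) %>%
  count(Species, size) # one row per Species/size combination, with the count in 'n'

size_counts
```

Each row of the result corresponds to one bar in the chart, so the table is a quick way to confirm what the plot suggests.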