DATA110 Homework #2

Author

Cristian Mendez

Open Libraries and Dataset

```r
# Load the tidyverse (ggplot2, dplyr, ...) and the dslabs data package
library(tidyverse)
library(dslabs)
data("heights")
```

## Boxplot

```r
# Boxplot of height by sex, with one fill colour per group
ggplot(heights, aes(x = sex, y = height, fill = sex)) + 
  geom_boxplot(alpha = 0.8, outlier.color = "#9F9F9F") +
  labs(title = "Distribution of Heights by Sex",
       subtitle = "Boxplot comparison of male and female heights",
       x = "Sex",
       y = "Height (inches)",
       fill = "Sex") +
  theme_light(base_size = 10) +
  # `sex` is mapped to fill, so the manual palette must be a fill scale;
  # scale_color_manual() had no colour mapping to act on and was ignored
  scale_fill_manual(values = c("Male" = "#E6705B", "Female" = "#AF86EC"))
```

The dataset used for this visualization is heights, from the dslabs package. It records the self-reported height (in inches) and the sex of students in a data science course, which makes it well suited for exploring height differences between the sexes and for demonstrating basic comparative visualization techniques. To visualize the distribution of heights by sex, I chose a boxplot. The x-axis shows sex and the y-axis shows height in inches. Each box spans the interquartile range (the 25th to 75th percentile) of that group's heights, the line inside the box marks the median, the whiskers extend to the most extreme heights lying within 1.5 times the interquartile range beyond the box, and any points past the whiskers are drawn individually as potential outliers. Placing the two boxes side by side allows a direct visual comparison of the center, spread, and outliers of the male and female height distributions.
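To make the parts of each box concrete, the sketch below computes them numerically for each sex. It is a minimal illustration that assumes the dslabs package is installed and follows ggplot2's default 1.5 × IQR whisker rule; the helper name box_summary() is hypothetical and not part of any package.

```r
# Minimal sketch: the numbers behind each box in the plot above.
# Assumes dslabs is installed; box_summary() is a hypothetical helper.
library(dslabs)
data("heights")

box_summary <- function(x) {
  x <- x[!is.na(x)]
  # Quartiles (default type 7 quantiles, as ggplot2 uses without weights)
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  # Points beyond 1.5 * IQR from the box edges are treated as outliers
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[3] + 1.5 * iqr
  inside <- x[x >= lower_fence & x <= upper_fence]
  c(q1 = q[1], median = q[2], q3 = q[3],
    whisker_low = min(inside), whisker_high = max(inside),
    n_outliers = sum(x < lower_fence | x > upper_fence))
}

# One column of summary statistics per sex
height_summary <- sapply(split(heights$height, heights$sex), box_summary)
print(height_summary)
```

Comparing the median row across the two columns gives numerically the same comparison the two boxes show visually.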