Problem 1: Use the color picker app from the colorspace package (colorspace::choose_color()) to create a qualitative color scale containing five colors. You will use these colors in the following problems.

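library(colorspace)  # provides choose_color() and swatchplot()

# five colors picked with colorspace::choose_color(), written as full six-digit hex codes
colors <- c("#F70101", "#0116F7", "#F701C3", "#F7F701", "#000000")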

swatchplot(colors)
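As an alternative to picking colors by hand, a qualitative palette can also be generated programmatically. The sketch below assumes the built-in "Dark 3" palette from colorspace; the variable name hcl_colors is only illustrative and is not used elsewhere in this report.

```r
library(colorspace)

# Generate five qualitative colors from a built-in HCL palette.
# "Dark 3" is an assumption here; any other qualitative palette
# name accepted by qualitative_hcl() could be used instead.
hcl_colors <- qualitative_hcl(5, palette = "Dark 3")

# Preview the generated palette as color swatches
swatchplot(hcl_colors)
```

Colors generated this way have hues spread around the color wheel at the same chroma and luminance, so no single category stands out more than the others.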

Problem 2: We will be using this dataset:

library(tidyverse)  # provides read_csv() and ggplot2

NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
##   Plural   Sex MomAge Weeks Gained Smoke BirthWeightGm   Low Premie Marital
##    <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>         <dbl> <dbl>  <dbl>   <dbl>
## 1      1     1     32    40     38     0         3147.     0      0       0
## 2      1     2     32    37     34     0         3289.     0      0       0
## 3      1     1     27    39     12     0         3912.     0      0       0
## 4      1     1     27    39     15     0         3856.     0      0       0
## 5      1     1     25    39     32     0         3430.     0      0       0
## 6      1     1     28    43     32     0         3317.     0      0       0

Question: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?

To answer this question, plot the distribution of birth weight by smoking status, and also plot the number of mothers who are smokers and non-smokers, respectively.

Use a boxplot for the first part of the question and a faceted bar plot for the second part.

Hints:

scale_x_discrete(
    name = "Mother",
    labels = c("non-smoker", "smoker")
)
geom_bar(
    position = position_stack(reverse = TRUE)
)

Introduction: The NCbirths dataset contains information on births in North Carolina, including details about the mother, the baby, and the pregnancy. In this analysis I investigate whether a mother’s smoking status is related to her baby’s birth weight and to the likelihood of a singleton, twin, or triplet birth. The columns used in this analysis are Smoke, Plural, and BirthWeightGm.

Approach: The first plot is a boxplot that compares birth weight distributions between smokers and non-smokers, with a separate x-axis category and fill color for each group. The second plot is a bar plot showing the number of singleton, twin, and triplet births, faceted to separate smokers from non-smokers. The third plot is a side-by-side bar plot that removes the faceting and uses position_dodge() to directly compare birth types between the two groups.

Analysis:

# Plot 1: birth weight by smoking status (Smoke is coded 0/1, so convert it to a factor)
ggplot(NCbirths, aes(x = factor(Smoke), y = BirthWeightGm, fill = factor(Smoke))) +
  geom_boxplot() +
  scale_x_discrete(
    name = "Mother",
    labels = c("Non-Smoker", "Smoker")
  ) +
  scale_fill_manual(values = colors[1:2]) +
  labs(y = "Birth Weight (grams)", title = "Birth Weight Distribution by Smoking Status") +
  theme_minimal()

ggplot(NCbirths, aes(x = factor(Plural), fill = factor(Plural))) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = colors[3:5]) +
  facet_wrap(~Smoke, labeller = as_labeller(c(`0` = "Non-Smoker", `1` = "Smoker"))) +
  labs(x = "Number of Babies", y = "Count", fill = "Births", title = "Birth Counts by Smoking Status") +
  theme_minimal()

ggplot(NCbirths, aes(x = factor(Plural), fill = factor(Smoke))) +
  geom_bar(position = position_dodge()) +
  scale_fill_manual(values = colors[3:4], labels = c("Non-Smoker", "Smoker")) +
  labs(x = "Number of Babies", 
       y = "Count", 
       fill = "Smoking Status", 
       title = "Birth Counts by Smoking Status (Side-by-Side)") +
  theme_minimal()

Discussion: The first plot, a boxplot comparing birth weight between smokers and non-smokers, shows that babies born to mothers who smoke tend to have lower birth weights than babies born to non-smoking mothers. This is consistent with established medical research: smoking during pregnancy is known to reduce the oxygen supply to the baby, which leads to lower birth weight. The second plot, a faceted bar plot, shows the distribution of birth plurality (singletons, twins, and triplets) among smokers and non-smokers. The singleton bar is far taller than the others in both facets, so singleton births dominate both groups. The third plot, a side-by-side bar plot, shows the same pattern: singleton births are by far the most common for both smokers and non-smokers. This suggests that smoking doesn’t have a big impact on the likelihood of multiple births. Because the bars show raw counts rather than within-group proportions, this comparison is only approximate.
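One way to check these readings numerically is to summarize the data directly instead of judging from the plots alone. The sketch below assumes the NCbirths columns Smoke, Plural, and BirthWeightGm described above. The helper names weight_by_smoke() and plural_share_by_smoke() are hypothetical and introduced only for illustration.

```r
library(dplyr)

# Hypothetical helper: group size and median birth weight per smoking status.
# Assumes columns Smoke (0/1) and BirthWeightGm, as in NCbirths.
weight_by_smoke <- function(births) {
  births %>%
    group_by(Smoke) %>%
    summarize(
      n = n(),
      median_weight = median(BirthWeightGm, na.rm = TRUE),
      .groups = "drop"
    )
}

# Hypothetical helper: share of each plurality level (1, 2, 3)
# within each smoking group, so groups of different size are comparable.
plural_share_by_smoke <- function(births) {
  births %>%
    count(Smoke, Plural) %>%
    group_by(Smoke) %>%
    mutate(share = n / sum(n)) %>%
    ungroup()
}

# Run on the dataset only if it has already been loaded in this session
if (exists("NCbirths")) {
  print(weight_by_smoke(NCbirths))
  print(plural_share_by_smoke(NCbirths))
}
```

Comparing the share column between the two smoking groups shows whether twins and triplets are relatively more or less common among smokers, which the raw counts in the bar plots cannot show on their own.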