Problem 1: Use the color picker app from the colorspace package (colorspace::choose_color()) to create a qualitative color scale containing five colors. You will use these colors in the following questions.

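library(tidyverse)   # read_csv() and ggplot2
library(colorspace)  # choose_color() and swatchplot()

# five colors picked with colorspace::choose_color()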
colors <- c("#671E5A", "#155EF2", "#EC648F", "#D0D63B", "#FB3231")

swatchplot(colors)
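
As a cross-check on the hand-picked colors, the sketch below generates a five-color qualitative palette programmatically and compares the two sets in HCL space. The palette name "Dark 3" and the helper hcl_coords() are arbitrary choices made for this illustration, not part of the assignment. Qualitative scales usually keep luminance and chroma roughly similar across colors so that no single category stands out, and the printed coordinates make that easy to inspect.

```r
library(colorspace)

# Generate five qualitative colors; "Dark 3" is one of the built-in
# qualitative HCL palettes (an arbitrary choice for this sketch).
auto_colors <- qualitative_hcl(5, palette = "Dark 3")

# Hand-picked colors from Problem 1, repeated here for comparison.
colors <- c("#671E5A", "#155EF2", "#EC648F", "#D0D63B", "#FB3231")

# Hypothetical helper: convert hex codes to polar LUV (HCL) coordinates.
hcl_coords <- function(hex) coords(as(hex2RGB(hex), "polarLUV"))

# One row per color; large spreads in luminance or chroma mean some
# colors will look more prominent than others.
print(round(hcl_coords(auto_colors), 1))
print(round(hcl_coords(colors), 1))

# Show both palettes side by side.
swatchplot("Generated" = auto_colors, "Hand-picked" = colors)
```

If the luminance values of the hand-picked colors differ widely, choose_color() can be used again to bring them closer together.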

Problem 2: We will be using the NCbirths dataset:

NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
##   Plural   Sex MomAge Weeks Gained Smoke BirthWeightGm   Low Premie Marital
##    <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>         <dbl> <dbl>  <dbl>   <dbl>
## 1      1     1     32    40     38     0         3147.     0      0       0
## 2      1     2     32    37     34     0         3289.     0      0       0
## 3      1     1     27    39     12     0         3912.     0      0       0
## 4      1     1     27    39     15     0         3856.     0      0       0
## 5      1     1     25    39     32     0         3430.     0      0       0
## 6      1     1     28    43     32     0         3317.     0      0       0

Question: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?

To answer this question, you will plot the distribution of birth weight by smoking status, and you will also plot the number of mothers who are smokers and non-smokers, respectively.

Use a boxplot for the first part of the question and a faceted bar plot for the second part of the question.

Hints:

scale_x_discrete(
    name = "Mother",
    labels = c("non-smoker", "smoker")
)
geom_bar(
    position = position_stack(reverse = TRUE)
)

Introduction: We use data on births in North Carolina to examine the relationship between maternal smoking status and birth weight. The goal is to determine whether smoking during pregnancy is related to the weight of the baby at birth.

Approach: To analyze this relationship, we will create a boxplot to compare the distribution of birth weights between smokers and non-smokers. We will also use a faceted bar plot to show the counts of smokers and non-smokers.

Analysis:

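# Boxplot of birth weight by smoking status (Smoke: 0 = non-smoker, 1 = smoker)
ggplot(NCbirths, aes(x = factor(Smoke), y = BirthWeightGm)) +
  # one fill color per box, in the order of the x-axis categories
  geom_boxplot(fill = c(colors[1], colors[2])) +
  scale_x_discrete(name = "Mother", labels = c("Non-smoker", "Smoker")) +
  labs(y = "Birth Weight in grams", title = "Relationship between Birth Weight and Smoking Status")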

# Bar plot of birth counts by plurality, faceted by smoking status
ggplot(NCbirths, aes(x = factor(Plural), fill = factor(Plural))) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  # name sets the legend title, which would otherwise read "factor(Plural)"
  scale_fill_manual(name = "Plural", values = colors[3:5]) +
  facet_wrap(~ factor(Smoke, labels = c("Non-smoker", "Smoker"))) +
  labs(x = "Number of Babies (Singleton, Twin, Triplet)", y = "Count",
       title = "Distribution of Births by Smoking Status and Plural Status")
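
If the second part of the question is read as asking only how many records belong to smokers and non-smokers, a single bar per group may be enough. The sketch below shows one way to draw it. It runs on toy_births, a made-up stand-in for NCbirths with invented values, and it assumes Smoke is coded 0 for non-smokers and 1 for smokers, as the axis labels in the hints suggest.

```r
library(ggplot2)

# Toy stand-in for NCbirths; the values are made up for illustration only.
toy_births <- data.frame(Smoke = c(0, 0, 0, 1, 1))

# Recode the 0/1 indicator as a labeled factor (assumed coding: 0 = non-smoker).
toy_births$Mother <- factor(
  toy_births$Smoke,
  levels = c(0, 1),
  labels = c("Non-smoker", "Smoker")
)

# Two hand-picked colors from Problem 1.
smoke_colors <- c("#671E5A", "#155EF2")

# One bar per group; the legend is dropped because the x axis already
# names the groups.
p <- ggplot(toy_births, aes(x = Mother, fill = Mother)) +
  geom_bar() +
  scale_fill_manual(values = smoke_colors, guide = "none") +
  labs(x = "Mother", y = "Count")

print(p)
```

Replacing toy_births with NCbirths should give the corresponding plot for the real data, provided the Smoke coding assumption holds.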

Discussion: The boxplot shows that babies of non-smoking mothers tend to have a higher median birth weight than babies of smoking mothers, which suggests that smoking during pregnancy may be associated with lower birth weight. Because the data are observational, the plot alone cannot show that smoking causes the difference. The bar plot shows that the dataset contains more non-smokers than smokers, so the summary for the smoker group is based on fewer births.
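
To back up the visual comparison with numbers, a grouped summary could be computed along the following lines. This is a minimal sketch: toy_births is a made-up stand-in for NCbirths with invented values, and the coding of Smoke (0 = non-smoker, 1 = smoker) is an assumption based on the axis labels used in the plots.

```r
library(dplyr)

# Toy stand-in for NCbirths; the values are made up for illustration only.
toy_births <- data.frame(
  Smoke = c(0, 0, 0, 1, 1),
  BirthWeightGm = c(3100, 3400, 3600, 2900, 3200)
)

# Count, median, and mean birth weight per smoking group.
smoke_summary <- toy_births %>%
  mutate(Mother = factor(Smoke, levels = c(0, 1),
                         labels = c("Non-smoker", "Smoker"))) %>%
  group_by(Mother) %>%
  summarize(
    n = n(),
    median_weight = median(BirthWeightGm),
    mean_weight = mean(BirthWeightGm),
    .groups = "drop"
  )

print(smoke_summary)
```

Running the same pipeline on NCbirths would give the group sizes and the median and mean birth weights that the boxplot and bar plot display graphically.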