Problem 1: Use the color picker app from the colorspace package (colorspace::choose_color()) to create a qualitative color scale containing five colors. You will be using these colors in the following problems.

library(colorspace)  # provides choose_color() and swatchplot()

# five colors picked with colorspace::choose_color()
colors <- c("#EBA741", "#8FEB41", "#4190EB", "#7941EB", "#EB4141")

swatchplot(colors)
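
For comparison with the hand-picked colors, a qualitative scale can also be generated programmatically. The sketch below assumes the colorspace package is installed; the palette name "Dark 3" and the variable name `auto_colors` are illustrative choices, not part of the assignment.

```r
library(colorspace)

# Generate five qualitative colors from a built-in HCL palette.
# "Dark 3" is an arbitrary choice; other names are listed by hcl_palettes().
auto_colors <- qualitative_hcl(5, palette = "Dark 3")

# Display the generated colors as swatches
swatchplot(auto_colors)
```

The hand-picked vector `colors` remains the one used in the plots that follow.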

Problem 2: We will be using this dataset

library(tidyverse)  # provides read_csv() and ggplot2

NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
##   Plural   Sex MomAge Weeks Gained Smoke BirthWeightGm   Low Premie Marital
##    <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>         <dbl> <dbl>  <dbl>   <dbl>
## 1      1     1     32    40     38     0         3147.     0      0       0
## 2      1     2     32    37     34     0         3289.     0      0       0
## 3      1     1     27    39     12     0         3912.     0      0       0
## 4      1     1     27    39     15     0         3856.     0      0       0
## 5      1     1     25    39     32     0         3430.     0      0       0
## 6      1     1     28    43     32     0         3317.     0      0       0

Question: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?

To answer this question, you will plot the distribution of birth weight by smoking status, and you will also plot the number of smoking and non-smoking mothers.

Use a boxplot for the first part of the question and a faceted bar plot for the second part.

Hints:

scale_x_discrete(
    name = "Mother",
    labels = c("non-smoker", "smoker")
)
geom_bar(
    position = position_stack(reverse = TRUE)
)

Introduction: This analysis uses data from 2001 on a sample of 1409 birth records from the North Carolina State Center for Health and Environmental Statistics. To investigate the relationship between a mother’s smoking status and her baby’s weight at birth, I will use three variables from the dataset. The first is ‘Smoke’ (‘1’ = smoker, ‘0’ = non-smoker), which distinguishes smoking from non-smoking mothers. The second is ‘BirthWeightGm’ (birth weight in grams), the dependent variable that I am analyzing to see whether it differs between the two groups of mothers. The third is ‘Plural’, which indicates whether the birth was a single birth, twins, or triplets.

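Approach: To answer the question of whether a mother’s smoking status is related to birth weight, I will draw a boxplot of birth weight by smoking status, followed by two faceted bar plots of birth counts: one stacked by birth type and one with side-by-side bars.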

Analysis:

ggplot(NCbirths, aes(x = factor(Smoke), y = BirthWeightGm, fill = factor(Smoke))) +
  geom_boxplot() +
  # Rename the x-axis to "Mother" and set the category labels
  scale_x_discrete(name = "Mother", labels = c("Non-Smoker", "Smoker")) +
  scale_y_continuous(name = "Birth Weight (grams)") +
  labs(title = "Relationship Between Birth Weight and Smoking Status") +
  # Rename the legend title to "Smoke" and set the fill colors
  # (orange for non-smokers, red for smokers)
  scale_fill_manual(name = "Smoke", labels = c("No", "Yes"), values = c("#EBA741", "#EB4141")) +
  theme_minimal()

ggplot(NCbirths, aes(x = factor(Smoke), fill = factor(Plural))) +
  geom_bar() +
  scale_x_discrete(name = "Mother", labels = c("Non-Smoker", "Smoker")) +
  scale_y_continuous(name = "Count of Births") +
  scale_fill_manual(name = "Birth Type", values = c("#7941EB", "#4190EB", "#8FEB41"),
                    labels = c("Single", "Twins", "Triplets")) +
  labs(title = "Birth Type Distribution by Smoking Status") +
  facet_wrap(~Smoke, labeller = as_labeller(c("0" = "", "1" = ""))) + # The labels are set to empty strings to remove the facet titles
  theme_minimal()
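
The stacked plot above uses the default stacking order. The hint's `position_stack(reverse = TRUE)` flips that order so the first fill level ends up at the bottom of each bar. A minimal sketch, assuming a data frame with `Smoke` and `Plural` columns coded as in NCbirths; the helper name `stacked_births_plot` is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: stacked bar plot of birth counts by smoking status,
# with the stacking order reversed as suggested in the hints.
stacked_births_plot <- function(births) {
  ggplot(births, aes(x = factor(Smoke), fill = factor(Plural))) +
    geom_bar(position = position_stack(reverse = TRUE)) +
    scale_x_discrete(name = "Mother", labels = c("Non-Smoker", "Smoker")) +
    scale_y_continuous(name = "Count of Births") +
    # Assumes all three Plural levels (1, 2, 3) are present in the data
    scale_fill_manual(name = "Birth Type",
                      values = c("#7941EB", "#4190EB", "#8FEB41"),
                      labels = c("Single", "Twins", "Triplets")) +
    theme_minimal()
}
```

Calling `stacked_births_plot(NCbirths)` would draw the plot for the data loaded earlier.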

ggplot(NCbirths, aes(x = factor(Plural), fill = factor(Plural))) +
  geom_bar() +
  scale_x_discrete(name = "Number of Births", labels = c("Single", "Twins", "Triplets")) +
  scale_y_continuous(name = "Count of Births") +
  scale_fill_manual(name = "Birth Type", values = c("#7941EB", "#4190EB", "#8FEB41"),
                    labels = c("Single", "Twins", "Triplets")) +
  labs(title = "Birth Type Distribution by Smoking Status") +
  facet_wrap(~Smoke, labeller = as_labeller(c("0" = "Non-Smoker", "1" = "Smoker"))) +
  theme_minimal() 

Discussion: The boxplot shows that babies born to smoking mothers tend to have lower birth weights than babies born to non-smoking mothers; the median birth weight for non-smokers is slightly higher. There are outliers in both groups. The bar charts show that most births in the sample are to non-smoking mothers and that single births are by far the most common birth type. The counts of twins and triplets are also higher among non-smokers, but this may simply reflect the larger number of non-smoking mothers in the sample, so proportions rather than raw counts would be needed to compare the likelihood of multiple births between the two groups. In conclusion, the data show that smoking during pregnancy is associated with lower birth weight. One possible explanation is that smoking can increase complications during pregnancy, but other factors can also affect birth weight, and these plots alone cannot establish a causal effect.
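
The comparisons in the discussion are read off the plots. A per-group summary would make the medians and the share of multiple births explicit. The sketch below assumes the column names shown in the `head(NCbirths)` output; the helper name `smoke_summary` and the choice to drop rows with a missing `Smoke` value are assumptions, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: sample size, median birth weight, and share of
# multiple births (Plural > 1) for each smoking group.
smoke_summary <- function(births) {
  births %>%
    filter(!is.na(Smoke)) %>%          # assumption: ignore unknown smoking status
    group_by(Smoke) %>%
    summarize(
      n = n(),
      median_weight_g = median(BirthWeightGm, na.rm = TRUE),
      share_multiple = mean(Plural > 1, na.rm = TRUE),
      .groups = "drop"
    )
}
```

Calling `smoke_summary(NCbirths)` returns one row per smoking group, so the shares of multiple births can be compared directly instead of the raw counts.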