Problem 1: Use the color picker app from the colorspace package (`colorspace::choose_color()`) to create a qualitative color scale containing five colors. You will use these colors in the following questions.
library(colorspace)
# five colors picked with choose_color()
colors <- c("#671E5A", "#155EF2", "#EC648F", "#D0D63B", "#FB3231")
swatchplot(colors)
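For a reproducible, non-interactive starting point, colorspace also provides `qualitative_hcl()`, which returns a qualitative palette as hex strings. The sketch below assumes the "Dark 3" preset. That preset is only an illustration and is not part of the assignment. Any five colors picked with `choose_color()` work just as well.

```r
library(colorspace)

# Five qualitative colors from a built-in HCL preset (the "Dark 3" choice is an assumption)
alt_colors <- qualitative_hcl(5, palette = "Dark 3")

# Compare the hand-picked palette and the preset palette side by side
swatchplot(list(
  Picked   = c("#671E5A", "#155EF2", "#EC648F", "#D0D63B", "#FB3231"),
  `Dark 3` = alt_colors
))
```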
Problem 2: We will be using this dataset:
library(tidyverse)
NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
## Plural Sex MomAge Weeks Gained Smoke BirthWeightGm Low Premie Marital
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 32 40 38 0 3147. 0 0 0
## 2 1 2 32 37 34 0 3289. 0 0 0
## 3 1 1 27 39 12 0 3912. 0 0 0
## 4 1 1 27 39 15 0 3856. 0 0 0
## 5 1 1 25 39 32 0 3430. 0 0 0
## 6 1 1 28 43 32 0 3317. 0 0 0
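As the `head()` output shows, `Smoke` and `Plural` are stored as numeric codes. One option is to recode them once as labeled factors instead of calling `factor()` inside every plot. The sketch below assumes `Smoke` is coded 0 = non-smoker, 1 = smoker and that `Plural` takes the values 1, 2, 3 for singletons, twins, and triplets, which is the coding the hints imply. The helper name `label_births()` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: recode the numeric Smoke and Plural codes as labeled factors
# (assumes Smoke is 0/1 and Plural is 1/2/3)
label_births <- function(df) {
  df %>%
    mutate(
      Smoke  = factor(Smoke, levels = c(0, 1), labels = c("Non-smoker", "Smoker")),
      Plural = factor(Plural, levels = c(1, 2, 3),
                      labels = c("Singleton", "Twins", "Triplets"))
    )
}

# Usage: NCbirths_labeled <- label_births(NCbirths)
```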
Question: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?
To answer this question, you will plot the distribution of birth weight by smoking status, and you will also plot the number of mothers who are smokers and non-smokers, respectively.
Use a boxplot for the first part of the question and a faceted bar plot for the second part.
Hints:
For the boxplot, use `labels =` to provide explicit labels so ggplot2 doesn’t write 0 and 1, like: `scale_x_discrete(name = "Mother", labels = c("non-smoker", "smoker"))`.
Use `scale_fill_manual()` to fill the colors. Your colors need to be two of the five colors you picked in Problem 1.
For the second part with the bar plot, use `geom_bar(position = position_stack(reverse = TRUE))`. This stacks the bars in reverse order, so singletons come first, then twins, then triplets. Your colors should be the other three colors you picked in Problem 1. Finally, facet by `Smoke`.
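The `position_stack(reverse = TRUE)` hint can be checked on a small example. The sketch below uses a hypothetical toy data frame, not NCbirths, in which a single bar is split into three fill groups. It inspects the computed segments with ggplot2's `layer_data()`. With `reverse = TRUE`, the first level (here `Plural = 1`) should start at the bottom of the stack.

```r
library(ggplot2)

# Hypothetical toy data: one smoking group with 3 singletons, 2 twins, 1 triplet
toy_births <- data.frame(
  Smoke  = factor(rep(0, 6)),
  Plural = factor(c(1, 1, 1, 2, 2, 3))
)

# One bar stacked by plurality; reverse = TRUE puts the first level at the bottom
p <- ggplot(toy_births, aes(x = Smoke, fill = Plural)) +
  geom_bar(position = position_stack(reverse = TRUE))

# Inspect the computed segments: one row per fill group, with its ymin/ymax
segments <- layer_data(p)[, c("group", "ymin", "ymax")]
print(segments)
```

Without `reverse = TRUE`, the same inspection would show the first level at the top of the stack, which matches the default legend order.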
Introduction: This analysis uses data on births in North Carolina to examine the relationship between maternal smoking status and birth weight. The goal is to determine whether smoking during pregnancy is associated with the baby’s weight at birth.
Approach: To analyze this relationship, we will create a boxplot comparing the distribution of birth weights between smokers and non-smokers. We will also use a bar plot, faceted by smoking status, to show the number of births to non-smokers and smokers, broken down by plurality (singleton, twins, triplets).
Analysis:
# Boxplot of birth weight by smoking status; fill uses two colors from Problem 1
ggplot(NCbirths, aes(x = factor(Smoke), y = BirthWeightGm, fill = factor(Smoke))) +
  geom_boxplot() +
  scale_x_discrete(name = "Mother", labels = c("Non-smoker", "Smoker")) +
  scale_fill_manual(values = colors[1:2], guide = "none") +
  labs(y = "Birth weight (grams)", title = "Birth Weight by Maternal Smoking Status")
# Count of births by plurality (1 = singleton, 2 = twins, 3 = triplets),
# faceted by smoking status; fill uses the remaining three colors from Problem 1
ggplot(NCbirths, aes(x = factor(Plural), fill = factor(Plural))) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = colors[3:5], guide = "none") +
  facet_wrap(~ factor(Smoke, labels = c("Non-smoker", "Smoker"))) +
  labs(x = "Number of babies (1 = singleton, 2 = twins, 3 = triplets)", y = "Count",
       title = "Births by Smoking Status and Plurality")
Discussion: The boxplot shows that babies born to non-smokers tend to have higher median birth weights than babies born to smokers. This suggests that smoking during pregnancy is associated with lower birth weight. Because these are observational data, the comparison alone does not establish that smoking causes the difference. The bar plot shows that the dataset contains more non-smoking than smoking mothers, so the smoker group is the smaller of the two samples.
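The visual comparison above can be backed by a numeric summary. The sketch below assumes dplyr is available. The helper name `summarize_by_smoking()` is hypothetical. Its output on NCbirths is not shown here, so no specific numbers are claimed.

```r
library(dplyr)

# Hypothetical helper: for each smoking group, the number of births, the median
# birth weight, and the group's share of births. Rows with a missing Smoke or
# BirthWeightGm value are dropped first, so shares are among complete rows only.
summarize_by_smoking <- function(df) {
  df %>%
    filter(!is.na(Smoke), !is.na(BirthWeightGm)) %>%
    group_by(Smoke) %>%
    summarize(
      n = n(),
      median_weight = median(BirthWeightGm),
      .groups = "drop"
    ) %>%
    mutate(share = n / sum(n))
}

# Usage: summarize_by_smoking(NCbirths)
```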