Problem 1: Use the color picker app from the colorspace package (colorspace::choose_color()) to create a qualitative color scale containing five colors. You will use these colors in the following problems.
library(colorspace) # provides choose_color() and swatchplot()
# five hand-picked colors for the qualitative scale
colors <- c("#F1948A", "#F5B041", "#F4D03F", "#27AE60", "#5DADE2")
swatchplot(colors)
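As an alternative to picking colors by hand, the colorspace package can also generate a qualitative palette programmatically. The sketch below assumes the built-in "Dark 3" palette and the name `auto_colors`, both chosen only for illustration; this is not the palette used in the rest of this report.

```r
library(colorspace)

# Generate five qualitative colors from a built-in HCL palette.
# "Dark 3" is an arbitrary choice for illustration.
auto_colors <- qualitative_hcl(5, palette = "Dark 3")

# A character vector of five hex color codes.
auto_colors

# Display the palette the same way as the hand-picked colors.
swatchplot(auto_colors)
```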
Problem 2: We will be using this dataset:
library(tidyverse) # provides read_csv() and ggplot()
NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
##   Plural   Sex MomAge Weeks Gained Smoke BirthWeightGm   Low Premie Marital
##    <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>         <dbl> <dbl>  <dbl>   <dbl>
## 1      1     1     32    40     38     0         3147.     0      0       0
## 2      1     2     32    37     34     0         3289.     0      0       0
## 3      1     1     27    39     12     0         3912.     0      0       0
## 4      1     1     27    39     15     0         3856.     0      0       0
## 5      1     1     25    39     32     0         3430.     0      0       0
## 6      1     1     28    43     32     0         3317.     0      0       0
Question: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?
To answer this question, you will plot the distribution of birth weight by smoking status, and you will also plot the number of mothers who are smokers and non-smokers, respectively.
Use a boxplot for the first part of the question and a faceted bar plot for the second part.
Hints:
Use labels = to provide explicit labels so ggplot2 doesn’t write 0 and 1, like:
scale_x_discrete(
  name = "Mother",
  labels = c("non-smoker", "smoker")
)
Use scale_fill_manual to fill the colors. Your colors need to be two colors from the 5 colors you picked in problem 1.
For the second part with the bar plot, use position = position_stack(reverse = TRUE). This will stack in reverse order so singletons come first, then twins, then triplets. Your colors should be the other three colors from the previously picked colors in problem 1. Finally, facet by Smoke.
geom_bar(
  position = position_stack(reverse = TRUE)
)
Introduction: In this project, I use the NCbirths dataset, which is based on birth records from the North Carolina State Center for Health and Environmental Statistics and provides detailed information about newborn babies and their mothers. This dataset is valuable for analyzing factors that influence birth outcomes, such as birth weight and health conditions related to maternal behaviors.
For this project, I am investigating the impact of maternal smoking on babies’ birth weight. The question I seek to answer is: Is there a relationship between whether a mother smokes and her baby’s weight at birth?
To answer this question, I will use the columns BirthWeightGm, Smoke, and Plural from the dataset. BirthWeightGm records the birth weight of each newborn in grams and is the dependent variable. Smoke is a binary variable stored as a double (0 = non-smoker, 1 = smoker) that shows whether the mother smoked during pregnancy. It is the independent variable, and although it is stored as a number, I treat it as categorical. Plural represents the plurality of each birth, with 1 for single births, 2 for twins, and 3 for triplets. These are the columns needed to answer the question.
Approach: To investigate the impact of maternal smoking on babies’ birth weight, I will first create a boxplot comparing birth weights between non-smoking and smoking mothers. A higher median birth weight among non-smokers would suggest that smoking is associated with lower neonatal weight. Second, I will create a bar plot showing the number of smoking versus non-smoking mothers, with the bars filled by birth type (single, twins, or triplets).
Analysis:
# Smoke is stored as a number, so it is converted to a factor for x and fill.
p1 <- NCbirths |>
  ggplot(aes(x = as.factor(Smoke), y = BirthWeightGm, fill = as.factor(Smoke))) +
  geom_boxplot() +
  labs(
    x = "Mother's smoking status",
    y = "Birth weight (g)",
    fill = "Smoking status",
    title = "Boxplot of Birth Weight by Mother's Smoking Status",
    caption = "North Carolina State Center for Health and Environmental Statistics (cleaned)"
  ) +
  # Replace the factor levels 0 and 1 on the x axis with readable labels.
  scale_x_discrete(labels = c("0" = "Non-smoker", "1" = "Smoker")) +
  # Assign one color per group and relabel the legend entries.
  scale_fill_manual(
    values = c("0" = "#F4D03F", "1" = "#5DADE2"),
    labels = c("0" = "Non-smoker", "1" = "Smoker")
  )
p1
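The boxplot can be complemented with a numeric summary per smoking group. The following is a minimal sketch, assuming dplyr is available (it is loaded with the tidyverse). The function name `summarise_birth_weight` and the output column names are hypothetical and not part of the assignment.

```r
library(dplyr)

# Hypothetical helper: summarise birth weight for each smoking group in a
# data frame that has the columns Smoke (0/1) and BirthWeightGm (grams).
summarise_birth_weight <- function(births) {
  births |>
    mutate(Smoke = factor(Smoke, levels = c(0, 1),
                          labels = c("Non-smoker", "Smoker"))) |>
    group_by(Smoke) |>
    summarise(
      n = n(),                                              # group size
      median_weight_g = median(BirthWeightGm, na.rm = TRUE), # box center line
      iqr_weight_g = IQR(BirthWeightGm, na.rm = TRUE),       # box height
      .groups = "drop"
    )
}
```

Calling `summarise_birth_weight(NCbirths)` would return one row per group with the group size, the median, and the interquartile range, which are the quantities the boxplot draws.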
p2 <- NCbirths |>
  ggplot(aes(x = as.factor(Smoke), fill = as.factor(Plural))) +
  # Reverse the stacking order so singletons come first, then twins, then triplets.
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_x_discrete(labels = c("0" = "Non-smoker", "1" = "Smoker")) +
  facet_wrap(~ as.factor(Smoke), labeller = as_labeller(c("0" = "Non-smoker", "1" = "Smoker"))) +
  labs(
    x = "Smoking status",
    y = "Number of mothers",
    fill = "Birth type",
    title = "Bar Plot of Smoking Status by Birth Type"
  ) +
  scale_fill_manual(
    values = c("1" = "#F1948A", "2" = "#F5B041", "3" = "#27AE60"),
    labels = c("1" = "Single", "2" = "Twins", "3" = "Triplets")
  ) +
  theme_bw()
p2
extra_credit <- NCbirths |>
  ggplot(aes(x = as.factor(Plural), fill = as.factor(Plural))) +
  geom_bar() +
  scale_x_discrete(labels = c("1" = "Single", "2" = "Twins", "3" = "Triplets")) +
  facet_wrap(~ as.factor(Smoke), labeller = as_labeller(c("0" = "Non-smoker", "1" = "Smoker"))) +
  labs(
    x = "Birth type",
    y = "Number of births",
    fill = "Birth type",
    title = "Bar Plot of Birth Type by Smoking Status"
  ) +
  scale_fill_manual(
    values = c("1" = "#F1948A", "2" = "#F5B041", "3" = "#27AE60"),
    labels = c("1" = "Single", "2" = "Twins", "3" = "Triplets")
  ) +
  theme_bw()
extra_credit
Discussion: The boxplot suggests that babies born to smoking mothers tend to have lower birth weights than those born to non-smoking mothers. Both groups also show a considerable number of outliers. Because there are many more non-smoking mothers than smoking mothers, the two groups are unequal in size, so the comparison may not be fully representative. The bar charts show that single births are by far the most common, and that twins and triplets appear more frequently among non-smokers than among smokers. Overall, the data suggest that smoking during pregnancy is associated with lower birth weights and fewer twin or triplet births.
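Because the bar charts show raw counts and the two groups differ in size, looking at shares within each group may be a useful check on the observation about twins and triplets. A minimal sketch, assuming dplyr is available; `plural_share_by_smoke`, `n_births`, and `share` are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Hypothetical helper: share of each birth type within each smoking group,
# for a data frame that has the columns Smoke (0/1) and Plural (1, 2, 3).
plural_share_by_smoke <- function(births) {
  births |>
    count(Smoke, Plural, name = "n_births") |>   # rows per Smoke/Plural pair
    group_by(Smoke) |>
    mutate(share = n_births / sum(n_births)) |>  # shares sum to 1 per group
    ungroup()
}
```

Calling `plural_share_by_smoke(NCbirths)` would return one row per combination of smoking status and birth type, so the shares of twins and triplets can be compared directly between smokers and non-smokers.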