Problem 1: Use the color picker app from the colorspace package (colorspace::choose_color()) to create a qualitative color scale containing five colors. You will use these colors in the following problems.
# replace "#FFFFFF" with your own colors
colors <- c("#FFFFFF", "#FFFFFF", "#FFFFFF", "#FFFFFF", "#FFFFFF")
swatchplot(colors)
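The picker app is interactive, so it cannot run when the document is knitted. As a non-interactive starting point, the sketch below draws five colors from one of colorspace's built-in qualitative palettes and previews them. The choice of "Dark 3" is only an assumption for illustration; any five colors picked in the app can replace it.

```r
library(colorspace)

# Assumption: "Dark 3" is just one built-in qualitative HCL palette;
# substitute the hex codes you selected in choose_color().
colors <- qualitative_hcl(5, palette = "Dark 3")

# Preview the five colors side by side
swatchplot(colors)
```

The result is a character vector of five hex codes, so it can be indexed (for example `colors[1:2]`) when only some of the colors are needed.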
Problem 2: We will be using the following dataset:
NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
## Plural Sex MomAge Weeks Gained Smoke BirthWeightGm Low Premie Marital
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 32 40 38 0 3147. 0 0 0
## 2 1 2 32 37 34 0 3289. 0 0 0
## 3 1 1 27 39 12 0 3912. 0 0 0
## 4 1 1 27 39 15 0 3856. 0 0 0
## 5 1 1 25 39 32 0 3430. 0 0 0
## 6 1 1 28 43 32 0 3317. 0 0 0
Question: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?
To answer this question, you will plot the distribution of birth weight by smoking status, and you will also plot the number of mothers who are smokers and non-smokers, respectively.
Use a boxplot for the first part of the question and a faceted bar plot for the second part.
Hints:
Use labels = to provide explicit labels so ggplot2 doesn’t write 0 and 1, like:
scale_x_discrete(
  name = "Mother",
  labels = c("non-smoker", "smoker")
)
Use scale_fill_manual to fill the colors. Your colors need to be two of the five colors you picked in Problem 1.
For the second part with the bar plot, use position = position_stack(reverse = TRUE). This will stack in reverse order so singletons come first, then twins, then triplets. Your colors should be the other three colors you picked in Problem 1. Finally, facet by Smoke.
geom_bar(
  position = position_stack(reverse = TRUE)
)
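The first two hints can be combined as in the sketch below. It is a minimal illustration, not the required solution: the data frame `toy` is made up and only stands in for NCbirths, and `two_colors` holds placeholder hex codes where two of the Problem 1 colors would go.

```r
library(ggplot2)

# Made-up toy data standing in for NCbirths (values are illustrative only)
toy <- data.frame(
  Smoke = c(0, 0, 0, 1, 1, 1),
  BirthWeightGm = c(3100, 3400, 3600, 2900, 3000, 3300)
)

# Placeholder colors; replace with two of the five colors from Problem 1
two_colors <- c("#1B9E77", "#D95F02")

p_box <- ggplot(
  toy,
  aes(x = factor(Smoke), y = BirthWeightGm, fill = factor(Smoke))
) +
  geom_boxplot() +
  # Explicit labels so the axis shows text instead of 0 and 1
  scale_x_discrete(name = "Mother", labels = c("non-smoker", "smoker")) +
  # Manual fill; the legend is dropped because the x axis already names the groups
  scale_fill_manual(values = two_colors, guide = "none") +
  scale_y_continuous(name = "Birth weight (grams)")

p_box
```

Wrapping `Smoke` in `factor()` matters here: a numeric 0/1 column would otherwise be treated as continuous, and neither `scale_x_discrete()` nor `scale_fill_manual()` would apply.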
Introduction:
We will be using the NCbirths data set, which contains 1409 observations. The state of North Carolina released to the public a data set containing information about births in the state. This information has been useful to medical researchers who study the practices and habits of expectant mothers and their newborn children. The data set includes several variables; today we will be using three of them. Smoke is a categorical variable that indicates whether the mother smoked during pregnancy. BirthWeightGm is the birth weight in grams. Plural can be 1, 2, or 3 and represents the number of babies born, with 1 representing a single baby, 2 twins, and 3 triplets. The variables describing birth weight and whether the mother smoked are necessary to answer our question because we want to find out whether there is a relationship between these two variables.
The question we want to investigate is: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?
Approach: Your approach here.
Analysis:
# Convert Smoke to a factor with levels "No" (for 0) and "Yes" (for 1)
NCbirths$Smoke <- factor(NCbirths$Smoke, levels = c(0, 1), labels = c("No", "Yes"))
# Checking to see the conversion worked
levels(NCbirths$Smoke)
## [1] "No" "Yes"
str(NCbirths)
## spc_tbl_ [1,409 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Plural : num [1:1409] 1 1 1 1 1 1 1 1 1 1 ...
## $ Sex : num [1:1409] 1 2 1 1 1 1 2 2 2 2 ...
## $ MomAge : num [1:1409] 32 32 27 27 25 28 25 15 21 27 ...
## $ Weeks : num [1:1409] 40 37 39 39 39 43 39 42 39 40 ...
## $ Gained : num [1:1409] 38 34 12 15 32 32 75 25 28 37 ...
## $ Smoke : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ BirthWeightGm: num [1:1409] 3147 3289 3912 3856 3430 ...
## $ Low : num [1:1409] 0 0 0 0 0 0 0 0 0 0 ...
## $ Premie : num [1:1409] 0 0 0 0 0 0 0 0 0 0 ...
## $ Marital : num [1:1409] 0 0 0 0 0 0 0 1 0 1 ...
## - attr(*, "spec")=
## .. cols(
## .. Plural = col_double(),
## .. Sex = col_double(),
## .. MomAge = col_double(),
## .. Weeks = col_double(),
## .. Gained = col_double(),
## .. Smoke = col_double(),
## .. BirthWeightGm = col_double(),
## .. Low = col_double(),
## .. Premie = col_double(),
## .. Marital = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
# Birth Weight by Smoking Status Boxplots
ggplot(NCbirths, aes(x = Smoke, y = BirthWeightGm, fill = Smoke)) +
  geom_boxplot() +
  labs(title = "Birth Weight Distribution by Maternal Smoking Status",
       x = "Smoking Status",
       y = "Birth Weight (grams)",
       fill = "Smoking Status") +
  scale_x_discrete(labels = c("No", "Yes")) + # X axis Labels
  theme_minimal()
# Bar Plot of the number of mothers who are smokers vs non-smokers
ggplot(NCbirths, aes(x = Smoke, fill = Smoke)) +
  geom_bar() +
  labs(title = "Number of Mothers who are Smokers and Non-Smokers",
       x = "Smoking Status",
       y = "Number of Mothers",
       fill = "Smoking Status") +
  scale_x_discrete(labels = c("No", "Yes")) +
  theme_minimal()
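The hints in Problem 2 also ask for a bar plot that is stacked by Plural in reverse order and faceted by Smoke. The sketch below shows one possible reading of those hints. The data frame `toy` is made up and only stands in for NCbirths (with Smoke still coded 0/1), and `three_colors` holds placeholder hex codes where the remaining three Problem 1 colors would go.

```r
library(ggplot2)

# Made-up toy data standing in for NCbirths (values are illustrative only)
toy <- data.frame(
  Smoke  = c(0, 0, 0, 0, 1, 1, 1),
  Plural = c(1, 1, 2, 3, 1, 2, 1)
)

# Placeholder colors; replace with the other three colors from Problem 1
three_colors <- c("#7570B3", "#E7298A", "#66A61E")

p_bar <- ggplot(toy, aes(x = factor(Smoke), fill = factor(Plural))) +
  # Reverse stacking puts singletons at the bottom, then twins, then triplets
  geom_bar(position = position_stack(reverse = TRUE)) +
  # One panel per smoking status, with readable panel titles
  facet_wrap(
    vars(Smoke),
    scales = "free_x",
    labeller = as_labeller(c("0" = "non-smoker", "1" = "smoker"))
  ) +
  # The panel titles already name the groups, so the tick labels are dropped
  scale_x_discrete(name = "Mother", labels = NULL) +
  scale_fill_manual(
    name = "Plurality",
    values = three_colors,
    labels = c("singleton", "twins", "triplets")
  ) +
  ylab("Number of mothers")

p_bar
```

If Smoke has already been converted to a factor with the labels "No" and "Yes", the names passed to `as_labeller()` would need to be "No" and "Yes" instead of "0" and "1".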
Discussion:
Our analysis shows that babies born to mothers who smoked during pregnancy tend to have a lower birth weight than babies born to mothers who did not smoke. This is in line with current scientific conclusions about smoking during pregnancy and its effects on a child’s birth weight.
The bar plot also shows how common smoking is among pregnant mothers. The majority of mothers in the data set did not smoke during pregnancy, but a small number did: fewer than 250 of the 1409 observations. This result makes sense, as most mothers avoid smoking during pregnancy to prevent harm to their child. These are only visual trends from the graphs; we did not use any statistical methods to analyze the data.
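The visual trends described above could be backed by a simple grouped summary. The sketch below is only an illustration under stated assumptions: `toy` is made-up data standing in for NCbirths, so the numbers it produces say nothing about the real data set, and the summary columns `n`, `median_weight`, and `mean_weight` are names chosen here.

```r
library(dplyr)

# Made-up toy data standing in for NCbirths (values are illustrative only)
toy <- tibble(
  Smoke = c("No", "No", "No", "Yes", "Yes", "Yes"),
  BirthWeightGm = c(3100, 3400, 3600, 2900, 3000, 3300)
)

# Count of mothers and typical birth weight within each smoking group
smoke_summary <- toy %>%
  group_by(Smoke) %>%
  summarise(
    n = n(),
    median_weight = median(BirthWeightGm),
    mean_weight = mean(BirthWeightGm)
  )

smoke_summary
```

Running the same pipeline on NCbirths would give the group sizes shown in the bar plot and the medians drawn as the center lines of the boxplots.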