Problem 1: Use the color picker app from the colorspace package (colorspace::choose_color()) to create a qualitative color scale containing five colors. You will use these colors in the next questions.
library(colorspace)  # provides choose_color() and swatchplot()
# five colors picked with colorspace::choose_color()
colors <- c("#1B998B", "#2D3047", "#FFFD82", "#FF9B71", "#E84855")
swatchplot(colors)
Problem 2: We will be using this dataset:
library(tidyverse)  # read_csv() and ggplot2; assumed to be loaded in a setup chunk
NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
##   Plural   Sex MomAge Weeks Gained Smoke BirthWeightGm   Low Premie Marital
##    <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>         <dbl> <dbl>  <dbl>   <dbl>
## 1      1     1     32    40     38     0         3147.     0      0       0
## 2      1     2     32    37     34     0         3289.     0      0       0
## 3      1     1     27    39     12     0         3912.     0      0       0
## 4      1     1     27    39     15     0         3856.     0      0       0
## 5      1     1     25    39     32     0         3430.     0      0       0
## 6      1     1     28    43     32     0         3317.     0      0       0
Question: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?
To answer this question, you will plot the distribution of birth weight by smoking status, and you will also plot the number of mothers who are smokers and non-smokers.
Use a boxplot for the first part of the question and a faceted bar plot for the second part.
Hints:
Use labels = to provide explicit labels so ggplot2 doesn’t write 0 and 1, like:
scale_x_discrete(
  name = "Mother",
  labels = c("non-smoker", "smoker")
)
Use scale_fill_manual to fill the colors. Your colors need to be two colors from the 5 colors you picked in problem 1.
For the second part with the bar plot, use position = position_stack(reverse = TRUE). This will stack in reverse order so singletons come first, then twins, then triplets. Your colors should be the other three colors from the previously picked colors in problem 1. Finally, facet by Smoke.
geom_bar(
  position = position_stack(reverse = TRUE)
)
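The hints ask for two of the five colors in the boxplot and the other three in the bar plot. One way to keep the two sets disjoint is sketched below in base R; which two colors go to the boxplot is an arbitrary, hypothetical choice, and the names box_colors and bar_colors are illustrative only.

```r
# The five colors picked in Problem 1.
colors <- c("#1B998B", "#2D3047", "#FFFD82", "#FF9B71", "#E84855")

# Hypothetical choice: first and last colors for the two boxplot fills.
box_colors <- colors[c(1, 5)]

# The remaining three colors go to the bar plot (singleton, twin, triplet).
# setdiff() keeps the order of its first argument.
bar_colors <- setdiff(colors, box_colors)

# Sanity checks: valid six-digit hex codes, and no color used in both plots.
stopifnot(all(grepl("^#[0-9A-Fa-f]{6}$", colors)))
stopifnot(length(intersect(box_colors, bar_colors)) == 0)
```

The two vectors could then be passed as values = box_colors and values = bar_colors to scale_fill_manual() in the respective plots.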
Introduction: NC births is a dataset published in 2004 by the state of North Carolina. It contains medical information about births that occurred within the state. There are 1409 observations of 10 different variables.
Approach: I chose to use the variables “Smoke”, “BirthWeightGm”, “Plural”, and “Premie”. “Smoke” is a binary variable with a 1 indicating smoking behavior and a 0 indicating no smoking behavior. “BirthWeightGm” is a continuous numerical variable measuring the weight of infants at birth in grams. “Plural” is a discrete numerical variable measuring the number of infants born at once to one mother. Finally, “Premie” is a binary variable with a 1 indicating a premature birth and a 0 indicating an on-time birth.
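Because the binary variables are stored as 0/1 numbers, they have to be treated as categories before plotting. A minimal sketch of one way to attach readable labels at conversion time is shown here; the helper label_binary is hypothetical, and the input values are illustrative rather than rows from NCbirths.

```r
# Hypothetical helper: turn a 0/1 indicator into a labeled factor.
# `labels` gives the label for 0 first, then the label for 1.
label_binary <- function(x, labels) {
  factor(x, levels = c(0, 1), labels = labels)
}

smoke_codes <- c(0, 1, 1, 0)  # illustrative values, not rows from NCbirths
smoke_labeled <- label_binary(smoke_codes, c("Non-Smoker", "Smoker"))
print(smoke_labeled)
```

Labeling the factor up front is an alternative to passing labels = to each scale; either approach keeps 0 and 1 out of the axis and legend text.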
Analysis:
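# treat the 0/1 smoking indicator as a categorical variable
NCbirths$Smoke <- as.factor(NCbirths$Smoke)
ggplot(NCbirths, aes(x = BirthWeightGm, y = Smoke)) +
  geom_boxplot(aes(fill = Smoke)) +
  scale_y_discrete(
    name = "Mother",
    labels = c("Non-Smoker", "Smoker")) +
  labs(x = "Weight at Birth (g)", title = "Smoking Effect on Birth Weight") +
  # fill is the mapped aesthetic, so the manual scale must be a fill scale
  scale_fill_manual(values = c("#E84855", "#1B998B"))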
This is a boxplot comparing birth weight between mothers who smoke and mothers who do not. The y-axis shows the two possible values of the Smoke variable, 0 and 1, here labeled Non-Smoker and Smoker respectively. The x-axis shows each observation’s birth weight in grams on a continuous scale.
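A numeric summary can sit alongside the boxplot. The following is a minimal sketch in base R, assuming a data frame with the columns Smoke and BirthWeightGm described in the Approach; the helper name weight_by_smoke is hypothetical and not part of the original analysis.

```r
# Count, mean, and median of birth weight within each smoking group.
# Rows with a missing Smoke or BirthWeightGm value are dropped first.
weight_by_smoke <- function(births) {
  complete <- births[!is.na(births$Smoke) & !is.na(births$BirthWeightGm), ]
  groups <- split(complete$BirthWeightGm, complete$Smoke)
  data.frame(
    Smoke  = names(groups),
    n      = vapply(groups, length, integer(1)),
    mean   = vapply(groups, mean, numeric(1)),
    median = vapply(groups, median, numeric(1)),
    row.names = NULL
  )
}

# Only runs when the dataset has already been loaded in the session.
if (exists("NCbirths")) {
  print(weight_by_smoke(NCbirths))
}
```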
# treat the premature indicator and the birth count as categorical variables
NCbirths$Premie <- as.factor(NCbirths$Premie)
NCbirths$Plural <- as.character(NCbirths$Plural)
ggplot(NCbirths, aes(x = Premie, fill = Plural)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  # fill is the mapped aesthetic, so the manual scale must be a fill scale
  scale_fill_manual(values = c("#E84855", "#1B998B", "#FFFD82")) +
  scale_x_discrete(
    labels = c("On-Time Birth", "Premature Birth"),
    name = "Premature Vs On-time Births"
  ) +
  labs(y = "Frequency of Occurrence",
       title = "Frequency of Plural births") +
  # refer to the column by name instead of NCbirths$Smoke
  facet_wrap(~Smoke)
This is a faceted bar plot comparing the frequency of each birth count (red represents one child, teal green represents twins, and yellow represents triplets) against whether the birth was premature or not. It is then faceted by whether the mother smoked or not.
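NCbirths$Plural <- as.character(NCbirths$Plural)
ggplot(NCbirths, aes(x = Premie, fill = Plural)) +
  # place the bars for each birth count side by side
  geom_bar(position = "dodge") +
  # fill is the mapped aesthetic, so the manual scale must be a fill scale
  scale_fill_manual(values = c("#E84855", "#1B998B", "#FFFD82")) +
  scale_x_discrete(
    labels = c("On-Time Birth", "Premature Birth"),
    name = "Premature Vs On-time Births"
  ) +
  labs(y = "Frequency of Occurrence",
       title = "Frequency of Plural births") +
  # refer to the column by name instead of NCbirths$Smoke
  facet_wrap(~Smoke)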
This is another barplot with the same data and labels as before. However, this plot places the sections for different birth counts side by side.
Discussion: The graphs above suggest that babies born to mothers who smoke weigh less on average than babies born to non-smokers. They also suggest that mothers who smoke have higher rates of premature births and lower rates of plural births. Additionally, plural births are more likely to be premature than single births.
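Because the bar plots show counts and the two smoking groups differ in size, rates are easier to compare as within-group proportions. A minimal sketch in base R follows, assuming the 0/1 columns Smoke and Premie described in the Approach; the helper name premie_rate_by_smoke is hypothetical.

```r
# Share of on-time (0) and premature (1) births within each smoking group.
premie_rate_by_smoke <- function(births) {
  counts <- table(Smoke = births$Smoke, Premie = births$Premie)
  prop.table(counts, margin = 1)  # each row (smoking group) sums to 1
}

# Only runs when the dataset has already been loaded in the session.
if (exists("NCbirths")) {
  print(round(premie_rate_by_smoke(NCbirths), 3))
}
```

The same pattern, with Plural in place of Premie, would give the share of singleton, twin, and triplet births in each group.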