Problem 1: Use the color picker app from the colorspace package (colorspace::choose_color()) to create a qualitative color scale containing five colors. You will use these colors in the following questions.

library(colorspace)  # provides choose_color() and swatchplot()

# five colors picked with colorspace::choose_color()
colors <- c("#1B998B", "#2D3047", "#FFFD82", "#FF9B71", "#E84855")

swatchplot(colors)

Problem 2: We will be using the following dataset:

library(tidyverse)  # provides read_csv() and ggplot2
NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
##   Plural   Sex MomAge Weeks Gained Smoke BirthWeightGm   Low Premie Marital
##    <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>         <dbl> <dbl>  <dbl>   <dbl>
## 1      1     1     32    40     38     0         3147.     0      0       0
## 2      1     2     32    37     34     0         3289.     0      0       0
## 3      1     1     27    39     12     0         3912.     0      0       0
## 4      1     1     27    39     15     0         3856.     0      0       0
## 5      1     1     25    39     32     0         3430.     0      0       0
## 6      1     1     28    43     32     0         3317.     0      0       0

Question: Is there a relationship between whether a mother smokes or not and her baby’s weight at birth?

To answer this question, you will plot the distribution of birth weight by smoking status, and you will also plot the number of mothers who are smokers and non-smokers.

Use a boxplot for the first part of the question and a faceted bar plot for the second part.

Hints:

scale_x_discrete(
    name = "Mother",
    labels = c("non-smoker", "smoker")
)
geom_bar(
    position = position_stack(reverse = TRUE)
)

Introduction: NCbirths is a dataset published in 2004 by the state of North Carolina. It contains medical information about births that occurred within the state, with 1409 observations of 10 variables.

Approach: I chose to use the variables “Smoke”, “BirthWeightGm”, “Plural”, and “Premie”. “Smoke” is a binary variable, with 1 indicating that the mother smoked and 0 indicating that she did not. “BirthWeightGm” is a continuous numerical variable giving the weight of the infant at birth in grams. “Plural” is a discrete numerical variable giving the number of infants delivered at once by one mother. Finally, “Premie” is a binary variable, with 1 indicating a premature birth and 0 indicating an on-time birth.
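The coding scheme described above can be made explicit by turning the numeric indicator columns into labeled factors before plotting. This is a minimal base-R sketch, not part of the original analysis: `label_births()` is a hypothetical helper name, and it assumes the coding stated in the Approach (Smoke: 1 = smoker, 0 = non-smoker; Premie: 1 = premature, 0 = on-time; Plural = number of infants).

```r
# Hypothetical helper (not part of the original analysis): recode the
# NCbirths indicator columns into labeled factors.
# Assumes Smoke and Premie are coded 0/1 as described in the Approach.
label_births <- function(df) {
  df$Smoke <- factor(df$Smoke, levels = c(0, 1),
                     labels = c("Non-Smoker", "Smoker"))
  df$Premie <- factor(df$Premie, levels = c(0, 1),
                      labels = c("On-Time Birth", "Premature Birth"))
  # Plural counts infants (1 = single birth, 2 = twins, 3 = triplets);
  # as a factor it gets a discrete fill scale in ggplot2
  df$Plural <- factor(df$Plural)
  df
}
```

Run once right after reading the data (for example, `NCbirths_labeled <- label_births(NCbirths)`), the factor levels carry the same labels that the plots otherwise set through `scale_x_discrete()` and `scale_y_discrete()`.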

Analysis:

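# treat smoking status (0/1) as a discrete variable
NCbirths$Smoke <- as.factor(NCbirths$Smoke)

ggplot(NCbirths, aes(x = BirthWeightGm, y = Smoke)) +
  geom_boxplot(aes(fill = Smoke)) +
  scale_y_discrete(
    name = "Mother",
    labels = c("Non-Smoker", "Smoker")
  ) +
  labs(x = "Weight at Birth (g)", title = "Smoking Effect on Birth Weight") +
  # fill (not color) is mapped, so a fill scale is needed for these colors
  # to take effect; the legend is dropped because the y axis labels the groups
  scale_fill_manual(values = c("#E84855", "#1B998B"), guide = "none")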

This is a boxplot comparing the birth weights of babies born to mothers who smoke and to mothers who do not. The y-axis shows the two levels of the Smoke variable, 0 and 1, labeled Non-Smoker and Smoker, respectively. The x-axis shows each baby’s birth weight in grams on a continuous scale.
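The boxplot displays medians and quartiles graphically; the same comparison can be reported numerically next to the figure. This is a base-R sketch, and `weight_by_smoking()` is a hypothetical helper name rather than anything from the assignment.

```r
# Hypothetical helper: sample size, median, interquartile range, and mean
# birth weight (grams) for each smoking group.
# Works whether Smoke is stored as numbers or as a factor.
weight_by_smoking <- function(df) {
  groups <- split(df$BirthWeightGm, df$Smoke)
  stats <- sapply(groups, function(w) {
    c(n = sum(!is.na(w)),
      median = median(w, na.rm = TRUE),
      IQR = IQR(w, na.rm = TRUE),
      mean = mean(w, na.rm = TRUE))
  })
  # sapply() returns one column per group; transpose so each row is a group
  t(stats)
}
```

Calling `weight_by_smoking(NCbirths)` gives one row per smoking status (0 = non-smoker, 1 = smoker); the median column corresponds to the center line of each box.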

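# treat prematurity (0/1) and the number of infants (1, 2, 3) as discrete
NCbirths$Premie <- as.factor(NCbirths$Premie)
NCbirths$Plural <- as.character(NCbirths$Plural)

ggplot(NCbirths, aes(x = Premie, fill = Plural)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  # fill is mapped, so scale_fill_manual() (not scale_color_manual()) applies the palette
  scale_fill_manual(values = c("#E84855", "#1B998B", "#FFFD82")) +
  scale_x_discrete(
    labels = c("On-Time Birth", "Premature Birth"),
    name = "Premature vs. On-Time Births"
  ) +
  labs(y = "Frequency of Occurrence",
       title = "Frequency of Plural Births") +
  # facet by the Smoke column itself rather than NCbirths$Smoke
  facet_wrap(~Smoke, labeller = as_labeller(c("0" = "Non-Smoker", "1" = "Smoker")))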

This is a faceted bar plot showing how often each birth count occurs (fill color: 1 = single birth, 2 = twins, 3 = triplets) among on-time and premature births. The plot is faceted by whether the mother smoked.
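To read exact values off the faceted bar plot, the counts it draws can be tabulated directly. This sketch uses base R’s `xtabs()`; `birth_counts()` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: the counts behind the faceted bar plot, as a
# three-way table of Premie x Plural, split by Smoke (the facets).
birth_counts <- function(df) {
  xtabs(~ Premie + Plural + Smoke, data = df)
}
```

For example, `ftable(birth_counts(NCbirths))` prints the three-way table in a flat layout, with one column per smoking status.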

# same plot, but with the bars for each birth count placed side by side
NCbirths$Plural <- as.character(NCbirths$Plural)
ggplot(NCbirths, aes(x = Premie, fill = Plural)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = c("#E84855", "#1B998B", "#FFFD82")) +
  scale_x_discrete(
    labels = c("On-Time Birth", "Premature Birth"),
    name = "Premature vs. On-Time Births"
  ) +
  labs(y = "Frequency of Occurrence",
       title = "Frequency of Plural Births") +
  facet_wrap(~Smoke, labeller = as_labeller(c("0" = "Non-Smoker", "1" = "Smoker")))

This bar plot uses the same data and labels as the previous one, but it places the bars for the different birth counts side by side instead of stacking them, so every bar starts from a common baseline.

Discussion: The boxplot shows that the median birth weight of babies born to mothers who smoke is lower than that of babies born to non-smokers. The bar plots also suggest that mothers who smoke have a higher rate of premature births and a lower rate of plural births, and that plural births are more likely to be premature than single births. Because the bar plots show counts rather than proportions, and the smoker and non-smoker groups may differ in size, these rate comparisons are best confirmed by computing the proportions within each group.
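The rate comparisons in the Discussion can be checked by computing, within each smoking group, the proportion of births that were premature and the proportion that were plural. This is a base-R sketch that is not part of the original analysis; `rates_by_smoking()` is a hypothetical helper, and it assumes the 0/1 coding of Premie and the infant counts in Plural described in the Approach.

```r
# Hypothetical helper: within each smoking group, the share of births that
# were premature (Premie == 1) and the share that were plural (Plural > 1).
# as.character() first, so this also works after as.factor() has been applied.
rates_by_smoking <- function(df) {
  premie <- as.numeric(as.character(df$Premie)) == 1
  plural <- as.numeric(as.character(df$Plural)) > 1
  smoke <- factor(df$Smoke)
  data.frame(
    Smoke = levels(smoke),
    n = as.vector(table(smoke)),
    premature_rate = as.vector(tapply(premie, smoke, mean, na.rm = TRUE)),
    plural_rate = as.vector(tapply(plural, smoke, mean, na.rm = TRUE))
  )
}
```

`rates_by_smoking(NCbirths)` returns one row per smoking status (0 = non-smoker, 1 = smoker). Dividing by the size of each group is what makes the two groups comparable even when they contain different numbers of mothers.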