library(tidyverse)   # read_csv() and ggplot2
library(colorspace)  # swatchplot()

# Custom color palette used in the plots below
colors <- c("#5c004e", "#494982", "#5c5a66", "#824949", "#8a967a")

swatchplot(colors)

NCbirths <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
## Rows: 1409 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): Plural, Sex, MomAge, Weeks, Gained, Smoke, BirthWeightGm, Low, Pre...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NCbirths)
## # A tibble: 6 × 10
##   Plural   Sex MomAge Weeks Gained Smoke BirthWeightGm   Low Premie Marital
##    <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>         <dbl> <dbl>  <dbl>   <dbl>
## 1      1     1     32    40     38     0         3147.     0      0       0
## 2      1     2     32    37     34     0         3289.     0      0       0
## 3      1     1     27    39     12     0         3912.     0      0       0
## 4      1     1     27    39     15     0         3856.     0      0       0
## 5      1     1     25    39     32     0         3430.     0      0       0
## 6      1     1     28    43     32     0         3317.     0      0       0
??NCbirths

Question: Is there a relationship between a mother’s smoking status and her baby’s weight at birth?

In this project, I investigate whether there is a relationship between a mother’s smoking status and her baby’s health, using the baby’s birth weight as the measure of health. The dataset consists of 1,409 births from North Carolina in 2004 and includes 10 variables. I focus on “Smoke”, the mother’s smoking status (0 = non-smoker, 1 = smoker), and “BirthWeightGm”, the baby’s weight at birth in grams. I also use the variable “Plural”, which records the number of children the mother delivered in that birth.

To approach this project, I first read through the problem and then explored the dataset. I wanted to get a feel for what each variable looked like, how the code would need to handle it, and how it might appear on a graph. I adapted code from previous practice problems to fit this data, and I used other resources to understand and fix any errors or issues that showed up in my graphs.

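# Box plot of birth weight by smoking status (Smoke: 0 = non-smoker, 1 = smoker)
ggplot(NCbirths, aes(x = factor(Smoke), y = BirthWeightGm, fill = factor(Smoke))) +
    geom_boxplot() +
    scale_x_discrete(
        name = "Mother's Smoking Status",
        labels = c("Non-smoker", "Smoker")) +
    # Label the legend the same way as the x-axis instead of showing 0/1
    scale_fill_manual(
        values = c("#5c004e", "#5c5a66"),
        labels = c("Non-smoker", "Smoker")) +
    labs(y = "Baby's Birth Weight (g)", fill = "Smoking Status",
         title = "Baby's Birth Weight vs. Mother's Smoking Status")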

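# Stacked bar of births by number of children, one panel per smoking status
ggplot(NCbirths, aes(x = "", fill = factor(Plural))) +
    geom_bar(position = position_stack(reverse = TRUE)) +
    # Show readable panel titles instead of 0/1
    facet_wrap(~Smoke, labeller = as_labeller(c("0" = "Non-smoker", "1" = "Smoker"))) +
    scale_fill_manual(values = c("#494982", "#824949", "#8a967a")) +
    # x is a constant here, so the axis carries no label
    labs(x = NULL, y = "Count", fill = "Number of Children")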

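# Side-by-side bars of births by number of children, one panel per smoking status
ggplot(NCbirths, aes(x = factor(Plural), fill = factor(Plural))) +
    geom_bar() +
    # Show readable panel titles instead of 0/1
    facet_wrap(~Smoke, labeller = as_labeller(c("0" = "Non-smoker", "1" = "Smoker"))) +
    scale_fill_manual(values = c("#494982", "#824949", "#8a967a")) +
    labs(x = "Number of Children in the Birth", y = "Count", fill = "Number of Children")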
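
Because the two smoking groups contain different numbers of mothers, raw counts in the two panels are hard to compare directly. One possible complement, shown here only as a sketch, is to compute the share of each "Plural" value within each smoking group. The helper name `plural_share_by_smoke` and the toy data frame are my own illustrations and are not part of the NCbirths data; the function assumes a data frame with "Smoke" and "Plural" columns.

```r
# Hypothetical helper: share of each Plural value within each smoking group.
plural_share_by_smoke <- function(births) {
  counts <- table(Smoke = births$Smoke, Plural = births$Plural)
  # margin = 1 rescales each row (smoking group) so it sums to 1
  prop.table(counts, margin = 1)
}

# Toy values for illustration only, NOT taken from NCbirths
toy_births <- data.frame(
  Smoke  = c(0, 0, 0, 0, 1, 1),
  Plural = c(1, 1, 1, 2, 1, 2)
)

shares <- plural_share_by_smoke(toy_births)
print(round(shares, 2))
```

Calling `plural_share_by_smoke(NCbirths)` should give the same kind of table for the real data, with one row per smoking status.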

The results of my analysis were mostly clear. In the first graph, the center line of each box marks the median, and the median birth weight of babies whose mothers did not smoke appears higher than that of babies whose mothers smoked. However, birth weight alone may not be a reliable indicator of whether the babies are healthier. The second and third graphs display the number of children delivered in each birth, which ranged from 1 to 3. The large majority of mothers delivered a single child. Again, this does not directly indicate the health of the babies. The dataset also contains more mothers who did not smoke than mothers who did, so comparing raw counts between the two groups could be misleading.
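
To back up what the box plot suggests with numbers, a small summary table per smoking group can help. The following is a minimal sketch, not a full statistical test: it reports the group size, median, and mean of "BirthWeightGm" for each value of "Smoke". The helper name `summarise_weight_by_smoke` and the toy data frame are assumptions made for illustration; the toy values are made up and do not come from NCbirths.

```r
# Hypothetical helper: summarise birth weight within each smoking group.
summarise_weight_by_smoke <- function(births) {
  # Keep only rows where both smoking status and birth weight are known
  complete <- births[!is.na(births$Smoke) & !is.na(births$BirthWeightGm), ]
  groups <- split(complete$BirthWeightGm, complete$Smoke)
  data.frame(
    Smoke  = names(groups),
    n      = vapply(groups, length, integer(1)),
    median = vapply(groups, median, numeric(1)),
    mean   = vapply(groups, mean, numeric(1)),
    row.names = NULL
  )
}

# Toy values for illustration only, NOT taken from NCbirths
toy_weights <- data.frame(
  Smoke         = c(0, 0, 0, 1, 1, NA),
  BirthWeightGm = c(3200, 3400, 3600, 2900, 3100, 3300)
)

weight_summary <- summarise_weight_by_smoke(toy_weights)
print(weight_summary)
```

Running `summarise_weight_by_smoke(NCbirths)` should produce the same table for the real data. Showing the group sizes next to the medians also makes the imbalance between smokers and non-smokers explicit.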