infant_data_base <- read.csv("infant.csv")
library(FSA)
## Warning: package 'FSA' was built under R version 4.5.2
## ## FSA v0.10.1. See citation('FSA') if used in publication.
## ## Run fishR() for related website and fishR('IFAR') for related book.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2

A - Is this an observational study or an experiment? Explain.

It is an observational study. The researchers did not assign the women to smoke or not smoke during pregnancy; they only observed who had already chosen to smoke and who had not.

B - Use R to compute descriptive statistics for the birthweights of babies born to mothers who smoked during pregnancy and for babies born to mothers who did not smoke during pregnancy. Use the Summarize() function from the FSA package (see page 4 of the Introduction to R document). Comment on the differences between the two groups, paying particular attention to the mean and standard deviation of birthweights.

Summarize(birthweight ~ mother, data = infant_data_base)
##      mother   n     mean       sd min  Q1 median  Q3 max
## 1 nonsmoker 742 123.0472 17.39869  55 113    123 134 176
## 2    smoker 484 114.1095 18.09895  58 102    115 126 163

The mean birthweight of babies born to nonsmokers (123.0 ounces) is about 8.9 ounces greater than that of babies born to smokers (114.1 ounces). The standard deviations are similar, though slightly lower for nonsmokers (17.4 ounces) than for smokers (18.1 ounces). The similar standard deviations indicate that birthweight varies by about the same amount in both groups.
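As a quick check on this comparison, the gap between the means and the ratio of the standard deviations can be computed directly from the Summarize() output. This is a minimal sketch: the variable names are illustrative, and the values are typed in from the table above rather than read from infant.csv.

```r
# Values copied from the Summarize() output above (ounces)
mean_nonsmoker <- 123.0472
mean_smoker    <- 114.1095
sd_nonsmoker   <- 17.39869
sd_smoker      <- 18.09895

# Difference in mean birthweight (nonsmoker minus smoker)
mean_diff <- mean_nonsmoker - mean_smoker

# Ratio of standard deviations; a value near 1 indicates similar spread
sd_ratio <- sd_smoker / sd_nonsmoker

round(c(mean_diff = mean_diff, sd_ratio = sd_ratio), 2)
```

A ratio close to 1 supports describing the two spreads as similar.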

C - Create a side-by-side boxplot of the distribution of birthweight by maternal smoking status (see pages 7–8 of the Introduction to R document). Note that the boxes for the two groups have roughly the same width (IQR). What is the primary difference between the two distributions?

boxplot(birthweight ~ mother,
        data = infant_data_base,
        xlab = "Maternal Smoking Status",
        ylab = "Birthweight (ounces)",
        main = "Birthweight by Smoking Status")

The primary difference is where the distributions are centered. The IQRs are similar, but the median for smokers (115 ounces) is lower than the median for nonsmokers (123 ounces), so the smokers' box is shifted down.
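The box widths and the shift in center can be checked numerically from the quartiles reported by Summarize() in part B. The sketch below assumes those printed quartiles; the vector names are illustrative only.

```r
# Quartiles copied from the Summarize() output in part B (ounces)
q1  <- c(nonsmoker = 113, smoker = 102)
med <- c(nonsmoker = 123, smoker = 115)
q3  <- c(nonsmoker = 134, smoker = 126)

# IQR = Q3 - Q1, the height of each box in the boxplot
iqr <- q3 - q1

# How far the smokers' median sits below the nonsmokers' median
median_shift <- med["nonsmoker"] - med["smoker"]

iqr
median_shift
```

The IQRs differ by only a few ounces, while the medians differ by more, which matches a shift in center rather than a change in spread.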

D - Create overlayed histograms of the birthweight distributions by maternal smoking status (see pages 6–7 of the Introduction to R document). Use 20 bins. How would you describe the shape of each distribution? Compare the center and variability of the two distributions.

ggplot(infant_data_base, aes(x = birthweight, fill = mother)) + 
  geom_histogram(alpha = 0.5, position = "identity", bins = 20) +
  labs(
      title = "Birthweight by Maternal Smoking",
      x = "Birthweight (ounces)",
      y = "Count")

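Both distributions appear roughly symmetric, with no pronounced skew. The smoker distribution is centered lower than the nonsmoker distribution, and its peak is lower and flatter, partly because the smoker group is smaller (484 versus 742 babies). The variability of the two distributions is similar, consistent with the standard deviations in part B (18.1 versus 17.4 ounces).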

E - Create a new categorical variable called weight_type that classifies babies as under or normal based on birthweight. Define weight_type as under if birthweight is less than 88 ounces and as normal if birthweight is 88 ounces or greater. See page 14 of the Introduction to R document for instructions.

infant_data_base$weight_type <- "normal"
infant_data_base$weight_type[infant_data_base$birthweight < 88] <- "under"
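The same classification can also be written in one step with ifelse(). The sketch below uses a small made-up vector (example_weights is hypothetical, not taken from infant.csv) to show how the cutoff behaves at exactly 88 ounces.

```r
# Hypothetical birthweights in ounces, chosen to straddle the 88-ounce cutoff
example_weights <- c(55, 87, 88, 123, 176)

# "under" if strictly less than 88, otherwise "normal"
example_type <- ifelse(example_weights < 88, "under", "normal")

data.frame(birthweight = example_weights, weight_type = example_type)
```

Because the comparison is strict (<), a baby weighing exactly 88 ounces is classified as normal, as the question requires.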

F - Create a two-way table called weight_table that summarizes the number of normal and underweight babies by maternal smoking status. Smoking status should appear in the rows, and weight type should appear in the columns. See page 11 of the Introduction to R document. Be sure to print the table.

weight_table <- table(infant_data_base$mother,
                      infant_data_base$weight_type)

weight_table
##            
##             normal under
##   nonsmoker    720    22
##   smoker       448    36

G - Compute the conditional distribution of low-birth-weight incidence given maternal smoking status (see page 12 of the Introduction to R document). Write a few sentences summarizing the conditional distribution and describing what you observe about the effect of smoking status on the likelihood of an underweight baby.

prop.table(weight_table, 1)
##            
##                 normal      under
##   nonsmoker 0.97035040 0.02964960
##   smoker    0.92561983 0.07438017

The proportion of underweight babies is higher among mothers who smoked during pregnancy (7.44%) than among those who did not (2.96%). In this data, smoking during pregnancy is associated with a higher likelihood of having an underweight baby.
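To see where the row proportions come from, the table can be rebuilt from the counts in part F. This is a sketch under the assumption that those printed counts are correct; the object names are illustrative, and the ratio of the two proportions is an extra summary that the assignment does not ask for.

```r
# Counts copied from weight_table in part F
counts <- matrix(c(720, 22,
                   448, 36),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(mother = c("nonsmoker", "smoker"),
                                 weight_type = c("normal", "under")))

# Row proportions: each row is divided by its own total, so rows sum to 1
row_props <- prop.table(counts, margin = 1)

# Ratio of the underweight proportions (smoker relative to nonsmoker)
risk_ratio <- row_props["smoker", "under"] / row_props["nonsmoker", "under"]

round(row_props, 4)
round(risk_ratio, 2)
```

The ratio comes out to roughly 2.5, meaning the underweight proportion among smokers is about two and a half times the proportion among nonsmokers.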

H - Based on your data, would you say that a woman who smokes during pregnancy is sure to have a low birthweight baby? Explain.

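No, it is not a sure thing. Only about 7.4% of the babies born to smokers in this data were underweight, so most women who smoked still had a normal-weight baby. Smokers were simply more likely than nonsmokers to have an underweight baby.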

I - (True or False) According to this data, women who smoked during pregnancy tended to have a lower birth weight baby at a higher rate than nonsmokers. Explain your answer.

True. The table from part G shows that the proportion of underweight babies is higher for smokers (about 7.4%) than for nonsmokers (about 3.0%).

J - Given your answer to part (a), what type of conclusions can be drawn from the data? Discuss possible confounding.

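Because this is an observational study, we cannot conclude that smoking causes the lower birthweights; the data show association, not causation. Confounding is possible: smokers and nonsmokers may differ in other ways that also affect birthweight (for example, maternal age, nutrition, or access to prenatal care), and this data cannot separate those effects from the effect of smoking.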