Practice with ggplot2

Exploratory analysis

Load the nc data set into your workspace:

download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")


The nc dataset has 1000 observations on 13 different variables, some discrete and some continuous. The meaning of each variable is as follows.

variable	description
`fage`	father’s age in years.
`mage`	mother’s age in years.
`mature`	maturity status of mother.
`weeks`	length of pregnancy in weeks.
`premie`	whether the birth was classified as premature (premie) or full-term.
`visits`	number of hospital visits during pregnancy.
`marital`	whether mother is `married` or `not married` at birth.
`gained`	weight gained by mother during pregnancy in pounds.
`weight`	weight of the baby at birth in pounds.
`lowbirthweight`	whether baby was classified as low birthweight (`low`) or not (`not low`).
`gender`	gender of the baby, `female` or `male`.
`habit`	status of the mother as a `nonsmoker` or a `smoker`.
`whitemom`	whether mom is `white` or `not white`.
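
A quick way to check the size and variable types listed above is to inspect the data frame right after loading it. This is a minimal sketch. The `file.exists()` check, which avoids downloading the file twice, and `mode = "wb"`, which forces a binary transfer (this matters on Windows), are additions that were not in the original instructions.

```r
# Download the data only if it is not already in the working directory
if (!file.exists("nc.RData")) {
  download.file("http://www.openintro.org/stat/data/nc.RData",
                destfile = "nc.RData", mode = "wb")
}
load("nc.RData")  # creates the data frame `nc`

dim(nc)  # rows (observations) and columns (variables); should match the description above
str(nc)  # type of each variable: integer/numeric columns vs. factors
```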

1. Use ggplot2 to create a histogram of mother’s age. Enter two things below: the command you used to create the histogram, and a sentence or two describing what you see. (Remember, you need to install and load ggplot2 before you can use it; see the slides for more detail.)

library(ggplot2)
ggplot(nc, aes(x = mage)) + geom_histogram()
The histogram shows that the largest number of births is to mothers around age 20. Many mothers are listed as exactly 20, but this may partly reflect rounding, since people sometimes prefer to report round numbers.
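
Because mothers' ages are recorded in whole years, the default of 30 bins in `geom_histogram()` (which also prints a message suggesting a better `binwidth`) can give some bars two ages and others only one, creating spikes that come from the binning rather than from the data. Below is a sketch of a one-bar-per-year histogram to check whether the peak around 20 holds up. The settings `binwidth = 1` and `boundary = 0.5` are choices made here for illustration.

```r
library(ggplot2)

if (!file.exists("nc.RData")) {
  download.file("http://www.openintro.org/stat/data/nc.RData",
                destfile = "nc.RData", mode = "wb")
}
load("nc.RData")

# One bar per year of age; boundary = 0.5 puts bin edges at x.5,
# so each bar is centred on a whole-number age
p <- ggplot(nc, aes(x = mage)) +
  geom_histogram(binwidth = 1, boundary = 0.5) +
  labs(x = "Mother's age (years)", y = "Number of births")
p
```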

download.file(“http://www.openintro.org/stat/data/nc.Rdata”, destfile = nc.Rdata”)

2. Use ggplot2 to make a graph (scatterplot) of mother’s age versus father’s age, where every point represents one couple. Again, enter two things below: the command you used to create the graph, and a sentence or two describing what you see.

ggplot(nc, aes(x = mage, y = fage)) + geom_point()
The scatterplot shows an upward trend: older mothers tend to have older partners. Some couples are the same age, while others have larger age differences. Mother's age is on the x-axis and father's age is on the y-axis.
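
The upward trend can be summarized with a correlation coefficient. Some records are missing the father's age, so `use = "complete.obs"` restricts the calculation to couples with both ages present. Without that argument, `cor()` returns `NA`. The variable name `r` is just for illustration.

```r
if (!file.exists("nc.RData")) {
  download.file("http://www.openintro.org/stat/data/nc.RData",
                destfile = "nc.RData", mode = "wb")
}
load("nc.RData")

# Pearson correlation between mother's and father's ages,
# using only couples where both ages are recorded
r <- cor(nc$mage, nc$fage, use = "complete.obs")
r
```

A value close to 1 would mean the points lie near a straight rising line. A value near 0 would mean no linear trend.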

3. One problem with the graph from question 2 is that many points lie on top of one another. For example, all observations where mother and father were both 20 years old lie on top of one another. Do the graph from question 2 again, but replace `geom_point()` with `geom_jitter()`. Enter the new command you used to create the graph, and a sentence or two describing what you see.

ggplot(data = nc, aes(x = mage, y = fage)) +
  geom_jitter()
This also shows a positive correlation. The difference is that jittering spreads out points that were stacked on top of each other, so many more individual points are visible across the graph, along with a few outliers.

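
`geom_jitter()` moves each point by a small random amount, so the exact picture changes every time the plot is drawn. The sketch below shows one way to make the plot reproducible and easier to read. It uses `position_jitter()` with a fixed `seed` so the jitter is the same on every run, and semi-transparent points (`alpha`) so the remaining overlaps show up as darker areas. The values `width = 0.3`, `height = 0.3`, `seed = 2024`, and `alpha = 0.4` are arbitrary choices, not part of the original assignment.

```r
library(ggplot2)

if (!file.exists("nc.RData")) {
  download.file("http://www.openintro.org/stat/data/nc.RData",
                destfile = "nc.RData", mode = "wb")
}
load("nc.RData")

# Jitter each point by at most 0.3 years in each direction;
# the fixed seed makes the random offsets identical on every run
p <- ggplot(data = nc, aes(x = mage, y = fage)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 2024),
             alpha = 0.4) +
  labs(x = "Mother's age (years)", y = "Father's age (years)")
p
```

Keeping `width` and `height` below 0.5 means no point moves far enough to be mistaken for a neighbouring age.
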
4. Now repeat the graph from question 3, but make separate graphs for mothers who were smokers and mothers who were nonsmokers. Enter the command you used to create the graph, and a sentence or two describing what you see.

ggplot(data = nc, aes(x = mage, y = fage)) + geom_jitter() + facet_wrap(~ habit)
The nonsmoker panel has many more points than the smoker panel, which shows that far more mothers in the sample were nonsmokers than smokers. There is also a third panel labeled NA, for births where the mother's smoking status was not recorded. This comes from ggplot2 rather than RStudio: `facet_wrap()` puts observations with a missing facet value into their own panel instead of dropping them or filling them in.
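
The claim that nonsmokers far outnumber smokers can be checked by counting the categories directly. Setting `useNA = "ifany"` makes the missing category show up in the count too. If the NA panel is not wanted, one option is to drop rows with a missing `habit` before plotting. The names `habit_counts` and `nc_known` are illustrative.

```r
library(ggplot2)

if (!file.exists("nc.RData")) {
  download.file("http://www.openintro.org/stat/data/nc.RData",
                destfile = "nc.RData", mode = "wb")
}
load("nc.RData")

# Number of mothers in each smoking category, including missing values
habit_counts <- table(nc$habit, useNA = "ifany")
habit_counts

# Keep only rows where smoking status is known, so no NA panel is drawn
nc_known <- subset(nc, !is.na(habit))
p <- ggplot(data = nc_known, aes(x = mage, y = fage)) +
  geom_jitter() +
  facet_wrap(~ habit)
p
```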

Alexis Redfairn-Ogunyemi

North Carolina births