This notes file is a place for you to practice writing and executing the code found in the lecture notes. This is also a spot where you can (and should) write your own notes and thoughts. Explain what you are doing in each code chunk in your own words.
Before class, run the following code chunk to make sure it works. If it does not, try to understand what the error message is telling you. Refer to the FAQ page for assistance.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
ncbirths <- openintro::ncbirths
head(ncbirths)
## # A tibble: 6 × 13
## fage mage mature weeks premie visits marital gained weight lowbirthweight
## <int> <int> <fct> <int> <fct> <int> <fct> <int> <dbl> <fct>
## 1 NA 13 younger … 39 full … 10 not ma… 38 7.63 not low
## 2 NA 14 younger … 42 full … 15 not ma… 20 7.88 not low
## 3 19 15 younger … 37 full … 11 not ma… 38 6.63 not low
## 4 21 15 younger … 41 full … 6 not ma… 34 8 not low
## 5 NA 15 younger … 39 full … 9 not ma… 27 6.38 not low
## 6 NA 15 younger … 38 full … 19 not ma… 22 5.38 low
## # ℹ 3 more variables: gender <fct>, habit <fct>, whitemom <fct>
mean(ncbirths$fage)
## [1] NA
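The mean above comes back as NA because fage contains missing values, and mean() returns NA whenever any input is missing. A minimal sketch of the usual fix, using a small made-up vector (fage_toy is a hypothetical name, not a column of ncbirths):

```r
# Toy vector standing in for a column with missing values (made-up numbers)
fage_toy <- c(19, 21, NA, 30, 35)

mean(fage_toy)                 # NA: one missing value makes the whole mean missing
mean(fage_toy, na.rm = TRUE)   # 26.25: drops the NA first, then averages the other four
```

The same na.rm = TRUE argument should work for the real column, e.g. mean(ncbirths$fage, na.rm = TRUE).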
ggplot(ncbirths, aes(premie)) + geom_bar()
table(ncbirths$habit, useNA="always")
##
## nonsmoker smoker <NA>
## 873 126 1
summary(ncbirths$fage)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 14.00 25.00 30.00 30.26 35.00 55.00 171
x <- c("green", NA, 3)
is.na(x)
## [1] FALSE TRUE FALSE
sum(is.na(ncbirths$fage))
## [1] 171
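Because is.na() returns TRUE/FALSE and R treats TRUE as 1, sum() counts the missing values and mean() gives the proportion missing. A small sketch on made-up data (visits_toy is a hypothetical vector, not part of ncbirths):

```r
# Toy vector with two missing values out of five (made-up numbers)
visits_toy <- c(10, NA, 15, NA, 12)

n_missing   <- sum(is.na(visits_toy))    # TRUE counts as 1, so this counts the NAs
pct_missing <- mean(is.na(visits_toy))   # share of values that are missing

n_missing     # 2
pct_missing   # 0.4
```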
Two common ways to check for missing values are table() (for categorical variables) and summary() (for numeric variables).
Create a frequency table for whether or not the baby was born underweight.
table(ncbirths$lowbirthweight, useNA = "ifany")
##
## low not low
## 111 889
Do it again, but always show the count of missing values, even when it is zero.
table(ncbirths$lowbirthweight, useNA = "always")
##
## low not low <NA>
## 111 889 0
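The difference between the useNA settings is easier to see on data that actually has a missing value. A minimal sketch with a made-up factor (habit_toy is hypothetical, not the real habit column):

```r
# Toy factor with one missing value (made-up data)
habit_toy <- factor(c("smoker", "nonsmoker", NA, "nonsmoker"))

table(habit_toy)                     # default useNA = "no": the NA is silently dropped
table(habit_toy, useNA = "ifany")    # adds an <NA> column only because an NA exists
table(habit_toy, useNA = "always")   # always adds an <NA> column, even if its count is 0
```

With "ifany" the output for lowbirthweight above has no NA column because that variable has no missing values; "always" shows the column with a count of 0.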
Summary statistics for the number of visits.
summary(ncbirths$visits)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 10.0 12.0 12.1 15.0 30.0 9
data[data$variable == value, ] # template: select the rows where variable equals value (note the comma before the closing bracket)
Example 1: Too low birth weight
Set all records where weight=1 to missing.
summary(ncbirths$weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 6.380 7.310 7.101 8.060 11.750
ncbirths$weight[ncbirths$weight==1] <- NA
Confirm it worked by creating a box plot of weight.
boxplot(ncbirths$weight)
ncbirths$weight[ncbirths$weight < 4] <- NA
boxplot(ncbirths$weight)
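A box plot is one way to confirm the recode; a numeric check is another. The sketch below repeats the same assign-NA pattern on a made-up data frame (births_toy is hypothetical) and then checks the result with is.na() and min():

```r
# Toy data frame (made-up values, not ncbirths)
births_toy <- data.frame(weight = c(1, 7.63, 3.2, 8, NA))

# Recode implausibly low weights (below 4) to missing
births_toy$weight[births_toy$weight < 4] <- NA

# Numeric checks that the recode worked
sum(is.na(births_toy$weight))          # 3: two recoded values plus the original NA
min(births_toy$weight, na.rm = TRUE)   # 7.63: no remaining value is below 4
```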
ncbirths$new_variable <- ncbirths$gained # template: how to add a new variable to the data set
Create a new variable wtgain_mom, the weight gained by
the mother that is not due to the baby, by subtracting
weight from gained.
ncbirths$wtgain_mom <- ncbirths$gained - ncbirths$weight
Confirm this variable was created correctly.
head(ncbirths[,c('gained', 'weight', 'wtgain_mom')])
## # A tibble: 6 × 3
## gained weight wtgain_mom
## <int> <dbl> <dbl>
## 1 38 7.63 30.4
## 2 20 7.88 12.1
## 3 38 6.63 31.4
## 4 34 8 26
## 5 27 6.38 20.6
## 6 22 5.38 16.6
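One thing to keep in mind when building a variable from two others: if either input is missing, the result is missing too. A small sketch with made-up numbers (moms_toy is a hypothetical data frame):

```r
# Toy data frame (made-up values); the third mother has no recorded weight gain
moms_toy <- data.frame(gained = c(38, 20, NA),
                       weight = c(7.63, 7.88, 6.63))

# Same subtraction as above, applied row by row
moms_toy$wtgain_mom <- moms_toy$gained - moms_toy$weight

moms_toy$wtgain_mom   # 30.37 12.12 NA
```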
Make a new variable underage in the
ncbirths data set. If mage is under 18, then
the value of this new variable is "underage"; otherwise it is
"adult".
ncbirths$underage <- ifelse(ncbirths$mage < 18, "underage", "adult")
Confirm it worked.
table(ncbirths$underage, useNA="always")
##
## adult underage <NA>
## 963 37 0
ncbirths[ncbirths$mage %in% c(17,18),c('mage', 'underage')]
## # A tibble: 57 × 2
## mage underage
## <int> <chr>
## 1 17 underage
## 2 17 underage
## 3 17 underage
## 4 17 underage
## 5 17 underage
## 6 17 underage
## 7 17 underage
## 8 17 underage
## 9 17 underage
## 10 17 underage
## # ℹ 47 more rows
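The check above looks at the boundary ages 17 and 18. Another thing worth knowing is what ifelse() does with missing ages: it returns NA rather than forcing the record into either group. A minimal sketch on a made-up vector (mage_toy is hypothetical):

```r
# Toy ages, including the boundary values 17 and 18 and one missing age
mage_toy <- c(15, 17, 18, 30, NA)

underage_toy <- ifelse(mage_toy < 18, "underage", "adult")

underage_toy   # "underage" "underage" "adult" "adult" NA

# Cross-tabulate the original against the new variable to check every value
table(mage_toy, underage_toy, useNA = "always")
```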
table(ncbirths$mature)
##
## mature mom younger mom
## 133 867
ncbirths$mature %>% table()
## .
## mature mom younger mom
## 133 867
ncbirths$mage %>% mean()
## [1] 27
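The pipe takes what is on its left and passes it as the first argument to the function on its right, so x %>% mean() is just another way to write mean(x). The sketch below shows the same idea with base R's native pipe |>, which is a substitution made here so the example runs without loading any package (it assumes R 4.1 or later); ages_toy is a made-up vector:

```r
# Toy vector (made-up numbers)
ages_toy <- c(13, 14, 15, 15, 40)

mean(ages_toy)       # ordinary function call: 19.4
ages_toy |> mean()   # same call written with the native pipe: 19.4

# Both forms give exactly the same result
identical(mean(ages_toy), ages_toy |> mean())   # TRUE
```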