Exercises: Week 4

Data and Computing Fundamentals

Data Cleaning

Implement the steps for cleaning the OrdwayBirdsOrig data, including

MistakeSubset = subset(namesCleaned, SpeciesNameCleaned != SpeciesName)
## Error: object 'namesCleaned' not found
MistakeSubset
## Error: object 'MistakeSubset' not found
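
A minimal sketch of the comparison the subset call above is aiming for, run on a made-up three-row table because namesCleaned is not available here. The table namesToy and its species values are assumptions for illustration only; the column names SpeciesName and SpeciesNameCleaned are the ones used in the exercise.

```r
# Toy stand-in for namesCleaned (made-up rows, not the Ordway data)
namesToy <- data.frame(
  SpeciesName        = c("Robin", "American Robin", "Blue Jay"),
  SpeciesNameCleaned = c("American Robin", "American Robin", "Blue Jay"),
  stringsAsFactors   = FALSE
)

# Keep only the rows where cleaning changed the recorded name.
# The comparison operator is != ; writing `= !` is read as a named
# argument rather than as a row filter.
mistakesToy <- subset(namesToy, SpeciesNameCleaned != SpeciesName)
mistakesToy
```

On the real data the same call would list every original name that the cleaning step altered.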

Using the Cleaned Data

I have no idea how to approach this problem.

Using Grouped Data for Individual Cases

Using the Cleaned Ordway Bird data:

Country Data

The countrySynonyms data file (you can load it with data(countrySynonyms)) gives synonyms for each country listed in the World-Map software, together with an official ISO 3-letter country code. This data set is in “wide” format. To turn it into narrow format, you can do this:

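library(reshape2)  # provides melt()
data(countrySynonyms)
# Stack every synonym column into a single Country column, keeping ID and ISO3
foo <- melt(countrySynonyms, id.vars = c("ID", "ISO3"), value.name = "Country", 
    measure.vars = names(countrySynonyms)[-(1:2)], variable.name = "whence")
# Drop the empty synonym slots, then the column recording the source column
countrySynonymsLong <- subset(foo, !is.na(Country))
countrySynonymsLong$whence <- NULL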
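
To see what the wide-to-narrow step does without loading any package, here is a rough base-R sketch on a made-up table. It swaps melt() for a plain lapply/rbind loop; the table wide and its column names name1 to name3 are hypothetical stand-ins, not the real countrySynonyms layout.

```r
# Toy wide table (made-up values, not the real countrySynonyms data)
wide <- data.frame(
  ID    = 1:2,
  ISO3  = c("usa", "gbr"),
  name1 = c("United States", "United Kingdom"),
  name2 = c("USA", "Britain"),
  name3 = c("America", NA),
  stringsAsFactors = FALSE
)

# Columns that hold synonyms (everything except the two id columns)
synonymCols <- setdiff(names(wide), c("ID", "ISO3"))

# Build one block of rows per synonym column and stack the blocks
long <- do.call(rbind, lapply(synonymCols, function(col) {
  data.frame(ID = wide$ID, ISO3 = wide$ISO3, Country = wide[[col]],
             stringsAsFactors = FALSE)
}))

# Drop the empty synonym slots, as the subset() call above does
long <- subset(long, !is.na(Country))
long
```

Each country now appears once per synonym, which is the shape needed for joining against other tables by ISO3.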

The countryRegions data file (you can load it with data(countryRegions) or get documentation with help(countryRegions)) lets you aggregate countries in various ways.
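
One way such an aggregation might look, sketched on made-up tables: join the narrow synonym table to a region table on ISO3, then count within each region. The column name Region and all values below are assumptions; check help(countryRegions) for the actual column names before adapting this.

```r
# Made-up stand-ins for the narrow synonym table and the region table
synonymsToy <- data.frame(
  ISO3    = c("usa", "usa", "gbr", "fra"),
  Country = c("United States", "USA", "United Kingdom", "France"),
  stringsAsFactors = FALSE
)
regionsToy <- data.frame(
  ISO3   = c("usa", "gbr", "fra"),
  Region = c("North America", "Europe", "Europe"),  # hypothetical column
  stringsAsFactors = FALSE
)

# Attach a region to every synonym row via the shared ISO3 code
joined <- merge(synonymsToy, regionsToy, by = "ISO3")

# Count how many name entries fall in each region
namesPerRegion <- aggregate(Country ~ Region, data = joined, FUN = length)
namesPerRegion
```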

Model Fitting

Height versus Age

data(nhanes)
nhanes = transform(nhanes, sex = ifelse(sex == 2, "F", "M"))
small = sample(nhanes, size = 250)
ggplot(data = small, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 41 rows containing missing values (geom_point).

tenAndUnder = subset(small, age <= 10)
ggplot(data = tenAndUnder, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 30 rows containing missing values (geom_point).

mScatter(tenAndUnder)
## Loading required package: manipulate
## Warning: there is no package called 'manipulate'
## Error: could not find function "error"

* Do the data give any reason to think that boys and girls grow differently up to age 10?
Early on it seems that boys might be taller, but the difference, if there is one, is mostly gone, if not reversed, by age eight or so.
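
One way to put a number on this question is a linear model with an age-by-sex interaction. The sketch below uses simulated data, not nhanes: the data frame kids and every value in it are made up, and both sexes are given the same growth rate by construction.

```r
set.seed(1)
n <- 200

# Simulated children aged 2 to 10 (made-up data, not nhanes)
kids <- data.frame(
  age = runif(n, min = 2, max = 10),
  sex = sample(c("F", "M"), n, replace = TRUE)
)
# Heights in cm with the same growth rate for both sexes
kids$hgt <- 80 + 6 * kids$age + rnorm(n, sd = 5)

# age * sex expands to age + sex + age:sex.
# The age:sexM coefficient estimates how much faster (or slower)
# boys grow per year than girls.
growthFit <- lm(hgt ~ age * sex, data = kids)
summary(growthFit)$coefficients
```

An age:sexM estimate that is small relative to its standard error would be consistent with boys and girls growing at similar rates over this age range.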

* Is there any evidence for nonlinear growth in height for these kids?
The graph looks pretty linear to me, but it is possible that it isn't. It might just be a few outliers, but it seems that the rate of growth is increasing by the age of ten. It might be helpful to examine more of the graph.
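
Rather than judging curvature by eye, a straight-line fit can be compared with one that adds a squared age term. This is a sketch on simulated data (all values made up, with straight-line growth built in), not a result from nhanes.

```r
set.seed(2)
n <- 200

# Simulated ages and heights in cm (made-up, linear by construction)
age <- runif(n, min = 2, max = 10)
hgt <- 85 + 6 * age + rnorm(n, sd = 5)

# Straight-line model and a model that also allows curvature
linFit  <- lm(hgt ~ age)
quadFit <- lm(hgt ~ age + I(age^2))

# F-test for whether the squared term improves the fit
anova(linFit, quadFit)
```

A small p-value in the anova table would count as evidence for nonlinear growth; a large one means the straight line is adequate for the data at hand.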

male25andLess = subset(small, age <= 25 & sex == "M")
ggplot(data = male25andLess, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 32 rows containing missing values (geom_point).

Further Exploration:

Height

Look at people aged greater than 20 years, first with a very small sample (about 100 people) and then with the entire data set.

MoreThan20Small = sample(subset(nhanes, age > 20), size = 100)
ggplot(data = MoreThan20Small, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 21 rows containing missing values (geom_point).
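
The filter-then-sample step can also be written in base R, which makes the order of operations explicit: restrict to adults first, then draw the rows. The data frame people below is simulated and its values are made up.

```r
set.seed(3)

# Simulated stand-in for nhanes (made-up ages and heights in cm)
people <- data.frame(age = sample(2:80, 1000, replace = TRUE))
people$hgt <- rnorm(1000, mean = 165, sd = 10)

# Step 1: keep only people older than 20
adults <- subset(people, age > 20)

# Step 2: draw 100 of those rows at random, without replacement
adultsSmall <- adults[sample(nrow(adults), size = 100), ]
range(adultsSmall$age)
```

Sampling before filtering would instead leave fewer than 100 adults in the small sample.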

ggplot(data = nhanes, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 4609 rows containing missing values (geom_point).

BMI

ggplot(data = nhanes, aes(x = age, y = bmi)) + geom_point() + aes(colour = sex)
## Warning: Removed 4967 rows containing missing values (geom_point).

For Math 135 graduates …

The BMI models weight as being proportional to height squared. This should give a straight line on a graph of log-weight against log-height. Is this a reasonable model for adults?
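
If weight = k × height², then taking logs gives log(weight) = log(k) + 2 × log(height), so the straight line should have slope 2. A sketch of how that slope could be checked, using simulated adults whose weights follow the BMI model by construction; the variable name wgt and all values are assumptions, not taken from nhanes.

```r
set.seed(4)
n <- 500

# Simulated adults (made-up): height in metres, weight in kg
hgt <- runif(n, min = 1.5, max = 1.9)
wgt <- 24 * hgt^2 * exp(rnorm(n, sd = 0.05))  # BMI near 24 with noise

# Regress log-weight on log-height; the slope estimates the exponent
logFit <- lm(log(wgt) ~ log(hgt))
slope <- coef(logFit)[["log(hgt)"]]
slope

# Confidence interval for the exponent
confint(logFit)["log(hgt)", ]
```

On real adult data, an interval for the slope that excludes 2 would suggest that height squared is not quite the right scaling.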