Implement the steps for cleaning the OrdwayBirdsOrig data, including WingChord.
Which Data Entry Person makes more mistakes in entering the SpeciesName? Hint: compare SpeciesName to SpeciesNameCleaned.
MistakeSubset = subset(namesCleaned, SpeciesName != SpeciesNameCleaned)
## Error: object 'namesCleaned' not found
MistakeSubset
## Error: object 'MistakeSubset' not found
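One possible approach, sketched on a small made-up data frame rather than the real bird records: flag every row where the entered name differs from the cleaned name, then count those rows per entry person. The column names DataEntryPerson, SpeciesName and SpeciesNameCleaned are assumptions, and with the real data the cleaned names would first have to be joined onto the records (which is what namesCleaned was presumably meant to hold).

```r
# Hypothetical toy data standing in for the joined bird records;
# the column names and values are assumptions for illustration only.
birds <- data.frame(
  DataEntryPerson    = c("A", "A", "B", "B", "B"),
  SpeciesName        = c("Robin", "Robbin", "Blue Jay", "Bluejay", "Sparow"),
  SpeciesNameCleaned = c("Robin", "Robin", "Blue Jay", "Blue Jay", "Sparrow"),
  stringsAsFactors = FALSE
)

# A mistake is any row whose entered name differs from the cleaned name
mistakes <- subset(birds, SpeciesName != SpeciesNameCleaned)

# Count mistakes per person, and also the mistake rate per record entered
isMistake <- with(birds, SpeciesName != SpeciesNameCleaned)
mistakeCounts <- tapply(isMistake, birds$DataEntryPerson, sum)
mistakeRates  <- tapply(isMistake, birds$DataEntryPerson, mean)
mistakeCounts
mistakeRates
names(which.max(mistakeRates))
```

The rate is probably the fairer comparison, since someone who entered more records would tend to make more mistakes in total.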
I have no idea how to approach this problem.
The countrySynonyms data file (you can load it with data(countrySynonyms)) gives synonyms for each country listed in the World-Map software, together with an official ISO 3-letter country code. This data set is in “wide” format. To turn it into narrow format, you can do this:
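library(reshape2)   # melt()
library(rworldmap)  # countrySynonyms and countryRegions data
data(countrySynonyms)
# Stack the name columns (everything after ID and ISO3) into one Country column
foo <- melt(countrySynonyms, id.vars = c("ID", "ISO3"), value.name = "Country",
            measure.vars = names(countrySynonyms)[-(1:2)], variable.name = "whence")
# Keep only the slots that actually hold a synonym
countrySynonymsLong <- subset(foo, !is.na(Country))
countrySynonymsLong$whence <- NULL   # the source column is no longer needed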
Using countrySynonymsLong, find the names of any countries in the Gapminder data that don't have corresponding names in countrySynonymsLong. Hint: join() and subset() are good tools.
countrySynonymsLong that is the non-exact match to the name and fix the Gapminder data accordingly, so that every possible country in the Gapminder data shows up in the map.
The countryRegions data file (you can load it with data(countryRegions) or get documentation with help(countryRegions)) lets you aggregate countries in various ways.
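A minimal sketch of the matching step, assuming a Gapminder table with a country column and a long synonym table with Country and ISO3 columns like the one built above. It uses base R's merge(..., all.x = TRUE) as a stand-in for plyr's join(type = "left"); the data and the fixes lookup are made up for illustration.

```r
# Hypothetical stand-ins for the Gapminder names and countrySynonymsLong
gapminder <- data.frame(
  country = c("France", "Korea, Rep.", "Kenya"),
  stringsAsFactors = FALSE
)
synonymsLong <- data.frame(
  ISO3    = c("FRA", "KOR", "KOR", "KEN"),
  Country = c("France", "South Korea", "Republic of Korea", "Kenya"),
  stringsAsFactors = FALSE
)

# Left join: every Gapminder row is kept; unmatched rows get ISO3 = NA
joined <- merge(gapminder, synonymsLong,
                by.x = "country", by.y = "Country", all.x = TRUE)
unmatched <- subset(joined, is.na(ISO3))$country
unmatched

# Hand-made lookup (hypothetical) from each unmatched name to a listed synonym
fixes <- c("Korea, Rep." = "Republic of Korea")
needsFix <- gapminder$country %in% names(fixes)
gapminder$country[needsFix] <- unname(fixes[gapminder$country[needsFix]])

# Re-check: this should now be empty
setdiff(gapminder$country, synonymsLong$Country)
```

The lookup has to be written by hand because the non-exact matches ("Korea, Rep." vs. "Republic of Korea") are exactly the ones a join cannot find on its own.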
library(ggplot2)
library(mosaic)  # sample() on a data frame draws rows; also provides mScatter()
data(nhanes)
nhanes = transform(nhanes, sex = ifelse(sex == 2, "F", "M"))
small = sample(nhanes, size = 250)
ggplot(data = small, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 41 rows containing missing values (geom_point).
tenAndUnder = subset(small, age <= 10)
ggplot(data = tenAndUnder, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 30 rows containing missing values (geom_point).
mScatter(tenAndUnder)
## Loading required package: manipulate
## Warning: there is no package called 'manipulate'
## Error: could not find function "error"
* Do the data give any reason to think that boys and girls grow differently up to age 10?
Early on, boys might be taller, but the difference, if there really is one, has mostly disappeared (or even reversed) by about age eight.
* Is there any evidence for nonlinear growth in height for these kids?
The graph looks roughly linear to me, though it may not be. It might just be a few outliers, but the growth rate seems to increase around age ten. Looking at a wider age range might help.
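Both questions above could be backed by model comparisons instead of eyeballing the scatter. The sketch below runs them on simulated data (invented numbers standing in for tenAndUnder, with hgt in cm): an F-test of a straight line against a quadratic in age checks for curvature, and an F-test of one line against separate lines per sex (hgt ~ age * sex) checks for a sex difference. With the real data the same calls would take data = tenAndUnder after removing rows with missing values (for example with na.omit()), so that the compared models use the same rows.

```r
# Simulated stand-in for tenAndUnder; the numbers are invented
set.seed(1)
n <- 200
kids <- data.frame(
  age = runif(n, 2, 10),
  sex = sample(c("F", "M"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
kids$hgt <- 75 + 3 * kids$age + 0.3 * kids$age^2 + rnorm(n, sd = 3)

# Nonlinearity: does a quadratic age term improve on a straight line?
linFit  <- lm(hgt ~ age, data = kids)
quadFit <- lm(hgt ~ age + I(age^2), data = kids)
curvatureP <- anova(linFit, quadFit)[2, "Pr(>F)"]

# Sex difference: do the intercept or slope shift with sex?
sexFit <- lm(hgt ~ age * sex, data = kids)
sexP <- anova(linFit, sexFit)[2, "Pr(>F)"]

c(curvature = curvatureP, sex = sexP)
```

A small p-value for the curvature test would support a nonlinear growth curve; a small one for the sex test would suggest boys and girls follow different lines.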
male25andLess = subset(small, age <= 25 & sex == "M")
ggplot(data = male25andLess, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 32 rows containing missing values (geom_point).
Look at people older than 20, first with a very small sample (about 100 people) and then with the entire data set.
MoreThan20Small = sample(subset(nhanes, age > 20), size = 100)
ggplot(data = MoreThan20Small, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 21 rows containing missing values (geom_point).
ggplot(data = nhanes, aes(x = age, y = hgt)) + geom_point() + aes(colour = sex)
## Warning: Removed 4609 rows containing missing values (geom_point).
Does the small data set clearly show a decrease in height with age? How about the entire data set?
The small set debatably shows a decrease in height with age (it might just be due to some outliers), but the entire data set makes the trend very clear.
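One way to make the small-sample trend less sensitive to outliers is to average height within age bins. A sketch on simulated adult data (invented numbers, hgt in cm); with the real data the same steps would apply to subset(nhanes, age > 20), and na.rm = TRUE takes care of missing heights.

```r
# Simulated stand-in for adults older than 20; the numbers are invented
set.seed(2)
adults <- data.frame(age = runif(300, 21, 80))
adults$hgt <- 172 - 0.1 * (adults$age - 21) + rnorm(300, sd = 5)

# Mean height per decade of age: bin means smooth out individual outliers
adults$ageGroup <- cut(adults$age, breaks = seq(20, 80, by = 10))
binMeans <- tapply(adults$hgt, adults$ageGroup, mean, na.rm = TRUE)
binMeans

# Overall linear trend, in cm per year of age
slope <- coef(lm(hgt ~ age, data = adults))[["age"]]
slope
```

A steady drop across the bin means, together with a negative slope, is a more convincing sign of decline than a few low points at the old end of a scatter plot.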
Is there any evidence of a nonlinear trend in height with age for people older than 20?
Again, this might just be outliers, but the rate of decrease in height seems to speed up near the oldest ages in the data. The evidence is weak, though, and since shrinking with age is reportedly a smaller effect than I had assumed, I can't think of a good reason for it.
ggplot(data = nhanes, aes(x = age, y = bmi)) + geom_point() + aes(colour = sex)
## Warning: Removed 4967 rows containing missing values (geom_point).
The BMI models weight as being proportional to height squared (BMI = weight / height²). Taking logs gives log(weight) = log(BMI) + 2 log(height), so this should give a straight line of slope 2 on a graph of log-weight against log-height. Is this a reasonable model for adults?
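The question can be checked by regressing log-weight on log-height and seeing whether the fitted slope is near 2. The sketch below does this on simulated data built to follow the BMI model exactly (invented numbers, heights in metres, BMI scattered around 25). With the real data it would be something like lm(log(wgt) ~ log(hgt), data = subset(nhanes, age > 20)), where the weight column name wgt is an assumption. Changing the height units (cm vs. m) only shifts the intercept, not the slope.

```r
# Simulated adults following weight = BMI * height^2; numbers are invented
set.seed(3)
n <- 500
hgt <- runif(n, 1.5, 2.0)             # height in metres
bmi <- 25 * exp(rnorm(n, sd = 0.1))   # BMI varying around 25
wgt <- bmi * hgt^2                    # weight in kg

# Under the BMI model the slope should be about 2 and the
# intercept about log(25)
fit <- lm(log(wgt) ~ log(hgt))
coef(fit)
```

If the real adult data gave a slope clearly different from 2, that would suggest weight does not scale with height squared for adults.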