Data and Computing Fundamentals
Week 1: Out-of-Class Activity
Data on the Web
Look at the various data sets available at http://www.gapminder.org/data/.
Based on the names of the data sets available, construct several hypotheses that you would like to test using this sort of country-by-country data. Later in the course, you'll learn how to assemble such data in a form you can use for your analysis.
Look at one of the data sets (by pressing the little “View” magnifying-glass icon). What are the cases? What are the variables?
The cases are countries and years; the variable is the employment rate.
Body Shape Data
Read the simplified version of the NHANES dataset using
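# Assumes the course package that provides nhanes is already loaded.
# data() loads the data set into the workspace and returns only its name, so its result is not assigned.
data(nhanes)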
You can use help(nhanes) to see its documentation. Look at the names of the variables and make sure you understand the meaning of each of them.
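As a minimal sketch of loading a data set and inspecting its variables, the block below uses mtcars, which ships with base R, as a stand-in for nhanes (an assumption made only so the example runs anywhere). It also shows that data() returns the name of the data set rather than the data itself.

```r
# mtcars stands in for nhanes here; this is an assumption for illustration.
loaded = data(mtcars)      # loads mtcars into the workspace
print(loaded)              # data() returns the name of the data set, not the data
print(class(mtcars))       # the data frame itself is available under its own name
print(names(mtcars))       # variable (column) names
str(mtcars)                # type and first few values of each variable
```

The same names() and str() calls should apply to nhanes once it is loaded as a data frame.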
You're going to be making scatter plots using mScatter(). To generate the graphics in this document, remember to copy and paste the command that mScatter() outputs into a fenced R code chunk in this document.
Create a Sample
It takes several seconds to generate a graph with the full data set. To speed things up, take a random sample of 2000 people and develop your graphs with that.
# Draw 2000 rows (people) at random, without replacement.
# Requires nhanes to be a data frame with at least 2000 rows.
small = nhanes[sample(nrow(nhanes), 2000), ]
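In base R, sample() applied to a data frame draws from its columns, not its rows, and fails when asked for more items than there are columns. The sketch below illustrates this, and a row-based alternative, with mtcars as a stand-in for nhanes. The stand-in, the seed, and the name small_demo are assumptions for illustration.

```r
# mtcars (32 rows, 11 columns) stands in for nhanes; an assumption for illustration.
set.seed(1)  # arbitrary seed so the same rows are drawn on every run

# Base R treats a data frame as a list of columns, so this asks for
# 20 of 11 columns and fails; tryCatch() captures the error message.
attempt = tryCatch(base::sample(mtcars, 20),
                   error = function(e) conditionMessage(e))
print(attempt)

# Sampling row indices and then subsetting draws cases instead.
rows = base::sample(nrow(mtcars), 10)
small_demo = mtcars[rows, ]
print(dim(small_demo))
```

Indexing by sampled row numbers keeps every variable and changes only the number of cases, which is what developing a graph on a smaller sample needs.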
To start up the scatter-plot making process, use mScatter()
mScatter(small)
Then, when you know exactly what you want, you can translate your commands to use the whole data set, if appropriate.
QUESTION:
Describe the relationship between height and weight. Is there reason to think that it's different for the two sexes?
A: Weight generally increases with height, although the relationship is far from perfectly linear. Real-world experience gives reason to think that it differs between the two sexes, but it is harder to tell whether this is actually borne out by the data.
QUESTION:
Describe the relationship between weight and BMI. Is it different for the two sexes? Where do the people with diabetes show up?
A: There is a relationship between weight and BMI. BMI generally increases with weight, but the trend is not nearly as clear as the relationship between weight and height. It differs slightly between the two sexes. The people with diabetes show up towards the higher extremes on both axes.
QUESTION:
Is there a relationship between BMI and age? Where do the people with diabetes show up?
A: There is a relationship between age and BMI. BMI seems to increase with age to a degree. Here again the people with diabetes show up towards the higher extreme, but on only one axis this time.
QUESTION:
Is cholesterol level a good predictor of the development of diabetes?
A: I am not sure if it would be of value in real-life prediction of diabetes, but there does seem to be a link in the data between cholesterol and diabetes.
TASK:
Calculate body-mass index according to the formula $m / h^2$ (mass divided by height squared) and see how it corresponds to the body mass index in the data. (Hint: You can plot out one versus the other.)
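The calculation might be sketched as follows. The data frame and its column names (wgt_kg, hgt_m, bmi) are hypothetical toy values made up for illustration, not taken from nhanes. The real variable names and units should be checked with names(nhanes) and help(nhanes) first, since the formula gives the usual BMI only when mass is in kilograms and height in meters.

```r
# Toy data with hypothetical column names; not real NHANES values.
toy = data.frame(
  wgt_kg = c(70, 85, 60),        # mass in kilograms
  hgt_m  = c(1.75, 1.80, 1.60),  # height in meters
  bmi    = c(22.9, 26.2, 23.4)   # BMI as recorded, rounded to one decimal
)

# Body-mass index: mass divided by height squared.
toy$bmi_calc = toy$wgt_kg / toy$hgt_m^2
print(toy)

# Plot recorded against calculated BMI; points on the line y = x agree exactly.
plot(toy$bmi_calc, toy$bmi,
     xlab = "Calculated BMI (m / h^2)", ylab = "Recorded BMI")
abline(0, 1)
```

If the points fall along a straight line that is not y = x, a difference in units between the columns is one possible explanation worth checking.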
TASK:
Come up with a hypothesis of your own and address it with a graphic.