Feedback on Initial Design of Island Project

General Comments applicable to all groups

Try to organize your data in a case/variable format. Each row is a case, each column is a variable. For instance:

Town Year Births Population
Macondo 1960 4 549
Alta 1960 2 350

If you are using spreadsheet software for data analysis, put your raw data in one sheet and your analysis in another.

Note that each person has a unique ID number. To find it, look at the URL that appears for the person's detailed history. It will look something like http://island.maths.uq.edu.au/show.php?id=are47 and the ID is the part after id= (here, are47). You can use this to find a person quickly if you need to.
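
If you are copying many URLs, a few lines of Python can pull the ID out for you. This is only a sketch that assumes every URL has the same shape as the example above; the function name person_id is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def person_id(url):
    """Return the value of the id= query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("id")
    return values[0] if values else None

print(person_id("http://island.maths.uq.edu.au/show.php?id=are47"))
```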

Don't be shy about asking me for help in analyzing your data. Knowing how to program a computer is a big help.

Emily Murphy, Rachele Higgs, Michelle Kiang, Sam Gonzalez

You wrote, “We want to include causes of death but we are not sure how to organize cause of death by year and break the data down by age.”

Try something like this …

Year  Age  Cause          Sex
1928  17   Summer's Pain  M
1928  64   Twilight       F

While you are at it, you might record other data such as whether the person was a smoker, what their parents died of, etc. That would give you a head start on the analytic phase of the project.

Kelsey Woida, Olivia Starkie, Claire James, Aryssa Burton, Selja Vassnes, Gabriela Norton

You're looking at smoking rates on a town-by-town basis. I'd suggest that you extend this to look at age-specific smoking rates. An appropriate data format will have an individual person as a case and might look like this:

Town Age Smoker Sex
Alta 31 No F
Alta 23 Yes M

and so on.

If you want to set things up for your analytic project, perhaps take the extra time to record some things that might relate to smoking, e.g. parents smoke.

Then you can easily graph out smoking fraction versus age and sex.
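
As a rough sketch of that calculation in Python: the records below are made up (they are not real Island data), and the 10-year age groups are an assumption you may want to change.

```python
from collections import defaultdict

# Made-up records in the Town / Age / Smoker / Sex layout; not real Island data.
people = [
    ("Alta", 31, "No", "F"),
    ("Alta", 23, "Yes", "M"),
    ("Alta", 27, "Yes", "F"),
    ("Macondo", 35, "No", "M"),
    ("Macondo", 24, "No", "M"),
    ("Macondo", 38, "Yes", "F"),
]

def smoking_fraction(records, width=10):
    """Fraction of smokers in each (lower age bound, sex) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [smokers, total people]
    for town, age, smoker, sex in records:
        low = (age // width) * width      # e.g. age 23 falls in the group starting at 20
        cell = counts[(low, sex)]
        cell[0] += smoker == "Yes"
        cell[1] += 1
    return {key: smokers / total for key, (smokers, total) in counts.items()}

fractions = smoking_fraction(people)
for (low, sex), frac in sorted(fractions.items()):
    print(f"ages {low}-{low + 9}, {sex}: {frac:.2f}")
```

The resulting fractions are what you would plot against age, with one line or set of bars per sex.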

David Lopez, Miriam Magana, Johnna O'Keefe, Alexia Malaga, Ayda Alemayehu

I'll need a bit more detail on how you are going to collect your data. From your spreadsheet, it looks like you are using the town as a unit of analysis, counting up the number of females and the number of teens, and also counting births to teens. But I'm not sure what a “second generation female” is.

Let's arrange a time to talk about this in some detail, just to make sure I understand what you're looking for.

Lucas Asher, Oliver Kendall, Perry Campbell, Andrew Notaras

You're interested in tracking deaths from epidemics across the years. I gather that you are going through the death records, counting the number of people who died in a given year and lumping that into major outbreaks.

I'll suggest another organization, by month:

Month  Year  Twilight  Summer's Pain  Town
June   1960  3         5              Macondo
July   1960  1         0              Macondo
June   1960  2         0              Alta

and so on. I can help you to produce plots based on information in this format.

Even better, if you are interested, is to record the individual deaths. If you are going to do this, I'd encourage you to find a friend who knows about computer science and who can re-arrange the data for you from the XML-like HTML format in the town-hall death records.
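
Once individual deaths are recorded, turning them back into monthly counts is mechanical. The sketch below assumes the records have already been typed in or extracted as (year, month, town, cause) tuples; it does not parse the town-hall pages, and the records themselves are hypothetical.

```python
from collections import Counter

# Hypothetical individual death records: (year, month, town, cause).
deaths = [
    (1960, "June", "Macondo", "Twilight"),
    (1960, "June", "Macondo", "Summer's Pain"),
    (1960, "June", "Macondo", "Twilight"),
    (1960, "July", "Macondo", "Twilight"),
    (1960, "June", "Alta", "Twilight"),
]

# Count deaths for every (year, month, town, cause) combination.
tally = Counter(deaths)

causes = ["Twilight", "Summer's Pain"]
# One output row per (year, month, town), in order of first appearance.
groups = list(dict.fromkeys((y, m, t) for y, m, t, _ in deaths))
table = [(m, y, *[tally[(y, m, t, c)] for c in causes], t) for y, m, t in groups]

print("Month", "Year", *causes, "Town")
for row in table:
    print(*row)
```

Going in this direction is easy, whereas monthly counts cannot be turned back into individual records, which is one reason to prefer recording the individual deaths.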

Lauren MacNeill, Antonina Storniolo, Mark Skopec

Your project is described as looking for a correlation between diabetes and Twilight Fever. That's an analytic project, but there's no reason why you can't set yourself up for that while doing your descriptive project.

I'd suggest that you describe the age- and sex-specific distribution of diabetes and Twilight Fever. Make the cases a single individual, as on your printout, but include the age of onset of diabetes.

Town  Name (or ID)  Sex  Age of TF  Age of Diabetes Onset
Alta  are232        F    19         NA
Alta  djs653        M    45         25

and so on.

Katherine Ehrenreich, Rachel Lochner, Nadejda Orlowski, Rosie Glenn-Finer, Nancy Rocha

I think your approach is reasonable. But there are others.

The approach you are taking is town by town, looking at the official demographics to determine the number of women of reproductive age and at the number of births in 2012. Given this, your data might look like

Town     Age 10-19  Age 20-29  Age 30-39  Age 40-49  Births 2012  Births 2011
Eden     1          6          2          0          2            1
Macondo  237        140        107        115        38           42

and so on. In such data, the towns are the cases.

In processing these data, you can add together the age-specific counts, perhaps dividing the number in Age 10-19 by 2 and similarly for Age 40-49.
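
A minimal sketch of that arithmetic in Python, using the two example rows above. It assumes the age columns count women, and the function name women_at_risk is made up for illustration.

```python
towns = [
    # town, age 10-19, age 20-29, age 30-39, age 40-49, births 2012, births 2011
    ("Eden", 1, 6, 2, 0, 2, 1),
    ("Macondo", 237, 140, 107, 115, 38, 42),
]

def women_at_risk(a10, a20, a30, a40):
    """Add the age-specific counts, halving the youngest and oldest groups."""
    return a10 / 2 + a20 + a30 + a40 / 2

rates = {}
for town, a10, a20, a30, a40, births_2012, births_2011 in towns:
    # Births in 2012 per woman of reproductive age, under the halving rule.
    rates[town] = births_2012 / women_at_risk(a10, a20, a30, a40)

for town, rate in rates.items():
    print(f"{town}: {rate:.3f} births per woman in 2012")
```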

It would take about an hour for a couple of people to collect such data for the entire island.

Another approach would be to sample women at random and use the woman as a case.

Who     YOB   Town  B1    B2    B3  B4  B5  B6  B7  B8
are323  1960  Alta  1984  1989

and so on.

I can show you how to process such data so that you can calculate how likely a woman of a given age was to have a baby. In essence, each woman would give you data on population of women for many years, and on the number of births for the years that her babies (if any) were born.
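
One possible way to do that processing is sketched below in Python. It rests on several assumptions that are not in your data: each woman is counted for every year from age 15 to 49 up to 2012, and she is assumed to have been alive and on the island for all of those years. Only the first record comes from the example above; the other two IDs and rows are hypothetical.

```python
from collections import Counter

# (who, year of birth, town, birth years of her babies)
women = [
    ("are323", 1960, "Alta", [1984, 1989]),  # row from the example above
    ("xyz001", 1965, "Alta", [1990]),        # hypothetical
    ("xyz002", 1970, "Macondo", []),         # hypothetical
]

def birth_rates(records, last_year=2012, min_age=15, max_age=49):
    """Births per woman-year at each age."""
    woman_years = Counter()  # age -> number of women observed at that age
    births = Counter()       # age -> number of babies born to women of that age
    for who, yob, town, baby_years in records:
        for age in range(min_age, max_age + 1):
            if yob + age > last_year:
                break        # she has not reached this age yet
            woman_years[age] += 1
        for year in baby_years:
            births[year - yob] += 1
    return {age: births[age] / n for age, n in woman_years.items()}

rates = birth_rates(women)
print(rates[24], rates[30])
```

With only three women the rates are meaningless, but with a few hundred sampled women the same calculation gives the chance of a birth at each age.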

I'd prefer the second format, even though it's more complicated. It lets you look at other things as well, such as the spacing between babies, number of babies per woman, etc. Looking forward to the analytic phase of the project, it would let you examine “risk factors” for having a baby.

Irene Gibson, Glenna Gransee, Mariah Geiger, Ben Eagan-Van Meter

Your spreadsheet is a mixture of analysis and data. That's a common practice, but I want to encourage you to separate the two. The base data that you've collected could be organized like this:

Town Year Age
Fairhaven 1988 19
Fairhaven 1988 28

and so on. Everything else in the spreadsheet is analysis.

For the purpose of calculating average age at death as a function of year or town, this would be adequate. Of course, average age at death is not quite the same thing as life expectancy. If it's life expectancy you're after, you'll need to know the population at each age and the age-specific death rate.
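
As an illustration of the average-age calculation, here is a short Python sketch. The first two records are the example rows above; the other two are hypothetical, and the helper name mean_age_by is made up.

```python
from collections import defaultdict
from statistics import mean

# (town, year of death, age at death)
deaths = [
    ("Fairhaven", 1988, 19),
    ("Fairhaven", 1988, 28),
    ("Fairhaven", 1989, 71),  # hypothetical
    ("Alta", 1988, 64),       # hypothetical
]

def mean_age_by(records, key_index):
    """Average age at death, grouped by town (key_index=0) or year (key_index=1)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {key: mean(ages) for key, ages in groups.items()}

by_town = mean_age_by(deaths, 0)
by_year = mean_age_by(deaths, 1)
print(by_town)
print(by_year)
```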

Another way to get at the issue is to look at birth records, and then trace through to the eventual death year.

Town Birth Death Sex Cause
Alta 1945 2007 M TF
Alta 1983 NA F NA

With lots of records of this sort, I can show you how to reconstruct the age-specific death rates.
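
A crude sketch of one such reconstruction, in Python: each person contributes one person-year to every age they lived through, and the death rate in an age group is deaths divided by person-years. The 10-year age groups, the 2012 cutoff for people still alive, and counting the year of death as a full year are all simplifying assumptions. The third record is hypothetical.

```python
from collections import Counter

records = [
    # town, birth year, death year (None if still alive), sex, cause
    ("Alta", 1945, 2007, "M", "TF"),
    ("Alta", 1983, None, "F", None),
    ("Alta", 1950, 2010, "F", "TF"),  # hypothetical
]

def death_rates(records, last_year=2012, width=10):
    """Deaths per person-year in each age group (keyed by lower age bound)."""
    person_years = Counter()
    deaths = Counter()
    for town, birth, death, sex, cause in records:
        end = death if death is not None else last_year
        for year in range(birth, end + 1):
            person_years[(year - birth) // width * width] += 1
        if death is not None:
            deaths[(death - birth) // width * width] += 1
    return {group: deaths[group] / n for group, n in sorted(person_years.items())}

rates = death_rates(records)
for group, rate in rates.items():
    print(f"ages {group}-{group + 9}: {rate:.3f} deaths per person-year")
```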

Ryan Sutley, Patrick Murphy, Hally Chaffin

You're looking at deaths at different ages and the number of deaths from Summer's Pain. The “case” is a town (“Alta”, “Riverside”, etc.), and you have recorded the number of deaths in each of 8 age groups.

I'll suggest a different organization. Just record the deaths:

Town Age Sex Cause
Alta 35 F Summer's Pain
Riverside 13 M Drowning

and so on. Then we can turn this into the various other quantities you are interested in, such as the age-specific count and fraction of Summer's Pain deaths.
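
For example, a Python sketch along these lines would do the conversion. The 10-year age groups are an assumption (your spreadsheet uses 8 groups of its own), and the last two records are hypothetical.

```python
from collections import Counter

# (town, age at death, sex, cause)
deaths = [
    ("Alta", 35, "F", "Summer's Pain"),
    ("Riverside", 13, "M", "Drowning"),
    ("Alta", 38, "M", "Twilight Fever"),      # hypothetical
    ("Riverside", 31, "F", "Summer's Pain"),  # hypothetical
]

def fraction_by_age(records, cause="Summer's Pain", width=10):
    """For each age group: (deaths from cause, all deaths, fraction from cause)."""
    total = Counter()
    from_cause = Counter()
    for town, age, sex, recorded_cause in records:
        group = age // width * width
        total[group] += 1
        if recorded_cause == cause:
            from_cause[group] += 1
    return {g: (from_cause[g], total[g], from_cause[g] / total[g]) for g in sorted(total)}

summary = fraction_by_age(deaths)
for group, (n_cause, n_all, frac) in summary.items():
    print(f"ages {group}-{group + 9}: {n_cause} of {n_all} deaths ({frac:.2f})")
```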

Elizaveta Bekmanis, Charles Kilian, Yuris Martinez, Ingrid Korsgard

You're comparing mortality rates pre- and post-1960. I gather that you're interested in the ages at death, but I think you should also examine the causes of death and see if they are different in the two eras. I suggest that you pick two clearly separate eras, say 1920-1940 and 1990-2010. Recording year, age at death, and cause lets you tabulate the data to examine differences in age at death as well as differences in causes.

Year  AgeAtDeath  Cause           Sex  Town
1991  67          Summer's Pain   M    Alta
1922  43          Twilight Fever  F    Macondo

and so on.
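
A sketch of that tabulation in Python, using the two suggested eras. The first two records are the example rows above; the rest are hypothetical, and records outside both eras are simply skipped.

```python
from collections import Counter, defaultdict
from statistics import mean

ERAS = {"1920-1940": (1920, 1940), "1990-2010": (1990, 2010)}

# (year, age at death, cause, sex, town)
deaths = [
    (1991, 67, "Summer's Pain", "M", "Alta"),
    (1922, 43, "Twilight Fever", "F", "Macondo"),
    (1935, 51, "Twilight Fever", "M", "Alta"),    # hypothetical
    (2004, 80, "Summer's Pain", "F", "Macondo"),  # hypothetical
    (1975, 60, "Drowning", "M", "Alta"),          # hypothetical, outside both eras
]

def era_of(year):
    """Name of the era containing this year, or None if it is in neither."""
    for name, (start, end) in ERAS.items():
        if start <= year <= end:
            return name
    return None

cause_counts = Counter()   # (era, cause) -> number of deaths
ages = defaultdict(list)   # era -> ages at death
for year, age, cause, sex, town in deaths:
    era = era_of(year)
    if era is None:
        continue
    cause_counts[(era, cause)] += 1
    ages[era].append(age)

mean_age = {era: mean(values) for era, values in ages.items()}
print(mean_age)
print(dict(cause_counts))
```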