DATA 621 Week #1 Textbook Exercises

LMR 1.1

The dataset teengamb concerns a study of teenage gambling in Britain. Make a numerical and graphical summary of the data, commenting on any features that you find interesting. Limit the output you present to a quantity that a busy reader would find sufficient to get a basic understanding of the data.

The teens in the study spent between 0 and 156 pounds per year on gambling. Males gambled more than females: on average, nearly 8 times as much.

sex     Min.  25%    50%    Mean       75%     Max.   Count
Female  0     0.100  1.70   3.865789   6.000   19.6   19
Male    0     2.775  14.25  29.775000  42.175  156.0  28
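As a quick check of the "nearly 8 times" statement, the group means and medians in the table can be compared directly. The sketch below is in Python, which is not necessarily the tool used for the original analysis, and the variable names are illustrative assumptions; only the numbers are taken from the table.

```python
# Group statistics copied from the summary table (pounds per year).
female = {"mean": 3.865789, "median": 1.70, "count": 19}
male = {"mean": 29.775, "median": 14.25, "count": 28}

# Ratio of male to female spending, by mean and by median.
mean_ratio = male["mean"] / female["mean"]
median_ratio = male["median"] / female["median"]

# Overall mean, weighting each group mean by its count.
total_count = female["count"] + male["count"]
pooled_mean = (
    female["mean"] * female["count"] + male["mean"] * male["count"]
) / total_count

print(f"male/female mean ratio:   {mean_ratio:.2f}")
print(f"male/female median ratio: {median_ratio:.2f}")
print(f"pooled mean over {total_count} teens: {pooled_mean:.2f}")
```

Both groups have means well above their medians, which suggests that spending is right-skewed and that a few heavy gamblers pull the averages up.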

Teens earned between 31 and 780 pounds annually. Those who earned more were more likely to gamble more.

LMR 1.3

The dataset prostate is from a study on 97 men with prostate cancer who were due to receive a radical prostatectomy. Make a numerical and graphical summary of the data as in the first question.

These men are generally older. Their ages range from 41 to 79 years, with a median of 65.

Min.  25%  50%  Mean      75%  Max.
41    60   65   63.86598  68   79
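The summaries in this write-up all use the same six columns. A minimal sketch of how such a row could be computed in Python is shown below. The function name is hypothetical, and it assumes the quartiles were produced with R's default interpolation rule (type 7), which corresponds to the "inclusive" method in Python's statistics module. The sample values are toy numbers, not the prostate data.

```python
import statistics


def six_number_summary(values):
    """Return min, quartiles, mean, and max in the column order used here."""
    # method="inclusive" interpolates between order statistics the same way
    # as R's default quantile rule (assumed to be what the tables use).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "25%": q1,
        "50%": q2,
        "Mean": statistics.mean(values),
        "75%": q3,
        "Max.": max(values),
    }


# Toy values for illustration only; these are NOT taken from the dataset.
toy_values = [1, 2, 3, 4]
summary = six_number_summary(toy_values)
for label, value in summary.items():
    print(f"{label:>5}: {value}")
```

Applied to a real column such as the ages, the same function would return one row of the table.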

There is a direct (positive) relationship between the prostate cancer volume and the capsular penetration. Both are recorded on a log scale; the summary of log capsular penetration (lcp) is:

Min.      25%       50%       Mean        75%      Max.
-1.38629  -1.38629  -0.79851  -0.1793637  1.17865  2.90417

The volume of the cancer is also positively correlated with the PSA. The summary of log PSA (lpsa) is:

Min.      25%      50%      Mean      75%      Max.
-0.43078  1.73166  2.59152  2.478387  3.05636  5.58293

LMR 1.4

The dataset sat comes from a study entitled “Getting What You Pay For: The Debate Over Equity in Public School Expenditures.” Make a numerical and graphical summary of the data as in the first question.

State average SAT scores range from 844 to 1107, with a mean around 966. These data have considerable variability.

Min.  25%     50%    Mean    75%   Max.  Count
844   897.25  945.5  965.92  1032  1107  50

At first blush it appears that states that spend less per pupil do better on the SAT than those that spend more. Strange!

But when you examine the popularity of the SAT, a different pattern emerges. In certain states the ACT is the college entrance test of choice. Thus the SAT scores in these states come only from the students who want to apply to a school back East, likely a smaller and more selective group.

LMR 1.5

The dataset divusa contains data on divorces in the United States from 1920 to 1996. Make a numerical and graphical summary of the data as in the first question.

The US divorce rate has generally been increasing over time. There was a spike in the 1940s, and it has been trending down since the 1980s.

Decade  Min.  25%     50%    Mean      75%     Max.  Count
1920    6.6   7.200   7.35   7.44000   7.800   8.0   10
1930    6.1   7.200   7.65   7.60000   8.375   8.7   10
1940    8.8   10.225  11.10  11.90000  13.200  17.9  10
1950    8.9   9.300   9.45   9.58000   9.900   10.3  10
1960    9.2   9.600   10.30  10.63000  11.125  13.4  10
1970    14.9  17.300  19.80  19.24000  21.100  22.8  10
1980    20.4  20.900  21.40  21.45000  21.700  22.6  10
1990    19.5  20.150  20.50  20.47143  20.900  21.2  7
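The 1940s spike and the decline after the 1980s can be read off the decade means. The following Python sketch computes the change from one decade to the next; it is only an illustration built from the Mean column of the table, and the variable names are assumptions rather than anything from the original analysis.

```python
# Decade means of the divorce rate, copied from the Mean column of the table.
decade_means = {
    1920: 7.44, 1930: 7.60, 1940: 11.90, 1950: 9.58,
    1960: 10.63, 1970: 19.24, 1980: 21.45, 1990: 20.47143,
}

# Difference between each decade's mean and the previous decade's mean.
decades = sorted(decade_means)
changes = {
    later: decade_means[later] - decade_means[earlier]
    for earlier, later in zip(decades, decades[1:])
}

for decade, delta in changes.items():
    print(f"{decade}s vs. previous decade: {delta:+.2f}")

# Decade with the highest mean divorce rate.
peak_decade = max(decade_means, key=decade_means.get)
print(f"highest decade mean: {peak_decade}s")
```

Note that the 1990s row covers only 7 years (1990 to 1996), so its mean rests on fewer observations than the others.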

A couple of other variables show a similar spike. For example, military personnel per 1000…

…the marriage rate…

…and most strikingly, the female labor force participation rate.

Mike Silva

2019-09-07