1

Use the built-in swiss data set for this problem.

1.1

Find the average fertility rate for the 47 French-speaking provinces, and find the number of provinces with fertility rate above 60%.

ANS: The average fertility measure is 70.14, and 39 provinces have a fertility measure above 60.

head(swiss)
             Fertility Agriculture Examination Education Catholic
Courtelary        80.2        17.0          15        12     9.96
Delemont          83.1        45.1           6         9    84.84
Franches-Mnt      92.5        39.7           5         5    93.40
Moutier           85.8        36.5          12         7    33.77
Neuveville        76.9        43.5          17        15     5.16
Porrentruy        76.1        35.3           9         7    90.57
             Infant.Mortality
Courtelary               22.2
Delemont                 22.2
Franches-Mnt             20.2
Moutier                  20.3
Neuveville               20.6
Porrentruy               26.6
mean(swiss$Fertility)
[1] 70.14255
which(swiss$Fertility>60)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 20 21 22 25 26 27
[24] 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
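
Because which() returns the matching row indices rather than a count, the number asked for in the question is the length of that vector. A minimal sketch of two equivalent ways to get it, using only base R and the built-in swiss data (the names high, n_high and n_high_alt are illustrative):

```r
# Logical vector: TRUE for provinces whose Fertility measure exceeds 60
high <- swiss$Fertility > 60

# TRUE counts as 1, so summing the logical vector counts the provinces
n_high <- sum(high)

# Equivalent: count the indices returned by which()
n_high_alt <- length(which(high))

n_high
```

Both values should agree with the number of indices printed by which() above.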

1.2

Use abbreviate to shorten the names of the 47 provinces in the ‘swiss’ data set. Compare the results to using substr to extract 4 characters from the names of provinces. Which of the two functions gives better shortened names?

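ANS: abbreviate differs from substr in that abbreviate builds abbreviations that are at least minlength characters long and do not repeat, whereas substr simply takes the characters at the given positions, whether or not they include symbols or spaces or produce duplicates (here "Veve" and "Rive" each appear twice). I therefore think abbreviate gives the better shortened names.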

abbreviate(rownames(swiss))
  Courtelary     Delemont Franches-Mnt      Moutier   Neuveville 
      "Crtl"       "Dlmn"       "Fr-M"       "Motr"       "Nvvl" 
  Porrentruy        Broye        Glane      Gruyere       Sarine 
      "Prrn"       "Broy"       "Glan"       "Gryr"       "Sarn" 
     Veveyse        Aigle      Aubonne     Avenches     Cossonay 
      "Vvys"       "Aigl"       "Abnn"       "Avnc"       "Cssn" 
   Echallens     Grandson     Lausanne    La Vallee       Lavaux 
      "Echl"       "Grnd"       "Lsnn"       "LVll"       "Lavx" 
      Morges       Moudon        Nyone         Orbe         Oron 
      "Mrgs"       "Modn"       "Nyon"       "Orbe"       "Oron" 
     Payerne Paysd'enhaut        Rolle        Vevey      Yverdon 
      "Pyrn"       "Pys'"       "Roll"       "Vevy"       "Yvrd" 
     Conthey    Entremont       Herens     Martigwy      Monthey 
      "Cnth"       "Entr"       "Hrns"       "Mrtg"       "Mnth" 
  St Maurice       Sierre         Sion       Boudry La Chauxdfnd 
      "StMr"       "Sirr"       "Sion"       "Bdry"       "LChx" 
    Le Locle    Neuchatel   Val de Ruz ValdeTravers V. De Geneve 
      "LLcl"       "Ncht"       "VldR"       "VldT"       "V.DG" 
 Rive Droite  Rive Gauche 
      "RvDr"       "RvGc" 
substr(rownames(swiss),1,4)
 [1] "Cour" "Dele" "Fran" "Mout" "Neuv" "Porr" "Broy" "Glan" "Gruy" "Sari"
[11] "Veve" "Aigl" "Aubo" "Aven" "Coss" "Echa" "Gran" "Laus" "La V" "Lava"
[21] "Morg" "Moud" "Nyon" "Orbe" "Oron" "Paye" "Pays" "Roll" "Veve" "Yver"
[31] "Cont" "Entr" "Here" "Mart" "Mont" "St M" "Sier" "Sion" "Boud" "La C"
[41] "Le L" "Neuc" "Val " "Vald" "V. D" "Rive" "Rive"
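
The uniqueness claim can be checked directly instead of by eye. A small sketch, assuming the default minlength of 4 for abbreviate (the names full, abbr, sub4 and dup_prefixes are illustrative):

```r
full <- rownames(swiss)
abbr <- abbreviate(full)        # default minlength = 4
sub4 <- substr(full, 1, 4)      # first four characters of each name

# anyDuplicated() returns 0 when every element is unique
anyDuplicated(abbr)

# Four-character prefixes that occur more than once
dup_prefixes <- unique(sub4[duplicated(sub4)])
dup_prefixes
```

A non-empty dup_prefixes means substr alone cannot be used as a unique key for the provinces.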

1.3

How many provinces are more than 50% Catholic? Define these provinces as Catholic and the other provinces as Protestant. Which kind of province has a higher average fertility rate? Which kind of province has a higher average rate of education beyond primary school for draftees?

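ANS: 18 provinces are more than 50% Catholic. The Catholic provinces have the higher average fertility, and the Protestant provinces have the higher average education level among draftees.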

length(which(swiss$Catholic>50))
[1] 18
Catholic<-which(swiss$Catholic>50)
Protestant<-which(swiss$Catholic<=50)
mean(swiss$Fertility[Catholic])-mean(swiss$Fertility[Protestant])
[1] 10.24042
mean(swiss$Education[Catholic])-mean(swiss$Education[Protestant])
[1] -3.02682
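
The same comparison can be written as a grouped summary, which shows both group means rather than only their difference. A minimal sketch using base R; the variable names religion, fert_means and edu_means are illustrative, and the 50% cut-off follows the question:

```r
# Label each province according to the 50% cut-off
religion <- ifelse(swiss$Catholic > 50, "Catholic", "Protestant")

# Number of provinces in each group
table(religion)

# Group means for fertility and education
fert_means <- tapply(swiss$Fertility, religion, mean)
edu_means  <- tapply(swiss$Education, religion, mean)

fert_means
edu_means
```

Subtracting the Protestant mean from the Catholic mean in each vector should reproduce the two differences printed above.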

3

Download the junior school project data file and read it into your current R session. Assign the data set to a data frame object called jsp.

setwd('K:\\Dropbox\\1042_dataM')
jsp<-read.table('juniorSchools.txt',header=TRUE)
head(jsp)
  school class sex soc ravens pupil english math year
1     S1    C1   G   9     23    P1      72   23    0
2     S1    C1   G   9     23    P1      80   24    1
3     S1    C1   G   9     23    P1      39   23    2
4     S1    C1   B   2     15    P2       7   14    0
5     S1    C1   B   2     15    P2      17   11    1
6     S1    C1   B   2     22    P3      88   36    0

3.5

Re-label the values of the variable ‘junior school year’: One = 1, Two = 2, Three = 3.

jsp$year<-jsp$year+1
head(jsp)
  school class sex soc ravens pupil english math year
1     S1    C1   G   9     23    P1      72   23    1
2     S1    C1   G   9     23    P1      80   24    2
3     S1    C1   G   9     23    P1      39   23    3
4     S1    C1   B   2     15    P2       7   14    1
5     S1    C1   B   2     15    P2      17   11    2
6     S1    C1   B   2     22    P3      88   36    1
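
If the question is read as asking for text labels rather than shifted numbers, a factor is one possible alternative to adding 1. The sketch below uses a hypothetical vector year that stands in for jsp$year as originally read in (coded 0, 1, 2); it is not the author's solution:

```r
# Hypothetical stand-in for the original jsp$year column (codes 0, 1, 2)
year <- c(0, 1, 2, 0, 1, 0)

# levels fixes the order of the codes, labels supplies the new names
year_f <- factor(year, levels = 0:2, labels = c("One", "Two", "Three"))

year_f
as.integer(year_f)   # underlying integer codes are 1, 2, 3
```

With this approach the printed values are One, Two and Three, while as.integer() still recovers 1, 2 and 3 when numbers are needed.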

3.6

Re-name the variable ‘sex’ as ‘gender’.

names(jsp)[names(jsp)=="sex"] <- "gender"
head(jsp)
  school class gender soc ravens pupil english math year
1     S1    C1      G   9     23    P1      72   23    1
2     S1    C1      G   9     23    P1      80   24    2
3     S1    C1      G   9     23    P1      39   23    3
4     S1    C1      B   2     15    P2       7   14    1
5     S1    C1      B   2     15    P2      17   11    2
6     S1    C1      B   2     22    P3      88   36    1

3.7

Move the variable ‘student ID’ from the 6th column to the third column and shift the rest down one column.

jsp<-jsp[c(1,2,6,3,4,5,7,8,9)]
head(jsp)
  school class pupil gender soc ravens english math year
1     S1    C1    P1      G   9     23      72   23    1
2     S1    C1    P1      G   9     23      80   24    2
3     S1    C1    P1      G   9     23      39   23    3
4     S1    C1    P2      B   2     15       7   14    1
5     S1    C1    P2      B   2     15      17   11    2
6     S1    C1    P3      B   2     22      88   36    1
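
Reordering by position works, but it silently breaks if the column layout changes. A sketch of the same move done by name, using a hypothetical one-row data frame demo with the column names shown above (the helper vector first is also illustrative):

```r
# Hypothetical one-row stand-in with the same column names as jsp
demo <- data.frame(school = "S1", class = "C1", gender = "G", soc = 9,
                   ravens = 23, pupil = "P1", english = 72, math = 23,
                   year = 1)

# Columns to put first; the rest follow in their existing order
first <- c("school", "class", "pupil")
demo <- demo[c(first, setdiff(names(demo), first))]

names(demo)
```

setdiff() keeps the remaining names in their original order, so only pupil changes position.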

3.8

Write jsp out as a csv file.

ANS: write.csv(jsp,'jsp.csv')

4

Solve the problem of data type conversion in the following R script.

y <- MASS::minn38
str(y$phs)
 Factor w/ 4 levels "C","E","N","O": 1 1 1 1 1 1 1 3 3 3 ...
y$phs <- as.numeric(y$phs)
y$phs<-factor(y$phs,labels=c("C","E","N","O"))
str(y$phs)
 Factor w/ 4 levels "C","E","N","O": 1 1 1 1 1 1 1 3 3 3 ...
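
The usual pitfall behind this kind of exercise is that as.numeric() applied to a factor returns the internal integer codes, not the values the labels represent. A minimal sketch with a hypothetical factor f (not taken from minn38):

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "20", "5"))

codes  <- as.numeric(f)                 # internal codes, one per sorted level
values <- as.numeric(as.character(f))   # the numbers the labels spell out
labs   <- levels(f)[f]                  # labels recovered from the codes

codes
values
labs
```

Going through as.character() first, or indexing levels() with the factor, is what keeps the labels from being lost in the conversion.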

5

Chatterjee and Hadi (Regression Analysis by Example, 2006) provided a link to the right-to-work data set on their web page. Read the data into an R session.

p<-read.table('P005.txt',header=TRUE, sep="\t")
head(p)
         City COL   PD URate     Pop Taxes Income RTWL
1     Atlanta 169  414  13.6 1790128  5128   2961    1
2      Austin 143  239  11.0  396891  4303   1711    1
3 Bakersfield 339   43  23.7  349874  4166   2122    0
4   Baltimore 173  951  21.0 2147850  5001   4654    0
5 Baton Rouge  99  255  16.0  411725  3965   1620    1
6      Boston 363 1257  24.4 3914071  4928   5634    0

6

The AAUP2 data set is a comma-delimited, fixed-column-format text file with '*' for missing values. Import the file into R and indicate missing values by 'NA'.

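# Read the fixed-width fields; widths gives the width of each column
aaup <-  read.fwf("aaup2.dat.txt",widths=c(6,31,3,4,5,rep(4,6),5,5,rep(4,4)))
#head(aaup)
# Remove all whitespace from every field (this also deletes the spaces inside institution names)
aaup<-sapply(aaup,gsub,pattern='\\s+',replacement='')
# With replacement=NA, gsub returns NA for every field that contains '*'
aaup<-apply(aaup,2,gsub,pattern='\\*',replacement=NA)
# Both steps return a character matrix, so the columns are no longer numeric after conversion
aaup<-as.data.frame(aaup)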
head(aaup,50)
      V1                           V2 V3  V4   V5   V6  V7  V8   V9  V10
1   1061      AlaskaPacificUniversity AK IIB  454  382 362 382  567  485
2   1063        Univ.Alaska-Fairbanks AK   I  686  560 432 508  914  753
3   1065        Univ.Alaska-Southeast AK IIA  533  494 329 415  716  663
4  11462        Univ.Alaska-Anchorage AK IIA  612  507 414 498  825  681
5   1002      AlabamaAgri.&Mech.Univ. AL IIA  442  369 310 350  530  444
6   1004       UniversityofMontevallo AL IIA  441  385 310 388  542  473
7   1008           AthensStateCollege AL IIB  466  394 351 396  558  476
8   1009        AuburnUniversity-Main AL   I  580  437 374 455  692  527
9   1012    BirminghamSouthernCollege AL IIB  498  379 322 401  655  501
10  1016          Univ.ofNorthAlabama AL IIB  506  412 359 411  607  508
11  1019            HuntingdonCollege AL IIB  339  303 287 301  421  371
12  1020       JacksonvilleStateUniv. AL IIA  461  389 338 386  585  496
13  1024         LivingstonUniversity AL IIB  360  304 258 300  433  369
14  1029           UniversityofMobile AL IIB  354  321 277 291  436  401
15  1033               OakwoodCollege AL IIB  301  290 283 290  375  363
16  1036            SamfordUniversity AL IIA  565  425 363 449  710  556
17  1041            SpringHillCollege AL IIB  431  352 311 373  518  425
18  1044              StillmanCollege AL IIB  321  288 251 272  388  354
19  1047     TroyStateUniversity-Main AL IIA  462  385 322 350  560  467
20  1050           TuskegeeUniversity AL IIA  410  352 306 327  487  415
21  1051          UniversityofAlabama AL   I  605  447 382 463  746  563
22  1052     Univ.AlabamaatBirmingham AL   I  633  445 366 461  786  569
23  1055     Univ.AlabamainHuntsville AL IIA  636  443 375 451  771  540
24  1057     UniversityofSouthAlabama AL IIA  542  426 370 418  645  514
25  8310      AuburnUniv.atMontgomery AL IIA  519  422 343 403  621  507
26  1085        Univ.Ark.atMonticello AR IIB  453  363 330 349  561  448
27  1086     Univ.ArkansasatPineBluff AR IIB  406  366 332 335  505  457
28  1088 ArkansasCollege(LyonCollege) AR IIB  499 <NA> 330 399  618 <NA>
29  1089       ArkansasTechUniversity AR IIB  439  381 337 374  550  479
30  1090      ArkansasStateUniv.-Main AR IIA  520  433 342 398  646  541
31  1092       Univ.ofCentralArkansas AR IIA  521  416 339 395  632  507
32  1094        UniversityoftheOzarks AR IIB  309  280 274 285  428  383
33  1098     HendersonStateUniversity AR IIB  447  375 341 375  561  480
34  1099               HendrixCollege AR IIB  485  399 362 421  664  534
35  1100          JohnBrownUniversity AR IIB  372  338 304 341  478  414
36  1101        Univ.Ark.atLittleRock AR IIA  536  416 372 415  651  506
37  1102    OuachitaBaptistUniversity AR IIB  422  355 289 342  529  445
38  1106       WilliamsBaptistCollege AR IIB <NA> <NA> 264 266 <NA> <NA>
39  1107       SouthernArk.Univ.-Main AR IIA  461  389 321 362  575  484
40  1108   Univ.Arkansas-Fayetteville AR   I  563  436 379 452  686  531
41 10311            HardingUniversity AR IIB  424  374 335 380  512  461
42  1074        GrandCanyonUniversity AZ IIB  358  328 274 299  439  395
43  1081       ArizonaStateUniversity AZ   I  603  449 399 489  723  549
44  1082    NorthernArizonaUniversity AZ   I  511  422 359 412  619  518
45  1083          UniversityofArizona AZ   I  648  462 411 537  771  561
46  1117       AzusaPacificUniversity CA IIA  430  375 318 368  573  494
47  1122              BiolaUniversity CA IIA  455  369 303 370  582  462
48  1131       CaliforniaInst.ofTech. CA   I  970  733 576 866 1204  909
49  1133      CaliforniaLutheranUniv. CA IIB  477  392 335 398  600  501
50  1137        Cal.St.Univ-Fullerton CA IIA  605  493 387 544  760  637
   V11  V12 V13 V14 V15 V16  V17
1  471  487   6  11   9   4   32
2  572  677  74 125 118  40  404
3  442  559   9  26  20   9   70
4  557  670 115 124 101  21  392
5  376  423  59  77 102  24  262
6  383  477  57  33  35   2  127
7  427  478  20  18  30   0   68
8  451  546 366 354 301  66 1109
9  404  523  34  25  27   3   89
10 445  503  67  40  66  27  200
11 347  366   8  15  19   2   44
12 436  493 106  42  66  58  272
13 313  363  27  25  33   4   89
14 346  363  17  19  31  19   86
15 355  362  18  28  28   3   77
16 476  578  83  46  77   9  215
17 374  449  23  17  14   1   55
18 312  335  13  18  18  10   59
19 392  426  25  59 100  19  204
20 362  387  57  65  85  45  254
21 483  580 267 206 206  76  762
22 475  587 106 163 107  19  406
23 455  548  72  87  98   7  282
24 450  504 119 103 142  64  434
25 411  483  56  54  63  28  201
26 402  428  23  24  48  22  117
27 415  419  40  33  71  46  192
28 402  488  14   5  21   2   42
29 427  471  44  71  52  13  180
30 431  498 103  87 141  63  394
31 416  482  93  89  82  76  340
32 376  391  14   9  15   2   40
33 435  477  55  42  33  25  155
34 453  562  24  26  14   1   65
35 375  426  23  22  10   3   58
36 450  504 128 117  72  63  398
37 375  435  36  14  28  19   97
38 327  331   4   4  12   4   24
39 402  453  27  31  34  23  115
40 461  550 314 198 225  54  806
41 401  460  80  46  41   8  179
42 332  363  20  18  24  12   74
43 489  593 576 445 251   7 1383
44 446  507 173 192 175  33  608
45 498  645 647 377 272   2 1349
46 428  488  43  62  32  12  156
47 371  464  32  57  31   6  126
48 717 1075 173  40  44   0  257
49 433  507  32  21  33   6   92
50 506  690 393 120 105   5  623
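
An alternative that keeps numeric columns numeric and preserves the spaces inside names is to let read.fwf handle the padding and the missing-value marker itself. The sketch below does not use the real AAUP2 file: it writes two hypothetical records with made-up widths (5, 18, 3, 5) to a temporary file, so the column names and widths are assumptions for illustration only.

```r
# Two hypothetical fixed-width records: id (5), name (18), state (3), salary (5)
rows <- sprintf("%5s %-17s%-3s%5s",
                c("1061", "1106"),
                c("Alaska Pacific", "Williams Baptist"),
                c("AK", "AR"),
                c("454", "*"))
tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# strip.white trims the padding around each field (inner spaces are kept);
# na.strings = "*" marks the placeholder as missing
demo <- read.fwf(tmp, widths = c(5, 18, 3, 5),
                 col.names = c("id", "name", "state", "salary"),
                 na.strings = "*", strip.white = TRUE,
                 stringsAsFactors = FALSE)
str(demo)
```

These extra arguments are passed through read.fwf to read.table, which is why no separate gsub step is needed in this sketch.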

7

The titanic data set records the survival of Titanic passengers and is stored in R data file format. Import the file into an R session and examine the file contents.

load('titanic.raw.rdata')
head(titanic.raw)
  Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No
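
head() shows only the first six rows; str(), dim() and table() give a fuller picture of the contents. Since titanic.raw.rdata may not be at hand, the sketch below builds a stand-in data frame named passengers from R's built-in Titanic table, on the assumption that it has the same four variables as titanic.raw:

```r
# Built-in 4-way contingency table -> one row per cell with a Freq column
cells <- as.data.frame(Titanic)

# Repeat each cell Freq times to get one row per person
passengers <- cells[rep(seq_len(nrow(cells)), cells$Freq),
                    c("Class", "Sex", "Age", "Survived")]
rownames(passengers) <- NULL

dim(passengers)                                # number of rows and columns
str(passengers)                                # variable types and levels
table(passengers$Class, passengers$Survived)   # survival counts by class
```

The same three calls can be applied to titanic.raw itself once the file has been loaded.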