Use the built-in swiss data set for this problem.
Find the average fertility rate for the 47 French-speaking provinces, and find the number of provinces with fertility rate above 60%.
head(swiss)
Fertility Agriculture Examination Education Catholic
Courtelary 80.2 17.0 15 12 9.96
Delemont 83.1 45.1 6 9 84.84
Franches-Mnt 92.5 39.7 5 5 93.40
Moutier 85.8 36.5 12 7 33.77
Neuveville 76.9 43.5 17 15 5.16
Porrentruy 76.1 35.3 9 7 90.57
Infant.Mortality
Courtelary 22.2
Delemont 22.2
Franches-Mnt 20.2
Moutier 20.3
Neuveville 20.6
Porrentruy 26.6
mean(swiss$Fertility)
[1] 70.14255
which(swiss$Fertility>60)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 20 21 22 25 26 27
[24] 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
length(which(swiss$Fertility>60))
[1] 39
Use abbreviate to shorten the names of the 47 provinces in the ‘swiss’ data set. Compare the results to using substr to extract 4 characters from the names of provinces. Which of the two functions gives better shortened names?
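ANS: abbreviate differs from substr in that abbreviate produces abbreviations that are at least minlength characters long and do not repeat, whereas substr simply takes the characters at the given positions, regardless of whether they include symbols or spaces or whether the results are duplicated. I therefore think abbreviate gives the better shortened names.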
abbreviate(rownames(swiss))
Courtelary Delemont Franches-Mnt Moutier Neuveville
"Crtl" "Dlmn" "Fr-M" "Motr" "Nvvl"
Porrentruy Broye Glane Gruyere Sarine
"Prrn" "Broy" "Glan" "Gryr" "Sarn"
Veveyse Aigle Aubonne Avenches Cossonay
"Vvys" "Aigl" "Abnn" "Avnc" "Cssn"
Echallens Grandson Lausanne La Vallee Lavaux
"Echl" "Grnd" "Lsnn" "LVll" "Lavx"
Morges Moudon Nyone Orbe Oron
"Mrgs" "Modn" "Nyon" "Orbe" "Oron"
Payerne Paysd'enhaut Rolle Vevey Yverdon
"Pyrn" "Pys'" "Roll" "Vevy" "Yvrd"
Conthey Entremont Herens Martigwy Monthey
"Cnth" "Entr" "Hrns" "Mrtg" "Mnth"
St Maurice Sierre Sion Boudry La Chauxdfnd
"StMr" "Sirr" "Sion" "Bdry" "LChx"
Le Locle Neuchatel Val de Ruz ValdeTravers V. De Geneve
"LLcl" "Ncht" "VldR" "VldT" "V.DG"
Rive Droite Rive Gauche
"RvDr" "RvGc"
substr(rownames(swiss),1,4)
[1] "Cour" "Dele" "Fran" "Mout" "Neuv" "Porr" "Broy" "Glan" "Gruy" "Sari"
[11] "Veve" "Aigl" "Aubo" "Aven" "Coss" "Echa" "Gran" "Laus" "La V" "Lava"
[21] "Morg" "Moud" "Nyon" "Orbe" "Oron" "Paye" "Pays" "Roll" "Veve" "Yver"
[31] "Cont" "Entr" "Here" "Mart" "Mont" "St M" "Sier" "Sion" "Boud" "La C"
[41] "Le L" "Neuc" "Val " "Vald" "V. D" "Rive" "Rive"
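The point about duplicates can be checked directly rather than by eye. The following is a minimal sketch using only base R and the built-in swiss data; the variable names `nm`, `ab`, and `ss` are introduced here for illustration.

```r
# Province names from the built-in swiss data set
nm <- rownames(swiss)

ab <- abbreviate(nm)       # default minlength = 4
ss <- substr(nm, 1, 4)     # first four characters, taken as-is

# anyDuplicated() returns 0 when no element repeats
anyDuplicated(ab) == 0

# Shortened names that occur more than once under substr
unique(ss[duplicated(ss)])

# Shortened names that still contain a space
ss[grepl(" ", ss)]
```

In the output listed above, substr produces "Veve" and "Rive" twice each, while the abbreviate results are all distinct.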
How many provinces have over 50% Catholic? Define these provinces as Catholic and the other provinces as Protestant. Which kind of provinces has a higher average fertility rate? Which kind of provinces has a higher average education rate beyond primary school for draftees?
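ANS: 18 provinces are more than 50% Catholic. The Catholic provinces have the higher average fertility, and the Protestant provinces have the higher average education level among draftees.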
length(which(swiss$Catholic>50))
[1] 18
Catholic<-which(swiss$Catholic>50)
Protestant<-which(swiss$Catholic<=50)
mean(swiss$Fertility[Catholic])-mean(swiss$Fertility[Protestant])
[1] 10.24042
mean(swiss$Education[Catholic])-mean(swiss$Education[Protestant])
[1] -3.02682
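The same comparison can also be written with a grouping vector, which reports both group means instead of only their difference. This is a sketch in base R; the names `religion`, `fert`, and `educ` are introduced here and are not part of the original answer.

```r
# Label each province by majority religion (the 50% threshold comes from the question)
religion <- ifelse(swiss$Catholic > 50, "Catholic", "Protestant")

# Number of provinces in each group
table(religion)

# Mean fertility and mean education within each group
fert <- tapply(swiss$Fertility, religion, mean)
educ <- tapply(swiss$Education, religion, mean)
fert
educ
```

Seeing the two means side by side makes it easier to judge the size of the gap than a single signed difference.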
Download the junior school project data file and read it into your current R session. Assign the data set to a data frame object called jsp.
setwd('K:\\Dropbox\\1042_dataM')
jsp<-read.table('juniorSchools.txt',header=TRUE)
head(jsp)
school class sex soc ravens pupil english math year
1 S1 C1 G 9 23 P1 72 23 0
2 S1 C1 G 9 23 P1 80 24 1
3 S1 C1 G 9 23 P1 39 23 2
4 S1 C1 B 2 15 P2 7 14 0
5 S1 C1 B 2 15 P2 17 11 1
6 S1 C1 B 2 22 P3 88 36 0
Re-label the values of the variable ‘junior school year’: One = 1, Two = 2, Three = 3.
jsp$year<-jsp$year+1
head(jsp)
school class sex soc ravens pupil english math year
1 S1 C1 G 9 23 P1 72 23 1
2 S1 C1 G 9 23 P1 80 24 2
3 S1 C1 G 9 23 P1 39 23 3
4 S1 C1 B 2 15 P2 7 14 1
5 S1 C1 B 2 15 P2 17 11 2
6 S1 C1 B 2 22 P3 88 36 1
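If the exercise is read as asking for the text labels One, Two, and Three rather than the numbers 1 to 3, a factor is one way to attach them. The sketch below uses a small stand-in data frame, `toy`, which is hypothetical and only mimics the original 0/1/2 coding of `year` shown in the first head(jsp) output.

```r
# Stand-in for the year column with its original 0/1/2 coding
toy <- data.frame(year = c(0, 1, 2, 0, 1, 0))

# Map each code to a label; levels fixes the order of the categories
toy$year_label <- factor(toy$year,
                         levels = 0:2,
                         labels = c("One", "Two", "Three"))

toy
table(toy$year_label)
```

Keeping the labelled version in a separate column leaves the numeric codes available for arithmetic.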
Re-name the variable ‘sex’ as ‘gender’.
names(jsp)[names(jsp)=="sex"] <- "gender"
head(jsp)
school class gender soc ravens pupil english math year
1 S1 C1 G 9 23 P1 72 23 1
2 S1 C1 G 9 23 P1 80 24 2
3 S1 C1 G 9 23 P1 39 23 3
4 S1 C1 B 2 15 P2 7 14 1
5 S1 C1 B 2 15 P2 17 11 2
6 S1 C1 B 2 22 P3 88 36 1
Move the variable ‘student ID’ from the 6th column to the third column and shift the rest down one column.
jsp<-jsp[c(1,2,6,3,4,5,7,8,9)]
head(jsp)
school class pupil gender soc ravens english math year
1 S1 C1 P1 G 9 23 72 23 1
2 S1 C1 P1 G 9 23 80 24 2
3 S1 C1 P1 G 9 23 39 23 3
4 S1 C1 P2 B 2 15 7 14 1
5 S1 C1 P2 B 2 15 17 11 2
6 S1 C1 P3 B 2 22 88 36 1
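Reordering by position works but breaks silently if the columns change. A name-based version is sketched below on a one-row stand-in data frame, `toy`, which is hypothetical and only copies the column names of jsp; `front` is also a name introduced here.

```r
# One-row stand-in with the same column names as jsp (after the rename to gender)
toy <- data.frame(school = "S1", class = "C1", gender = "G", soc = 9,
                  ravens = 23, pupil = "P1", english = 72, math = 23, year = 1)

# Columns that should come first, in order
front <- c("school", "class", "pupil")

# setdiff() keeps the remaining columns in their existing order
toy <- toy[c(front, setdiff(names(toy), front))]
names(toy)
```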
Write jsp out as a csv file.
ANS: write.csv(jsp,'jsp.csv')
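By default write.csv also writes the row names as an unnamed first column, which reappears as an extra column when the file is read back. The sketch below shows the round trip with `row.names = FALSE`; the data frame `toy` and the temporary file path are stand-ins, not the actual jsp data.

```r
# Small stand-in data frame
toy <- data.frame(school = c("S1", "S1"),
                  pupil  = c("P1", "P2"),
                  math   = c(23, 14))

# Write to a temporary file without the row-name column
out <- tempfile(fileext = ".csv")
write.csv(toy, out, row.names = FALSE)

# Read it back and confirm the columns are unchanged
back <- read.csv(out)
names(back)
```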
Solve the problem of data type conversion in the following R script.
y <- MASS::minn38
str(y$phs)
Factor w/ 4 levels "C","E","N","O": 1 1 1 1 1 1 1 3 3 3 ...
y$phs <- as.numeric(y$phs)
y$phs<-factor(y$phs,labels=c("C","E","N","O"))
str(y$phs)
Factor w/ 4 levels "C","E","N","O": 1 1 1 1 1 1 1 3 3 3 ...
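The underlying issue is that as.numeric applied to a factor returns the internal integer codes and discards the labels. The sketch below reproduces this on a small hypothetical factor, `phs`, and restores the labels by reusing the saved levels instead of retyping them; it is an illustration, not the script from the exercise.

```r
# Hypothetical factor with the same four levels as minn38$phs
phs <- factor(c("C", "C", "N", "E", "O"))
lev <- levels(phs)                 # save the labels before converting

codes <- as.numeric(phs)           # integer codes only; labels are gone
codes

# Rebuild the factor: code i maps back to the i-th saved level
restored <- factor(codes, levels = seq_along(lev), labels = lev)
restored
```

Reusing the saved levels avoids typing the labels by hand, where a stray change of case would silently create a different level.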
Chatterjee and Hadi (Regression Analysis by Example, 2006) provided a link to the right-to-work data set on their web page. Read the data into an R session.
p<-read.table('P005.txt',header=TRUE, sep="\t")
head(p)
City COL PD URate Pop Taxes Income RTWL
1 Atlanta 169 414 13.6 1790128 5128 2961 1
2 Austin 143 239 11.0 396891 4303 1711 1
3 Bakersfield 339 43 23.7 349874 4166 2122 0
4 Baltimore 173 951 21.0 2147850 5001 4654 0
5 Baton Rouge 99 255 16.0 411725 3965 1620 1
6 Boston 363 1257 24.4 3914071 4928 5634 0
The AAUP2 data set is a comma-delimited fixed column format text file with ‘*’ for missing values. Import the file into R and indicate missing values by ‘NA’.
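# Read the fixed-width fields using the column widths of the file
aaup <- read.fwf("aaup2.dat.txt",widths=c(6,31,3,4,5,rep(4,6),5,5,rep(4,4)))
#head(aaup)
# Remove all whitespace (this also removes the spaces inside institution names)
aaup<-sapply(aaup,gsub,pattern='\\s+',replacement='')
# Set every field that contains '*' to NA
aaup<-apply(aaup,2,gsub,pattern='\\*',replacement=NA)
aaup<-as.data.frame(aaup)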
head(aaup,50)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1061 AlaskaPacificUniversity AK IIB 454 382 362 382 567 485
2 1063 Univ.Alaska-Fairbanks AK I 686 560 432 508 914 753
3 1065 Univ.Alaska-Southeast AK IIA 533 494 329 415 716 663
4 11462 Univ.Alaska-Anchorage AK IIA 612 507 414 498 825 681
5 1002 AlabamaAgri.&Mech.Univ. AL IIA 442 369 310 350 530 444
6 1004 UniversityofMontevallo AL IIA 441 385 310 388 542 473
7 1008 AthensStateCollege AL IIB 466 394 351 396 558 476
8 1009 AuburnUniversity-Main AL I 580 437 374 455 692 527
9 1012 BirminghamSouthernCollege AL IIB 498 379 322 401 655 501
10 1016 Univ.ofNorthAlabama AL IIB 506 412 359 411 607 508
11 1019 HuntingdonCollege AL IIB 339 303 287 301 421 371
12 1020 JacksonvilleStateUniv. AL IIA 461 389 338 386 585 496
13 1024 LivingstonUniversity AL IIB 360 304 258 300 433 369
14 1029 UniversityofMobile AL IIB 354 321 277 291 436 401
15 1033 OakwoodCollege AL IIB 301 290 283 290 375 363
16 1036 SamfordUniversity AL IIA 565 425 363 449 710 556
17 1041 SpringHillCollege AL IIB 431 352 311 373 518 425
18 1044 StillmanCollege AL IIB 321 288 251 272 388 354
19 1047 TroyStateUniversity-Main AL IIA 462 385 322 350 560 467
20 1050 TuskegeeUniversity AL IIA 410 352 306 327 487 415
21 1051 UniversityofAlabama AL I 605 447 382 463 746 563
22 1052 Univ.AlabamaatBirmingham AL I 633 445 366 461 786 569
23 1055 Univ.AlabamainHuntsville AL IIA 636 443 375 451 771 540
24 1057 UniversityofSouthAlabama AL IIA 542 426 370 418 645 514
25 8310 AuburnUniv.atMontgomery AL IIA 519 422 343 403 621 507
26 1085 Univ.Ark.atMonticello AR IIB 453 363 330 349 561 448
27 1086 Univ.ArkansasatPineBluff AR IIB 406 366 332 335 505 457
28 1088 ArkansasCollege(LyonCollege) AR IIB 499 <NA> 330 399 618 <NA>
29 1089 ArkansasTechUniversity AR IIB 439 381 337 374 550 479
30 1090 ArkansasStateUniv.-Main AR IIA 520 433 342 398 646 541
31 1092 Univ.ofCentralArkansas AR IIA 521 416 339 395 632 507
32 1094 UniversityoftheOzarks AR IIB 309 280 274 285 428 383
33 1098 HendersonStateUniversity AR IIB 447 375 341 375 561 480
34 1099 HendrixCollege AR IIB 485 399 362 421 664 534
35 1100 JohnBrownUniversity AR IIB 372 338 304 341 478 414
36 1101 Univ.Ark.atLittleRock AR IIA 536 416 372 415 651 506
37 1102 OuachitaBaptistUniversity AR IIB 422 355 289 342 529 445
38 1106 WilliamsBaptistCollege AR IIB <NA> <NA> 264 266 <NA> <NA>
39 1107 SouthernArk.Univ.-Main AR IIA 461 389 321 362 575 484
40 1108 Univ.Arkansas-Fayetteville AR I 563 436 379 452 686 531
41 10311 HardingUniversity AR IIB 424 374 335 380 512 461
42 1074 GrandCanyonUniversity AZ IIB 358 328 274 299 439 395
43 1081 ArizonaStateUniversity AZ I 603 449 399 489 723 549
44 1082 NorthernArizonaUniversity AZ I 511 422 359 412 619 518
45 1083 UniversityofArizona AZ I 648 462 411 537 771 561
46 1117 AzusaPacificUniversity CA IIA 430 375 318 368 573 494
47 1122 BiolaUniversity CA IIA 455 369 303 370 582 462
48 1131 CaliforniaInst.ofTech. CA I 970 733 576 866 1204 909
49 1133 CaliforniaLutheranUniv. CA IIB 477 392 335 398 600 501
50 1137 Cal.St.Univ-Fullerton CA IIA 605 493 387 544 760 637
V11 V12 V13 V14 V15 V16 V17
1 471 487 6 11 9 4 32
2 572 677 74 125 118 40 404
3 442 559 9 26 20 9 70
4 557 670 115 124 101 21 392
5 376 423 59 77 102 24 262
6 383 477 57 33 35 2 127
7 427 478 20 18 30 0 68
8 451 546 366 354 301 66 1109
9 404 523 34 25 27 3 89
10 445 503 67 40 66 27 200
11 347 366 8 15 19 2 44
12 436 493 106 42 66 58 272
13 313 363 27 25 33 4 89
14 346 363 17 19 31 19 86
15 355 362 18 28 28 3 77
16 476 578 83 46 77 9 215
17 374 449 23 17 14 1 55
18 312 335 13 18 18 10 59
19 392 426 25 59 100 19 204
20 362 387 57 65 85 45 254
21 483 580 267 206 206 76 762
22 475 587 106 163 107 19 406
23 455 548 72 87 98 7 282
24 450 504 119 103 142 64 434
25 411 483 56 54 63 28 201
26 402 428 23 24 48 22 117
27 415 419 40 33 71 46 192
28 402 488 14 5 21 2 42
29 427 471 44 71 52 13 180
30 431 498 103 87 141 63 394
31 416 482 93 89 82 76 340
32 376 391 14 9 15 2 40
33 435 477 55 42 33 25 155
34 453 562 24 26 14 1 65
35 375 426 23 22 10 3 58
36 450 504 128 117 72 63 398
37 375 435 36 14 28 19 97
38 327 331 4 4 12 4 24
39 402 453 27 31 34 23 115
40 461 550 314 198 225 54 806
41 401 460 80 46 41 8 179
42 332 363 20 18 24 12 74
43 489 593 576 445 251 7 1383
44 446 507 173 192 175 33 608
45 498 645 647 377 272 2 1349
46 428 488 43 62 32 12 156
47 371 464 32 57 31 6 126
48 717 1075 173 40 44 0 257
49 433 507 32 21 33 6 92
50 506 690 393 120 105 5 623
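In the output above the institution names have lost their internal spaces, and every column is stored as text. One possible alternative, sketched below, lets read.fwf handle the padding and the ‘*’ marker itself through the `strip.white` and `na.strings` arguments it passes on to read.table. The two-row file, its widths, and the column names are hypothetical and much simpler than the real AAUP2 layout.

```r
# Build a tiny fixed-width file (hypothetical layout:
# id 5 wide, name 26 wide, state 3 wide, two numbers 4 wide each)
rows <- c(
  sprintf("%5d%-26s%-3s%4s%4s", 1061L, "Alaska Pacific University", "AK", "454", "382"),
  sprintf("%5d%-26s%-3s%4s%4s", 1088L, "Arkansas College", "AR", "499", "*")
)
tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# strip.white trims the padding around each field (not inside it);
# na.strings turns a field that is just "*" into NA
toy <- read.fwf(tmp, widths = c(5, 26, 3, 4, 4),
                col.names = c("id", "name", "state", "sal1", "sal2"),
                na.strings = "*", strip.white = TRUE)
str(toy)
```

With this approach the numeric columns stay numeric and the names keep their spaces, so no gsub pass is needed afterwards.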
The titanic data set records the survival of Titanic passengers and is stored in R data file format. Import the file into an R session and examine the file contents.
load('titanic.raw.rdata')
head(titanic.raw)
Class Sex Age Survived
1 3rd Male Child No
2 3rd Male Child No
3 3rd Male Child No
4 3rd Male Child No
5 3rd Male Child No
6 3rd Male Child No
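head shows only the first six rows, so a fuller examination would usually add str, summary, and a cross-tabulation. Because the .rdata file is not available here, the sketch below builds a stand-in, `raw`, by expanding R's built-in Titanic table to one row per person; that this stand-in matches titanic.raw row for row is an assumption.

```r
# Built-in 4-way table converted to a data frame with a Freq column
tab <- as.data.frame(Titanic)

# Repeat each combination Freq times to get one row per person
raw <- tab[rep(seq_len(nrow(tab)), tab$Freq),
           c("Class", "Sex", "Age", "Survived")]
rownames(raw) <- NULL

str(raw)                          # variable types and levels
summary(raw)                      # counts per level
table(raw$Class, raw$Survived)    # survival by class
```

With the real file, load() invisibly returns the names of the objects it restores, so wrapping the call in print() shows what the file contained.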