Asad Zaidi
Implement the steps for cleaning the OrdwayBirdsOrig data, including WingChord.
Determine which Data Entry Person makes more mistakes in entering the SpeciesName. Hint: compare SpeciesName to SpeciesNameCleaned.
data(OrdwayBirdsOrig)
groupBy(OrdwayBirdsOrig, by = Year)
## Year count
## 1 4
## 2 1968 24
## 3 1969 9
## 4 1970 25
## 5 1971 63
## 6 1972 1500
## 7 1973 2434
## 8 1974 120
## 9 1975 1882
## 10 1976 2214
## 11 1978 1402
## 12 1979 928
## 13 1980 1138
## 14 1981 1031
## 15 1982 934
## 16 1983 1231
## 17 1984 811
## 18 1985 56
## 19 1994 22
## 20 2979 1
## 21 Year (19xx) 0
snames = with(OrdwayBirdsOrig, levels(SpeciesName))
namesCleaned = fetchGoogle("https://docs.google.com/spreadsheet/pub?key=0Av2C2RiwUxpVdFlPWUp6NERSQzhld3o4QklQd1p6d2c&single=true&gid=0&output=csv")
## Loading required package: RCurl
## Loading required package: bitops
birds = join(OrdwayBirdsOrig, namesCleaned)
## Joining by: SpeciesName
groupBy(birds, by = SpeciesNameCleaned)
## SpeciesNameCleaned count
## 1 Acadian Flycatcher 1
## 2 American Goldfinch 1204
## 3 Baltimore Oriole 208
## 4 Black and White Warbler 10
## 5 Black-billed Cookoo 16
## 6 Black-capped Chickadee 1327
## 7 Black-throat Sparrow 62
## 8 Brown-headed Cowbird 4
## 9 Cardinal 77
## 10 Carolina Chickadee 1
## 11 Catbird 554
## 12 Cedar Waxwing 59
## 13 Chestnut-backed Chickadee 3
## 14 Chestnut-sided Warbler 1
## 15 Chickadee 3
## 16 Chipping Sparrow 319
## 17 Clay-colored Sparrow 14
## 18 Cowbird 90
## 19 Curve-billed Thrasher 14
## 20 Eastern Phoebe 12
## 21 Eastern Wood Pewee 6
## 22 Field Sparrow 1164
## 23 Golden-Crowned Kinglet 9
## 24 Gray - cheeked Thrush 2
## 25 Great Crested Flycatcher 3
## 26 Harris's Sparrow 16
## 27 House Wren 460
## 28 Kestrel 2
## 29 Least Flycatcher 372
## 30 Lincoln's Sparrow 790
## 31 Lost 5
## 32 Mourning Dove 29
## 33 Mourning Warbler 4
## 34 Myrtle Warbler 454
## 35 N/A 9
## 36 Nashville Warbler 160
## 37 Northern Shrike 11
## 38 Northern Waterthrush 13
## 39 Northern Yellowthroat 7
## 40 Olive-sided Flycatcher 22
## 41 Orange-Crowned Warbler 57
## 42 Orchard Oriole 8
## 43 Oregon Junco 3
## 44 Ovenbird 24
## 45 Palm Warbler 127
## 46 Partly-cloudy; light winds 1
## 47 Pectoral Sandpiper 1
## 48 Pewee 9
## 49 Phainopepla 1
## 50 Philadelphia Vireo 21
## 51 Phoebe 19
## 52 Pine Siskin 35
## 53 Purple Finch 122
## 54 Pyrrhuloxia 1
## 55 Red-Bellied Woodpecker 9
## 56 Red-Breast Grosbeak 20
## 57 Red-Winged Blackbird 53
## 58 Red-bellied Sapsucker 1
## 59 Red-eyed Cowbird 1
## 60 Red-eyed Viero 43
## 61 Red-headed Woodpecker 1
## 62 Red-tailed Hawk 1
## 63 Redstart 3
## 64 Robin 608
## 65 Rose Breasted Grosbeak 201
## 66 Ruby-Crested Kinglet 21
## 67 Ruby-crown Kinglet 112
## 68 Ruby-throated Hummingbird 5
## 69 Rufous-sided Towhee 10
## 70 Savannah Sparrow 1
## 71 Slate-colored Junco 2732
## 72 Solitary Vireo 14
## 73 Song Sparrow 512
## 74 Sparrow Hawk 1
## 75 Starling 37
## 76 Steller's Jay 11
## 77 Swainson's Thrush 103
## 78 Swamp Sparrow 83
## 79 Tennessee Warbler 86
## 80 Traill's Flycatcher 47
## 81 Tree L 2
## 82 Tree Swallow 1537
## 83 Tufted Titmouse 2
## 84 Unknown 1
## 85 Varied Thrush 2
## 86 Veery 6
## 87 Vesper Sparrow 2
## 88 Warbling Vireo 1
## 89 White-Crested Sparrow 1
## 90 White-Fronted Dove 1
## 91 White-breasted Nuthatch 281
## 92 White-crowned Sparrow 95
## 93 White-eyed Vireo 1
## 94 White-throat Sparrow 328
## 95 White-winged Junco 2
## 96 Wilson's Warbler 26
## 97 Winter Wren 1
## 98 Wood Pewee 37
## 99 Wood Thrush 3
## 100 Woodcock 1
## 101 Wren 2
## 102 Yellow Shafted Flicker 17
## 103 Yellow Warbler 19
## 104 Yellow-bellied Flycatcher 7
## 105 Yellow-bellied Sapsucker 3
## 106 Yellow-tailed Oriole 1
## 107 Yellowthroat 107
## 108 none 2
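The hint about comparing SpeciesName with SpeciesNameCleaned can be sketched on a toy table. This is only an illustration: the data frame `toy`, its values, and the column name `DataEntryPerson` are assumptions made for the example, not taken from the output above.

```r
# Toy stand-in for the joined table; all names and values here are made up.
toy <- data.frame(
  DataEntryPerson    = c("A", "A", "A", "B", "B", "B"),
  SpeciesName        = c("Robin", "Robbin", "Cat bird", "Robin", "Catbird", "Robbin"),
  SpeciesNameCleaned = c("Robin", "Robin", "Catbird", "Robin", "Catbird", "Robin"),
  stringsAsFactors = FALSE
)

# An entry counts as a mistake when the raw name differs from the cleaned name.
toy$mismatch <- toy$SpeciesName != toy$SpeciesNameCleaned

# Share of mismatched entries for each data-entry person.
error_rate <- tapply(toy$mismatch, toy$DataEntryPerson, mean)
print(error_rate)
```

Comparing rates rather than raw counts keeps the comparison fair when one person entered many more records than another.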
with(birds, class(Year))
## [1] "factor"
with(birds, levels(Year))
## [1] "" "1968" "1969" "1970" "1971"
## [6] "1972" "1973" "1974" "1975" "1976"
## [11] "1978" "1979" "1980" "1981" "1982"
## [16] "1983" "1984" "1985" "1994" "2979"
## [21] "Year (19xx)"
groupBy(birds, by = Year)
## Year count
## 1 4
## 2 1968 24
## 3 1969 9
## 4 1970 25
## 5 1971 64
## 6 1972 1674
## 7 1973 2706
## 8 1974 120
## 9 1975 1999
## 10 1976 2289
## 11 1978 1464
## 12 1979 1014
## 13 1980 1222
## 14 1981 1088
## 15 1982 1044
## 16 1983 1383
## 17 1984 912
## 18 1985 56
## 19 1994 22
## 20 2979 1
## 21 Year (19xx) 0
birds = transform(birds, Year = as.numeric(as.character(Year)))
birds = subset(birds, Year %in% 1960:2020)
groupBy(birds, by = Year)
## Year count
## 1 1968 24
## 2 1969 9
## 3 1970 25
## 4 1971 64
## 5 1972 1674
## 6 1973 2706
## 7 1974 120
## 8 1975 1999
## 9 1976 2289
## 10 1978 1464
## 11 1979 1014
## 12 1980 1222
## 13 1981 1088
## 14 1982 1044
## 15 1983 1383
## 16 1984 912
## 17 1985 56
## 18 1994 22
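The Year column is a factor, which is why the conversion above goes through as.character() first. A small sketch of the difference, using a made-up factor `year` rather than the real column:

```r
# Made-up factor mimicking the kinds of levels seen in Year.
year <- factor(c("", "1968", "1972", "2979", "Year (19xx)"))

# Converting the factor directly returns its integer level codes, not years.
codes <- as.numeric(year)

# Going through character first parses the labels; non-numeric labels become NA.
years <- suppressWarnings(as.numeric(as.character(year)))

# %in% treats NA as "not in range", so blanks and header text drop out too.
keep <- years %in% 1960:2020
print(years[keep])
```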
groupBy(birds, by = Month)
## Month count
## 1 0
## 2 1 660
## 3 10 3549
## 4 11 1166
## 5 12 554
## 6 2 601
## 7 25 1
## 8 3 906
## 9 4 1667
## 10 5 2780
## 11 6 1124
## 12 7 1159
## 13 8 875
## 14 9 2073
## 15 Month 0
birds = transform(birds, month = as.numeric(as.character(Month)))
groupBy(birds, by = month)
## month count
## 1 1 660
## 2 2 601
## 3 3 906
## 4 4 1667
## 5 5 2780
## 6 6 1124
## 7 7 1159
## 8 8 875
## 9 9 2073
## 10 10 3549
## 11 11 1166
## 12 12 554
## 13 25 1
birds = subset(birds, month %in% 1:12)
groupBy(birds, by = month)
## month count
## 1 1 660
## 2 2 601
## 3 3 906
## 4 4 1667
## 5 5 2780
## 6 6 1124
## 7 7 1159
## 8 8 875
## 9 9 2073
## 10 10 3549
## 11 11 1166
## 12 12 554
groupBy(birds, by = Day)
## Day count
## 1 0
## 2 1 566
## 3 10 544
## 4 11 516
## 5 12 475
## 6 13 693
## 7 14 608
## 8 15 600
## 9 16 579
## 10 17 494
## 11 18 573
## 12 19 539
## 13 1975 0
## 14 2 583
## 15 20 542
## 16 21 498
## 17 22 467
## 18 23 580
## 19 24 587
## 20 25 515
## 21 26 633
## 22 27 551
## 23 28 508
## 24 29 459
## 25 3 663
## 26 30 507
## 27 31 317
## 28 4 637
## 29 5 619
## 30 6 566
## 31 7 569
## 32 8 598
## 33 80 1
## 34 9 527
## 35 Day 0
birds = transform(birds, Day = as.numeric(as.character(Day)))
birds = subset(birds, Day %in% 1:31)
groupBy(birds, Day)
## Day count
## 1 1 566
## 2 2 583
## 3 3 663
## 4 4 637
## 5 5 619
## 6 6 566
## 7 7 569
## 8 8 598
## 9 9 527
## 10 10 544
## 11 11 516
## 12 12 475
## 13 13 693
## 14 14 608
## 15 15 600
## 16 16 579
## 17 17 494
## 18 18 573
## 19 19 539
## 20 20 542
## 21 21 498
## 22 22 467
## 23 23 580
## 24 24 587
## 25 25 515
## 26 26 633
## 27 27 551
## 28 28 508
## 29 29 459
## 30 30 507
## 31 31 317
# Strip the "grams" unit, then convert the cleaned text to numbers.
# Blank and other non-numeric entries become NA.
birds = transform(birds, weight = gsub("grams", "", as.character(Weight), fixed = TRUE))
birds = transform(birds, weight = as.numeric(as.character(weight)))
## Warning: NAs introduced by coercion
groupBy(birds, is.na(Weight))
## is.na(Weight) count
## 1 FALSE 17113
groupBy(birds, is.na(weight))
## is.na(weight) count
## 1 FALSE 11943
## 2 TRUE 5170
densityplot(~weight, data = birds)
# Treat implausibly heavy records (over 200 grams) as missing.
birds = transform(birds, weight = ifelse(weight > 200, NA, weight))
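The order of the cleaning steps matters: the unit has to be stripped from the text before the text is converted to a number. A minimal sketch on a made-up vector `Weight` (the values are illustrative only):

```r
# Made-up raw entries: plain numbers, a blank, unit suffixes, and a placeholder.
Weight <- factor(c("12.5", "", "18 grams", "11grams", "N/A"))

# Step 1: work on the labels, remove the unit, and trim leftover spaces.
txt <- trimws(gsub("grams", "", as.character(Weight), fixed = TRUE))

# Step 2: convert the cleaned text; anything still non-numeric becomes NA.
weight <- suppressWarnings(as.numeric(txt))

# Tally how many entries survived the conversion.
print(table(is.na(weight)))
```

Each step reads the result of the previous one. Converting the original column again in step 2 would discard the unit stripping.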
with(birds, class(WingChord))
## [1] "factor"
with(birds, levels(WingChord))
## [1] "" "10" "10.2" "100" "101"
## [6] "102" "103" "104" "105" "106"
## [11] "107" "108" "109" "11.5" "110"
## [16] "111" "112" "113" "114" "115"
## [21] "116" "117" "118" "119" "12.1"
## [26] "12.6" "12.9" "120" "121" "122"
## [31] "123" "124" "125" "125 mm" "126"
## [36] "126 mm" "127" "128" "129" "129 mm"
## [41] "13.1" "13.5" "130" "130 (84.9)" "131"
## [46] "131 mm" "132" "133" "133 (86.3)" "134"
## [51] "135" "136" "137" "138" "139"
## [56] "140" "140 mm" "141" "142" "143"
## [61] "144" "145" "146" "147" "148"
## [66] "149" "15" "150" "151" "152"
## [71] "153" "154" "155" "156" "158"
## [76] "159" "160" "163" "170" "181"
## [81] "187" "19.4" "195" "197" "21"
## [86] "222" "26" "28" "38" "39"
## [91] "41" "44" "46" "47" "48"
## [96] "49" "50" "51" "52" "53"
## [101] "54" "55" "56" "57" "58"
## [106] "59" "6" "60" "61" "62"
## [111] "62 mm" "63" "64" "64 mm" "65"
## [116] "65 mm" "66" "67" "68" "68 mm"
## [121] "69" "69 mm" "7.3" "70" "70 mm"
## [126] "71" "71 mm" "71.5" "72" "72 mm"
## [131] "73" "73 mm" "74" "74 mm" "75"
## [136] "75 mm" "76" "76 mm" "77" "77 mm"
## [141] "78" "78 mm" "79" "79 mm" "80"
## [146] "80 mm" "81" "81 mm" "82" "83"
## [151] "84" "85" "85 mm" "86" "87"
## [156] "88" "89" "90" "91" "91.6"
## [161] "92" "93" "93 mm" "94" "95"
## [166] "96" "96 mm" "97" "98" "99"
## [171] "N/A" "none" "p/s 10 63" "p/s 11 57" "p/s 11 62"
## [176] "p/s 12 59" "p/s 9 57" "Wing chord"
# Strip the "mm" unit from the WingChord labels, then convert to numbers.
# Blank and other non-numeric entries become NA.
birds = transform(birds, WC = gsub("mm", "", as.character(WingChord), fixed = TRUE))
birds = transform(birds, WC = as.numeric(as.character(WC)))
## Warning: NAs introduced by coercion
groupBy(birds, is.na(WingChord))
## is.na(WingChord) count
## 1 FALSE 17113
groupBy(birds, is.na(WC))
## is.na(WC) count
## 1 FALSE 9788
## 2 TRUE 7325
densityplot(~WC, data = birds)
birds = transform(birds, WC = ifelse(WC < 20, NA, WC))
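Some WingChord levels, such as "130 (84.9)", carry a usable number followed by extra text, and stripping "mm" alone turns them into NA. One possible refinement, sketched on a made-up vector `raw`, keeps only a leading number. Treating the leading number as the wing chord is an assumption, and entries such as "p/s 10 63" are left as NA because their meaning is unclear.

```r
# Made-up sample of the kinds of labels seen in WingChord.
raw <- c("125 mm", "71.5", "130 (84.9)", "p/s 10 63", "N/A", "")

# Only entries that begin with a digit are treated as measurements.
has_lead <- grepl("^[0-9]", raw)

# Keep the leading integer or decimal and drop whatever follows it.
lead <- ifelse(has_lead, sub("^([0-9]+(\\.[0-9]+)?).*$", "\\1", raw), NA)

WC <- as.numeric(lead)
print(WC)
```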
The countrySynonyms data file (you can load it with data(countrySynonyms)) gives word synonyms for each country listed in the World-Map software, together with an official ISO 3-letter country code. This data set is in “wide” format. To turn it into narrow format, you can do this:
library(reshape2)  # provides melt()
data(countrySynonyms)
foo <- melt(countrySynonyms, id.vars = c("ID", "ISO3"), value.name = "Country",
    measure.vars = names(countrySynonyms)[-(1:2)], variable.name = "whence")
countrySynonymsLong <- subset(foo, !is.na(Country))
countrySynonymsLong$whence <- NULL
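To see what the wide-to-narrow conversion does, here is a base-R sketch on a tiny made-up table. It does not use melt(); the data frame `wide` and its columns `name1` and `name2` are hypothetical stand-ins for the synonym columns.

```r
# Tiny made-up "wide" table: one row per country, one column per synonym.
wide <- data.frame(
  ID    = 1:2,
  ISO3  = c("usa", "gbr"),
  name1 = c("United States", "United Kingdom"),
  name2 = c("USA", NA),
  stringsAsFactors = FALSE
)

# Every column other than the identifiers holds a candidate country name.
name_cols <- setdiff(names(wide), c("ID", "ISO3"))

# Build one narrow piece per synonym column, then stack the pieces.
pieces <- lapply(name_cols, function(col) {
  data.frame(ID = wide$ID, ISO3 = wide$ISO3, Country = wide[[col]],
             stringsAsFactors = FALSE)
})
long <- do.call(rbind, pieces)

# Drop the empty slots, as the subset() call above does.
long <- subset(long, !is.na(Country))
print(long)
```

In the narrow form each row pairs one code with one spelling, which is the shape needed for matching names against another table.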
Using countrySynonymsLong, find the names of any countries in the Gapminder data that don't have corresponding names in countrySynonymsLong. Hint: join() and subset() are good tools.
Find the name in countrySynonymsLong that is the non-exact match to each such name and fix the Gapminder data accordingly, so that every possible country in the Gapminder data shows up in the map.
The countryRegions data file (you can load it with data(countryRegions) or get documentation with help(countryRegions)) lets you aggregate countries in various ways.
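For the task of finding Gapminder names with no match among the synonyms, a rough sketch with made-up tables follows. It uses base-R %in% in place of join(), and the data frames `gap` and `synonyms`, their contents, and the lookup `fixes` are all hypothetical.

```r
# Made-up stand-ins for the Gapminder names and the narrow synonym table.
gap <- data.frame(country = c("France", "Korea, Rep.", "Yemen, Rep."),
                  stringsAsFactors = FALSE)
synonyms <- data.frame(ISO3 = c("fra", "kor", "yem"),
                       Country = c("France", "South Korea", "Yemen"),
                       stringsAsFactors = FALSE)

# Names that appear in the Gapminder stand-in but in no synonym row.
unmatched <- subset(gap, !(country %in% synonyms$Country))
print(unmatched)

# Hand-built corrections mapping each unmatched name to its closest synonym.
fixes <- c("Korea, Rep." = "South Korea", "Yemen, Rep." = "Yemen")
needs_fix <- gap$country %in% names(fixes)
gap$country[needs_fix] <- unname(fixes[gap$country[needs_fix]])

# After the fix, every name should find a match.
still_unmatched <- subset(gap, !(country %in% synonyms$Country))
```

The corrections have to be chosen by eye, since the non-exact matches cannot be found by an exact comparison.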
Compare boys and girls aged 10 years and less.
Looking at boys between 10 and 25 …
Look at people aged greater than 20 years, first with a very small sample (about 100 people) and then with the entire data set.
The BMI models weight as being proportional to height squared. This should give a straight line on a graph of log-weight against log-height. Is this a reasonable model for adults?
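If weight = k * height^2, then log(weight) = log(k) + 2 * log(height), so the BMI model predicts a straight line with slope 2 on the log-log graph. A quick way to check this is to fit a line and look at the slope. The sketch below uses simulated data that follow the BMI model exactly up to small noise; the sample size, height range, and constant 22 are arbitrary assumptions, and real data may give a different slope.

```r
set.seed(1)

# Simulated adults: heights in metres, weights generated from the BMI model.
n      <- 500
height <- runif(n, min = 1.5, max = 1.9)
weight <- 22 * height^2 * exp(rnorm(n, sd = 0.02))

# Fit log-weight against log-height; the slope estimates the exponent.
fit   <- lm(log(weight) ~ log(height))
slope <- unname(coef(fit)[2])
print(slope)
```

Running the same fit on the real adult records and comparing the slope with 2 is one way to judge whether the BMI model is reasonable.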