Due: Friday, 26, in class.
Problem 1
1.a There is an exciting data set included with the default R installation, but I can’t remember what it’s called. I think it has something to do with chicken. Find it by keyword search and load it into your workspace.
help.search("chicken")
datachk <- chickwts
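If the keyword search returns too many help pages, the dataset titles can also be filtered directly. This is a minimal sketch, assuming the data set ships in the datasets package; the names ds and hits are illustrative only.

```r
# List every data set in the 'datasets' package, then keep those whose
# item name or title mentions "chick" (case-insensitive).
ds <- data(package = "datasets")$results
hits <- ds[grepl("chick", paste(ds[, "Item"], ds[, "Title"]), ignore.case = TRUE),
           c("Item", "Title"), drop = FALSE]
print(hits)
```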
1.b Add a new column that gives the weight in kilograms, rather than grams. Make sure the column has a descriptive name, kg.
datachk$kg <- datachk$weight / 1000
1.c Create 2 weight categories, light and heavy. If weight is less than 150g, assign light. Otherwise, heavy. Make sure the column has a descriptive name, weightcat.
datachk$weightcat <- ifelse(datachk$weight < 150, "light", "heavy")
datachk$weightcat
## [1] "heavy" "heavy" "light" "heavy" "heavy" "heavy" "light" "light"
## [9] "light" "light" "heavy" "heavy" "heavy" "light" "heavy" "heavy"
## [17] "light" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [25] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [33] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [41] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [49] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [57] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [65] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
datachk
## weight feed kg weightcat
## 1 179 horsebean 0.179 heavy
## 2 160 horsebean 0.160 heavy
## 3 136 horsebean 0.136 light
## 4 227 horsebean 0.227 heavy
## 5 217 horsebean 0.217 heavy
## 6 168 horsebean 0.168 heavy
## 7 108 horsebean 0.108 light
## 8 124 horsebean 0.124 light
## 9 143 horsebean 0.143 light
## 10 140 horsebean 0.140 light
## 11 309 linseed 0.309 heavy
## 12 229 linseed 0.229 heavy
## 13 181 linseed 0.181 heavy
## 14 141 linseed 0.141 light
## 15 260 linseed 0.260 heavy
## 16 203 linseed 0.203 heavy
## 17 148 linseed 0.148 light
## 18 169 linseed 0.169 heavy
## 19 213 linseed 0.213 heavy
## 20 257 linseed 0.257 heavy
## 21 244 linseed 0.244 heavy
## 22 271 linseed 0.271 heavy
## 23 243 soybean 0.243 heavy
## 24 230 soybean 0.230 heavy
## 25 248 soybean 0.248 heavy
## 26 327 soybean 0.327 heavy
## 27 329 soybean 0.329 heavy
## 28 250 soybean 0.250 heavy
## 29 193 soybean 0.193 heavy
## 30 271 soybean 0.271 heavy
## 31 316 soybean 0.316 heavy
## 32 267 soybean 0.267 heavy
## 33 199 soybean 0.199 heavy
## 34 171 soybean 0.171 heavy
## 35 158 soybean 0.158 heavy
## 36 248 soybean 0.248 heavy
## 37 423 sunflower 0.423 heavy
## 38 340 sunflower 0.340 heavy
## 39 392 sunflower 0.392 heavy
## 40 339 sunflower 0.339 heavy
## 41 341 sunflower 0.341 heavy
## 42 226 sunflower 0.226 heavy
## 43 320 sunflower 0.320 heavy
## 44 295 sunflower 0.295 heavy
## 45 334 sunflower 0.334 heavy
## 46 322 sunflower 0.322 heavy
## 47 297 sunflower 0.297 heavy
## 48 318 sunflower 0.318 heavy
## 49 325 meatmeal 0.325 heavy
## 50 257 meatmeal 0.257 heavy
## 51 303 meatmeal 0.303 heavy
## 52 315 meatmeal 0.315 heavy
## 53 380 meatmeal 0.380 heavy
## 54 153 meatmeal 0.153 heavy
## 55 263 meatmeal 0.263 heavy
## 56 242 meatmeal 0.242 heavy
## 57 206 meatmeal 0.206 heavy
## 58 344 meatmeal 0.344 heavy
## 59 258 meatmeal 0.258 heavy
## 60 368 casein 0.368 heavy
## 61 390 casein 0.390 heavy
## 62 379 casein 0.379 heavy
## 63 260 casein 0.260 heavy
## 64 404 casein 0.404 heavy
## 65 318 casein 0.318 heavy
## 66 352 casein 0.352 heavy
## 67 359 casein 0.359 heavy
## 68 216 casein 0.216 heavy
## 69 222 casein 0.222 heavy
## 70 283 casein 0.283 heavy
## 71 332 casein 0.332 heavy
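The same split can be sketched with cut(), which returns a factor rather than a character vector. This is an alternative to the ifelse() call above, not a replacement for it; the copy chk is a name introduced only for this illustration.

```r
# Work on a copy of the built-in data so the original stays untouched.
chk <- chickwts
# right = FALSE makes the intervals [-Inf, 150) and [150, Inf),
# so a weight of exactly 150 g is classed as "heavy", as with ifelse().
chk$weightcat <- cut(chk$weight, breaks = c(-Inf, 150, Inf),
                     labels = c("light", "heavy"), right = FALSE)
table(chk$weightcat)
```

A factor keeps the two levels in a fixed order, which can be convenient for later plotting or tabulation.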
1.d Compute the minimum, maximum, mean and median of weight.
summary(datachk$weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 108.0 204.5 258.0 261.3 323.5 423.0
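summary() also reports the quartiles. If only the four requested statistics are wanted, they can be computed one by one, as in this short sketch (w and stats are illustrative names):

```r
# Compute each requested statistic explicitly and collect them in a named vector.
w <- chickwts$weight
stats <- c(min = min(w), max = max(w), mean = mean(w), median = median(w))
round(stats, 1)
```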
1.e Make a box-and-whiskers plot showing the distribution of chicken weight according to feed type. Make sure to label the axes appropriately.
library(ggplot2)
ggplot(data = datachk, aes(x = feed, y = weight, fill = feed)) +
  geom_boxplot() +
  labs(x = "Feed type", y = "Weight (g)") +
  ggtitle("Box plot of chicken weight by feed type")
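If ggplot2 is not installed, a comparable plot can be drawn with base graphics. This is a sketch of an alternative, with the axis labels chosen here as an assumption about what counts as appropriate:

```r
# The formula interface draws one box per feed type; xlab/ylab label the axes.
# boxplot() invisibly returns the statistics used for the plot.
bp <- boxplot(weight ~ feed, data = chickwts,
              xlab = "Feed type", ylab = "Weight (g)",
              main = "Chicken weight by feed type")
```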
Problem 2
2.a Create a numeric vector \(x\) of length 60 that ranges from \(-\pi\) to \(\pi\).
x <- seq(from = -pi, to = pi, length.out = 60)
2.b Create a numeric vector \(y\) that is the sine of \(x\) (in radians).
y <- sin(x)
2.c Create a vector \(z\) that is the cosine of \(x\).
z <- cos(x)
2.d Plot \(y\) vs. \(x\) as a series of points joined by lines.
plot(x, y, type = "o")
2.e On the same graph, add red-colored points for \(z\) vs. \(x\).
plot(x, y, type = "o")
points(x, z, type = "o", col = "red")
2.f Add a legend.
plot(x, y, type = "o")
points(x, z, type = "o", col = "red")
legend("right", legend = c("sin(x)", "cos(x)"), col = c("black", "red"), pch = 1, lty = 1)
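Both curves can also be drawn in one call with matplot(). This is a sketch of an alternative to the plot() and points() steps above; the matrix name curves is introduced only for this illustration.

```r
# Bind the two series into a matrix; matplot() draws one line per column.
x <- seq(from = -pi, to = pi, length.out = 60)
curves <- cbind(sine = sin(x), cosine = cos(x))
matplot(x, curves, type = "o", pch = 1, lty = 1,
        col = c("black", "red"), xlab = "x", ylab = "value")
# The legend reuses the column names and the same colours and symbols.
legend("right", legend = colnames(curves), col = c("black", "red"), pch = 1, lty = 1)
```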
Problem 3
3.a Download the crab spreadsheet to your local machine. Now, import the spreadsheet as a data frame into your R workspace, naming the resulting object crab_dat. Briefly inspect the data.
crab_dat <- read.table("crab.txt", header = TRUE, fill = TRUE)
str(crab_dat)
## 'data.frame': 173 obs. of 8 variables:
## $ color : int 3 4 2 4 4 3 2 4 3 4 ...
## $ spine : int 3 3 1 3 3 3 1 2 1 3 ...
## $ width : num 28.3 22.5 26 24.8 26 23.8 26.5 24.7 23.7 25.6 ...
## $ nSatellites: int 8 0 9 0 4 0 0 0 0 0 ...
## $ weight : int 3050 1550 2300 2100 2600 2100 2350 1900 1950 2150 ...
## $ x1 : logi NA NA NA NA NA NA ...
## $ x2 : logi NA NA NA NA NA NA ...
## $ x3 : logi NA NA NA NA NA NA ...
3.b Confirm that the last three columns are useless, and remove them. Convert the first column to character type.
names(crab_dat)
## [1] "color" "spine" "width" "nSatellites" "weight"
## [6] "x1" "x2" "x3"
crab_dat$x1 <- NULL
crab_dat$x2 <- NULL
crab_dat$x3 <- NULL
crab_dat$color <- as.character(crab_dat$color)
str(crab_dat)
## 'data.frame': 173 obs. of 5 variables:
## $ color : chr "3" "4" "2" "4" ...
## $ spine : int 3 3 1 3 3 3 1 2 1 3 ...
## $ width : num 28.3 22.5 26 24.8 26 23.8 26.5 24.7 23.7 25.6 ...
## $ nSatellites: int 8 0 9 0 4 0 0 0 0 0 ...
## $ weight : int 3050 1550 2300 2100 2600 2100 2350 1900 1950 2150 ...
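names() only lists the columns. To confirm that a column carries no information, one option is to test whether every entry is NA. The sketch below uses a small made-up data frame (toy), not the crab data, so the values are hypothetical.

```r
# Hypothetical stand-in for the imported data: x1 and x2 contain only NA.
toy <- data.frame(color = c(3L, 4L, 2L), width = c(28.3, 22.5, 26.0),
                  x1 = NA, x2 = NA)
# TRUE for each column in which every value is missing.
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))
print(all_na)
# Keep only the columns that hold at least one real value.
toy_clean <- toy[, !all_na, drop = FALSE]
```

The same vapply() call applied to crab_dat would flag x1, x2 and x3 if they are entirely NA.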
3.c Change the name of the fifth column to weight1. Create a weight2 column with two weight categories: if weight1 is less than the median of weight1, assign light; otherwise, heavy.
colnames(crab_dat)[5] <- "weight1"
crab_dat$weight2 <- ifelse(crab_dat$weight1 < median(crab_dat$weight1), "light", "heavy")
crab_dat
## color spine width nSatellites weight1 weight2
## 1 3 3 28.3 8 3050 heavy
## 2 4 3 22.5 0 1550 light
## 3 2 1 26.0 9 2300 light
## 4 4 3 24.8 0 2100 light
## 5 4 3 26.0 4 2600 heavy
## 6 3 3 23.8 0 2100 light
## 7 2 1 26.5 0 2350 heavy
## 8 4 2 24.7 0 1900 light
## 9 3 1 23.7 0 1950 light
## 10 4 3 25.6 0 2150 light
## 11 4 3 24.3 0 2150 light
## 12 3 3 25.8 0 2650 heavy
## 13 3 3 28.2 11 3050 heavy
## 14 5 2 21.0 0 1850 light
## 15 3 1 26.0 14 2300 light
## 16 2 1 27.1 8 2950 heavy
## 17 3 3 25.2 1 2000 light
## 18 3 3 29.0 1 3000 heavy
## 19 5 3 24.7 0 2200 light
## 20 3 3 27.4 3 2700 heavy
## 21 3 2 23.2 4 1950 light
## 22 2 2 25.0 3 2300 light
## 23 3 1 22.5 1 1600 light
## 24 4 3 26.7 2 2600 heavy
## 25 5 3 25.8 3 2000 light
## 26 5 3 26.2 0 1300 light
## 27 3 3 28.7 3 2800 heavy
## 28 3 1 26.8 5 2700 heavy
## 29 5 3 27.5 0 2600 heavy
## 30 3 3 24.9 0 2100 light
## 31 2 1 29.3 4 3200 heavy
## 32 2 3 25.8 0 2600 heavy
## 33 3 2 25.7 0 2000 light
## 34 3 1 25.7 8 2000 light
## 35 3 1 26.7 5 2700 heavy
## 36 5 3 23.7 0 1850 light
## 37 3 3 26.8 0 2650 heavy
## 38 3 3 27.5 6 3150 heavy
## 39 5 3 23.4 0 1900 light
## 40 3 3 27.9 6 2800 heavy
## 41 4 3 27.5 3 3100 heavy
## 42 2 1 26.1 5 2800 heavy
## 43 2 1 27.7 6 2500 heavy
## 44 3 1 30.0 5 3300 heavy
## 45 4 1 28.5 9 3250 heavy
## 46 4 3 28.9 4 2800 heavy
## 47 3 3 28.2 6 2600 heavy
## 48 3 3 25.0 4 2100 light
## 49 3 3 28.5 3 3000 heavy
## 50 3 1 30.3 3 3600 heavy
## 51 5 3 24.7 5 2100 light
## 52 3 3 27.7 5 2900 heavy
## 53 2 1 27.4 6 2700 heavy
## 54 3 3 22.9 4 1600 light
## 55 3 1 25.7 5 2000 light
## 56 3 3 28.3 15 3000 heavy
## 57 3 3 27.2 3 2700 heavy
## 58 4 3 26.2 3 2300 light
## 59 3 1 27.8 0 2750 heavy
## 60 5 3 25.5 0 2250 light
## 61 4 3 27.1 0 2550 heavy
## 62 4 3 24.5 5 2050 light
## 63 4 1 27.0 3 2450 heavy
## 64 3 3 26.0 5 2150 light
## 65 3 3 28.0 1 2800 heavy
## 66 3 3 30.0 8 3050 heavy
## 67 3 3 29.0 10 3200 heavy
## 68 3 3 26.2 0 2400 heavy
## 69 3 1 26.5 0 1300 light
## 70 3 3 26.2 3 2400 heavy
## 71 4 3 25.6 7 2800 heavy
## 72 4 3 23.0 1 1650 light
## 73 4 3 23.0 0 1800 light
## 74 3 3 25.4 6 2250 light
## 75 4 3 24.2 0 1900 light
## 76 3 2 22.9 0 1600 light
## 77 4 2 26.0 3 2200 light
## 78 3 3 25.4 4 2250 light
## 79 4 3 25.7 0 1200 light
## 80 3 3 25.1 5 2100 light
## 81 4 2 24.5 0 2250 light
## 82 5 3 27.5 0 2900 heavy
## 83 4 3 23.1 0 1650 light
## 84 4 1 25.9 4 2550 heavy
## 85 3 3 25.8 0 2300 light
## 86 5 3 27.0 3 2250 light
## 87 3 3 28.5 0 3050 heavy
## 88 5 1 25.5 0 2750 heavy
## 89 5 3 23.5 0 1900 light
## 90 3 2 24.0 0 1700 light
## 91 3 1 29.7 5 3850 heavy
## 92 3 1 26.8 0 2550 heavy
## 93 5 3 26.7 0 2450 heavy
## 94 3 1 28.7 0 3200 heavy
## 95 4 3 23.1 0 1550 light
## 96 3 1 29.0 1 2800 heavy
## 97 4 3 25.5 0 2250 light
## 98 4 3 26.5 1 1967 light
## 99 4 3 24.5 1 2200 light
## 100 4 3 28.5 1 3000 heavy
## 101 3 3 28.2 1 2867 heavy
## 102 3 3 24.5 1 1600 light
## 103 3 3 27.5 1 2550 heavy
## 104 3 2 24.7 4 2550 heavy
## 105 3 1 25.2 1 2000 light
## 106 4 3 27.3 1 2900 heavy
## 107 3 3 26.3 1 2400 heavy
## 108 3 3 29.0 1 3100 heavy
## 109 3 3 25.3 2 1900 light
## 110 3 3 26.5 4 2300 light
## 111 3 3 27.8 3 3250 heavy
## 112 3 3 27.0 6 2500 heavy
## 113 4 3 25.7 0 2100 light
## 114 3 3 25.0 2 2100 light
## 115 3 3 31.9 2 3325 heavy
## 116 5 3 23.7 0 1800 light
## 117 5 3 29.3 12 3225 heavy
## 118 4 3 22.0 0 1400 light
## 119 3 3 25.0 5 2400 heavy
## 120 4 3 27.0 6 2500 heavy
## 121 4 3 23.8 6 1800 light
## 122 2 1 30.2 2 3275 heavy
## 123 4 3 26.2 0 2225 light
## 124 3 3 24.2 2 1650 light
## 125 3 3 27.4 3 2900 heavy
## 126 3 2 25.4 0 2300 light
## 127 4 3 28.4 3 3200 heavy
## 128 5 3 22.5 4 1475 light
## 129 3 3 26.2 2 2025 light
## 130 3 1 24.9 6 2300 light
## 131 2 2 24.5 6 1950 light
## 132 3 3 25.1 0 1800 light
## 133 3 1 28.0 4 2900 heavy
## 134 5 3 25.8 10 2250 light
## 135 3 3 27.9 7 3050 heavy
## 136 3 3 24.9 0 2200 light
## 137 3 1 28.4 5 3100 heavy
## 138 4 3 27.2 5 2400 heavy
## 139 3 2 25.0 6 2250 light
## 140 3 3 27.5 6 2625 heavy
## 141 3 1 33.5 7 5200 heavy
## 142 3 3 30.5 3 3325 heavy
## 143 4 3 29.0 3 2925 heavy
## 144 3 1 24.3 0 2000 light
## 145 3 3 25.8 0 2400 heavy
## 146 5 3 25.0 8 2100 light
## 147 3 1 31.7 4 3725 heavy
## 148 3 3 29.5 4 3025 heavy
## 149 4 3 24.0 10 1900 light
## 150 3 3 30.0 9 3000 heavy
## 151 3 3 27.6 4 2850 heavy
## 152 3 3 26.2 0 2300 light
## 153 3 1 23.1 0 2000 light
## 154 3 1 22.9 0 1600 light
## 155 5 3 24.5 0 1900 light
## 156 3 3 24.7 4 1950 light
## 157 3 3 28.3 0 3200 heavy
## 158 3 3 23.9 2 1850 light
## 159 4 3 23.8 0 1800 light
## 160 4 2 29.8 4 3500 heavy
## 161 3 3 26.5 4 2350 heavy
## 162 3 3 26.0 3 2275 light
## 163 3 3 28.2 8 3050 heavy
## 164 5 3 25.7 0 2150 light
## 165 3 3 26.5 7 2750 heavy
## 166 3 3 25.8 0 2200 light
## 167 4 3 24.1 0 1800 light
## 168 4 3 26.2 2 2175 light
## 169 4 3 26.1 3 2750 heavy
## 170 4 3 29.0 4 3275 heavy
## 171 2 1 28.0 0 2625 heavy
## 172 5 3 27.0 0 2625 heavy
## 173 3 2 24.5 0 2000 light
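Because the comparison is a strict less-than, any weight equal to the median is labelled heavy. A small sketch with made-up weights (not taken from the crab data) shows this behaviour:

```r
# Hypothetical weights; the median of these five values is 2350.
w <- c(1900, 2100, 2350, 2350, 3050)
# Values equal to the median fail the "<" test and are labelled "heavy".
cat2 <- ifelse(w < median(w), "light", "heavy")
print(cat2)
```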
3.d Now that you have cleaned up the crab_dat object, save it for later use, both as an R object (clin.rda) and as a CSV file (clin.csv).
write.csv(crab_dat, "clin.csv", row.names = FALSE)  # omit the row-number column
save(crab_dat, file = "clin.rda")
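A quick way to check that both files can be read back is a round trip. The sketch below uses a small made-up data frame and temporary files, so the names toy, rda_path and csv_path are assumptions for illustration only.

```r
# Hypothetical two-row stand-in with a character column, like color above.
toy <- data.frame(color = c("3", "4"), weight1 = c(3050L, 1550L),
                  stringsAsFactors = FALSE)
rda_path <- file.path(tempdir(), "toy.rda")
csv_path <- file.path(tempdir(), "toy.csv")
save(toy, file = rda_path)
write.csv(toy, csv_path, row.names = FALSE)

# load() restores the object under its original name; use a fresh
# environment so it does not overwrite anything in the workspace.
env <- new.env()
load(rda_path, envir = env)
rda_ok <- identical(env$toy, toy)

# CSV does not store column types, so ask for character explicitly.
from_csv <- read.csv(csv_path, colClasses = c(color = "character"))
csv_ok <- identical(from_csv$color, toy$color)
```

The .rda file preserves column types exactly, while the CSV file needs colClasses to keep color as character on re-import.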