HW1: Data manipulation

Due: Friday, 26, in class.

Problem 1

1.a There is an exciting data set included with the default R installation, but I can’t remember what it’s called. I think it has something to do with chicken. Find it by keyword search and load it into your workspace.

help.search("chicken")
datachk <- chickwts

1.b Add a new column that gives the weight in kilograms, rather than grams. Make sure the column has a descriptive name, kg.

datachk$kg <- datachk$weight / 1000

1.c Create 2 weight categories, light and heavy. If weight is less than 150g, assign light. Otherwise, heavy. Make sure the column has a descriptive name, weightcat.

datachk$weightcat <- ifelse(datachk$weight < 150, "light", "heavy")
datachk$weightcat
##  [1] "heavy" "heavy" "light" "heavy" "heavy" "heavy" "light" "light"
##  [9] "light" "light" "heavy" "heavy" "heavy" "light" "heavy" "heavy"
## [17] "light" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [25] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [33] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [41] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [49] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [57] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
## [65] "heavy" "heavy" "heavy" "heavy" "heavy" "heavy" "heavy"
datachk
##    weight      feed    kg weightcat
## 1     179 horsebean 0.179     heavy
## 2     160 horsebean 0.160     heavy
## 3     136 horsebean 0.136     light
## 4     227 horsebean 0.227     heavy
## 5     217 horsebean 0.217     heavy
## 6     168 horsebean 0.168     heavy
## 7     108 horsebean 0.108     light
## 8     124 horsebean 0.124     light
## 9     143 horsebean 0.143     light
## 10    140 horsebean 0.140     light
## 11    309   linseed 0.309     heavy
## 12    229   linseed 0.229     heavy
## 13    181   linseed 0.181     heavy
## 14    141   linseed 0.141     light
## 15    260   linseed 0.260     heavy
## 16    203   linseed 0.203     heavy
## 17    148   linseed 0.148     light
## 18    169   linseed 0.169     heavy
## 19    213   linseed 0.213     heavy
## 20    257   linseed 0.257     heavy
## 21    244   linseed 0.244     heavy
## 22    271   linseed 0.271     heavy
## 23    243   soybean 0.243     heavy
## 24    230   soybean 0.230     heavy
## 25    248   soybean 0.248     heavy
## 26    327   soybean 0.327     heavy
## 27    329   soybean 0.329     heavy
## 28    250   soybean 0.250     heavy
## 29    193   soybean 0.193     heavy
## 30    271   soybean 0.271     heavy
## 31    316   soybean 0.316     heavy
## 32    267   soybean 0.267     heavy
## 33    199   soybean 0.199     heavy
## 34    171   soybean 0.171     heavy
## 35    158   soybean 0.158     heavy
## 36    248   soybean 0.248     heavy
## 37    423 sunflower 0.423     heavy
## 38    340 sunflower 0.340     heavy
## 39    392 sunflower 0.392     heavy
## 40    339 sunflower 0.339     heavy
## 41    341 sunflower 0.341     heavy
## 42    226 sunflower 0.226     heavy
## 43    320 sunflower 0.320     heavy
## 44    295 sunflower 0.295     heavy
## 45    334 sunflower 0.334     heavy
## 46    322 sunflower 0.322     heavy
## 47    297 sunflower 0.297     heavy
## 48    318 sunflower 0.318     heavy
## 49    325  meatmeal 0.325     heavy
## 50    257  meatmeal 0.257     heavy
## 51    303  meatmeal 0.303     heavy
## 52    315  meatmeal 0.315     heavy
## 53    380  meatmeal 0.380     heavy
## 54    153  meatmeal 0.153     heavy
## 55    263  meatmeal 0.263     heavy
## 56    242  meatmeal 0.242     heavy
## 57    206  meatmeal 0.206     heavy
## 58    344  meatmeal 0.344     heavy
## 59    258  meatmeal 0.258     heavy
## 60    368    casein 0.368     heavy
## 61    390    casein 0.390     heavy
## 62    379    casein 0.379     heavy
## 63    260    casein 0.260     heavy
## 64    404    casein 0.404     heavy
## 65    318    casein 0.318     heavy
## 66    352    casein 0.352     heavy
## 67    359    casein 0.359     heavy
## 68    216    casein 0.216     heavy
## 69    222    casein 0.222     heavy
## 70    283    casein 0.283     heavy
## 71    332    casein 0.332     heavy
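
A quicker check than printing all 71 rows is to tabulate the new column. The sketch below rebuilds the column from the built-in chickwts data and counts the birds in each category. The cut() call is an alternative shown for comparison, not the method used above; it returns a factor rather than a character vector, and the name weightcat_f is only illustrative.

```r
# Rebuild the category column from the built-in data set
datachk <- chickwts
datachk$weightcat <- ifelse(datachk$weight < 150, "light", "heavy")

# Count the birds in each category instead of printing every row
counts <- table(datachk$weightcat)
counts

# Alternative (not the method used above): cut() returns a factor.
# right = FALSE gives the intervals [-Inf, 150) and [150, Inf),
# which matches the "less than 150g" rule.
weightcat_f <- cut(datachk$weight, breaks = c(-Inf, 150, Inf),
                   labels = c("light", "heavy"), right = FALSE)
table(weightcat_f)
```

Both tables should agree with the printout above, where only seven birds fall below 150g.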

1.d Compute the minimum, maximum, mean and median of weight.

summary(datachk$weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   108.0   204.5   258.0   261.3   323.5   423.0
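
summary() also reports the quartiles, which the question does not ask for. As a minimal sketch, the four requested statistics can be computed one by one and collected in a named vector (the names w and stats are illustrative):

```r
# Weights from the built-in data set
w <- chickwts$weight

# Compute only the four requested statistics
stats <- c(min = min(w), max = max(w), mean = mean(w), median = median(w))
stats
```

The values should match the Min., Max., Mean and Median entries of the summary() output above, apart from rounding of the mean.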

1.e Make a box-and-whiskers plot showing the distribution of chicken weight according to feed type. Make sure to label the axes appropriately.

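library(ggplot2)
# Box plot of weight by feed type, with explicit axis labels
ggplot(data = datachk, aes(x = feed, y = weight, fill = feed)) +
   geom_boxplot() +
   labs(x = "Feed type", y = "Weight (g)", title = "Box plot of chicken weight by feed type")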

Problem 2

2.a Create a numeric vector \(x\) of length 60 that ranges from \(-\pi\) to \(\pi\).

x <- seq(from = -pi, to = pi, length.out = 60)

2.b Create a numeric vector \(y\) that is the sine of \(x\) (in radians).

y <- sin(x)

2.c Create a vector \(z\) that is the cosine of \(x\).

z <- cos(x)
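
Before plotting, it can help to sanity-check the three vectors. The sketch below confirms the length and range of x and uses the identity sin^2(x) + cos^2(x) = 1 as a check on y and z; the name identity_ok is illustrative.

```r
x <- seq(from = -pi, to = pi, length.out = 60)
y <- sin(x)
z <- cos(x)

# x should have 60 elements running from -pi to pi
length(x)
range(x)

# sin^2 + cos^2 should equal 1 up to floating-point error,
# so compare with all.equal() rather than ==
identity_ok <- isTRUE(all.equal(y^2 + z^2, rep(1, length(x))))
identity_ok
```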

2.d Plot \(y\) vs. \(x\) as a series of points joined by lines.

plot(x,y,type = "o")

2.e On the same graph, add red-colored points for \(z\) vs. \(x\).

plot(x,y,type = "o")
points(x,z,type = "o", col = "red")

2.f Add a legend.

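plot(x, y, type = "o")
points(x, z, type = "o", col = "red")
# Match each legend entry to the colour and symbol of its curve
legend("right", legend = c("sine", "cosine"), col = c("black", "red"), lty = 1, pch = 1)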

Problem 3

3.a Download the crab spreadsheet to your local machine. Now, import the spreadsheet as a data frame into your R workspace, naming the resulting object crab_dat. Briefly inspect the data.

crab_dat <- read.table("crab.txt", header = TRUE, fill = TRUE)
str(crab_dat)
## 'data.frame':    173 obs. of  8 variables:
##  $ color      : int  3 4 2 4 4 3 2 4 3 4 ...
##  $ spine      : int  3 3 1 3 3 3 1 2 1 3 ...
##  $ width      : num  28.3 22.5 26 24.8 26 23.8 26.5 24.7 23.7 25.6 ...
##  $ nSatellites: int  8 0 9 0 4 0 0 0 0 0 ...
##  $ weight     : int  3050 1550 2300 2100 2600 2100 2350 1900 1950 2150 ...
##  $ x1         : logi  NA NA NA NA NA NA ...
##  $ x2         : logi  NA NA NA NA NA NA ...
##  $ x3         : logi  NA NA NA NA NA NA ...

3.b Confirm that the last three columns are useless, and remove them. Convert the first column to character type.

names(crab_dat)
## [1] "color"       "spine"       "width"       "nSatellites" "weight"     
## [6] "x1"          "x2"          "x3"
crab_dat$x1 <- NULL
crab_dat$x2 <- NULL
crab_dat$x3 <- NULL
crab_dat$color <- as.character(crab_dat$color)
str(crab_dat)
## 'data.frame':    173 obs. of  5 variables:
##  $ color      : chr  "3" "4" "2" "4" ...
##  $ spine      : int  3 3 1 3 3 3 1 2 1 3 ...
##  $ width      : num  28.3 22.5 26 24.8 26 23.8 26.5 24.7 23.7 25.6 ...
##  $ nSatellites: int  8 0 9 0 4 0 0 0 0 0 ...
##  $ weight     : int  3050 1550 2300 2100 2600 2100 2350 1900 1950 2150 ...
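
The str() output only shows the first few values of each column, so it does not by itself confirm that x1, x2 and x3 are empty throughout. One way to check, sketched here on a small hypothetical stand-in for crab_dat (the data frame toy and its values are made up for illustration), is to test every column for being entirely NA and drop those that are:

```r
# Hypothetical stand-in for crab_dat: two real columns and three empty ones
toy <- data.frame(color = c(3L, 4L, 2L), width = c(28.3, 22.5, 26.0),
                  x1 = NA, x2 = NA, x3 = NA)

# TRUE for each column that contains nothing but NA
all_na <- sapply(toy, function(col) all(is.na(col)))
all_na

# Drop every all-NA column in one step
toy_clean <- toy[, !all_na, drop = FALSE]
names(toy_clean)
```

Applied to crab_dat, the same sapply() call should return TRUE for x1, x2 and x3 if they really are useless.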

3.c Change the name of the fifth column to weight1. Create a weight2 column that has two weight categories: if weight1 is less than the median of weight1, assign light; otherwise, heavy.

colnames(crab_dat)[5] <- "weight1" 
crab_dat$weight2 <- ifelse(crab_dat$weight1 < median(crab_dat$weight1), "light", "heavy")
crab_dat
##     color spine width nSatellites weight1 weight2
## 1       3     3  28.3           8    3050   heavy
## 2       4     3  22.5           0    1550   light
## 3       2     1  26.0           9    2300   light
## 4       4     3  24.8           0    2100   light
## 5       4     3  26.0           4    2600   heavy
## 6       3     3  23.8           0    2100   light
## 7       2     1  26.5           0    2350   heavy
## 8       4     2  24.7           0    1900   light
## 9       3     1  23.7           0    1950   light
## 10      4     3  25.6           0    2150   light
## 11      4     3  24.3           0    2150   light
## 12      3     3  25.8           0    2650   heavy
## 13      3     3  28.2          11    3050   heavy
## 14      5     2  21.0           0    1850   light
## 15      3     1  26.0          14    2300   light
## 16      2     1  27.1           8    2950   heavy
## 17      3     3  25.2           1    2000   light
## 18      3     3  29.0           1    3000   heavy
## 19      5     3  24.7           0    2200   light
## 20      3     3  27.4           3    2700   heavy
## 21      3     2  23.2           4    1950   light
## 22      2     2  25.0           3    2300   light
## 23      3     1  22.5           1    1600   light
## 24      4     3  26.7           2    2600   heavy
## 25      5     3  25.8           3    2000   light
## 26      5     3  26.2           0    1300   light
## 27      3     3  28.7           3    2800   heavy
## 28      3     1  26.8           5    2700   heavy
## 29      5     3  27.5           0    2600   heavy
## 30      3     3  24.9           0    2100   light
## 31      2     1  29.3           4    3200   heavy
## 32      2     3  25.8           0    2600   heavy
## 33      3     2  25.7           0    2000   light
## 34      3     1  25.7           8    2000   light
## 35      3     1  26.7           5    2700   heavy
## 36      5     3  23.7           0    1850   light
## 37      3     3  26.8           0    2650   heavy
## 38      3     3  27.5           6    3150   heavy
## 39      5     3  23.4           0    1900   light
## 40      3     3  27.9           6    2800   heavy
## 41      4     3  27.5           3    3100   heavy
## 42      2     1  26.1           5    2800   heavy
## 43      2     1  27.7           6    2500   heavy
## 44      3     1  30.0           5    3300   heavy
## 45      4     1  28.5           9    3250   heavy
## 46      4     3  28.9           4    2800   heavy
## 47      3     3  28.2           6    2600   heavy
## 48      3     3  25.0           4    2100   light
## 49      3     3  28.5           3    3000   heavy
## 50      3     1  30.3           3    3600   heavy
## 51      5     3  24.7           5    2100   light
## 52      3     3  27.7           5    2900   heavy
## 53      2     1  27.4           6    2700   heavy
## 54      3     3  22.9           4    1600   light
## 55      3     1  25.7           5    2000   light
## 56      3     3  28.3          15    3000   heavy
## 57      3     3  27.2           3    2700   heavy
## 58      4     3  26.2           3    2300   light
## 59      3     1  27.8           0    2750   heavy
## 60      5     3  25.5           0    2250   light
## 61      4     3  27.1           0    2550   heavy
## 62      4     3  24.5           5    2050   light
## 63      4     1  27.0           3    2450   heavy
## 64      3     3  26.0           5    2150   light
## 65      3     3  28.0           1    2800   heavy
## 66      3     3  30.0           8    3050   heavy
## 67      3     3  29.0          10    3200   heavy
## 68      3     3  26.2           0    2400   heavy
## 69      3     1  26.5           0    1300   light
## 70      3     3  26.2           3    2400   heavy
## 71      4     3  25.6           7    2800   heavy
## 72      4     3  23.0           1    1650   light
## 73      4     3  23.0           0    1800   light
## 74      3     3  25.4           6    2250   light
## 75      4     3  24.2           0    1900   light
## 76      3     2  22.9           0    1600   light
## 77      4     2  26.0           3    2200   light
## 78      3     3  25.4           4    2250   light
## 79      4     3  25.7           0    1200   light
## 80      3     3  25.1           5    2100   light
## 81      4     2  24.5           0    2250   light
## 82      5     3  27.5           0    2900   heavy
## 83      4     3  23.1           0    1650   light
## 84      4     1  25.9           4    2550   heavy
## 85      3     3  25.8           0    2300   light
## 86      5     3  27.0           3    2250   light
## 87      3     3  28.5           0    3050   heavy
## 88      5     1  25.5           0    2750   heavy
## 89      5     3  23.5           0    1900   light
## 90      3     2  24.0           0    1700   light
## 91      3     1  29.7           5    3850   heavy
## 92      3     1  26.8           0    2550   heavy
## 93      5     3  26.7           0    2450   heavy
## 94      3     1  28.7           0    3200   heavy
## 95      4     3  23.1           0    1550   light
## 96      3     1  29.0           1    2800   heavy
## 97      4     3  25.5           0    2250   light
## 98      4     3  26.5           1    1967   light
## 99      4     3  24.5           1    2200   light
## 100     4     3  28.5           1    3000   heavy
## 101     3     3  28.2           1    2867   heavy
## 102     3     3  24.5           1    1600   light
## 103     3     3  27.5           1    2550   heavy
## 104     3     2  24.7           4    2550   heavy
## 105     3     1  25.2           1    2000   light
## 106     4     3  27.3           1    2900   heavy
## 107     3     3  26.3           1    2400   heavy
## 108     3     3  29.0           1    3100   heavy
## 109     3     3  25.3           2    1900   light
## 110     3     3  26.5           4    2300   light
## 111     3     3  27.8           3    3250   heavy
## 112     3     3  27.0           6    2500   heavy
## 113     4     3  25.7           0    2100   light
## 114     3     3  25.0           2    2100   light
## 115     3     3  31.9           2    3325   heavy
## 116     5     3  23.7           0    1800   light
## 117     5     3  29.3          12    3225   heavy
## 118     4     3  22.0           0    1400   light
## 119     3     3  25.0           5    2400   heavy
## 120     4     3  27.0           6    2500   heavy
## 121     4     3  23.8           6    1800   light
## 122     2     1  30.2           2    3275   heavy
## 123     4     3  26.2           0    2225   light
## 124     3     3  24.2           2    1650   light
## 125     3     3  27.4           3    2900   heavy
## 126     3     2  25.4           0    2300   light
## 127     4     3  28.4           3    3200   heavy
## 128     5     3  22.5           4    1475   light
## 129     3     3  26.2           2    2025   light
## 130     3     1  24.9           6    2300   light
## 131     2     2  24.5           6    1950   light
## 132     3     3  25.1           0    1800   light
## 133     3     1  28.0           4    2900   heavy
## 134     5     3  25.8          10    2250   light
## 135     3     3  27.9           7    3050   heavy
## 136     3     3  24.9           0    2200   light
## 137     3     1  28.4           5    3100   heavy
## 138     4     3  27.2           5    2400   heavy
## 139     3     2  25.0           6    2250   light
## 140     3     3  27.5           6    2625   heavy
## 141     3     1  33.5           7    5200   heavy
## 142     3     3  30.5           3    3325   heavy
## 143     4     3  29.0           3    2925   heavy
## 144     3     1  24.3           0    2000   light
## 145     3     3  25.8           0    2400   heavy
## 146     5     3  25.0           8    2100   light
## 147     3     1  31.7           4    3725   heavy
## 148     3     3  29.5           4    3025   heavy
## 149     4     3  24.0          10    1900   light
## 150     3     3  30.0           9    3000   heavy
## 151     3     3  27.6           4    2850   heavy
## 152     3     3  26.2           0    2300   light
## 153     3     1  23.1           0    2000   light
## 154     3     1  22.9           0    1600   light
## 155     5     3  24.5           0    1900   light
## 156     3     3  24.7           4    1950   light
## 157     3     3  28.3           0    3200   heavy
## 158     3     3  23.9           2    1850   light
## 159     4     3  23.8           0    1800   light
## 160     4     2  29.8           4    3500   heavy
## 161     3     3  26.5           4    2350   heavy
## 162     3     3  26.0           3    2275   light
## 163     3     3  28.2           8    3050   heavy
## 164     5     3  25.7           0    2150   light
## 165     3     3  26.5           7    2750   heavy
## 166     3     3  25.8           0    2200   light
## 167     4     3  24.1           0    1800   light
## 168     4     3  26.2           2    2175   light
## 169     4     3  26.1           3    2750   heavy
## 170     4     3  29.0           4    3275   heavy
## 171     2     1  28.0           0    2625   heavy
## 172     5     3  27.0           0    2625   heavy
## 173     3     2  24.5           0    2000   light

3.d Now that you have cleaned up the crab_dat object, save it for later use, both as an R object (clin.rda) and also as a CSV file (clin.csv).

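# row.names = FALSE keeps the row numbers out of the CSV file
write.csv(crab_dat, "clin.csv", row.names = FALSE)
# File name in lower case, as requested in the question
save(crab_dat, file = "clin.rda")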
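
To see how the two formats differ when read back, here is a minimal round-trip sketch. It uses a small hypothetical data frame (toy) and temporary files rather than the real crab data. The main point is that a CSV file does not store column types, so a character column of digits comes back as integer unless colClasses is supplied, whereas load() restores the object exactly as it was saved.

```r
# Hypothetical two-row stand-in for the cleaned crab data
toy <- data.frame(color = c("3", "4"), weight1 = c(3050L, 1550L),
                  stringsAsFactors = FALSE)

# Write to temporary files so nothing in the working directory is touched
csv_path <- file.path(tempdir(), "toy.csv")
rda_path <- file.path(tempdir(), "toy.rda")
write.csv(toy, csv_path, row.names = FALSE)
save(toy, file = rda_path)

# read.csv() re-guesses column types; colClasses keeps color as character
from_csv <- read.csv(csv_path, colClasses = c(color = "character"))
str(from_csv)

# load() restores the object under its original name;
# a fresh environment avoids overwriting anything in the workspace
env <- new.env()
load(rda_path, envir = env)
str(env$toy)
```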
