2.1 read

a. Download the Cereal.csv file from the Canvas page, then use the read.csv command to read it into R and assign it to an object called cereal.

Solution:

# header = TRUE: the first line of the file holds the column names
cereal = read.csv("cereal.csv", header = TRUE)
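The read step can be tried without the course file. The sketch below writes a small toy data frame to a temporary CSV and reads it back; the toy values and the temporary path are illustrative assumptions, not the contents of the real file.

```r
# Toy stand-in for the course file (hypothetical values, not the real data)
toy <- data.frame(name = c("Cereal_A", "Cereal_B"),
                  mfr = c("K", "G"),
                  calories = c(70, 120))

# Write it to a temporary CSV so the example is self-contained
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# header = TRUE (the default for read.csv) treats the first line as column names
toy_read <- read.csv(path, header = TRUE)
str(toy_read)
```

If read.csv reports that it cannot open the file, getwd() shows the current working directory and file.exists("cereal.csv") checks whether the file is visible from there. On case-sensitive file systems the name passed to read.csv must match the downloaded file name exactly, including capitalisation.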

2.2 Data frames

a. There should now be an object called cereal in your R workspace. Use the head function to inspect the first few rows of the data frame, and use class to check that cereal is in fact a data frame.

Solution:

head(cereal)
##                        name mfr type calories protein fat sodium fiber carbo
## 1                 100%_Bran   N    C       70       4   1    130  10.0   5.0
## 2         100%_Natural_Bran   Q    C      120       3   5     15   2.0   8.0
## 3                  All-Bran   K    C       70       4   1    260   9.0   7.0
## 4 All-Bran_with_Extra_Fiber   K    C       50       4   0    140  14.0   8.0
## 5            Almond_Delight   R    C      110       2   2    200   1.0  14.0
## 6   Apple_Cinnamon_Cheerios   G    C      110       2   2    180   1.5  10.5
##   sugars potass vitamins shelf weight cups   rating
## 1      6    280       25     3      1 0.33 68.40297
## 2      8    135        0     3      1 1.00 33.98368
## 3      5    320       25     3      1 0.33 59.42551
## 4      0    330       25     3      1 0.50 93.70491
## 5      8     -1       25     3      1 0.75 34.38484
## 6     10     70       25     1      1 0.75 29.50954
class(cereal)
## [1] "data.frame"

b. What are the column names of the cereal data frame? How many rows are there? (Use colnames, dim and nrow.)

Solution:

colnames(cereal)
##  [1] "name"     "mfr"      "type"     "calories" "protein"  "fat"     
##  [7] "sodium"   "fiber"    "carbo"    "sugars"   "potass"   "vitamins"
## [13] "shelf"    "weight"   "cups"     "rating"
dim(cereal)
## [1] 77 16
nrow(cereal)
## [1] 77
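dim returns the number of rows followed by the number of columns, so nrow and ncol are its two components. A small sketch on a made-up data frame (the object toy is hypothetical):

```r
# Hypothetical 4-row, 3-column data frame
toy <- data.frame(a = 1:4,
                  b = c(2.5, 3.5, 4.5, 5.5),
                  c = c("w", "x", "y", "z"))

dim(toy)    # rows first, then columns
nrow(toy)   # same as dim(toy)[1]
ncol(toy)   # same as dim(toy)[2]
```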

c. Extract the calories column using the $ operator and using the [[ operator.

Solution:

cereal$calories
##  [1]  70 120  70  50 110 110 110 130  90  90 120 110 120 110 110 110 100 110 110
## [20] 110 100 110 100 100 110 110 100 120 120 110 100 110 100 110 120 120 110 110
## [39] 110 140 110 100 110 100 150 150 160 100 120 140  90 130 120 100  50  50 100
## [58] 100 120 100  90 110 110  80  90  90 110 110  90 110 140 100 110 110 100 100
## [77] 110
cereal[['calories']]
##  [1]  70 120  70  50 110 110 110 130  90  90 120 110 120 110 110 110 100 110 110
## [20] 110 100 110 100 100 110 110 100 120 120 110 100 110 100 110 120 120 110 110
## [39] 110 140 110 100 110 100 150 150 160 100 120 140  90 130 120 100  50  50 100
## [58] 100 120 100  90 110 110  80  90  90 110 110  90 110 140 100 110 110 100 100
## [77] 110
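Both operators return the column as a plain vector, which is why the two outputs above are identical. The sketch below, on a hypothetical toy data frame, also contrasts them with single brackets, which keep the result as a data frame.

```r
# Hypothetical toy data
toy <- data.frame(name = c("Cereal_A", "Cereal_B", "Cereal_C"),
                  calories = c(70, 120, 110))

by_dollar <- toy$calories        # a numeric vector
by_double <- toy[["calories"]]   # the same numeric vector
by_single <- toy["calories"]     # a one-column data frame, not a vector

# [[ also accepts a column name stored in a variable; $ does not
wanted <- "calories"
by_variable <- toy[[wanted]]     # works
toy$wanted                       # NULL: looks for a column literally called "wanted"
```

A reasonable rule of thumb is $ for interactive work and [[ when the column name is held in a variable.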

d. Extract rows 1 to 10 from the cereal data frame.

Solution:

cereal[1:10, ]
##                         name mfr type calories protein fat sodium fiber carbo
## 1                  100%_Bran   N    C       70       4   1    130  10.0   5.0
## 2          100%_Natural_Bran   Q    C      120       3   5     15   2.0   8.0
## 3                   All-Bran   K    C       70       4   1    260   9.0   7.0
## 4  All-Bran_with_Extra_Fiber   K    C       50       4   0    140  14.0   8.0
## 5             Almond_Delight   R    C      110       2   2    200   1.0  14.0
## 6    Apple_Cinnamon_Cheerios   G    C      110       2   2    180   1.5  10.5
## 7                Apple_Jacks   K    C      110       2   0    125   1.0  11.0
## 8                    Basic_4   G    C      130       3   2    210   2.0  18.0
## 9                  Bran_Chex   R    C       90       2   1    200   4.0  15.0
## 10               Bran_Flakes   P    C       90       3   0    210   5.0  13.0
##    sugars potass vitamins shelf weight cups   rating
## 1       6    280       25     3   1.00 0.33 68.40297
## 2       8    135        0     3   1.00 1.00 33.98368
## 3       5    320       25     3   1.00 0.33 59.42551
## 4       0    330       25     3   1.00 0.50 93.70491
## 5       8     -1       25     3   1.00 0.75 34.38484
## 6      10     70       25     1   1.00 0.75 29.50954
## 7      14     30       25     2   1.00 1.00 33.17409
## 8       8    100       25     3   1.33 0.75 37.03856
## 9       6    125       25     1   1.00 0.67 49.12025
## 10      5    190       25     3   1.00 0.67 53.31381
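In data frame indexing the part before the comma selects rows and the part after it selects columns; leaving the column part empty keeps every column. A sketch on hypothetical toy data:

```r
# Hypothetical toy data with 12 rows
toy <- data.frame(id = 1:12, value = (1:12) * 10)

first_ten <- toy[1:10, ]                  # rows 1 to 10, all columns
same_rows <- head(toy, n = 10)            # head() with n = 10 selects the same rows
some_cols <- toy[1:3, c("id", "value")]   # rows and columns chosen together
```

Omitting the comma, as in toy[1:10], would ask for columns 1 to 10 instead and fail here because toy has only two columns.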

e. Make a new data frame called Kelloggs that contains only the rows belonging to the manufacturer Kellogg's (that is, the rows where mfr takes the value "K").

Solution:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Kelloggs <- cereal %>% filter(mfr == "K") 
print(Kelloggs)
##                           name mfr type calories protein fat sodium fiber carbo
## 1                     All-Bran   K    C       70       4   1    260     9     7
## 2    All-Bran_with_Extra_Fiber   K    C       50       4   0    140    14     8
## 3                  Apple_Jacks   K    C      110       2   0    125     1    11
## 4                  Corn_Flakes   K    C      100       2   0    290     1    21
## 5                    Corn_Pops   K    C      110       1   0     90     1    13
## 6           Cracklin'_Oat_Bran   K    C      110       3   3    140     4    10
## 7                      Crispix   K    C      110       2   0    220     1    21
## 8                  Froot_Loops   K    C      110       2   1    125     1    11
## 9               Frosted_Flakes   K    C      110       1   0    200     1    14
## 10         Frosted_Mini-Wheats   K    C      100       3   0      0     3    14
## 11               Fruitful_Bran   K    C      120       3   0    240     5    14
## 12 Just_Right_Crunchy__Nuggets   K    C      110       2   1    170     1    17
## 13      Just_Right_Fruit_&_Nut   K    C      140       3   1    170     2    20
## 14        Mueslix_Crispy_Blend   K    C      160       3   2    150     3    17
## 15            Nut&Honey_Crunch   K    C      120       2   1    190     0    15
## 16   Nutri-Grain_Almond-Raisin   K    C      140       3   2    220     3    21
## 17           Nutri-grain_Wheat   K    C       90       3   0    170     3    18
## 18                  Product_19   K    C      100       3   0    320     1    20
## 19                 Raisin_Bran   K    C      120       3   1    210     5    14
## 20              Raisin_Squares   K    C       90       2   0      0     2    15
## 21               Rice_Krispies   K    C      110       2   0    290     0    22
## 22                      Smacks   K    C      110       2   1     70     1     9
## 23                   Special_K   K    C      110       6   0    230     1    16
##    sugars potass vitamins shelf weight cups   rating
## 1       5    320       25     3   1.00 0.33 59.42551
## 2       0    330       25     3   1.00 0.50 93.70491
## 3      14     30       25     2   1.00 1.00 33.17409
## 4       2     35       25     1   1.00 1.00 45.86332
## 5      12     20       25     2   1.00 1.00 35.78279
## 6       7    160       25     3   1.00 0.50 40.44877
## 7       3     30       25     3   1.00 1.00 46.89564
## 8      13     30       25     2   1.00 1.00 32.20758
## 9      11     25       25     1   1.00 0.75 31.43597
## 10      7    100       25     2   1.00 0.80 58.34514
## 11     12    190       25     3   1.33 0.67 41.01549
## 12      6     60      100     3   1.00 1.00 36.52368
## 13      9     95      100     3   1.30 0.75 36.47151
## 14     13    160       25     3   1.50 0.67 30.31335
## 15      9     40       25     2   1.00 0.67 29.92429
## 16      7    130       25     3   1.33 0.67 40.69232
## 17      2     90       25     3   1.00 1.00 59.64284
## 18      3     45      100     3   1.00 1.00 41.50354
## 19     12    240       25     2   1.33 0.75 39.25920
## 20      6    110       25     3   1.00 0.50 55.33314
## 21      3     35       25     1   1.00 1.00 40.56016
## 22     15     40       25     2   1.00 0.75 31.23005
## 23      3     55       25     1   1.00 1.00 53.13132
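The same selection can be done without dplyr. The sketch below uses logical indexing and subset() on a hypothetical toy data frame; it is an alternative to the filter() call above, not a replacement for it.

```r
# Hypothetical toy data
toy <- data.frame(name = c("Cereal_A", "Cereal_B", "Cereal_C", "Cereal_D"),
                  mfr = c("K", "G", "K", "N"),
                  calories = c(70, 120, 110, 90))

k_index  <- toy[toy$mfr == "K", ]     # logical vector picks the matching rows
k_subset <- subset(toy, mfr == "K")   # same rows, column names used directly

rownames(k_index)   # base R keeps the original row numbers
```

One visible difference: the filter() output above is renumbered from 1 to 23, whereas the base R versions keep the row numbers from the original data frame.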

2.3 Factors

a. Load the cereal data again with the read.csv command. This time, use the optional argument stringsAsFactors = TRUE.

Solution:

cereal = read.csv("cereal.csv", stringsAsFactors = TRUE)

b. The mfr and type columns are now factors. Check that this is true.

Solution:

class(cereal$mfr)
## [1] "factor"
class(cereal$type)
## [1] "factor"

c. How many levels are there in mfr and type? (Use the functions levels or nlevels.)

Solution:

levels(cereal$mfr)
## [1] "A" "G" "K" "N" "P" "Q" "R"
levels(cereal$type)
## [1] "C" "H"
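levels lists the labels, while nlevels returns the count directly; from the output above, mfr has 7 levels and type has 2. A sketch on a hypothetical toy factor:

```r
# Hypothetical toy factor
mfr <- factor(c("K", "G", "K", "N", "G", "K"))

is.factor(mfr)   # TRUE
levels(mfr)      # the distinct labels, in sorted order
nlevels(mfr)     # the number of levels, same as length(levels(mfr))
table(mfr)       # how many observations fall in each level
```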

2.4 Vectors

a. Extract the calories column into a new vector called cereal.calories.

Solution:

cereal.calories <- cereal$calories

b. How many elements are there in cereal.calories? (Use length.)

Solution:

length(cereal.calories)
## [1] 77

c. Extract the 5th to the 10th elements from cereal.calories.

Solution:

cereal.calories[5:10]
## [1] 110 110 110 130  90  90

d. Add one more element to cereal.calories using c().

Solution:

cereal.calories <- c(cereal.calories,100)
print(cereal.calories)
##  [1]  70 120  70  50 110 110 110 130  90  90 120 110 120 110 110 110 100 110 110
## [20] 110 100 110 100 100 110 110 100 120 120 110 100 110 100 110 120 120 110 110
## [39] 110 140 110 100 110 100 150 150 160 100 120 140  90 130 120 100  50  50 100
## [58] 100 120 100  90 110 110  80  90  90 110 110  90 110 140 100 110 110 100 100
## [77] 110 100
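c() does not modify a vector in place; it builds a new, longer vector, which is why the result has to be assigned back. The sketch below shows this on a short hypothetical vector, together with append(), which can place the new element at a chosen position.

```r
# Hypothetical short vector
v <- c(70, 120, 70)

v_end <- c(v, 100)                   # new element at the end
v_mid <- append(v, 100, after = 1)   # new element after position 1

length(v)       # 3: the original is unchanged
length(v_end)   # 4
```

After this step cereal.calories has 78 elements, so it no longer lines up row by row with the 77-row cereal data frame.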

2.5 Matrix

a. Can you coerce the cereal data frame to a matrix? (Use as.matrix(cereal).) Check that the elements have been coerced to the character type.

Solution:

cereal_matrix <- as.matrix(cereal)
class(cereal_matrix[1,9])
## [1] "character"
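A matrix can hold only one type, so a single text column is enough to turn every element into a character string. A sketch on hypothetical toy data that checks the whole matrix rather than one cell:

```r
# Hypothetical toy data: one text column, one numeric column
toy <- data.frame(name = c("Cereal_A", "Cereal_B"),
                  calories = c(70, 120))

m <- as.matrix(toy)

typeof(m)         # the storage type of the whole matrix
is.character(m)   # TRUE: the numbers were converted to strings

# The strings can be converted back if needed
as.numeric(m[, "calories"])
```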

b. Now do this again, but this time leave out the mfr, name and type columns. Check that the elements are now numeric.

Solution:

cereal_abc <- cereal %>% select(-mfr,-name,-type)
str(cereal_abc)
## 'data.frame':    77 obs. of  13 variables:
##  $ calories: int  70 120 70 50 110 110 110 130 90 90 ...
##  $ protein : int  4 3 4 4 2 2 2 3 2 3 ...
##  $ fat     : int  1 5 1 0 2 2 0 2 1 0 ...
##  $ sodium  : int  130 15 260 140 200 180 125 210 200 210 ...
##  $ fiber   : num  10 2 9 14 1 1.5 1 2 4 5 ...
##  $ carbo   : num  5 8 7 8 14 10.5 11 18 15 13 ...
##  $ sugars  : int  6 8 5 0 8 10 14 8 6 5 ...
##  $ potass  : int  280 135 320 330 -1 70 30 100 125 190 ...
##  $ vitamins: int  25 0 25 25 25 25 25 25 25 25 ...
##  $ shelf   : int  3 3 3 3 3 1 2 3 1 3 ...
##  $ weight  : num  1 1 1 1 1 1 1 1.33 1 1 ...
##  $ cups    : num  0.33 1 0.33 0.5 0.75 0.75 1 0.75 0.67 0.67 ...
##  $ rating  : num  68.4 34 59.4 93.7 34.4 ...
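The str() output confirms that the remaining columns are all int or num, but it stops short of building the matrix. A sketch of the remaining step on hypothetical toy data; for the lab's own object the corresponding call would presumably be as.matrix(cereal_abc).

```r
# Hypothetical toy data: one text column, one integer column, one double column
toy <- data.frame(name = c("Cereal_A", "Cereal_B"),
                  calories = c(70L, 120L),
                  fiber = c(10, 2))

# Keep only the numeric columns (base R alternative to select(-name))
toy_num <- toy[, sapply(toy, is.numeric)]

m_num <- as.matrix(toy_num)

is.numeric(m_num)   # TRUE once the text columns are gone
typeof(m_num)       # "double": integers are promoted when mixed with doubles
```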

3 Numerical summary

3.1 Summary

Use the summary function to extract the median, 1st quartile and 3rd quartile of the sodium column.

Solution:

summary(cereal$sodium)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   130.0   180.0   159.7   210.0   320.0
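summary() returns a named vector, so individual statistics can be picked out by name rather than read off the printout. The sketch below uses only the first eight sodium values from the rows printed earlier, so its numbers differ from the full-column summary above.

```r
# First eight sodium values from the rows printed earlier (not the full column)
sodium <- c(130, 15, 260, 140, 200, 180, 125, 210)

s <- summary(sodium)
s[c("1st Qu.", "Median", "3rd Qu.")]   # pick the three statistics by name

# The same quantities computed directly
quantile(sodium, probs = c(0.25, 0.5, 0.75))
median(sodium)
```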

3.2 Basic statistics

a. Find the max, min, standard deviation and mean of the sodium column. (Use max(), min(), sd() and mean().)

Solution:

max(cereal$sodium)
## [1] 320
min(cereal$sodium)
## [1] 0
sd(cereal$sodium)
## [1] 83.8323
mean(cereal$sodium)
## [1] 159.6753

b. Find the mean sodium for each mfr.

Solution:

mean.sodium.by.mfr <- aggregate(sodium ~ mfr, data = cereal, FUN = mean, na.rm = TRUE)
print(mean.sodium.by.mfr)
##   mfr   sodium
## 1   A   0.0000
## 2   G 200.4545
## 3   K 174.7826
## 4   N  37.5000
## 5   P 146.1111
## 6   Q  92.5000
## 7   R 198.1250
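aggregate() returns a data frame with one row per group. tapply() is a base R alternative that returns a named vector instead; the sketch below compares the two on hypothetical toy data.

```r
# Hypothetical toy data
toy <- data.frame(mfr = c("K", "G", "K", "G", "N"),
                  sodium = c(260, 180, 140, 210, 130))

# Named vector: one mean per manufacturer
means_vec <- tapply(toy$sodium, toy$mfr, mean)
means_vec

# Data frame: same numbers, one row per manufacturer
means_df <- aggregate(sodium ~ mfr, data = toy, FUN = mean)
means_df
```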

4 Graphical summary

4.1 Boxplot

a. Make a boxplot of sodium against mfr using boxplot().

Solution:

boxplot(sodium ~ mfr, data=cereal, horizontal = TRUE,
        main="Sodium by Manufacturer", 
        xlab = "Sodium", 
        ylab = "Manufacturer")
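The numbers behind each box can be inspected without drawing anything by passing plot = FALSE. A sketch on hypothetical toy data:

```r
# Hypothetical toy data: five sodium values for each of two manufacturers
toy <- data.frame(mfr = c("K", "K", "K", "K", "K", "G", "G", "G", "G", "G"),
                  sodium = c(0, 125, 170, 220, 320, 140, 180, 200, 250, 290))

# plot = FALSE returns the summary statistics instead of drawing the plot
b <- boxplot(sodium ~ mfr, data = toy, plot = FALSE)

b$names   # group labels, in plotting order
b$stats   # one column per group: lower whisker, lower hinge, median,
          # upper hinge, upper whisker
```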

4.2 Scatterplot

a. Plot calories against sodium using plot().

Solution:

plot(calories ~ sodium, data=cereal, main="calories against sodium")

5 Write Data to File

b. Write the data frame containing only the Kellogg's observations to a file called kelloggs.csv. Use the write.csv command.

Solution:

write.csv(Kelloggs,file='kelloggs.csv')
head(Kelloggs)
##                        name mfr type calories protein fat sodium fiber carbo
## 1                  All-Bran   K    C       70       4   1    260     9     7
## 2 All-Bran_with_Extra_Fiber   K    C       50       4   0    140    14     8
## 3               Apple_Jacks   K    C      110       2   0    125     1    11
## 4               Corn_Flakes   K    C      100       2   0    290     1    21
## 5                 Corn_Pops   K    C      110       1   0     90     1    13
## 6        Cracklin'_Oat_Bran   K    C      110       3   3    140     4    10
##   sugars potass vitamins shelf weight cups   rating
## 1      5    320       25     3      1 0.33 59.42551
## 2      0    330       25     3      1 0.50 93.70491
## 3     14     30       25     2      1 1.00 33.17409
## 4      2     35       25     1      1 1.00 45.86332
## 5     12     20       25     2      1 1.00 35.78279
## 6      7    160       25     3      1 0.50 40.44877
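By default write.csv also writes the row names as an extra, unnamed first column, which reappears as a column called X when the file is read back. The sketch below shows the effect on hypothetical toy data written to a temporary file; whether to pass row.names = FALSE for kelloggs.csv is a choice the exercise leaves open.

```r
# Hypothetical toy data and a temporary file path
toy <- data.frame(name = c("Cereal_A", "Cereal_B"),
                  calories = c(70, 120))
path <- tempfile(fileext = ".csv")

# Default: row names are written as an unnamed first column
write.csv(toy, file = path)
with_rownames <- names(read.csv(path))

# row.names = FALSE writes only the data columns
write.csv(toy, file = path, row.names = FALSE)
without_rownames <- names(read.csv(path))

with_rownames
without_rownames
```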