Download the Cereal.csv data file from Canvas for this lab. In an R Markdown file, run some simple code. The questions here aim to get you familiar with R syntax.
Before you start the tutorial, create a new RStudio project in a new directory. Then start the tutorial by writing your answers in a new R Markdown file.
Download the Cereal.csv file from the Canvas page and use the read.csv command to read the CSV file into R, assigning it to an object called cereal.
# This path is specific to one machine; change it to the directory that holds Cereal.csv
setwd("/Users/zixuan/Desktop/2024S1/stat5003 /w1")
cereal <- read.csv("Cereal.csv")
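To try read.csv without depending on a file or a working directory, the `text` argument can be used to parse CSV content held in a string. This is only a sketch: the two rows below are made-up stand-ins, not the real Cereal.csv.

```r
# Hypothetical CSV content for illustration only
csv_text <- "name,mfr,calories\nA_Flakes,K,110\nB_Puffs,G,90"

# read.csv parses the string exactly as it would parse a file
toy <- read.csv(text = csv_text)

class(toy)  # container type of the result
str(toy)    # column names and column types
```

The first line of the string is treated as the header, just as with a file on disk.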
The data is now stored in R as an object called cereal. Use the head function to inspect the first few lines of the data frame and use class to check that cereal is in fact a data frame.
head(cereal, 10)
## name mfr type calories protein fat sodium fiber carbo
## 1 100%_Bran N C 70 4 1 130 10.0 5.0
## 2 100%_Natural_Bran Q C 120 3 5 15 2.0 8.0
## 3 All-Bran K C 70 4 1 260 9.0 7.0
## 4 All-Bran_with_Extra_Fiber K C 50 4 0 140 14.0 8.0
## 5 Almond_Delight R C 110 2 2 200 1.0 14.0
## 6 Apple_Cinnamon_Cheerios G C 110 2 2 180 1.5 10.5
## 7 Apple_Jacks K C 110 2 0 125 1.0 11.0
## 8 Basic_4 G C 130 3 2 210 2.0 18.0
## 9 Bran_Chex R C 90 2 1 200 4.0 15.0
## 10 Bran_Flakes P C 90 3 0 210 5.0 13.0
## sugars potass vitamins shelf weight cups rating
## 1 6 280 25 3 1.00 0.33 68.40297
## 2 8 135 0 3 1.00 1.00 33.98368
## 3 5 320 25 3 1.00 0.33 59.42551
## 4 0 330 25 3 1.00 0.50 93.70491
## 5 8 -1 25 3 1.00 0.75 34.38484
## 6 10 70 25 1 1.00 0.75 29.50954
## 7 14 30 25 2 1.00 1.00 33.17409
## 8 8 100 25 3 1.33 0.75 37.03856
## 9 6 125 25 1 1.00 0.67 49.12025
## 10 5 190 25 3 1.00 0.67 53.31381
class(cereal)
## [1] "data.frame"
What are the dimensions of the cereal data frame? How many rows are there? (Use dim and nrow.)
dim(cereal)
## [1] 77 16
nrow(cereal)
## [1] 77
Extract the calories column using the $ operator and using the [[ operator.
cereal$calories
## [1] 70 120 70 50 110 110 110 130 90 90 120 110 120 110 110 110 100 110 110
## [20] 110 100 110 100 100 110 110 100 120 120 110 100 110 100 110 120 120 110 110
## [39] 110 140 110 100 110 100 150 150 160 100 120 140 90 130 120 100 50 50 100
## [58] 100 120 100 90 110 110 80 90 90 110 110 90 110 140 100 110 110 100 100
## [77] 110
cereal[["calories"]]
## [1] 70 120 70 50 110 110 110 130 90 90 120 110 120 110 110 110 100 110 110
## [20] 110 100 110 100 100 110 110 100 120 120 110 100 110 100 110 120 120 110 110
## [39] 110 140 110 100 110 100 150 150 160 100 120 140 90 130 120 100 50 50 100
## [58] 100 120 100 90 110 110 80 90 90 110 110 90 110 140 100 110 110 100 100
## [77] 110
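The two operators return the same vector. A practical difference is that [[ accepts a column name stored in a variable, while $ needs the name written out. A minimal sketch on a made-up data frame (`toy` and `col` are illustrative names):

```r
# Hypothetical data for illustration only
toy <- data.frame(calories = c(70, 120, 50), sodium = c(130, 15, 140))

by_dollar  <- toy$calories
by_bracket <- toy[["calories"]]
identical(by_dollar, by_bracket)  # both forms give the same numeric vector

# [[ works with a name held in a variable, which helps inside loops and functions
col <- "sodium"
by_variable <- toy[[col]]
```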
Print the first 10 rows of the cereal data frame.
head(cereal, 10)
## name mfr type calories protein fat sodium fiber carbo
## 1 100%_Bran N C 70 4 1 130 10.0 5.0
## 2 100%_Natural_Bran Q C 120 3 5 15 2.0 8.0
## 3 All-Bran K C 70 4 1 260 9.0 7.0
## 4 All-Bran_with_Extra_Fiber K C 50 4 0 140 14.0 8.0
## 5 Almond_Delight R C 110 2 2 200 1.0 14.0
## 6 Apple_Cinnamon_Cheerios G C 110 2 2 180 1.5 10.5
## 7 Apple_Jacks K C 110 2 0 125 1.0 11.0
## 8 Basic_4 G C 130 3 2 210 2.0 18.0
## 9 Bran_Chex R C 90 2 1 200 4.0 15.0
## 10 Bran_Flakes P C 90 3 0 210 5.0 13.0
## sugars potass vitamins shelf weight cups rating
## 1 6 280 25 3 1.00 0.33 68.40297
## 2 8 135 0 3 1.00 1.00 33.98368
## 3 5 320 25 3 1.00 0.33 59.42551
## 4 0 330 25 3 1.00 0.50 93.70491
## 5 8 -1 25 3 1.00 0.75 34.38484
## 6 10 70 25 1 1.00 0.75 29.50954
## 7 14 30 25 2 1.00 1.00 33.17409
## 8 8 100 25 3 1.33 0.75 37.03856
## 9 6 125 25 1 1.00 0.67 49.12025
## 10 5 190 25 3 1.00 0.67 53.31381
Create a new data frame called Kelloggs which contains only the rows that belong to the manufacturer Kellogg's (where mfr takes the value "K").
Kelloggs <- subset(cereal, mfr == "K")
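The same filter can be written with logical indexing. One difference worth knowing: subset drops rows where the condition is NA, while plain logical indexing returns them as NA rows. A minimal sketch on made-up data (the `toy` data frame is illustrative):

```r
# Hypothetical data for illustration only
toy <- data.frame(name = c("a", "b", "c", "d"),
                  mfr  = c("K", "G", "K", "N"),
                  stringsAsFactors = FALSE)

# Two ways to keep only the rows where mfr is "K"
via_subset <- subset(toy, mfr == "K")
via_index  <- toy[toy$mfr == "K", ]

nrow(via_subset)
nrow(via_index)
```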
Read in the Cereal data again with the read.csv command. This time, use the optional argument stringsAsFactors = TRUE.
cereal <- read.csv("Cereal.csv", stringsAsFactors = TRUE)
The mfr and type columns are now factors. Check that this is true.
class(cereal$mfr)
## [1] "factor"
class(cereal$type)
## [1] "factor"
How many levels are there in mfr and type? (Use the functions levels or nlevels.)
levels(cereal$mfr)
## [1] "A" "G" "K" "N" "P" "Q" "R"
nlevels(cereal$mfr)
## [1] 7
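To count the levels of several factor columns in one call, sapply can apply nlevels to each column. A minimal sketch on made-up data (the values in `toy` are illustrative, not taken from Cereal.csv):

```r
# Hypothetical data for illustration only
toy <- data.frame(mfr  = factor(c("K", "G", "K", "N", "G", "K")),
                  type = factor(c("C", "C", "H", "C", "C", "C")))

levels(toy$mfr)                         # distinct values, sorted
n_levels <- sapply(toy[c("mfr", "type")], nlevels)
n_levels                                # number of levels per column
table(toy$mfr)                          # how many rows fall in each level
```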
Extract calories into a new vector called cereal.calories.
cereal.calories <- cereal$calories
cereal.calories
## [1] 70 120 70 50 110 110 110 130 90 90 120 110 120 110 110 110 100 110 110
## [20] 110 100 110 100 100 110 110 100 120 120 110 100 110 100 110 120 120 110 110
## [39] 110 140 110 100 110 100 150 150 160 100 120 140 90 130 120 100 50 50 100
## [58] 100 120 100 90 110 110 80 90 90 110 110 90 110 140 100 110 110 100 100
## [77] 110
How many elements are in cereal.calories? (Use length.)
length(cereal.calories)
## [1] 77
Extract the 5th to 10th elements of cereal.calories.
cereal.calories[5:10]
## [1] 110 110 110 130 90 90
Append a new element to cereal.calories using c().
length(cereal.calories)
## [1] 77
cereal.calories <- c(cereal.calories, 3)
length(cereal.calories)
## [1] 78
Convert the cereal data frame to a matrix (using as.matrix(cereal)). Check that the elements have been forced into the character type.
cereal_matrix <- as.matrix(cereal)
class(cereal_matrix)
## [1] "matrix" "array"
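Note that class reports the container ("matrix" "array"), not the type of the elements. To check the element type itself, typeof or is.character can be used. A minimal sketch on a made-up data frame (`toy`, `m` and `m_num` are illustrative names):

```r
# Hypothetical data for illustration only
toy <- data.frame(name = c("a", "b"), calories = c(70, 120))

# A matrix holds one type, so mixing text and numbers coerces everything to character
m <- as.matrix(toy)
typeof(m)
is.character(m)

# With only numeric columns the matrix stays numeric
m_num <- as.matrix(toy["calories"])
is.numeric(m_num)
```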
Convert the data frame to a matrix again, this time after removing the mfr, name and type columns. Check that the elements are now numeric.
cereal_matrix <- cereal[, !(names(cereal) %in% c("mfr", "name", "type"))]
cereal_matrix <- as.matrix(cereal_matrix)
str(cereal_matrix)
## num [1:77, 1:13] 70 120 70 50 110 110 110 130 90 90 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:13] "calories" "protein" "fat" "sodium" ...
Use the summary function to extract the median, 1st quartile and 3rd quartile from the sodium column.
cereal_number <- summary(cereal$sodium)
sodium_stats <- cereal_number[c("Median", "1st Qu.", "3rd Qu.")]
sodium_stats
## Median 1st Qu. 3rd Qu.
## 180 130 210
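The same three values can also be requested directly with quantile, which avoids indexing by the label strings that summary produces. A minimal sketch on a made-up vector (`x` is illustrative, not the sodium column):

```r
# Hypothetical values for illustration only
x <- c(0, 130, 180, 210, 320)

# Index the summary by its labels, as in the answer above
s <- summary(x)
from_summary <- s[c("1st Qu.", "Median", "3rd Qu.")]

# Ask for the 25th, 50th and 75th percentiles directly
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q
```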
Find the maximum, minimum, standard deviation and mean of sodium (max(), min(), sd(), mean()).
max(cereal$sodium)
## [1] 320
min(cereal$sodium)
## [1] 0
sd(cereal$sodium)
## [1] 83.8323
mean(cereal$sodium)
## [1] 159.6753
Find the mean sodium of each mfr.
mean_sodium_per_mfr <- aggregate(sodium ~ mfr, data = cereal, FUN = mean)
mean_sodium_per_mfr
## mfr sodium
## 1 A 0.0000
## 2 G 200.4545
## 3 K 174.7826
## 4 N 37.5000
## 5 P 146.1111
## 6 Q 92.5000
## 7 R 198.1250
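An alternative to aggregate is tapply, which returns a named vector instead of a data frame. Which one is more convenient depends on what happens next with the result. A minimal sketch on made-up data (`toy`, `agg` and `tap` are illustrative names):

```r
# Hypothetical data for illustration only
toy <- data.frame(mfr    = c("K", "G", "K", "G", "N"),
                  sodium = c(200, 100, 100, 300, 0))

# Data frame result: one row per manufacturer
agg <- aggregate(sodium ~ mfr, data = toy, FUN = mean)

# Named vector result: one element per manufacturer
tap <- tapply(toy$sodium, toy$mfr, mean)

agg
tap
```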
Plot sodium against mfr using boxplot().
boxplot(sodium ~ mfr, data = cereal, xlab = "Manufacturer", ylab = "Sodium", main = "Sodium by manufacturer")
Plot calories against sodium using plot().
plot(calories ~ sodium, data = cereal, main = "Calories against sodium")
Write the Kelloggs data frame to a file called kelloggs.csv. Use the write.csv command.
write.csv(Kelloggs, "kelloggs.csv", row.names = FALSE)
file.exists("kelloggs.csv")
## [1] TRUE
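File names are case-sensitive on some file systems, so the existence check should use exactly the name that was written. Reading the file back is a stronger check than file.exists alone. A minimal sketch that writes to a temporary directory (`toy` and `toy_out.csv` are made-up names):

```r
# Hypothetical data for illustration only
toy <- data.frame(name = c("a", "b"), calories = c(110L, 90L))

# Write to a temporary location so nothing in the project is overwritten
out_path <- file.path(tempdir(), "toy_out.csv")
write.csv(toy, out_path, row.names = FALSE)

# Check the exact path that was written, then read it back
file.exists(out_path)
back <- read.csv(out_path)
dim(back)
```

Setting row.names = FALSE keeps the row numbers out of the file, so the columns read back match the columns written.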