Write a simple R Markdown to explain some of the R codes you have learned regarding data frame. You are free to write your own program and add new codes such as how would you show all the rows except the last one or how would you get the last 6 rows of the data frame. Eg. you can explain what a data frame is, and then write the code to show how R handles data frame, and so on. But one mandatory topic is you must explain the different ways R returns back a vector or a data frame when you access values from a data frame. Your program doesn’t have to be long. Play around with the different font sizes in markdown to make your markdown readable. Explore. Publish your markdown on RPubs and submit the link only. Adding new codes that have not been discussed will earn you high marks.
Note : Best to create your own data frame or use a data frame which is small in size and so that it would be easy to see the results after the execution of each or after a few codes.
Let’s load some data.
data("iris")
Next, try some basic R commands.
class(iris)
## [1] "data.frame"
typeof(iris)
## [1] "list"
A data frame is a two-dimensional data structure in R used to store tabular data.
According to a Stack Overflow discussion, the typeof() function returns the way an object is stored in memory, while the class() function returns the abstract type of an object.
In other words, a data frame is stored in memory as a list. In fact, a data frame can be regarded as a special list whose elements (the columns) are vectors of equal length.
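The list nature of a data frame can be checked directly. The following is a minimal sketch that re-creates a shortened version of the allowance data frame so that it runs on its own; the three-row data are only an illustration.

```r
# A small data frame, re-created here so the example is self-contained
allowance <- data.frame(Name = c("Ali", "Sam", "Chan"),
                        Allowance = c(500, 450, 650))

class(allowance)    # abstract type: "data.frame"
typeof(allowance)   # storage type: "list"
is.list(allowance)  # TRUE, a data frame is a list of columns
length(allowance)   # 2, the number of list elements (columns)
lengths(allowance)  # each column has the same length, 3
```

Because length() counts list elements, it returns the number of columns rather than the number of rows.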
A data frame can be created in several ways:
Method 1: load from a built-in R dataset.
data("iris")
Method 2: read from an external source using the read.csv() or read.table() functions.
#Read from external source:
Data_Entry<-read.csv("Data Entry.csv",stringsAsFactors = F,header = T)
Method 3: explicitly create a data frame using the data.frame() function.
allowance<-data.frame(Name=c("Ali","Sam","Chan","Vijay","Intan")
,Allowance=c(500,450,650,300,495)
)
Method 4: coerce from other object types using the as.data.frame() function.
workhour<-as.data.frame(list(Name=c("Ali","Sam","Chan","Vijay","Intan")
,workhour=c(100,90,120,70,95)))
The dimensions of a data frame can be obtained via several methods:
The dim() function
dim(iris)
is.vector(dim(iris))
## [1] 150 5
## [1] TRUE
ncol() and nrow()
ncol(iris)
nrow(iris)
## [1] 5
## [1] 150
The dim() function returns a vector with two elements. The first element is the number of rows of the data frame (the same value as returned by nrow()) and the second element is the number of columns of the data frame (the same value as returned by ncol()).
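The relationship between dim(), nrow() and ncol() can be verified on a small example. This is a sketch using a hypothetical data frame df that is not part of the data used elsewhere in this document.

```r
# Hypothetical data frame with 4 rows and 2 columns
df <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))

d <- dim(df)
d                   # 4 2
d[1] == nrow(df)    # TRUE, first element is the number of rows
d[2] == ncol(df)    # TRUE, second element is the number of columns
```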
cat("\nstr function \n")
str(iris)
cat("\nSummary function \n")
summary(iris)
##
## str function
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
##
## Summary function
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
The str() and summary() functions are handy for getting an overview of a data frame. str() returns the column names, the type or class of each column and the first few values of each column. summary() summarises numeric columns by their minimum, quartiles, median, mean and maximum, and factor columns by the count of each level.
Unlike a matrix, a data frame can hold columns of different classes.
There are several ways of viewing a data frame, depending on your purpose.
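To illustrate the difference from a matrix, the sketch below checks the class of each column and then converts the data frame to a matrix. The data frame is a shortened copy of allowance, and stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version.

```r
allowance <- data.frame(Name = c("Ali", "Sam", "Chan"),
                        Allowance = c(500, 450, 650),
                        stringsAsFactors = FALSE)

# Each column keeps its own class inside the data frame
sapply(allowance, class)   # Name: "character", Allowance: "numeric"

# A matrix can hold only one type, so everything is coerced to character
m <- as.matrix(allowance)
typeof(m)                  # "character"
```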
View(iris)
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
By default, the head() function shows the first 6 rows of a data frame.
head(iris,10)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
head() returns a different number of rows if that number is supplied as the second argument.
tail(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
By default, the tail() function shows the last 6 rows of a data frame.
tail(iris,10)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 143 5.8 2.7 5.1 1.9 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
tail() returns a different number of rows if that number is supplied as the second argument.
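head() and tail() also accept a negative number of rows, which is one way to show all the rows except the last one. The sketch below uses a copy of the small allowance data frame so the result is easy to inspect.

```r
allowance <- data.frame(Name = c("Ali", "Sam", "Chan", "Vijay", "Intan"),
                        Allowance = c(500, 450, 650, 300, 495),
                        stringsAsFactors = FALSE)

head(allowance, -1)   # all rows except the last one (Ali to Vijay)
tail(allowance, -1)   # all rows except the first one (Sam to Intan)
tail(allowance, 2)    # the last two rows (Vijay and Intan)
```

A negative n means "everything except that many rows", counted from the opposite end.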
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
row.names(iris)
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
## [13] "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24"
## [25] "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36"
## [37] "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48"
## [49] "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60"
## [61] "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72"
## [73] "73" "74" "75" "76" "77" "78" "79" "80" "81" "82" "83" "84"
## [85] "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96"
## [97] "97" "98" "99" "100" "101" "102" "103" "104" "105" "106" "107" "108"
## [109] "109" "110" "111" "112" "113" "114" "115" "116" "117" "118" "119" "120"
## [121] "121" "122" "123" "124" "125" "126" "127" "128" "129" "130" "131" "132"
## [133] "133" "134" "135" "136" "137" "138" "139" "140" "141" "142" "143" "144"
## [145] "145" "146" "147" "148" "149" "150"
names() returns the column names of a data frame, while row.names() returns the row names of a data frame.
iris2<-iris
cat(paste0("\nThe original column names of the data frame:",paste0(names(iris2),collapse = ", "),"\n"))
names(iris2)[1]<-"Sepal.Length2"
names(iris2)[names(iris2)=="Species"]<-"Species2"
cat(paste0("\nThe new column names of the data frame:",paste0(names(iris2),collapse = ", "),"\n"))
##
## The original column names of the data frame:Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
##
## The new column names of the data frame:Sepal.Length2, Sepal.Width, Petal.Length, Petal.Width, Species2
A column of a data frame can be renamed by assigning a new character string to the corresponding element of names().
There are several ways to access the elements in a data frame.
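The same renaming pattern works on any data frame. The following sketch uses a hypothetical data frame df and also shows colnames(), which replaces all the names at once.

```r
# Hypothetical data frame used only for this illustration
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

names(df)[1] <- "id"                      # rename by position
names(df)[names(df) == "b"] <- "score"    # rename by matching the old name
names(df)                                 # "id" "score"

colnames(df) <- c("ID", "Score")          # replace every column name at once
names(df)                                 # "ID" "Score"
```

Renaming by name is usually safer than renaming by position, since it still works if the column order changes.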
Method 1: Integer Vectors as index
head(iris)
cat("\n")
iris[1,1]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
##
## [1] 5.1
iris[1,1] returns the element from the first row and first column.
iris[2,1]
## [1] 4.9
iris[1,3]
## [1] 1.4
As shown in the examples above, iris[2,1] returns the element from the second row and first column, while iris[1,3] returns the element from the first row and third column. In general, df[x,y] returns the element in row x and column y of the data frame df.
Now that we know how to access a specific element in the data frame, what if we want to access more than one element?
1:4
c(2,4)
iris[1:4,c(2,4)]
## [1] 1 2 3 4
## [1] 2 4
## Sepal.Width Petal.Width
## 1 3.5 0.2
## 2 3.0 0.2
## 3 3.2 0.2
## 4 3.1 0.2
We can use integer vectors as indices. Supplying 1:4, which is equivalent to c(1,2,3,4), as the row index returns the first to fourth rows. Supplying c(2,4) as the column index returns the second and fourth columns of the data frame.
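The order of the integers in the index also matters. A minimal sketch on a copy of the allowance data frame:

```r
allowance <- data.frame(Name = c("Ali", "Sam", "Chan", "Vijay", "Intan"),
                        Allowance = c(500, 450, 650, 300, 495),
                        stringsAsFactors = FALSE)

allowance[c(3, 1), ]   # rows 3 and 1, returned in that order (Chan, then Ali)
allowance[2:3, 2]      # rows 2 to 3 of column 2, a numeric vector: 450 650
```

The first call returns a data frame because more than one column is kept, while the second returns a plain vector because only one column is selected.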
cat("head(iris)\n")
head(iris)
cat("\nhead(iris[,c(-2,-5)])\n")
head(iris[,c(-2,-5)])
## head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
##
## head(iris[,c(-2,-5)])
## Sepal.Length Petal.Length Petal.Width
## 1 5.1 1.4 0.2
## 2 4.9 1.4 0.2
## 3 4.7 1.3 0.2
## 4 4.6 1.5 0.2
## 5 5.0 1.4 0.2
## 6 5.4 1.7 0.4
In the example above, negative indices remove the indicated columns from the data frame; for example, c(-2,-5) removes the second and fifth columns.
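Negative indices work on rows in the same way. The sketch below, again on a copy of the allowance data frame, drops the first row, the last row and a column.

```r
allowance <- data.frame(Name = c("Ali", "Sam", "Chan", "Vijay", "Intan"),
                        Allowance = c(500, 450, 650, 300, 495),
                        stringsAsFactors = FALSE)

allowance[-1, ]                  # every row except the first
allowance[-nrow(allowance), ]    # every row except the last
allowance[, -2, drop = FALSE]    # drop column 2, keep the result as a data frame
```

drop = FALSE is used in the last call because only one column remains; without it R would return a vector.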
Method 2: Logical vectors as index
head(iris[,c(T,F,T,F,T)])
## Sepal.Length Petal.Length Species
## 1 5.1 1.4 setosa
## 2 4.9 1.4 setosa
## 3 4.7 1.3 setosa
## 4 4.6 1.5 setosa
## 5 5.0 1.4 setosa
## 6 5.4 1.7 setosa
R returns the elements whose corresponding value in the logical vector is TRUE.
iris[c(T,F,F,F,F,T),]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor
## 60 5.2 2.7 3.9 1.4 versicolor
## 61 5.0 2.0 3.5 1.0 versicolor
## 66 6.7 3.1 4.4 1.4 versicolor
## 67 5.6 3.0 4.5 1.5 versicolor
## 72 6.1 2.8 4.0 1.3 versicolor
## 73 6.3 2.5 4.9 1.5 versicolor
## 78 6.7 3.0 5.0 1.7 versicolor
## 79 6.0 2.9 4.5 1.5 versicolor
## 84 6.0 2.7 5.1 1.6 versicolor
## 85 5.4 3.0 4.5 1.5 versicolor
## 90 5.5 2.5 4.0 1.3 versicolor
## 91 5.5 2.6 4.4 1.2 versicolor
## 96 5.7 3.0 4.2 1.2 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 108 7.3 2.9 6.3 1.8 virginica
## 109 6.7 2.5 5.8 1.8 virginica
## 114 5.7 2.5 5.0 2.0 virginica
## 115 5.8 2.8 5.1 2.4 virginica
## 120 6.0 2.2 5.0 1.5 virginica
## 121 6.9 3.2 5.7 2.3 virginica
## 126 7.2 3.2 6.0 1.8 virginica
## 127 6.2 2.8 4.8 1.8 virginica
## 132 7.9 3.8 6.4 2.0 virginica
## 133 6.4 2.8 5.6 2.2 virginica
## 138 6.4 3.1 5.5 1.8 virginica
## 139 6.0 3.0 4.8 1.8 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 150 5.9 3.0 5.1 1.8 virginica
If the T/F vector is shorter than the number of rows or columns of the data frame, R replicates (recycles) the vector until its length matches the number of rows or columns.
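Recycling is easier to see on a small data frame. In this sketch a logical vector of length 2 is applied to a hypothetical four-row data frame, so it is recycled exactly twice.

```r
# Hypothetical four-row data frame
df <- data.frame(id = 1:4, value = c(10, 20, 30, 40))

# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE
df[c(TRUE, FALSE), ]                       # rows 1 and 3

# The recycled index written out explicitly gives the same rows
df[rep_len(c(TRUE, FALSE), nrow(df)), ]
```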
iris[iris[,1]>5,1]
is.data.frame(iris[iris[,1]>5,1])
is.data.frame(iris[iris[,1]>5,1,drop=F]) #coerce to data frame
## [1] 5.1 5.4 5.4 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 5.1 5.2 5.2 5.4 5.2 5.5 5.5
## [19] 5.1 5.1 5.1 5.3 7.0 6.4 6.9 5.5 6.5 5.7 6.3 6.6 5.2 5.9 6.0 6.1 5.6 6.7
## [37] 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0
## [55] 5.4 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1
## [73] 6.3 6.5 7.6 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6
## [91] 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9
## [109] 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5 6.2 5.9
## [1] FALSE
## [1] TRUE
Since T/F vectors can be used to access elements of a data frame, a condition that evaluates to a T/F vector can be used to select elements.
Notice from the examples above that R automatically returns a vector if only one column is selected. To keep the result as a data frame, we can pass the argument “drop=F”.
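The vector-versus-data-frame behaviour can be summarised in a short sketch. It uses a copy of the allowance data frame; the threshold of 450 is arbitrary.

```r
allowance <- data.frame(Name = c("Ali", "Sam", "Chan", "Vijay", "Intan"),
                        Allowance = c(500, 450, 650, 300, 495),
                        stringsAsFactors = FALSE)

rich <- allowance$Allowance > 450          # logical vector built from a condition

v <- allowance[rich, "Name"]               # one column selected: a vector
d <- allowance[rich, "Name", drop = FALSE] # same selection kept as a data frame
r <- allowance[1, ]                        # one row, several columns: a data frame

is.data.frame(v)   # FALSE
is.data.frame(d)   # TRUE
is.data.frame(r)   # TRUE
```

Only the selection of a single column is simplified to a vector by default; a single row with several columns stays a data frame.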
Method 3: Character vectors as index
names(iris)
cat('\niris[["Species"]]\n')
head(iris[["Species"]])
is.data.frame(iris[["Species"]])
cat('\niris["Species"]\n')
head(iris["Species"])
is.data.frame(iris["Species"])
cat('\niris$Species\n')
head(iris$Species)
is.data.frame(iris$Species)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
##
## iris[["Species"]]
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
## [1] FALSE
##
## iris["Species"]
## Species
## 1 setosa
## 2 setosa
## 3 setosa
## 4 setosa
## 5 setosa
## 6 setosa
## [1] TRUE
##
## iris$Species
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
## [1] FALSE
We can use the column names of the data frame to access its elements.
“[[]]”, “[]” and “$” return similar results, except that “[]” returns a data frame while the other two return the column itself as a vector (here, a factor).
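The three operators can be compared side by side on a numeric column. This sketch uses a shortened copy of the allowance data frame; the variable name col is only an illustration.

```r
allowance <- data.frame(Name = c("Ali", "Sam", "Chan"),
                        Allowance = c(500, 450, 650),
                        stringsAsFactors = FALSE)

is.data.frame(allowance["Allowance"])     # TRUE, single brackets keep a data frame
is.numeric(allowance[["Allowance"]])      # TRUE, double brackets extract the vector
is.numeric(allowance$Allowance)           # TRUE, $ extracts the vector as well

# [[ ]] accepts a column name stored in a variable, $ does not
col <- "Allowance"
allowance[[col]]    # 500 450 650
allowance$col       # NULL, because there is no column literally named "col"
```

Single brackets are also the only one of the three that can select several columns at once, for example allowance[c("Name", "Allowance")].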
A data frame can be modified in several ways:
cat("\nThe original value of iris2[1,1] is:\n")
iris2[1,1]
iris2[1,1] <- -999
cat("\nThe new value of iris2[1,1] is:\n")
iris2[1,1]
##
## The original value of iris2[1,1] is:
## [1] 5.1
##
## The new value of iris2[1,1] is:
## [1] -999
head(cbind(iris,Data_Entry))
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Data_Entry ID
## 1 5.1 3.5 1.4 0.2 setosa Ali A003
## 2 4.9 3.0 1.4 0.2 setosa Ali A003
## 3 4.7 3.2 1.3 0.2 setosa Ali A003
## 4 4.6 3.1 1.5 0.2 setosa Ali A003
## 5 5.0 3.6 1.4 0.2 setosa Ali A003
## 6 5.4 3.9 1.7 0.4 setosa Ali A003
## Student
## 1 Yes
## 2 Yes
## 3 Yes
## 4 Yes
## 5 Yes
## 6 Yes
cbind(), which stands for column-bind, combines two data frames by column.
iris2<-rbind(iris
,c(Sepal.Length=-999
,Sepal.Width=999
,Petal.Length=-999
,Petal.Width=0.999
,Species="setosa")
)
tail(iris2)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 146 6.7 3 5.2 2.3 virginica
## 147 6.3 2.5 5 1.9 virginica
## 148 6.5 3 5.2 2 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3 5.1 1.8 virginica
## 151 -999 999 -999 0.999 setosa
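rbind(), which stands for row-bind, combines two data frames by row. Note that the new row above was built with c(), which coerces mixed values to character, so the numeric columns of iris2 become character columns (this is why 3.0 is printed as 3 in the output).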
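One way to keep the column types when appending a row is to pass the new row as a one-row data frame instead of a vector. The sketch below contrasts the two approaches on a shortened copy of the allowance data frame; the object names bad and good are only illustrative.

```r
allowance <- data.frame(Name = c("Ali", "Sam"),
                        Allowance = c(500, 450),
                        stringsAsFactors = FALSE)

# c() can hold one type only, so 650 is coerced to the string "650"
bad <- rbind(allowance, c(Name = "Chan", Allowance = 650))
class(bad$Allowance)    # "character"

# A one-row data frame keeps each value in its own type
good <- rbind(allowance,
              data.frame(Name = "Chan", Allowance = 650,
                         stringsAsFactors = FALSE))
class(good$Allowance)   # "numeric"
```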
s<-rep(c("train","test"),150/2)
iris2<-iris
iris2$sample<-s
head(iris2)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species sample
## 1 5.1 3.5 1.4 0.2 setosa train
## 2 4.9 3.0 1.4 0.2 setosa test
## 3 4.7 3.2 1.3 0.2 setosa train
## 4 4.6 3.1 1.5 0.2 setosa test
## 5 5.0 3.6 1.4 0.2 setosa train
## 6 5.4 3.9 1.7 0.4 setosa test
In R, we can create a new column by assigning a vector to the column name after the $ sign.
iris2$sample<-NULL
head(iris2)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
Assigning “NULL” to an existing column removes that column from the data frame.
It’s easy to combine two data frames with the same number of rows, as shown by the cbind() function.
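Adding and removing columns can be combined to derive new values from existing ones. In this sketch the Bonus and Total columns are hypothetical and exist only for illustration.

```r
allowance <- data.frame(Name = c("Ali", "Sam", "Chan"),
                        Allowance = c(500, 450, 650),
                        stringsAsFactors = FALSE)

allowance$Bonus <- allowance$Allowance / 10                       # derived column via $
allowance[["Total"]] <- allowance$Allowance + allowance$Bonus     # same idea via [[ ]]
allowance$Bonus <- NULL                                           # remove the helper column

names(allowance)   # "Name" "Allowance" "Total"
allowance$Total    # 550 495 715
```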
head(iris)
head(Data_Entry)
head(iris2<-cbind(iris,Data_Entry))
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## Data_Entry ID Student
## 1 Ali A003 Yes
## 2 Ali A003 Yes
## 3 Ali A003 Yes
## 4 Ali A003 Yes
## 5 Ali A003 Yes
## 6 Ali A003 Yes
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Data_Entry ID
## 1 5.1 3.5 1.4 0.2 setosa Ali A003
## 2 4.9 3.0 1.4 0.2 setosa Ali A003
## 3 4.7 3.2 1.3 0.2 setosa Ali A003
## 4 4.6 3.1 1.5 0.2 setosa Ali A003
## 5 5.0 3.6 1.4 0.2 setosa Ali A003
## 6 5.4 3.9 1.7 0.4 setosa Ali A003
## Student
## 1 Yes
## 2 Yes
## 3 Yes
## 4 Yes
## 5 Yes
## 6 Yes
But what if we want to merge two data frames based on a specific merging key, rather than simply placing them side by side?
head(iris2)
allowance
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Data_Entry ID
## 1 5.1 3.5 1.4 0.2 setosa Ali A003
## 2 4.9 3.0 1.4 0.2 setosa Ali A003
## 3 4.7 3.2 1.3 0.2 setosa Ali A003
## 4 4.6 3.1 1.5 0.2 setosa Ali A003
## 5 5.0 3.6 1.4 0.2 setosa Ali A003
## 6 5.4 3.9 1.7 0.4 setosa Ali A003
## Student
## 1 Yes
## 2 Yes
## 3 Yes
## 4 Yes
## 5 Yes
## 6 Yes
## Name Allowance
## 1 Ali 500
## 2 Sam 450
## 3 Chan 650
## 4 Vijay 300
## 5 Intan 495
How should we merge the “Allowance” column of the allowance data frame into iris2, using the names of the data entry staff as the merging key?
Method 1: merge()
iris3<-merge(iris2,allowance,by.x = "Data_Entry",by.y="Name")
head(iris3)
## Data_Entry Sepal.Length Sepal.Width Petal.Length Petal.Width Species ID
## 1 Ali 5.1 3.5 1.4 0.2 setosa A003
## 2 Ali 4.9 3.0 1.4 0.2 setosa A003
## 3 Ali 4.7 3.2 1.3 0.2 setosa A003
## 4 Ali 4.6 3.1 1.5 0.2 setosa A003
## 5 Ali 5.0 3.6 1.4 0.2 setosa A003
## 6 Ali 5.4 3.9 1.7 0.4 setosa A003
## Student Allowance
## 1 Yes 500
## 2 Yes 500
## 3 Yes 500
## 4 Yes 500
## 5 Yes 500
## 6 Yes 500
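By default merge() keeps only the rows whose key appears in both data frames. The sketch below shows how the all.x argument keeps unmatched rows from the first data frame; the entries data frame, its Rows column and the staff member "Lee" are hypothetical.

```r
# Hypothetical data: "Lee" has no matching row in allowance
entries <- data.frame(Data_Entry = c("Ali", "Sam", "Lee"),
                      Rows = c(29, 14, 7),
                      stringsAsFactors = FALSE)
allowance <- data.frame(Name = c("Ali", "Sam"),
                        Allowance = c(500, 450),
                        stringsAsFactors = FALSE)

# Default: only keys found in both data frames are kept
inner <- merge(entries, allowance, by.x = "Data_Entry", by.y = "Name")
nrow(inner)   # 2

# all.x = TRUE keeps every row of entries and fills the gaps with NA
left <- merge(entries, allowance, by.x = "Data_Entry", by.y = "Name",
              all.x = TRUE)
nrow(left)    # 3
left$Allowance[left$Data_Entry == "Lee"]   # NA
```

Note that merge() sorts the result by the key column by default, so the row order of the output may differ from that of the inputs.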
Method 2: left_join() from library(dplyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
iris4<-left_join(iris3,workhour,by=c("Data_Entry"="Name"))
head(iris4)
## Data_Entry Sepal.Length Sepal.Width Petal.Length Petal.Width Species ID
## 1 Ali 5.1 3.5 1.4 0.2 setosa A003
## 2 Ali 4.9 3.0 1.4 0.2 setosa A003
## 3 Ali 4.7 3.2 1.3 0.2 setosa A003
## 4 Ali 4.6 3.1 1.5 0.2 setosa A003
## 5 Ali 5.0 3.6 1.4 0.2 setosa A003
## 6 Ali 5.4 3.9 1.7 0.4 setosa A003
## Student Allowance workhour
## 1 Yes 500 100
## 2 Yes 500 100
## 3 Yes 500 100
## 4 Yes 500 100
## 5 Yes 500 100
## 6 Yes 500 100
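The table() function summarises the count of records for each combination of two variables (columns). The table below cross-tabulates the data entry staff against the flower species. Note that merge() sorts its result by the key column by default, so the rows of iris4 need not be in the same order as those of iris; taking both columns from the same data frame (for example iris4$Species) keeps the two variables aligned row by row.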
table(iris4$Data_Entry,iris$Species)
##
## setosa versicolor virginica
## Ali 29 0 0
## Chan 21 32 0
## Intan 0 18 3
## Sam 0 0 14
## Vijay 0 0 33
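A cross-tabulation is easier to verify by hand on a handful of records. The following sketch uses a hypothetical five-row data frame named records and takes both variables from the same data frame so that they stay aligned.

```r
# Hypothetical records: who entered which species
records <- data.frame(
  staff   = c("Ali", "Ali", "Chan", "Chan", "Chan"),
  species = c("setosa", "setosa", "setosa", "versicolor", "versicolor"),
  stringsAsFactors = FALSE
)

tab <- table(records$staff, records$species)
tab

tab["Chan", "versicolor"]   # 2 records entered by Chan for versicolor
rowSums(tab)                # records per staff member: Ali 2, Chan 3
```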
The hist() and plot() functions enable us to visualise data. The following are some examples of visualising the size of the flower petals:
Example 1: Petal Length
ax=c(min(iris4$Petal.Length),max(iris4$Petal.Length))
brk<-pretty(ax,n=30)
h1<-hist(iris4$Petal.Length[iris4$Species=="setosa"],plot = F,breaks=brk)
h2<-hist(iris4$Petal.Length[iris4$Species=="versicolor"],plot = F,breaks=brk)
h3<-hist(iris4$Petal.Length[iris4$Species=="virginica"],plot = F,breaks=brk)
plot(h1,col="lightblue",xlim = ax,ylim = c(0,25),main=NULL,xlab = "Petal Length by Flower Species")
plot(h2,add=T,col="#FFC0CB7F",xlim = ax)
plot(h3,add=T,col="#00FF0033",xlim = ax)
legend("topright",legend = c("setosa","versicolor","virginica")
,col=c("lightblue","#FFC0CB7F","#00FF0033"), lwd=10)
Example 2: Petal Width
ax=c(min(iris4$Petal.Width),max(iris4$Petal.Width))
brk<-pretty(ax,n=30)
h1<-hist(iris4$Petal.Width[iris4$Species=="setosa"],plot = F,breaks=brk)
h2<-hist(iris4$Petal.Width[iris4$Species=="versicolor"],plot = F,breaks=brk)
h3<-hist(iris4$Petal.Width[iris4$Species=="virginica"],plot = F,breaks=brk)
plot(h1,col="lightblue",xlim = ax,ylim = c(0,30),main=NULL,xlab = "Petal Width by Flower Species")
plot(h2,add=T,col="#FFC0CB7F",xlim = ax)
plot(h3,add=T,col="#00FF0033",xlim = ax)
legend("topright",legend = c("setosa","versicolor","virginica")
,col=c("lightblue","#FFC0CB7F","#00FF0033"), lwd=10)
Example 3: Petal Size = Petal Length X Petal Width
iris4$Petal.Size<-iris4$Petal.Length*iris4$Petal.Width
ax=c(min(iris4$Petal.Size),max(iris4$Petal.Size))
brk<-pretty(ax,n=20)
h1<-hist(iris4$Petal.Size[iris4$Species=="setosa"],plot = F,breaks=brk)
h2<-hist(iris4$Petal.Size[iris4$Species=="versicolor"],plot = F,breaks=brk)
h3<-hist(iris4$Petal.Size[iris4$Species=="virginica"],plot = F,breaks=brk)
plot(h1,col="lightblue",xlim = ax,ylim = c(0,55),main=NULL,xlab = "Petal Size by Flower Species")
plot(h2,add=T,col="#FFC0CB7F",xlim = ax)
plot(h3,add=T,col="#00FF0033",xlim = ax)
legend("topright",legend = c("setosa","versicolor","virginica")
,col=c("lightblue","#FFC0CB7F","#00FF0033"), lwd=10)
The histograms above show that the size of the petals varies among the setosa, versicolor and virginica species.