This chapter introduces the data types and data structures of R. For example, we use c() to combine numbers or strings into a vector.
A<-c("Taipei City","New Taipei City", "Taoyuan City", "Taichung City","Tainan City","Kaohsiung City")
print(A)## [1] "Taipei City" "New Taipei City" "Taoyuan City" "Taichung City"
## [5] "Tainan City" "Kaohsiung City"
B<-c(0,1,2,3,4,5,6,7,8,9)
print(B)## [1] 0 1 2 3 4 5 6 7 8 9
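As a small illustration of what c() does with mixed input, here is a minimal sketch. The object names `nums`, `strs`, and `mixed` are made up for this example. A vector holds a single type, so combining numbers and strings converts everything to character.

```r
# Hypothetical object names, used only for illustration
nums  <- c(1, 2, 3)
strs  <- c("a", "b")
mixed <- c(nums, strs)   # the numbers are coerced to character

class(nums)    # "numeric"
class(mixed)   # "character"
length(mixed)  # 5
```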
R is free software for data analysis and statistical computing. It can be downloaded from the Comprehensive R Archive Network (CRAN). It ships with a base package, and users can install additional packages from CRAN or from another repository.
Please make sure that you download the right version compatible with your system.
RStudio is an IDE that extends what you can do with R. For example, you can embed R code in an HTML or \(\LaTeX\) document. You can also import different kinds of datasets easily. Note that RStudio cannot run without R.
Numeric data can be numeric or integer. For example:
x<-c(2, 4, 6, 8); x## [1] 2 4 6 8
class(x)## [1] "numeric"
The function
We can also write numeric data in scientific notation:
y=c(1.1e+06); y## [1] 1100000
class(y)## [1] "numeric"
Integers are a subset of numbers. R stores an integer in 32 bits, so the maximum integer is 2147483647, far smaller than the maximum numeric value. Every numeric value, in particular a number with a decimal part, is stored as a sign (\(\pm\)), a mantissa, and an exponent. For example:
u<-as.integer(c(4)); class(u)## [1] "integer"
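The integer limit mentioned above can be checked directly. This is a minimal sketch: `big` and `overflow` are names chosen for this example, and `suppressWarnings()` is used only to keep the overflow warning out of the output.

```r
.Machine$integer.max                     # 2147483647
big <- .Machine$integer.max
class(big)                               # "integer"
class(big + 1)                           # "numeric": adding a double gives a double
overflow <- suppressWarnings(big + 1L)   # integer overflow gives NA (with a warning)
is.na(overflow)                          # TRUE
```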
Another example:
a=1.356e+3
is.numeric(a)## [1] TRUE
is.integer(a)## [1] FALSE
options(digits=20)
pi## [1] 3.141592653589793116
is.numeric(pi); is.integer(pi)## [1] TRUE
## [1] FALSE
Integers can be added, subtracted, multiplied, and divided like other numbers.
J<-c(3, 6, 33)
J<-as.integer(J)
J-1; J*2; J/3## [1] 2 5 32
## [1] 6 12 66
## [1] 1 2 11
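One detail the output above does not show is the class of each result. The sketch below assumes the same vector J and uses the `L` suffix, which marks an integer literal in R. Arithmetic with a plain number such as 1, or with `/`, returns numeric rather than integer.

```r
J <- as.integer(c(3, 6, 33))
class(J - 1L)   # "integer": 1L is an integer literal
class(J - 1)    # "numeric": 1 is stored as a double
class(J / 3)    # "numeric": / always returns a double
J %/% 2L        # integer division: 1 3 16
J %% 2L         # remainder: 1 0 1
```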
Characters are easy to understand, especially in data visualization. For example:
library(lattice)
s77<-data.frame(state.x77)
s77<-s77[order(s77$Population,decreasing=T),]
dotchart(s77$Population,
labels=row.names(s77), pch=16,cex=.7, xlab='Population')
Here we convert the matrix state.x77 to a data frame, sort it by population, and draw a dot chart with dotchart().
Numbers can be treated as characters:
char1<-c("1","2", "Do", "Re", "Mi"); char1## [1] "1" "2" "Do" "Re" "Mi"
Can we convert characters to numbers? Only when the strings look like numbers; anything else becomes NA. We can, however, convert characters to a factor and then convert the factor to its integer codes.
LETTERS## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
LETTERS.f <- as.factor(LETTERS)
as.numeric(LETTERS.f)## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26
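The difference between the two conversions can be seen on a short vector. This is a sketch with made-up values (`char2`, `num2`, and `codes` are hypothetical names). Note that the factor codes are positions among the sorted levels, not the numbers written in the strings.

```r
char2 <- c("10", "20", "Do")
num2  <- suppressWarnings(as.numeric(char2))  # "Do" cannot be read as a number
num2                                           # 10 20 NA
codes <- as.numeric(factor(char2))             # position of each value among the levels
codes                                          # 1 2 3
```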
Please read the councilor data. Note that we can specify the file encoding if the file is not UTF-8.
library(foreign)
df <- read.csv('councilor.csv', header=T, fileEncoding = 'BIG5',
colClasses = "character")
df## Year budget unit contracter open
## 1 2015 676 Hydraulic Engineering Office 台球 Yes
## 2 2016 673 New Construction Office 茂盛 Yes
## 3 2016 270 New Construction Office 冠君 Yes
## 4 2016 255 New Construction Office 金煌 Yes
## 5 2016 235 New Construction Office 聖鋒 Yes
## 6 2016 190 New Construction Office 福呈 No
## 7 2015 155 Park and Stree Light Office 盛吉 Yes
## 8 2016 154 New Construction Office 茂盛 Yes
## 9 2016 142 New Construction Office 冠君 Yes
## 10 2016 123 New Construction Office 未發包 Yes
as.numeric(df$unit)## [1] NA NA NA NA NA NA NA NA NA NA
To make the conversion easier, we read the columns as factors instead of characters:
library(foreign)
df <- read.csv('councilor.csv', header=T, fileEncoding = 'BIG5',
colClasses = "factor")
df## Year budget unit contracter open
## 1 2015 676 Hydraulic Engineering Office 台球 Yes
## 2 2016 673 New Construction Office 茂盛 Yes
## 3 2016 270 New Construction Office 冠君 Yes
## 4 2016 255 New Construction Office 金煌 Yes
## 5 2016 235 New Construction Office 聖鋒 Yes
## 6 2016 190 New Construction Office 福呈 No
## 7 2015 155 Park and Stree Light Office 盛吉 Yes
## 8 2016 154 New Construction Office 茂盛 Yes
## 9 2016 142 New Construction Office 冠君 Yes
## 10 2016 123 New Construction Office 未發包 Yes
as.numeric(df$unit)## [1] 1 2 2 2 2 2 3 2 2 2
s77$ok<-s77$Income>5000; s77$ok## [1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [23] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [45] TRUE TRUE FALSE FALSE FALSE TRUE
options(digits=3)
s77[s77$ok, ]## Population Income Illiteracy Life.Exp Murder HS.Grad Frost
## California 21198 5114 1.1 71.7 10.3 62.6 20
## Illinois 11197 5107 0.9 70.1 10.3 52.6 127
## New Jersey 7333 5237 1.1 70.9 5.2 52.5 115
## Maryland 4122 5299 0.9 70.2 8.5 52.3 101
## Connecticut 3100 5348 1.1 72.5 3.1 56.0 139
## North Dakota 637 5087 0.8 72.8 1.4 50.3 186
## Nevada 590 5149 0.5 69.0 11.5 65.2 188
## Alaska 365 6315 1.5 69.3 11.3 66.7 152
## Area ok
## California 156361 TRUE
## Illinois 55748 TRUE
## New Jersey 7521 TRUE
## Maryland 9891 TRUE
## Connecticut 4862 TRUE
## North Dakota 69273 TRUE
## Nevada 109889 TRUE
## Alaska 566432 TRUE
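Because TRUE counts as 1 and FALSE as 0, a logical vector can also be summed or averaged. The sketch below works on the built-in matrix state.x77 directly; `income` and `rich` are names chosen for this example.

```r
income <- state.x77[, "Income"]
rich   <- income > 5000   # logical vector, one value per state
sum(rich)                 # number of TRUE values: 8, the eight rows printed above
mean(rich)                # share of states: 0.16
names(income)[rich]       # names of the matching states
```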
Please try running the following code:
library(car)
head(Duncan)
income.hi<-Duncan$income>70
Duncan[income.hi, ]
How many “bc” observations are left in your dataset?
Sys.Date()
v<-c("2/27/2018", "6/26/2018", "12/31/2018"); class(v)
## [1] "character"
v.date<-as.Date(v, format='%m/%d/%Y'); class(v.date)## [1] "Date"
v.date## [1] "2018-02-27" "2018-06-26" "2018-12-31"
v.date <-format(v.date, "%b. %d, %Y"); v.date## [1] " 2. 27, 2018" " 6. 26, 2018" "12. 31, 2018"
Or
v<-c("", "6/26/2018", "12/31/2018")
as.Date(v, format='%m/%d/%Y')## [1] NA "2018-06-26" "2018-12-31"
S <- c(50, 100)
as.Date(S, origin="2018-01-01")## [1] "2018-02-20" "2018-04-11"
library(dplyr)
df <- tibble(date=c("2016", "2017", "2018"),
students=c(20, 22, 18),
teachers=c(12, 13, 20))
df## # A tibble: 3 x 3
## date students teachers
## <chr> <dbl> <dbl>
## 1 2016 20. 12.
## 2 2017 22. 13.
## 3 2018 18. 20.
library(reshape2); library(ggplot2)
df2 <- melt(df, id.vars="date", variable.name="Group")
df2$Group <-as.factor(df2$Group)
df2## date Group value
## 1 2016 students 20
## 2 2017 students 22
## 3 2018 students 18
## 4 2016 teachers 12
## 5 2017 teachers 13
## 6 2018 teachers 20
ggplot(df2, aes(x=date, y=value, col=Group)) +
geom_line( size=1) +
geom_point(shape=6, size=3)
library(reshape2); library(ggplot2)
df2 <- melt(df, id.vars="date", variable.name="Group")
df2$Group <-as.factor(df2$Group)
df2$date <- as.Date(df2$date, format="%Y")
df2## date Group value
## 1 2016-07-05 students 20
## 2 2017-07-05 students 22
## 3 2018-07-05 students 18
## 4 2016-07-05 teachers 12
## 5 2017-07-05 teachers 13
## 6 2018-07-05 teachers 20
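The 07-05 in the dates above is not part of the data. When the format contains only %Y, as.Date() fills in the current month and day, so the result depends on when the code is run. A minimal sketch of one way to get a fixed day instead, assuming January 1st is an acceptable placeholder (`yrs` and `d` are hypothetical names):

```r
yrs <- c("2016", "2017", "2018")
d <- as.Date(paste0(yrs, "-01-01"))  # the default format is %Y-%m-%d
d                                     # "2016-01-01" "2017-01-01" "2018-01-01"
format(d, "%Y")                       # back to "2016" "2017" "2018"
```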
ggplot(df2, aes(x=date, y=value, col=Group)) +
geom_line( size=1) +
geom_point(shape=16, size=3)
library(dplyr)
df <- tibble(date=c("2018/07/11", "2018/07/12", "2018/07/13"),
students=c(20, 22, 18),
teachers=c(2, 3, 4))
df ## # A tibble: 3 x 3
## date students teachers
## <chr> <dbl> <dbl>
## 1 2018/07/11 20. 2.
## 2 2018/07/12 22. 3.
## 3 2018/07/13 18. 4.
library(reshape2); library(ggplot2)
df2 <- melt(df, id.vars="date", variable.name="Group")
df2$Group <-as.factor(df2$Group)
df2$date <- as.Date(df2$date, format="%Y/%m/%d")
df2## date Group value
## 1 2018-07-11 students 20
## 2 2018-07-12 students 22
## 3 2018-07-13 students 18
## 4 2018-07-11 teachers 2
## 5 2018-07-12 teachers 3
## 6 2018-07-13 teachers 4
ggplot(df2, aes(x=date, y=value, col=Group)) +
geom_line( size=1) +
geom_point(shape=16, size=3) +
scale_x_date(date_labels = "%Y/%m/%d")
xi<-"1953-06-15" #Xi's birthday
tsai<-"1956-08-31" #Tsai's birthday
as.Date(c(xi,tsai))
## [1] "1953-06-15" "1956-08-31"
difftime(tsai, xi)## Time difference of 1173 days
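The same difference can be computed by subtracting Date objects. The sketch below converts the two strings with as.Date() first, which is an assumption of this example rather than what the code above does. as.numeric() then extracts the plain number of days.

```r
xi   <- as.Date("1953-06-15")
tsai <- as.Date("1956-08-31")
gap  <- tsai - xi                       # a difftime object
as.numeric(gap)                         # 1173
weeks <- as.numeric(difftime(tsai, xi, units = "weeks"))
weeks                                   # 1173 / 7
```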
| symbol | meaning | example |
|---|---|---|
| %d | day of the month as number | 01-31 |
| %a | abbreviated weekday | Mon |
| %A | unabbreviated weekday | Monday |
| %m | month as number | 01-12 |
| %b | abbreviated month | Jan |
| %B | unabbreviated month | January |
| %y | 2-digit year | 18 |
| %Y | 4-digit year | 2018 |
Today<-Sys.Date(); Today## [1] "2018-07-05"
to_day<-format(Today, format='%Y-%b-%d'); to_day## [1] "2018- 7-05"
to_day<-format(Today, format='%m %d (%a), %y'); to_day## [1] "07 05 (四), 18"
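The numeric symbols in the table give the same result on every machine, while the weekday and month names follow the system locale, which is why the output above shows a Chinese weekday. A short sketch with a fixed date (`d` is a hypothetical name) that uses only the numeric symbols:

```r
d <- as.Date("2018-07-05")
format(d, "%Y/%m/%d")   # "2018/07/05"
format(d, "%d.%m.%y")   # "05.07.18"
format(d, "%Y")         # "2018"
# %a, %A, %b and %B depend on the locale, so their output may differ by machine
```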
example<-c(0,1,2,3,4)
print(example)## [1] 0 1 2 3 4
c(2,4,6,8)->A
c(letters)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
c(LETTERS)## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
j<-c(2*2,2*9, 10-2, sqrt(25)); j## [1] 4 18 8 5
R<-c(30, 40, 50); A <- R/5; A; A^2 ## [1] 6 8 10
## [1] 36 64 100
attr(mtcars$wt, "myattribute") <-"Weights"
attr(mtcars$wt, "mylabel") <-"Weights of Cars"
attr(mtcars$wt, "myattribute")## [1] "Weights"
attr(mtcars$wt, "mylabel")## [1] "Weights of Cars"
attributes(mtcars$wt)## $myattribute
## [1] "Weights"
##
## $mylabel
## [1] "Weights of Cars"
U <- c(60, 70, 75, 82)
attr(U, "myattribute")<-"humanity"
names(U) <-c("Taichung", "Chiayi", "Tainan", "Kaohsiung")
U## Taichung Chiayi Tainan Kaohsiung
## 60 70 75 82
## attr(,"myattribute")
## [1] "humanity"
barchart(U)
RA <- c(R, A)
RA## [1] 30 40 50 6 8 10
M1 <- c(1:3); M1## [1] 1 2 3
M2 <- rep(10, 4); M2## [1] 10 10 10 10
M1 + M2## [1] 11 12 13 11
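The last result has four elements because R recycles the shorter vector: the values of M1 are reused from the start until the longer vector is covered. R also warns here, since 4 is not a multiple of 3. A minimal sketch of the same idea (`res` and `res2` are names chosen for this example, and `suppressWarnings()` only hides the warning):

```r
M1 <- 1:3
M2 <- rep(10, 4)
res <- suppressWarnings(M1 + M2)   # M1 is recycled as 1 2 3 1
res                                 # 11 12 13 11
res2 <- 1:2 + rep(10, 4)            # lengths 2 and 4: recycled without a warning
res2                                # 11 12 11 12
```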
Data can have categories, which look like characters.
library(ggplot2)
class(diamonds$cut)## [1] "ordered" "factor"
table(diamonds$cut)##
## Fair Good Very Good Premium Ideal
## 1610 4906 12082 13791 21551
H<-c("Hi", "Lo", "Lo", "Middle", "Middle", "Middle")
table(H)## H
## Hi Lo Middle
## 1 2 3
H.o <- ordered(H, levels=c("Lo", "Middle", "Hi"))
table(H.o)
## H.o
## Lo Middle Hi
## 2 3 1
table(diamonds$cut, diamonds$color)##
## D E F G H I J
## Fair 163 224 312 314 303 175 119
## Good 662 933 909 871 702 522 307
## Very Good 1513 2400 2164 2299 1824 1204 678
## Premium 1603 2337 2331 2924 2360 1428 808
## Ideal 2834 3903 3826 4884 3115 2093 896
library(car)
library(lattice)
plot( Chile$vote ~ Chile$sex, xlab="Sex", ylab="Vote")
library(car)
data(Chile)
class(Chile$region)## [1] "factor"
table(Chile$region)##
## C M N S SA
## 600 100 322 718 960
y<-as.numeric(Chile$region)
table(y)## y
## 1 2 3 4 5
## 600 100 322 718 960
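The numbers returned by as.numeric() are positions in the factor's list of levels, so they can be mapped back to the labels. The sketch below uses a small made-up factor (`region` and its values are hypothetical) instead of the Chile data.

```r
region <- factor(c("N", "S", "C", "S"))   # hypothetical values
levels(region)                 # "C" "N" "S": levels are sorted alphabetically
codes <- as.integer(region)    # position of each value among the levels
codes                          # 2 3 1 3
levels(region)[codes]          # back to "N" "S" "C" "S"
```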
Chile$gender[Chile$sex=="F"]<-1
Chile$gender[Chile$sex=="M"]<-2
table(Chile$gender)##
## 1 2
## 1379 1321
table(Chile$sex)##
## F M
## 1379 1321
m<-matrix(c(1:9), nrow=3, ncol=3); m## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
n<-matrix(c(1:6), nrow=3, ncol=2); n## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
m%*%n## [,1] [,2]
## [1,] 30 66
## [2,] 36 81
## [3,] 42 96
diag(m)## [1] 1 5 9
diag(n)## [1] 1 5
t(m)## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
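The `%*%` operator is matrix multiplication, which differs from the element-wise `*`. Multiplying a 3 x 3 matrix by a 3 x 2 matrix gives a 3 x 2 matrix, and each entry is the sum of products of a row of the first with a column of the second. A short sketch with the same m and n (`p` and `sq` are names chosen for this example):

```r
m <- matrix(1:9, nrow = 3, ncol = 3)
n <- matrix(1:6, nrow = 3, ncol = 2)
p <- m %*% n
dim(p)                            # 3 2
p[1, 1] == sum(m[1, ] * n[, 1])   # TRUE: 1*1 + 4*2 + 7*3 = 30
sq <- m * m                       # element-wise product, same shape as m
sq[2, 2]                          # 25
```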
We can replace elements of a matrix with other numbers or strings by specifying the row and column indices:
m## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
m1<-m
m1[2,3]<-0
m1[3,]<-99
m1## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 0
## [3,] 99 99 99
a <- matrix(c(1:20), nrow=2, ncol=10)
b <- matrix(c(1:20), nrow=5, ncol=4)
a; b## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 3 5 7 9 11 13 15 17 19
## [2,] 2 4 6 8 10 12 14 16 18 20
## [,1] [,2] [,3] [,4]
## [1,] 1 6 11 16
## [2,] 2 7 12 17
## [3,] 3 8 13 18
## [4,] 4 9 14 19
## [5,] 5 10 15 20
options(digits=4)
R1<-c(170, 175, 166, 172, 165, 157, 167, 167,
156, 160)
R2<-c("F","M","M","M","F","F","F","F","M","F")
R3<-R1/10 + 42
R123<-data.frame(height=R1,gender=R2,weight=R3); R123## height gender weight
## 1 170 F 59.0
## 2 175 M 59.5
## 3 166 M 58.6
## 4 172 M 59.2
## 5 165 F 58.5
## 6 157 F 57.7
## 7 167 F 58.7
## 8 167 F 58.7
## 9 156 M 57.6
## 10 160 F 58.0
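By default, data.frame() in R versions before 4.0.0 converts character vectors to factors. Set stringsAsFactors = FALSE to tell R not to convert them.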
class(R123$gender)## [1] "factor"
R123<-data.frame(height=R1,gender=R2,weight=R3, stringsAsFactors = F)
str(R123)## 'data.frame': 10 obs. of 3 variables:
## $ height: num 170 175 166 172 165 157 167 167 156 160
## $ gender: chr "F" "M" "M" "M" ...
## $ weight: num 59 59.5 58.6 59.2 58.5 57.7 58.7 58.7 57.6 58
library(car)
nrow(AMSsurvey)## [1] 24
newsurvey <- AMSsurvey
colnames(newsurvey)<-c("v1","v2","v3", "v4", "v5"); head(newsurvey)## v1 v2 v3 v4 v5
## 1 I(Pu) Male US 132 148
## 2 I(Pu) Female US 35 40
## 3 I(Pr) Male US 87 63
## 4 I(Pr) Female US 20 22
## 5 II Male US 96 161
## 6 II Female US 47 53
Array1 <- array(1:12, dim = c(2, 6, 1)); Array1## , , 1
##
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 3 5 7 9 11
## [2,] 2 4 6 8 10 12
Array2 <- array(16:1, dim = c(2, 4, 2)); Array2## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 16 14 12 10
## [2,] 15 13 11 9
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 8 6 4 2
## [2,] 7 5 3 1
A12<-Array2[,,2]; A12## [,1] [,2] [,3] [,4]
## [1,] 8 6 4 2
## [2,] 7 5 3 1
A111 <- Array2[1, 1, 1]; A111## [1] 16
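An array is indexed as [row, column, layer], and leaving an index empty keeps that whole dimension. The following sketch repeats Array2 and tries a few more subsets than the ones shown above.

```r
Array2 <- array(16:1, dim = c(2, 4, 2))
dim(Array2)          # 2 4 2
Array2[2, 3, 1]      # row 2, column 3, layer 1: 11
dim(Array2[, , 1])   # a whole layer is a 2 x 4 matrix
Array2[1, , 2]       # first row of layer 2: 8 6 4 2
```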
options(digits=4)
birthday <- c("1981/03/15", "1983/04/20", "1984/01/18")
listA<-list(R123, m, birthday); listA## [[1]]
## height gender weight
## 1 170 F 59.0
## 2 175 M 59.5
## 3 166 M 58.6
## 4 172 M 59.2
## 5 165 F 58.5
## 6 157 F 57.7
## 7 167 F 58.7
## 8 167 F 58.7
## 9 156 M 57.6
## 10 160 F 58.0
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
##
## [[3]]
## [1] "1981/03/15" "1983/04/20" "1984/01/18"
listA[[3]]## [1] "1981/03/15" "1983/04/20" "1984/01/18"
options(digits=4)
listB<-list(data=R123, vec=m, char=birthday);
listB[["data"]]## height gender weight
## 1 170 F 59.0
## 2 175 M 59.5
## 3 166 M 58.6
## 4 172 M 59.2
## 5 165 F 58.5
## 6 157 F 57.7
## 7 167 F 58.7
## 8 167 F 58.7
## 9 156 M 57.6
## 10 160 F 58.0
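Double brackets and `$` return the element itself, while single brackets return a shorter list. A minimal sketch with a small made-up list (`lst` and its element names are hypothetical):

```r
lst <- list(num = 1:3, txt = c("x", "y"))
class(lst["num"])                  # "list": single brackets keep the list wrapper
class(lst[["num"]])                # "integer": double brackets extract the element
identical(lst$num, lst[["num"]])   # TRUE: $ is shorthand for [[ ]] with a name
names(lst)                         # "num" "txt"
```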
Please combine `c('a','b','c')`, `c(1,2,3,4)`, and `c('2018-01-01', '2018-04-04', '2018-04-05', '2018-06-18', '2018-10-10')` into a list.
class(Titanic); Titanic
## [1] "table"
## , , Age = Child, Survived = No
##
## Sex
## Class Male Female
## 1st 0 0
## 2nd 0 0
## 3rd 35 17
## Crew 0 0
##
## , , Age = Adult, Survived = No
##
## Sex
## Class Male Female
## 1st 118 4
## 2nd 154 13
## 3rd 387 89
## Crew 670 3
##
## , , Age = Child, Survived = Yes
##
## Sex
## Class Male Female
## 1st 5 1
## 2nd 11 13
## 3rd 13 14
## Crew 0 0
##
## , , Age = Adult, Survived = Yes
##
## Sex
## Class Male Female
## 1st 57 140
## 2nd 14 80
## 3rd 75 76
## Crew 192 20
Titanic[, , 1, 1]## Sex
## Class Male Female
## 1st 0 0
## 2nd 0 0
## 3rd 35 17
## Crew 0 0
options(digits=4)
g<-Titanic[ , , 2, 2]; g## Sex
## Class Male Female
## 1st 57 140
## 2nd 14 80
## 3rd 75 76
## Crew 192 20
prop.table(g, 1)## Sex
## Class Male Female
## 1st 0.28934 0.71066
## 2nd 0.14894 0.85106
## 3rd 0.49669 0.50331
## Crew 0.90566 0.09434
prop.table(g, 2)## Sex
## Class Male Female
## 1st 0.16864 0.44304
## 2nd 0.04142 0.25316
## 3rd 0.22189 0.24051
## Crew 0.56805 0.06329
margin.table(g,1)## Class
## 1st 2nd 3rd Crew
## 197 94 151 212
margin.table(g,2)## Sex
## Male Female
## 338 316
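The second argument of prop.table() and margin.table() selects the dimension: 1 works by row (Class) and 2 by column (Sex). One way to check this, sketched below, is that row proportions sum to 1 across each row and column proportions sum to 1 down each column. `row_share` and `col_share` are names chosen for this example.

```r
g <- Titanic[, , 2, 2]           # adult survivors, Class by Sex
row_share <- prop.table(g, 1)
rowSums(row_share)               # every class sums to 1
col_share <- prop.table(g, 2)
colSums(col_share)               # every sex sums to 1
sum(margin.table(g, 1))          # 654, the same as sum(g)
```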
X<-c(10,20,30,40,50,60); Sca<-10
X+Sca## [1] 20 30 40 50 60 70
X/Sca## [1] 1 2 3 4 5 6
Y<-c(5,10,6,8,25,6)
X/Y; X*Y## [1] 2 2 5 5 2 10
## [1] 50 200 180 320 1250 360
Please see more details here
a=6
exp(a); log(a)## [1] 403.4
## [1] 1.792
log(exp(a)); exp(log(a))## [1] 6
## [1] 6
1*2*3## [1] 6
factorial(3)## [1] 6
Please see more details here
round: rounds the values in its first argument to the specified number of decimal places.
floor: takes a single numeric argument x and returns a numeric vector containing the largest integers not greater than the corresponding elements of x.
ceiling: takes a single numeric argument x and returns a numeric vector containing the smallest integers not less than the corresponding elements of x.
a1<-c(2.54, 3.111, 10.999)
round(a1, digits=2)## [1] 2.54 3.11 11.00
floor(a1)## [1] 2 3 10
ceiling(a1)## [1] 3 4 11
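The definitions matter most for negative numbers, where floor() moves away from zero and ceiling() moves toward it. The sketch below uses a made-up vector `a2` and adds trunc(), a base R function not covered above, which simply drops the decimal part.

```r
a2 <- c(-2.54, 2.54)
floor(a2)               # -3  2
ceiling(a2)             # -2  3
trunc(a2)               # -2  2
round(a2, digits = 1)   # -2.5  2.5
```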
\(\text{log}(\frac{14}{5})=\)?
\(1\times 2\times 3\times \dots \times 8=\)?
Using the Titanic data, please show which class the surviving children were travelling in.
Please use
How many observations in mtcars have mpg greater than or equal to 21?
Please analyze if admission is related to gender in A department in
Please count the number of letters in the English alphabet.
Please transform today’s temperature to Fahrenheit.
Please create a matrix with the diagonal (1,1,1).
Please count the number of days between the day you do this assignment and January 1st.