We will introduce the data types and data structures of R. For example, we use c() to combine a set of numbers or strings into a vector.
A<-c("Taipei City","New Taipei City", "Taoyuan City", "Taichung City","Tainan City","Kaohsiung City")
print(A)
## [1] "Taipei City" "New Taipei City" "Taoyuan City" "Taichung City"
## [5] "Tainan City" "Kaohsiung City"
B<-c(0,1,2,3,4,5,6,7,8,9)
print(B)
## [1] 0 1 2 3 4 5 6 7 8 9
R is free software for data analysis and statistical computing. It can be downloaded from the Comprehensive R Archive Network (CRAN). It has a base package and allows users to install additional packages from CRAN or another repository.
Please make sure you download the version that is compatible with your system.
RStudio is an IDE that extends what you can do with R. For example, you can embed R code in an HTML or \(\LaTeX\) document. You can also import different kinds of datasets easily. Note that it cannot run without R.
Numeric data can be stored as numeric (double) or integer. For example:
x<-c(2, 4, 6, 8); x
## [1] 2 4 6 8
class(x)
## [1] "numeric"
The function class() reports the class of an object.
We can also use scientific notation to write numeric data:
y=c(1.1e+06); y
## [1] 1100000
class(y)
## [1] "numeric"
Integers are a subset of numbers. R stores an integer in 32 bits, so the largest integer is 2147483647, far smaller than the largest numeric (double) value. A numeric value is stored as a sign (\(\pm\)), a mantissa, and an exponent, which is how numbers with a decimal part are represented. For example:
u<-as.integer(c(4)); class(u)
## [1] "integer"
Another example:
a=1.356e+3
is.numeric(a)
## [1] TRUE
is.integer(a)
## [1] FALSE
options(digits=20)
pi
## [1] 3.141592653589793116
is.numeric(pi); is.integer(pi)
## [1] TRUE
## [1] FALSE
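As a quick check of the limits described above, the following sketch inspects the largest integer R can store and shows what happens when it is exceeded. It uses only base R; the variable names are illustrative.

```r
# Largest value that R's 32-bit integer type can hold
big <- .Machine$integer.max
big                      # 2147483647
class(big)               # "integer"

# Adding 1L overflows: R returns NA and issues a warning
overflow <- suppressWarnings(big + 1L)
is.na(overflow)          # TRUE

# Adding a numeric 1 promotes the result to numeric (double), which has room
class(big + 1)           # "numeric"
big + 1 > big            # TRUE
```

The `L` suffix marks a literal as an integer; without it, R treats a typed number as numeric.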
Integers can be added, subtracted, multiplied, and divided like other numbers.
J<-c(3, 6, 33)
J<-as.integer(J)
J-1; J*2; J/3
## [1] 2 5 32
## [1] 6 12 66
## [1] 1 2 11
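The division above returns numeric values even though J is an integer vector. A minimal sketch of which operators keep the integer class (J2 is an illustrative name):

```r
J2 <- as.integer(c(3, 6, 33))

class(J2 - 1L)    # "integer": integer minus integer stays integer
class(J2 - 1)     # "numeric": 1 is a double, so the result is promoted
class(J2 / 3L)    # "numeric": `/` always returns a double
class(J2 %/% 3L)  # "integer": integer division keeps the integer class
J2 %% 4L          # remainders: 3 2 1
```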
Characters are easy to understand, especially in data visualization. For example:
library(lattice)
s77<-data.frame(state.x77)
s77<-s77[order(s77$Population,decreasing=T),]
dotchart(s77$Population,
labels=row.names(s77), pch=16,cex=.7, xlab='Population')
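Here we convert the matrix state.x77 to a data frame, sort it by population, and draw a dot chart with dotchart(). Note that dotchart() belongs to base graphics; the lattice package loaded above offers the comparable dotplot().
Numbers can be treated as characters: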
char1<-c("1","2", "Do", "Re", "Mi"); char1
## [1] "1" "2" "Do" "Re" "Mi"
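Can we convert characters to numbers? Not in general: as.numeric() returns NA for strings that do not look like numbers. But we can convert a character vector to a factor, and then convert the factor to its integer codes.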
LETTERS
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
LETTERS.f <- as.factor(LETTERS)
as.numeric(LETTERS.f)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26
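The sketch below illustrates both routes: direct conversion of strings, and conversion through a factor. It also shows a common pitfall, namely that the numbers obtained from a factor are level codes rather than the original values. The vectors are illustrative.

```r
char1 <- c("1", "2", "Do", "Re", "Mi")

# Strings that look like numbers convert; the rest become NA (with a warning)
num1 <- suppressWarnings(as.numeric(char1))
num1                               # 1 2 NA NA NA

# A factor stores integer codes, one per level (levels are sorted by default)
f <- factor(c("20", "10", "20"))
levels(f)                          # "10" "20"
as.numeric(f)                      # 2 1 2  (the codes, not the values)

# To recover the original numbers, go through character first
as.numeric(as.character(f))        # 20 10 20
```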
Please read the councilor data. Note that we can specify the file encoding if the data are not in UTF-8.
library(foreign)
df <- read.csv('councilor.csv', header=T, fileEncoding = 'BIG5',
colClasses = "character")
df
## Year budget unit contracter open
## 1 2015 676 Hydraulic Engineering Office 台球 Yes
## 2 2016 673 New Construction Office 茂盛 Yes
## 3 2016 270 New Construction Office 冠君 Yes
## 4 2016 255 New Construction Office 金煌 Yes
## 5 2016 235 New Construction Office 聖鋒 Yes
## 6 2016 190 New Construction Office 福呈 No
## 7 2015 155 Park and Stree Light Office 盛吉 Yes
## 8 2016 154 New Construction Office 茂盛 Yes
## 9 2016 142 New Construction Office 冠君 Yes
## 10 2016 123 New Construction Office 未發包 Yes
as.numeric(df$unit)
## [1] NA NA NA NA NA NA NA NA NA NA
To make the conversion easier, we read the columns as factors by changing the colClasses argument:
library(foreign)
df <- read.csv('councilor.csv', header=T, fileEncoding = 'BIG5',
colClasses = "factor")
df
## Year budget unit contracter open
## 1 2015 676 Hydraulic Engineering Office 台球 Yes
## 2 2016 673 New Construction Office 茂盛 Yes
## 3 2016 270 New Construction Office 冠君 Yes
## 4 2016 255 New Construction Office 金煌 Yes
## 5 2016 235 New Construction Office 聖鋒 Yes
## 6 2016 190 New Construction Office 福呈 No
## 7 2015 155 Park and Stree Light Office 盛吉 Yes
## 8 2016 154 New Construction Office 茂盛 Yes
## 9 2016 142 New Construction Office 冠君 Yes
## 10 2016 123 New Construction Office 未發包 Yes
as.numeric(df$unit)
## [1] 1 2 2 2 2 2 3 2 2 2
s77$ok<-s77$Income>5000; s77$ok
## [1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [23] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [45] TRUE TRUE FALSE FALSE FALSE TRUE
options(digits=3)
s77[s77$ok, ]
## Population Income Illiteracy Life.Exp Murder HS.Grad Frost
## California 21198 5114 1.1 71.7 10.3 62.6 20
## Illinois 11197 5107 0.9 70.1 10.3 52.6 127
## New Jersey 7333 5237 1.1 70.9 5.2 52.5 115
## Maryland 4122 5299 0.9 70.2 8.5 52.3 101
## Connecticut 3100 5348 1.1 72.5 3.1 56.0 139
## North Dakota 637 5087 0.8 72.8 1.4 50.3 186
## Nevada 590 5149 0.5 69.0 11.5 65.2 188
## Alaska 365 6315 1.5 69.3 11.3 66.7 152
## Area ok
## California 156361 TRUE
## Illinois 55748 TRUE
## New Jersey 7521 TRUE
## Maryland 9891 TRUE
## Connecticut 4862 TRUE
## North Dakota 69273 TRUE
## Nevada 109889 TRUE
## Alaska 566432 TRUE
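Because TRUE counts as 1 and FALSE as 0, a logical vector can also be summed or passed to which(). A small sketch using the same built-in state.x77 data (the names income and rich are illustrative):

```r
income <- state.x77[, "Income"]
rich <- income > 5000        # logical vector, one element per state

sum(rich)                    # number of states above the threshold
mean(rich)                   # share of states above the threshold
names(income)[which(rich)]   # names of those states
```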
Please try to run the following code:
library(car)
head(Duncan)
income.hi<-Duncan$income>70
Duncan[income.hi, ]
How many rows of type “bc” are left in the filtered dataset?
Sys.Date()
v<-c("2/27/2018", "6/26/2018", "12/31/2018"); class(v)
## [1] "character"
v.date<-as.Date(v, format='%m/%d/%Y'); class(v.date)
## [1] "Date"
v.date
## [1] "2018-02-27" "2018-06-26" "2018-12-31"
v.date <-format(v.date, "%b. %d, %Y"); v.date
## [1] " 2. 27, 2018" " 6. 26, 2018" "12. 31, 2018"
Or
v<-c("", "6/26/2018", "12/31/2018")
as.Date(v, format='%m/%d/%Y')
## [1] NA "2018-06-26" "2018-12-31"
S <- c(50, 100)
as.Date(S, origin="2018-01-01")
## [1] "2018-02-20" "2018-04-11"
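Internally, a Date is the number of days since an origin, which is why a number plus an origin yields a date. A minimal sketch in base R:

```r
d <- as.Date("2018-01-01")

# Dates are stored as days since 1970-01-01
as.numeric(as.Date("1970-01-10"))   # 9

# Adding a number to a Date moves it forward by that many days
d + 50                              # "2018-02-20"
d + 100                             # "2018-04-11"

# seq() can generate regular date sequences
seq(d, by = "month", length.out = 3)
```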
library(dplyr)
df <- tibble(date=c("2016", "2017", "2018"),
students=c(20, 22, 18),
teachers=c(12, 13, 20))
df
## # A tibble: 3 x 3
## date students teachers
## <chr> <dbl> <dbl>
## 1 2016 20. 12.
## 2 2017 22. 13.
## 3 2018 18. 20.
library(reshape2); library(ggplot2)
df2 <- melt(df, id.vars="date", variable.name="Group")
df2$Group <-as.factor(df2$Group)
df2
## date Group value
## 1 2016 students 20
## 2 2017 students 22
## 3 2018 students 18
## 4 2016 teachers 12
## 5 2017 teachers 13
## 6 2018 teachers 20
ggplot(df2, aes(x=date, y=value, col=Group)) +
geom_line( size=1) +
geom_point(shape=6, size=3)
library(reshape2); library(ggplot2)
df2 <- melt(df, id.vars="date", variable.name="Group")
df2$Group <-as.factor(df2$Group)
df2$date <- as.Date(df2$date, format="%Y")
df2
## date Group value
## 1 2016-07-05 students 20
## 2 2017-07-05 students 22
## 3 2018-07-05 students 18
## 4 2016-07-05 teachers 12
## 5 2017-07-05 teachers 13
## 6 2018-07-05 teachers 20
ggplot(df2, aes(x=date, y=value, col=Group)) +
geom_line( size=1) +
geom_point(shape=16, size=3)
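In the output above, every date carries the month and day on which the code was run, because as.Date() fills in the missing parts of a year-only string from the current date. If a fixed reference day is preferred, one option (an assumption about what the plot should show, not part of the original example) is to append a month and day before converting:

```r
years <- c("2016", "2017", "2018")

# Anchor each year at January 1st so the result does not depend on today's date
dates <- as.Date(paste0(years, "-01-01"))
dates                    # "2016-01-01" "2017-01-01" "2018-01-01"
format(dates, "%Y")      # back to "2016" "2017" "2018" for labelling
```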
library(dplyr)
df <- tibble(date=c("2018/07/11", "2018/07/12", "2018/07/13"),
students=c(20, 22, 18),
teachers=c(2, 3, 4))
df
## # A tibble: 3 x 3
## date students teachers
## <chr> <dbl> <dbl>
## 1 2018/07/11 20. 2.
## 2 2018/07/12 22. 3.
## 3 2018/07/13 18. 4.
library(reshape2); library(ggplot2)
df2 <- melt(df, id.vars="date", variable.name="Group")
df2$Group <-as.factor(df2$Group)
df2$date <- as.Date(df2$date, format="%Y/%m/%d")
df2
## date Group value
## 1 2018-07-11 students 20
## 2 2018-07-12 students 22
## 3 2018-07-13 students 18
## 4 2018-07-11 teachers 2
## 5 2018-07-12 teachers 3
## 6 2018-07-13 teachers 4
ggplot(df2, aes(x=date, y=value, col=Group)) +
geom_line( size=1) +
geom_point(shape=16, size=3) +
scale_x_date(date_labels = "%Y/%m/%d")
xi<-"1953-06-15" #Xi's birthday
tsai<-"1956-08-31" #Tsai's birthday
as.Date(c(xi,tsai))
## [1] "1953-06-15" "1956-08-31"
difftime(tsai, xi)
## Time difference of 1173 days
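Subtracting two Date objects gives the same result as difftime(), and the units argument changes the scale. A brief sketch with the same two dates:

```r
xi   <- as.Date("1953-06-15")
tsai <- as.Date("1956-08-31")

gap <- tsai - xi                          # a difftime object, in days
as.numeric(gap)                           # 1173

# difftime() can report the same gap in other units, such as weeks
as.numeric(difftime(tsai, xi, units = "weeks"))   # 1173 / 7
```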
symbol | meaning | example |
---|---|---|
%d | day of the month as number | 01-31 |
%a | abbreviated weekday | Mon |
%A | unabbreviated weekday | Monday |
%m | month as number | 01-12 |
%b | abbreviated month | Jan |
%B | unabbreviated month | January |
%y | 2-digit year | 18 |
%Y | 4-digit year | 2018 |
Today<-Sys.Date(); Today
## [1] "2018-07-05"
to_day<-format(Today, format='%Y-%b-%d'); to_day
## [1] "2018- 7-05"
to_day<-format(Today, format='%m %d (%a), %y'); to_day
## [1] "07 05 (四), 18"
example<-c(0,1,2,3,4)
print(example)
## [1] 0 1 2 3 4
c(2,4,6,8)->A
c(letters)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
c(LETTERS)
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
j<-c(2*2,2*9, 10-2, sqrt(25)); j
## [1] 4 18 8 5
R<-c(30, 40, 50); A <- R/5; A; A^2
## [1] 6 8 10
## [1] 36 64 100
attr(mtcars$wt, "myattribute") <-"Weights"
attr(mtcars$wt, "mylabel") <-"Weights of Cars"
attr(mtcars$wt, "myattribute")
## [1] "Weights"
attr(mtcars$wt, "mylabel")
## [1] "Weights of Cars"
attributes(mtcars$wt)
## $myattribute
## [1] "Weights"
##
## $mylabel
## [1] "Weights of Cars"
U <- c(60, 70, 75, 82)
attr(U, "myattribute")<-"humanity"
names(U) <-c("Taichung", "Chiayi", "Tainan", "Kaohsiung")
U
## Taichung Chiayi Tainan Kaohsiung
## 60 70 75 82
## attr(,"myattribute")
## [1] "humanity"
barchart(U)
RA <- c(R, A)
RA
## [1] 30 40 50 6 8 10
M1 <- c(1:3); M1
## [1] 1 2 3
M2 <- rep(10, 4); M2
## [1] 10 10 10 10
M1 + M2
## [1] 11 12 13 11
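The last result comes from R's recycling rule: the shorter vector is repeated until it matches the longer one, and R warns when the lengths are not multiples of each other. A minimal sketch:

```r
M1 <- 1:3
M2 <- rep(10, 4)

# Lengths 3 and 4: M1 is recycled as 1 2 3 1, with a warning
suppressWarnings(M1 + M2)     # 11 12 13 11

# Lengths 2 and 4: clean recycling, no warning
c(1, 2) + c(10, 20, 30, 40)   # 11 22 31 42
```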
Data can be categorical. R stores categories as factors, which look like characters when printed.
library(ggplot2)
class(diamonds$cut)
## [1] "ordered" "factor"
table(diamonds$cut)
##
## Fair Good Very Good Premium Ideal
## 1610 4906 12082 13791 21551
H<-c("Hi", "Lo", "Lo", "Middle", "Middle", "Middle")
table(H)
## H
## Hi Lo Middle
## 1 2 3
H.o <- ordered(H, levels=c("Lo", "Middle", "Hi"))
table(H.o)
## H.o
## Lo Middle Hi
## 2 3 1
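Level names given to ordered() must match the data exactly, including upper and lower case; values without a matching level become NA and are dropped from the table. A short sketch (H2 is an illustrative vector):

```r
H2 <- c("Hi", "Lo", "Lo", "Middle", "Middle", "Middle")

# Levels spelled exactly as in the data
ok <- ordered(H2, levels = c("Lo", "Middle", "Hi"))
table(ok)              # Lo 2, Middle 3, Hi 1
ok[1] > ok[2]          # TRUE: "Hi" ranks above "Lo"

# A level with different case matches nothing
bad <- ordered(H2, levels = c("Lo", "middle", "Hi"))
sum(is.na(bad))        # 3 values were turned into NA
```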
table(diamonds$cut, diamonds$color)
##
## D E F G H I J
## Fair 163 224 312 314 303 175 119
## Good 662 933 909 871 702 522 307
## Very Good 1513 2400 2164 2299 1824 1204 678
## Premium 1603 2337 2331 2924 2360 1428 808
## Ideal 2834 3903 3826 4884 3115 2093 896
library(car)
library(lattice)
plot( Chile$vote ~ Chile$sex, xlab="Sex", ylab="Vote")
library(car)
data(Chile)
class(Chile$region)
## [1] "factor"
table(Chile$region)
##
## C M N S SA
## 600 100 322 718 960
y<-as.numeric(Chile$region)
table(y)
## y
## 1 2 3 4 5
## 600 100 322 718 960
Chile$gender[Chile$sex=="F"]<-1
Chile$gender[Chile$sex=="M"]<-2
table(Chile$gender)
##
## 1 2
## 1379 1321
table(Chile$sex)
##
## F M
## 1379 1321
m<-matrix(c(1:9), nrow=3, ncol=3); m
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
n<-matrix(c(1:6), nrow=3, ncol=2); n
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
m%*%n
## [,1] [,2]
## [1,] 30 66
## [2,] 36 81
## [3,] 42 96
diag(m)
## [1] 1 5 9
diag(n)
## [1] 1 5
t(m)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
We can replace elements of a matrix with other values by specifying the row and column indices (assigning a string converts the whole matrix to character):
m
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
m1<-m
m1[2,3]<-0
m1[3,]<-99
m1
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 0
## [3,] 99 99 99
a <- matrix(c(1:20), nrow=2, ncol=10)
b <- matrix(c(1:20), nrow=5, ncol=4)
a; b
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 3 5 7 9 11 13 15 17 19
## [2,] 2 4 6 8 10 12 14 16 18 20
## [,1] [,2] [,3] [,4]
## [1,] 1 6 11 16
## [2,] 2 7 12 17
## [3,] 3 8 13 18
## [4,] 4 9 14 19
## [5,] 5 10 15 20
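matrix() fills its cells column by column, as the two outputs above show. A sketch of the byrow argument, which fills row by row instead:

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                 # default: column-wise
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # row-wise

by_col[1, ]    # 1 3 5
by_row[1, ]    # 1 2 3
dim(by_row)    # 2 3

# Transposing the row-wise matrix gives a column-wise 3 x 2 matrix
identical(t(by_row), matrix(1:6, nrow = 3, ncol = 2))   # TRUE
```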
options(digits=4)
R1<-c(170, 175, 166, 172, 165, 157, 167, 167,
156, 160)
R2<-c("F","M","M","M","F","F","F","F","M","F")
R3<-R1/10 + 42
R123<-data.frame(height=R1,gender=R2,weight=R3); R123
## height gender weight
## 1 170 F 59.0
## 2 175 M 59.5
## 3 166 M 58.6
## 4 172 M 59.2
## 5 165 F 58.5
## 6 157 F 57.7
## 7 167 F 58.7
## 8 167 F 58.7
## 9 156 M 57.6
## 10 160 F 58.0
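In R versions before 4.0.0, data.frame() converts character vectors to factors by default. We can set stringsAsFactors = F to ask R not to convert them.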
class(R123$gender)
## [1] "factor"
R123<-data.frame(height=R1,gender=R2,weight=R3, stringsAsFactors = F)
str(R123)
## 'data.frame': 10 obs. of 3 variables:
## $ height: num 170 175 166 172 165 157 167 167 156 160
## $ gender: chr "F" "M" "M" "M" ...
## $ weight: num 59 59.5 58.6 59.2 58.5 57.7 58.7 58.7 57.6 58
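To get the same column class regardless of the R version in use, the stringsAsFactors argument can be set explicitly in either direction. A minimal sketch (the data frame names are illustrative):

```r
g <- c("F", "M", "M")

as_factor <- data.frame(gender = g, stringsAsFactors = TRUE)
as_char   <- data.frame(gender = g, stringsAsFactors = FALSE)

class(as_factor$gender)   # "factor"
class(as_char$gender)     # "character"
levels(as_factor$gender)  # "F" "M"
```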
library(car)
nrow(AMSsurvey)
## [1] 24
newsurvey <- AMSsurvey
colnames(newsurvey)<-c("v1","v2","v3", "v4", "v5"); head(newsurvey)
## v1 v2 v3 v4 v5
## 1 I(Pu) Male US 132 148
## 2 I(Pu) Female US 35 40
## 3 I(Pr) Male US 87 63
## 4 I(Pr) Female US 20 22
## 5 II Male US 96 161
## 6 II Female US 47 53
Array1 <- array(1:12, dim = c(2, 6, 1)); Array1
## , , 1
##
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 3 5 7 9 11
## [2,] 2 4 6 8 10 12
Array2 <- array(16:1, dim = c(2, 4, 2)); Array2
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 16 14 12 10
## [2,] 15 13 11 9
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 8 6 4 2
## [2,] 7 5 3 1
A12<-Array2[,,2]; A12
## [,1] [,2] [,3] [,4]
## [1,] 8 6 4 2
## [2,] 7 5 3 1
A111 <- Array2[1, 1, 1]; A111
## [1] 16
options(digits=4)
birthday <- c("1981/03/15", "1983/04/20", "1984/01/18")
listA<-list(R123, m, birthday); listA
## [[1]]
## height gender weight
## 1 170 F 59.0
## 2 175 M 59.5
## 3 166 M 58.6
## 4 172 M 59.2
## 5 165 F 58.5
## 6 157 F 57.7
## 7 167 F 58.7
## 8 167 F 58.7
## 9 156 M 57.6
## 10 160 F 58.0
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
##
## [[3]]
## [1] "1981/03/15" "1983/04/20" "1984/01/18"
listA[[3]]
## [1] "1981/03/15" "1983/04/20" "1984/01/18"
options(digits=4)
listB<-list(data=R123, vec=m, char=birthday);
listB[["data"]]
## height gender weight
## 1 170 F 59.0
## 2 175 M 59.5
## 3 166 M 58.6
## 4 172 M 59.2
## 5 165 F 58.5
## 6 157 F 57.7
## 7 167 F 58.7
## 8 167 F 58.7
## 9 156 M 57.6
## 10 160 F 58.0
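List elements can be extracted in three ways that behave differently. A small sketch with an illustrative list:

```r
lst <- list(num = c(1, 2, 3), char = c("a", "b"))

lst[["num"]]        # the element itself: a numeric vector
lst$num             # same as above, shorthand for named elements
lst["num"]          # a list of length 1 that contains the element

class(lst[["num"]]) # "numeric"
class(lst["num"])   # "list"

# Elements can be added or replaced by name
lst$flag <- TRUE
names(lst)          # "num" "char" "flag"
```

Double brackets are usually what is wanted when the element will be used in further computation; single brackets are useful for taking a sub-list.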
Please combine `c('a','b','c')`, `c(1,2,3,4)`, and `c('2018-01-01', '2018-04-04', '2018-04-05', '2018-06-18', '2018-10-10')` into a list.
class(Titanic); Titanic
## [1] "table"
## , , Age = Child, Survived = No
##
## Sex
## Class Male Female
## 1st 0 0
## 2nd 0 0
## 3rd 35 17
## Crew 0 0
##
## , , Age = Adult, Survived = No
##
## Sex
## Class Male Female
## 1st 118 4
## 2nd 154 13
## 3rd 387 89
## Crew 670 3
##
## , , Age = Child, Survived = Yes
##
## Sex
## Class Male Female
## 1st 5 1
## 2nd 11 13
## 3rd 13 14
## Crew 0 0
##
## , , Age = Adult, Survived = Yes
##
## Sex
## Class Male Female
## 1st 57 140
## 2nd 14 80
## 3rd 75 76
## Crew 192 20
Titanic[, , 1, 1]
## Sex
## Class Male Female
## 1st 0 0
## 2nd 0 0
## 3rd 35 17
## Crew 0 0
options(digits=4)
g<-Titanic[ , , 2, 2]; g
## Sex
## Class Male Female
## 1st 57 140
## 2nd 14 80
## 3rd 75 76
## Crew 192 20
prop.table(g, 1)
## Sex
## Class Male Female
## 1st 0.28934 0.71066
## 2nd 0.14894 0.85106
## 3rd 0.49669 0.50331
## Crew 0.90566 0.09434
prop.table(g, 2)
## Sex
## Class Male Female
## 1st 0.16864 0.44304
## 2nd 0.04142 0.25316
## 3rd 0.22189 0.24051
## Crew 0.56805 0.06329
margin.table(g,1)
## Class
## 1st 2nd 3rd Crew
## 197 94 151 212
margin.table(g,2)
## Sex
## Male Female
## 338 316
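prop.table() divides each cell by a margin total, so with margin 1 every row sums to 1 and with margin 2 every column does. A brief check on the same slice of Titanic (adults who survived):

```r
g <- Titanic[, , 2, 2]            # Class x Sex table for adults who survived

row_share <- prop.table(g, 1)     # each row sums to 1
col_share <- prop.table(g, 2)     # each column sums to 1

rowSums(row_share)
colSums(col_share)

# margin.table() returns the totals that serve as denominators
margin.table(g, 1)                # 197 94 151 212
sum(g)                            # 654
```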
X<-c(10,20,30,40,50,60); Sca<-10
X+Sca
## [1] 20 30 40 50 60 70
X/Sca
## [1] 1 2 3 4 5 6
Y<-c(5,10,6,8,25,6)
X/Y; X*Y
## [1] 2 2 5 5 2 10
## [1] 50 200 180 320 1250 360
Please see more details here
a=6
exp(a); log(a)
## [1] 403.4
## [1] 1.792
log(exp(a)); exp(log(a))
## [1] 6
## [1] 6
1*2*3
## [1] 6
factorial(3)
## [1] 6
Please see more details here
round: rounds the values in its first argument to the specified number of decimal places.
floor: takes a single numeric argument x and returns a numeric vector containing the largest integers not greater than the corresponding elements of x.
ceiling: takes a single numeric argument x and returns a numeric vector containing the smallest integers not less than the corresponding elements of x.
a1<-c(2.54, 3.111, 10.999)
round(a1, digits=2)
## [1] 2.54 3.11 11.00
floor(a1)
## [1] 2 3 10
ceiling(a1)
## [1] 3 4 11
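The three functions differ most visibly on negative numbers, where floor() moves away from zero and ceiling() moves toward it. A short sketch (b1 is an illustrative vector; trunc() is included for comparison):

```r
b1 <- c(-2.7, 2.7)

floor(b1)          # -3  2
ceiling(b1)        # -2  3
trunc(b1)          # -2  2  (drops the decimal part, toward zero)
round(b1)          # -3  3
round(3.14159, digits = 3)   # 3.142
```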
\(\log\left(\frac{14}{5}\right)=\)?
\(1\times 2\times 3\times \dots \times 8=\)?
Using the Titanic data, please show which class the surviving children were travelling in.
Please use
How many observations in mtcars have mpg greater than or equal to 21?
Please analyze if admission is related to gender in A department in
Please count the number of letters in the English alphabet.
Please transform today’s temperature to Fahrenheit.
Please create a matrix with the diagonal (1,1,1).
Please count the number of days between January 1st and the day you are doing this assignment.