Exercise 1

Split the ChickWeight{datasets} data by individual chicks to extract separate slope estimates of regressing weight onto Time for each chick.

library(dplyr)
library(MASS)
df1<-ChickWeight
str(df1)

## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   578 obs. of  4 variables:
##  $ weight: num  42 51 59 64 76 93 106 125 149 171 ...
##  $ Time  : num  0 2 4 6 8 10 12 14 16 18 ...
##  $ Chick : Ord.factor w/ 50 levels "18"<"16"<"15"<..: 15 15 15 15 15 15 15 15 15 15 ...
##  $ Diet  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "formula")=Class 'formula'  language weight ~ Time | Chick
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "outer")=Class 'formula'  language ~Diet
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Time"
##   ..$ y: chr "Body weight"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "(days)"
##   ..$ y: chr "(gm)"

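# NOTE: this call fits Time ~ weight, the reverse of the weight ~ Time regression named in the exercise
df1.r<-df1%>%split(., .$Chick)%>%lapply(., function(x) coef(lm(x$Time~x$weight)))%>%
  do.call(rbind, .) # bind the list by row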
head(df1.r)

##    (Intercept)   x$weight
## 18   19.500000 -0.5000000
## 16  -27.825371  0.6803954
## 15  -18.408015  0.4225865
## 13  -17.631950  0.4208641
## 9   -14.574242  0.3140564
## 20   -9.888271  0.2653127
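The exercise statement asks for weight regressed on Time, whereas the call above passes x$Time ~ x$weight. A minimal sketch of the regression in the stated direction, using only base R, might look like the following; the names chicks, fits and coefs are illustrative and not part of the original answer.

```r
# Sketch: per-chick regression of weight on Time.
# ChickWeight ships with base R's 'datasets' package.
chicks <- split(ChickWeight, ChickWeight$Chick)

# Fit weight ~ Time within each chick and keep the intercept and slope
fits <- lapply(chicks, function(d) coef(lm(weight ~ Time, data = d)))

# Bind the per-chick coefficient vectors into one matrix, one row per chick
coefs <- do.call(rbind, fits)
colnames(coefs) <- c("intercept", "slope")

head(coefs)
```

Passing the data through the data argument, rather than writing x$weight inside the formula, also gives the coefficient columns clean names.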

Exercise 2

Explain what this statement does: “lapply(lapply(search(), ls), length)”

# use ? to open the help pages for these functions
?ls
?search
lapply(lapply(search(), ls), length)

## [[1]]
## [1] 2
## 
## [[2]]
## [1] 165
## 
## [[3]]
## [1] 267
## 
## [[4]]
## [1] 448
## 
## [[5]]
## [1] 87
## 
## [[6]]
## [1] 110
## 
## [[7]]
## [1] 245
## 
## [[8]]
## [1] 104
## 
## [[9]]
## [1] 218
## 
## [[10]]
## [1] 0
## 
## [[11]]
## [1] 1232

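search(): returns a character vector naming the attached packages and R objects (usually data frames) on the search path (see library).
ls(): lists the names of the objects in a given environment.
lapply(lapply(search(), ls), length): returns a list giving the number of objects in each attached package or R object on the search path.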
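The nested lapply call returns an unnamed list, so it is hard to tell which count belongs to which entry. As a small sketch (the names paths and counts are illustrative), sapply keeps the search-path names attached to the counts:

```r
# Count the objects in each entry of the search path, keeping the names
paths <- search()

# ls() accepts the name of a search-path entry as a character string
counts <- sapply(paths, function(p) length(ls(p)))

counts
```

The numbers depend on which packages are attached in the current session, so they will generally differ from the output shown above.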

Exercise 3

The following R script uses Cushings{MASS} to demonstrate several ways to achieve the same objective in R. Explain the advantages or disadvantages of each method. The following syntax aims to get the mean of Tetrahydrocortisone and Pregnanetriol for each Type in the Cushings dataset.

library(MASS)
library(tidyverse)

Method 1. using `aggregate`.

Base R; easy to understand and learn, and the result is a data frame directly. It returns the means of the two variables in Cushings for each Type, one row per type, which makes it convenient for further exploratory analysis with ggplot. Result: 4 rows, 3 columns.

# method 1
m1<-aggregate( . ~ Type, data = Cushings, mean)  
str(m1)

## 'data.frame':    4 obs. of  3 variables:
##  $ Type               : Factor w/ 4 levels "a","b","c","u": 1 2 3 4
##  $ Tetrahydrocortisone: num  2.97 8.18 19.72 14.02
##  $ Pregnanetriol      : num  2.44 1.12 5.5 1.2

dim(m1)

## [1] 4 3

Method 2. using `split` and the `apply` family of functions.

Split the data by Type, then apply the mean function to each column of every piece; sapply simplifies the result to a matrix. It is easy to understand and allows a user-defined function inside sapply. Result: 2 rows, 4 columns.

# method 2 
m2<-sapply(split(Cushings[,-3], Cushings$Type), function(x) apply(x, 2, mean))  
str(m2)

##  num [1:2, 1:4] 2.97 2.44 8.18 1.12 19.72 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:2] "Tetrahydrocortisone" "Pregnanetriol"
##   ..$ : chr [1:4] "a" "b" "c" "u"

dim(m2)

## [1] 2 4

Method 3. using `by`, `subset`, and `rbind`.

Uses `by` to apply the function to each Type subset and `rbind` (through `do.call`) to bind the resulting list by row. It is similar to method 2 but considerably more complex. Result: 4 rows, 2 columns.

# method 3 
do.call("rbind", as.list(
  by(Cushings, list(Cushings$Type), function(x) {
    y <- subset(x, select =  -Type)
    apply(y, 2, mean)
  }
  )))

##   Tetrahydrocortisone Pregnanetriol
## a            2.966667          2.44
## b            8.180000          1.12
## c           19.720000          5.50
## u           14.016667          1.20

Method 4. using forward pipe and `dplyr` package

First group_by Type, then summarize the means of the two other variables as t_m and p_m.
This method is more readable than methods 2, 3 and 5. The result is a tibble (classes ‘tbl_df’, ‘tbl’ and ‘data.frame’) with 4 rows and 3 columns.

# method 4 
Cushings %>% group_by(Type) %>% summarize( t_m = mean(Tetrahydrocortisone), p_m = mean(Pregnanetriol))

## # A tibble: 4 x 3
##   Type    t_m   p_m
##   <fct> <dbl> <dbl>
## 1 a      2.97  2.44
## 2 b      8.18  1.12
## 3 c     19.7   5.5 
## 4 u     14.0   1.2

Method 5. using forward pipe and `nest`, `map` function

`nest` creates a data frame with a list-column (data) that holds the nested variables for each Type. `map` applies a function to each element of that list-column and returns a list of the same length; `map_dbl` returns a double vector instead. The result has classes ‘tbl_df’, ‘tbl’ and ‘data.frame’, with 4 rows and 5 columns. The call `nest(-Type)` leaves the new list-column unnamed, which is what triggers the warning in the output.
avg is a list holding the means of both variables for each group. res_1 is the mean of Tetrahydrocortisone in each group and res_2 is the mean of Pregnanetriol in each group; both are stored as double vectors.

# method 5 
Cushings %>% nest(-Type) %>% mutate(avg = map(data, ~ apply(., 2, mean)), res_1 = map_dbl(avg, "Tetrahydrocortisone"), res_2 = map_dbl(avg, "Pregnanetriol"))

## Warning: All elements of `...` must be named.
## Did you want `data = c(Tetrahydrocortisone, Pregnanetriol)`?

## # A tibble: 4 x 5
##   Type  data              avg       res_1 res_2
##   <fct> <list>            <list>    <dbl> <dbl>
## 1 a     <tibble [6 x 2]>  <dbl [2]>  2.97  2.44
## 2 b     <tibble [10 x 2]> <dbl [2]>  8.18  1.12
## 3 c     <tibble [5 x 2]>  <dbl [2]> 19.7   5.5 
## 4 u     <tibble [6 x 2]>  <dbl [2]> 14.0   1.2

Conclusion
Readability: Method 1 = Method 4 > Method 2 > Method 3 = Method 5.
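The methods above differ mainly in the shape of what they return. The sketch below uses a small made-up data frame (toy, with hypothetical columns x and y) instead of Cushings so that it runs with base R only; it shows that the sapply result is the transpose of the row-per-group layout produced by aggregate.

```r
# Hypothetical toy data: two groups, two numeric variables
toy <- data.frame(Type = c("a", "a", "b", "b"),
                  x = c(1, 3, 5, 7),
                  y = c(2, 4, 6, 10))

# split + sapply: variables in rows, groups in columns (as in method 2)
by_col <- sapply(split(toy[, -1], toy$Type), colMeans)

# transposing gives groups in rows (the layout of method 3)
by_row <- t(by_col)

# aggregate: a data frame with one row per group (as in method 1)
agg <- aggregate(. ~ Type, data = toy, FUN = mean)

by_col
by_row
agg
```

Which layout is preferable depends on the next step: plotting functions usually want the data frame, while matrix arithmetic is simpler on the sapply result.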

Exercise 4

Go through the script in the NZ schools example and provide comments to each code chunk indicated by ‘##’. Give alternative code to perform the same calculation where appropriate.

# 
# a case study 
# 
## read the data; as.is=2 keeps column 2 (Name) as character instead of converting it to a factor
dta <- read.csv("C:/Users/USER/Desktop/R_data management/0427/nzSchools.csv", as.is=2) 

## check data structure 
str(dta)

## 'data.frame':    2571 obs. of  6 variables:
##  $ ID  : int  1015 1052 1062 1092 1130 1018 1029 1030 1588 1154 ...
##  $ Name: chr  "Hora Hora School" "Morningside School" "Onerahi School" "Raurimu Avenue School" ...
##  $ City: Factor w/ 541 levels "Ahaura","Ahipara",..: 533 533 533 533 533 533 533 533 533 533 ...
##  $ Auth: Factor w/ 4 levels "Other","Private",..: 3 3 3 3 3 3 3 3 4 3 ...
##  $ Dec : int  2 3 4 2 4 8 5 5 6 1 ...
##  $ Roll: int  318 200 455 86 577 329 637 395 438 201 ...

## data dimension 
dim(dta)

## [1] 2571    6

## binning 
## create variable Size: "Large" if Roll > median(Roll), otherwise "Small"
dta$Size <- ifelse(dta$Roll > median(dta$Roll), "Large", "Small") 
## remove the Size column again by assigning NULL
dta$Size <- NULL 
## show first 6 rows of dta
head(dta)

##     ID                  Name      City  Auth Dec Roll
## 1 1015      Hora Hora School Whangarei State   2  318
## 2 1052    Morningside School Whangarei State   3  200
## 3 1062        Onerahi School Whangarei State   4  455
## 4 1092 Raurimu Avenue School Whangarei State   2   86
## 5 1130      Whangarei School Whangarei State   4  577
## 6 1018       Hurupaki School Whangarei State   8  329

## use cut() to divide Roll into 3 equal-width intervals and label them
dta$Size <- cut(dta$Roll, 3, labels=c("Small", "Mediam", "Large")) 
## count the schools in each category of Size
table(dta$Size) #----(1)

## 
##  Small Mediam  Large 
##   2555     15      1

## alternative (1)
 dta%>%group_by(Size)%>%summarise(n=n())

## # A tibble: 3 x 2
##   Size       n
##   <fct>  <int>
## 1 Small   2555
## 2 Mediam    15
## 3 Large      1
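The counts above are very unbalanced because cut() with a single number splits the range of Roll into intervals of equal width, and Roll is strongly right-skewed. A possible alternative, sketched here on a made-up vector (roll is hypothetical, not the school data), is to take the break points from quantiles so that the groups have similar sizes.

```r
# Hypothetical roll counts, right-skewed like the Roll column
roll <- c(5, 18, 40, 86, 120, 200, 318, 577, 5546)

# Equal-width bins: the range of the data is cut into 3 intervals of equal length
equal_width <- cut(roll, 3, labels = c("Small", "Medium", "Large"))
table(equal_width)

# Quantile bins: break points at the 0, 1/3, 2/3 and 1 quantiles
breaks <- quantile(roll, probs = c(0, 1/3, 2/3, 1))
by_quantile <- cut(roll, breaks = breaks, include.lowest = TRUE,
                   labels = c("Small", "Medium", "Large"))
table(by_quantile)
```

include.lowest = TRUE is needed so that the minimum value falls into the first interval instead of becoming NA.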

## sorting 

## create RollOrd: the row indices that sort Roll in decreasing order (a permutation, not ranks)
dta$RollOrd <- order(dta$Roll, decreasing=T) 
## show the 6 rows with the largest Roll
head(dta[dta$RollOrd, ])

##       ID                  Name         City  Auth Dec Roll   Size RollOrd
## 1726 498 Correspondence School   Wellington State  NA 5546  Large     753
## 301   28     Rangitoto College     Auckland State  10 3022 Mediam     353
## 376   78      Avondale College     Auckland State   4 2613 Mediam     712
## 2307 319  Burnside High School Christchurch State   8 2588 Mediam     709
## 615   41      Macleans College     Auckland State  10 2476 Mediam    1915
## 199   43    Massey High School     Auckland State   5 2452 Mediam    1683

## show the 6 rows with the smallest Roll
tail(dta[dta$RollOrd, ])

##        ID                    Name                  City    Auth Dec Roll  Size
## 2401 1641  Amana Christian School               Dunedin Private   9    7 Small
## 1590 2461       Tangimoana School              Manawatu   State   4    6 Small
## 1996 3598         Woodbank School              Kaikoura   State   4    6 Small
## 2112 3386     Jacobs River School          Jacobs River   State   5    6 Small
## 1514 2407     Ngamatapouri School Sth Taranaki District   State   9    5 Small
## 1575 2420 Papanui Junction School               Taihape   State   5    5 Small
##      RollOrd
## 2401    2562
## 1590     266
## 1996    2478
## 2112    1501
## 1514    2377
## 1575    1542
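The RollOrd column holds the output of order(), which is easy to confuse with a ranking. A short sketch on a made-up vector (roll is hypothetical) shows the difference: order() answers "which position holds the largest value", while rank() answers "what place does each value take".

```r
roll <- c(318, 200, 455, 86)   # hypothetical roll counts

# order(): positions of the largest, second largest, ... element
ord <- order(roll, decreasing = TRUE)
ord          # 3 1 2 4
roll[ord]    # 455 318 200 86, the values sorted in decreasing order

# rank(): the place of each element, with 1 for the largest value
rnk <- rank(-roll)
rnk          # 2 3 1 4
```

This is why dta[dta$RollOrd, ] sorts the rows correctly, even though the RollOrd value printed next to a school is not that school's rank.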

## sort by City, then by Roll, both in decreasing order, and show the first 6 rows
head(dta[order(dta$City, dta$Roll, decreasing=T), ])

##        ID                      Name      City  Auth Dec Roll  Size RollOrd
## 2548  401           Menzies College   Wyndham State   4  356 Small     859
## 2549 4054            Wyndham School   Wyndham State   5   94 Small    1163
## 1611 2742          Woodville School Woodville State   3  147 Small     726
## 1630 2640           Papatawa School Woodville State   7   27 Small    2273
## 2041 3600            Woodend School   Woodend State   9  375 Small    1401
## 1601  399 Central Southland College    Winton State   7  549 Small     450

## sort by City, then by Roll, both in decreasing order, and show the last 6 rows
tail(dta[order(dta$City, dta$Roll, decreasing=T), ])

##        ID                         Name    City  Auth Dec Roll  Size RollOrd
## 2169 3273                Albury School  Albury State   8   30 Small    1010
## 2018  350           Akaroa Area School  Akaroa State   8  125 Small    1051
## 2023 3332           Duvauchelle School  Akaroa State   9   41 Small     749
## 335  1200                Ahuroa School  Ahuroa State   7   22 Small     193
## 99   1000               Ahipara School Ahipara State   3  241 Small    1963
## 2117 2105 Awahono School - Grey Valley  Ahaura State   4  119 Small     364

## counting 
## count the schools in each category of Auth
table(dta$Auth) #----(2)

## 
##            Other          Private            State State Integrated 
##                1               99             2144              327

## Alternative (2)
count(dta, Auth)

## # A tibble: 4 x 2
##   Auth                 n
##   <fct>            <int>
## 1 Other                1
## 2 Private             99
## 3 State             2144
## 4 State Integrated   327

## save the counts of each Auth category in authtbl and print it
authtbl <- table(dta$Auth); authtbl

## 
##            Other          Private            State State Integrated 
##                1               99             2144              327

## check the class of authtbl: a table
class(authtbl)

## [1] "table"

## show the rows where Auth == "Other"
dta[dta$Auth == "Other", ]

##       ID            Name         City  Auth Dec Roll  Size RollOrd
## 2315 518 Kingslea School Christchurch Other   1   51 Small    1579

## cross table of Auth and Dec
xtabs(~ Auth + Dec, data=dta) #---(3)

##                   Dec
## Auth                 1   2   3   4   5   6   7   8   9  10
##   Other              1   0   0   0   0   0   0   0   0   0
##   Private            0   0   2   6   2   2   6  11  12  38
##   State            259 230 208 219 214 215 188 200 205 205
##   State Integrated  12  22  35  28  38  34  45  45  37  31

## Alternative (3)
table(dta$Auth, dta$Dec )

##                   
##                      1   2   3   4   5   6   7   8   9  10
##   Other              1   0   0   0   0   0   0   0   0   0
##   Private            0   0   2   6   2   2   6  11  12  38
##   State            259 230 208 219 214 215 188 200 205 205
##   State Integrated  12  22  35  28  38  34  45  45  37  31

## aggregating 
## average of Roll
mean(dta$Roll) #---(4)

## [1] 295.4737

## Alternative (4)
sum(dta$Roll)/length(dta$Roll)

## [1] 295.4737

##  average of roll in Auth="Private"
mean(dta$Roll[dta$Auth == "Private"]) #---(5)

## [1] 308.798

## Alternative (5)
dta%>%subset(., Auth=="Private")%>%summarise(mean=mean(Roll))

##      mean
## 1 308.798

## average of Roll in each category of Auth
aggregate(dta["Roll"], by=list(dta$Auth), FUN=mean)  #--(6)

##            Group.1     Roll
## 1            Other  51.0000
## 2          Private 308.7980
## 3            State 300.6301
## 4 State Integrated 258.3792

## Alternative (6)
dta%>% group_by(Auth)%>%summarise(mu=mean(Roll))

## # A tibble: 4 x 2
##   Auth                mu
##   <fct>            <dbl>
## 1 Other              51 
## 2 Private           309.
## 3 State             301.
## 4 State Integrated  258.

## create logical variable Rich: TRUE if Dec > 5, otherwise FALSE (NA where Dec is missing)
dta$Rich <- dta$Dec > 5; 
# dta$Rich
head(dta$Rich)

## [1] FALSE FALSE FALSE FALSE FALSE  TRUE

## average of roll in each category of Auth and rich
aggregate(dta["Roll"], by=list(dta$Auth, dta$Rich), FUN=mean) #--(7)

##            Group.1 Group.2     Roll
## 1            Other   FALSE  51.0000
## 2          Private   FALSE 151.4000
## 3            State   FALSE 261.7487
## 4 State Integrated   FALSE 183.2370
## 5          Private    TRUE 402.5362
## 6            State    TRUE 338.8243
## 7 State Integrated    TRUE 311.2135

## Alternative (7)
nn=xtabs(~Auth+Rich,dta)
xtabs(Roll~Auth+Rich,dta)/nn

##                   Rich
## Auth                  FALSE     TRUE
##   Other             51.0000         
##   Private          151.4000 402.5362
##   State            261.7487 338.8243
##   State Integrated 183.2370 311.2135

## find the range (minimum and maximum) of Roll within each category of Auth
by(dta["Roll"], INDICES=list(dta$Auth), FUN=range) #--(8)

## : Other
## [1] 51 51
## ------------------------------------------------------------ 
## : Private
## [1]    7 1663
## ------------------------------------------------------------ 
## : State
## [1]    5 5546
## ------------------------------------------------------------ 
## : State Integrated
## [1]   18 1475

## Alternative (8)
aggregate(dta["Roll"], by=list(dta$Auth), FUN=range)

##            Group.1 Roll.1 Roll.2
## 1            Other     51     51
## 2          Private      7   1663
## 3            State      5   5546
## 4 State Integrated     18   1475

###

Exercise 5

Go through the script in the NCEA 2007 example and provide comments to each code chunk indicated by ‘##’. Give alternative code to perform the same calculation where appropriate.

# 
# a case study - II 
# 
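## read the colon-separated file with a header row, no quote characters, and strings kept as character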
dta2 <- read.table("C:/Users/USER/Desktop/R_data management/0427/NCEA2007.txt", sep=":", quote="", h=T, as.is=T) 
##  check data dimension 
dim(dta2)

## [1] 88  4

## check data structure
str(dta2)

## 'data.frame':    88 obs. of  4 variables:
##  $ Name  : chr  "Al-Madinah School" "Alfriston College" "Ambury Park Centre for Riding Therapy" "Aorere College" ...
##  $ Level1: num  61.5 53.9 33.3 39.5 71.2 22.1 50.8 57.3 89.3 59.8 ...
##  $ Level2: num  75 44.1 20 50.2 78.9 30.8 34.8 49.8 89.7 65.7 ...
##  $ Level3: num  0 0 0 30.6 55.5 26.3 48.9 44.6 88.6 50.4 ...

## show first 6 rows of data
head(dta2)

##                                    Name Level1 Level2 Level3
## 1                     Al-Madinah School   61.5   75.0    0.0
## 2                     Alfriston College   53.9   44.1    0.0
## 3 Ambury Park Centre for Riding Therapy   33.3   20.0    0.0
## 4                        Aorere College   39.5   50.2   30.6
## 5        Auckland Girls' Grammar School   71.2   78.9   55.5
## 6                      Auckland Grammar   22.1   30.8   26.3

## average of level1 - level3 by column
apply(dta2[, -1], MARGIN=2, FUN=mean)

##   Level1   Level2   Level3 
## 62.26705 61.06818 47.97614

## average of level1 - level3 and return list
lapply(dta2[, -1], FUN=mean)

## $Level1
## [1] 62.26705
## 
## $Level2
## [1] 61.06818
## 
## $Level3
## [1] 47.97614

## same as lapply, but simplify the result to a named vector
sapply(dta2[, -1], FUN=mean)

##   Level1   Level2   Level3 
## 62.26705 61.06818 47.97614

## range of level1 - level3 by column
apply(dta2[, -1], MARGIN=2, FUN=range)

##      Level1 Level2 Level3
## [1,]    2.8    0.0    0.0
## [2,]   97.4   95.7   95.7

## range of level1 - level3, return list
lapply(dta2[, -1], FUN=range)

## $Level1
## [1]  2.8 97.4
## 
## $Level2
## [1]  0.0 95.7
## 
## $Level3
## [1]  0.0 95.7

## same as lapply, but simplify the result; each range has length 2, so this returns a 2 x 3 matrix
sapply(dta2[, -1], FUN=range)

##      Level1 Level2 Level3
## [1,]    2.8    0.0    0.0
## [2,]   97.4   95.7   95.7
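The outputs above show that sapply returns a vector for mean but a matrix for range, because the shape depends on what the function returns. When the expected shape is known in advance, vapply is one way to state it explicitly. The sketch below uses a small made-up data frame (scores, built from the first three rows printed by head(dta2)) and is an illustration only.

```r
# Toy data frame: the Level1 and Level2 values of the first three schools shown above
scores <- data.frame(Level1 = c(61.5, 53.9, 33.3),
                     Level2 = c(75.0, 44.1, 20.0))

# One number per column: the result is a named numeric vector
means <- vapply(scores, mean, FUN.VALUE = numeric(1))

# Two numbers per column: the result is a 2-row matrix
ranges <- vapply(scores, range, FUN.VALUE = numeric(2))

means
ranges
```

vapply stops with an error if a result does not match FUN.VALUE, which tends to make scripts more predictable than relying on sapply's automatic simplification.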

## splitting 
## split Roll by Auth
rollsByAuth <- split(dta$Roll, dta$Auth) 

## check data structure  
str(rollsByAuth)

## List of 4
##  $ Other           : int 51
##  $ Private         : int [1:99] 255 39 154 73 83 25 95 85 94 729 ...
##  $ State           : int [1:2144] 318 200 455 86 577 329 637 395 201 267 ...
##  $ State Integrated: int [1:327] 438 26 191 560 151 114 126 171 211 57 ...

## rollsByAuth is a list
class(rollsByAuth)

## [1] "list"

## mean of Roll in each category of Auth, returned as a list
lapply(split(dta$Roll, dta$Auth), mean) #--(1)

## $Other
## [1] 51
## 
## $Private
## [1] 308.798
## 
## $State
## [1] 300.6301
## 
## $`State Integrated`
## [1] 258.3792

## Alternative (1)
aggregate(dta["Roll"], by=list(dta$Auth), FUN=mean)

##            Group.1     Roll
## 1            Other  51.0000
## 2          Private 308.7980
## 3            State 300.6301
## 4 State Integrated 258.3792

## mean of Roll in each category of Auth, returned as a named vector
sapply(split(dta$Roll, dta$Auth), mean)

##            Other          Private            State State Integrated 
##          51.0000         308.7980         300.6301         258.3792
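The split-then-sapply pattern can also be written in one step with tapply. The following sketch uses made-up vectors (roll and auth are hypothetical stand-ins for dta$Roll and dta$Auth) to show that both give the same group means.

```r
# Hypothetical stand-ins for dta$Roll and dta$Auth
roll <- c(318, 200, 438, 255, 39)
auth <- factor(c("State", "State", "State Integrated", "Private", "Private"))

# Two steps: split into a list of groups, then apply mean to each group
via_split <- sapply(split(roll, auth), mean)

# One step: tapply splits by the factor and applies the function
via_tapply <- tapply(roll, auth, mean)

via_split
via_tapply
```

The split version is handy when the intermediate list is needed for something else; tapply is shorter when only the summary is wanted.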

###

Functions exercise 1-5
