0330 In-class exercises 4-6

Exercise 4

The data set Vocab{car} gives observations on gender, education and vocabulary from respondents to the U.S. General Social Surveys, 1972-2004. Summarize the relationship between education and vocabulary over the years by gender. (The version of the data loaded below contains survey years from 1974 to 2016.)

Load the data, check its structure, and display the first 6 rows of Vocab.

library(car)

## Warning: package 'car' was built under R version 3.6.3

## Loading required package: carData

data(Vocab)
str(Vocab)

## 'data.frame':    30351 obs. of  4 variables:
##  $ year      : num  1974 1974 1974 1974 1974 ...
##  $ sex       : Factor w/ 2 levels "Female","Male": 2 2 1 1 1 2 2 2 1 1 ...
##  $ education : num  14 16 10 10 12 16 17 10 12 11 ...
##  $ vocabulary: num  9 9 9 5 8 8 9 5 3 5 ...
##  - attr(*, "na.action")= 'omit' Named int  1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "names")= chr  "19720001" "19720002" "19720003" "19720004" ...

head(Vocab)

##          year    sex education vocabulary
## 19740001 1974   Male        14          9
## 19740002 1974   Male        16          9
## 19740003 1974 Female        10          9
## 19740004 1974 Female        10          5
## 19740005 1974 Female        12          8
## 19740006 1974   Male        16          8

Use xyplot to display the linear relationship between education and vocabulary for each survey year, with one panel per sex.

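dta <- Vocab
# one panel per sex; groups = year draws a separate fitted regression line ("r") per survey year over a grid ("g")
lattice::xyplot(vocabulary ~ education | sex, groups = year, data = dta,
                type = c("g", "r"), auto.key = list(columns = 6))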

From the plot, the linear relationships between education and vocabulary appear similar across years within each sex.

Subset the data by gender.

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.6.3

# inside filter(), columns are referred to by bare name, so dta$ is not needed
dtam <- dta %>% filter(sex == "Male")
dtaf <- dta %>% filter(sex == "Female")

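Fit a separate regression of vocabulary on education for each survey year and extract the coefficients (intercept and education slope), first for males and then for females.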

lapply(split(dtam, dtam$year), function(x) coef(lm(x$vocabulary ~ x$education)))

## $`1974`
## (Intercept) x$education 
##   1.5318434   0.3713183 
## 
## $`1976`
## (Intercept) x$education 
##   1.6342960   0.3555403 
## 
## $`1978`
## (Intercept) x$education 
##   0.9762161   0.3963762 
## 
## $`1982`
## (Intercept) x$education 
##   0.9730291   0.3832637 
## 
## $`1984`
## (Intercept) x$education 
##    1.678465    0.337124 
## 
## $`1987`
## (Intercept) x$education 
##   0.8103651   0.3818373 
## 
## $`1988`
## (Intercept) x$education 
##   1.0459936   0.3592442 
## 
## $`1989`
## (Intercept) x$education 
##   1.0596176   0.3708525 
## 
## $`1990`
## (Intercept) x$education 
##   1.7000935   0.3377029 
## 
## $`1991`
## (Intercept) x$education 
##   1.2504604   0.3683962 
## 
## $`1993`
## (Intercept) x$education 
##   1.6384884   0.3221049 
## 
## $`1994`
## (Intercept) x$education 
##   1.8684770   0.3146151 
## 
## $`1996`
## (Intercept) x$education 
##   0.8221711   0.3770325 
## 
## $`1998`
## (Intercept) x$education 
##   1.5199973   0.3314754 
## 
## $`2000`
## (Intercept) x$education 
##   1.1203888   0.3558918 
## 
## $`2004`
## (Intercept) x$education 
##   1.4259424   0.3411153 
## 
## $`2006`
## (Intercept) x$education 
##   2.1383454   0.2952926 
## 
## $`2008`
## (Intercept) x$education 
##   1.4212286   0.3277987 
## 
## $`2010`
## (Intercept) x$education 
##   1.7996389   0.3135749 
## 
## $`2012`
## (Intercept) x$education 
##   1.7303105   0.3061534 
## 
## $`2014`
## (Intercept) x$education 
##   1.4804789   0.3262112 
## 
## $`2016`
## (Intercept) x$education 
##   1.8562367   0.3031146

lapply(split(dtaf, dtaf$year), function(x) coef(lm(x$vocabulary ~ x$education)))

## $`1974`
## (Intercept) x$education 
##   1.5652579   0.3816095 
## 
## $`1976`
## (Intercept) x$education 
##   1.7021281   0.3824002 
## 
## $`1978`
## (Intercept) x$education 
##   1.3006416   0.4002707 
## 
## $`1982`
## (Intercept) x$education 
##   0.9829602   0.3949758 
## 
## $`1984`
## (Intercept) x$education 
##   1.4536872   0.3728698 
## 
## $`1987`
## (Intercept) x$education 
##   0.9647931   0.3843508 
## 
## $`1988`
## (Intercept) x$education 
##   1.1634561   0.3763999 
## 
## $`1989`
## (Intercept) x$education 
##   1.0682600   0.3863606 
## 
## $`1990`
## (Intercept) x$education 
##   0.4594812   0.4346902 
## 
## $`1991`
## (Intercept) x$education 
##   1.1543766   0.3875821 
## 
## $`1993`
## (Intercept) x$education 
##   1.7388287   0.3286325 
## 
## $`1994`
## (Intercept) x$education 
##   1.6453365   0.3422146 
## 
## $`1996`
## (Intercept) x$education 
##   1.1482811   0.3727178 
## 
## $`1998`
## (Intercept) x$education 
##   1.4472751   0.3592843 
## 
## $`2000`
## (Intercept) x$education 
##   1.9276040   0.3155532 
## 
## $`2004`
## (Intercept) x$education 
##    2.104150    0.304056 
## 
## $`2006`
## (Intercept) x$education 
##   2.7777171   0.2535376 
## 
## $`2008`
## (Intercept) x$education 
##   2.6074315   0.2553971 
## 
## $`2010`
## (Intercept) x$education 
##   1.3520300   0.3468821 
## 
## $`2012`
## (Intercept) x$education 
##   1.7535298   0.3080832 
## 
## $`2014`
## (Intercept) x$education 
##   2.3445239   0.2663464 
## 
## $`2016`
## (Intercept) x$education 
##   2.0055919   0.2928955
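
From these coefficients, each additional year of education is associated with roughly 0.30 to 0.40 more vocabulary points for males and 0.25 to 0.43 for females, depending on the year. The slopes tend to be smaller in later surveys: about 0.36 to 0.40 for both sexes in the 1970s, but about 0.30 to 0.33 for males and 0.25 to 0.35 for females from 2006 onward, while the intercepts tend to rise. The female slopes are slightly higher than the male ones in most years before 2000 and lower in most years from 2000 on.
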
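The two lapply() calls above repeat the same fit once per sex and print two long lists. As a minimal sketch, the fits can be collected into a single table with one row per sex-year combination, assuming a data frame with the same column names as Vocab (year, sex, education, vocabulary). The helper name coef_table and the toy data are hypothetical and only for illustration; the toy vocabulary scores are an exact linear function of education, so the expected coefficients are known in advance.

```r
# Hypothetical helper: one lm() fit per sex-year group, stacked into one data frame
coef_table <- function(d) {
  # split() on a list of two factors gives one subset per sex-year combination;
  # drop = TRUE skips combinations that have no rows
  groups <- split(d, list(d$sex, d$year), drop = TRUE)
  rows <- lapply(groups, function(x) {
    fit <- lm(vocabulary ~ education, data = x)  # data = x gives the clean coefficient name "education"
    data.frame(sex = x$sex[1], year = x$year[1],
               intercept = unname(coef(fit)[1]),
               slope = unname(coef(fit)["education"]))
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$sex, out$year), ]  # sort by sex, then by survey year
  rownames(out) <- NULL
  out
}

# Toy data (NOT from Vocab): vocabulary = 1 + 0.4 * education in 1974, 2 + 0.3 * education in 2016
toy <- data.frame(
  year = rep(c(1974, 2016), each = 6),
  sex = factor(rep(rep(c("Female", "Male"), each = 3), times = 2)),
  education = rep(c(10, 12, 16), times = 4)
)
toy$vocabulary <- ifelse(toy$year == 1974,
                         1 + 0.4 * toy$education,
                         2 + 0.3 * toy$education)

res <- coef_table(toy)
print(res)
```

On the real data, coef_table(dta) would return one row per sex and survey year, which is easier to compare side by side or to plot (slope against year, by sex) than the two printed lists.
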
Exercise 5

The ‘MASS’ package has two data sets, ‘Animals’ and ‘mammals’. Merge the two data sets and remove duplicated observations using ‘duplicated’.

Load the data sets.

library(MASS)

## Warning: package 'MASS' was built under R version 3.6.3

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

a<-MASS::Animals
b<-MASS::mammals

Check the data structures.

str(a)

## 'data.frame':    28 obs. of  2 variables:
##  $ body : num  1.35 465 36.33 27.66 1.04 ...
##  $ brain: num  8.1 423 119.5 115 5.5 ...

str(b)

## 'data.frame':    62 obs. of  2 variables:
##  $ body : num  3.38 0.48 1.35 465 36.33 ...
##  $ brain: num  44.5 15.5 8.1 423 119.5 ...

head(a)

##                     body brain
## Mountain beaver     1.35   8.1
## Cow               465.00 423.0
## Grey wolf          36.33 119.5
## Goat               27.66 115.0
## Guinea pig          1.04   5.5
## Dipliodocus     11700.00  50.0

head(b)

##                    body brain
## Arctic fox        3.385  44.5
## Owl monkey        0.480  15.5
## Mountain beaver   1.350   8.1
## Cow             465.000 423.0
## Grey wolf        36.330 119.5
## Goat             27.660 115.0

Store each data set's row names (the species) in a column, merge the two data sets, and remove duplicated rows.

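a$specie <- row.names(a)   # keep the species names as a column so they take part in the merge
b$specie <- row.names(b)
ab <- merge(a, b, all = TRUE)   # full outer join on the shared columns body, brain and specie
ab[!duplicated(ab), ]           # keep only the first copy of any duplicated row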

##         body   brain                    specie
## 1      0.005    0.14 Lesser short-tailed shrew
## 2      0.010    0.25          Little brown bat
## 3      0.023    0.30             Big brown bat
## 4      0.023    0.40                     Mouse
## 5      0.048    0.33                Musk shrew
## 6      0.060    1.00           Star-nosed mole
## 7      0.075    1.20          E. American mole
## 8      0.101    4.00           Ground squirrel
## 9      0.104    2.50                Tree shrew
## 10     0.120    1.00            Golden hamster
## 11     0.122    3.00                      Mole
## 12     0.122    3.00                  Mole rat
## 13     0.200    5.00                    Galago
## 14     0.280    1.90                       Rat
## 15     0.425    6.40                Chinchilla
## 16     0.480   15.50                Owl monkey
## 17     0.550    2.40           Desert hedgehog
## 18     0.750   12.30              Rock hyrax-a
## 19     0.785    3.50         European hedgehog
## 20     0.900    2.60                    Tenrec
## 21     0.920    5.70    Arctic ground squirrel
## 22     1.000    6.60 African giant pouched rat
## 23     1.040    5.50                Guinea pig
## 24     1.350    8.10           Mountain beaver
## 25     1.400   12.50                Slow loris
## 26     1.410   17.50                     Genet
## 27     1.620   11.40                 Phalanger
## 28     1.700    6.30              N.A. opossum
## 29     2.000   12.30                Tree hyrax
## 30     2.500   12.10                    Rabbit
## 31     3.000   25.00                   Echidna
## 32     3.300   25.60                       Cat
## 33     3.385   44.50                Arctic fox
## 34     3.500    3.90             Water opossum
## 35     3.500   10.80     Nine-banded armadillo
## 36     3.600   21.00              Rock hyrax-b
## 37     4.050   17.00     Yellow-bellied marmot
## 38     4.190   58.00                    Verbet
## 39     4.235   50.40                   Red fox
## 40     4.288   39.20                   Raccoon
## 41     6.800  179.00             Rhesus monkey
## 42    10.000  115.00              Patas monkey
## 43    10.000  115.00              Potar monkey
## 44    10.550  179.50                    Baboon
## 45    14.830   98.20                  Roe deer
## 46    27.660  115.00                      Goat
## 47    35.000   56.00                  Kangaroo
## 48    36.330  119.50                 Grey wolf
## 49    52.160  440.00                Chimpanzee
## 50    55.500  175.00                     Sheep
## 51    60.000   81.00           Giant armadillo
## 52    62.000 1320.00                     Human
## 53    85.000  325.00                 Grey seal
## 54   100.000  157.00                    Jaguar
## 55   160.000  169.00           Brazilian tapir
## 56   187.100  419.00                    Donkey
## 57   192.000  180.00                       Pig
## 58   207.000  406.00                   Gorilla
## 59   250.000  490.00                     Okapi
## 60   465.000  423.00                       Cow
## 61   521.000  655.00                     Horse
## 62   529.000  680.00                   Giraffe
## 63  2547.000 4603.00            Asian elephant
## 64  6654.000 5712.00          African elephant
## 65  9400.000   70.00               Triceratops
## 66 11700.000   50.00               Dipliodocus
## 67 87000.000  154.50             Brachiosaurus
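
Because merge() joins on every shared column (body, brain and specie), species that appear in both sets with identical values come out of the join only once. A sketch of a route where duplicated() does the deduplication explicitly: stack the two data sets with rbind() and keep only the first copy of each row. The object names stacked and stacked_unique are illustrative. Keeping specie as part of the comparison matters: deduplicating on body and brain alone would also collapse pairs such as Mole and Mole rat, which have identical measurements in the output above.

```r
# MASS is a recommended package normally installed with R; :: reads its data without attaching it
a <- MASS::Animals
b <- MASS::mammals
# store species names in a column: rbind() makes repeated row names unique by altering them
a$specie <- row.names(a)
b$specie <- row.names(b)

stacked <- rbind(a, b)        # species present in both sets now appear twice
dup <- duplicated(stacked)    # TRUE for the second and later copies of an identical row
cat("stacked rows:", nrow(stacked), "- duplicated rows:", sum(dup), "\n")

stacked_unique <- stacked[!dup, ]
rownames(stacked_unique) <- NULL
head(stacked_unique)
```
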

Exercise 6

Convert the data set probe words from long to wide format as described.

Load the data.

probword<-read.table("C:/Users/USER/Desktop/R_data management/0330/probeL.txt", header=TRUE)

Check the data structure and the first 6 rows.

str(probword)

## 'data.frame':    55 obs. of  3 variables:
##  $ ID           : Factor w/ 11 levels "S01","S02","S03",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ Response_Time: int  51 36 50 35 42 27 20 26 17 27 ...
##  $ Position     : int  1 2 3 4 5 1 2 3 4 5 ...

head(probword)

##    ID Response_Time Position
## 1 S01            51        1
## 2 S01            36        2
## 3 S01            50        3
## 4 S01            35        4
## 5 S01            42        5
## 6 S02            27        1

Use the function spread{tidyr} to convert the data set from long to wide format, then rename the columns.

library(tidyr)

## Warning: package 'tidyr' was built under R version 3.6.3

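# spread() makes one column per Position value; with sep = "" they are named Position1 ... Position5
probw <- probword %>% spread(key = Position, value = Response_Time, sep = "")
colnames(probw) <- c("ID", paste0("Pos_", 1:5))   # rename to Pos_1 ... Pos_5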
head(probw)

##    ID Pos_1 Pos_2 Pos_3 Pos_4 Pos_5
## 1 S01    51    36    50    35    42
## 2 S02    27    20    26    17    27
## 3 S03    37    22    41    37    30
## 4 S04    42    36    32    34    27
## 5 S05    27    18    33    14    29
## 6 S06    43    32    43    35    40

str(probw)

## 'data.frame':    11 obs. of  6 variables:
##  $ ID   : Factor w/ 11 levels "S01","S02","S03",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Pos_1: int  51 27 37 42 27 43 41 38 36 26 ...
##  $ Pos_2: int  36 20 22 36 18 32 22 21 23 31 ...
##  $ Pos_3: int  50 26 41 32 33 43 36 31 27 31 ...
##  $ Pos_4: int  35 17 37 34 14 35 25 20 25 32 ...
##  $ Pos_5: int  42 27 30 27 29 40 38 16 28 36 ...
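
The tidyr documentation marks spread() as superseded by pivot_wider() (tidyr 1.0.0 and later), where the equivalent call would be pivot_wider(probword, names_from = Position, values_from = Response_Time, names_prefix = "Pos_"). The same long-to-wide conversion can also be done in base R with reshape(). A minimal base-R sketch, run on a small long-format data frame built from the S01 and S02 values shown in the output above; the object names probe_long and probe_wide are illustrative, and with the real file you would pass probword instead.

```r
# Long format: one row per subject x position (values copied from S01 and S02 above)
probe_long <- data.frame(
  ID = rep(c("S01", "S02"), each = 5),
  Response_Time = c(51, 36, 50, 35, 42, 27, 20, 26, 17, 27),
  Position = rep(1:5, times = 2)
)

# idvar identifies a subject; each value of timevar becomes its own column
probe_wide <- reshape(probe_long, direction = "wide",
                      idvar = "ID", timevar = "Position",
                      v.names = "Response_Time")

# reshape() names the new columns Response_Time.1 ... Response_Time.5; rename them to Pos_1 ... Pos_5
names(probe_wide) <- sub("^Response_Time\\.", "Pos_", names(probe_wide))
rownames(probe_wide) <- NULL
print(probe_wide)
```
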