Exercise 4

The data set Vocab{car} gives observations on gender, education and vocabulary from respondents to the U.S. General Social Surveys, 1972-2004. Summarize the relationship between education and vocabulary over the years by gender. Note that the non-missing observations in the version of Vocab loaded below span survey years 1974 to 2016.

Load the data, check its structure, and display the first 6 rows of Vocab.
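A minimal sketch of this step, assuming the carData package is installed. Vocab ships with carData, which library(car) attaches automatically (see the "Loading required package: carData" message below), so loading carData directly is enough here:

```r
# Vocab lives in carData; library(car) would attach it as well
library(carData)

str(Vocab)    # 4 variables: year, sex, education, vocabulary
head(Vocab)   # first 6 rows (head() defaults to n = 6)
```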

## Warning: package 'car' was built under R version 3.6.3
## Loading required package: carData
## 'data.frame':    30351 obs. of  4 variables:
##  $ year      : num  1974 1974 1974 1974 1974 ...
##  $ sex       : Factor w/ 2 levels "Female","Male": 2 2 1 1 1 2 2 2 1 1 ...
##  $ education : num  14 16 10 10 12 16 17 10 12 11 ...
##  $ vocabulary: num  9 9 9 5 8 8 9 5 3 5 ...
##  - attr(*, "na.action")= 'omit' Named int  1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "names")= chr  "19720001" "19720002" "19720003" "19720004" ...
##          year    sex education vocabulary
## 19740001 1974   Male        14          9
## 19740002 1974   Male        16          9
## 19740003 1974 Female        10          9
## 19740004 1974 Female        10          5
## 19740005 1974 Female        12          8
## 19740006 1974   Male        16          8

Use xyplot{lattice} to display the linear relationship between education and vocabulary in each year, grouped by sex.
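A sketch of such a plot, assuming lattice (a recommended package shipped with R) and one panel per survey year with one fitted line per sex. The plotting options used for the original figure are not shown, so `type = "r"` (draw only the least-squares line per group, avoiding some 30,000 overplotted points) and the axis labels are assumptions:

```r
library(carData)
library(lattice)

# vocabulary ~ education, one panel per survey year, one line per sex;
# type = "r" draws only the regression line for each group
p <- xyplot(vocabulary ~ education | factor(year),
            data = Vocab,
            groups = sex,
            type = "r",
            auto.key = list(lines = TRUE, points = FALSE),
            xlab = "Education (years)",
            ylab = "Vocabulary score")
print(p)
```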

From the plot, the linear relationships between education and vocabulary appear similar across years.

Subset the data by gender.
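One way to do this with dplyr, which the warning below suggests was attached at this step; the object names female and male are hypothetical:

```r
library(carData)
library(dplyr)

# One data frame per gender; sex is a factor with levels "Female", "Male"
female <- filter(Vocab, sex == "Female")
male   <- filter(Vocab, sex == "Male")
```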

## Warning: package 'dplyr' was built under R version 3.6.3

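Fit a linear regression of vocabulary on education within each survey year and extract the coefficients (intercept and slope). This is done for each gender subset in turn, giving the two lists of coefficients below.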
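The coefficient names in the output (`x$education`) suggest a per-year `lm()` inside `lapply()`. A sketch of that idea, using a hypothetical helper coef_by_year() and base subset() instead of dplyr; passing `data = x` to lm() names the slope `education` rather than `x$education`, and the slopes are then placed side by side so the two genders can be compared year by year:

```r
library(carData)

# Hypothetical helper: fit vocabulary ~ education separately for each
# survey year and return a named list of (Intercept, slope) vectors
coef_by_year <- function(df) {
  lapply(split(df, df$year),
         function(x) coef(lm(vocabulary ~ education, data = x)))
}

female_coefs <- coef_by_year(subset(Vocab, sex == "Female"))
male_coefs   <- coef_by_year(subset(Vocab, sex == "Male"))

# One row per year, slope for each gender in its own column
slopes <- data.frame(
  year   = as.numeric(names(female_coefs)),
  female = sapply(female_coefs, `[[`, "education"),
  male   = sapply(male_coefs, `[[`, "education")
)
slopes
```

Read against the two lists below, every slope is positive, at roughly 0.25 to 0.43 additional points on the 10-word vocabulary test per extra year of education, and in both lists the slopes from 2006 onward tend to be lower than those from the 1970s and 1980s.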

## $`1974`
## (Intercept) x$education 
##   1.5318434   0.3713183 
## 
## $`1976`
## (Intercept) x$education 
##   1.6342960   0.3555403 
## 
## $`1978`
## (Intercept) x$education 
##   0.9762161   0.3963762 
## 
## $`1982`
## (Intercept) x$education 
##   0.9730291   0.3832637 
## 
## $`1984`
## (Intercept) x$education 
##    1.678465    0.337124 
## 
## $`1987`
## (Intercept) x$education 
##   0.8103651   0.3818373 
## 
## $`1988`
## (Intercept) x$education 
##   1.0459936   0.3592442 
## 
## $`1989`
## (Intercept) x$education 
##   1.0596176   0.3708525 
## 
## $`1990`
## (Intercept) x$education 
##   1.7000935   0.3377029 
## 
## $`1991`
## (Intercept) x$education 
##   1.2504604   0.3683962 
## 
## $`1993`
## (Intercept) x$education 
##   1.6384884   0.3221049 
## 
## $`1994`
## (Intercept) x$education 
##   1.8684770   0.3146151 
## 
## $`1996`
## (Intercept) x$education 
##   0.8221711   0.3770325 
## 
## $`1998`
## (Intercept) x$education 
##   1.5199973   0.3314754 
## 
## $`2000`
## (Intercept) x$education 
##   1.1203888   0.3558918 
## 
## $`2004`
## (Intercept) x$education 
##   1.4259424   0.3411153 
## 
## $`2006`
## (Intercept) x$education 
##   2.1383454   0.2952926 
## 
## $`2008`
## (Intercept) x$education 
##   1.4212286   0.3277987 
## 
## $`2010`
## (Intercept) x$education 
##   1.7996389   0.3135749 
## 
## $`2012`
## (Intercept) x$education 
##   1.7303105   0.3061534 
## 
## $`2014`
## (Intercept) x$education 
##   1.4804789   0.3262112 
## 
## $`2016`
## (Intercept) x$education 
##   1.8562367   0.3031146

## $`1974`
## (Intercept) x$education 
##   1.5652579   0.3816095 
## 
## $`1976`
## (Intercept) x$education 
##   1.7021281   0.3824002 
## 
## $`1978`
## (Intercept) x$education 
##   1.3006416   0.4002707 
## 
## $`1982`
## (Intercept) x$education 
##   0.9829602   0.3949758 
## 
## $`1984`
## (Intercept) x$education 
##   1.4536872   0.3728698 
## 
## $`1987`
## (Intercept) x$education 
##   0.9647931   0.3843508 
## 
## $`1988`
## (Intercept) x$education 
##   1.1634561   0.3763999 
## 
## $`1989`
## (Intercept) x$education 
##   1.0682600   0.3863606 
## 
## $`1990`
## (Intercept) x$education 
##   0.4594812   0.4346902 
## 
## $`1991`
## (Intercept) x$education 
##   1.1543766   0.3875821 
## 
## $`1993`
## (Intercept) x$education 
##   1.7388287   0.3286325 
## 
## $`1994`
## (Intercept) x$education 
##   1.6453365   0.3422146 
## 
## $`1996`
## (Intercept) x$education 
##   1.1482811   0.3727178 
## 
## $`1998`
## (Intercept) x$education 
##   1.4472751   0.3592843 
## 
## $`2000`
## (Intercept) x$education 
##   1.9276040   0.3155532 
## 
## $`2004`
## (Intercept) x$education 
##    2.104150    0.304056 
## 
## $`2006`
## (Intercept) x$education 
##   2.7777171   0.2535376 
## 
## $`2008`
## (Intercept) x$education 
##   2.6074315   0.2553971 
## 
## $`2010`
## (Intercept) x$education 
##   1.3520300   0.3468821 
## 
## $`2012`
## (Intercept) x$education 
##   1.7535298   0.3080832 
## 
## $`2014`
## (Intercept) x$education 
##   2.3445239   0.2663464 
## 
## $`2016`
## (Intercept) x$education 
##   2.0055919   0.2928955

Exercise 5

The ‘MASS’ package has two data sets, ‘Animals’ and ‘mammals’. Merge the two and remove duplicated observations using ‘duplicated’.

Load the data sets.
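A minimal sketch, assuming only MASS (a recommended package shipped with R). As the message below shows, attaching MASS after dplyr masks dplyr's select(), so any later dplyr selection should be written as dplyr::select():

```r
library(MASS)

# Both data sets are lazy-loaded with MASS; data() makes the copies explicit
data(Animals)
data(mammals)
```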

## Warning: package 'MASS' was built under R version 3.6.3
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select

Check the structure and first 6 rows of each data set.
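The corresponding calls might look like this. Both data sets hold body weight (kg) and brain weight (g), with species names stored as row names rather than as a column, which matters for the merge step:

```r
library(MASS)

str(Animals)    # 28 species: body (kg), brain (g)
str(mammals)    # 62 species, same two columns
head(Animals)
head(mammals)
```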

## 'data.frame':    28 obs. of  2 variables:
##  $ body : num  1.35 465 36.33 27.66 1.04 ...
##  $ brain: num  8.1 423 119.5 115 5.5 ...
## 'data.frame':    62 obs. of  2 variables:
##  $ body : num  3.38 0.48 1.35 465 36.33 ...
##  $ brain: num  44.5 15.5 8.1 423 119.5 ...
##                     body brain
## Mountain beaver     1.35   8.1
## Cow               465.00 423.0
## Grey wolf          36.33 119.5
## Goat               27.66 115.0
## Guinea pig          1.04   5.5
## Dipliodocus     11700.00  50.0
##                    body brain
## Arctic fox        3.385  44.5
## Owl monkey        0.480  15.5
## Mountain beaver   1.350   8.1
## Cow             465.000 423.0
## Grey wolf        36.330 119.5
## Goat             27.660 115.0

Merge the two data sets and remove duplicated rows.
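A sketch of one way to get this result with rbind() and duplicated(); the intermediate names animals, mams, combined and merged are hypothetical, while the specie column matches the output below. Keeping the species name in its own column is the key choice: without it, duplicated() would compare only body and brain and would collapse distinct species with identical measurements, such as Patas monkey and Potar monkey.

```r
library(MASS)

# Move species names from row names into a column so duplicated()
# compares the species as well as body and brain
animals <- data.frame(Animals, specie = rownames(Animals), row.names = NULL)
mams    <- data.frame(mammals, specie = rownames(mammals), row.names = NULL)

# Stack both data sets (28 + 62 = 90 rows)
combined <- rbind(animals, mams)

# Keep only the first occurrence of each identical row
merged <- combined[!duplicated(combined), ]

# Sort by body weight (then brain, then name) and renumber the rows
merged <- merged[order(merged$body, merged$brain, merged$specie), ]
rownames(merged) <- NULL
merged
```

An alternative is merge(animals, mams, all = TRUE), which joins on all shared columns and keeps a single copy of rows found in both data sets; duplicated() is used here because the exercise asks for it.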

##         body   brain                    specie
## 1      0.005    0.14 Lesser short-tailed shrew
## 2      0.010    0.25          Little brown bat
## 3      0.023    0.30             Big brown bat
## 4      0.023    0.40                     Mouse
## 5      0.048    0.33                Musk shrew
## 6      0.060    1.00           Star-nosed mole
## 7      0.075    1.20          E. American mole
## 8      0.101    4.00           Ground squirrel
## 9      0.104    2.50                Tree shrew
## 10     0.120    1.00            Golden hamster
## 11     0.122    3.00                      Mole
## 12     0.122    3.00                  Mole rat
## 13     0.200    5.00                    Galago
## 14     0.280    1.90                       Rat
## 15     0.425    6.40                Chinchilla
## 16     0.480   15.50                Owl monkey
## 17     0.550    2.40           Desert hedgehog
## 18     0.750   12.30              Rock hyrax-a
## 19     0.785    3.50         European hedgehog
## 20     0.900    2.60                    Tenrec
## 21     0.920    5.70    Arctic ground squirrel
## 22     1.000    6.60 African giant pouched rat
## 23     1.040    5.50                Guinea pig
## 24     1.350    8.10           Mountain beaver
## 25     1.400   12.50                Slow loris
## 26     1.410   17.50                     Genet
## 27     1.620   11.40                 Phalanger
## 28     1.700    6.30              N.A. opossum
## 29     2.000   12.30                Tree hyrax
## 30     2.500   12.10                    Rabbit
## 31     3.000   25.00                   Echidna
## 32     3.300   25.60                       Cat
## 33     3.385   44.50                Arctic fox
## 34     3.500    3.90             Water opossum
## 35     3.500   10.80     Nine-banded armadillo
## 36     3.600   21.00              Rock hyrax-b
## 37     4.050   17.00     Yellow-bellied marmot
## 38     4.190   58.00                    Verbet
## 39     4.235   50.40                   Red fox
## 40     4.288   39.20                   Raccoon
## 41     6.800  179.00             Rhesus monkey
## 42    10.000  115.00              Patas monkey
## 43    10.000  115.00              Potar monkey
## 44    10.550  179.50                    Baboon
## 45    14.830   98.20                  Roe deer
## 46    27.660  115.00                      Goat
## 47    35.000   56.00                  Kangaroo
## 48    36.330  119.50                 Grey wolf
## 49    52.160  440.00                Chimpanzee
## 50    55.500  175.00                     Sheep
## 51    60.000   81.00           Giant armadillo
## 52    62.000 1320.00                     Human
## 53    85.000  325.00                 Grey seal
## 54   100.000  157.00                    Jaguar
## 55   160.000  169.00           Brazilian tapir
## 56   187.100  419.00                    Donkey
## 57   192.000  180.00                       Pig
## 58   207.000  406.00                   Gorilla
## 59   250.000  490.00                     Okapi
## 60   465.000  423.00                       Cow
## 61   521.000  655.00                     Horse
## 62   529.000  680.00                   Giraffe
## 63  2547.000 4603.00            Asian elephant
## 64  6654.000 5712.00          African elephant
## 65  9400.000   70.00               Triceratops
## 66 11700.000   50.00               Dipliodocus
## 67 87000.000  154.50             Brachiosaurus

Exercise 6

Convert the probe words data set from long to wide format as described.

Load the data.

Check the data structure and the first 6 rows.

## 'data.frame':    55 obs. of  3 variables:
##  $ ID           : Factor w/ 11 levels "S01","S02","S03",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ Response_Time: int  51 36 50 35 42 27 20 26 17 27 ...
##  $ Position     : int  1 2 3 4 5 1 2 3 4 5 ...
##    ID Response_Time Position
## 1 S01            51        1
## 2 S01            36        2
## 3 S01            50        3
## 4 S01            35        4
## 5 S01            42        5
## 6 S02            27        1

Use spread{tidyr} to convert the data set from long to wide format, then rename the new columns Pos_1 to Pos_5.
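The data file itself is not reproduced here, so this sketch rebuilds only subjects S01 and S02 from the values visible in the structure output above (a hypothetical subset named probe); the reshaping is the same for all 11 subjects:

```r
library(tidyr)

# Hypothetical subset: the first 10 rows (subjects S01 and S02),
# copied from the str() output of the long data
probe <- data.frame(
  ID            = factor(rep(c("S01", "S02"), each = 5)),
  Response_Time = c(51L, 36L, 50L, 35L, 42L, 27L, 20L, 26L, 17L, 27L),
  Position      = rep(1:5, times = 2)
)

# One row per subject; each Position value becomes its own column
wide <- spread(probe, key = Position, value = Response_Time)

# spread() names the new columns "1" to "5"; add the "Pos_" prefix
names(wide)[-1] <- paste0("Pos_", names(wide)[-1])
wide
```

In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(); the equivalent call pivot_wider(probe, names_from = Position, values_from = Response_Time, names_prefix = "Pos_") also handles the renaming, but returns a tibble rather than a data.frame.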

## Warning: package 'tidyr' was built under R version 3.6.3
##    ID Pos_1 Pos_2 Pos_3 Pos_4 Pos_5
## 1 S01    51    36    50    35    42
## 2 S02    27    20    26    17    27
## 3 S03    37    22    41    37    30
## 4 S04    42    36    32    34    27
## 5 S05    27    18    33    14    29
## 6 S06    43    32    43    35    40
## 'data.frame':    11 obs. of  6 variables:
##  $ ID   : Factor w/ 11 levels "S01","S02","S03",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Pos_1: int  51 27 37 42 27 43 41 38 36 26 ...
##  $ Pos_2: int  36 20 22 36 18 32 22 21 23 31 ...
##  $ Pos_3: int  50 26 41 32 33 43 36 31 27 31 ...
##  $ Pos_4: int  35 17 37 34 14 35 25 20 25 32 ...
##  $ Pos_5: int  42 27 30 27 29 40 38 16 28 36 ...