0330 In-class-exercise 4-6
Exercise 4
The data set Vocab{car} gives observations on gender, education and vocabulary, from respondents to U.S. General Social Surveys, 1972-2004. Summarize the relationship between education and vocabulary over the years by gender.
Load the data, check the data structure, and display the first 6 rows of Vocab
## Warning: package 'car' was built under R version 3.6.3
## Loading required package: carData
## 'data.frame': 30351 obs. of 4 variables:
## $ year : num 1974 1974 1974 1974 1974 ...
## $ sex : Factor w/ 2 levels "Female","Male": 2 2 1 1 1 2 2 2 1 1 ...
## $ education : num 14 16 10 10 12 16 17 10 12 11 ...
## $ vocabulary: num 9 9 9 5 8 8 9 5 3 5 ...
## - attr(*, "na.action")= 'omit' Named int 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "names")= chr "19720001" "19720002" "19720003" "19720004" ...
## year sex education vocabulary
## 19740001 1974 Male 14 9
## 19740002 1974 Male 16 9
## 19740003 1974 Female 10 9
## 19740004 1974 Female 10 5
## 19740005 1974 Female 12 8
## 19740006 1974 Male 16 8
Use xyplot to display the linear relationship between education and vocabulary, in one panel per sex with one regression line per year
dta <- Vocab
lattice::xyplot(vocabulary ~ education | sex, groups = year, data = dta, type = c("g", "r"), auto.key = list(columns = 6))
From the plot, the linear relationships between education and vocabulary appear similar across years.
Subset the data by gender
## Warning: package 'dplyr' was built under R version 3.6.3
Extract the regression coefficients (vocabulary on education) for each year within each gender subset
## $`1974`
## (Intercept) x$education
## 1.5318434 0.3713183
##
## $`1976`
## (Intercept) x$education
## 1.6342960 0.3555403
##
## $`1978`
## (Intercept) x$education
## 0.9762161 0.3963762
##
## $`1982`
## (Intercept) x$education
## 0.9730291 0.3832637
##
## $`1984`
## (Intercept) x$education
## 1.678465 0.337124
##
## $`1987`
## (Intercept) x$education
## 0.8103651 0.3818373
##
## $`1988`
## (Intercept) x$education
## 1.0459936 0.3592442
##
## $`1989`
## (Intercept) x$education
## 1.0596176 0.3708525
##
## $`1990`
## (Intercept) x$education
## 1.7000935 0.3377029
##
## $`1991`
## (Intercept) x$education
## 1.2504604 0.3683962
##
## $`1993`
## (Intercept) x$education
## 1.6384884 0.3221049
##
## $`1994`
## (Intercept) x$education
## 1.8684770 0.3146151
##
## $`1996`
## (Intercept) x$education
## 0.8221711 0.3770325
##
## $`1998`
## (Intercept) x$education
## 1.5199973 0.3314754
##
## $`2000`
## (Intercept) x$education
## 1.1203888 0.3558918
##
## $`2004`
## (Intercept) x$education
## 1.4259424 0.3411153
##
## $`2006`
## (Intercept) x$education
## 2.1383454 0.2952926
##
## $`2008`
## (Intercept) x$education
## 1.4212286 0.3277987
##
## $`2010`
## (Intercept) x$education
## 1.7996389 0.3135749
##
## $`2012`
## (Intercept) x$education
## 1.7303105 0.3061534
##
## $`2014`
## (Intercept) x$education
## 1.4804789 0.3262112
##
## $`2016`
## (Intercept) x$education
## 1.8562367 0.3031146
## $`1974`
## (Intercept) x$education
## 1.5652579 0.3816095
##
## $`1976`
## (Intercept) x$education
## 1.7021281 0.3824002
##
## $`1978`
## (Intercept) x$education
## 1.3006416 0.4002707
##
## $`1982`
## (Intercept) x$education
## 0.9829602 0.3949758
##
## $`1984`
## (Intercept) x$education
## 1.4536872 0.3728698
##
## $`1987`
## (Intercept) x$education
## 0.9647931 0.3843508
##
## $`1988`
## (Intercept) x$education
## 1.1634561 0.3763999
##
## $`1989`
## (Intercept) x$education
## 1.0682600 0.3863606
##
## $`1990`
## (Intercept) x$education
## 0.4594812 0.4346902
##
## $`1991`
## (Intercept) x$education
## 1.1543766 0.3875821
##
## $`1993`
## (Intercept) x$education
## 1.7388287 0.3286325
##
## $`1994`
## (Intercept) x$education
## 1.6453365 0.3422146
##
## $`1996`
## (Intercept) x$education
## 1.1482811 0.3727178
##
## $`1998`
## (Intercept) x$education
## 1.4472751 0.3592843
##
## $`2000`
## (Intercept) x$education
## 1.9276040 0.3155532
##
## $`2004`
## (Intercept) x$education
## 2.104150 0.304056
##
## $`2006`
## (Intercept) x$education
## 2.7777171 0.2535376
##
## $`2008`
## (Intercept) x$education
## 2.6074315 0.2553971
##
## $`2010`
## (Intercept) x$education
## 1.3520300 0.3468821
##
## $`2012`
## (Intercept) x$education
## 1.7535298 0.3080832
##
## $`2014`
## (Intercept) x$education
## 2.3445239 0.2663464
##
## $`2016`
## (Intercept) x$education
## 2.0055919 0.2928955
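The chunk that produced the coefficient lists above is not shown. The following is a minimal sketch of one way to obtain such per-year fits, assuming base R `split()` and `lapply()` rather than whatever the original chunk used. `vocab_head` (the six Vocab rows displayed earlier) and the helper `coef_by_year()` are hypothetical names introduced for illustration.

```r
# Illustrative data: the first six rows of Vocab shown in the head() output
vocab_head <- data.frame(
  year = rep(1974, 6),
  sex = factor(c("Male", "Male", "Female", "Female", "Female", "Male")),
  education = c(14, 16, 10, 10, 12, 16),
  vocabulary = c(9, 9, 9, 5, 8, 8)
)

# Hypothetical helper: fit vocabulary ~ education within each year
# and return the intercept and slope of each fit
coef_by_year <- function(d) {
  lapply(split(d, d$year), function(x) {
    coef(lm(vocabulary ~ education, data = x))
  })
}

# One list of per-year coefficients for each gender
coefs <- lapply(split(vocab_head, vocab_head$sex), coef_by_year)
coefs$Female[["1974"]]
coefs$Male[["1974"]]
```

With only six rows from 1974 the fits are purely illustrative. On the full data, `vocab_head` would be replaced by `Vocab`, giving one list of per-year coefficients per gender. Because the formula is fitted with `data = x`, the slope is labelled `education` rather than `x$education`.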
Exercise 5
The ‘MASS’ package has two data sets, ‘Animals’ and ‘mammals’. Merge the two data sets and remove duplicated observations using ‘duplicated’.
Load the data sets
## Warning: package 'MASS' was built under R version 3.6.3
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
Check the data structure and the first 6 rows of each data set
## 'data.frame': 28 obs. of 2 variables:
## $ body : num 1.35 465 36.33 27.66 1.04 ...
## $ brain: num 8.1 423 119.5 115 5.5 ...
## 'data.frame': 62 obs. of 2 variables:
## $ body : num 3.38 0.48 1.35 465 36.33 ...
## $ brain: num 44.5 15.5 8.1 423 119.5 ...
## body brain
## Mountain beaver 1.35 8.1
## Cow 465.00 423.0
## Grey wolf 36.33 119.5
## Goat 27.66 115.0
## Guinea pig 1.04 5.5
## Dipliodocus 11700.00 50.0
## body brain
## Arctic fox 3.385 44.5
## Owl monkey 0.480 15.5
## Mountain beaver 1.350 8.1
## Cow 465.000 423.0
## Grey wolf 36.330 119.5
## Goat 27.660 115.0
Merge the two data sets and remove duplicated rows
## body brain specie
## 1 0.005 0.14 Lesser short-tailed shrew
## 2 0.010 0.25 Little brown bat
## 3 0.023 0.30 Big brown bat
## 4 0.023 0.40 Mouse
## 5 0.048 0.33 Musk shrew
## 6 0.060 1.00 Star-nosed mole
## 7 0.075 1.20 E. American mole
## 8 0.101 4.00 Ground squirrel
## 9 0.104 2.50 Tree shrew
## 10 0.120 1.00 Golden hamster
## 11 0.122 3.00 Mole
## 12 0.122 3.00 Mole rat
## 13 0.200 5.00 Galago
## 14 0.280 1.90 Rat
## 15 0.425 6.40 Chinchilla
## 16 0.480 15.50 Owl monkey
## 17 0.550 2.40 Desert hedgehog
## 18 0.750 12.30 Rock hyrax-a
## 19 0.785 3.50 European hedgehog
## 20 0.900 2.60 Tenrec
## 21 0.920 5.70 Arctic ground squirrel
## 22 1.000 6.60 African giant pouched rat
## 23 1.040 5.50 Guinea pig
## 24 1.350 8.10 Mountain beaver
## 25 1.400 12.50 Slow loris
## 26 1.410 17.50 Genet
## 27 1.620 11.40 Phalanger
## 28 1.700 6.30 N.A. opossum
## 29 2.000 12.30 Tree hyrax
## 30 2.500 12.10 Rabbit
## 31 3.000 25.00 Echidna
## 32 3.300 25.60 Cat
## 33 3.385 44.50 Arctic fox
## 34 3.500 3.90 Water opossum
## 35 3.500 10.80 Nine-banded armadillo
## 36 3.600 21.00 Rock hyrax-b
## 37 4.050 17.00 Yellow-bellied marmot
## 38 4.190 58.00 Verbet
## 39 4.235 50.40 Red fox
## 40 4.288 39.20 Raccoon
## 41 6.800 179.00 Rhesus monkey
## 42 10.000 115.00 Patas monkey
## 43 10.000 115.00 Potar monkey
## 44 10.550 179.50 Baboon
## 45 14.830 98.20 Roe deer
## 46 27.660 115.00 Goat
## 47 35.000 56.00 Kangaroo
## 48 36.330 119.50 Grey wolf
## 49 52.160 440.00 Chimpanzee
## 50 55.500 175.00 Sheep
## 51 60.000 81.00 Giant armadillo
## 52 62.000 1320.00 Human
## 53 85.000 325.00 Grey seal
## 54 100.000 157.00 Jaguar
## 55 160.000 169.00 Brazilian tapir
## 56 187.100 419.00 Donkey
## 57 192.000 180.00 Pig
## 58 207.000 406.00 Gorilla
## 59 250.000 490.00 Okapi
## 60 465.000 423.00 Cow
## 61 521.000 655.00 Horse
## 62 529.000 680.00 Giraffe
## 63 2547.000 4603.00 Asian elephant
## 64 6654.000 5712.00 African elephant
## 65 9400.000 70.00 Triceratops
## 66 11700.000 50.00 Dipliodocus
## 67 87000.000 154.50 Brachiosaurus
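The merging code itself is hidden in this report. As a hedged sketch of the general idea, the block below stacks two data frames and drops repeated rows with `duplicated()`. It uses only the six rows of each data set printed above, and the names `animals_head`, `mammals_head`, `stacked`, and `combined` are hypothetical. The species column is called `specie` to match the output above.

```r
# First six rows of Animals and mammals, copied from the head() output above
animals_head <- data.frame(
  body = c(1.35, 465, 36.33, 27.66, 1.04, 11700),
  brain = c(8.1, 423, 119.5, 115, 5.5, 50),
  row.names = c("Mountain beaver", "Cow", "Grey wolf", "Goat",
                "Guinea pig", "Dipliodocus")
)
mammals_head <- data.frame(
  body = c(3.385, 0.48, 1.35, 465, 36.33, 27.66),
  brain = c(44.5, 15.5, 8.1, 423, 119.5, 115),
  row.names = c("Arctic fox", "Owl monkey", "Mountain beaver", "Cow",
                "Grey wolf", "Goat")
)

# Move the species names from the row names into an ordinary column
animals_head$specie <- rownames(animals_head)
mammals_head$specie <- rownames(mammals_head)
rownames(animals_head) <- NULL
rownames(mammals_head) <- NULL

# Stack the two data frames, then keep only the first copy of each row
stacked <- rbind(animals_head, mammals_head)
combined <- stacked[!duplicated(stacked), ]

# Sort by body weight and renumber the rows
combined <- combined[order(combined$body), ]
rownames(combined) <- NULL
combined
```

Here `duplicated()` compares whole rows, so two entries with the same body and brain values but different names are both kept, as with "Patas monkey" and "Potar monkey" in the output above.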
Exercise 6
Convert the data set probe words from long to wide format as described.
Load data
Check the data structure and the first 6 rows of the data
## 'data.frame': 55 obs. of 3 variables:
## $ ID : Factor w/ 11 levels "S01","S02","S03",..: 1 1 1 1 1 2 2 2 2 2 ...
## $ Response_Time: int 51 36 50 35 42 27 20 26 17 27 ...
## $ Position : int 1 2 3 4 5 1 2 3 4 5 ...
## ID Response_Time Position
## 1 S01 51 1
## 2 S01 36 2
## 3 S01 50 3
## 4 S01 35 4
## 5 S01 42 5
## 6 S02 27 1
Use spread{tidyr} to convert the data set from long to wide format, then rename the columns
## Warning: package 'tidyr' was built under R version 3.6.3
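# Spread Position into columns; sep = "" names them Position1, ..., Position5
probw <- probword %>% spread(key = Position, value = Response_Time, sep = "")
# Rename the columns to ID, Pos_1, ..., Pos_5
colnames(probw) <- c("ID", paste0("Pos_", 1:5))
head(probw)
## ID Pos_1 Pos_2 Pos_3 Pos_4 Pos_5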
## 1 S01 51 36 50 35 42
## 2 S02 27 20 26 17 27
## 3 S03 37 22 41 37 30
## 4 S04 42 36 32 34 27
## 5 S05 27 18 33 14 29
## 6 S06 43 32 43 35 40
## 'data.frame': 11 obs. of 6 variables:
## $ ID : Factor w/ 11 levels "S01","S02","S03",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Pos_1: int 51 27 37 42 27 43 41 38 36 26 ...
## $ Pos_2: int 36 20 22 36 18 32 22 21 23 31 ...
## $ Pos_3: int 50 26 41 32 33 43 36 31 27 31 ...
## $ Pos_4: int 35 17 37 34 14 35 25 20 25 32 ...
## $ Pos_5: int 42 27 30 27 29 40 38 16 28 36 ...
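For comparison, the same long-to-wide conversion can be sketched without any add-on package using base R `reshape()`. This is an alternative to the `spread()` call, not the method used above. `probword_head` is a hypothetical data frame holding only the rows for subjects S01 and S02 shown in the output.

```r
# Long-format rows for S01 and S02, copied from the output above
probword_head <- data.frame(
  ID = rep(c("S01", "S02"), each = 5),
  Response_Time = c(51, 36, 50, 35, 42, 27, 20, 26, 17, 27),
  Position = rep(1:5, times = 2)
)

# One row per ID, one Response_Time column per Position
wide <- reshape(probword_head,
                idvar = "ID", timevar = "Position",
                v.names = "Response_Time",
                direction = "wide", sep = "_")

# Shorten Response_Time_1, ... to Pos_1, ... and renumber the rows
names(wide) <- sub("^Response_Time", "Pos", names(wide))
rownames(wide) <- NULL
wide
```

In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`, which is another option when tidyr is already loaded.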